Decapod Design Planning (May 1, 2009)

Date: Breeze meeting held on May 1, 2009

Attendees:
  • Michelle
  • Jacob
  • James
  • Jess
  • Jonathan
  • Laurel

    Resources Discussion

    Which mailing lists to subscribe to?

    Decapod Working Sites

    • Fluid Wiki - for Fluid project Decapod working documentation
    • Decapod Google Groups Site - for more polished documents

    Availability

    • Jacob - occupied with bug fixing until mid-May.
    • James - LCA until the middle of next week (May 6th).
    • Michelle - occupied until a later date.
    • Jon - FT

    Community Contacts

    • Clayton Lewis - may have other resources/contacts
    • Gabe @ Internet Archive
    • Accessibility Services UofT
    • JSTOR (Ann Arbor, John Burns)
    • Developers in the OCRopus community -> learn more about their projects and who they're developing for

    To Explore at a Later Time

    • Fisher Rare Book Library
    • Toronto Library / Archives?
    • Ontario Gov't Archives?
    • Google Books?

    Communities to Become Involved In

    • OCRopus
    • JSTOR

    Survey Existing Solutions

    • Learn more about the current state of digital document archiving and OCR (understand the de facto standards and the state of the art)
    • What is the user experience like in currently available digital document archiving solutions?
      • e.g., which tasks/processes are automated and which are manual?
    • What is the workflow like in these solutions?
      • Since we don't have access to most of these solutions, reading the literature on their websites, or better yet their user manuals, might be a way to get at this.

    Explore OCRopus

    • Map out features and functionality, discover what's possible and what's not
      • i.e., freedom and constraints. These will translate into boundaries in the design and also allow us to identify where we might need OCRopus or our own backend to implement things.
    • Install at least one copy at the ATRC office

    Outreach

    • Contact JSTOR / John Burns. Discover opportunities for research.
    • Internet Archive @ ITS UofT
    • OCRopus -> discover developer projects.

    Background Research

    • Research and understand the current state of OCR, document digitization, and archiving
      • What has been accomplished so far, and what are the limitations?
      • Applications of OCR in practice.
      • Where is OCR innovation currently happening?
      • Who are the users and consumers of OCR systems?

    Contextual Inquiries (Future work)

    • Create user profiles (talk to potential archivers, people knowledgeable about those doing the archiving, etc.) -> possibly build personas
    • Understand and analyze the main trunk of the archiving workflow and its task branches (e.g., cleaning up, re-scanning, etc.)
    • Observe scanners in their environment (CI), interview the scanners, etc.
    • Extract goals (e.g., page quotas), locate pain points, etc.
    • Build use case scenarios (process, context, user environment, user goals)
    • Possible CI sites: ITS Internet Archive project, Accessibility Services / Students, JSTOR