Date: Breeze meeting held on May 1, 2009
Present:
- Michelle
- Jacob
- James
- Jess
- Jonathan
- Laurel
Resources Discussion
Which mailing lists to subscribe to?
- OCRopus - http://code.google.com/p/ocropus/
- Decapod External - http://groups.google.ca/group/decapod?hl=en
- Decapod Internal List
Decapod Working Sites
- Fluid Wiki - for Fluid project Decapod working documentation
- Decapod Google Groups Site - for more polished documents
Resources
- Jacob - occupied with bug fixing until mid-May.
- James - LCA until the middle of next week (May 6th).
- Michelle - occupied until a later time.
- Jon - FT
Community Contacts
- Clayton Lewis - may have other resources / contacts
- Gabe @ Internet Archive
- Accessibility Services UofT
- JSTOR (Ann Arbor, John Burns)
- Developers in the OCRopus community -> learn more about their projects and who they're developing for
To Explore at a Later Time
- Fisher Rare Book Library
- Toronto Library / Archives?
- Ontario Gov't Archives?
- Google Books?
Communities to Become Involved In
- OCRopus
- JSTOR
Tasks
Benchmarking
- Learn more about the current state of digital document archiving and OCR (gain an understanding of the de facto standards and the state of the art)
- What's the user experience like in currently available digital document archiving solutions?
- e.g., which tasks/processes are automated and which are manual?
- What's the workflow like in these solutions?
- Since we don't have access to most of these solutions, reading the literature on their websites or, better yet, their user manuals might be a way to get at this
Explore OCRopus
- Map out features and functionality, discover what's possible and what's not
- i.e., freedoms and constraints. These will translate into boundaries in the design and also allow us to identify where we might need OCRopus or our own backend to implement things.
- Install at least one copy at the ATRC office
Communicate
- Contact JSTOR / John Burns. Discover opportunities for research.
- Internet Archive @ ITS UofT
- OCRopus -> discover developer projects.
Research
- Research and understand the current state of OCR, document digitization, and archiving
- What has been accomplished so far, and what are the limitations?
- Applications of OCR in practice.
- Where is current OCR innovation happening?
- Who are the users and consumers of OCR systems?
Contextual Inquiries (Future work)
- Create some user profiles (talk to potential archivers, or those knowledgeable about the ones doing archiving, etc.) --> possibly build some personas
- Understand and analyze the trunk of the archiving work process, and the task branches (e.g., cleaning up, re-scanning, etc.)
- Observe scanners in their environment (CI), interview the scanners, etc.
- Extract goals (e.g., page quotas), locate pain points, etc.
- Build use case scenarios (process, context, user environment, user goals)
- Possible CI sites: ITS Internet Archive project, Accessibility Services / Students, JSTOR