Decapod Nigeria Trip 2 Nov - Dec 2011 Notes

Jiras and tasks have been distilled from this document and recorded in New Decapod Features to Roadmap

Installing and Configuring Decapod

Internet access

  • prone to outages
  • slow during peak hours (afternoons): ~10 KB/s?
  • even during off-peak hours, bandwidth is limited: ~60 KB/s?

Effect on Decapod Installation
Installation can take 24 to 48 hours due to the following issues:

  • Packages slow to download from primary sites.
  • Yes/No prompts in the Ocropus installation halt progress until someone answers them.
  • The final console output of the installation process may read "Success", but this refers only to the last package installed. There is no way to tell which packages were installed and which failed over the whole operation.
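One way to address the last two points is to install packages one at a time, answer prompts automatically, and keep a per-package record. The following is a minimal sketch, assuming the packages are installed with apt-get (the notes do not say which installer raises the prompts); the package names passed in and the `install_all` / `summarize` helpers are hypothetical.

```python
import subprocess


def install_all(packages, run=subprocess.run):
    """Install packages one at a time and record whether each succeeded."""
    results = {}
    for name in packages:
        # "-y" answers Yes/No prompts automatically so the run never stalls.
        cmd = ["apt-get", "install", "-y", name]
        try:
            completed = run(cmd, capture_output=True, text=True)
            results[name] = completed.returncode == 0
        except OSError:
            # Raised, for example, when the installer binary is missing.
            results[name] = False
    return results


def summarize(results):
    """Report every package, not just the last one installed."""
    installed = sorted(n for n, ok in results.items() if ok)
    failed = sorted(n for n, ok in results.items() if not ok)
    return "installed: {}\nfailed: {}".format(
        ", ".join(installed) or "none", ", ".join(failed) or "none"
    )
```

Because each package is handled separately, a slow or failed download only affects its own entry, and the failed packages can be retried later without repeating the whole run.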

The Work

Post processing

  • Blemishes and artefacts removed from images. Processing can replace original, or create a new copy.
  • Priority: Out of scope.


  • Page ordering by file system naming.
  • May need a good way of sorting / ordering by name, and possibly by other fields (e.g. modified date?)
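Plain alphabetical ordering puts "page10" before "page2", so ordering by name probably needs a "natural" sort that compares the digit runs as numbers. A minimal sketch, with hypothetical helper names and file names:

```python
import re


def natural_key(filename):
    """Split a name into text and number parts so digits compare numerically."""
    parts = re.split(r"(\d+)", filename.lower())
    # With a capturing group, odd positions hold the digit runs.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


def order_pages(filenames):
    """Return filenames in natural (human) page order."""
    return sorted(filenames, key=natural_key)
```

For example, `order_pages(["page10.tif", "page2.tif", "page1.tif"])` returns the pages in the order 1, 2, 10. Ordering by modified date could use `sorted(paths, key=os.path.getmtime)` instead.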


  • 2 types of rotation operations: orientation, and skew.
  • Orientation adjusts horizontal orientation to vertical.
  • Skew adjusts the vertical alignment of the image by small degrees.
    • thoughts: slider, overlaid grid, text field / spinner, revert.
  • Typically, deskewing an image causes empty space to appear in parts of the result (e.g. after an image is rotated clockwise, empty space is introduced in the top-left and bottom-right corners).
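The amount of empty space can be estimated from the skew angle. This is a rough sketch, assuming the canvas is enlarged just enough to hold the whole rotated image; the function names are illustrative only.

```python
import math


def rotated_canvas_size(width, height, angle_degrees):
    """Size of the smallest upright canvas that holds the rotated image."""
    a = math.radians(angle_degrees)
    c, s = abs(math.cos(a)), abs(math.sin(a))
    return (width * c + height * s, width * s + height * c)


def empty_fraction(width, height, angle_degrees):
    """Fraction of the enlarged canvas not covered by image pixels."""
    new_w, new_h = rotated_canvas_size(width, height, angle_degrees)
    return 1.0 - (width * height) / (new_w * new_h)
```

Even a small correction matters on a large scan: the added border grows with both the angle and the page dimensions, which is why a revert option and a live preview (grid overlay) seem worthwhile.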


  • Sometimes cropping is done on an image that has not been deskewed. Deskewing a cropped image would traditionally introduce unwanted empty space around the edges of the image.
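One possible way to avoid the empty edges is to deskew first and then crop to the largest upright rectangle that lies entirely inside the rotated image. The sketch below uses a standard geometric result for that rectangle; it is an illustration under that assumption, not a description of how Decapod handles it, and the function name is hypothetical.

```python
import math


def largest_inner_rect(width, height, angle_degrees):
    """Size (w, h) of the largest axis-aligned rectangle that fits inside
    a width x height image rotated by angle_degrees about its centre."""
    if width <= 0 or height <= 0:
        return (0.0, 0.0)
    a = math.radians(angle_degrees)
    sin_a, cos_a = abs(math.sin(a)), abs(math.cos(a))
    width_is_longer = width >= height
    long_side, short_side = (width, height) if width_is_longer else (height, width)
    if short_side <= 2.0 * sin_a * cos_a * long_side or abs(sin_a - cos_a) < 1e-10:
        # Half-constrained case: two crop corners touch the longer sides.
        x = 0.5 * short_side
        if width_is_longer:
            return (x / sin_a, x / cos_a)
        return (x / cos_a, x / sin_a)
    # Fully constrained case: the crop touches all four sides.
    cos_2a = cos_a * cos_a - sin_a * sin_a
    return ((width * cos_a - height * sin_a) / cos_2a,
            (height * cos_a - width * sin_a) / cos_2a)
```

With no rotation the function returns the original size, and the crop shrinks as the skew angle grows, so the operator trades a small loss at the margins for an image with no blank corners.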

Quality Assurance

  • Manually examine digitized material for visual quality: clarity, contrast, artefacts.
  • Check image resolution / DPI value (e.g. check for 600 DPI).


  • How can DPI be calculated from a photograph or image? How do we determine "true DPI"?
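  • "True DPI" = horizontal pixels / width of original in inches (equivalently, vertical pixels / height of original in inches). Dividing pixel area by physical area instead gives pixels per square inch, which is DPI squared.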
  • Pass / fail pages, or entire work.
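DPI is a linear measure: the pixel count along one edge divided by the physical length of that edge in inches. A minimal sketch of the calculation and a pass / fail check follows, assuming the physical size of the original has been measured by hand; the 2% tolerance is an arbitrary illustrative value, not something from the field notes.

```python
def true_dpi(pixels_wide, pixels_high, width_in, height_in):
    """Return (horizontal DPI, vertical DPI) for a captured page."""
    return (pixels_wide / width_in, pixels_high / height_in)


def passes_dpi(pixels_wide, pixels_high, width_in, height_in,
               target=600, tolerance=0.02):
    """Pass if both axes reach the target DPI, within a small tolerance."""
    minimum = target * (1.0 - tolerance)
    dpi_x, dpi_y = true_dpi(pixels_wide, pixels_high, width_in, height_in)
    return dpi_x >= minimum and dpi_y >= minimum
```

For example, a Letter page (8.5 x 11 inches) captured at 5100 x 6600 pixels works out to 600 DPI on both axes. The two axes should roughly agree; a large difference suggests the image was cropped or the original was measured incorrectly.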


  • Work is often identified by a unique ID (e.g. Dublin Core ID), title, and author.
  • Metadata for all works is kept in a database such as DSpace.
  • Some possible metadata specific to digitization:
    • Name of person doing the work
    • When work was completed
    • Name of reviewer
    • When work was last reviewed
    • QA: Pass / Fail
    • QA: Pass / Fail remarks
    • Equipment used
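The fields above could be captured in a single record per work. This is only a sketch of one possible shape; the class and field names are hypothetical and do not reflect a DSpace or Decapod schema.

```python
from dataclasses import dataclass, asdict
from datetime import date
from typing import Optional


@dataclass
class DigitizationRecord:
    """Per-work digitization metadata, following the list above."""
    work_id: str                              # unique ID, e.g. a Dublin Core ID
    title: str
    author: str
    operator: str                             # name of person doing the work
    completed_on: Optional[date] = None       # when work was completed
    reviewer: Optional[str] = None            # name of reviewer
    last_reviewed_on: Optional[date] = None   # when work was last reviewed
    qa_passed: Optional[bool] = None          # None until a review happens
    qa_remarks: str = ""
    equipment: str = ""
```

Calling `asdict(record)` turns a record into a plain dictionary, which is convenient for writing it out to whatever database holds the rest of the metadata.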


  • Exports can be to different formats, quality, and sizes.
  • Observed 4 main schemes: web, local (high quality), archive quality, and masters.
  1. Web format would be a reduced size and quality file(s). Good for screen reading, but not necessarily for printing.
  2. Local format would be higher resolution, fit for screen reading and printing on common paper (e.g. A4 or Letter).
  3. Archive quality would be the best resolution available for archiving purposes.
  4. Masters format stores the original unmodified files.
  • Exported files should be identifiable to the work.
    • Filenaming to follow some sort of metadata?
    • e.g. using Dublin Core ID: uimac139204.pdf or uimac139204_0001.tif
  • DPI an important setting to control: 600 DPI for archiving, 100 DPI for web.
  • Output formats: PDF, JPEG, PNG, TIFF.
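The naming pattern in the examples above (work ID, then an optional four-digit page number) can be sketched as a small helper. The function name and the `EXPORT_DPI` table are illustrative; only the two DPI values stated in the notes are included, since no figures were recorded for the other schemes.

```python
# DPI settings recorded in the notes; other schemes are left undefined.
EXPORT_DPI = {"archive": 600, "web": 100}


def export_filename(work_id, extension, page=None):
    """Build a file name that ties an exported file back to its work."""
    ext = extension.lower().lstrip(".")
    if page is None:
        # Whole-work export, e.g. a single PDF.
        return "{}.{}".format(work_id, ext)
    # Per-page export with a zero-padded page number.
    return "{}_{:04d}.{}".format(work_id, page, ext)
```

`export_filename("uimac139204", "pdf")` gives `uimac139204.pdf`, and `export_filename("uimac139204", "tif", page=1)` gives `uimac139204_0001.tif`, matching the two examples in the list.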

Work Management

  • One person will likely work on just 1 piece at a time.
  • Many people can work on the same work station.
  • A person may take more than one session to complete a work.


  • Some materials are bound compilations containing multiple works. Each work may have a unique ID.
  • Some materials are so fragile that handling degrades the work.

Human Factors

Work can be tedious

  • Automate as much as possible.
  • Speed up processes where possible (e.g. better software / hardware, reducing quality, etc.)


  • Some digitization work is paid per page.
  • Operator has desire to increase speed and reduce tedium.
  • Manager has desire to increase quality, while balancing time and money.


  • At UI: 200,000+ pages digitized in 3 months by 10 people (not sure how many person-hours). At the current rate, it will take 2+ years to digitize all the material they would like.
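The figures above allow a rough back-of-envelope estimate. This sketch assumes the rate and team size stay constant and treats "2+ years" as exactly 2 years, so the backlog it produces is an inferred lower bound and not a number recorded on the trip.

```python
def monthly_rate(pages, months):
    """Average pages digitized per month by the whole team."""
    return pages / months


def implied_backlog(pages, months, years_remaining):
    """Pages still to do if the same rate continues for years_remaining."""
    return monthly_rate(pages, months) * 12 * years_remaining


rate = monthly_rate(200_000, 3)            # about 66,667 pages per month
per_person = rate / 10                     # about 6,667 pages per person per month
backlog = implied_backlog(200_000, 3, 2)   # about 1.6 million pages
```

On these assumptions, each operator handles roughly 6,700 pages a month, so even small per-page time savings from automation add up quickly across the remaining material.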