Decapod Nigeria Trip 2 Nov - Dec 2011 Notes
Jiras and tasks have been distilled from this document and recorded in New Decapod Features to Roadmap
Installing and Configuring Decapod
Internet access
prone to outages
slow during peak hours (afternoons). ~10 KB/s?
even during off-hour times, the bandwidth is limited ~60 KB/s?
Effect on Decapod Installation
Installation can take 24 to 48 hours due to the following issues:
Packages slow to download from primary sites.
Yes/No prompts in the Ocropus installation halt progress.
Final console output in installation process may read "Success" but only refers to the last package installed. No way to tell what was installed and what failed in the whole operation.
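One way to address the last two issues is sketched below: run each installation step separately, never wait on a keyboard prompt, and print a per-step summary at the end. This is only an illustration under stated assumptions; `run_steps`, `summarize`, and the demo commands are hypothetical names and stand-ins, not part of the Decapod or Ocropus installers.

```python
import subprocess
import sys


def run_steps(steps):
    """Run each (name, argv) step in order and record (succeeded, output)."""
    results = {}
    for name, argv in steps:
        # stdin is connected to the null device, so a step that asks a
        # Yes/No question reads end-of-file instead of blocking on a keyboard.
        completed = subprocess.run(
            argv,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
        results[name] = (completed.returncode == 0, completed.stdout)
    return results


def summarize(results):
    """Return one line per step so every failure is visible at the end."""
    lines = []
    for name, (ok, _output) in results.items():
        lines.append(f"{'OK  ' if ok else 'FAIL'} {name}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Stand-in commands (assumption); a real installer would list its own steps.
    demo_steps = [
        ("step-that-succeeds", [sys.executable, "-c", "print('done')"]),
        ("step-that-fails", [sys.executable, "-c", "raise SystemExit(1)"]),
    ]
    print(summarize(run_steps(demo_steps)))
```

Keeping the captured output per step means a failed package can be inspected afterwards without re-running the whole installation over a slow connection.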
The Work
Post processing
Blemishes and artefacts removed from images. Processing can replace original, or create a new copy.
Priority: Out of scope.
Ordering
Page ordering by file system naming.
May need a good way of sorting / ordering by name, and possibly by other fields (e.g. modified date).
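A minimal sketch of the two orderings mentioned above, assuming pages are image files in one folder. Plain alphabetical sorting puts "page10" before "page2", so the name ordering here compares digit runs as numbers. The function names and sample file names are hypothetical.

```python
import re
from pathlib import Path


def natural_key(filename):
    """Split a name into text and number chunks so 'page10' sorts after 'page2'."""
    parts = re.split(r"(\d+)", filename.lower())
    # With a capturing group, odd positions hold the digit runs.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


def order_by_name(paths):
    """Order pages by file name, comparing embedded numbers numerically."""
    return sorted(paths, key=lambda p: natural_key(Path(p).name))


def order_by_modified(paths):
    """Order pages by last-modified time, oldest first."""
    return sorted(paths, key=lambda p: Path(p).stat().st_mtime)


names = ["page10.tif", "page2.tif", "page1.tif"]
print(order_by_name(names))
```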
Rotation
2 types of rotation operations: orientation, and skew.
Orientation adjusts the image from horizontal orientation to vertical.
Skew adjusts the vertical alignment of the image by small angles.
thoughts: slider, overlaid grid, text field / spinner, revert.
Typically, deskewing an image can cause empty space to appear on parts (e.g. after an image is rotated clockwise, empty space is introduced to the top-left and bottom-right corners).
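The amount of empty space can be estimated from the geometry alone. A rough sketch, assuming the image is rotated about its centre and the canvas grows to the bounding box of the rotated image; the function names are hypothetical.

```python
import math


def rotated_bounding_box(width, height, angle_degrees):
    """Size of the axis-aligned box that contains the rotated image."""
    a = math.radians(angle_degrees)
    sin_a, cos_a = abs(math.sin(a)), abs(math.cos(a))
    return (width * cos_a + height * sin_a, width * sin_a + height * cos_a)


def empty_fraction(width, height, angle_degrees):
    """Fraction of the bounding box not covered by image pixels."""
    box_w, box_h = rotated_bounding_box(width, height, angle_degrees)
    return 1.0 - (width * height) / (box_w * box_h)


# Example: how much of the canvas is empty after a 2 degree deskew.
print(empty_fraction(5100, 6600, 2.0))
```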
Cropping
Sometimes cropping is done on an image that has not been deskewed. Deskewing a cropped image would traditionally introduce unwanted empty space around the edges of the image.
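One common way to avoid the unwanted space is to deskew first and then crop to the largest axis-aligned rectangle that fits entirely inside the rotated image. The sketch below uses a well-known closed-form solution for that rectangle; it is offered as an assumption about a possible approach, not as a description of what Decapod does, and the function name is hypothetical.

```python
import math


def largest_inner_rectangle(width, height, angle_degrees):
    """Width and height of the largest axis-aligned rectangle that fits
    inside a width x height image rotated by angle_degrees."""
    if width <= 0 or height <= 0:
        return (0.0, 0.0)
    a = math.radians(angle_degrees)
    sin_a, cos_a = abs(math.sin(a)), abs(math.cos(a))
    width_is_longer = width >= height
    side_long, side_short = (width, height) if width_is_longer else (height, width)
    if side_short <= 2.0 * sin_a * cos_a * side_long or abs(sin_a - cos_a) < 1e-10:
        # Half-constrained case: two corners of the crop touch the longer sides.
        x = 0.5 * side_short
        if width_is_longer:
            return (x / sin_a, x / cos_a)
        return (x / cos_a, x / sin_a)
    # Fully constrained case: the crop touches all four sides.
    cos_2a = cos_a * cos_a - sin_a * sin_a
    return (
        (width * cos_a - height * sin_a) / cos_2a,
        (height * cos_a - width * sin_a) / cos_2a,
    )


# Example: usable area left after a 2 degree deskew of a 5100 x 6600 scan.
print(largest_inner_rectangle(5100, 6600, 2.0))
```

For the small angles typical of deskewing, the loss is a thin border, which suggests leaving some margin around the page when capturing.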
Quality Assurance
Manually examine digitized material for visual quality: clarity, contrast, artefacts.
Check image resolution / DPI value (e.g. check for 600 DPI).
DPI
How can DPI be calculated from a photograph or image? How do we determine "true DPI"?
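"True DPI" = horizontal pixels / width of original in inches, or equivalently vertical pixels / height of original in inches. (The ratio (horizontal pixels * vertical pixels) / (width of original in inches * height of original in inches) gives pixels per square inch, which is the DPI squared.)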
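A small worked sketch of the DPI calculation, taking DPI as pixels per linear inch along each axis and assuming the physical size of the original is known and the original fills the frame. The function names and the 2% tolerance are hypothetical choices for illustration.

```python
def true_dpi(pixels_wide, pixels_high, inches_wide, inches_high):
    """Return (horizontal_dpi, vertical_dpi) for a captured page."""
    if inches_wide <= 0 or inches_high <= 0:
        raise ValueError("physical dimensions must be positive")
    return (pixels_wide / inches_wide, pixels_high / inches_high)


def passes_dpi_check(dpi_pair, target=600, tolerance=0.02):
    """True when both axes reach the target DPI within the tolerance."""
    return min(dpi_pair) >= target * (1.0 - tolerance)


# A Letter page (8.5 x 11 inches) captured at 5100 x 6600 pixels.
dpi = true_dpi(5100, 6600, 8.5, 11)
print(dpi, passes_dpi_check(dpi))
```

If the two axis values differ noticeably, the page was probably not flat or not parallel to the camera, which is itself a useful QA signal.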
Pass / fail pages, or entire work.
Metadata
Work is often identified by a unique ID (e.g. Dublin Core ID), title, and author.
Metadata for all works is kept in a database such as DSpace.
Some possible metadata specific to digitization:
Name of person doing the work
When work was completed
Name of reviewer
When work was last reviewed
QA: Pass / Fail
QA: Pass / Fail remarks
Equipment used
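The fields listed above could be captured in a simple record. This is a sketch only; the class name, field names, and sample values are hypothetical and do not reflect an existing Decapod or DSpace schema.

```python
from dataclasses import asdict, dataclass
from datetime import date
from typing import Optional


@dataclass
class DigitizationRecord:
    """Descriptive and digitization-specific metadata for one work."""
    work_id: str                              # unique ID, e.g. a Dublin Core ID
    title: str
    author: str
    operator: str                             # name of person doing the work
    completed_on: Optional[date] = None       # when work was completed
    reviewer: Optional[str] = None            # name of reviewer
    last_reviewed_on: Optional[date] = None   # when work was last reviewed
    qa_passed: Optional[bool] = None          # None until QA has been done
    qa_remarks: str = ""
    equipment: str = ""


# Sample values are placeholders.
record = DigitizationRecord(
    work_id="uimac139204",
    title="Sample title",
    author="Sample author",
    operator="Sample operator",
)
print(asdict(record))
```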
Exporting
Exports can be to different formats, quality, and sizes.
Observed 4 main schemes: Web, local (high-quality), archive quality, and masters.
Web format would be a reduced size and quality file(s). Good for screen reading, but not necessarily for printing.
Local format would be higher resolution, fit for screen reading and printing on common paper (e.g. A4 or Letter).
Archive quality would be the best resolution available for archiving purposes.
Masters format stores the original unmodified files.
Exported files should be identifiable to the work.
Filenaming to follow some sort of metadata?
e.g. using Dublin Core ID: uimac139204.pdf or uimac139204_0001.tif
DPI is an important setting to control: 600 DPI for archiving, 100 DPI for web.
Output formats: PDF, JPEG, PNG, TIFF.
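The naming pattern and DPI settings above can be sketched as follows. The helper names are hypothetical, the four-digit page suffix is inferred from the example file names, and only the two DPI values stated in these notes are included.

```python
# DPI per export scheme, limited to the values recorded in these notes.
EXPORT_DPI = {"web": 100, "archive": 600}


def export_filename(work_id, extension, page=None):
    """Build a file name from the work ID, with an optional page number."""
    if page is None:
        return f"{work_id}.{extension}"
    return f"{work_id}_{page:04d}.{extension}"


def scaled_size(pixels_wide, pixels_high, source_dpi, target_dpi):
    """Pixel dimensions after resampling from source_dpi to target_dpi."""
    return (
        round(pixels_wide * target_dpi / source_dpi),
        round(pixels_high * target_dpi / source_dpi),
    )


print(export_filename("uimac139204", "pdf"))
print(export_filename("uimac139204", "tif", page=1))
print(scaled_size(5100, 6600, EXPORT_DPI["archive"], EXPORT_DPI["web"]))
```

Deriving every export from the work ID keeps exported files traceable to the work even after they are copied out of the system.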
Work Management
One person will likely work on just 1 piece at a time.
Many people can work on the same work station.
A person may take more than one session to complete a work.
Materials
Some materials are bound compilations with multiple works. Each work may have a unique ID.
Some materials are so fragile that handling degrades the work.
Human Factors
Work can be tedious
Automate as much as possible.
Speed up processes where possible (e.g. better software / hardware, reducing quality, etc.)
Economics
Some digitization work is paid per page.
Operator has desire to increase speed and reduce tedium.
Manager has desire to increase quality, while balancing time and money.
Statistics
At UI: 200,000+ pages digitized in 3 months by 10 people (person-hours unknown). At the current rate, it will take 2+ years to digitize all the material they would like.
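The figures above imply the following rough rates. This is back-of-the-envelope arithmetic that assumes a constant rate and treats "200,000+" and "2+ years" as lower bounds; the variable names are illustrative.

```python
pages_done = 200_000      # lower bound from the notes
months_elapsed = 3
people = 10
months_remaining = 24     # "2+ years", taken as a lower bound

pages_per_month = pages_done / months_elapsed
pages_per_person_per_month = pages_per_month / people
implied_backlog = pages_per_month * months_remaining

print(round(pages_per_month))             # about 66,667 pages per month
print(round(pages_per_person_per_month))  # about 6,667 pages per person per month
print(round(implied_backlog))             # at least about 1.6 million pages remaining
```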