Post Decapod 0.4 examination and notes

0.4 functionality

  • Independent left-right (or L-R) remote capture from a set of supported cameras
  • L-R page ordering and rotation correction
  • L-R image stitching and binarization (same process?)
  • thumbnail generation
  • single image delete and reorder
  • output to type 1 image PDF

Testing genpdf

Using Aug 2, 2011 tip functionality

Observations:

  • automatic deskew correction if conditions are right
    • not sure what conditions need to exist for deskew to occur
    • does not happen with images -> possibly related to detected page geometry?
  • Type 2 and 3 PDF generation is functional. Detected text is not completely accurate (is this expected?).
  • binary version stored in Book Store
  • given a set of images that are handwritten with some colour (example), Type 2 PDF comes out greyscale with no selectable text (example).
  • given a set of strictly colour images, producing a Type 1 PDF:
    • decapod-genpdfEXP.py produces an empty PDF.
    • decapod-genpdf.py produces a greyscale PDF.
  • content of output PDFs is fit to A4 by default.
  • For type 3 PDF, genpdf automatically reverts to type 2 if character segmentation fails.
  • For type 1 PDF, binary and paragraph map generation and line segmentation are always performed.
    • As a consequence, images with no detectable text still go through long processing.
  • Some PDF metadata is generated, e.g. PDF author is "DECAPOD GenPDF" and Producer is "ReportLab http://www.reportlab.com"

Future Improvements & Questions

  • Colour output option: binary, greyscale, and colour.
  • Straight Image PDF generation option (no segmentation requested).
    • Option to disable page & text segmentation.
  • PDF metadata options (user can specify their own metadata).
  • Ability to disable auto-deskew (so deskew can happen earlier in the pipeline without being attempted twice).
  • Does upgrading from Ocropus 0.4.4 improve text results? What work will we need to do to support a newer version of Ocropus?
  • Is there a fail-quick method of detecting text on a page?
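
The option ideas in this list could be gathered into a single settings object. The sketch below is purely hypothetical: `GenPdfOptions`, its field names, and its default values are placeholders invented for illustration and do not exist in decapod-genpdf.py.

```python
from dataclasses import dataclass, field

# Colour output choices named in the list above.
COLOUR_MODES = ("binary", "greyscale", "colour")


@dataclass
class GenPdfOptions:
    """Hypothetical option set; none of these exist in genpdf today."""

    colour_mode: str = "colour"   # placeholder default, not genpdf's behaviour
    segment: bool = True          # False = straight image PDF, no page/text segmentation
    auto_deskew: bool = True      # False when deskew already ran earlier in the pipeline
    metadata: dict = field(default_factory=dict)  # user-supplied PDF metadata

    def __post_init__(self):
        # Reject colour modes outside the three proposed choices.
        if self.colour_mode not in COLOUR_MODES:
            raise ValueError(f"colour_mode must be one of {COLOUR_MODES}")


# Example: straight image PDF in greyscale with a user-supplied author.
opts = GenPdfOptions(colour_mode="greyscale", segment=False,
                     metadata={"Author": "Example Library"})
print(opts)
```

Keeping segmentation and deskew as independent switches means a straight image PDF can still be deskewed, or skip deskew when it was done earlier.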

Example Test: Type 2 PDF Test - "Computer Generated 600DPI document"

  • Running: Tip version of Decapod with Ocropus 0.4.4
  • Command run: decapod-genpdfEXP.py
  • Using the first 10 images of the Nuforc computer generated document.
  • Generated PDF
  • Virtual machine spec: Intel 2.8GHz quad-core, 2GB RAM
  • PDF Generation took 10s.
  • Whole process took 372s.

Example Test: Type 3 PDF Test - "Computer Generated 600DPI document"

  • Running: Tip version of Decapod with Ocropus 0.4.4
  • Command run: decapod-genpdfEXP.py
  • Using the first 10 images of the Nuforc computer generated document.
  • Generated PDF
  • Virtual machine spec: Intel 2.8GHz quad-core, 2GB RAM
  • PDF Generation took 25s.
  • Whole process took 1097s.
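
The timings from the two tests above can be put side by side. The calculation below is only arithmetic on the reported figures (10 images; 10s of 372s for type 2, 25s of 1097s for type 3). It assumes "whole process" includes the PDF generation step; the function and key names are illustrative.

```python
# Reported timings from the two example tests (10 images each).
tests = {
    "type 2": {"pdf_s": 10, "total_s": 372},
    "type 3": {"pdf_s": 25, "total_s": 1097},
}
IMAGES = 10


def summarize(pdf_s, total_s, images=IMAGES):
    """Return per-image time, time spent before PDF generation, and PDF share."""
    return {
        "per_image_s": total_s / images,
        "pre_pdf_s": total_s - pdf_s,
        "pdf_share_pct": round(100 * pdf_s / total_s, 1),
    }


for name, timing in tests.items():
    print(name, summarize(**timing))
```

Under that assumption, PDF generation is under 3% of the total in both runs (about 37s per image for type 2 and about 110s per image for type 3), so nearly all of the time goes to the steps before PDF generation.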

Still to do

More testing of:

  • text detection / recognition
  • Font generation (type 3 PDF)
  • type 2 generation
  • determining the ideal quality of input image to get best results from genpdf.
  • changing character model may improve results?

General workflow

  1. Acquire images: remote capture, or file import.
  2. Page management and QA
  3. Output files

1. Acquire images

Remote Capture

  • 3 modes of remote capture: structured light, stereo, and L-R.
  • Each mode has its own calibration scheme.
  • Each 3D capture mode has its own dewarp scheme.

Import from file system

  • Reasonable assumptions?
    • Files numbered in sequence.
  • Questions
    • What if images are different resolutions? Unpredictable segmentation results.
    • What if pages are inconsistent sizes? Unpredictable segmentation results.
    • What if images contain a mix of page spreads and single pages? Okay if resolution and page sizes are consistent.
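
The "files numbered in sequence" assumption can be checked at import time. The sketch below is one possible approach, not existing Decapod code: it treats the last run of digits in each file name as the sequence number, sorts numerically (so page10 follows page2), and reports gaps and unnumbered files. The function names and sample file names are made up for illustration.

```python
import re


def sequence_number(filename):
    """Return the last run of digits in the file name as an int, or None."""
    stem = filename.rsplit(".", 1)[0]  # drop the extension, if any
    runs = re.findall(r"\d+", stem)
    return int(runs[-1]) if runs else None


def order_and_check(filenames):
    """Sort files by sequence number; report gaps and unnumbered files."""
    numbered = [(sequence_number(f), f) for f in filenames]
    unnumbered = [f for n, f in numbered if n is None]
    ordered = sorted((n, f) for n, f in numbered if n is not None)
    numbers = [n for n, _ in ordered]
    missing = []
    if numbers:
        expected = set(range(numbers[0], numbers[-1] + 1))
        missing = sorted(expected - set(numbers))
    return [f for _, f in ordered], missing, unnumbered


files = ["page10.jpg", "page2.jpg", "page1.jpg", "page4.jpg", "cover.jpg"]
ordered, missing, unnumbered = order_and_check(files)
print(ordered)     # ['page1.jpg', 'page2.jpg', 'page4.jpg', 'page10.jpg']
print(missing)     # [3, 5, 6, 7, 8, 9]
print(unnumbered)  # ['cover.jpg']
```

Resolution and page-size consistency would need the image headers to be read and are not covered here.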

2. Page Management

  • During page management, present thumbnails and images as they would appear on output to file.

Image with text case:

  • automatically deskew image
  • automatically binarize image? Display images/thumbnails in binary, or keep them in colour and save any colour processing for an export "preview"?

Image with no recognizable text case:

  • manual deskew correction for image

Global functions

  • automatically 3D dewarp pages if necessary
  • Crop image(s)
  • Delete image(s)
  • Reorder image(s)

3. Output Files

  • choose colour depth of output: binary or colour (original)
  • choose pdf format: image, overlaid, or scalable.
  • Any way to fail quickly at page segmentation? This way Decapod can switch to image PDF output if it doesn't detect text.
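
The fallback idea can be sketched as a small decision function. This is an illustration under stated assumptions, not Decapod code: the notes only state that type 1 is the image PDF, so mapping "overlaid" to type 2 and "scalable" to type 3 is assumed here, and `has_text` and `segment_characters` are hypothetical callables standing in for a cheap text check and for character segmentation.

```python
def choose_pdf_type(requested, has_text, segment_characters):
    """Pick the PDF type to generate, degrading when text processing fails.

    requested: 1 (image), 2 (overlaid) or 3 (scalable); the mapping of
        2 and 3 to those names is an assumption.
    has_text: hypothetical cheap check, returns True if text is likely present.
    segment_characters: hypothetical, returns True if character
        segmentation succeeds.
    """
    if requested not in (1, 2, 3):
        raise ValueError("requested must be 1, 2 or 3")
    if requested == 1:
        return 1                  # image PDF needs no text processing
    if not has_text():
        return 1                  # fail quickly: no text, fall back to image PDF
    if requested == 3 and not segment_characters():
        return 2                  # observed genpdf behaviour: type 3 reverts to type 2
    return requested


# Example: scalable PDF requested, text present, character segmentation fails.
print(choose_pdf_type(3, lambda: True, lambda: False))  # 2
```

Calling the cheap text check first means the slower character segmentation is only attempted when the page is likely to contain text.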