Decapod Weekly Meeting Notes
2012 02 14
Hasan:
continued performance eval on type 4
code is more or less finished
need to merge it with older code
run it with nightly
using small data set of full book to generate results
60 pages
will send some images
Zeshan:
feature-based dewarping
based on text lines
will be integrated into final product using SIFT
optimizing SIFT approach for documents
Martin:
continued on 3d dewarping and improving it.
results look okay for dewarping
texture mapping is difficult but progressing
Jon:
continuing work on the Exporter UI and helping with its implementation
will start working on a decapod genpdf spec to help plan work better.
Justin:
implementing the export UI
found bug in Uploader
having a problem with Mercurial: it is not allowing files to be added
2012 02 07
martin:
2d dewarping algorithm implemented
algorithm can be used as a post processing function for 3d
distance to camera is neglected currently
with 3d dewarping, the page curl is eliminated (see images)
working on a 3d approach hopefully for next week
the issue with 3d approach is texture mapping images to model
have a new approach but requires calibration using 6 points
these 6 points will give distance information
http://iupr-serv5.cs.uni-kl.de/scr1.png http://iupr-serv5.cs.uni-kl.de/scr2.png
Zeshan
looking into SIFT surface, checking how parameters influence descriptors
Using feature mapping: matching pixels between the left and right images, and matching unique pixels
http://en.wikipedia.org/wiki/Scale-invariant_feature_transform
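One common way to keep only unique correspondences between a left and a right image is a mutual nearest-neighbour check on the feature descriptors. The sketch below is an illustration only, not the code discussed in the meeting: the function names and the toy 3-dimensional descriptors are hypothetical (real SIFT descriptors have 128 dimensions).

```python
import math

def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(query, candidates):
    """Index of the candidate descriptor closest to `query`."""
    return min(range(len(candidates)), key=lambda i: euclidean(query, candidates[i]))

def mutual_matches(left, right):
    """Pairs (i, j) where left[i] and right[j] are each other's nearest neighbour."""
    matches = []
    for i, descriptor in enumerate(left):
        j = nearest(descriptor, right)
        # Keep the pair only if the match also holds in the other direction.
        if nearest(right[j], left) == i:
            matches.append((i, j))
    return matches

# Toy descriptors (assumed data, for illustration only).
left = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.5, 0.5, 0.0]]
right = [[0.9, 0.1, 0.0], [0.0, 0.1, 1.0]]
matches = mutual_matches(left, right)
print(matches)
```

The third left descriptor is dropped because its best right-hand candidate prefers a different left descriptor, which is the kind of ambiguous match a uniqueness check is meant to reject.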
Hasan:
genpdf - working on performance evaluation for type 4
have some results. Code has yet to be integrated with genpdf.
working on genpdf to work on larger books 60+ pages to see how constructed fonts look
Jon:
exporter redesign
DPI testing this week
email list about Ocropus upgrade
Justin:
server code modifications complete with unit tests
waiting for code review
looking into cherrypy installer options
Colin:
will look into the server code.
2012 01 31
martin:
background and foreground extraction
working on 3d approach, not successful so far
determine content by looking at image data and not 3d approach.
zeshan
experimenting on dewarping
dewarping is working, but the results are not yet exact.
hasan:
working on performance measure for type 4
running into issues with large books
justin
code review of server-side code
jon to test it out
server side validation for image types
updating install scripts for CherryPy: no package for 10.4, so using pip
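Server-side validation of image types is often done by inspecting the first bytes of the upload rather than trusting the file extension. The following is a minimal sketch under that assumption. The function names and the list of accepted formats (PNG, JPEG, TIFF) are hypothetical; the notes do not say which types the Decapod server accepts or how it checks them.

```python
# Leading byte signatures of a few common image formats.
SIGNATURES = {
    "png": (b"\x89PNG\r\n\x1a\n",),
    "jpeg": (b"\xff\xd8\xff",),
    "tiff": (b"II*\x00", b"MM\x00*"),  # little-endian and big-endian TIFF
}

def sniff_image_type(data):
    """Return the format name whose signature starts `data`, or None."""
    for name, prefixes in SIGNATURES.items():
        if any(data.startswith(prefix) for prefix in prefixes):
            return name
    return None

def is_supported_upload(data, allowed=("png", "jpeg", "tiff")):
    """True if the uploaded bytes look like one of the allowed image types."""
    return sniff_image_type(data) in allowed

print(is_supported_upload(b"\x89PNG\r\n\x1a\n" + b"rest of file"))
print(is_supported_upload(b"%PDF-1.4"))
```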
jonathan
continuing work on exporter UI design
will email list about upgrade to Ocropus.
2012 01 24
Present: Martin, Zeshan, Justin, Jon, Hasan
Dewarping:
working on sift features and outliers
working well and should work on a wide range of books
Stereo features part is pretty much done.
Dewarping still needs to be done for stereo
Working on segmenting the book from the background into automated procedure.
Dewarping working with perfectly applicable surfaces
not producing good output.
working on procedures to work on near perfect surfaces
perfectly applicable surfaces - a page that can easily be transformed back into a plane
in reality we won't have perfectly applicable surfaces.
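As a simple illustration of a perfectly applicable surface: a page whose curl is the same along its whole height is a generalized cylinder, and it can be unrolled into a plane by replacing the horizontal coordinate with arc length along the page's cross-section. The sketch below shows only that idea. It is not the dewarping method described in these notes, and the function name and the quarter-circle test profile are assumptions.

```python
import math

def arc_length_coordinates(profile):
    """Flattened horizontal position of each sample of a page cross-section.

    profile: list of (x, z) points ordered across the page, where z is depth.
    Returns the cumulative polyline length at each point.
    """
    flat = [0.0]
    for (x0, z0), (x1, z1) in zip(profile, profile[1:]):
        flat.append(flat[-1] + math.hypot(x1 - x0, z1 - z0))
    return flat

# Assumed test profile: a quarter circle of radius 1, sampled at 201 points.
n = 200
angles = [i * (math.pi / 2) / n for i in range(n + 1)]
profile = [(math.sin(t), 1 - math.cos(t)) for t in angles]
flat = arc_length_coordinates(profile)
print(flat[-1])  # close to pi / 2, the true length of the curled edge
```

A surface that is only nearly applicable has no such exact unrolling, which is why a separate procedure is needed for that case.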
Genpdf:
resolution option: -r option to control image resolution before embedding
available in genpdf
token based pdf performance index now available.
creating performance measure for type 4 PDF
DPI only.
Decapod UI:
server's CherryPy upgrade is complete, and the server is essentially working as it was in 0.5a
writing unit tests
continuing Decapod 0.5 design work.
2012 01 17
Present: Zeshan, Justin, Jon, Hasan
Dewarping
Stereo reconstruction has been tested on a larger dataset. It had some problems in surface fitting. The model has been generalized to cope with a variety of documents.
For Structured Light, the work is in progress for the automatic extraction of region of interest and separation of pages.
For Dewarping, conformal mapping has been fully implemented and tested. It works with perfectly applicable surfaces* (it is not always possible to reconstruct a perfectly applicable surface). At the moment, another dewarping method for nearly applicable surfaces is being implemented. It will help in determining the final model which should be incorporated in Decapod.
Upgrade Numpy 0.1.1 -> 0.9
Genpdf
completed objective performance evaluation img2pdfper.py measurement for type 3 or 4
creates a pdf file that also lists their performance index
a larger number means more variance between the original character and the generated token.
Generates an index number for entire image.
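The notes do not describe how img2pdfper.py computes its index. Purely as an illustration of the idea (a per-token number that grows with the difference between an original character and its generated token, plus one number for the whole image), here is a sketch using mean squared pixel difference. The metric, the function names, and the toy bitmaps are all assumptions.

```python
def token_distance(original, token):
    """Mean squared pixel difference between two equally sized bitmaps."""
    same_rows = len(original) == len(token)
    if not same_rows or any(len(a) != len(b) for a, b in zip(original, token)):
        raise ValueError("bitmaps must have the same shape")
    diffs = [
        (a - b) ** 2
        for row_o, row_t in zip(original, token)
        for a, b in zip(row_o, row_t)
    ]
    return sum(diffs) / len(diffs)

def image_index(pairs):
    """Average token distance over all (original, token) pairs of an image."""
    return sum(token_distance(o, t) for o, t in pairs) / len(pairs)

# Toy 2x2 bitmaps (assumed data): one perfect token and one inverted token.
glyph = [[0, 1], [1, 0]]
identical = [[0, 1], [1, 0]]
inverted = [[1, 0], [0, 1]]
print(image_index([(glyph, identical), (glyph, inverted)]))
```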
Server
changes for RESTful architecture
implementing new scheme for dynamic URLs
code cleanup
UI
Designs progressing on new Export UI.
Q:
Is it possible to create a test to detect handwriting? (This way Decapod can quickly decide whether Type 2 or higher will succeed.)
A:
If the objective is to create segmentable handwritten text, it's possible but requires work. You would create a model based on a specific handwriting, but it will only work for that one handwriting style.
If the objective is to detect which type of PDF to generate, then there have been discussions on such a tool, but not in the near future.
Genpdf:
downscaling and upscaling PDF output to be investigated
colour output option to be investigated
2012 01 10
Present: Martin, Zeshan, Justin, Jon.
Dewarping
Experiments giving good results
Will put all software pieces into a software module
Some parts are still manual process, but can be automated.
tests run using data from structured light and stereo.
Decapod Webapp
going over notes from the recent Nigeria trip and figuring out what new features will be added to the roadmap
architecture: implementing restful architecture requires changing CherryPy. Upgrading causes the server to stop working. Working through that.
New Ubuntu
need to decide if we will move to Ubuntu 12.04 LTS
New Ocropus installer
a new installer is available. Supposed to be better.
likely will break genpdf. Will wait until Hasan is back before we decide to move to that version.
Dependencies
need a list of dependencies.
2011 12
No weekly meetings due to holidays.
2011 11 08
Present: martin, hasan, zeshan, jon, justin.
genpdf (hasan, jon)
some PDF samples from latest version. Using 1-1-1.png and the photographed version of 1-1-1.png (2-1-1.png).
work continuing on issues already documented on genpdf
added potrace library to code base.
3D models (martin and zeshan)
approximating real book surface based on stereo matching
smoothing out book shape and removing outliers
hope to get dewarping results soon
working on conformal mapping (See: http://en.wikipedia.org/wiki/Conformal_map) which will be part of the dewarping process
Test server (martin, hasan)
work continuing on test server
Decapod 0.5 (jon and justin)
re-prioritized features for the release to accommodate work needed for server architecture
Decapod 0.5a will be a simple application that imports images and outputs PDFs.
Decapod 0.5 will add page management.
Decapod Application Server (justin)
working on server architecture
Decapod Application Client (jon)
Working on UI implementation of Decapod 0.5a release
2011 10 25
Present: Martin, Hasan, Jon, Justin
genpdf (Hasan)
moved from autotrace to potrace
potrace produces much nicer results
experimenting with scaling images up before doing the trace: this gives better results
straight lines end up looking better when scaled up then traced.
see this PDF for the results: gendpdf 2011-1024-28 (Decapod).pdf
stereo (Martin)
working on stereo
no new results because of a report that was due last week
working on handling non-text regions next (i.e. images)
testing server (Martin)
still working on debugging the testing machine.
removed most of the test images, but some images take more than 24 hours to generate a PDF.
structured light (Zeshan)
current RGB pattern can be occluded and in this case the algorithm can produce bad results
implementing more robust pattern detection that doesn't have this problem.
looking at a new dewarp algorithm as well.
Decapod Webapp
Import page is basically up and running, it will upload files to the server and store them on the file system named with UUIDs
Starting the Image Management page; currently working on rendering and reordering the thumbnails.
Need to think a bit about the model (the data to store the info about the pages of a book) and something to mediate the communication between the server and client on the client side
Ongoing UI styling and design
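Naming stored uploads with UUIDs avoids collisions when two uploads share a file name. The sketch below shows one way this could look. It is not the Decapod server code: the function name, the use of a temporary directory, and keeping the original extension are assumptions.

```python
import os
import tempfile
import uuid

def store_upload(data, original_name, directory):
    """Write uploaded bytes to `directory` under a fresh UUID-based name.

    The original extension is kept (lower-cased) so the file type stays visible.
    Returns the name the file was stored under.
    """
    extension = os.path.splitext(original_name)[1].lower()
    stored_name = uuid.uuid4().hex + extension
    with open(os.path.join(directory, stored_name), "wb") as handle:
        handle.write(data)
    return stored_name

# Two uploads with the same client-side name do not overwrite each other.
upload_dir = tempfile.mkdtemp()
first = store_upload(b"page one", "scan.PNG", upload_dir)
second = store_upload(b"page two", "scan.PNG", upload_dir)
print(first, second)
```

Because the stored name no longer carries the page order or the original file name, that information has to live in the book model mentioned above.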
Cutting versions of Ocropus and genpdf
will need to cut tags of ocropus and genpdf soon
Jon will test genpdf a bit more and then determine a deadline to cut genpdf.
Martin to talk to Thomas about tagging ocropus versions because there has been a lot of refactoring in the past 3 months.
Administration
Propose to switch meeting dates to Tuesday same time
2011 10 18
Present: Martin, Zeshan, Hasan, Jon, Justin, Jess
stereo based reconstruction (Martin)
work is ongoing in creating good 3D models using stereo.
structured (Zeshan)
perfect 3d
working on improving orientation at capture time so it can capture at different angles
can share repo so we can test it out.
nightly test server (Martin)
memory consumption of tests crashes server
will stabilize it next week
in meantime scaled back the number of evaluations.
PDF generation (Hasan)
corrected misalignment errors in type 3 and 4.
uploaded to bitbucket
improving quality of font reconstruction
considering switching from autotrace to a different library
Decapod Server (Justin)
working on uploading images to the server. Progress is good.
Decapod UI (Jon and Justin)
working on design changes for the main Image Management UI
created a new Import UI
going to wire up the Import UI to the server so we can have uploading working.
Decapod Extension (Jess)
Proposing a no-cost extension to Nov 2012
financials have all been sorted out.
Notes from conversations between 2011 09 20 - 2011 10 14
A test server has been set up to test builds of genpdf against the Decapod Image Testing Suite.
Hasan, Justin, and Jonathan now coding actively on BitBucket.
Hasan's genpdf: https://bitbucket.org/hasan/genpdf/overview
Justin's bitbucket: https://bitbucket.org/jobara/
Jonathan's bitbucket: https://bitbucket.org/jhung/
Testing latest versions of Hasan's genpdf code.
identified some issues which have been addressed by Hasan's latest commit. Will need to be tested to verify.
Further testing required for Type 1 and Type 2 PDF to ensure reliability for 0.5 deliverable.
Decapod 0.5 underway!
See Decapod 0.5a Planning for the plan.
Migration to git has been scrubbed. We will continue to use Mercurial and bitbucket as a method of collaboration.
2011 09 19
Present: Martin, Zeshan, Hasan, Jonathan, Justin
Genpdf:
Current version in repository is not working.
Hasan has been working on fixes and will email Jon a copy.
Ubuntu version:
Currently 11.04 32bit is the platform to use.
However, the version may change depending on the version of scipy required for 3D modelling and dewarping (Martin and Zeshan to inform on this).
3D model:
Continued good progress on structured light dewarping.
Samples:
Structured light capture process may need a calibration step.
Stereo progressing, but results are not yet as good as structured light.
2011 08 15
Present: Martin, Zeshan, Hasan, Jonathan
Decapod test suite:
Jon is working on a test suite of images to use with Decapod.
Not all images are "good" or supported examples - the intention is to test the breadth of support for Decapod and be able to document what works and what doesn't.
http://wiki.fluidproject.org/display/fluid/Decapod+Image+Testing+Suite
3D models:
dense disparity was generated using the Fuji W3 in an uncalibrated setup, structured light experiments are yet to come.
results are online on Google+
now to get a mesh and start dewarp.
Genpdf:
Hasan testing genpdf. Found only working with Type 1 PDFs, and not type 2.
Jon suggested to try decapod-genpdfEXP.py inside the decapod-genpdf directory and use ocropus 0.4.4
Hasan is making contact with Michael Cutter to help understand work done
Dewarping
Zeshan has dense maps and working on dewarping
hope to have something by end of week
Administrative
Tom back in October 13/14/15
majority of IDRC team away due to Masters program launch
2011 08 08
Present: Martin, Zeshan, Hasan, Justin, Jonathan
3D progress:
Performing experimentation on pixel depth calculation
we have reasonable sparse 3d reconstructions using stereo images from the W3 (structured light is currently not yet in a usable state) and currently work on producing a dense mesh out of the 3d points for texture mapping & dewarping
dense reconstruction is currently not yet working well enough for use in the pipeline, but in parallel we are trying to get some reasonable results there
Shan switched to work on dewarping of stereo images to get something deliverable for the demo (we'll investigate structured light after we have something demonstrable there)
Aim to have something deliverable for Decapod extension meeting (sometime in September 2011?)
Gen PDF testing
progressing on testing gen pdf using clean computer generated documents
results aren't particularly good for Type 2 or Type 3 PDF
To do: Martin to find out from Ocropus devs what test samples would yield the best results
To do: Jonathan to post findings to list and find out if we need to upgrade to Ocropus tip to get better results
Roadmap discussion
Jonathan went over the Milestones, in particular how they relate to deliverables for Nigeria.
Introducing Hasan
new to the iupr team
will be working on font grouping and pdf generation
Hasan has been added to the Decapod mailing list
2011 07 11
Present: Martin, Zeshan, Jess, Thomas
Decapod Fluid Toronto team is participating in the a11y hackathon
Solutions out there that smush books:
Roebook
Ion
like Book Liberator, but still uses mechanical pressure on the book to flatten it rather than stereo capture
Martin:
received the stereo camera – the W3?; doing calibration; new laptop set it up
will have stereo by September.
Zeshan:
waiting for the projector to arrive; meanwhile taking images from the web and building software so we can acquire our own images. Will process those images (structured light) using existing algorithms
Thomas:
doing work on OCRopus
get done before summer ends – eliminate C++ library since it's not needed and replace with 1 or 2 smaller python libraries. image processing code that python already covers or is easier manipulated in python.
C++ command line programs will disappear
thinks everything uses Python CLI programs anyway
3D stuff should have new functionality by Sept. and old part should still be working…
flat originals should be able to capture just fine with Stereo stuff.
If the layout is other than single or double column, one strategy could be to give the user a tool where they can label the layout.
The solution can find text lines, but can't find reading order. If the user has labeled the reading order that would help.
For Decapod we're focusing on just generating PDFs.
always will get a PDF that looks ok – we promised to generate a nice PDF
OCR and layout won't work well – didn't promise great OCR because we can't do that.
even w/o layout analysis – it's still searchable.
layout is needed for reflow – reformatting for mobile e.g.
2011 07 04
Present: Martin, Zeshan, James, Jonathan
JH testing Decapod to determine its current capabilities.
Book Liberator no longer an active project.
Book Saver product similar to Book Liberator, may be useful for left-right capture.
JH to send email to determine specifications on product.
Fuji W3 camera received:
possible to create a preset calibration profile that should more or less work for all W3 cameras.
Logitech C900 HD webcams purchased to use with structured light (1080P, 10MP) - used for capturing book details.
Projector has not arrived.
Awaiting laptops on July 17th to begin testing the W3 and webcam hardware.
Questions to be answered:
Should we update to latest gphoto (currently on gphoto 2.4.9)?
Is Ocropus 0.4.4 the version we should still be using?
2011 06 27
Focus for development for next year by TMB's team will be structured light capture.
See Decapod User Workflow with Structured Light for an overview of what this process may look like.
Aim to have an initial python script that "does something" in 3 months.
Will be using a Fuji W3 3D camera to see if it's viable for Decapod's current Stereo capture method.