Experiences with Pandoc
Software tools
For generating EPUB 3 from HTML
- pandoc (github, recommended version) - https://github.com/jgm/pandoc
- github installation guide - https://github.com/jgm/pandoc/wiki/Installing-the-development-version-of-pandoc
- pandoc project website including latest release - http://johnmacfarlane.net/pandoc/
Converting HTML to EPUB
We use Pandoc to convert HTML to EPUB with the following command:
pandoc 01-velocity.html -o 01-velocity.epub -w epub3 -f html -R
You can have multiple input files separated by spaces and output to a single EPUB file.
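When several chapters are converted at once, it can help to build the command in a script. The following is a minimal sketch, assuming pandoc is on the PATH; the function name `build_pandoc_command` and the second input file are hypothetical, and the flags are simply copied from the command above.

```python
import subprocess
from typing import List, Sequence


def build_pandoc_command(inputs: Sequence[str], output: str) -> List[str]:
    """Return the pandoc argument list for turning HTML files into one EPUB 3 file."""
    if not inputs:
        raise ValueError("at least one input file is required")
    # Flags copied from the command above: -w epub3 -f html -R
    return ["pandoc", *inputs, "-o", output, "-w", "epub3", "-f", "html", "-R"]


if __name__ == "__main__":
    # "02-acceleration.html" is a made-up second chapter, for illustration only.
    cmd = build_pandoc_command(["01-velocity.html", "02-acceleration.html"], "velocity.epub")
    print(" ".join(cmd))
    # Uncomment to actually run pandoc:
    # subprocess.run(cmd, check=True)
```

The input files are passed in order, so the chapters appear in the EPUB in the order they are listed.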
- The development version of pandoc has some bug fixes related to MathML and video elements.
- pandoc on github: https://github.com/jgm/pandoc
- Follow this guide to install from github https://github.com/jgm/pandoc/wiki/Installing-the-development-version-of-pandoc
Notes:
- Self-closing HTML void elements (e.g. <source> elements) should include the optional trailing slash ("/"). Omitting it may cause certain EPUB readers (such as Readium and iBooks) to misinterpret the markup and insert spurious closing </source> tags.
- The controls attribute of the <video> element must include a value; otherwise the reading system may report an error. That is, use <video controls="controls"> and not <video controls>.
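Both notes can be applied to the source HTML before running pandoc. The following is a rough sketch using regular expressions rather than a real HTML parser, so it assumes simple, well-behaved markup (for example, no ">" inside attribute values). The function names are hypothetical.

```python
import re

# Only <source> is named in the notes above; extend this tuple as needed.
VOID_TAGS = ("source",)

_VOID_RE = re.compile(r"<(%s)\b([^<>]*?)\s*/?>" % "|".join(VOID_TAGS), re.IGNORECASE)
_VIDEO_RE = re.compile(r"<video\b[^<>]*>", re.IGNORECASE)
# A bare "controls": preceded by whitespace, followed by whitespace, "/" or ">".
_BARE_CONTROLS_RE = re.compile(r"(?<=\s)controls(?=[\s/>])")


def self_close_void_elements(html: str) -> str:
    """Rewrite <source ...> as <source ... /> (already closed tags are normalized)."""
    return _VOID_RE.sub(lambda m: "<%s%s />" % (m.group(1), m.group(2)), html)


def expand_video_controls(html: str) -> str:
    """Rewrite a bare `controls` attribute on <video> as controls="controls"."""
    def fix(match):
        return _BARE_CONTROLS_RE.sub('controls="controls"', match.group(0))
    return _VIDEO_RE.sub(fix, html)


def normalize_for_epub(html: str) -> str:
    """Apply both fixes in sequence."""
    return expand_video_controls(self_close_void_elements(html))
```

For anything beyond simple markup, a proper HTML parser would be a safer choice than these patterns.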
Media Overlays
Pandoc cannot currently include media overlays in an EPUB archive. The general workaround is to use pandoc to create the EPUB file without the media overlay and then add the media overlay to the archive manually. This section describes how we did that.
- Decide what level of granularity you want the highlighting to happen at: word, sentence, paragraph, etc.
- Ensure there's an ID attribute on any HTML element you want highlighted.
- NOTE: Pandoc currently moves and removes IDs inappropriately. See below for a workaround for this.
- Record an audio narration of the text. We used the free tool Audacity http://audacity.sourceforge.net/
- Identify start and end timecodes for the blocks of audio corresponding to the granularity level you chose:
- In Audacity, select the wave segment for the audio in question
- Insert "label" using ?? (the first time you do this, Audacity will automatically create a Label track).
- Name label using exact ID of the associated HTML element.
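Because each label must match an element ID exactly, it can be useful to list the IDs present in the HTML and to check label names against them. The following is a small sketch using only the Python standard library; the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Iterable, List


class IdCollector(HTMLParser):
    """Collect the value of every id attribute, in document order."""

    def __init__(self):
        super().__init__()
        self.ids: List[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.append(value)


def collect_ids(html_text: str) -> List[str]:
    collector = IdCollector()
    collector.feed(html_text)
    collector.close()
    return collector.ids


def missing_ids(label_names: Iterable[str], html_text: str) -> List[str]:
    """Return the label names that have no matching id in the HTML."""
    present = set(collect_ids(html_text))
    return [name for name in label_names if name not in present]
```

Running the check against the XHTML inside the generated EPUB, rather than the source HTML, would also reveal IDs that pandoc moved or removed.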
- Export Audacity's label file. The output will look something like this:
0.185760 9.102222 c01p0002
9.380862 11.702857 c01h02
11.702857 15.185850 c01list001item001
where each line consists of <start timecode> <end timecode> <label>.
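The label file is easy to read programmatically. The following is a minimal sketch of a parser, assuming whitespace-separated (typically tab-separated) columns as in the sample above; the `Label` type and `parse_labels` function are hypothetical names.

```python
from typing import List, NamedTuple


class Label(NamedTuple):
    start: float      # start timecode in seconds
    end: float        # end timecode in seconds
    element_id: str   # label text, expected to equal an HTML element ID


def parse_labels(text: str) -> List[Label]:
    """Parse the contents of an exported label file into Label records."""
    labels = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        parts = line.split(None, 2)
        if len(parts) != 3:
            raise ValueError("line %d: expected <start> <end> <label>" % line_no)
        start, end = float(parts[0]), float(parts[1])
        if end < start:
            raise ValueError("line %d: end timecode precedes start" % line_no)
        labels.append(Label(start, end, parts[2].strip()))
    return labels
```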
- Convert the timecodes into SMIL <par> elements, as per the EPUB overlay specification, using the included awk program:
> awk -f convert.awk -v htmlFile=01-velocity.html -v audioFile=audio/01-velocity.mp3 01-velocity-timecodes.txt > 01-velocity.smil
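For readers without the awk script at hand, the same kind of conversion can be sketched in Python. This is not a port of convert.awk, whose exact output we do not reproduce here. It is an illustration that assumes each label becomes one <par> with a <text> and an <audio> child, and that times are written as SMIL timecount values in seconds. The function name and the generated par IDs are hypothetical.

```python
from xml.sax.saxutils import quoteattr


def labels_to_pars(labels_text: str, html_file: str, audio_file: str) -> str:
    """Turn '<start> <end> <label>' lines into SMIL <par> elements."""
    rows = [line.split(None, 2) for line in labels_text.splitlines() if line.strip()]
    blocks = []
    for number, (start, end, element_id) in enumerate(rows, start=1):
        # The text source points at the element whose ID equals the label.
        text_src = quoteattr("%s#%s" % (html_file, element_id.strip()))
        blocks.append(
            '<par id="par%d">\n'
            "  <text src=%s/>\n"
            '  <audio src=%s clipBegin="%.3fs" clipEnd="%.3fs"/>\n'
            "</par>" % (number, text_src, quoteattr(audio_file), float(start), float(end))
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    sample = "0.185760\t9.102222\tc01p0002\n9.380862\t11.702857\tc01h02\n"
    print(labels_to_pars(sample, "ch001.xhtml", "audio/01-velocity.mp3"))
```

Note that the <text> source should name the content document as it appears inside the EPUB (ch001.xhtml in our manifest), which may differ from the original HTML file name.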
- Add the appropriate SMIL header and footer to the output of the awk script, as well as any desired <seq> elements:
<smil xmlns="http://www.w3.org/ns/SMIL" xmlns:epub="http://www.idpf.org/2007/ops" version="3.0">
<body>
... paste the output of the awk script here ...
</body>
</smil>
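Wrapping the <par> elements can also be scripted, which makes it easy to confirm that the result is well-formed XML before it goes into the archive. The following is a minimal sketch; the header and footer are the ones shown above, and the function name `wrap_smil` is hypothetical.

```python
import xml.etree.ElementTree as ET

SMIL_HEADER = (
    '<smil xmlns="http://www.w3.org/ns/SMIL" '
    'xmlns:epub="http://www.idpf.org/2007/ops" version="3.0">\n<body>\n'
)
SMIL_FOOTER = "\n</body>\n</smil>\n"


def wrap_smil(par_markup: str) -> str:
    """Surround <par>/<seq> markup with the SMIL header and footer."""
    document = '<?xml version="1.0" encoding="UTF-8"?>\n' + SMIL_HEADER + par_markup + SMIL_FOOTER
    # Raises xml.etree.ElementTree.ParseError if the result is not well-formed.
    ET.fromstring(document.encode("utf-8"))
    return document
```

This checks well-formedness only; it does not validate the document against the media overlay specification.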
- Use pandoc to create EPUB from HTML, etc. (see Converting HTML to EPUB above).
- Unzip the EPUB to access the manifest file, etc.:
> unzip velocity.epub
- Edit the manifest file content.opf as necessary:
- add duration metadata to the top of the document, inside the <metadata> element:
<meta property="media:duration">0:00:59.000</meta>
<meta property="media:duration" refines="#ch001_overlay">0:00:59.000</meta>
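The duration values use the h:mm:ss.mmm clock format. The following is a small sketch for producing such a value from a number of seconds; it assumes the overlay duration is taken as the sum of the labelled clip lengths, which is our reading rather than something stated above. Both function names are hypothetical.

```python
from typing import Iterable, Tuple


def format_clock(seconds: float) -> str:
    """Format seconds as h:mm:ss.mmm, e.g. 59 -> '0:00:59.000'."""
    if seconds < 0:
        raise ValueError("duration cannot be negative")
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return "%d:%02d:%02d.%03d" % (hours, minutes, secs, millis)


def total_clip_seconds(clips: Iterable[Tuple[float, float]]) -> float:
    """Sum the lengths of (start, end) clips, in seconds."""
    return sum(end - start for start, end in clips)


if __name__ == "__main__":
    print(format_clock(59))  # prints 0:00:59.000
```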
- add <item> elements for the new files, making sure to include the correct MIME type:
- the SMIL file
- the audio recording(s)
<item id="ch001_overlay" href="01-velocity.smil" media-type="application/smil+xml"/>
<item id="ch001_overlay_mp3" href="audio/01-velocity.mp3" media-type="audio/mpeg" />
- add a media-overlay attribute to the <item>s for the HTML file(s), referencing the ID of the relevant SMIL file:
<item id="ch001_xhtml" href="ch001.xhtml" media-type="application/xhtml+xml" properties="mathml" media-overlay="ch001_overlay" />
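The manifest edits can be automated once the IDs are known. The following is a sketch using Python's standard ElementTree module, assuming a content.opf in the usual OPF namespace. The function name and its parameters are hypothetical, and the default IDs mirror the examples above. It does not add the duration metadata.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"


def add_overlay_to_manifest(opf_xml, xhtml_id, smil_href, audio_href,
                            overlay_id="ch001_overlay"):
    """Return content.opf text with overlay items and a media-overlay attribute added."""
    # Keep the OPF namespace as the default namespace when serializing.
    ET.register_namespace("", OPF_NS)
    ET.register_namespace("dc", "http://purl.org/dc/elements/1.1/")
    root = ET.fromstring(opf_xml)
    manifest = root.find("{%s}manifest" % OPF_NS)
    if manifest is None:
        raise ValueError("no <manifest> element found")

    # Find the content document that the overlay narrates.
    target = None
    for item in manifest.findall("{%s}item" % OPF_NS):
        if item.get("id") == xhtml_id:
            target = item
            break
    if target is None:
        raise ValueError("no manifest item with id %r" % xhtml_id)
    target.set("media-overlay", overlay_id)

    # Declare the SMIL file and the audio recording.
    ET.SubElement(manifest, "{%s}item" % OPF_NS,
                  {"id": overlay_id, "href": smil_href,
                   "media-type": "application/smil+xml"})
    ET.SubElement(manifest, "{%s}item" % OPF_NS,
                  {"id": overlay_id + "_mp3", "href": audio_href,
                   "media-type": "audio/mpeg"})
    return ET.tostring(root, encoding="unicode")
```

ElementTree may reorder namespace declarations and drop comments, so it is worth comparing the result against the original file before repackaging.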
- Add the overlay-related resources and the edited manifest back into the archive using zip on the command line:
> zip -X9Dr velocity