Creating epub files

With nothing but free tools.

I’ve discussed the epub eBook format here before when describing how I created some epub children’s books from Project Gutenberg files for the OLPC XO. In another discussion of the format, I once saw someone complain that Adobe’s strong support of it was based on the fact that their tools are the only ones that can create epub files, but this is only true if we add a few qualifications: their tools are the only commercial ones that can create epub files for now.


For all I know, other commercial tools can create them by now, but more importantly, you can easily create epub eBooks with free software if you don’t need commercial tools to create some XML files (mostly XHTML) and zip them up together. I certainly don’t. The epub eBooks Tutorial is a good place to start, and don’t miss the Epub Format Construction Guide, especially on the tricky zipping issues described below. The latter also points to the Info-zip free Windows zip utility and wisely skips the Tutorial’s recommendation to store certain files in an OEBPS directory in the zip file, a practice that is just a convention developed for a related format that predates epub.

The trickiest part of creating an epub file is the one-line mimetype file, which looks like this:

application/epub+zip

This mimetype file must be the first entry in the zip file and must be stored uncompressed, with no space, newline, or any other character after that final “p”. The Epub Format Construction Guide shows the following command as an example for creating an epub file called EpubGuide-hxa7241.epub with mimetype first,

zip -Xr9D EpubGuide-hxa7241.epub mimetype *

but I had better luck creating such a file in two stages, like this:

zip -q0X EpubGuide-hxa7241.epub mimetype
zip -qXr9D EpubGuide-hxa7241.epub *

Either way, remember that this is far and away the most difficult part of creating an epub file, and it’s not very difficult, especially considering that once you have a mimetype file that works (which you can pull from another epub file) you can use it in all of your epub files with no changes.
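If you don't have a zip utility handy, the same two-stage idea can be sketched with Python's standard zipfile module. This is only an illustration under a few assumptions: the function names build_epub and mimetype_problems are made up for this example, and the script writes the mimetype entry from a constant rather than from a file on disk so that no stray newline can sneak in.

```python
import os
import zipfile

# The exact mimetype content: no trailing space or newline.
MIMETYPE = b"application/epub+zip"


def build_epub(source_dir, epub_path):
    """Zip source_dir into epub_path with a stored mimetype entry first."""
    with zipfile.ZipFile(epub_path, "w") as zf:
        # Stage 1: mimetype goes in first, uncompressed (ZIP_STORED).
        zf.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        # Stage 2: everything else, compressed.
        for root, dirs, files in os.walk(source_dir):
            dirs.sort()
            for name in sorted(files):
                full = os.path.join(root, name)
                arcname = os.path.relpath(full, source_dir).replace(os.sep, "/")
                if arcname == "mimetype":
                    continue  # already written in stage 1
                if os.path.abspath(full) == os.path.abspath(epub_path):
                    continue  # don't zip the output file into itself
                zf.write(full, arcname, compress_type=zipfile.ZIP_DEFLATED)


def mimetype_problems(epub_path):
    """Return a list of problems with the mimetype entry (empty if none)."""
    problems = []
    with zipfile.ZipFile(epub_path) as zf:
        entries = zf.infolist()
        if not entries or entries[0].filename != "mimetype":
            return ["mimetype is not the first entry"]
        first = entries[0]
        if first.compress_type != zipfile.ZIP_STORED:
            problems.append("mimetype is compressed")
        if zf.read(first) != MIMETYPE:
            problems.append("mimetype content is not exactly application/epub+zip")
    return problems
```

Calling build_epub("mybook", "mybook.epub") and then mimetype_problems("mybook.epub") should give an empty list. The second function covers only the mimetype rules described above, so it is no substitute for a real validator.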

To automate a little quality checking of your epub file, an open source utility called epubcheck is now available. It checks the XML files inside the epub file for consistency of internal references, for conformance to the relevant RELAX NG schemas, and for problems with the mimetype file described above. I only recently learned that Java jar files can be pulled apart like zip files, so the following two commands will list the epubcheck jar file’s contents and pull out one of the listed files (the RELAX NG schema for the Open Packaging Format):

jar -tf epubcheck-0.9.2.jar
jar -xf epubcheck-0.9.2.jar com/adobe/epubcheck/rng/opf.rng

XML files are easier to create if you use schemas to guide their creation, and RELAX NG is the best schema language, so it’s worth pulling all of the RNG files out of the epubcheck jar file to use when creating the files you’ll put in your epub file.
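Since a jar file is just a zip file, pulling out every schema in one go can also be sketched with Python's zipfile module. The function name extract_rng_schemas and the "schemas" output directory are made up for this example, and I'm assuming that the schemas are simply the entries whose names end in .rng.

```python
import zipfile


def extract_rng_schemas(jar_path, dest_dir):
    """Extract every .rng entry from a jar (a zip archive) into dest_dir.

    Returns the list of entry names that were extracted.
    """
    with zipfile.ZipFile(jar_path) as jar:
        rng_names = [n for n in jar.namelist() if n.endswith(".rng")]
        # extractall keeps each entry's directory path under dest_dir
        jar.extractall(dest_dir, members=rng_names)
    return rng_names


if __name__ == "__main__":
    for name in extract_rng_schemas("epubcheck-0.9.2.jar", "schemas"):
        print(name)
```

The extracted files keep their paths from inside the jar, so opf.rng would end up at schemas/com/adobe/epubcheck/rng/opf.rng.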

Most of those files are straightforward XHTML of the content you’ll put in your eBook. I’ve created epub files from XHTML that was sitting on my hard disk and from Project Gutenberg files, although a little tagsoup cleanup of these files is worth it to automate the handling of some otherwise annoying quirks you might come across—Project Gutenberg (X)HTML isn’t always very consistent.

The other files in an epub eBook are a table of contents file, a list of the files in use (including image files), and a pointer to the file with the list of files. The tutorials above go into more detail about these, but if you pull these XML files out of any epub file and look them over, their workings are pretty self-evident.
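To see how these pieces point at each other, here is a small Python sketch that follows the chain inside an existing epub file. It relies on details from the epub specifications rather than from anything above: the pointer file lives at META-INF/container.xml, its rootfile element names the file list (the OPF package file) in a full-path attribute, and the package file's manifest lists each item with an id, href, and media-type. The function name describe_epub is made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces defined by the epub container and package formats.
CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
OPF_NS = "http://www.idpf.org/2007/opf"


def describe_epub(epub_path):
    """Return (opf_path, items), where items is a list of
    (id, href, media-type) tuples from the package file's manifest."""
    with zipfile.ZipFile(epub_path) as zf:
        # The pointer: container.xml says where the package file is.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find(".//{" + CONTAINER_NS + "}rootfile")
        opf_path = rootfile.get("full-path")
        # The file list: every manifest item in the package file.
        package = ET.fromstring(zf.read(opf_path))
        items = [
            (item.get("id"), item.get("href"), item.get("media-type"))
            for item in package.iter("{" + OPF_NS + "}item")
        ]
    return opf_path, items
```

In the epub files I'd expect you to find, the table of contents shows up in this list as an item with the media type application/x-dtbncx+xml, but check that against your own files rather than taking my word for it.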

So if you’re interested in eBooks, get an epub file or two (plenty of classics are available at feedbooks), unzip them, review the pieces as you look through the epub eBooks Tutorial and Epub Format Construction Guide, and then make a few of your own and see how they look on some of the free eBook readers out there such as Adobe Digital Editions and FBReader. If you’re a publisher wondering about how to approach the eBook market, start making epub prototypes of some of your titles. In a future posting I’ll write about ways to make use of these prototypes as you lay the groundwork for actually selling them.

2 Comments

By John Cowan on March 14, 2008 10:42 AM

How widely accepted is epub, and how much does it deviate from OEBPS? I’m not asking out of randomness; there are Reasons.

By Bob DuCharme on March 14, 2008 1:03 PM

Hi John,

I never looked too closely at OEBPS, but as I understand it, the main difference is that OEBPS books were not zipped up into a single file. There’s more at https://www.idpf.org/forums/viewtopic.php?t=22.

epub acceptance: most people who follow the eBook market closely seem pretty confident that the major eBook readers (except maybe for Kindle, where they do things completely their own way) will be supporting epub Real Soon Now. In the talk before mine at the O’Reilly TOC conference Adobe’s Bill McCoy made it clear that Adobe considers PDF and epub to be the main electronic delivery platforms of the future. Any platform that can run Adobe Digital Editions can display epub files, and Adobe people at the conference were also saying something to the effect of “we can’t talk about new platforms that we’re porting Adobe Digital Editions to just yet, but isn’t it great that Apple has an iPhone SDK out now?”

Even while other formats are still around, epub’s status as an open, well-documented format, and especially Hachette’s attitude (“Every one of our partners (Sony, Amazon, eBooks.com, etc.) will only be receiving the .epub format from us. We will not be doing any special proprietary conversions for anyone, which includes the Kindle. It will be up to each partner to convert to whatever proprietary format can handle the .epub format…”), mean that it’s gaining traction as a lingua franca for B2B exchange of ebooks.