Creating an e-book: Tips on formatting and converting your document

Your company needs an e-book and the project has landed in your lap? These tips and tools can help you get the job done right.

1 2 3 4 5 Page 2
Page 2 of 5

Source formats

The creation of any e-book starts with a source document: a manuscript that you have written or that someone else has provided to you. Right there, the problems begin, since even a "clean" document can pose conversion difficulties. Your goal is to ensure that the document's formatting survives the conversion intact.

Odds are that most documents used as a source for an e-book will have to go through at least two conversions: first, into a format that the conversion software can use, and then into the actual e-book format -- or formats. Sometimes this can be cut down to one stage, but for now it's best to assume you'll need two steps to do the job completely.

Here's a rundown of the most likely formats you'll start with:

HTML

I already mentioned this in the previous section, but it bears repeating: If you're looking for a standard, HTML is more or less it. For one, it's ubiquitous; almost every text-processing program can generate or read HTML. It also supports many features e-books will use: hyperlinks, font control, section headings, images and so on.

Things get trickier if you weren't working with HTML in the first place. If you're collating posts from a blog or a wiki and assembling them into an e-book, you won't have to put up with quite as much drudgery, since that material is already in HTML. But if you're starting with a Microsoft Word (DOC or DOCX) or Open Document Format (OpenDocument or ODF) document, your best bet is to export it directly from the source application into HTML. (Word users should do a "Save as..." using the "Web Page, Filtered (HTML)" option, which strips out most of Word's generated cruft.)

Exporting to HTML from your source program preserves the most crucial formatting and typically keeps sections and chapters intact as well: outline headings are turned into h1/h2/h3 tags, which most conversion programs recognize correctly. Some can even auto-generate tables of contents from those tags. That said, I've had good results using Word to generate TOCs before I send the document to the e-book program, since Word typically gives you a broader range of formatting options.
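To make the heading-to-TOC idea concrete, here is a minimal sketch, using only Python's standard-library `html.parser`, of how a tool might walk the h1/h2/h3 tags of an exported HTML file and build an indented outline. The `HeadingCollector` class and `build_toc` function are hypothetical names for illustration; real conversion programs apply their own, more elaborate rules.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collects (level, id, text) for every h1/h2/h3 element in an HTML document."""

    LEVELS = {"h1": 1, "h2": 2, "h3": 3}

    def __init__(self):
        super().__init__()
        self.headings = []    # list of (level, anchor_id, text)
        self._current = None  # (level, anchor_id) while inside a heading
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in self.LEVELS:
            self._current = (self.LEVELS[tag], dict(attrs).get("id"))
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a heading (including nested tags like <b>)
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in self.LEVELS and self._current is not None:
            level, anchor = self._current
            text = " ".join("".join(self._text).split())  # normalize whitespace
            self.headings.append((level, anchor, text))
            self._current = None


def build_toc(html):
    """Return a plain-text table of contents, indented two spaces per heading level."""
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()
    lines = []
    for level, anchor, text in parser.headings:
        target = f" (#{anchor})" if anchor else ""
        lines.append("  " * (level - 1) + text + target)
    return "\n".join(lines)


sample = '<h1 id="ch1">Chapter 1</h1><p>Intro text.</p>\n<h2 id="s1">Source formats</h2>\n<h3>HTML</h3>'
print(build_toc(sample))
# Chapter 1 (#ch1)
#   Source formats (#s1)
#     HTML
```

The id attributes matter: a TOC entry can only link into the book if its heading carries an anchor, which is one reason to let the source application generate consistent headings before conversion.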

Microsoft Word (DOC or DOCX)

If you're dealing with an original manuscript, odds are it's in Microsoft Word format. Proprietary as Word may be, almost every word-processing program on the face of the Earth can read or write Word documents. And the format has native support for almost everything you could think of: formulas, chaptering, footnotes, indexes -- in other words, anything that might show up in an e-book.

That said, Word documents are best seen as a starting point for an intermediate conversion format, most likely HTML, rather than a format that can be converted directly into an e-book. In fact, most e-book conversion programs don't accept Word natively as a source document type. They may accept Word's sibling format, RTF, but that is already at least one stage of conversion away from the original and increases the chance that certain features might not make it through the conversion process. For example, RTF does support features like sections and footnotes, but the Calibre e-book creation suite, for one, didn't process them correctly when I tested it for this article.
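Because a .docx file is a ZIP archive of XML parts, with the body text stored in `word/document.xml`, the intermediate Word-to-HTML step can be sketched in a few lines of Python. This minimal sketch assumes the document uses Word's built-in heading styles under their English style IDs (`Heading1` through `Heading3`); the function name is hypothetical, and it deliberately drops footnotes, images, tables, and character formatting, which is exactly the kind of loss described above.

```python
import zipfile
import xml.etree.ElementTree as ET
from html import escape

# WordprocessingML namespace used inside word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_simple_html(path):
    """Convert the paragraphs of a .docx file (path or binary file object) to bare HTML.

    Assumes English built-in style IDs: Heading1..Heading3 become <h1>..<h3>,
    everything else becomes <p>. Footnotes, images, tables, and inline
    formatting are ignored.
    """
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    out = []
    for para in root.iter(W + "p"):
        # Paragraph style ID (e.g. "Heading1"), if the paragraph has one
        style = para.find(f"{W}pPr/{W}pStyle")
        style_id = style.get(W + "val") if style is not None else ""
        # Join the text of every run (<w:t>) in the paragraph
        text = "".join(t.text or "" for t in para.iter(W + "t"))
        if not text:
            continue  # skip empty paragraphs
        tag = "p"
        if style_id in ("Heading1", "Heading2", "Heading3"):
            tag = "h" + style_id[-1]
        out.append(f"<{tag}>{escape(text)}</{tag}>")
    return "\n".join(out)
```

Calling `docx_to_simple_html("manuscript.docx")` (a hypothetical file name) returns one HTML element per non-empty paragraph. A real exporter such as Word's own "Save as..." handles far more of the format, which is why it remains the better first step.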

OpenDocument (ODF)

OpenDocument, or ODF, is the format used by OpenOffice.org. (Microsoft Word also supports ODF, although it isn't Word's default format -- it's just one of the formats Word reads and writes.) Third-party extensions for OpenOffice let you export directly to the EPUB format; there are also a number of standalone applications, such as ODFToEPub, that will do the same. If you're already in the habit of creating your documents in ODF, your path to a finished e-book may be slightly shorter because of this.
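Like an ODF file, an EPUB file is a ZIP-based package whose first entry is an uncompressed `mimetype` file. As a rough illustration of what an ODF-to-EPUB exporter has to produce at the outermost layer, here is a minimal Python sketch that writes only that container layer. The package (.opf) document, chapters, and navigation file are assumed to be supplied by the caller, so the output is not a complete EPUB on its own; the function name and the `OEBPS/` folder are illustrative conventions, not requirements.

```python
import zipfile

EPUB_MIMETYPE = b"application/epub+zip"

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="{opf_path}" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def write_epub_container(target, opf_path, files):
    """Write the ZIP container layer of an EPUB.

    target   -- a filename or writable binary file object
    opf_path -- path of the package (.opf) document inside the archive
    files    -- dict mapping archive paths to bytes (the .opf, XHTML chapters, images...)
    """
    with zipfile.ZipFile(target, "w") as zf:
        # The mimetype entry must come first and be stored without compression,
        # so a reader can identify the file from its first bytes.
        zf.writestr(zipfile.ZipInfo("mimetype"), EPUB_MIMETYPE,
                    compress_type=zipfile.ZIP_STORED)
        # container.xml tells the reading system where the package document lives.
        zf.writestr("META-INF/container.xml",
                    CONTAINER_XML.format(opf_path=opf_path),
                    compress_type=zipfile.ZIP_DEFLATED)
        # Everything else can be compressed normally.
        for name, data in files.items():
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
```

The ordering and compression rule for `mimetype` is the detail hand-rolled packagers most often get wrong, which is one argument for leaving this step to an existing exporter.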

PDF

Adobe's PDF format is used so widely as an e-book format that it would be foolish not to mention it. Many programs (such as Word and OpenOffice.org) export directly to PDF, and the files can be opened and read in many applications. In fact, before dedicated e-reader devices made significant inroads into the market, most e-books were just PDF distillations of their print counterparts.

However, it's generally not a good idea to try to use PDF as a source format. Because it's designed to precisely reproduce printed pages, a PDF document needs to be taken apart and put back together if it's being used as a source format for a non-PDF e-book. As a result, PDF should only be used as a source for other e-book formats if you have no choice.
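To see why taking a PDF apart is so messy, consider that a PDF page essentially records where each piece of text is drawn, not which paragraph or heading it belongs to (tagged PDFs can carry some structure, but many don't). The minimal Python sketch below assumes you already have such positioned fragments from some PDF text extractor; the `TextRun` type, the function name, and both threshold values are hypothetical. It regroups fragments into lines and paragraphs by guesswork, and multi-column layouts, hyphenated words, running headers, and footnotes would all defeat it.

```python
from dataclasses import dataclass


@dataclass
class TextRun:
    """A fragment of text drawn at (x, y) on a PDF page.
    PDF y-coordinates grow upward from the bottom of the page."""
    x: float
    y: float
    text: str


def runs_to_paragraphs(runs, line_tolerance=2.0, paragraph_gap=18.0):
    """Heuristically rebuild paragraphs from positioned text runs.

    Runs whose baselines lie within line_tolerance points share a line;
    a vertical gap larger than paragraph_gap between lines starts a new paragraph.
    """
    # Top of the page first (largest y), then left to right.
    ordered = sorted(runs, key=lambda r: (-r.y, r.x))

    lines = []  # list of (baseline_y, [runs on that line])
    for run in ordered:
        if lines and abs(lines[-1][0] - run.y) <= line_tolerance:
            lines[-1][1].append(run)
        else:
            lines.append((run.y, [run]))

    paragraphs, current, prev_y = [], [], None
    for y, line_runs in lines:
        text = " ".join(r.text for r in sorted(line_runs, key=lambda r: r.x))
        if prev_y is not None and prev_y - y > paragraph_gap:
            paragraphs.append(" ".join(current))  # big gap: close the paragraph
            current = []
        current.append(text)
        prev_y = y
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```

Every number in that function is a guess that only fits one particular page layout, which is the practical meaning of "taken apart and put back together."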
