Thursday, April 10, 2014

Calibre

Anyone with a Kindle - or probably another kind of ebook reader - should have a copy of the excellent Calibre program. As its website states, Calibre is a free and open source e-book library management application developed by users of e-books for users of e-books. It has a cornucopia of features divided into the following main categories:
  • Library Management
  • E-book conversion
  • Syncing to e-book reader devices
  • Downloading news from the web and converting it into e-book form
  • Comprehensive e-book viewer
  • Content server for online access to your book collection
  • E-book editor for the major e-book formats
I use it for converting ebooks into the format required by Kindle (MOBI); when the source is an EPUB file, the conversion is painless, but when the source is a PDF file, the results are variable in quality.

The other day, I converted a PDF file to MOBI; whilst most of the resulting ebook was readable, there were several consistent errors in the conversion. For example, the letter pair 'fl' (probably a single ligature character in the PDF) was always rendered as '>'. Thus I would read about someone's in>uence and their >ight from persecution. Another error had 'ff' rendered as '?', so I would read about the e?ects of doing something. At first, I put up with this, but it made reading a painful experience. I wondered whether there was a solution to my problem.

I discovered that Calibre can edit ebooks; I also discovered that the version of Calibre which I was using didn't support this function. Updating Calibre is fairly painless, so I downloaded the latest version and ran the installer.

I then discovered that only EPUB and AZW3 formats can be edited; I had only MOBI (presumably I discarded the original PDF after the first conversion). So first I had to convert the MOBI to EPUB before I could begin editing. Internally, an EPUB is essentially a collection of HTML files, which are amenable to editing.

Calibre has a 'find and replace' function but I had to use a little care, especially when replacing '?'. After a little experimentation, I discovered that it would be best to find all instances of '?a' and replace them with 'ffa', then replace '?e' with 'ffe', etc. In other words, six searches (a,e,i,o,u,y) per error.
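For anyone comfortable with a little scripting, the six searches per error can be collapsed into one regular expression. What follows is only a sketch in Python, not what I actually did in Calibre: the function name is made up, and it assumes plain text in which '?' stands for 'ff' and '>' for 'fl' whenever a vowel follows. In the HTML source of an EPUB, a bare '>' also closes tags and the stray character may be stored as an entity, so the second rule would need more care there. Words that end in the lost ligature (such as 'off') are not caught, just as with the manual searches.

```python
import re

def fix_ligatures(text):
    # '?' directly before a vowel is assumed to be a lost 'ff' ligature;
    # a genuine question mark is normally followed by a space or a quote.
    text = re.sub(r"\?(?=[aeiouy])", "ff", text)
    # '>' directly before a vowel is assumed to be a lost 'fl' ligature.
    text = re.sub(r">(?=[aeiouy])", "fl", text)
    return text

sample = "What were the e?ects of her in>uence? She took >ight."
print(fix_ligatures(sample))
```

The lookahead (?=...) checks the following letter without consuming it, which is why a question mark at the end of a sentence is left alone.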

While I was editing, I also removed some of the HTML formatting. At some point in the book, a stray tag had caused several pages to be rendered in italics, making those pages more difficult to read. It was a simple matter to remove all the italics tags from the source. Had I more patience, I would also have removed the title string which appeared on every page, and improved the general formatting. These are minor problems with which I can live.
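Removing the italics can also be done with a single pattern. This is a rough sketch, assuming the italics are marked with ordinary i or em tags rather than with a stylesheet rule; the function name is my own invention.

```python
import re

# Matches opening and closing <i> and <em> tags, with or without attributes.
# The \b stops the pattern from touching tags such as <img> or <embed>.
ITALIC_TAG = re.compile(r"</?(?:i|em)\b[^>]*>", re.IGNORECASE)

def strip_italics(html):
    # Delete the tags themselves but keep the text between them.
    return ITALIC_TAG.sub("", html)

page = '<p><i>one</i> <em class="x">two</em> <img src="a.png"/></p>'
print(strip_italics(page))
```

This removes every italic tag, including any that were intentional, which is the same trade-off I accepted when editing by hand.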

After completing these tasks, I then converted the EPUB file back into MOBI, then transferred the file to the Kindle. Reading the book was now much easier.
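The two conversions need not go through the graphical interface: Calibre also installs a command-line tool, ebook-convert, which takes an input file and an output file and works out the formats from the file extensions. Below is a minimal sketch of driving it from Python. The file names and function names are hypothetical, and it assumes ebook-convert is on the PATH.

```python
import shutil
import subprocess

def convert_command(source, target):
    # ebook-convert infers both formats from the file extensions.
    return ["ebook-convert", source, target]

def convert(source, target):
    # Fail early with a clear message if Calibre's tool is not installed.
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found on PATH")
    subprocess.run(convert_command(source, target), check=True)

# Hypothetical file names for the round trip described above:
# convert("book.mobi", "book.epub")   # before editing
# convert("book.epub", "book.mobi")   # after editing
```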

There is another general problem with PDF files which would require much more editing: sometimes the source is presented in two columns. This gets rendered as one line from one column followed by one line from the other column; the entire passage is unreadable. I imagine that this is easily solved but requires some time to unravel the columns.
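In the simplest case the unravelling is mechanical. The sketch below assumes a single passage in which the lines strictly alternate, left column first, and that both columns have the same number of lines; a real conversion is unlikely to be so tidy, so treat it as an illustration rather than a cure. The function name is made up.

```python
def unravel_columns(lines):
    # Lines at even positions (0, 2, 4, ...) are assumed to come from the left column.
    left = lines[0::2]
    # Lines at odd positions (1, 3, 5, ...) are assumed to come from the right column.
    right = lines[1::2]
    # Read the whole left column first, then the whole right column.
    return left + right

mixed = ["Left line 1", "Right line 1", "Left line 2", "Right line 2"]
for line in unravel_columns(mixed):
    print(line)
```

The time-consuming part would be finding where each two-column passage starts and ends, since that has to be done page by page.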
