There is a reason business is booming for us and has been for years: automated ebook conversions suck big time. Okay, maybe "big time" is pushing it a bit too far. After all, the automation does still save countless hours of manual labor. The problem is that until PDF to Word converters adopt fairly good artificial intelligence, a human being will be needed to review the entire document and make the corrections a computer cannot. In this article, I will show you what you need to know when converting your PDF file to a Word document (.doc or .docx).
This article assumes that you already have a Word document that was created from a .pdf file. If you are not sure, read on!
Perhaps you convert PDFs to Word documents on occasion and have never had much of a problem. If this is the case, I can virtually guarantee that the PDF files you are working with were made from editable document files (such as Word) with very few advanced layout features (e.g., callouts or wrapped images), not from scanned images. When you save a Word doc as a PDF file, far less information is lost. Converting that PDF back to a Word document will still produce some issues, but they are not too difficult to address, so the experience is relatively painless. Creating a PDF from a scanned book, however, is like taking a photograph of each page. The software sees the page as an image, not text. To turn that image into text, OCR (optical character recognition) software must be run on it. Even assuming a clean scan of the pages, the best OCR software at 99.9% accuracy will screw up 1 out of every 1,000 words. In a 100,000-word book, this means you will have 100 messed-up words! Not very professional, and quite a nightmare.
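To see how quickly those errors add up, here is a minimal Python sketch of the arithmetic. It assumes the quoted accuracy applies per word and that each word is misread independently; the function name `expected_ocr_errors` is just an illustration, not part of any OCR tool.

```python
def expected_ocr_errors(word_count: int, accuracy: float) -> float:
    """Expected number of misread words.

    Assumes `accuracy` is a per-word rate between 0 and 1 and that
    every word has the same, independent chance of being misread.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return word_count * (1.0 - accuracy)


if __name__ == "__main__":
    # Compare a few accuracy levels for a 100,000-word book.
    for accuracy in (0.999, 0.99, 0.95):
        errors = expected_ocr_errors(100_000, accuracy)
        print(f"{accuracy:.1%} accurate: about {errors:.0f} bad words")
```

Real scans rarely behave this neatly (errors tend to cluster on bad pages), so treat the result as a rough estimate of the proofreading workload rather than a prediction.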
At the time of this writing, the OCR software used to convert scans into text does not contain enough AI (artificial intelligence) to have a good contextual understanding of words. Therefore, if a "w" in the image looks like "iv" to the software, it will interpret it as "iv", even though the result in context is nonsense such as "We ivill succeed and we will prosper!" This is not a real brain-buster for a human, not even an 8-year-old one. Yet machines struggle and usually fail. Fortunately, this is an error that any decent spell checker would pick up, since "ivill" is not a recognized word. But many errors produce recognized words, or occur in names that the spell checker ignores.
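The difference between the errors a spell checker catches and the ones it misses can be shown with a small Python sketch. The word list here is a tiny stand-in for a real dictionary, and the "modem" for "modern" mix-up is a hypothetical example of an OCR misread that happens to produce a valid word.

```python
import re

# Tiny stand-in word list; a real spell checker uses a full dictionary.
KNOWN_WORDS = {
    "we", "will", "succeed", "and", "prosper",
    "the", "modern", "modem", "is", "new",
}


def flag_unknown_words(text, known_words=KNOWN_WORDS):
    """Return the words in `text` that are not in `known_words`."""
    words = re.findall(r"[A-Za-z']+", text)
    return [word for word in words if word.lower() not in known_words]


if __name__ == "__main__":
    # "ivill" is not a word, so a dictionary lookup catches it.
    print(flag_unknown_words("We ivill succeed and we will prosper!"))
    # If "modern" was misread as "modem", nothing is flagged,
    # because "modem" is itself a valid word.
    print(flag_unknown_words("The modem is new"))
```

The second case is why a dictionary lookup alone is not enough: only a reader who understands the sentence will notice that the word is wrong.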
Machines also fail because of poor-quality scans or images, small text, unorthodox fonts, and a rather limited library of knowledge about how to recognize letters. This is where the human mind excels. This failure on the machine's part is the reason that anti-spam form software (often referred to as "CAPTCHA") works so well. It is (usually) easy for the human eye to detect the characters but virtually impossible for machines.
Now that you have your Word document that was created from a PDF, here is what you need to do in addition to the standard formatting that you would otherwise do for a Word document before converting it to an ebook. Let me stress that you should read every word in the document to ensure it is correct. If you were scanning hundreds of books for free public access, this level of proofing would clearly be overkill, but if this is your book that you are selling online (that is, a book people are paying money for), you owe it to your readers to ensure they are buying an error-free (or virtually error-free) book.
If the document is a real mess, we often use what we call the "nuclear" option to remove all the formatting. We call it this because it's like nuking a city and starting over from scratch. What you will have is a plain text document with all of the words and none of the formatting (you still need to fix the errors with the incorrect words). Here is the process:
PDF to Word conversions do not have to be a nightmare, even when the PDF comes from a scanned source. They do take time, however. If you are willing to put in the time, you can have a wonderful-looking, working document ready to be converted to an ebook. If you're not willing to put in the time or deal with the many issues that can arise from a PDF to Word conversion and would rather pay someone to deal with this, well, that is why we're in business!