
Is there something good for converting PDF to EPUB?



Calibre [1] can do a reasonably good job on most PDF files, but a lot depends on what the PDF contains. PDF is essentially a container format, and it can hold many different types of data: images, text, fonts, scripting, and much more. The results you'll get from Calibre (or any other conversion tool) will depend heavily on the types of data within the PDF file you want to convert, and also on what kind of output you want to generate.

[1] http://calibre-ebook.com/
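
For anyone who wants to script this rather than use the GUI: Calibre ships a command-line tool called ebook-convert that takes an input file and an output file. Below is a rough sketch, assuming Calibre is installed and ebook-convert is on your PATH. The helper names (build_convert_command, convert) are made up for illustration, and no conversion options are passed, so the output quality is whatever the defaults give you for your particular PDF.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def build_convert_command(pdf_path: str, epub_path: Optional[str] = None) -> List[str]:
    """Build the ebook-convert argument list for a PDF-to-EPUB conversion."""
    src = Path(pdf_path)
    # If no output name is given, reuse the input name with an .epub suffix.
    dst = Path(epub_path) if epub_path else src.with_suffix(".epub")
    return ["ebook-convert", str(src), str(dst)]


def convert(pdf_path: str, epub_path: Optional[str] = None) -> None:
    """Run the conversion; raises if Calibre's CLI cannot be found."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found on PATH; is Calibre installed?")
    # check=True raises CalledProcessError if the tool exits with an error.
    subprocess.run(build_convert_command(pdf_path, epub_path), check=True)
```

Calling convert("book.pdf") would then try to write book.epub next to the input file.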


Not really. The problem is that PDF is basically a destination format. Converting a document to PDF strips out all of its semantics, leaving you with plain text, fonts, and boxes. The latest versions of the official Adobe Acrobat Reader are able to convert PDF to Doc, but I have no idea what the quality is like.


Every time I have used Acrobat to convert PDF to Word, the only usable parts have been the tables. The rest is generally garbage.

Fortunately, the tables were the only parts I wanted! I needed to get them from the PDF into text (csv) form. So, from Word, I copied the tables, pasted them into Excel, and saved that as csv. Easy as 1-2-3-4-5!
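
The Word-to-Excel-to-csv hop can probably be scripted too. Here is a minimal sketch, assuming the table text copied out of Word arrives tab-delimited with one row per line (that is an assumption, so check it against your own paste). tsv_to_csv is a hypothetical helper name, not part of any tool mentioned here.

```python
import csv
import io


def tsv_to_csv(tab_text: str) -> str:
    """Convert tab-delimited table text into CSV text."""
    reader = csv.reader(io.StringIO(tab_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        if not row:
            continue  # skip blank lines between rows
        # Trim stray whitespace around each cell; the csv module
        # quotes cells that contain commas or quotes on its own.
        writer.writerow([cell.strip() for cell in row])
    return out.getvalue()
```

Cells that contain commas come out quoted, so the result should open cleanly in a spreadsheet.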


It's probably possible to do, but nobody's needed a good converter badly enough to build one.


There is actually an Apache project, PDFBox, that can extract the text from a PDF. It does a passable job, but like I said, all of the formatting is gone.

http://pdfbox.apache.org/userguide/text_extraction.html


There is a very good pdf-to-html converter at [0], so it becomes a two-step process: PDF to HTML, then HTML to EPUB.

[0] https://github.com/coolwanglu/pdf2htmlEX
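
A rough sketch of what the two steps could look like when scripted. The comment above does not name a tool for the second step, so using Calibre's ebook-convert there is my assumption. It also assumes both pdf2htmlEX and ebook-convert are on your PATH and that the PDF sits in the current working directory. The function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def build_pipeline(pdf_name: str) -> List[List[str]]:
    """Return the two commands: PDF -> HTML, then HTML -> EPUB."""
    stem = Path(pdf_name).stem
    html_name = stem + ".html"
    epub_name = stem + ".epub"
    return [
        # Step 1: pdf2htmlEX takes the input PDF and an output HTML file name.
        ["pdf2htmlEX", pdf_name, html_name],
        # Step 2 (assumed tool): Calibre's ebook-convert turns the HTML into EPUB.
        ["ebook-convert", html_name, epub_name],
    ]


def run_pipeline(pdf_name: str) -> None:
    """Run both steps in order, stopping at the first failure."""
    for cmd in build_pipeline(pdf_name):
        subprocess.run(cmd, check=True)
```

Keeping the command building separate from the running makes it easy to print the commands first and try them by hand.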




