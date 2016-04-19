Hacker News
Ask HN: Does software exist to digitize scanned books and articles?
1 point by chmaynard 16 minutes ago | 1 comment
I've noticed that when I view a PDF of an old book or article, often I can't select and copy text. I assume this is because (1) text selection is disabled somehow, or (2) the document is essentially just a collection of images. Does software exist that can convert a printed page with a lot of math notation into a truly digital document? I'm looking for the same level of quality as TeX. Thanks!





Yes. PDFMiner (Python): https://github.com/euske/pdfminer

Apache PDFBox (Java): https://pdfbox.apache.org

Previous discussion: https://news.ycombinator.com/item?id=11327493

For a list of others, see http://okfnlabs.org/blog/2016/04/19/pdf-tools-extract-text-a...
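
To tell which of the two cases in the question applies (selection disabled vs. pages that are just images), a rough check is to look for font objects in the file. This is only a sketch using the Python standard library, not how PDFMiner or PDFBox work internally: the function name `looks_image_only` and the regexes are my own assumptions, and the heuristic can be fooled by encrypted files or unusual stream filters. If it returns True, there is probably no text layer for a text extractor to read, and the pages would need OCR first.

```python
import re
import zlib

# Heuristic patterns (assumptions, not a real PDF parser).
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
FONT_RE = re.compile(rb"/Font\b")
IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")


def _chunks(pdf_bytes):
    """Yield the raw file plus every stream that inflates cleanly with zlib."""
    yield pdf_bytes
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates the newline left before 'endstream'.
            yield zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            # Not a Flate stream (e.g. JPEG image data); skip it.
            continue


def looks_image_only(pdf_bytes):
    """Return True when image objects are declared but no font is."""
    chunks = list(_chunks(pdf_bytes))
    has_font = any(FONT_RE.search(chunk) for chunk in chunks)
    has_image = any(IMAGE_RE.search(chunk) for chunk in chunks)
    return has_image and not has_font
```

Usage would be something like `looks_image_only(open("book.pdf", "rb").read())`, where `book.pdf` is a placeholder path. A scanned book that already has an OCR text layer contains fonts, so it reports False, which matches the fact that its text is selectable.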

