OCRmyPDF can also OCR .png and output .txt and .pdf

		2 points by Cognotes 8 months ago

		To deal with bad PDFs, I split each PDF into single-page .png files with Ghostscript, then run ocrmypdf on those .png files, which outputs each page as a PDF with a text layer. With the --sidecar option it also outputs a .txt file per page. I then concatenate all the single-page PDFs and dump the .txt files into a database for better searching.
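
A rough sketch of that pipeline in Python, in case it helps anyone. Only the Ghostscript split and the ocrmypdf --sidecar step come from the description above; the rest is assumed for illustration: the 300 dpi resolution, the page-%04d.png naming, using Ghostscript's pdfwrite device to concatenate, SQLite as the database, and all function and table names. Depending on the PNGs, ocrmypdf may also need --image-dpi if the images carry no resolution metadata.

```python
import sqlite3
import subprocess
from pathlib import Path


def split_cmd(pdf, out_dir, dpi=300):
    # Ghostscript renders one PNG per page; %04d becomes the page number.
    # The dpi value and file naming are assumptions, not from the post.
    return ["gs", "-dNOPAUSE", "-dBATCH", "-sDEVICE=png16m", f"-r{dpi}",
            f"-sOutputFile={Path(out_dir) / 'page-%04d.png'}", str(pdf)]


def ocr_cmd(png):
    # --sidecar writes the recognized text to a .txt file alongside the
    # single-page PDF that gets the text layer.
    png = Path(png)
    return ["ocrmypdf", "--sidecar", str(png.with_suffix(".txt")),
            str(png), str(png.with_suffix(".pdf"))]


def merge_cmd(page_pdfs, merged):
    # Assumption: concatenate with Ghostscript's pdfwrite device.
    # The post does not say which tool does the merge.
    return ["gs", "-dNOPAUSE", "-dBATCH", "-sDEVICE=pdfwrite",
            f"-sOutputFile={merged}", *[str(p) for p in page_pdfs]]


def store_text(conn, doc, page, text):
    # Assumption: a plain SQLite table stands in for "a database".
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (doc TEXT, page INTEGER, body TEXT)")
    conn.execute("INSERT INTO pages VALUES (?, ?, ?)", (doc, page, text))
    conn.commit()


def process(pdf, work_dir, merged, db_path):
    # Runs the whole pipeline; needs gs and ocrmypdf on PATH.
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(split_cmd(pdf, work_dir), check=True)
    pngs = sorted(work_dir.glob("page-*.png"))
    for png in pngs:
        subprocess.run(ocr_cmd(png), check=True)
    page_pdfs = [p.with_suffix(".pdf") for p in pngs]
    subprocess.run(merge_cmd(page_pdfs, merged), check=True)
    conn = sqlite3.connect(db_path)
    try:
        for number, png in enumerate(pngs, start=1):
            text = png.with_suffix(".txt").read_text(encoding="utf-8")
            store_text(conn, Path(pdf).name, number, text)
    finally:
        conn.close()
```

Calling process("scan.pdf", "work", "scan-ocr.pdf", "pages.db") would run all four steps; the file names here are placeholders. Searching is then a simple query such as SELECT doc, page FROM pages WHERE body LIKE '%invoice%'.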