
Ask HN: Recommended OCR? - shiny
I'm OCRing a bunch of TIFF files with tesseract, and while it works to some degree, it's nowhere near as accurate as I'd like it to be. Perhaps I'm doing something wrong and I could tune it to my liking, but I can't find too many resources on tesseract. Am I missing something?

Any other recommendations for OCRs? Ideally it would be free, but I'm willing to pay if it's not too pricey.

I've been trying out the trial version of FineReader, and it seems to work pretty well, so I may go with that.

Any help is greatly appreciated.
======
MaxGabriel
I've had really great success with FineReader. I tried out every free OCR tool
I could find and, after poor results, went for FineReader.

Spend some time on their website so you get the right product; they also list
multiple prices for the same products. I got the latest FineReader (after
a coupon code I found on Google) for between 130 and 150.

(I'm mostly scanning books)

------
ig1
FineReader is what Project Gutenberg has been using for the last decade or so.

------
hebz0rl
What about gocr? It's open source; see <http://jocr.sourceforge.net/>

~~~
hijimayor
Tesseract is way better than gocr, and it is also open source.

------
usermac
Fujitsu ScanSnap

------
hijimayor
One thing that improves Tesseract's performance dramatically is giving it
grayscale tif images. Do

mogrify -type Grayscale *.tif

and run them through tesseract to see the difference. No idea why no one
mentions this in the documentation.
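
For anyone who wants to script the grayscale step, here is a rough sketch in
Python. Note that mogrify overwrites the files in place, so this version
builds a `convert` command that writes a grayscale copy instead. The function
name `ocr_commands` and the `gray` output directory are made up for
illustration, and it assumes ImageMagick's `convert` and the `tesseract`
binary are on your PATH. It only builds the command lines; it does not run
anything.

```python
from pathlib import Path


def ocr_commands(tif_path, out_dir="gray"):
    """Build the two command lines for one image; nothing is executed here.

    `out_dir` is a hypothetical directory for the grayscale copies and is
    assumed to exist already.
    """
    src = Path(tif_path)
    gray = Path(out_dir) / src.name
    # ImageMagick: write a grayscale copy rather than overwriting the
    # original (mogrify would modify the file in place).
    convert_cmd = ["convert", str(src), "-type", "Grayscale", str(gray)]
    # Tesseract: the second argument is the output base name, to which
    # tesseract appends ".txt".
    tesseract_cmd = ["tesseract", str(gray), str(gray.with_suffix(""))]
    return convert_cmd, tesseract_cmd
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`, the
convert command first and the tesseract command second.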

~~~
mgedmin
I've found that Tesseract's accuracy was much improved when I converted my
grayscale images to bitmaps with GIMP's Threshold tool.

(Those pictures were not-very-high-quality snaps of a few pages of a book
taken with a 3.2 Mpix digital camera.)
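
To make the thresholding idea concrete, here is a minimal sketch of a fixed
cutoff applied to grayscale pixel values. This is not GIMP's implementation,
just the general technique; the function name `threshold` and the default
cutoff of 128 are assumptions, and in practice the cutoff needs tuning per
image.

```python
def threshold(pixels, cutoff=128):
    """Map each 0-255 grayscale value to 0 (black) or 255 (white).

    `pixels` is a list of rows, each row a list of integers. The default
    cutoff of 128 is an arbitrary starting point, not a recommended value.
    """
    return [[255 if value >= cutoff else 0 for value in row] for row in pixels]
```

Values at or above the cutoff become white and everything else becomes black,
so faint shading on the page drops out and only the darker ink remains.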

~~~
hijimayor
Tesseract really wasn't made for camera pics, but it could use some help with
the thresholding :)

