
Ask HN: OCR Solutions - vinnyglennon
Does anyone have a recommendation for an OCR solution that can take in bank statements and reliably extract the data from them? Ideally as SaaS.

Tesseract OCR (backed by Google) is not accurate enough for my needs.

I looked into outsourcing it via Mechanical Turk and http://arcgate.com/services/

The best solution so far seems to be uploading the doc to Google Drive and downloading the extracted text: http://computers.tutsplus.com/tutorials/how-to-ocr-documents-for-free-in-google-drive--cms-20460
======
staticautomatic
It really depends on how structured and diverse your input images/documents
are.

I've worked with just about all the tools out there and my conclusion is:

OCR engines by themselves don't differ much in accuracy. The vast majority of
my tests involving Tesseract, ABBYY's Cloud OCR API, Microsoft's Cloud OCR
third party API, etc., have all produced nearly identical results.
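
One way to check a claim like this on your own documents is to run the same page through each engine and score how close the text outputs are. The following is only a minimal sketch using Python's standard `difflib`; the function names and the engine labels in the usage note are hypothetical, and obtaining the text from each engine is left out.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two OCR text outputs.

    Runs of whitespace are collapsed first, so differences in line
    wrapping or spacing do not count as recognition differences.
    """
    norm_a = " ".join(a.split())
    norm_b = " ".join(b.split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def compare_engines(outputs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Score every pair of engines; keys are (engine_a, engine_b) sorted by name."""
    return {
        (x, y): similarity(outputs[x], outputs[y])
        for x, y in combinations(sorted(outputs), 2)
    }
```

Calling `compare_engines({"tesseract": text_a, "abbyy": text_b, "microsoft": text_c})` returns one score per pair of engines. Scores close to 1.0 across the board would match the observation above that the engines produce nearly identical text.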

If you're extracting data from predictable, structured or semi-structured
input images/documents, the best approach by far is to use data extraction
software with zonal or relational OCR capabilities, like ABBYY FlexiCapture or
Nuance OmniPage. Neither offers cloud APIs, but they do sell SDKs if you want
to build something out yourself. They are expensive, however. I believe the
majority of, say, automatic invoice recognition systems use these or something
like them on the back-end. ABBYY is very lenient with the duration of trial
licenses. You can purchase a license to FlexiCapture's standalone product and
automate around it, which I've done successfully, but it processes everything
sequentially and can't multiprocess (you need the SDK for that). OmniPage is
much cheaper but they are way less lenient about extending the trial license.

The technically correct explanation for why is frankly above my head, but I
can attest that adding zonal/relational OCR into the mix dramatically
increases accuracy in almost every application. The difference between it and
programmatically parsing the text or hOCR output from a given engine is night
and day.
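
For readers unfamiliar with the term, the core idea of a "zone" is a fixed page region mapped to a named field. The sketch below shows only that idea, applied to word bounding boxes of the kind an engine's hOCR output provides. It is not how FlexiCapture or OmniPage work internally, and it is closer to the "parse the hOCR output" approach than to true zonal recognition. All class names, field names, and coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One recognized word with its pixel bounding box (as found in hOCR)."""
    text: str
    x0: int
    y0: int
    x1: int
    y1: int


@dataclass(frozen=True)
class Zone:
    """A named rectangular page region expected to contain one field."""
    name: str
    x0: int
    y0: int
    x1: int
    y1: int


def words_in_zone(words: list[Word], zone: Zone) -> str:
    """Join the words whose box center falls inside the zone."""
    hits = []
    for w in words:
        cx = (w.x0 + w.x1) / 2
        cy = (w.y0 + w.y1) / 2
        if zone.x0 <= cx <= zone.x1 and zone.y0 <= cy <= zone.y1:
            hits.append(w)
    # Naive reading order: top-to-bottom, then left-to-right.
    hits.sort(key=lambda w: (w.y0, w.x0))
    return " ".join(w.text for w in hits)


def extract_fields(words: list[Word], zones: list[Zone]) -> dict[str, str]:
    """Map each zone name to the text found inside that zone."""
    return {z.name: words_in_zone(words, z) for z in zones}
```

With a zone such as `Zone("closing_balance", 65, 5, 130, 25)` defined once per statement layout, `extract_fields(words, zones)` returns a dictionary of field name to text. This only works when the layout is predictable, which is the same precondition given above for zonal tools.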

Unfortunately, the above tools are all tethered to Windows VMs. If you need
to run on Linux you could try to build something with OpenKM (Java) or OCRopus
(multiple language bindings). For my own uses, the build/buy analysis seemed
to always favor buying, even something turnkey.

I would consider a BPO solution (human workers on the back-end) to be a last
resort. When I ran a battery of tests with Mechanical Turk, I found the
response times too variable and long to work well in any situation where you
need to reliably get the data quickly.

Let me know if you want to talk further and I'll shoot you an email.

------
gerh12
For a commercial solution, ABBYY is the best. For a free solution,
[https://ocr.a9t9.com/](https://ocr.a9t9.com/) is almost as good. It uses
Microsoft OCR and can easily scan bank statements or receipts.

~~~
staticautomatic
Maybe it's just my particular use case, but I tried that Microsoft one and the
output was identical to Tesseract and ABBYY Cloud OCR.

