
Examples to compare OCR services: Amazon vs. Google vs. Microsoft - wbharding
https://www.amplenote.com/blog/2019_examples_amazon_textract_rekognition_microsoft_cognitive_services_google_vision
======
danso
One of the things that Textract purports to do is also detect structured data,
e.g. if the scanned image has tables, like a spreadsheet.

I tested it out when it became available to the general public, and included
the API call and JSON response for their default and relatively simple
example:

[https://gist.github.com/dannguyen/1a71c8fb98ddd1df6abdc08bc7...](https://gist.github.com/dannguyen/1a71c8fb98ddd1df6abdc08bc746397a)
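For anyone curious what consuming that JSON looks like: Textract returns a flat
list of Blocks, where TABLE blocks point at CELL blocks, and CELL blocks point
at WORD blocks, via Relationships. A rough sketch of walking that structure
(the sample response below is a hand-built, heavily abbreviated stand-in for a
real `analyze_document` response, which you'd get from boto3 with
`FeatureTypes=["TABLES"]`):

```python
def extract_tables(blocks):
    """Collect cell text from a Textract AnalyzeDocument Block list.

    TABLE -> CELL -> WORD references are expressed as CHILD
    Relationships; each CELL carries RowIndex/ColumnIndex.
    """
    by_id = {b["Id"]: b for b in blocks}
    tables = []
    for block in blocks:
        if block["BlockType"] != "TABLE":
            continue
        cells = {}
        for rel in block.get("Relationships", []):
            if rel["Type"] != "CHILD":
                continue
            for cell_id in rel["Ids"]:
                cell = by_id[cell_id]
                words = [
                    by_id[wid]["Text"]
                    for r in cell.get("Relationships", [])
                    if r["Type"] == "CHILD"
                    for wid in r["Ids"]
                ]
                cells[(cell["RowIndex"], cell["ColumnIndex"])] = " ".join(words)
        n_rows = max(r for r, _ in cells)
        n_cols = max(c for _, c in cells)
        tables.append([
            [cells.get((r, c), "") for c in range(1, n_cols + 1)]
            for r in range(1, n_rows + 1)
        ])
    return tables

# Toy, hand-built response fragment:
blocks = [
    {"Id": "t1", "BlockType": "TABLE",
     "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2"]}]},
    {"Id": "c1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "c2", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Name"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Amount"},
]
print(extract_tables(blocks))  # [[['Name', 'Amount']]]
```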

I tried it against what I consider one of the harder real-world types of
scanned tabular data, a Senate personal financial disclosure form [0], and it
didn't do great. In fact, I found that it did substantially worse than ABBYY
FineReader. Or rather, ABBYY did much better than I would've expected for any
software, even managing to read the vertically-oriented column headers
accurately [1].

[0] [https://gist.github.com/dannguyen/1a71c8fb98ddd1df6abdc08bc7...](https://gist.github.com/dannguyen/1a71c8fb98ddd1df6abdc08bc746397a#is-textract-our-saint-and-savior-of-pdf-to-csv-data)

[1] [https://github.com/dannguyen/abbyy-finereader-ocr-senate#les...](https://github.com/dannguyen/abbyy-finereader-ocr-senate#less-simple-table)

------
cdolan
I was hoping that they had taken the time to test tables, nested tables, and
irregular tables whose header rows repeat every so often.

A good example of what I’m talking about is invoices. In my line of work we
extract data from thousands of different types of invoices, often the only
place to find key data from a services provider (waste & recycling bills,
soon other parts of a building’s expenses), and normalize that information.

There is a long-tail of industries that are years away from having an API to
transact information…but that information is available on their monthly
statements in some sort of crazy table format!

In my experience Amazon Textract has been the best in terms of processing
speed, ease of use, and table extraction accuracy. However, post-processing is
almost always needed with any OCR implementation.

Edit: It’s important to note that Microsoft and Google don’t even support table
extraction in the APIs listed in this article!

~~~
andrejk
Azure has a separate service to read forms and formatted docs.
[https://azure.microsoft.com/en-us/services/cognitive-service...](https://azure.microsoft.com/en-us/services/cognitive-services/form-recognizer/)

~~~
bpchaps
Do you know how good it is? I have a LOT of structured documents that I need
to OCR.

~~~
CorneliaKara
MSFT person here - give it a try! Sign up and you get a free trial that lets
you benchmark it easily.

~~~
Havoc
Interesting. That is actually something I've been looking for and I do have a
msdn sub.

FYI, it defaulted to Indian rupee as the currency for me (UK based & zero
Indian connections). Weird

------
fartcannon
Maybe compare to some of the free-non-hosted options?

Like where are we with OCR? Last I checked it was CTC magic. Any progress?

~~~
steventhedev
Tesseract[0] is the classic example. There's a bunch of advice for improving
your accuracy with it, like making your images larger (literally just scale
them up 2x or 4x).

It would be interesting to see the benchmark from the article repeated with
different scaling options (or other preprocessing, depending on platform).

[0]: [https://github.com/tesseract-ocr/tesseract](https://github.com/tesseract-ocr/tesseract)
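The scaling tip amounts to simple preprocessing. A dependency-free sketch of 2x
nearest-neighbour upscaling on a raw pixel grid (in practice you'd more likely
use Pillow's `img.resize((w * 2, h * 2))` before handing the image to
Tesseract):

```python
def upscale_2x(pixels):
    """Nearest-neighbour 2x upscale: duplicate every pixel horizontally
    and every row vertically. Larger glyphs give the OCR engine more
    pixels per stroke to work with."""
    out = []
    for row in pixels:
        doubled = [p for p in row for _ in (0, 1)]  # duplicate columns
        out.append(doubled)
        out.append(list(doubled))                   # duplicate the row
    return out

tiny = [[0, 255],
        [255, 0]]
print(upscale_2x(tiny))
# [[0, 0, 255, 255], [0, 0, 255, 255], [255, 255, 0, 0], [255, 255, 0, 0]]
```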

~~~
jordoh
Running tesseract (4.0.0 using the LSTM engine) on the same images leaves a
lot to be desired for handwriting, but does well on the (non-handwriting)
website image (the source images are linked in the "OCR Image Processing
Results" section).

~~~
ocrcustomserver
From the Tesseract FAQ:

"Can I use Tesseract for handwriting recognition?

You can, but it won’t work very well, as Tesseract is designed for printed
text. Look for projects focused on handwriting recognition."

[https://github.com/tesseract-ocr/tesseract/wiki/FAQ#can-i-us...](https://github.com/tesseract-ocr/tesseract/wiki/FAQ#can-i-use-tesseract-for-handwriting-recognition)

------
bduerst
Would have been nice to see % accuracy as well as, or instead of, just % words
matched. It seems like the author decided to forgo false positives.

You can train models that are good at finding words in images but can have
terrible accuracy at perceiving the actual text of the word.

~~~
ocrcustomserver
Typically OCR accuracy is measured in two ways, CER (Character error rate) and
WER (Word error rate). If just one number is provided, it's typically CER.

"Finding words in images" is a bit ambiguous. It can mean "word spotting"
(intended for retrieval rather than transcription) but also "text
segmentation" (part of the preprocessing step before OCR).
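Concretely, both metrics are edit (Levenshtein) distance normalized by the
reference length: over characters for CER, over whitespace-split tokens for
WER. A minimal sketch:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance via a rolling-row dynamic program."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,        # deletion
                dp[j - 1] + 1,    # insertion
                prev + (r != h),  # substitution (free if chars match)
            )
    return dp[-1]

def cer(ref, hyp):
    """Character error rate: char edit distance / reference length."""
    return edit_distance(ref, hyp) / len(ref)

def wer(ref, hyp):
    """Word error rate: token edit distance / reference token count."""
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())

print(cer("kitten", "sitting"))                          # 0.5 (3 edits / 6 chars)
print(wer("the quick brown fox", "the quick brawn fox")) # 0.25 (1 edit / 4 words)
```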

------
SethTro
They tested a single image of each type. :/

------
ysleepy
I wonder if some implementations just snap to nearest-neighbour words to
increase accuracy in the common case of normal text, decreasing performance
on random strings considerably.

The tested corpus only contains relatively common words, so this aspect is not
tested.

~~~
jordoh
Have a specific image you'd be interested in seeing tested? The article only
contains a few examples that could be freely used, but images with sparse
random text (e.g. [1]) do tend to have good results across all the services.

[1] [https://www.gettyimages.com/detail/news-photo/ken-griffey-jr...](https://www.gettyimages.com/detail/news-photo/ken-griffey-jr-of-the-seattle-mariners-makes-a-hit-during-news-photo/91124776)

------
DanHulton
Dang, I was hoping to see a comparison to ABBYY's FineReader Online in there:
[https://finereaderonline.com](https://finereaderonline.com) Maybe in the 2020
review?

~~~
jordoh
I tried running the source images through FineReader Online, but the images
with handwriting resulted in "was not processed: the recognized document
contains errors". The website image worked, but was missing a few elements,
like the other headings on the line with "Minimalist editor".

------
gbolcer3
Nice! I did the same thing for ASR technologies a while back. Results here:
[https://drive.google.com/a/bolcer.org/file/d/1CJTHikHldMYTMv...](https://drive.google.com/a/bolcer.org/file/d/1CJTHikHldMYTMvhDsfx7dr-IXUTrlp-8/view?usp=drivesdk)
(methodology: same parts-of-speech analysis on exact transcripts run through
different services).

All OCR services, BTW, have the same threading problem. They work really well
for sequential text, but as soon as you start getting into more complicated
"marketing" formats, they don't work at all.

The best use of the online OCR services I found was figuring out published
dates for news articles, which typically don't show up in anything other than
images. Even with Azure, AWS, and Google, I still needed a post-capture
regex to figure the stuff out.
[https://drive.google.com/a/bolcer.org/file/d/16vujemgD91Ebuu...](https://drive.google.com/a/bolcer.org/file/d/16vujemgD91EbuuI98KCXjUuIGHBGR_3e/view?usp=drivesdk)

We've got a long ways to go.
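That post-capture cleanup usually ends up as a pile of date regexes run over
the OCR output. A hypothetical sketch (not the parent's actual code):

```python
import re

# OCR output often garbles punctuation, so be permissive about separators.
DATE_PATTERNS = [
    # Numeric dates like 3/14/2019, 3-14-19, 3.14.2019
    re.compile(r"\b(\d{1,2})[/\-.](\d{1,2})[/\-.](\d{2,4})\b"),
    # Spelled-out dates like "Mar. 14, 2019" or "March 14 2019"
    re.compile(
        r"\b(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
        r"[a-z]*\.?\s+(\d{1,2}),?\s+(\d{4})\b",
        re.I,
    ),
]

def find_dates(ocr_text):
    """Return every date-like substring found in OCR'd text."""
    hits = []
    for pattern in DATE_PATTERNS:
        hits += [m.group(0) for m in pattern.finditer(ocr_text)]
    return hits

print(find_dates("Published Mar. 14, 2019 | updated 3/15/19"))
# ['3/15/19', 'Mar. 14, 2019']
```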

------
poxrud
Once a year when I do my taxes I go through every single bank transaction and
classify it into a proper column in a spreadsheet. The problem is that my bank
provides CSV transaction data for the past 6 months only. But it does provide
years of bank statements in PDF format.

I decided I can save a lot of time by using Amazon Textract to extract the
tables from the PDFs and convert them to CSV files.

The problem is that while Textract works really well for well-defined tabular
data, it does not work for tables where the rows and columns are implied with
white space, instead of lines. When I reached out to AWS they confirmed this
problem and suggested that I draw the table lines into the PDF and then run
Textract again on this modified PDF. This felt like a dirty hack so I did not
proceed with this suggestion.

Textract is a great tool when it works well, but unfortunately when it doesn't
there are no ways to make adjustments in order to improve the results. In the
end I managed to complete my project and get a lot better results by using the
excellent Camelot Python table extraction library.

~~~
ocrcustomserver
"The problem is that while Textract works really well for well defined tabular
data it does not work for tables where the rows and columns are implied with
white space, instead of lines."

This is what Tabula and Camelot call "Stream" and "Lattice" parsing methods.

Whitespace between cells is Stream, demarcated lines is Lattice.
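Camelot exposes these as the `flavor="stream"` and `flavor="lattice"` arguments
to `camelot.read_pdf`. As a toy illustration of the stream idea on plain text,
treat runs of two or more spaces as column separators:

```python
import re

def stream_parse(lines):
    """Toy 'stream' table parsing: runs of 2+ spaces act as column
    separators, the way whitespace-delimited tables imply structure
    without drawn cell borders."""
    return [re.split(r"\s{2,}", line.strip()) for line in lines]

statement = [
    "Date        Description        Amount",
    "2019-03-01  ACME WASTE CO      142.50",
    "2019-03-15  RECYCLING PICKUP    39.99",
]
print(stream_parse(statement))
# [['Date', 'Description', 'Amount'],
#  ['2019-03-01', 'ACME WASTE CO', '142.50'],
#  ['2019-03-15', 'RECYCLING PICKUP', '39.99']]
```

Note single spaces inside a cell ("ACME WASTE CO") survive, since only wider
gaps split columns; real stream parsers infer boundaries from glyph positions
rather than space counts.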

------
georgewsinger
Vicarious has text recognition technology that is invariant to, e.g., letter
spacing, but they haven't released it due to fear of CAPTCHA fraud. See, e.g.,
[here](https://i.imgur.com/lN4AzmE.png).

I wonder if they could modify it to release a near 100% accurate OCR service?

~~~
ipsum2
Letter spacing seems like an arbitrary distinction. If you design a neural net
that contains operators that make certain things invariant (e.g. size) then of
course it would perform better than a neural net without it.

------
tiernano
unless i am having a hard time reading the text on the site, they all are
$1500 for 1M images, but Amazon Textract, which has the second lowest average
result, is "cheapest" compared to Azure Congnitive Services, which has the
highest average result... Did i miss something? How is 4 items, all the same
price, ranked arseways?

~~~
osrec
For 1 million images, it's $1500 across all, but as you scale to 5m images,
the prices must differ.

~~~
tiernano
It doesn't mention that... that's what confused me...

~~~
osrec
Look at the table sub-headings (the blue row)

~~~
tiernano
(facepalm) missed that! thanks!

------
ndm000
I’m new to the OCR space. Given the successes in self-driving (granted, no
systems are production ready, but Waymo and Tesla have MVPs), what makes
reading text so hard that cloud providers struggle to reach human-level
accuracy?

~~~
rhizome
Same problems as with self-driving cars and speech recognition: it's very very
hard (if not impossible) for software to parse the universe of scenarios. Said
another way, it's never accurate enough to "work" when it's trained on the
outside world. Usable output requires the kind of accuracy you only get when
the process is trained in controlled environments (limited "vocabulary":
fonts, layouts, languages, words, accents, geography, etc.).

------
catchmeifyoucan
Are there any open source alternatives that are competitive in this space?

~~~
noahster11
Google is behind Tesseract[0]; not sure if they use it for Google Cloud
Vision OCR.

[0]: [https://opensource.google.com/projects/tesseract](https://opensource.google.com/projects/tesseract)

------
dzink
Toutanova, one of the co-authors of the BERT paper worked for Microsoft for
12+ years until shifting to Google recently.

------
m0zg
This is one of those things that badly needs an open source solution, and for
which technology exists to solve it really well, but nobody wants to do it
because it's a ton of really boring, high-maintenance work.

------
citizenpaul
I did this a while back with speech to text services. Microsoft's offerings by
far beat out the others in terms of accuracy and performance.

------
hn23
Malicious gossip has it that you have to perform these tests with
Google/Amazon when the Mechanical Turk is not sleeping.

------
georgewsinger
For what it's worth, the Wolfram text recognition function does a poor job
parsing the whiteboard photo:
[http://www.wolframcloud.com/obj/user-900a994f-78ab-4931-b18e...](http://www.wolframcloud.com/obj/user-900a994f-78ab-4931-b18e-353eff4d0b33/text_recognition.png)

(Not a critique of Wolfram technologies here; their services are otherwise
amazing).

~~~
aw3c2
FYI that link just gives a log-in screen.

------
jeffrogers
How does iOS Vision + Core ML compare?

------
johnmalatras
My app, Quotable, uses Azure if you were curious about perf. Works great!

------
WalterBright
It's nice to see progress on handwriting recognition.

------
LeonB
Comparison should include the Apple Newton.

~~~
mkl
That couldn't do OCR at all, right? It had no camera. It could do handwriting
recognition on stylus input (where it has full perfect stroke data, not just
pixelated images _of_ strokes), which is a very different and easier problem.

~~~
ocrcustomserver
The Apple Newton was capable of online OCR ("online" here has nothing to do
with internet connectivity).

As you mention, online OCR is when you input the strokes directly on the
device, vs. offline OCR, where the input is an image.

Some trivia:

The first version of the Newton's handwriting recognition engine was
developed by ParaGraph International (founded by the founder of Evernote).
Another version (the Print Recognizer) was later developed by Apple.

[https://web.archive.org/web/20120324055221/http://www.beanbl...](https://web.archive.org/web/20120324055221/http://www.beanblossom.in.us/larryy/Yaegeretal.AIMag.pdf)

