
EasyOCR: Ready-to-use OCR with 40 languages - vortex_ape
https://github.com/JaidedAI/EasyOCR
======
dclusin
From what I can tell (without having read the research papers), this looks
like an easy-to-use package for sparse scene-text extraction. It seems to do
okay when the scene has sparse text, but it falls down for dense text
detection. The results are going to be pretty bad if you try a task like
"extract transactions from a picture of a receipt." Here's an example of
input you might get for a production app:
[https://www.clusin.com/walmart-receipt.jpg](https://www.clusin.com/walmart-receipt.jpg)

Notice the faded text from the printer running out of ink and the slanted
text. From my limited experience, each of these is a thorny problem, and
state-of-the-art CV algorithms won't save you from having to learn how to
algorithmically pre-process images and clean them up before feeding them into
a CV algorithm. You might be able to use Google's Cloud OCR, which is pretty
good, but it charges per image. And even then, you've only graduated to the
next super-difficult problem, which is Natural Language Processing.
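
To give a flavor of what that pre-processing can look like, here's a minimal
clean-up sketch using OpenCV; the threshold parameters and the deskew
heuristic are illustrative starting points, not tuned values:

    
      import cv2
      import numpy as np
    
      def clean_receipt(path):
          gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    
          # Adaptive thresholding copes with faded, unevenly inked print
          # better than a single global threshold
          binary = cv2.adaptiveThreshold(
              gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
              cv2.THRESH_BINARY, 31, 10)
    
          # Rough deskew: fit a rotated rectangle around the dark (text)
          # pixels; note the angle convention differs across OpenCV versions
          coords = np.column_stack(np.where(binary == 0)).astype(np.float32)
          angle = cv2.minAreaRect(coords)[-1]
          angle = -(90 + angle) if angle < -45 else -angle
          h, w = binary.shape
          M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
          return cv2.warpAffine(binary, M, (w, h), flags=cv2.INTER_CUBIC,
                                borderMode=cv2.BORDER_REPLICATE)
    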

Once you have the text, you need to determine whether it has meaning to your
application; that's basically what NLP is about. For the receipts example, how
do you know you're looking at a receipt? What if it's a receipt on top of a
pile of other receipts? How do you extract transactions from the receipt? Does
a transaction span multiple lines? How can you tell? Etc., etc.
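
To make that concrete, a first-cut transaction extractor might be little more
than a regex over the OCR'd lines; a deliberately naive sketch (the pattern is
hypothetical, and it will happily match subtotals and misfire on wrapped item
names):

    
      import re
    
      # Matches lines like "GV PNT BTR 007874237003 2.98 X": an item
      # description (possibly with a UPC), a price, and an optional
      # one-letter tax flag
      LINE = re.compile(r'^(?P<desc>.+?)\s+(?P<price>\d+\.\d{2})\s*[A-Z]?\s*$')
    
      def extract_transactions(ocr_lines):
          for line in ocr_lines:
              m = LINE.match(line.strip())
              if m:
                  yield m.group('desc'), float(m.group('price'))
              # Garbled or wrapped lines silently fall through, which
              # is exactly the hard part
    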

~~~
D13Fd
I'm just happy to see some advancement in open source OCR for Python. Last
time I had a Python project that needed OCR, I found that the open-source
options were surprisingly limited, and it required some effort to achieve
consistently good results even with relatively clean inputs.

Honestly I was kind of surprised that good basic OCR isn't a totally solved
issue with an ecosystem of fully open-source solutions by now.

~~~
vortex_ape
> Honestly I was kind of surprised that good basic OCR isn't a totally solved
> issue with an ecosystem of fully open-source solutions by now.

Yes! Can anyone comment on why this is the case, given that OCR is proclaimed
to be a solved problem?

I've always wondered why Google Lens works "out of the box" and shows great
accuracy on extracting text from images taken using a phone camera, but open-
source OCR software (Tesseract, Ocropy etc.) needs a lot of tweaking to
extract text from standard documents with standard fonts, even after heavily
pre-processing the images.

PS: Has Google released any paper on Google Lens?

~~~
craftinator
I've been wondering this ever since I first used Lens. My hobby OCR
applications always fall way short of Lens's magic.

~~~
vortex_ape
Yeah! And Lens is not the only closed-source OCR solution that works. I've
gotten great accuracy with ABBYY and docparser.com in the past. But you have
to pay per page once the free trial ends :(

~~~
claudeganon
I've found that none of the open-source stuff works well for Japanese-language
documents. Most of the time, I've just run them through Adobe Acrobat's OCR
and dumped the results into a text file. There are still mistakes, but it at
least returns a passable result compared to the others.

------
oefrha
Looking at the Chinese example, it’s kinda funny it managed to output
Traditional Chinese characters when the image contains Simplified Chinese; the
SC and TC versions look pretty different (园 vs 園, 东 vs 東).

~~~
kevin_thibedeau
They're rendering Unicode without any markup for the language variant.

~~~
oefrha
No, these are completely different, standalone code points, not variant forms
of the same code point.

What actually seems to be happening is that the ch_tra model can recognize
simplified characters too, and outputs the corresponding traditional version
when a character isn't in the traditional "alphabet"; it doesn't work as well
in the other direction.

Example recognizing a partial screenshot of
[https://chinese.stackexchange.com/a/38707](https://chinese.stackexchange.com/a/38707)
(anyone can try this on Google Colab, no hardware required; remember to turn
on GPU in Runtime -> Change runtime type):

    
    
      import easyocr
      import requests
    
      # One reader per script; 'en' is included since the sample mixes
      # English and Chinese
      zhs_reader = easyocr.Reader(['en', 'ch_sim'])
      zht_reader = easyocr.Reader(['en', 'ch_tra'])
      image = requests.get('https://i.imgur.com/HtrpZCZ.png').content
      # readtext yields (bounding box, text, confidence) triples
      print('ch_sim:', ' '.join(text for _, text, _ in zhs_reader.readtext(image)))
      print('ch_tra:', ' '.join(text for _, text, _ in zht_reader.readtext(image)))
    

Results:

    
    
      ch_sim: One simplified character may mapping to multiple traditional ones: 皇后->皇后,後夭->后夭 豌鬟->头发,骏财->发财 As reversed, one traditional character may mapping to multiple simplified ones too: 乾燥->干燥, 乾隆->乾隆 嘹望->嘹望,嘹解->了解
      ch_tra: One simplified character may mapping to multiple traditional ones: 皇后->皇后,後天->后天 頭髮->頭發,發財->發財 As reversed, one traditional character may mapping to multiple simplified ones too: 乾燥->干燥, 乾隆->乾隆 瞭望->瞭望, 瞭解->了解
    

Compare to the original text:

    
    
      One simplified character may mapping to multiple traditional ones:
    
      - 皇后 -> 皇后，後天 -> 后天
      - 頭髮 -> 头发，發財 -> 发财
    
      As reversed, one traditional character may mapping to multiple simplified ones too:
    
      - 乾燥 -> 干燥，乾隆 -> 乾隆
      - 瞭望 -> 瞭望，瞭解 -> 了解
    

Of course, automatic character-to-character conversion from simplified to
traditional can be wrong due to ambiguities; excellent examples from above: 头发
=> 頭發 (should be 頭髮), 了解 => 了解 (should be 瞭解).

~~~
divingdragon
This approach seems a bit weird to me. While I appreciate them separating the
Traditional and Simplified Chinese models, I think I might prefer them
combined (perhaps even including Japanese kanji), with a way for the user to
specify which language or regional variant is expected, so that characters
matching the expected variant are simply given a higher score.

~~~
oefrha
Without delving into implementation details, I suspect the ch_tra model was
simply trained on a dataset that includes simplified images with traditional
labels.

------
porker
What are people using in mobile development (native iOS / native Android /
cross-platform, e.g. React Native) when you want accurate extraction from a
fixed-format source?

E.g. poor-quality images of ID cards or credit cards, where the position of
the data is known.

~~~
oefrha
iOS has the Vision framework; I can't say whether it's accurate enough for
your use case.

[https://developer.apple.com/documentation/vision/recognizing...](https://developer.apple.com/documentation/vision/recognizing_text_in_images)

------
baicunko
This is something I find really interesting. Open-source OCR is lagging
behind commercial applications, and seeing someone try out new ideas is always
beneficial. Kudos!!

------
barbs
Has anyone made a desktop app with a really simple UI for detecting text in
images? I'm thinking of something that lives in the taskbar, lets you draw a
box around the text you want to read, and returns it as plain text.

In my job as a support engineer I sometimes get screenshots of complex
technical configurations and end up having to type them in one character at a
time, so this would be really handy.

Looks like maybe I could just create a wrapper around EasyOCR.
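
For what it's worth, the core of such a wrapper might look something like
this (a rough sketch; I'm assuming the mss package for screen capture and
pyperclip for the clipboard, and a real app would replace the hardcoded
region with a drag-to-select overlay):

    
      import easyocr
      import mss
      import numpy as np
      import pyperclip
    
      reader = easyocr.Reader(['en'])  # load the models once, reuse per grab
    
      def region_to_clipboard(left, top, width, height):
          # Capture the given screen region
          with mss.mss() as sct:
              shot = sct.grab({'left': left, 'top': top,
                               'width': width, 'height': height})
          img = np.array(shot)[:, :, :3]  # drop the alpha channel (BGRA -> BGR)
          lines = [text for _, text, _ in reader.readtext(img)]
          pyperclip.copy('\n'.join(lines))
          return lines
    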

~~~
nelcevest
I was looking for exactly the same thing, and found this wonderful script:

[http://askubuntu.com/a/280713/81372](http://askubuntu.com/a/280713/81372)

I bound it to a custom keyboard shortcut, so I just press it, draw an
on-screen rectangle around any non-selectable text, and in a few seconds the
text lands on the clipboard.

------
polote
What would be the advantage compared to something like Tesseract?

~~~
jonatron
Tesseract isn't very accurate, especially with text in photos. It works OK for
scanned documents, but that's about it.

~~~
sireat
Tesseract can be very accurate (>99%), especially when you train it for your
particular data set.

This does involve creating your own labeled data.

I got that 99% accuracy by performing incremental training using the latest
Mannheim model as a base. I added about 20k lines, which is not really that
much. [https://github.com/tesseract-ocr/tesseract/wiki](https://github.com/tesseract-ocr/tesseract/wiki)

The hard part was crowd-sourcing those 20k lines :)

Tesseract might not be best for photos, as you said, but I did not have major
problems.

Of course, with some documents the source is so bad that even a human can't
achieve 99%.

Tesseract used to be quite average before they moved to LSTM models a few
years ago.

~~~
dclusin
Care to share resources/lessons learned from training Tesseract with custom
data? I'm using it for a side project and would love to hear your insights.

~~~
sireat
I followed the resources here:
[https://github.com/tesseract-ocr/tessdoc/blob/master/TrainingTesseract-4.00.md](https://github.com/tesseract-ocr/tessdoc/blob/master/TrainingTesseract-4.00.md)

Also this: [https://github.com/UB-Mannheim/tesseract/wiki](https://github.com/UB-Mannheim/tesseract/wiki)

The original data was here:
[https://github.com/tesseract-ocr/langdata_lstm](https://github.com/tesseract-ocr/langdata_lstm)

I did use another data source from Mannheim but can't locate it right now.

I was using vanilla Ubuntu 18.04.

I looked at the example training files and wrote a small script to convert my
own labeled data into the format that Tesseract requires.

I did a bit of pre-processing, adjusting contrast.

All the data munging was done in Python (Pillow for image processing, Flask
for collecting data into a simple SQLite DB before converting back into the
format that Tesseract requires).

Python was not necessary, just what felt most comfortable to me. I am sure
someone could do it with bash scripts or node.js or anything else.
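
Very roughly, the export step looked something like this (a simplified sketch
assuming the tesstrain-style .tif / .gt.txt line-pair layout; the DB schema
and file names here are illustrative, not my actual ones):

    
      import sqlite3
      from pathlib import Path
      from PIL import Image, ImageEnhance
    
      # Dump the curated (line image, transcription) pairs from the
      # SQLite DB into the .tif / .gt.txt pairs that training expects
      out = Path('ground-truth')
      out.mkdir(exist_ok=True)
    
      conn = sqlite3.connect('curation.db')
      for line_id, image_path, text in conn.execute(
              'SELECT id, image_path, text FROM lines'):
          img = Image.open(image_path).convert('L')      # grayscale
          img = ImageEnhance.Contrast(img).enhance(1.5)  # mild contrast boost
          img.save(out / f'line_{line_id:06d}.tif')
          (out / f'line_{line_id:06d}.gt.txt').write_text(text, encoding='utf-8')
    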

EDIT: To make life easier for my curators, I ran Tesseract first to generate
pre-labeled data for my training set. It was about 90% accurate to start
with.

So the process was: Tesseract OCR on the documents to be trained -> hand
curation (2 months) -> training (about 12 hours) -> 99% (on a completely
separate test set).

~~~
dewhelmed
If you don't mind disclosing, what was your particular use case (the labeled
dataset you trained on)?

~~~
sireat
It was for digitizing 19th-century books written in a font and language not
supported by vanilla Tesseract.

------
zurn
Does this require an Nvidia GPU? Some modules seem to import PyTorch and CUDA
libs:
[https://github.com/JaidedAI/EasyOCR/blob/master/easyocr/dete...](https://github.com/JaidedAI/EasyOCR/blob/master/easyocr/detection.py#L2)

~~~
catalogia
The CPU fallback takes on the order of tens of seconds on my modest i5-5250U
for the few images of street signs I've thrown at it. Good enough for my
purposes, at least.
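
For anyone trying the same thing: the CPU path can be selected explicitly via
the Reader's gpu flag (the filename below is made up):

    
      import easyocr
    
      # gpu=False forces the PyTorch CPU path; expect it to be far
      # slower than running on a CUDA GPU
      reader = easyocr.Reader(['en'], gpu=False)
      results = reader.readtext('street_sign.jpg')
    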

------
pjc50
I'm going to test this on my shopping receipts, if I can get past the Windows
packaging hassle, and report back...

------
unnouinceput
How does this fare compared to ABBYY? I don't have the time right now to do
this test, and if anyone here has done it, I'd be thankful if they shared.

~~~
eastendguy
It cannot compete with the cloud services from ABBYY, Google, OCR.space and
others. But it runs locally and is open source.

It works for sparse text on images, and for that specific use case it is
better than Tesseract.

~~~
unnouinceput
I have ABBYY installed locally as well. I don't use its cloud components, and
I can set up my own server exposing ABBYY's APIs to roll out my own cloud
service instead of theirs, if need be.

------
siver_john
I've recently become interested in OCR from using Kaku on Android to try to
get better at reading Japanese. So thanks, Hacker News, for showing me a new
option. I'd love any pointers to other resources that may be good for
learning, especially because, for funsies, I'd like to try to develop my own.

~~~
yorwba
For learning, you could try training yourself on datasets for handwritten
character recognition:
[http://etlcdb.db.aist.go.jp/](http://etlcdb.db.aist.go.jp/)
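
If you want to build something end-to-end, a single-character classifier over
such a dataset is a reasonable first exercise. A minimal PyTorch sketch,
assuming you've already decoded the images into 64x64 grayscale tensors (the
input size is my assumption; decoding the ETL binary format is left out):

    
      import torch
      from torch import nn
    
      n_classes = 3036  # e.g. ETL9's category count; adjust to the dataset
    
      model = nn.Sequential(
          nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
          nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
          nn.Flatten(),
          nn.Linear(64 * 16 * 16, n_classes),
      )
      optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
      loss_fn = nn.CrossEntropyLoss()
    
      def train_step(images, labels):  # images: (batch, 1, 64, 64) floats
          optimizer.zero_grad()
          loss = loss_fn(model(images), labels)
          loss.backward()
          optimizer.step()
          return loss.item()
    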

~~~
siver_john
Thanks! A dataset was one of the things I was dreading searching for/building.
(Maybe this was super easily searchable and I'm just a goon; again, I'm in
the early stages of a passing interest.)

------
dharma1
This is great; I've been waiting for a newer machine-learning-based library as
an alternative to Tesseract.

A bit off topic, but does anyone happen to know if there is an open-source,
new-school OCR library for music notation?

------
gooftop
Anyone know how this (EasyOCR) compares with a service like AWS Textract?

~~~
saradhi
What metrics do you want the comparison on?

Cost: AWS is not free vs Open Sourced

Time: AWS averages under 10 seconds vs 140 seconds on a standard Dell 7480 & 9
seconds on a GPU Google colab

Character Accuracy: Almost same on a high quality input. No comparison with
AWS on a blurred camera photo like this
[https://github.com/ExtractTable/ExtractTable-
py/blob/master/...](https://github.com/ExtractTable/ExtractTable-
py/blob/master/samples/BlurryImage.jpg)

------
joelthelion
Has anyone tried it? How good is it?

~~~
MayeulC
It doesn't do too badly on a few French examples I had lying around: it does
quite well on scanned documents, even quite dense ones. Handwriting doesn't
work well at all, even in simpler cases. It managed to recognize a few words
from a blackboard picture, but that's hardly usable.

However, my simple example of an old "S note" export (something like a
lowish-resolution phone screenshot) confused it a bit:

    
    
        Reglementation -> Reglemantation
        km -> kn
        illimitée -> illiritée
        limite -> liite
        baptême -> bapteme
        etc.
    

Overall, it works, and it is quite easy to install and use. I'd have to
compare it with Tesseract, but I think it's a bit better. A lot slower,
though (I only have AMD devices, no CUDA). It's under-using my CPU, and maybe
leaking a bit of memory, though I didn't clean up.

Take this with a grain of salt; it was a quick try, and I haven't tried to
tune anything.

------
say_it_as_it_is
The README presents an unfinished example. How does one work with the result?
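
Judging from the snippets elsewhere in this thread, readtext returns a list
of (bounding box, text, confidence) triples, so presumably something like
this works (the filename and confidence threshold are made up):

    
      import easyocr
    
      reader = easyocr.Reader(['en'])
      # Each result is (bounding box, text, confidence); the box is a
      # list of four corner points
      for bbox, text, conf in reader.readtext('example.png'):
          if conf > 0.5:  # drop low-confidence junk
              print(f'{conf:.2f}  {text}  at {bbox}')
    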

------
geonnave
From the title I thought it was OCR implemented in 40 programming languages.

------
Yajirobe
How do I train this on my own custom dataset/font?

