
Show HN: Keras-OCR is an end-to-end, trainable OCR pipeline - faustomorales
https://github.com/faustomorales/keras-ocr
======
faustomorales
Hi HN! I made this because I wanted a toolkit for training custom OCR models
that included both text detection and recognition along with the necessary
tools to create synthetic data. Existing synthetic data generators had more
dependencies and setup than I felt was necessary, so I took a different tack
that limited dependencies to PIL only.

Some use cases for this package:

- You can use the pretrained (trained by others!) models for OCR (see the
README for an example) on English text. [0]

- You can fine-tune a version of the detection and recognition models on a
different alphabet / language (see the tutorial [1]).

- You can just use the data generator with backgrounds and fonts (I provide a
packaged set of both) to create images with character-level annotations for
some other model [2].
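
For the first use case, here is a minimal sketch based on the README example. It assumes `keras_ocr.pipeline.Pipeline()`, `keras_ocr.tools.read()` and `pipeline.recognize()` behave as documented there, returning one list of (word, box) tuples per image. The function names `recognize_images` and `words_left_to_right` are made up for illustration and are not part of the package.

```python
from typing import List, Sequence, Tuple

# One prediction is (word, box); box holds four (x, y) corner points.
Prediction = Tuple[str, Sequence[Sequence[float]]]


def recognize_images(image_paths: List[str]) -> List[List[Prediction]]:
    """Run the pretrained detector and recognizer on each image."""
    # Imported lazily so the helper below works without keras-ocr installed.
    import keras_ocr

    # Assumption: pretrained weights are fetched automatically on first use.
    pipeline = keras_ocr.pipeline.Pipeline()
    images = [keras_ocr.tools.read(path) for path in image_paths]
    return pipeline.recognize(images)


def words_left_to_right(predictions: List[Prediction]) -> List[str]:
    """Hypothetical helper: order one image's words by each box's left edge."""

    def left_edge(prediction: Prediction) -> float:
        _, box = prediction
        return min(float(point[0]) for point in box)

    return [word for word, _ in sorted(predictions, key=left_edge)]
```

Something like `words_left_to_right(recognize_images(["sign.jpg"])[0])` would then give the words of a single-line image in reading order, where `sign.jpg` is a placeholder path. Multi-line text would need grouping by y coordinate first, which this sketch does not attempt.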

I'd really like to continue improving the image generator to render more
realistic images while retaining the existing mix of simplicity / flexibility.
Ideas welcome!

[0] https://keras-ocr.readthedocs.io/en/latest/examples/using_pretrained_models.html

[1] https://keras-ocr.readthedocs.io/en/latest/examples/end_to_end_training.html

[2] https://keras-ocr.readthedocs.io/en/latest/examples/end_to_end_training.html#generating-synthetic-data

~~~
sansnomme
You should market this more. There is a severe need for a Tesseract
replacement: it has been falling behind the current state of the art,
especially compared to many cloud offerings.

~~~
faustomorales
Tesseract is a great solution for scanning books, but I agree that it doesn't
work very well for most other use cases, especially when compared to cloud
providers. FWIW, I have started trying to compare keras-ocr against the cloud
options. [1]

[1] https://github.com/faustomorales/keras-ocr#comparing-keras-ocr-and-other-ocr-approaches

