
OCR and Medieval Manuscripts: Establishing a Baseline (2015) - benbreen
https://brandonwhawk.net/2015/04/20/ocr-and-medieval-manuscripts-establishing-a-baseline/
======
mdani
In my experience, Tesseract 4 with LSTM is far more accurate at recognizing
characters. It was not available in 2015, so the post would probably need an
update or a follow-up.

[https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-
LST...](https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM)

------
brandonwhawk
Hi everyone, thanks for the traffic & comments on this post. I wrote it just
at the start of my research hoping to get feedback & collaborative partners to
push the idea forward. Since this post, I’ve been continuing research on the
topic using better tools & approaches with the good folks of
[https://rescribe.xyz/](https://rescribe.xyz/). We’ve especially been pursuing
tools with neural networks & machine learning (as some of you mention, this is
related to Tesseract). We’ve had some positive results & good findings that
show there are many more possibilities for using OCR to read medieval
manuscripts. We’re currently writing up our results, so watch for that!

------
novc
Take a look at Mike Kestemont, Vincent Christlein, Dominique Stutzmann,
Artificial Paleography: Computational Approaches to Identifying Script Types
in Medieval Manuscripts
([http://www.journals.uchicago.edu/doi/pdfplus/10.1086/694112](http://www.journals.uchicago.edu/doi/pdfplus/10.1086/694112))

------
RandomBookmarks
2015.... A quick test gives me much better results today with Google Cloud
Vision: [https://ocr.space/compare-ocr-software](https://ocr.space/compare-ocr-software)

~~~
Omnipresent
...that link is not to google cloud vision

~~~
RandomBookmarks
I should have explained this. The linked page allows you to try out several
online OCR services instantly and compare their results with an overlay. This
includes Google Cloud Vision and MS Azure. My idea was that anyone can use
this link to verify my test results. In other words, this link is much more
useful for non-developers than the official API docs at
[https://cloud.google.com/vision/](https://cloud.google.com/vision/) (which
anyone can find easily anyway)

------
Omnipresent
Can deep learning OCR not help with this?

