OCR and Medieval Manuscripts: Establishing a Baseline (2015) (brandonwhawk.net)
56 points by benbreen on Nov 16, 2017 | 7 comments

In my experience, Tesseract 4 with LSTM is far more accurate at recognizing characters. It did not exist in 2015, so the post probably needs an update or a follow-up.


Hi everyone, thanks for the traffic & comments on this post. I wrote it right at the start of my research, hoping to get feedback & collaborative partners to push the idea forward. Since this post, I’ve been continuing research on the topic using better tools & approaches with the good folks at https://rescribe.xyz/. We’ve especially been pursuing tools with neural networks & machine learning (as some of you mention, this is related to Tesseract). We’ve had some positive results & good findings that show there are many more possibilities for using OCR to read medieval manuscripts. We’re currently writing up our results, so watch for that!

Take a look at Mike Kestemont, Vincent Christlein, Dominique Stutzmann, Artificial Paleography: Computational Approaches to Identifying Script Types in Medieval Manuscripts (http://www.journals.uchicago.edu/doi/pdfplus/10.1086/694112)

2015... A quick test gives me much better results today with Google Cloud Vision: https://ocr.space/compare-ocr-software

...that link is not to Google Cloud Vision

I should have explained this. The linked page lets you try out several online OCR services instantly and compare their results with an overlay. This includes Google Cloud Vision and MS Azure. My idea was that anyone can use this link to verify my test results. In other words, this link is much more useful for non-developers than the official API docs at https://cloud.google.com/vision/ (which anyone can find easily anyway).

Can deep learning OCR not help with this?
