Neural network OCR in JavaScript

x5n1 · on June 3, 2015

Someone needs to take this and build a CAPTCHA service like Google did with reCAPTCHA, then release the results for free. That way we could actually have a free OCR tool that works very well.

murbard2 · on June 3, 2015

You mean this? https://code.google.com/p/tesseract-ocr/

smt88 · on June 3, 2015

In typical Google fashion, most of their improvements to Tesseract are now closed-source. It's barely been updated since 2012. It's also not a works-out-of-the-box kinda tool. You have to do a lot of training, and the training process is pretty buggy and often not trivial. After struggling with it for weeks (undocumented bugs in training, mostly), I just went with a commercial solution with actual customer support.

zo1 · on June 3, 2015

What solution did you go with?

x5n1 · on June 3, 2015

They do not provide the training set that they gathered from their reCAPTCHA service.

verelo · on June 3, 2015

This is true. In my experience, Tesseract, while a great project, is almost useless without an excellent training set. The effort to create this set is not insignificant; in fact, it's likely the hardest part of any OCR project (more than building the code that surrounds the rest of your product, at least in the early days).

midgetjones · on June 3, 2015

This is so cool. I can't believe you don't have any tests; I'd be terrified of breaking something in that code.

supercoder · on June 4, 2015

Isn't it just one big test of success rate?

dharma1 · on June 3, 2015

Awesome. Has anyone done OCR with Caffe, btw?

_vvhw · on June 3, 2015

This looks great. How does the approach compare to Tesseract? Would it be possible to beat the accuracy of Tesseract with this? Are there any numbers on how long it would take to process an image once trained?

megalodon · on June 4, 2015

Thanks for the suggestion to measure the image processing time; that could be interesting. To be honest, I haven't tried Tesseract yet, so I can't make any comparisons.
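For anyone who wants to try the timing measurement, here is a minimal sketch of how per-image processing time could be averaged in JavaScript. The `recognize` function is a hypothetical stand-in for a trained network's forward pass, and the 28x28 input size is an assumption for illustration, not something taken from the project.

```javascript
// Hypothetical stand-in for the trained network's forward pass.
// It only sums pixels so that there is something to time.
function recognize(pixels) {
  let sum = 0;
  for (let i = 0; i < pixels.length; i++) sum += pixels[i];
  return sum > pixels.length / 2 ? 1 : 0;
}

// Use the high-resolution timer when available, otherwise fall back to Date.now().
function now() {
  return typeof performance !== "undefined" ? performance.now() : Date.now();
}

// Average wall-clock milliseconds per call over `runs` calls.
function averageMs(fn, input, runs) {
  fn(input); // warm-up call so one-time setup is less likely to skew the result
  const start = now();
  for (let i = 0; i < runs; i++) fn(input);
  return (now() - start) / runs;
}

// Assumed input: one 28x28 grayscale image flattened to 784 values.
const image = new Float32Array(28 * 28).fill(0.5);
const msPerImage = averageMs(recognize, image, 1000);
console.log(`average: ${msPerImage.toFixed(4)} ms per image`);
```

Averaging over many runs smooths out timer resolution and JIT warm-up, which matter when a single call takes well under a millisecond.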

However, I do believe that having the network output character codes (an output layer of 4 bits for digits, 8 bits for letters) instead of using a softmax layer (which would need 10 outputs for digits, 26 for letters) is a novel approach that really improves performance.
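To make the two output encodings concrete, here is a rough sketch of what the targets might look like. The helper names (`encodeBits`, `decodeBits`, `encodeOneHot`) and the 0.5 threshold are assumptions for illustration and are not taken from the project's code.

```javascript
// Encode a character code as a fixed-width bit vector, most significant bit first.
function encodeBits(code, width) {
  const bits = [];
  for (let i = width - 1; i >= 0; i--) {
    bits.push((code >> i) & 1);
  }
  return bits;
}

// Decode output activations back into a code by thresholding each unit at 0.5
// (the threshold is an assumption).
function decodeBits(activations) {
  return activations.reduce((code, a) => (code << 1) | (a >= 0.5 ? 1 : 0), 0);
}

// One-hot target, as a softmax output layer would use, for comparison.
function encodeOneHot(index, size) {
  const target = new Array(size).fill(0);
  target[index] = 1;
  return target;
}

const digitTarget = encodeBits(7, 4);                   // 4 output units for the digit 7
const letterTarget = encodeBits("a".charCodeAt(0), 8);  // 8 output units for the code of "a"
const oneHotDigit = encodeOneHot(7, 10);                // 10 output units for the same digit
const decoded = decodeBits([0.1, 0.9, 0.8, 0.7]);       // thresholds to bits 0,1,1,1
```

One trade-off worth noting: the binary encoding uses fewer output units, but a 4-bit output can also decode to 10 through 15, which are not digits, so a decoder would need to decide how to handle those cases.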

amelius · on June 3, 2015

Hopefully not used to solve captchas.