Someone needs to take this and build a CAPTCHA service like Google did with reCAPTCHA and release the results for free. That way we can actually have a free OCR that works very well.
In typical Google fashion, most of their improvements to Tesseract are now closed-source. It's barely been updated since 2012. It's also not a works-out-of-the-box kinda tool. You have to do a lot of training, which is pretty buggy and often not trivial. After struggling with it for weeks (undocumented bugs in training, mostly), I just went with a commercial solution with actual customer support.
This is true. In my experience Tesseract, while a great project, is almost useless without an amazing training set. The effort to create this set is not insignificant; in fact it's likely the hardest part of any OCR project (harder than building the code that surrounds the rest of your product, at least in the early days).
This looks great. How does the approach compare to Tesseract? Would it be possible to beat the accuracy of Tesseract with this? Are there any numbers on how long it would take to process an image once trained?
Thanks for the suggestion to measure the image processing time; that could be interesting. To be honest, I haven't tried Tesseract yet, so I can't make any comparisons.
However, I do believe that having the network output binary character codes (an output layer of 4 units for digits, 8 for letters, one unit per bit) instead of using a softmax layer (which would need 10 units for digits, 26 for letters) is a novel approach that really improves performance.
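For anyone trying to picture the two output encodings, here is a rough Python sketch of just the target vectors, not of the network itself. The function names are made up for illustration, and I'm assuming the 8-bit letter code is the ASCII value; the comment above doesn't say which code is actually used.

```python
def encode_bits(code: int, width: int) -> list[int]:
    """Binary target vector for a character code, most significant bit first."""
    return [(code >> i) & 1 for i in reversed(range(width))]


def decode_bits(activations: list[float], threshold: float = 0.5) -> int:
    """Threshold each output unit and read the resulting bits as an integer."""
    code = 0
    for a in activations:
        code = (code << 1) | (1 if a >= threshold else 0)
    return code


def one_hot(index: int, num_classes: int) -> list[int]:
    """Softmax-style target: one unit per class, only the correct one set."""
    return [1 if i == index else 0 for i in range(num_classes)]


# Digit 7: 4 output units with a binary code vs. 10 units with one-hot.
digit_target = encode_bits(7, 4)
digit_one_hot = one_hot(7, 10)

# Letter "A": 8 output units, assuming the code is the ASCII value (65).
letter_target = encode_bits(ord("A"), 8)

# Decoding hypothetical per-unit activations back into a digit.
decoded = decode_bits([0.1, 0.9, 0.8, 0.7])
```

One thing to keep in mind with the binary version: a 4-bit code can also decode to 10 through 15, which aren't valid digits, so the decoding step needs some way to handle those cases.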