B. It is 6:30AM and before I've had my coffee... but am I reading right that he isn't using Tesseract? I know he says that it was a bad idea to even try compiling it, but then he spends a large part of the post talking about how great Tesseract is. Just wanted to make sure I didn't miss a: "Well, finally bit the bullet and successfully got Tesseract compiled"
C. If not using Tesseract, then what is the rate of accuracy of what he's using (GOCR and Ocrad) compared to Tesseract? I see that GOCR was recently updated to 0.5 (though not uploaded to SourceForge yet, according to the notes http://jocr.sourceforge.net/)
FWIW, Tesseract is at 3.02 and its latest release notes are dated 10/23/2012. While doing things in straight JS has a lot of value in web apps, Tesseract, from my experience, is really far ahead of its OSS peers, and further along than a lot of commercial packages. I'm not sure the convenience of pure JS OCR outweighs the need for accuracy in this domain.
My attempts at full words took a lot of tweaking of the form of the letters to get a correct match. Many letters were not identified at all.
So why did they provide an interface that lets you scribble? I dunno. For fun, I guess.
(Though the author does say "Ocrad does seem to vastly outperform GOCR when it comes to letter sketches on a canvas, so that's the one I'm focusing on here." which suggests that recognizing hand-drawn letters is something the author's interested in.)
this is often good thinking, but it is fallacious.
for high performance or critical code i've found many situations where platform libraries, ancient libraries or other software have enormous bugs, leak memory, or are trivially outperformed
memcpy is perhaps the best example of this, yes, memcpy... e.g. http://software.intel.com/en-us/articles/memcpy-performance and from skim reading that i already know how to outperform their implementation, even if it's only by a tiny bit. (and no, i'm not thinking of cache hints, which suddenly seem to be flavour of the month now that script kiddies have discovered them...) but i've also found bugs in increasing numbers over the years. the latest flavour of Microsoft madness (WinRT) is pretty leaky and hand tying whilst slowing you down. Objective-C/Cocoa Touch isn't far behind, with overkill super generic late binding interfaces that spunk my performance up the wall and a reference counting system which has caused me more trouble than new and delete ever have... not to mention various bugs, especially with the wide character and unicode support in their C std lib... don't get me started on *nix: even make has a serious fail by using timestamps to detect changes!
all code is made by programmers, most programmers are terrible, some are merely bad
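to make the make complaint above concrete, here is a rough Python sketch of timestamp-based versus content-based change detection. it is a simplification, not how make is actually implemented: make compares a target's mtime against its prerequisites, while this compares a file against a previously recorded mtime or hash. the helper names (`file_digest`, `stale_by_mtime`, `stale_by_content`, `demo`) and the `input.c` file are made up for illustration.

```python
import hashlib
import os
import tempfile


def file_digest(path):
    """Return the SHA-256 hex digest of the file's contents."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def stale_by_mtime(path, recorded_mtime_ns):
    """Timestamp check: 'changed' means the mtime is newer than the recorded one."""
    return os.stat(path).st_mtime_ns > recorded_mtime_ns


def stale_by_content(path, recorded_digest):
    """Content check: 'changed' means the bytes hash differently."""
    return file_digest(path) != recorded_digest


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "input.c")  # hypothetical source file
        with open(src, "w") as f:
            f.write("int main(void) { return 0; }\n")

        st = os.stat(src)
        recorded_mtime_ns = st.st_mtime_ns
        recorded_digest = file_digest(src)

        # Case 1: bump the mtime by 10 seconds without touching the contents
        # (like `touch`). The timestamp check asks for a pointless rebuild.
        os.utime(src, ns=(st.st_atime_ns, recorded_mtime_ns + 10 * 10**9))
        touched = (
            stale_by_mtime(src, recorded_mtime_ns),
            stale_by_content(src, recorded_digest),
        )

        # Case 2: change the contents, then put the old mtime back
        # (e.g. a tool that preserves timestamps). The timestamp check
        # misses a real change.
        with open(src, "w") as f:
            f.write("int main(void) { return 1; }\n")
        os.utime(src, ns=(st.st_atime_ns, recorded_mtime_ns))
        edited = (
            stale_by_mtime(src, recorded_mtime_ns),
            stale_by_content(src, recorded_digest),
        )

        return touched, edited


if __name__ == "__main__":
    print(demo())
```

each pair is (timestamp says changed, content says changed). the trade-off is that hashing has to read every file, which is presumably why timestamps were chosen in the first place.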