

Pure JS OCR via Emscripten - DrinkWater
http://antimatter15.com/wp/2013/12/ocrad-js-pure-javascript-ocr-via-emscripten/

======
antimatter15
Hey, I'm the author here. My web host is having some problems right now, and
I'd always intended to post a link to the demo instead, so I've put up a
Cloudflare redirect straight to the demo.

------
binarymax
Seems to be a link to a blog post which went down, and Cloudflare didn't have a
working version, so here is the link to the GH project:

[http://antimatter15.github.io/ocrad.js/demo.html](http://antimatter15.github.io/ocrad.js/demo.html)

~~~
davedx
I've kicked the tires of the live demo, and found that it seems to only work
well if you draw seriffed letters. My initial attempt at a B resulted in a 'g'.
I then tried again to make it as perfect as possible, and got an '8'.
So then I tried again, adding serifs (I think?) at the top and bottom
extending to the left, and finally got a 'B'.

My attempts at full words took a lot of tweaking of the form of the letters to
get a correct match. Many letters were not identified at all.

~~~
gjm11
It's intended for recognizing typeset characters, not handwriting.

So why did they provide an interface that lets you scribble? I dunno. For fun,
I guess.

(Though the author does say "Ocrad does seem to vastly outperform GOCR when it
comes to letter sketches on a canvas, so that's the one I'm focusing on here."
which suggests that recognizing hand-drawn letters is something the author's
interested in.)

------
SunboX
While we're on the subject of optical recognition, has anyone tried compiling
ZBar [1] to JavaScript? I did, but failed to get it to read canvas pixels. :/

[1] [http://zbar.sourceforge.net/](http://zbar.sourceforge.net/)

~~~
jankey
Try
[https://github.com/LazarSoft/jsqrcode..](https://github.com/LazarSoft/jsqrcode..).

~~~
SunboX
It's only a QR code reader ;) ZBar supports all kinds of barcodes. But this one
seems interesting:

[https://github.com/EddieLa/BarcodeReader](https://github.com/EddieLa/BarcodeReader)

------
jheriko
I'm always curious... why not port the original rather than use emscripten?
Naively, you should get better performance and maintainability... is it just a
time-saving thing?

~~~
marcosscriven
Actually Emscripten outputs a strict subset of JS dubbed asm.js. Using this
allows some really significant speed improvements in execution, due to
simplified type checking.

My understanding, then, is that for certain things this could well be faster
than a hand-written JavaScript port.
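
For illustration only (this is not taken from the linked post or from
Emscripten's actual output), here is a rough hand-written sketch of the kind of
type annotations asm.js relies on. The function names `addInts` and `halve` are
made up, and the "use asm" directive and module wrapper are left out, so this is
ordinary JavaScript written in that style rather than a validated asm.js module.
The idea is that `x | 0` marks a value as a 32-bit integer and `+x` marks it as
a double, which lets an engine know the types ahead of time:

```javascript
// Hypothetical functions written in the asm.js style (not validated asm.js:
// the "use asm" directive and the module wrapper are intentionally omitted).

function addInts(a, b) {
  a = a | 0;            // annotation: treat a as a 32-bit signed integer
  b = b | 0;            // annotation: treat b as a 32-bit signed integer
  return (a + b) | 0;   // result is coerced back to int32, so it wraps on overflow
}

function halve(x) {
  x = +x;               // annotation: treat x as a double
  return +(x / 2);      // result is a double
}
```

Because every value passes through one of these coercions, fractional inputs
to `addInts` are truncated and overflow wraps around, much as it would with C
integers.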

~~~
Ravengenocide
asm.js can quite easily be optimized, but V8 is not yet optimized for it. So
it might run faster or slower, depending on which JavaScript engine is running
it.

------
NatW
Guess you could use it as part of a captcha-defeating toolchain.

~~~
yogo
This won't beat anything but a very, very simple captcha. From what I gathered,
it is using feature extraction only. This means you can't train it with a lot
of data to increase its accuracy.

------
imdsm
Source:
[https://github.com/antimatter15/ocrad.js](https://github.com/antimatter15/ocrad.js)

------
mrfusion
Would it be possible to use this in an iOS or Android app? Or is there a
better way to get OCR in an app?

~~~
auvrw
Of course, but as the article says, Tesseract, which was developed at HP in
the '80s and more recently adopted by Google, is really the engine of choice.
You might check out

[https://github.com/rmtheis/tess-two](https://github.com/rmtheis/tess-two)

------
danso
A. Cool

B. It is 6:30AM and before I've had my coffee...but am I reading right that he
isn't using Tesseract? I know he says that it was a bad idea to even try
compiling it, but then spends a large part of the post talking about how great
Tesseract is...just wanted to make sure I didn't miss a: "Well, finally bit
the bullet and successfully got Tesseract compiled"

C. If not using Tesseract, then what is the rate of accuracy of what he's
using (GOCR and Ocrad) compared to Tesseract? I see that GOCR was recently
updated to 0.5 (though not uploaded to SourceForge yet, according to the notes
[http://jocr.sourceforge.net/](http://jocr.sourceforge.net/))

FWIW, Tesseract is at 3.02 and its latest release notes are dated
10/23/2012...While doing things in straight JS has a lot of value in web
apps...Tesseract, from my experience, is really far ahead of its OSS peers,
and further along than a lot of commercial packages. I'm not sure the
conveniences of pure JS OCR outweigh the necessity for accuracy in this domain.

~~~
jahewson
Tesseract is certainly further along than its OSS peers but it's not even
close to commercial packages. The most promising OSS project I've seen is
another Google-sponsored effort, OCRopus
[http://code.google.com/p/ocropus/](http://code.google.com/p/ocropus/) but it
is very much ongoing research.

~~~
tobltobs
OCRopus uses Tesseract as its OCR engine.

------
ye
An OCR article without a single image.

