

Show HN: Neural network optical character recognition - megalodon
https://github.com/mateogianolio/mlp-character-recognition

======
bradneuberg
Are the reported success metrics on the training or testing set? The website
says they're on the training set, which isn't a valid measure of success,
since neural networks can easily overfit to their training data (one of their
downsides if you aren't careful).

Having the output layer be an 8-bit character representation, though, is very
clever, rather than a softmax layer where each node is the relative
probability of a given character. That probably lowers the number of free
parameters you have to train, which probably speeds up training and can help
prevent overfitting. I'm interested in knowing what the true success rate is
with this approach, as it seems clever.
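For anyone curious what such an 8-bit output layer amounts to, here's a
minimal sketch in plain JavaScript. The function names and the assumption of
single-byte ASCII codes are mine, not necessarily how the repo does it:

```javascript
// Sketch: encode a character's ASCII code as 8 binary outputs, so the
// network needs only 8 output nodes instead of one per character class.
function encodeChar(ch) {
  const code = ch.charCodeAt(0); // e.g. 'a' -> 97
  const bits = [];
  for (let i = 7; i >= 0; i--) {
    bits.push((code >> i) & 1);
  }
  return bits; // e.g. 'a' -> [0, 1, 1, 0, 0, 0, 0, 1]
}

// Decode the network's 8 real-valued outputs by thresholding each at 0.5
// and reassembling the byte.
function decodeOutputs(outputs) {
  let code = 0;
  for (const o of outputs) {
    code = (code << 1) | (o > 0.5 ? 1 : 0);
  }
  return String.fromCharCode(code);
}
```

One trade-off worth noting: a single flipped bit decodes to a completely
different character, whereas a softmax at least degrades toward visually
similar classes.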

Btw, what's your loss function on the output layer?

~~~
megalodon
Yes, the success metrics are measured on the training set.

I just ran a measurement with a separately generated testing set consisting of
52k characters which yielded a success rate of 98.53% on letters a-z. This
turns out to be almost exactly the same as the rate presented on the repo page
(98.52%). Will upload the result for your perusal as soon as possible.

You will have to browse the synaptic neural network library [1] for an answer
to your second question.

[1]: [https://github.com/cazala/synaptic](https://github.com/cazala/synaptic)

EDIT: A separate testing set is now generated in addition to the training set.
I updated the success rates of the examples in the readme accordingly.
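For anyone reproducing this, the held-out measurement boils down to something
like the following sketch, where `samples` and `predict` are hypothetical
stand-ins for the generated test characters and the trained network (the repo
may structure this differently):

```javascript
// Sketch: success rate on a held-out set the network never trained on.
// Each sample pairs a pixel-vector input with its known character label.
function accuracy(samples, predict) {
  let correct = 0;
  for (const { input, label } of samples) {
    if (predict(input) === label) {
      correct++;
    }
  }
  return correct / samples.length;
}
```

The key point from the thread: this number only means something when
`samples` is generated separately from the training set.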

------
frik
A training set like Google's _Recaptcha_ data would be useful. Maybe Project
Gutenberg, Wikipedia and other open source projects should start an open
Recaptcha-like service to collect such data based on scanned
documents/books/etc.

~~~
beagle3
Google has released street view house number data. It's mostly digits, but
it's a start.

~~~
frik
600,000 digit images (labeled data):

[http://ufldl.stanford.edu/housenumbers/](http://ufldl.stanford.edu/housenumbers/)

~~~
z92
Won't an easier way will be to copy a large text from net, paste it into MS
Word. Export each page as image. And then train it using that image and the
known text.

~~~
frik
That won't help much, as every character/letter/digit will be in the same
typeface, e.g. Arial, so the network will only be able to recognize scans set
in Arial.

One needs a real world training set, a lot of variation, like house numbers
and scanned books with labeled datasets.

~~~
megalodon
I'm planning on adding a JSON config file to be able to tweak the network and
add a lot more variation to the generated training set. Meanwhile, if you want
to train it with several different fonts you can just add them to the 'fonts'
array in captcha.js.
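Something like this, I assume; the array contents below are illustrative, so
check captcha.js in the repo for the exact font-name format it expects:

```javascript
// Sketch: listing several typefaces so the generated training images
// vary in font rather than all using a single face.
const fonts = [
  'Arial',
  'Georgia',
  'Courier New',
  'Times New Roman'
];
```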

------
singularity2001
Shameless plug: similar thing for GPU:
[https://github.com/pannous/caffe-ocr](https://github.com/pannous/caffe-ocr)

