
Is this the same one as featured here:

http://recaptcha.net/digitizing.html

If so, I'm not impressed.




What exactly do you expect? Some of those old documents they're trying to digitize are in such bad shape that you practically need an electron microscope to decipher them. Document recognition in its full generality is still an open problem. The examples shown on that page constitute highly adversarial challenges. For simpler examples of the kind that would prevail with recently printed material, much better results can be achieved.


> Some of those old documents they're trying to digitize are in such bad shape that you practically need an electron microscope to decipher them.

Red herring. I'm talking about the examples pictured.

Do you know _for a fact_ that this is the same software package?

I don't want to waste my time arguing about why it doesn't live up to my expectations, if it's not.


It's not obvious what you're asking but the answer's probably "no".

That page features the output of reCAPTCHA and compares it against an unnamed standard OCR.

The standard OCR does poorly, but that's on tricky documents selected to show the benefits of reCAPTCHA. The page doesn't say it's this OCR code, and even if it were, it doesn't really tell you how good that code is.

The other thing you might be saying is that you think the reCAPTCHA output isn't very impressive either. As well as the human element, reCAPTCHA claims to use several standard OCRs to process each document and combine the output in some way. It's possible that the Google code is one of those that they use, but if so it's only part of the process.
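
To make "combine the output in some way" a bit more concrete, here is a minimal sketch of one plausible approach: a per-word majority vote across engines. This is purely an illustration, not reCAPTCHA's actual method, which the page doesn't describe. The function name `vote_words`, the `min_agreement` threshold, and the assumption that every engine returns the same number of words are all mine.

```python
from collections import Counter


def vote_words(outputs, min_agreement=2):
    """Combine OCR outputs by per-position majority vote (illustrative only).

    `outputs` is a list of strings, one per hypothetical OCR engine.
    Assumes each engine produced the same number of whitespace-separated
    words; real systems would need a proper alignment step instead.
    Returns (combined_words, uncertain_positions).
    """
    token_lists = [text.split() for text in outputs]
    if len({len(tokens) for tokens in token_lists}) != 1:
        raise ValueError("outputs must contain the same number of words")

    combined, uncertain = [], []
    for position, candidates in enumerate(zip(*token_lists)):
        # Pick the reading that the most engines agree on.
        word, count = Counter(candidates).most_common(1)[0]
        combined.append(word)
        # Too few engines agreeing: flag the word as unreliable.
        if count < min_agreement:
            uncertain.append(position)
    return combined, uncertain


if __name__ == "__main__":
    readings = [
        "the quick brown fox",
        "the qu1ck brown fox",
        "the quick brovvn f0x",
    ]
    print(vote_words(readings))
```

In a scheme like this, the flagged positions would be the natural candidates to hand to humans, which is presumably where the human element comes in.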



