Hacker News
ReCAPTCHA: simultaneously protecting against bots and digitizing books (recaptcha.net)
37 points by parenthesis on Nov 23, 2008 | 14 comments



While I'm not sure this is news (well, not anymore), I love ReCAPTCHA.

What I especially like about ReCAPTCHA is the engineering elegance. The second word is hard to break by hackers because it already failed a machine OCR algorithm (obviously that doesn't solve "Turking": paying humans to solve the CAPTCHAs).

I also like that they have an audio CAPTCHA, making their solution accessible by default. This is something a lot of homegrown solutions lack. The Web should remain a place for everyone, with or without a visual impairment.


"The second word is hard to break by hackers because it already failed a machine OCR algorithm"

But from what I understand, the hackers don't have to get that word right. The system only checks the answer against the known one, right?


I think the known ones were previously decoded by humans, not machines. I could be wrong, but that would make it a bit crap, as you say.


"The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one."
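The scheme in that quote can be sketched in Python. This is a minimal illustration under stated assumptions, not reCAPTCHA's actual implementation: the class and function names are hypothetical, answers are compared case-insensitively, and an unknown word's transcription is accepted only once a hypothetical number of passing users (AGREEMENT_THRESHOLD) agree on it. It also makes the point raised above concrete: only the control word is graded, so a bot that reads the control word can type anything for the unknown one and still pass.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

# Hypothetical number of agreeing answers needed before an unknown word's
# transcription is accepted (an assumption, not reCAPTCHA's real value).
AGREEMENT_THRESHOLD = 3


def normalize(text: str) -> str:
    # Assumed comparison rule: ignore surrounding whitespace and case.
    return text.strip().lower()


@dataclass
class WordPairChallenge:
    control_answer: str   # word whose transcription is already known
    unknown_word_id: str  # id of a scanned word that OCR failed on


class TwoWordVerifier:
    def __init__(self, threshold: int = AGREEMENT_THRESHOLD):
        self.threshold = threshold
        self.votes = defaultdict(Counter)  # unknown_word_id -> Counter of guesses
        self.accepted = {}                 # unknown_word_id -> accepted transcription

    def submit(self, challenge: WordPairChallenge,
               control_guess: str, unknown_guess: str) -> bool:
        # Pass/fail depends only on the control word; the unknown word
        # cannot be graded because nobody knows its answer yet.
        if normalize(control_guess) != normalize(challenge.control_answer):
            return False
        # The user passed, so their answer for the unknown word is assumed
        # to be a good-faith reading and is recorded as one vote.
        word_id = challenge.unknown_word_id
        if word_id not in self.accepted:
            guess = normalize(unknown_guess)
            self.votes[word_id][guess] += 1
            if self.votes[word_id][guess] >= self.threshold:
                self.accepted[word_id] = guess
        return True
```

For example, with `verifier = TwoWordVerifier()` and `c = WordPairChallenge("morning", "scan-42-word-7")`, `verifier.submit(c, "Morning", "zzz")` returns `True` even though the second answer is garbage; requiring several agreeing votes is one way such noise could be kept out of the digitized text.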


WHY IS THIS ON THE FRONT PAGE?

reCAPTCHA has been around for ages. This isn't something new. Everyone knows about it. It's also widely implemented.

I would like to add, however, that I hate reCAPTCHA. If you are a web developer who has added reCAPTCHA to your site, remove it. If you are thinking about adding it, don't! reCAPTCHA takes the annoyance of a CAPTCHA and multiplies it by two. Gone are the days when you had to solve one illegible CAPTCHA; now you have to do two. Plus, one of them is so bad that OCR software couldn't recognize it.

It's not your job, or your users' job, to digitize books. You aren't doing the world some great service or charity by implementing reCAPTCHA on your site. If you want to digitize books, go help develop better OCR software. If you want to stop spam, there are better, less invasive methods.


Plus, one of them is so bad that OCR software couldn't recognize it.

... that's the point?


They're not colourful or textured or on a background or anything. If OCR software can't recognize black type (of a familiar typeface) on white, it must be pretty messed up.

I'm just saying that making the user do this twice is just plain unusable and annoying.


Anecdotally, I've found ReCAPTCHA's CAPTCHAs easier to fill in than other CAPTCHAs that apply strange distortions to obfuscate perfectly rendered text.

It seems the human brain is much more adept at decoding the black-and-white noise of a crappy scan than the artificial manipulations applied by a computer algorithm.

It also seems the opposite is true for computers.


It's on the front page because people thought it was more interesting than everything not on the front page.

Plus, one of them is so bad that OCR software couldn't recognize it.

Wait, what?! I think you don't quite understand the premise of CAPTCHAs.


I was wondering whether or not to flag this, but decided against it on the off chance that people hadn't heard of it.

I find ReCAPTCHAs very easy to fill out, myself. I've never gotten one wrong yet. If it's doing well at stopping bots, then it's absolutely a prime solution.


It's the weekend. Anyone with reasonable news is waiting until tomorrow to release it.


I fucking despise ReCAPTCHA. The conceited 'digitizing books' canard is so deviously yuppie-guilt-tripping. I've never found any published results; have they actually done anything with the returned text?

CAPTCHAs themselves are rarely the best option: they're only useful for unauthenticated submissions to extremely prominent sites (Google Accounts, Craigslist) and to extremely popular or poorly written software (Movable Type, WordPress).

If I'm already authenticated, or if I'm authenticating through a high-value OpenID provider (Google, etc.), I should not be solving any goddamn CAPTCHAs. If I'm anonymous (or effectively so, with an unknown OpenID provider), it's OK to CAPTCHA me, but before you implement them, think first about whether anyone's actually going to ATTACK YOU WITH A BOTNET. Unless you're a high-value target for griefing, the only way that's going to happen is if your site is an identical clone in a decrepit monoculture.
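The policy in the last two paragraphs can be sketched as a small decision function. Everything here is hypothetical: the `Submitter` type, `needs_captcha`, the provider allow-list, and the `likely_bot_target` flag (standing in for "extremely prominent site or widely deployed software") are illustrative assumptions, not any real framework's API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical allow-list of "high-value" OpenID providers; the entry is
# illustrative only, not a vetted list.
TRUSTED_OPENID_PROVIDERS = {"google.com"}


@dataclass
class Submitter:
    is_authenticated: bool = False
    openid_provider: Optional[str] = None  # None means fully anonymous


def needs_captcha(user: Submitter, likely_bot_target: bool) -> bool:
    # Already logged in: never show a CAPTCHA.
    if user.is_authenticated:
        return False
    # Vouched for by a high-value OpenID provider: treat as known.
    if user.openid_provider in TRUSTED_OPENID_PROVIDERS:
        return False
    # Anonymous, or an unknown provider: challenge only if the site is
    # plausibly worth attacking with a botnet.
    return likely_bot_target
```

A small personal site would pass `likely_bot_target=False` and never challenge anyone; a large public signup form would pass `True` and challenge only anonymous visitors.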


Come to think of it, I almost never see any that wouldn't be straightforwardly OCR-able pre-mutilation...

It'd be hilarious if the whole OCR thing was a gimmick and nothing more!


Your two comments don't add a lot of value to the discussion. (If you'd like, I can go into detail about why they don't.) If you care about that kind of thing, then please try to communicate more calmly and clearly in the future.



