

ReCAPTCHA: simultaneously protecting against bots and digitizing books - parenthesis
http://recaptcha.net/

======
sh1mmer
While I'm not sure this is news (well, not anymore), I love ReCAPTCHA.

What I especially like about ReCAPTCHA is the engineering elegance. The second
word is hard to break by hackers because it already failed a machine OCR
algorithm (obviously that doesn't solve "Turking").

I also like that they have an audio CAPTCHA making their solution accessible
by default. This is something a lot of homegrown solutions lack. The Web
should remain a place for everyone with or without a visual impairment.

~~~
aneesh
"The second word is hard to break by hackers because it already failed a
machine OCR algorithm"

But, from what I understand, the hackers don't have to get that word correct.
The system only checks the answer to the known one, right?

~~~
sh1mmer
I think the known ones were previously decoded by humans, not machines. I could
be wrong. But that would make it a bit crap as you say.

~~~
aneesh
"The user is then asked to read both words. If they solve the one for which
the answer is known, the system assumes their answer is correct for the new
one."
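
As a rough illustration of the flow that quote describes, here is a minimal
Python sketch. It is not reCAPTCHA's actual code: the function names, the
in-memory `votes` store, and the `min_votes` threshold are all assumptions made
up for the example. It only shows that the control word alone decides pass or
fail, while the guess for the unknown word is merely recorded.

```python
from collections import Counter, defaultdict

# Hypothetical in-memory store: unknown-word id -> tally of human guesses.
votes = defaultdict(Counter)


def normalize(text):
    """Compare answers case-insensitively and ignore surrounding whitespace."""
    return text.strip().lower()


def check_challenge(control_answer, control_guess, unknown_id, unknown_guess):
    """Pass or fail the user on the control word only.

    If the control word is right, the guess for the unknown word is trusted
    enough to be recorded as one vote. It is never checked directly.
    """
    if normalize(control_guess) != normalize(control_answer):
        return False
    votes[unknown_id][normalize(unknown_guess)] += 1
    return True


def best_guess(unknown_id, min_votes=2):
    """Return the leading guess once it has min_votes votes (assumed threshold)."""
    tally = votes.get(unknown_id)
    if not tally:
        return None
    word, count = tally.most_common(1)[0]
    return word if count >= min_votes else None
```

In this sketch a bot that solves only the control word still passes, which is
the weakness raised above; requiring agreement between several users before
accepting a transcription is one assumed way to limit junk answers.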

------
evdawg
WHY IS THIS ON THE FRONT PAGE?

reCAPTCHA has been around for ages. This isn't something new. Everyone knows
about it. It's also widely implemented.

I would like to add, however, that I _hate_ reCAPTCHA. If you are a web
developer who has added reCAPTCHA to your site, remove it. If you are thinking
about adding it, _don't do it_! reCAPTCHA takes the annoyance of a CAPTCHA and
multiplies it by two. Gone are the days when you had to solve just _one_
illegible CAPTCHA; now you have to solve _two_. Plus, one of them is so bad
that OCR software couldn't recognize it.

It's not your job, or your users' job, to digitize books. You aren't doing the world
some great justice or charity by implementing reCAPTCHA on your site. If you
want to digitize books, go help develop better OCR software. If you want to
stop spam, there are better, less invasive methods.

~~~
andreyf
_Plus, one of them is so bad that OCR software couldn't recognize it._

... that's the point?

~~~
evdawg
They're not colourful or textured or on a background or anything. If OCR
software can't recognize black type (of a familiar typeface) on white, it must
be pretty messed up.

I'm just saying that making the user do this _twice_ is just plain unusable
and annoying.

~~~
sh1mmer
Anecdotally, I've found ReCAPTCHA's CAPTCHAs easier to fill in than other
CAPTCHAs that apply strange distortions to obfuscate perfectly rendered text.

It seems the human brain is much more adept at decoding the black and white
noise resolved in a crappy scan than the artificial manipulations applied by a
computer algorithm.

It also seems the opposite is true for computers.

------
blasdel
I fucking despise ReCAPTCHA. The conceited _'digitizing books'_ canard is so
deviously yuppie-guilt-tripping. I've never found any published results --
have they actually done _anything_ with the returned text?

CAPTCHAs themselves are rarely the best option -- they're only useful for
unauthenticated submissions to extremely prominent sites (Google Accounts,
Craigslist) and to extremely popular/poorly-written software (Movable Type,
Wordpress).

If I'm already authenticated, or if I'm authenticating from a high-value
OpenID provider (Google, etc.) I should not be solving any goddamn CAPTCHAs.
If I'm anonymous (or effectively so with an unknown OpenID provider) it's ok
to CAPTCHA me, but before you implement them think first about whether
anyone's actually going to _ATTACK YOU WITH A BOTNET_. Unless you're a high-
value target for griefing, the only way that's going to happen is if your site
is an identical clone in a decrepit monoculture.

~~~
blasdel
Come to think of it I almost never see any that wouldn't be straightforwardly
OCRable pre-mutilation...

It'd be hilarious if the whole OCR thing was a gimmick and nothing more!

~~~
palish
Your two comments don't add a lot of value to the discussion. (If you'd like,
I can go into details as to why they don't.) If you care about that kind of
thing, then please try to communicate more calmly and clearly in the future.

