

Breaking and Fixing CAPTCHA - DigiHound
http://www.extremetech.com/computing/103283-breaking-and-fixing-captcha

======
nowarninglabel
> Wave distortion is the most effective. The swirly, almost-melding-into-each-
other effect that Google uses in its captchas is very hard to segment and
recognize.

I'd argue that it is nearly as effective at preventing human readers from
correctly interpreting the CAPTCHA. I am not proud of my success rate on
Google's CAPTCHAs. It would seem the better route for CAPTCHA makers would be
to figure out what makes reCAPTCHA work so well, without having to resort to
such levels of swirly distortion.

~~~
showerst
I wonder if there's enough demand for someone to start a service that provides
those visual-style captchas "Click the two puppies in this 3x3 grid of kitten
pictures", but with a large enough sample size / solid enough presentation to
make it useful.

Granted, that approach takes up more screen real estate by definition, but is
very hard to break, and can be made to look more professional than kitties and
puppies.

~~~
cbr
The article claims that a 1% failure rate is considered too high. Random
guessing gets you over 1% success on your example. Making the grid larger
helps, but makes it harder for humans too.
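
A rough sketch of that arithmetic, assuming the bot picks exactly two cells
uniformly at random and that exactly two of the nine cells are puppies (the
function names here are made up for illustration):

```python
from math import comb


def blind_guess_success(grid_cells: int, targets: int) -> float:
    """Chance that choosing `targets` cells at random hits exactly the right set."""
    return 1 / comb(grid_cells, targets)


def cells_needed(targets: int, max_success: float) -> int:
    """Smallest grid size whose blind-guess success rate is at most `max_success`."""
    n = targets
    while blind_guess_success(n, targets) > max_success:
        n += 1
    return n


# "Click the two puppies" in a 3x3 grid: 1 in C(9, 2) = 1 in 36.
print(f"{blind_guess_success(9, 2):.2%}")
# Grid size needed to push blind guessing down to 1% with two targets.
print(cells_needed(2, 0.01))
```

Under those assumptions a blind guess succeeds about 2.8% of the time, and the
grid would need 15 cells before guessing drops to 1% or below.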

The bigger problem is: where do you get a giant source of kitten and puppy
pictures that the spammers don't also have access to? It needs to be large
enough that it's not worth it for the spammers to sit down and manually
categorize them. It needs to be difficult to replicate because if you're just
taking 10K pictures of each from an image search, the spammers can do that
too.

~~~
jakubw
Well, a few months ago I had my go at a service that would use data from
people working on machine learning systems so I guess that part is not a big
deal. The bigger problem is people are accustomed to typical 'write down the
word' CAPTCHAs and yet, they're fed up with them, so anything different would
only be more annoying and hardly more secure. It's also hard to think of a
challenge type that would have enough combinations for blind guessing to be
ineffective and not be of the text-based type we all know. You'd need dozens
of kitten and puppy pictures in one challenge to have a reasonable failure
rate and no one will be interested in solving those.

------
alttag
The trouble with CAPTCHAs is two-fold: First, that even a small success rate
(say, 10%) renders them ineffective to someone with enough bandwidth. Second,
and perhaps more importantly, it doesn't matter how good recognition
algorithms get! The easiest way to get around a CAPTCHA is to have a secondary
system that farms out CAPTCHAs to low-wage workers (Amazon Mechanical Turk?),
or presents them as games, or as a mild inconvenience in the way of hedonic
pleasure-seeking (pr0n). If a porn site and a hacker combine, the hacker can
have the CAPTCHAs solved manually by the porn site's users for free. No need
to write a fancy
recognition algorithm (although the success rates for most public decoders are
already sufficiently high).
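
To put a number on the first point, here is a small sketch that assumes each
attempt is independent and succeeds with the same probability (a
simplification; the helper names are illustrative only):

```python
def expected_attempts(success_rate: float) -> float:
    """Average number of tries until the first success (geometric distribution)."""
    return 1 / success_rate


def chance_of_any_success(success_rate: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1 - (1 - success_rate) ** attempts


# With the 10% solver from the comment above:
print(expected_attempts(0.10))
print(f"{chance_of_any_success(0.10, 50):.1%}")
```

At a 10% success rate a solver needs about ten tries per solved CAPTCHA on
average, which is cheap for anyone who can submit requests in bulk.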

In short, CAPTCHAs are bothersome because they only deter, not prevent access,
and are thus only slightly more effective than a "No trespassing" sign or
robots.txt file.

~~~
showerst
I think that depends on the value of the service that they're blocking.

I have a trivially breakable CAPTCHA on my web blog, but it cut automated
contact form/reply spam down from dozens per day to zero.

They definitely have a place, just not for anything super high value.

~~~
pilom
I feel in that case, a honeypot field is a better technique. It has
approximately the same spam-rejection rate and doesn't hinder your real human
users.

------
nekitamo
No need to break captchas, just get humans to type them for you at $1.39 /
1000 captchas. <http://deathbycaptcha.com/user/login>

Also the article mentions that reCaptcha is the most secure implementation of
captchas so far. This is semi-true. Up until mid-August of this year (if I
remember correctly) reCaptcha was quite trivial to break. Then in late-
August/early-September someone apparently kicked their ass into high gear and
they released several different variants of reCaptcha only weeks apart
(unfilled in letters and lines that ran through multiple words). Again these
were trivial to crack by simply modifying the old OCR to remove the new
distortions.

Then in late September (I believe, I don't record these dates anywhere) they
finally settled on a wave distortion that I have yet to figure out how to
break (due to lack of skill, time, and interest). You can check it out here:

<http://www.google.com/recaptcha/demo/ajax>

But spammers that need to crack reCaptcha usually do so for account creation.
In this case, their margins are high enough for them to simply use
DeathByCaptcha. So, captchas are defeated, not because of weaknesses in their
implementation, but because of economics.
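
A back-of-the-envelope version of that economics, using only the $1.39 / 1000
rate quoted above. The per-account value in the example call is a made-up
placeholder, not a figure from the article or the service:

```python
PRICE_PER_1000_USD = 1.39  # rate quoted earlier in this comment


def cost_per_solve(price_per_1000: float = PRICE_PER_1000_USD) -> float:
    """Cost in USD of one human-solved CAPTCHA at the quoted bulk rate."""
    return price_per_1000 / 1000


def solves_per_dollar(price_per_1000: float = PRICE_PER_1000_USD) -> float:
    """How many CAPTCHAs one dollar buys at the quoted bulk rate."""
    return 1000 / price_per_1000


def is_profitable(value_per_account_usd: float) -> bool:
    """True if one created account is worth more than the solve that unlocked it."""
    return value_per_account_usd > cost_per_solve()


print(f"${cost_per_solve():.5f} per solve")
print(f"{solves_per_dollar():.0f} solves per dollar")
print(is_profitable(0.01))  # hypothetical: an account worth one cent
```

At this rate each solve costs well under a cent, so the CAPTCHA only holds if
an account is worth less than that to the spammer.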

------
cbr
Single page:
<http://www.extremetech.com/computing/103283-breaking-and-fixing-captcha?print>

------
adbge
If you are thinking about putting together your own CAPTCHA solving program,
as hinted at near the end of the article, keep in mind that it may be illegal
to break a CAPTCHA without explicit permission under the DMCA. I'm not a
lawyer, though.

More info: <http://www.chillingeffects.org/anticircumvention/>

------
pilom
The answer is to use other techniques to block spam bots. Honeypot fields
(fields invisible to the user; bots tend to fill in every field in the source
regardless of whether it is rendered for a user) and timing data entry (no
human can fill in a form with a few fields in 0.3 seconds like a bot can) are
both not broken (yet) and do not have a measurable negative effect on
conversions.
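
A minimal server-side sketch of both checks. The field name `website` and the
2-second threshold are assumptions picked for illustration, and it presumes
the form's render time is stored somewhere the client cannot tamper with
(for example in the session):

```python
HONEYPOT_FIELD = "website"  # hypothetical field, hidden from humans via CSS
MIN_FILL_SECONDS = 2.0      # assumed threshold; tune against real traffic


def looks_like_bot(form: dict, rendered_at: float, submitted_at: float) -> bool:
    """Flag a submission if the hidden field was filled or the form came back too fast."""
    # Humans never see the honeypot, so any value in it suggests a bot.
    if str(form.get(HONEYPOT_FIELD, "")).strip():
        return True
    # A real person needs at least a couple of seconds to type into a form.
    return (submitted_at - rendered_at) < MIN_FILL_SECONDS


# Example: a bot fills every field and submits 0.3 s after the page rendered.
print(looks_like_bot({"name": "x", "website": "http://spam.example"}, 100.0, 100.3))
# Example: a human leaves the hidden field empty and takes 8 seconds.
print(looks_like_bot({"name": "Ann", "website": ""}, 100.0, 108.0))
```

Neither check asks anything of the visitor, which is the point: a flagged
submission can be dropped silently while real users never notice.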

~~~
jakubw
I'm surprised these techniques are still effective, especially the hidden
field trick. It's not rocket science to embed a proper Web layout engine into
an application these days so the bots should already be doing that rather than
working with plain HTML.

~~~
mike-cardwell
I've been doing the hidden field trick for the best part of a year now and
it's still working brilliantly.

------
chmike
This web site is a total mess when accessed with an iPad. It is not readable.
It automatically switches to the mobile layout. When trying to read page 2 it
switches from one layout to the other, back and forth. It's very frustrating,
because this article seems really interesting.

------
ims
This just occurred to me -- sites that use CAPTCHA are probably difficult for
blind and/or elderly users. How are sites dealing with this, if at all?

~~~
jakubw
Most sites aren't. reCAPTCHA has an accessible version, though, featuring a
voice recording of a sequence of digits distorted by noise and background
voices.

------
plasma
reCaptcha instantly makes my blood pressure rise, it's too cryptic.

