
How Not to invent the next-gen CAPTCHA - spydez
http://lbrandy.com/blog/2009/10/how-not-to-invent-the-next-gen-captcha/
======
blahedo
Yeah, the problem with most captchas is that they're visual and they're
working in an area where computers are starting to get really good. The main
thing that computers are still pretty bad at is understanding completely
arbitrary text, so the way to go has to be in that direction (which also has
the benefit of being handicap-accessible). Ask questions, in text, that have
textual answers. Throw in a random-number generator so that the questions
aren't just a memorisable list. And most important, make EVERY USER of the
captcha system ("user" here meaning the owner of the blog or the site, not the
visitor to the site) able to edit the question list. Any system that is
uniform across all its users will become a target for spam-hackers to break.
But if you can change the question? Maybe even just _rephrase_ the question?
Way, way harder. People are still smarter than computers, we just have to give
them the chance to actually do their thing.

I wrote a plugin years ago for MovableType as a proof of concept (which I
still use on my own blog): <http://www.blahedo.org/botblock/>. Even a user who
doesn't want to touch any of the code (even though it's pretty easy) can
always edit "Add one to this number:" to "What number comes after this
number:" or somesuch. To solve the "more humans on this end" problem, it seems
like you have to let them modify the very questions themselves.
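
To illustrate the idea, here is a minimal sketch of a text question with a random number and owner-editable phrasing. It is not the botblock plugin's code; `QUESTIONS`, `make_challenge`, and `check_answer` are hypothetical names, and the 1 to 100 range is an assumption.

```python
import random

# Hypothetical, site-owner-editable question list. Each entry pairs a phrasing
# with a function that computes the expected answer from the random number.
QUESTIONS = [
    ("Add one to this number: {n}", lambda n: n + 1),
    ("What number comes after this number: {n}", lambda n: n + 1),
]

def make_challenge(rng=random):
    """Pick a phrasing and a random number; return (question, expected answer)."""
    template, answer_fn = rng.choice(QUESTIONS)
    n = rng.randint(1, 100)  # assumed range, chosen only for illustration
    return template.format(n=n), str(answer_fn(n))

def check_answer(expected, submitted):
    """Compare the visitor's answer to the expected one, ignoring outer whitespace."""
    return submitted.strip() == expected
```

The site owner would only ever touch the strings in `QUESTIONS`; rephrasing one changes what a bot has to parse without changing the checking logic.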

~~~
jerf
I spent a couple of years in college working on a learning content management
system that had a large bank of randomized questions. Believe me, that's
perfectly defeatable. I have existence proofs.

The question "RAND(1,100) + RAND(1,100) = ?" may represent 10,000 distinct
questions, but the effort to answer them is only marginally greater than the
effort you spent in writing it. Basically _every_ CAPTCHA approach based on
"I'll just have a bank of X" (questions, images, etc) will fail, because the
spammers can classify faster than you can add to the set.
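
A minimal sketch of why such a template is cheap to defeat, assuming the question is rendered as plain text like "17 + 42 = ?". The names `make_question`, `ADDITION`, and `solve` are hypothetical and are not taken from the system mentioned above.

```python
import random
import re

def make_question(rng=random):
    """Generate one instance of the templated question and its answer."""
    a, b = rng.randint(1, 100), rng.randint(1, 100)
    return f"{a} + {b} = ?", a + b

# A single pattern matches every instance the template can produce.
ADDITION = re.compile(r"^\s*(\d+)\s*\+\s*(\d+)\s*=\s*\?\s*$")

def solve(question):
    """Return the sum if the question matches the addition pattern, else None."""
    m = ADDITION.match(question)
    if m is None:
        return None
    return int(m.group(1)) + int(m.group(2))
```

One rule on the attacker's side covers all 10,000 number pairs, so randomizing the operands adds variety but almost no extra work for the solver.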

Note that the "conventional" CAPTCHA, which has stood the test of time, doesn't
have a "bank" of anything; it generates fresh stuff all the time. ReCAPTCHA
has a bank, but it's structured to be way larger than any set of questions you
will ever pull, and is also cleverly set up so that they still benefit a bit
even if it is "broken".

~~~
blahedo
Arguing about "RAND(1,100)..." is beside the point. First of all, adding two-
digit numbers is _not_ something that's trivially easy for a lot of humans.
But more importantly, it's not leveraging any level of textual natural
language understanding. It's true that any spammer that decides to target your
question will be able to write a rule for their rulebase that will defeat your
captcha, but the whole thing that makes the spammer's task economical is that
they spend zero or a very tiny amount of person-time per spammed site. This
throws a spanner in the works.

I'm also not sure I'd say that the "conventional" variety has "stood the test
of time". I still see sites using them, but many of them are now so hard for
humans to make out that you have to make multiple tries. And that's if you're
a human with good eyesight and full cognition. The audio captchas out there
are loud and obnoxious and mostly incomprehensible.

------
swombat
Interesting to note that the article Louis refers to has been pulled. I guess
I'd pull that article too after it was pointed out how monumentally stupid the
whole thing was...

------
dpcan
The real solution might be to have Captcha technology built right into the
browser that can physically detect whether or not a human is interfacing with
the form.

~~~
docmach
How would this work? Spammers aren't going to use a web browser with this
feature, so it would only bother real people.

