

 Yahoo CAPTCHA Cracked. - hhm
http://www.0x000000.com/?i=502

======
derefr
A CAPTCHA method that just came into my head at this moment, and thus is
completely unfounded and horrible: "Please compose a haiku on the topic of
'togetherness' [or something equally vague]. Your work will then be passed
through a Bayesian filter specifically trained on that topic."
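
For anyone curious what that filter step might look like, here is a minimal sketch of a word-count naive Bayes classifier in Python. Everything in it is an assumption for illustration: the class name `TopicFilter`, the two labels, and the four toy training lines are made up, and a real filter would need far more data.

```python
import math
from collections import Counter


class TopicFilter:
    """Tiny multinomial naive Bayes over two labels: on_topic / off_topic."""

    def __init__(self):
        self.word_counts = {"on_topic": Counter(), "off_topic": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        # Count lowercase whitespace-separated words per label.
        self.word_counts[label].update(text.lower().split())
        self.doc_counts[label] += 1

    def _log_score(self, words, label, vocab_size):
        counts = self.word_counts[label]
        total = sum(counts.values())
        # Log prior plus Laplace-smoothed log likelihood of each word.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in words:
            score += math.log((counts[word] + 1) / (total + vocab_size))
        return score

    def is_on_topic(self, text):
        # Assumes both labels have seen at least one training example.
        words = text.lower().split()
        vocab = set(self.word_counts["on_topic"]) | set(self.word_counts["off_topic"])
        on = self._log_score(words, "on_topic", len(vocab))
        off = self._log_score(words, "off_topic", len(vocab))
        return on > off


# Toy training data, invented for this example only.
haiku_filter = TopicFilter()
haiku_filter.train("two hands joined as one we walk together through rain", "on_topic")
haiku_filter.train("friends gather close sharing warmth beside the fire", "on_topic")
haiku_filter.train("buy cheap pills now click here for free offer", "off_topic")
haiku_filter.train("win money fast click this link now", "off_topic")

print(haiku_filter.is_on_topic("together we share the fire"))
print(haiku_filter.is_on_topic("click here for cheap pills"))
```

Note that this scoring ignores word order entirely, so it cannot tell a haiku from a bag of on-topic words.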

------
dawnerd
I really like to use reCAPTCHA. Sure, it's not the most secure, but it's for a
good cause, and it keeps out the basic bots.

For anyone who doesn't know what reCAPTCHA is: <http://recaptcha.net/>

~~~
nickb
I wrote about reCAPTCHA before, and the big problem with it is that it can be
hard. And by hard I mean that the majority of people can't solve it. If you
have a big site and you have a spam issue, then by all means, use it. But if
you're a small site that's trying to grow and a site that considers every
potential user to be precious, stay away from it because the last thing you
want is to frustrate people!

Since you will not be a target early on, consider not using captcha at all
since spammers don't target sites that have no traffic.

~~~
thomasswift
Do you have a link? I'd love to read your thoughts. I have been thinking about
using reCAPTCHA, primarily because of the little refresh button and, of course,
the captcha.

~~~
nickb
It's somewhere on N.YC, but I can't find it anymore :(

------
rms
I'm surprised the Russian hackers gave away their implementation for free. I
suspect it was an unintentional leak.

Anyone else remember that Chinese website that was selling captcha hacks for
20 different types of captchas?

------
dejb
Oh my god! The singularity has begun!

------
ajkirwin
A long time ago I decided that if I were ever to have a captcha system, it
wouldn't use ones of that style, which are getting so randomized these days
that they're hard for a human to read.

Frankly, I think knowledge- or logic-based captchas are the way to go.

"Todd is three times as old as Jane is. When Jane was ten years younger, Todd
was five times as old as Jane was. How old is Todd?" for example.

Sure, takes longer to think about, but if someone can write a script to start
parsing logic puzzles like that, quickly, and use it to defeat website signup
authentication methods?

I'll buy that person a pint.
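
For what it's worth, once the English is parsed, the example puzzle is just two linear equations. A minimal sketch in Python, assuming the second sentence means "ten years ago, Todd was five times as old as Jane was then"; the function name `solve_age_puzzle` and its parameters are made up for illustration, and the hard part (parsing the text) is not attempted:

```python
from fractions import Fraction


def solve_age_puzzle(ratio_now, years_ago, ratio_then):
    """Solve T = ratio_now * J and T - years_ago = ratio_then * (J - years_ago).

    Returns (todd_age, jane_age) as Fractions, or None if no unique solution exists.
    """
    # Substitute T = ratio_now * J into the second equation:
    #   ratio_now * J - years_ago = ratio_then * J - ratio_then * years_ago
    #   J * (ratio_now - ratio_then) = years_ago * (1 - ratio_then)
    denom = Fraction(ratio_now) - Fraction(ratio_then)
    if denom == 0:
        return None
    jane = Fraction(years_ago) * (1 - Fraction(ratio_then)) / denom
    todd = Fraction(ratio_now) * jane
    return todd, jane


todd, jane = solve_age_puzzle(ratio_now=3, years_ago=10, ratio_then=5)
print(f"Todd is {todd}, Jane is {jane}")  # Todd is 60, Jane is 20
```

Check: ten years ago Todd was 50 and Jane was 10, and 50 is five times 10.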

~~~
marcus
I'd take you up on that bet. Write the system and I'll write the solver. I drink
Guinness :)

A better system would be to ask the user to identify the gender of a person
based on an image, or to tell whether the picture is of a dog or of a cat, and so
on. These tasks are trivial for a person and very, very hard for a program.

~~~
apgwoz
My introduction to AI class required us to write a classifier for people in
the news (20 different images of 10 different people). We were given the
location of the people's faces, but we were able to get 60% accuracy using
SVMs and the 32x32 block of pixels (nose, eyes, mouth region). This was the
"baseline" system. Some systems were getting 85% or more. I must admit, though,
that this was a restricted dataset, but the faces were not all looking
straight ahead the way eigenfaces are, and I'm sure that with enough data and
enough features, that sort of CAPTCHA could be defeated a large percentage of
the time.

~~~
marcus
You are confusing two different tasks. Identifying a person out of a small
comparison group is relatively easy: just deconstruct the face & compare
certain facial features. There has been a ton of research on the subject and even
some working commercial products (my school's AI lab uses one we built as a
lock).

Identifying gender is significantly harder.

If you want something a lot harder, have them click on the picture of the more
attractive person, using data from a hotornot-type site (just make sure the data
isn't public). Good luck solving that with Support Vector Machines. If you
want to generate more data, just build a reCAPTCHA-type system.
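
A rough sketch of how the server side of such a "click the higher-rated picture" check might look, in Python. The function names, the `min_gap` threshold, and the `private_scores` table are all assumptions made up for this example; the comment above only specifies that the vote data stays private.

```python
import random


def make_challenge(scores, rng, min_gap=2.0):
    """Return a pair of image ids whose hidden scores differ by at least min_gap."""
    ids = sorted(scores)
    pairs = [
        (a, b)
        for i, a in enumerate(ids)
        for b in ids[i + 1:]
        if abs(scores[a] - scores[b]) >= min_gap
    ]
    if not pairs:
        raise ValueError("no pair of images is far enough apart")
    pair = list(rng.choice(pairs))
    rng.shuffle(pair)  # randomize left/right placement
    return tuple(pair)


def check_answer(scores, challenge, clicked_id):
    """True if the clicked image is the higher-rated one of the pair."""
    first, second = challenge
    expected = first if scores[first] > scores[second] else second
    return clicked_id == expected


# Hypothetical private ratings (image id -> mean vote); never sent to the client.
private_scores = {"img_a": 9.1, "img_b": 3.2, "img_c": 8.8, "img_d": 5.0}

rng = random.Random()
challenge = make_challenge(private_scores, rng)
print("Which of these two is rated higher?", challenge)
```

A single pair can be guessed at random half the time, so a real deployment would presumably chain several rounds before letting anyone through.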

~~~
apgwoz
I don't see why it'd be harder with good features, and after looking at the
article again, 35% accuracy was considered a success. Obviously, I'm not as
qualified as you in this sense, but it seems logical based on results I've
seen (again, admittedly not the same quality as you've probably seen).

~~~
hhm
I checked on the "obviously, I'm not as qualified as you in this sense" by
looking at the user info, and I remembered that "Ideas to monetize new
artificial intelligence" thread... so Marcus, sorry if it's off-topic, but how
did you solve that problem? Are you doing captchas maybe? :)

~~~
marcus
Doing CAD - Computer Assisted Diagnosis - working on improving early cancer
detection in Mammography.

~~~
marcus
In a way, I understand that trying to apply the algorithm manually for each client
is wasteful, and negotiating with each client is tiring, especially when it's with a
7B company.

I'm thinking about using the idea I got of building a web-service around it,
and letting people find their own uses for it.

Considering applying to YC with the idea.

~~~
hhm
Thanks a lot for your reply and details! I was very interested in it.

