What's up, bot? Google tries new Captcha method

pavel_lishin · on April 18, 2009

I think one fascinating aspect of this whole thing is that by creating tougher and tougher captchas, Google and the like are actually helping make great strides in the field of image recognition by forcing spammers to adapt.

snprbob86 · on April 18, 2009

The truth of the whole thing is that Google primarily relies on _behavior_ to detect bots. Captchas are just an added layer of defense and serve as a pretty good way of seeding the behavior database. If some user appears to be a bot, serve them a captcha. If they fail, then that behavior was bot-like. Google is using captchas as a way to learn image recognition as well as bot behavior patterns. They know full well that they won't be able to rely on captchas forever, but they will sure as hell collect as much data out of them as possible.

rarestnews · on April 18, 2009

Actually, I'm really tired of those attempts by Google. The other day I tried to find something like a .txt file containing all the verbs. I tried searching for "finding resting searching" without quotes, but Google kept giving me "find rest search", so I tried quoting each word: " "finding" "resting" "searching" ", adding more and more words. Most of the matches were in anchors, not in text, so I added "allintext:"finding" ..." and here it is: "Looks like you are a bot! I can't allow you search for that." :)

Sometimes it's really tempting to write "Google" into Google. :)

nebula · on April 19, 2009

Dear God, what's wrong with a query like 'allintext:"finding"' ? I got curious and tried just this query, and sure enough Google said that the query looks like an automated one. It means it's not the pattern of the queries that you have sent that triggered this 'you are a bot' response from Google. It's just the one query allintext:"finding". Removing quotes allowed it to be processed.

I thought it might have to do with the fact that a single word is enclosed in quotes; i.e., when a person is searching, there is generally no reason to put a single word in quotes. On the other hand, an automated search might enclose the search text in quotes by default, without parsing the text to figure out whether it contains multiple words or a single word. But it turns out that is not the reason. A query like allintext:"Google sucks" still elicits the 'you are a bot' response. It looks like they ban all queries that enclose text in quotes for an allintext. Might be a bug.

lacker · on April 20, 2009

Actually, when a single word is quoted in a Google search, it does do something: it turns off the matching of words against similar words.

For example if you search for the misspelled [netflixs] you will still get netflix.com, but you can turn that off by quoting the word with the query ["netflixs"]. The plus sign works similarly.

Compare:

http://www.google.com/search?q=netflixs

http://www.google.com/search?&q=%22netflixs%22
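The only difference between the two URLs is that the second wraps the query in percent-encoded double quotes (%22). A minimal Python sketch of that encoding, using the standard library's urllib.parse; the helper name search_url is hypothetical and only for illustration:

```python
from urllib.parse import quote

def search_url(query: str) -> str:
    """Build a search URL for a raw query string (hypothetical helper)."""
    # quote() percent-encodes characters such as the double quote (" -> %22).
    return "http://www.google.com/search?q=" + quote(query)

plain = search_url("netflixs")    # similar-word matching stays on
exact = search_url('"netflixs"')  # quoted: match the word exactly as typed

print(plain)
print(exact)
```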

jrp · on April 19, 2009

http://wiki.answers.com/Q/Is_there_any_text_file_containing_... http://www.apsaulters.net/downloads.html

omarchowdhury · on April 18, 2009

A lot of spam operations use humans to solve captchas, so this isn't much of a change for them (actually could make it easier for them).

rarestnews · on April 18, 2009

Also it's sometimes used as porn-bait (i.e. enter captcha to enter the "free" site).

Human solving is pretty cheap too. I've heard quotes of about $2 for 1000 captchas. I guess with 1 click instead of 5-7 letters + enter, it's going to be cents per thousand soon enough, because you don't even need to know the keyboard well or type fast. So I'd say that might be a step backwards.
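To put the quoted rate in per-captcha terms, here is a rough Python sketch. The $2 per 1000 figure is the hearsay number above, not a verified price, and the function name solving_cost is made up for illustration:

```python
def solving_cost(captchas: int, dollars_per_thousand: float = 2.0) -> float:
    """Cost in dollars of paying humans to solve `captchas` challenges.

    The default rate is the anecdotal $2 per 1000 quoted in the thread.
    """
    return captchas * dollars_per_thousand / 1000

per_captcha = solving_cost(1)          # cost of a single solve
per_million = solving_cost(1_000_000)  # cost of a million solves

print(per_captcha, per_million)
```

At that rate a single solve costs a fifth of a cent, so any drop in the per-thousand price scales the whole bill down linearly.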

DenisM · on April 18, 2009

I've heard this pron meme before, but didn't see any actual proof. Do you have sources?

rarestnews · on April 18, 2009

You can't be that lazy :)

http://google.com/search?q=porn+captcha http://news.bbc.co.uk/2/hi/technology/7067962.stm http://www.concurringopinions.com/archives/2007/10/creative_... (note the quoted part from article) and probably a thousand others

Also, I have personally stumbled upon Google's captcha on quite a few (borderline porn) sites that pass it off as their own for registration purposes (sometimes in monochrome, though). Google's captcha is easily recognizable.

DenisM · on April 19, 2009

A few years back (probably in 2006 or 2007) I talked to a bunch of researchers who were working on image recognition and they told me they searched high and low for any proof of this and didn't find it. So I have had a cached opinion since then until about 5 minutes ago. :-)

Hexstream · on April 19, 2009

Yes, however spammers are known to operate on very tight margins (they make a few pennies per million users or something) so even slight additional operating costs can seriously affect their bottom line, possibly sending a profitable operation into the red.

So eventually forcing spammers to employ humans instead of bots is not for naught.

omarchowdhury · on April 19, 2009

This is not true.

It depends on what type of spam we're talking about.

The obvious case here is social network spam (you don't enter captchas to send emails), which is where captchas are most used against spammers, and it can be very profitable ($xxx per x,xxx-xx,xxx users).

amix · on April 18, 2009

I have implemented something similar called "Visual captcha", where the user has to pick the cat out from 6 randomized Flickr pictures. The code is freely available and should not be hard to integrate into your own projects. Read more on: http://amix.dk/blog/viewEntry/19338

whughes · on April 19, 2009

What is to stop a bot from just clicking randomly? A 1/6 chance of registering an account is good enough for botting purposes. That's the problem with these multiple-choice solutions. They simply aren't strong enough protection against a flood of tries.
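A quick Python sketch of that arithmetic, assuming exactly one correct picture out of six, uniform random clicks, and independent attempts (assumptions for illustration, not a description of any particular implementation):

```python
def success_probability(choices: int, attempts: int) -> float:
    """Chance that at least one of `attempts` uniform random guesses is right."""
    miss = 1 - 1 / choices        # probability a single guess is wrong
    return 1 - miss ** attempts   # complement of "every guess was wrong"

def expected_attempts(choices: int) -> float:
    """Mean number of random guesses until the first success (geometric)."""
    return float(choices)

one_try = success_probability(6, 1)
ten_tries = success_probability(6, 10)

print(one_try, ten_tries, expected_attempts(6))
```

Under these assumptions a bot needs about six tries per account on average, which is why this kind of challenge usually has to be paired with rate limiting or with several pictures per challenge.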

liuliu · on April 18, 2009

The real challenge is to collect enough pictures that your database can't be broken by brute force. Of course you can use Flickr's tags to identify the pictures, but there are many mismatches. Google's method solves this problem because image orientation is a heavily studied problem in computer vision, so computers can help build a large visual database and then prune out the cases that are easy for a computer.

thorax · on April 18, 2009

Flickr is harder, I'd use a different database. Doing image searches using existing online images is becoming easier (e.g. TinEye). An evil bot would just do image searches against Flickr for each of those images and read/scan the tags there to be sure.

henning · on April 18, 2009

Is that really effective? Just take some face detection code and have it learn what cat fur looks like instead.

amix · on April 18, 2009

It's reasonably effective, judging by my usage, and I could just iterate on this solution if someone brute-forced it or implemented a "cat fur detector" (which for my uses is pretty unrealistic).

Battling spammers is really a battle where you implement better protections and they implement better attacks. The conclusion so far has been that it isn't possible to checkmate them, and it's very unlikely it will ever happen, as a lot of spammers use "human bots"...

Using a visual captcha is much more user friendly though, so it's a win for the users.

stevenrace · on April 19, 2009

There is actually a 'Google Tech Talk' on detecting cats (versus other shapes) from images:

http://www.youtube.com/watch?v=-w72_VwSj6A

Is it really that easy? I would think flagging for texture would be difficult, as it's likely indistinguishable from noise (data, shadows, clothing..). Approximating shape/avg. contour is likely easier than looking for fur patterns.

Any suggestions on books/papers/etc on image recognition?

qeorge · on April 18, 2009

That's a cool idea, but they'll still have to provide an audio CAPTCHA for blind users, which is easy to solve.

markbao · on April 18, 2009

Really?

https://www.google.com/accounts/NewAccount

DavidSJ · on April 18, 2009

Actually, I think the noise proves his point. The fact that adding so much noise was necessary means computers are pretty good at solving that problem, and it has to be made next to impossible for humans before computers can no longer solve it either.

ctingom · on April 18, 2009

Wow, I could barely make out the audio captcha!

pavel_lishin · on April 18, 2009

But you could still make it out, right?

antiismist · on April 18, 2009

I couldn't figure out the audio captcha. People who are blind probably have better hearing, so maybe it's easier for them.

harpastum · on April 18, 2009

I've heard a lot of anecdotal evidence that young blind people have better-than-average hearing, but what about the elderly?

I'm not sure anyone I know over 70 years old could pass either of those captchas. I guess the only consolation is that most people that age often have younger people sign up for them.

tlrobinson · on April 19, 2009

I certainly couldn't. Perhaps they rely on blind people having heightened auditory senses.

qeorge · on April 18, 2009

Wow, it's incredible how much noise they've added. I couldn't make it out myself. It's bordering on unusable IMHO.

Still, I wouldn't be surprised if a computer could get 1 in 4, which is plenty to be effective. They're still only using the 10 digits for the signal.

vaksel · on April 18, 2009

Can't someone figure out an anti-spam method that doesn't require a captcha? I mean, it's not like captchas even work; I have yet to see one that works 100%.

jonknee · on April 18, 2009

Craigslist's phone verification works well. The downside is it's expensive to run and hard if not impossible to scale globally.

Goladus · on April 18, 2009

The best you'll ever get is human to human contact.

But that doesn't scale cheaply.

shader · on April 19, 2009

And there is the essence of the need for captchas ;)

You need to scale, so you employ computers instead of people. Now you have to have computers run a reverse Turing test. Otherwise, it's much easier for spammers to scale as well.

It's easy to cut down on spam if you don't accept any incoming mail.