Hacker News
What's up, bot? Google tries new Captcha method (cnet.com)
58 points by nickb on April 18, 2009 | 33 comments



I think one fascinating aspect of this whole thing is that by creating tougher and tougher captchas, Google and the like are actually helping make great strides in the field of image recognition by forcing spammers to adapt.


The truth is that Google relies primarily on _behavior_ to detect bots. Captchas are just an added layer of defense and serve as a pretty good way of seeding the behavior database: if a user appears to be a bot, serve them a captcha; if they fail, that behavior was bot-like. Google is using captchas as a way to learn image recognition as well as bot behavior patterns. They know full well that they won't be able to rely on captchas forever, but they will sure as hell collect as much data from them as possible.
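A minimal sketch of the loop described above, under loud assumptions: the `score` heuristic (queries per minute), the 0.5 threshold, and the names `BehaviorModel` and `handle` are all hypothetical and say nothing about how Google actually scores sessions. The point is only the flow: behavior decides who gets a captcha, and the captcha outcome is recorded as a label for the behavior data.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class BehaviorModel:
    # Hypothetical score above which a session is challenged with a captcha.
    threshold: float = 0.5
    # (features, looked_like_bot) pairs harvested from captcha outcomes.
    labeled: List[Tuple[dict, bool]] = field(default_factory=list)

    def score(self, features: dict) -> float:
        # Toy heuristic (an assumption): faster query rates look more bot-like.
        qpm = features.get("queries_per_minute", 0.0)
        return min(1.0, qpm / 60.0)

    def handle(self, features: dict, solve_captcha: Callable[[], bool]) -> str:
        # Sessions that look human are let through without any captcha.
        if self.score(features) < self.threshold:
            return "allow"
        passed = solve_captcha()
        # The captcha result doubles as a training label for the behavior data.
        self.labeled.append((features, not passed))
        return "allow" if passed else "block"
```

For example, `BehaviorModel().handle({"queries_per_minute": 120}, solve_captcha=lambda: False)` blocks the session and records it as bot-like; a slow session never sees a captcha and produces no label.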


Actually, I'm really tired of Google's attempts. The other day I tried to find something like a .txt file containing all the verbs. I tried searching for "finding resting searching" without quotes, but Google kept giving me "find rest search", so I tried quoting each word: " "finding" "resting" "searching" ", and adding more and more words. Most of the matches were in anchors, not in text, so I added "allintext:"finding" ..." and there it was: "Looks like you are a bot! I can't allow you to search for that." :)

Sometimes it's really tempting to type "Google" into Google. :)


Dear God, what's wrong with a query like 'allintext:"finding"'? I got curious and tried just that query, and sure enough Google said it looks like an automated one. So it's not the pattern of the queries you sent that triggered the 'you are a bot' response from Google; it's the single query allintext:"finding" on its own. Removing the quotes let it be processed.

I thought it might have to do with a single word being enclosed in quotes; i.e., a person searching normally has no reason to put a single word in quotes, whereas an automated search might quote its search text by default without checking whether it contains one word or several. But that turns out not to be the reason: a query like allintext:"Google sucks" still elicits the 'you are a bot' response. It looks like they block every allintext: query that puts its text in quotes. Might be a bug.


Actually, quoting a single word in a Google search does do something: it turns off matching the word against similar words.

For example, if you search for the misspelled [netflixs] you will still get netflix.com, but you can turn that off by quoting the word: ["netflixs"]. The plus sign works similarly.

Compare:

http://www.google.com/search?q=netflixs

http://www.google.com/search?&q=%22netflixs%22
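The two comparison links differ only in the percent-encoded double quotes around the word. A small sketch using Python's standard `urllib.parse.urlencode` reproduces them; the helper name `google_search_url` is hypothetical, it only builds the URLs (nothing is fetched), it omits the stray empty `&` in the second link, and Google's handling of quoted words may well have changed since this thread.

```python
from urllib.parse import urlencode

def google_search_url(query: str) -> str:
    # urlencode percent-encodes the value: '"' becomes %22 and ':' becomes %3A.
    return "http://www.google.com/search?" + urlencode({"q": query})

print(google_search_url("netflixs"))             # http://www.google.com/search?q=netflixs
print(google_search_url('"netflixs"'))           # http://www.google.com/search?q=%22netflixs%22
print(google_search_url('allintext:"finding"'))  # http://www.google.com/search?q=allintext%3A%22finding%22
```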



A lot of spam operations use humans to solve captchas, so this isn't much of a change for them (actually could make it easier for them).


Also, it's sometimes used as porn bait (i.e., enter the captcha to get into the "free" site).

Human solving is pretty cheap, too. I've heard quotes of about $2 per 1000 captchas. I guess with 1 click instead of 5-7 letters plus Enter, it's going to be cents per thousand soon enough, because you don't even need to know the keyboard well or type fast. So I'd say this might be a step backwards.
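The per-solve arithmetic behind that estimate, as a quick sketch: $2 per 1000 is the figure quoted above, while $0.50 per 1000 is a hypothetical stand-in for "cents per thousand", not a reported price. `Decimal` is used so the money values stay exact.

```python
from decimal import Decimal

def cost_per_solve(dollars_per_thousand: Decimal) -> Decimal:
    # Price quotes are per 1000 captchas, so divide to get the cost of one solve.
    return dollars_per_thousand / 1000

quoted = cost_per_solve(Decimal("2.00"))     # figure quoted in the comment
one_click = cost_per_solve(Decimal("0.50"))  # hypothetical one-click price
solves_per_dollar = 1 / quoted               # how many captchas one dollar buys
print(quoted, one_click, solves_per_dollar)
```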


I've heard this porn meme before, but haven't seen any actual proof. Do you have sources?


You can't be that lazy :)

http://google.com/search?q=porn+captcha

http://news.bbc.co.uk/2/hi/technology/7067962.stm

http://www.concurringopinions.com/archives/2007/10/creative_... (note the quoted part from the article)

and probably a thousand others

Also, I personally have stumbled onto Google's captcha on quite a few (borderline porn) sites that pass it off as their own for registration purposes (sometimes in monochrome, though). Google's captcha is easily recognizable.


A few years back (probably in 2006 or 2007) I talked to a bunch of researchers who were working on image recognition, and they told me they had searched high and low for any proof of this and didn't find it. So I'd held that cached opinion from then until about 5 minutes ago. :-)


Yes, but spammers are known to operate on very tight margins (they make a few pennies per million users or something), so even slight additional operating costs can seriously affect their bottom line, possibly sending a profitable operation into the red.

So eventually forcing spammers to employ humans instead of bots is not for naught.
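To make the margin argument concrete, here is a toy profit calculation. Apart from the $2 per 1000 human-solving price quoted earlier in the thread, every number is made up for illustration (revenue and other costs per account are hypothetical, and OCR solving is assumed to cost roughly nothing), so it only shows the shape of the argument: a per-solve cost that is tiny in absolute terms can still exceed the margin per account.

```python
from decimal import Decimal

def campaign_profit(accounts: int, revenue_each: Decimal,
                    other_cost_each: Decimal, captcha_cost_each: Decimal) -> Decimal:
    # Profit = number of accounts x (revenue - all costs), everything per account.
    return accounts * (revenue_each - other_cost_each - captcha_cost_each)

ACCOUNTS = 1_000_000
REVENUE = Decimal("0.003")   # hypothetical revenue per spam account
OTHER = Decimal("0.0015")    # hypothetical other operating cost per account
HUMAN = Decimal("2") / 1000  # $2 per 1000 captchas, quoted earlier in the thread

with_bots = campaign_profit(ACCOUNTS, REVENUE, OTHER, Decimal("0"))  # assumed ~free OCR
with_humans = campaign_profit(ACCOUNTS, REVENUE, OTHER, HUMAN)
print(with_bots, with_humans)
```

With these made-up inputs the bot-solved campaign makes $1500 and the human-solved one loses $500; with a richer campaign (see the reply below) the same captcha cost would barely matter.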


This is not true.

It depends on what type of spam we're talking about.

The obvious case here is social network spam (you don't enter captchas to send email), which is where captchas are most used against spammers, and it can be very profitable ($xxx per x,xxx-xx,xxx users).


I have implemented something similar called "Visual captcha", where the user has to pick the cat out of 6 randomized Flickr pictures. The code is freely available and should not be hard to integrate into your own projects. Read more at: http://amix.dk/blog/viewEntry/19338
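A minimal sketch of how a pick-the-cat challenge like this could be generated and checked. It is not the code from the linked post; the function names, the assumption of two disjoint image pools (cat and non-cat URLs, e.g. collected via Flickr tags), and the default grid size of 6 taken from the comment are illustrative.

```python
import secrets
from dataclasses import dataclass
from typing import List

@dataclass
class Challenge:
    images: List[str]  # image URLs in the order shown to the user
    answer_index: int  # position of the cat; must stay server-side

def make_challenge(cat_pool: List[str], other_pool: List[str], size: int = 6) -> Challenge:
    # Assumes the pools are disjoint, so exactly one image in the grid is a cat.
    rng = secrets.SystemRandom()
    cat = rng.choice(cat_pool)
    images = rng.sample(other_pool, size - 1) + [cat]
    rng.shuffle(images)  # randomize where the cat appears
    return Challenge(images, images.index(cat))

def check_answer(challenge: Challenge, clicked_index: int) -> bool:
    return clicked_index == challenge.answer_index
```

In a real deployment the challenge would live in the user's session and be discarded after one attempt, so the same grid can't simply be retried.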


What is to stop a bot from just clicking randomly? A 1/6 chance of registering an account is good enough for botting purposes. That's the problem with these multiple-choice solutions. They simply aren't strong enough protection against a flood of tries.
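The arithmetic behind that objection: clicking uniformly at random, a bot passes a 6-image grid with probability 1/6 and, since retries are independent, needs 1/p attempts on average. The sketch also shows chaining several independent grids, which is an assumption of this example rather than something proposed in the thread; the pass probability shrinks geometrically, but a bot that can retry freely still gets through eventually.

```python
from fractions import Fraction

def random_pass_probability(choices: int, rounds: int = 1) -> Fraction:
    # Each round is passed by a random click with probability 1/choices;
    # independent rounds multiply.
    return Fraction(1, choices) ** rounds

def expected_attempts(choices: int, rounds: int = 1) -> Fraction:
    # Mean of a geometric distribution: 1/p tries until the first success.
    return 1 / random_pass_probability(choices, rounds)

print(random_pass_probability(6), expected_attempts(6))        # 1/6 6
print(random_pass_probability(6, 3), expected_attempts(6, 3))  # 1/216 216
```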


The real challenge is to collect enough pictures that your database can't be broken by brute force. Of course you can use Flickr's tags to identify them, but there are many mismatches. Google's method solves this problem because image orientation is a heavily studied problem in computer vision: they can use computers to help build a large visual database and then prune out the cases that are easy for computers.
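Why the database size matters, as a rough model: assume (hypothetically) that each challenge draws its cat uniformly at random from a pool of N images, and that an attacker has every new image labeled once, for instance by a human solver, then caches the answer. The classic coupon-collector result gives the expected number of challenges before the whole pool has been seen, N * H_N (about N ln N), and the expected number of distinct images seen after k challenges.

```python
from fractions import Fraction

def expected_challenges_to_see_all(n: int) -> Fraction:
    # Coupon collector: E[T] = n * (1 + 1/2 + ... + 1/n).
    return n * sum(Fraction(1, i) for i in range(1, n + 1))

def expected_distinct_seen(n: int, k: int) -> Fraction:
    # A given image is missed by one draw with probability (1 - 1/n),
    # so it is missed by all k draws with probability (1 - 1/n)**k.
    return n * (1 - (1 - Fraction(1, n)) ** k)

print(expected_challenges_to_see_all(3))  # 11/2
print(expected_distinct_seen(2, 2))       # 3/2
```

Under this model a small pool is exhausted quickly, which is why collecting a large, correctly labeled set of pictures is the hard part.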


Flickr is harder; I'd use a different database. Doing image searches with existing online images is becoming easier (e.g. TinEye): an evil bot would just search Flickr for each of those images and read/scan the tags there to be sure.


Is that really effective? Just take some face detection code and have it learn what cat fur looks like instead.


It's reasonably effective, judging by my own use, and I could just iterate on this solution if someone brute-forced it or implemented a "cat fur detector" (which for my uses is pretty unrealistic).

Battling spammers is really an arms race: you implement better protections and they implement better attacks. The conclusion so far has been that it isn't possible to checkmate them, and it's very unlikely that will ever happen, as a lot of spammers use "human bots"...

Using a visual captcha is much more user-friendly, though, so it's a win for the users.


There is actually a 'Google Tech Talk' on detecting cats (versus other shapes) from images:

http://www.youtube.com/watch?v=-w72_VwSj6A

Is it really that easy? I would think flagging for texture would be difficult, as it's likely indistinguishable from noise (data, shadows, clothing...). Approximating the shape/average contour is likely easier than looking for fur patterns.

Any suggestions on books/papers/etc. on image recognition?


That's a cool idea, but they'll still have to provide an audio CAPTCHA for blind users, which is easy to solve.



Actually, I think the noise proves his point. The fact that adding so much noise was necessary means computers are pretty good at solving that problem, and it has to be made next to impossible for humans before computers can no longer solve it either.


Wow, I could barely make out the audio captcha!


But you could still make it out, right?


I couldn't figure out the audio captcha. People who are blind probably have better hearing, so maybe it's easier for them.


I've heard a lot of anecdotal evidence that young blind people have better-than-average hearing, but what about the elderly?

I'm not sure anyone I know over 70 years old could pass either of those captchas. I guess the only consolation is that people that age often have younger people sign them up.


I certainly couldn't. Perhaps they rely on blind people having heightened auditory senses.


Wow, it's incredible how much noise they've added. I couldn't make it out myself. It's bordering on unusable IMHO.

Still, I wouldn't be surprised if a computer could get 1 in 4, which is plenty to be effective. They're still only using the 10 digits for the signal.
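Why 1 in 4 is plausible for a recognizer even on noisy audio: if each spoken digit is recognized independently with accuracy p, a d-digit code is solved with probability p^d, while pure guessing from the 10 digits succeeds with probability 10^-d. The independence assumption and the code lengths below are illustrative; the thread doesn't say how many digits the audio captcha uses.

```python
from fractions import Fraction

def per_digit_accuracy_needed(target: float, digits: int) -> float:
    # Solve p**digits == target for p.
    return target ** (1 / digits)

def random_guess_probability(digits: int) -> Fraction:
    # Each position has 10 equally likely digits.
    return Fraction(1, 10 ** digits)

for d in (4, 6, 8):  # hypothetical code lengths
    print(d, round(per_digit_accuracy_needed(0.25, d), 3), random_guess_probability(d))
```

So a recognizer that gets each digit right roughly 70-85% of the time would already hit the 1-in-4 rate on such codes, far above blind guessing.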


Can't someone figure out an anti-spam method that doesn't require a captcha? I mean, it's not like captchas even work; I have yet to see one that works 100%.


Craigslist's phone verification works well. The downside is it's expensive to run and hard if not impossible to scale globally.
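The server side of phone verification is simple; the expensive part is delivering codes by SMS or voice call, which this sketch leaves out. A minimal sketch, assuming a 6-digit one-time code, a hypothetical 5-minute expiry, and an in-memory dict standing in for real storage; `issue_code` and `verify_code` are hypothetical names.

```python
import hmac
import secrets
import time
from typing import Dict, Optional, Tuple

CODE_TTL_SECONDS = 300  # hypothetical expiry window

def issue_code(store: Dict[str, Tuple[str, float]], phone: str,
               now: Optional[float] = None) -> str:
    now = time.time() if now is None else now
    code = f"{secrets.randbelow(10 ** 6):06d}"  # uniformly random 6-digit code
    store[phone] = (code, now + CODE_TTL_SECONDS)
    return code  # would be sent by SMS/voice call, never shown to the web client

def verify_code(store: Dict[str, Tuple[str, float]], phone: str, submitted: str,
                now: Optional[float] = None) -> bool:
    now = time.time() if now is None else now
    entry = store.pop(phone, None)  # single use: any attempt consumes the code
    if entry is None:
        return False
    code, expires_at = entry
    if now > expires_at:
        return False
    return hmac.compare_digest(code, submitted)  # constant-time comparison
```

Consuming the code on every attempt limits a bot to one guess per code sent, and each code sent costs the operator money, which is exactly why this approach is hard to scale cheaply.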


The best you'll ever get is human to human contact.

But that doesn't scale cheaply.


And there is the essence of the need for captchas ;)

You need to scale, so you employ computers instead of people. Now you have to have computers run a reverse turing test. Otherwise, it's much easier for spammers to scale as well.

It's easy to cut down on spam if you don't accept any incoming mail.



