Are You a Robot? Introducing “No CAPTCHA ReCAPTCHA” (googleonlinesecurity.blogspot.com)
694 points by r721 on Dec 3, 2014 | 420 comments

"If we can collect behavioural data from you, and it matches closely the behaviour of other humans, you are a human. Otherwise you're not a human."

Does anyone else get that feeling from the description of what Google is doing? I've tripped their "we think you are a bot" detection filter and been presented with a captcha countless times while using complex search queries and searching for relatively obscure things, and it's frankly very insulting and rather disturbing that they think someone who inputs "unusual" search queries, according to their measure, is not human. I have JS and cookies disabled so they definitely cannot track my mouse movements and I can't use this way of verifying "humanness", but what if they get rid of the regular captchas completely (based on the argument that eventually the only ones who will make use of and solve them are bots)? Then they'll basically be saying "you are a human only if you have a browser that supports these features, behave in this way, and act like every other human who does." The fact that Google is attempting to define and thus strongly normalise what is human behaviour is definitely a big red flag to me.

(Or maybe I'm really a bot, just an extremely intelligent one. :-)

> and it's frankly very insulting and rather disturbing that they think someone who inputs "unusual" search queries, according to their measure, is not human.

Insulting? How is that insulting? You are entitled. You are entitled to block scripts and to use Google's FREE service to perform any search query, all while blocking every program that attempts to identify you as not a bot.

But if, while using their free service, they cannot identify you as a human given the factors they measure (because you actively disabled the programs that measure such factors), then I see nothing wrong in them trying alternative ways (which were the standard before).

I think you are making a storm in a teacup. If you feel offended by the way their website works, just don't use it. I don't see any red flags at all.

You are entitled.

Has calling someone entitled ever been useful? For the last few years it's felt like nothing more than a petty "you're wrong" remark with bonus condensation built right in.

Been useful at changing their behaviour/beliefs? No, for the same psychological reasons any direct contradiction isn't.

Been useful at communicating that they feel someone is confusing expectations with rights? Yes, that's why there is a word for it.

Also, I get it's annoying to hear entitled, ad hominem, logical fallacy, privilege, and other currently trendy words being over used and often misused. But I'll take it over what there was before that, which was no discussion at all at that level of abstraction most of the time. The places where these words are overused are places where the community is learning these concepts.

> Has calling someone entitled ever been useful? For the last few years it's felt like nothing more than a petty "you're wrong" remark with bonus condensation built right in.

In this case, calling someone entitled is actually a compliment, not an ad hominem insult or put-down, because it acknowledges the poster's humanity.

The proper response in this case would be, "Why, yes, I probably am a bit entitled, like most people. Thank you for recognizing that I am human."

Do you mean condescension? Not trying to be a smart ass, I just got a chuckle out of the word choice.

There you go, being condensing again.

You need to chill out!

Is this Reddit? What year is it?

>Has calling someone entitled ever been useful?

Not useful, and moreover, it's ad hominem.

It's only ad hominem if it's used as support for an argument not directly related to the other's personality. It's not ad hominem in this case, because "you're entitled" is being used to support the claim, or as a parallel claim, that it's unreasonable to be insulted by a computer program thinking you're another computer program.

Claim: Turing was wrong about the Church-Turing Thesis because he was a homosexual -> ad hominem

Claim: Turing was immoral because he was a homosexual -> not ad hominem (although still not a good argument)

Saying "just don't use Google/Bing/search engines" is like saying "don't use the airlines".

The security checks suck big time, but really, these services are must-have and there is no better alternative.

Complaining and hoping for some change is all that's left.

Sorry, as much as I tried to like it, the results just don't match what I need...

Because they don't make any money, and only the sorts of shitty practises people often complain about enable making money while providing free services.

If enough people were actually willing to pay to use a search engine you could have an awesome search engine with none of that.

They insert their own referrals for various sites, they do make money.

https://startpage.com - they offer anonymized google results & don't log IP.

DDG is excellent most of the time, but way behind when doing video or map searches. I have it as my default in most places, but sometimes I just have to fall back to Google.

Agreed. Additionally, DDG is useless when searching for current news events. Google is great at flowing headlines for recent events into search results, which often leads me to include the "!g" bang when searching for a recent issue/story.

I don't disagree, but it did trump the point of the comment it replied to.

DDG will get the job done.

And DDG makes it super easy to search google with a bang:

"!g hacker news"

Yup. The bang feature makes DDG amazing. I use it as my default because it's a great way to quickly do tailored searches. It's very easy to say, search Hacker News (!hn), python documentation (!py3), R sites (!rseek) or just pull up a wikipedia page (!w).

You can use Disconnect Search, which makes your Google searches anonymous: https://search.disconnect.me/

I wish that were true. I tried to switch to them after they did their 'next' a few months back but the results were most of the time not nearly as good as Google's for my queries and I had to switch back.

Part of the reason Google is so good is that they track you... if you disable tracking, they aren't quite as good.

Google's services are not free.

You trade your data, screen real estate, and attention for their service. This is worth a lot - Google is worth a lot. They didn't do it by giving out services for free.

Google's services are free. By trying to redefine what free actually means into 'has no cost whatsoever to anyone' you ruin the word.

If you trade a fish for a service it isn't free. If you trade a gold bar for a service it isn't free. If you trade a service for a service it isn't free.

If you trade data for a service it is not free.

To consider "free" a function only of fiat currency is naive, both of history and of economics.

Google search is not free.

If it is I have no idea how they made so much money...

Or maybe you can tell me what 'free actually means'?

By that broad definition, there is practically no free website on the web (analytics, logs, etc.). Actually, by posting this on HN, I just "traded data, screen real estate and attention". Would you argue HN isn't free as well? I get what you mean but I don't think this is how the word free is commonly used.

Besides, I believe the original point made still makes sense if "free" is assumed to mean "non paid".

A note that screen real estate and attention here pertains mostly to paid impressions - be they advertisements or politicizing messages. When it comes to content sought by the user, it's hard to say that the user is giving the service provider real estate and attention. It is only when the service provider is showing content not for the benefit of the consumer but for their own self that the attention and real estate can be thought of as 'rented' to them.

I would agree with the assertion that there are practically no free websites on the web. Since when did we convince ourselves we can get things for free?

There are major exceptions. Wikipedia is for the most part free. It does not advertise to you, nor does it siphon and sell your data. It does not track you around the web, it does not sell your Wikipedia viewing behavior to urchin for cents. It is driven by donations.

HN also appears legitimately free to me. As far as I know YCombinator does not mine or sell your data or collect data other than what is required for the forum to be a forum. YCombinator makes its money by other means. It certainly benefits by cultivating a technical online community, which is why I think it does it - though what influence YC can/does project on the community could be thought of as a social cost (I know very little to nil about whether or how much this is done).

Google, however, is not one of these cases. Nor is most of the web.

I'm not sure if the original point still makes sense with 'non paid' (nor am I sure 'non paid' is right). The original point uses 'free' (in caps) to emphasize a sense of charity they use to inform their 'entitled' argument. First, their argument is essentially 'What you expect this to be free? You are entitled!' Second, I'm not sure that replacing the term will work, unless it also communicates charity.

The point here is that the exchange does not constitute charity. Google thinks the trade is a very good deal. Presumably internet surfers do too. But there is an exchange and that needs to be recognized.

Anyway this means that any term that communicates 'charity' will be ignorant of the conditions of how Google's service works - and I would have posted the same misgivings.

Google Search is free to search. It's not free to advertise on. Searchers are not Google's customer.

In this sense the searcher, her data, her screen real estate and her attention are the product Google offers to advertisers.

These are the things the searcher trades for the service.

If a fisherman gives you a fish in exchange for writing your name and the time of your visit down in his logbook, I consider the fish to be free for all intents and purposes.

I would agree with that.

I would also posit that Google looks and does nothing like that fisherman.

I know what you are saying, but actually this service is free. Blocking bots effectively is in both the website owner's and Google's interests, because bots disrupt the value propositions of both. And you could argue that it is in the website reader's interests too, by extension.

Given their scale and resources, Google are able to provide a far more effective bot detector than any of us could do on our own. I for one am delighted they are providing this very valuable service.

Not sure what blocking bots has to do with the freeness of the service. Perhaps you'd like to reply on one of the comments further down to get into why you believe the service is free?

You may argue that the trade is in the website reader's best interest. This is a different argument than whether it is free.

My real estate and attention are given to them because I came to their service asking to fill my screen according to my query.

I can agree Google is not providing a free pure-search-results service, but they do provide a free search results + ads service. Whether getting relevant results + [relevant] ads is worth anything to you - even $0 - is a separate question, but it's a stretch to frame as an exchange. It's like taking a free hot dog and complaining it's not free because you traded your time & taste buds eating the bun while you only wanted the sausage... [I'd buy it more for e.g. youtube pre-video ads, where you are forced to give attention and time to the ad first.]

Now my data is a better point. Very valid for the totality of google services; quite weak for logged-out search use. If you work answering questions, and recording the questions that get asked and where they came from, then yes I did hand you this data but it's almost inherent in asking the question.

[Disclaimer: I'm a xoogler. And all this is nit-picking.]

free as in beer: Free in the sense of costing no money; gratis.[1]

[1] http://en.wiktionary.org/wiki/free_as_in_beer

I don't understand how this clarifies the term free? Free as in beer is used to specify the freeness of a product or service, rather than the freeness of 'freedom', say from authoritarianism.

This conversation is about the meaning of 'money' modulo this understanding of free-as-in-beer - i.e. whether non-fiat scarce resources (user data/screen real estate) count as money.

free has 20+ definitions.[1] A discussion about what we mean when we say Google's services are (or are not) free is using free the same way we use it in free as in beer [thus its definition is relevant].

Colloquially we usually use free to mean not having a financial cost. Another word or phrase is usually used when referring to non-monetary costs. i.e. I would say "Google is free" but I would never say "Google costs nothing."

[1] http://en.wiktionary.org/wiki/free

The top sentence is granted - not sure it was ever in question.

The bottom part you use personal anecdotes to support the claim that a broader 'we' do something. I'm not sure, as my personal experience differs. But it does get to exactly what I was saying in the above comment - what the discussion centers about is what counts as 'money' (as you say "referring to non-monetary costs").

I think the place we differ is whether non-fiat scarce resources count as money. I think they do. Historically they have. In economics literature and practice they do.

Or perhaps the reservation is that the scarce resources in this instance are 'soft' resources like attention, screen real estate and personal data? Much of what is traded by financial institutions (for example) today are very virtual - trades of risks, credits (promises), futures, bets. Even real estate is traded on the idea that it occupies space of human attention and investment - not necessarily because it can be used as a means to 'produce' something. I'm hesitant to draw firm lines between these soft assets - I'm not sure where I could sensibly draw them.

Either way, I'm glad we agree that Google costs something. I do think that the OP intended their use of free (in capitals and context) to mean "Google costs nothing."

Perhaps the downvoter would be kind enough to clarify why they think these comments do not contribute to the conversation.

I didn't downvote but you may want to read HN's Guidelines[1], particularly: Resist complaining about being downmodded. It never does any good, and it makes boring reading.

[1] https://news.ycombinator.com/newsguidelines.html

The challenge (no complaint here, though I do believe it was down(modded?) merely because of disagreement and not for relevance or quality) was meant to incite more on topic discussion.

It's interesting I've never read the guidelines before now. Was refreshing to have taken a look, although it's mostly common sense and etiquette.

They are if you block ads, scripts, and cookies.

You're generalising. This argument only makes sense if Google's entire ecosystem of services was just like any other random, independent selection of sites on the Internet.

There is no equivalent to Google. Nobody else is doing this, particularly not to this extent. Not using all that computing power and AI to do it.

Yes, if Google thinks I'm a robot, I don't think it's so strange to consider that some sort of value judgement, even if it's done by a legion of machines. Definitely more so than if some random small-time website decides to make that call based on a couple of if-then statements.

Imagine if using a web service is like visiting a shop, and you get directed to the slow-checkout+ID-check lane because maybe you stammered your order, or because you know the store so well that your shopping-cart route through the aisles is deemed "too fast" (read: efficient, also avoiding the "special offers" cookies/candy/soda/junk aisles).

Amusingly, how I feel about that "judgement" varies. Sometimes it's annoying, sometimes it's cool because I feel "hey, I'm doing something clever that humans usually don't". Similar to how being ID-checked in a liquor store can be both annoying and flattering (depending on your age and how often it happens).

You'd have a point except for the fact that recaptchas have become increasingly impossible to solve (for humans!). And recaptchas aren't just on google sites, they're everywhere.

Which is what this is trying to help solve. They know they're getting harder, so they're trying to identify you before even hitting the captcha part so that you don't have to do it.

No one should complain about anything, ever.

Actually it would be great if someone had some ideas for ways to identify humans that don't require stuff like javascript. From the perspective of a service provider (and I'm one), the bots are a scourge: they consume resources and they are clearly attempting to 'mine' the search engine for something, but they are unwilling to come forward and just ask the search provider if they would sell it to them. And since they are unwilling to pay, but willing to invest resources in appearing more human-like, it leaves service providers in a pretty crappy position.

So if anyone has some CSS or otherwise innocuous ways of identifying humans, I'm all for it.

On a small scale, it's not too difficult. Detecting form POSTs with a new session catches most comment spam bots, and if an empty input field hidden with CSS is submitted with content, that's also a giveaway.

And I wouldn't discount javascript - another hidden field populated by onSubmit() is simple and effective. A few vocal paranoiacs advocate browsing with javascript turned off, but they are few and far between - and I bet they get sick of adding sites they want to see to their whitelist. We have over three thousand fairly technically aware users, and none have been tripped up by the javascript test.
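Taken together, the checks described above (a CSS-hidden honeypot field, a JS-populated token, and submit timing) can be sketched as one server-side function. This is a minimal sketch, not anyone's actual implementation: the field names, the fixed "human" token value (a real site would have onSubmit() fill in a per-session nonce instead), and the 2-second threshold are all made up for illustration.

```python
import time

def looks_like_bot(form, render_time, now=None):
    """Return True if a submitted form trips any of the simple bot checks."""
    now = now if now is not None else time.time()
    # 1. Honeypot: this field is hidden with CSS, so a human leaves it
    #    empty; naive bots fill in every field they find.
    if form.get("website", ""):
        return True
    # 2. JS check: a tiny onSubmit() handler populates this hidden field,
    #    so its absence means the client never ran the page's javascript.
    if form.get("js_token") != "human":
        return True
    # 3. Timing: bots often POST within a second of fetching the form;
    #    humans take at least a couple of seconds to type anything.
    if now - render_time < 2.0:
        return True
    return False
```

Any single check is easy to defeat once an attacker inspects the page by hand, which is why higher-value targets need stronger measures like emailed or SMS verification tokens.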

If your site is valuable enough for an attacker to manually figure out your defences, then you need to consider emailing a verification token - or even better, use SMS if you can afford the cost. Because this gives you a number to pass to law enforcement, it means an attacker has to buy a burner SIM card.

Back on topic, Google's initiative is a useful tool to add to your defences.

Isn't this just the cost of having a "free" product? Bots are not really a problem. It's just that their traffic cannot be monetized. If you could monetize bot traffic, your problem would be solved. Or, put another way, if you framed the issue as a business-model one, not a technical one, it might be a useful exercise.

   > if you framed the issue as a business model one, not 
   > a technical one, it might be a useful exercise.
That was kind of my point. Clearly most of the bots are trying to scrape my search engine for some specific data. I would (generally) be happy to just sell them that data rather than have them waste time trying to scrape us (that is the business model, which goes something like "Hey, we have a copy of a big chunk of the web on our servers, what do you want to know?"), but none of the bot writers seem willing to go there. They don't even send an email to ask us "Hey, could we get a list of every site you've crawled that uses the following Wordpress theme?" No, instead they send query after query for "/theme/xxx" p=1, p=2, ... p=300.

On a good day I just ban their IP for a while, when I'm feeling annoyed I send them results back that are bogus. But the weird thing is you can't even start a conversation with these folks, and I suppose that would be like looters saying "Well ok how about you help load this on a truck for me for 10 cents on the dollar and then your store won't be damaged." or something.
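The "ban their IP for a while" tactic above amounts to a sliding-window rate limit per source IP. A minimal Python sketch, where the request threshold, window size, and ban length are all invented for illustration:

```python
import time
from collections import defaultdict, deque

class IPBanner:
    """Ban an IP for a while once it exceeds a request rate."""

    def __init__(self, max_requests=100, window=60.0, ban_seconds=3600.0):
        self.max_requests = max_requests
        self.window = window
        self.ban_seconds = ban_seconds
        self.hits = defaultdict(deque)   # ip -> timestamps of recent requests
        self.banned_until = {}           # ip -> time at which the ban lifts

    def allow(self, ip, now=None):
        """Record a request from `ip`; return False if it should be refused."""
        now = now if now is not None else time.time()
        if self.banned_until.get(ip, 0) > now:
            return False
        q = self.hits[ip]
        q.append(now)
        # Drop timestamps that have slid out of the window.
        while q and q[0] < now - self.window:
            q.popleft()
        if len(q) > self.max_requests:
            self.banned_until[ip] = now + self.ban_seconds
            return False
        return True
```

This is per-IP only; serious scrapers rotate through proxy pools, which is exactly why heavier signals (javascript, cookies, behaviour) end up in the mix.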

You may try to contact scrapers through the access-denied page.

Did you try to explicitly state that your data is available for sale when denying access to p=300?

If you wanted to buy data from Google, whom would you email? What is Google's email address?

Google posts lots of contact information on their contact page. You would probably want to reach business development. I don't think they are willing to sell access to that index however, we (at Blekko) would. I suppose you could also try to pull it out of common crawl.

It need not be a commercial service. For example, Wikipedia is a donation-only service. A bot visit is generally no different from most user visits (I'd assume most users don't donate anyway). Wikipedia doesn't really mind serving users who aren't donating, but the bots, while generally no different from normal users, are stealing resources away from actual users.

That's why Google needs proper API or Pro edition where you could execute proper SQL queries, etc.

Instead, Google is making their search less functional. I don't get why.

They should, but the proper response would be a solution that solves what others can't, not complaining about someone failing to solve something you decided yourself to try out.

Including about other people's complaining.

It was sarcasm.

see no evil. hear no evil. speak no evil.

Don't be evil

TIL 'free' is an excuse for unethical behaviour.

(Disclaimer: I work at Google, but not on ReCaptcha.)

The point of this change is to make things easier on 90% of humans -- the ones who have JavaScript and third-party cookies enabled now get to tick a checkbox and be on their merry way, instead of doing a useless captcha when we knew they were already humans. Recall that when ReCaptcha initially came out, the argument was "humans are wasting all of this time, let's turn it into useful work to digitize books".

If book-based or street view-based captchas go away, I suspect it will be because bots/spammers got better at solving them than humans, not because Google thinks that the machine learning spam detection approach is fail-proof.

Recall that "reading" captchas already pose an insurmountable barrier to users with conditions such as illiteracy, low vision, no vision, and dyslexia. To accommodate these users, audio captchas are also provided, but a 2011 paper suggests that audio captchas are either easy to defeat programmatically or are difficult for users themselves to understand: https://cdn.elie.net/publications/decaptcha-breaking-75-perc...

I am visually impaired and can attest to both visual captchas being a pain and audio captchas being hard to understand. this change is nothing but an improvement as far as accessibility and usability goes. This is only a plus for people who implement these, as I have actually left sites that had insurmountable captchas for me.

Thank you.

Check out webvisum.com - from their website:

"WebVisum is a unique browser add on which greatly enhances web accessibility and empowers the blind and visually impaired community by putting the control in your hands!"

"Automated and instant CAPTCHA image solving, sign up to web sites and make forum posts and blog comments without asking for help!"

I was curious about the CAPTCHA solving, too, so I tested WebVisum out on ~8 reCAPTCHAs. It solved all except 2 of them, taking 20-60 seconds each time. In 2 cases it reported failing to solve the CAPTCHA, but it never gave an incorrect result. That is, whenever it gave a solution, the solution was correct (in my brief test).

So, while it's some way off their claim of "instant" CAPTCHA solving, this is definitely a very useful addon, especially for those people who cannot solve CAPTCHAs at all. Thank you for pointing it out.


> Automated and instant CAPTCHA image solving

How do they do that? This sounds like whitehat use of blackhat tools. Are they using captcha-solving farms?

There are ways to solve captchas somewhat reliably programmatically. I suspect this plugin only works on certain computer-generated captchas, not the street-sign ones.


They send the captcha to their servers and how they solve them is a secret.


Is there a web service where one could purchase AI recognition of fuzzy text, e.g. a street sign or book cover in a photo?

Very helpful, thank you! I have a difficult OCR problem to solve, rather than identity. Interesting to see that the market price for "being human" is $0.00139.

For non-captcha OCR also consider Mechanical Turk. And there are a variety of services built on Turk too.

The fact that this works shows that distorted-text captchas are no longer effective.

From the Google's blog post:

> our research recently showed that today’s Artificial Intelligence technology can solve even the most difficult variant of distorted text at 99.8% accuracy

If book-based or street view-based captchas go away, I suspect it will be because bots/spammers got better at solving them than humans

But, wait. Isn't that what we want? It seems like bots and spammers have a relatively small cost to a company like google, while digitizing books and house numbers is relatively valuable. I don't have numbers for a detailed cost-benefit analysis, but if bots get good enough to do time consuming work accurately, that's a win right?

That's like flying because you like airline food. No one flies if they don't have a destination. No one will put a captcha on their site if it doesn't tell computers and humans apart; that's its primary job.

From your description, you do sound kinda like a bot. Disabled cookies. Disabled Javascript. Irregular searches. I understand the frustration with saying, "You have to have these features supported to use the product," but let's face it: providing an experience to people who deliberately disable huge chunks of browser functionality is a tremendous pain in the ass. I think I can understand both sides of the argument using different strawmen:

"Can I read this paper, please?"

"Yes, of course, just put on these reading glasses."

"Why do I have to put on the reading glasses?"

"Well the font is quite small. If you don't wear the glasses, you probably won't be able to make out anything on the page. Your experience will be seriously degraded."

"I don't want to wear the glasses. Why can't I just read the page?"

"Well, we can fit a lot more data and make the page more robust by printing the text smaller. Why don't you just wear the glasses?"

"I have concerns about the glasses. I'd rather strain my eyes."

"We're not going to make a special page for you when 99% of the people are totally okay with wearing the glasses or wear the glasses anyways."

"I have JS and cookies disabled"

So imagine what bots often don't have.

Adding JS interaction and cookies takes more effort on the part of the programmer writing a bot.

So yeah, you'd look a lot more like a robot. How else would you quickly differentiate between human vs non-human based on a single request, or even a collection of requests over time? It's a game of stats at scale.

Here's a snippet of Python using the splinter library, to visit Google, type in a search query, and click 'search' (which is very Javascript heavy these days with their annoying 'instant' search).

    from splinter import Browser
    b = Browser()
    b.visit('http://google.com')
    b.fill('q', 'browser automation')
    btn = b.find_by_name('btnG')
    btn.click()

Not exactly 'more effort'...

With Selenium you can open a full web browser such as Chrome or Firefox and have it run through. A Google search is six lines:

    require "selenium-webdriver"
    driver = Selenium::WebDriver.for :firefox
    driver.navigate.to "http://google.com"
    element = driver.find_element(:name, 'q')
    element.send_keys "Hello WebDriver!"
    element.submit


Writing a bot with js and cookies is trivial, but it definitely won't defeat these tools. They probably look for time between actions or track mouse movements, stuff that makes the bots super inefficient.

Yeah, but if you are trying to automate thousands of simultaneous requests, you'll have to use a lot of servers, which is costly even in the cloud.

Right now google and bing will run sites with JS enabled to see the DOM after any JS changes take hold. Usually these crawls aren't nearly as frequent as the general crawling, because there is quite a lot more CPU/memory overhead to such utilities. I can't speak for splinter, but similar tools in node or phantomjs have a lot of overhead to them.

Still less effort than typical captcha.

That wasn't the point

It's more effort because fewer libraries support it, you need to execute unknown code on your computer, etc.

I think you're being a little hyperbolic. Google is classifying what is already normal human behavior. Having JavaScript disabled is definitely not "normal" human behavior. Of the stats I found, only 1-2% of users don't get JS, and the UK's Government Digital Service found[1] that only 0.2% of users disabled or can't support JS.

I don't think regular CAPTCHAs are going away anytime soon since any bot detection system is bound to have false positives.

[1] https://gds.blog.gov.uk/2013/10/21/how-many-people-are-missi...

Exactly. It's perfectly reasonable to present users who disable JS with a one-time CAPTCHA they have to solve to use the site. Many sites just (usually unintentionally) prevent users with Javascript disabled from navigating a site at all, so this is a huge step up from that.

The fact that Google is attempting to define and thus strongly normalise what is human behaviour is definitely a big red flag to me.

...But this is their core search competency and exactly what makes their search so powerful. PageRank is basically distributed wisdom of crowds, aka an algorithm of how people behave (build their websites) based on a search term/embedded link.

This seems like a perfect extension of this. Remember the vision of google: "to organize the world's information and make it universally accessible and useful." Human behavior falls squarely into a large segment of the "world's information."

>Remember the vision of google: "to organize the world's information and make it universally accessible and useful."

I'm sure that's why they got rid of the ability to search what people are saying on forums and blogs. Google still indexes everything; they just got rid of the filter.

Their results now give preference to SEO'd pages & adverts.

The old discussion filter returns an illegal request error https://www.google.com/?tbm=dsc

Their search is only "powerful" for finding the more mundane and widely disseminated information; I've noticed that it's increasingly difficult to find very specific information with it as it basically misunderstands the query and returns completely useless results. Maybe that's why I look like a bot, as I try to tell it exactly what I want...

Well, this is exactly the point. Obscure information has a very low social/viral index, and as a result a lot of people don't interact with it, so it is hard to find with Google - which is why I don't think it is a particularly robust search engine on its own in the grand scale of knowledge development.

Google seems robust because humans generally think pretty similarly, and generally look for the things that the people around them are talking about or also looking for. That breaks down considerably though across cultures and time.

When trying to use Google to find something obscure, I'm not so much bothered by the difficulty of doing so as I am by the implication that "real humans" don't use complex search queries. They used to teach in computer literacy courses how to use search engines, complete with complex multi-term boolean queries, to find exactly what you're looking for. Now try the same with Google and you're a bot? WTF? They're basically saying "humans are too stupid to do that - humans are supposed to be stupid."

Or that a lot of their target demographic has never been taught that, and so they've optimised their delivery to be accessible to the majority?

Well to be fair most of their users probably are too "stupid" (aka were never taught) to do that.

Their search IS powerful, even for obscure things. But when you disable JS and cookies, as you have done, you are taking a huge amount of that power away from the system. Of course you are going to get bad results for anything which is specific to you -- you have disabled their ability to make a better judgement!

> "I have JS and cookies disabled..."

Disabling essential parts of web functionality breaks web functionality. I'm shocked.

Dropping the snark though. I'm surprised that this is still a complaint. At this point in the web's evolution cookies and Javascript are essential. Disabling those will make your experience worse and complaining about that is like removing the windshield from a car and complaining that bugs get on your face.

Tracking cookies are certainly not essential.

Yeah, tracking cookies might not be. But cookies in general? They're essential for a large amount of sites to handle something as simple as logins.

I would suggest you're over-thinking it. The essence of it is "we think you're a bot because you haven't given us enough private information about yourself".

Exploiting that information is Google's core business, and it doesn't like people evading their panopticon. So they're now making life harder for those who care about their privacy.

Not surrendering your data to Google? We'll treat you like you're not even human, and through reCaptcha we'll tell thousands of other websites to do the same. That will teach you to hide things from the all seeing eye of Mountain View.

Why shouldn't us bots be able to search or participate in forums?

As long as you abide by all the social norms, including moving that damn mouse the right way, I have no problems with you, dear bot.

We'll legislate inefficiency. If you can't be as slow as a human, then you will be restricted.

Bots are equal, but separate.

Anecdotally, I block cookies and tracking scripts from Google and even run some custom javascript to remove the link shim in search results. I have yet to encounter the "we think you're a bot" detection filter, except when performing numerous complex iterative queries or Googling from within Tor.

The above is to suggest that perhaps tracking bugs and cookies aren't a component in the bot-detection algorithm, though that remains to be seen.

Well, looking at an example, my behavior definitely trips the 'not not a bot' detection for NoCAPTCHAs for whatever reason. I'm not too shook up though - it's really no more inconvenient than before.

Can you suggest a way to tell the difference between you and a bot? Merely throwing flags around without offering anything better isn't very helpful.

There isn't a way. As AI improves bots become increasingly indistinguishable from humans. All this does is rely on the fact that bots tend to use different browsers and behave in different ways than humans. But that can be fixed.

But it doesn't matter. If a human user spams tons of links in the comments after creating 20+ accounts, who cares if they are a bot or are doing it manually? I believe that websites should instead use machine learning like this to detect the bad behavior itself, rather than try to determine who the user actually is.

"bot" means no profit from ads. There you have it.

We were actually just discussing the "what if I trip their filter" concern at our morning meeting. Full disclosure: my company (as the username implies) builds FunCaptcha, a CAPTCHA alternative. Your concern, to us, is a very valid one and has been a driving force behind our own design and mentality. Our lead designer is (understandably) passionate about this so he actually wrote a few words on the blog that dives pretty deeply into the topic, if you're inclined: https://www.funcaptcha.co/2014/12/04/killing-the-captcha-wit....

I've also tripped Google's bot filters. Frankly, I'm more offended that Google is discriminating against robots, seeing as they are one of the leading companies in automation and AI :-)

Though you were joking, it's worth noting they're certainly not discriminating against robots. They're discriminating against your robots.

Which is to say: they're perfectly willing to let your crawl-able content and internet use help train their robots, they just don't want their crawl-able content and internet use to train your robots.

Aren't we talking about spambots, mostly? While law-abiding bots should probably be allowed in most sites, nobody wants a spambot in their blog or forum.

Isn't it right to block spambots? And if so, how do you tell regular bots from spambots?

Use re-captcha to prevent the spam-bots from posting... the real bots will just crawl anyway.

A couple months ago, I implemented some regular expressions to try and block a lot of bad actors, and have that include smaller search engines... our analytics traffic dropped around 5% the next week... our actual load on the servers dropped almost 40% though. Unfortunately it was decided the 5% hit wasn't worth reducing the load 40%.

Which sucks, moving forward a lot of output caching will be used more heavily with JS enhancements for logged in users on top of the nearly identical output rendering.

Server-side React with some user-agent sniffing will break out three renderings server-side: "xs" for those devices that are "mobile" (phones), "sm" for other tablet/mobile devices ("android", "ios", etc), and otherwise "md"... "lg" will only bump up on the client side from "md". It corresponds to the Bootstrap size breaks.

In essence, I don't care. Bots get the same as everyone else.. if you don't have JS, you can't login or fill out forms. Recaptcha should go a step farther in helping deal with bots...

^ Probably the most underrated comment in this topic.

Is there an anti-trust angle to this?

In terms of advances in automation and AI I welcome this development, because this is a new offensive in the bots vs. advertisers/scrapers war. Bots will of course adapt, it is only a question of time, and that adaptation is an advancement in automation and in understanding of human behavior.

Oh, I thought the parent comment was going to mention how Google might be using this to additionally train an AI to learn what human behavior is like, just like they did with the 411 service to collect voice data some years back.

I'm sorry the web cookie hegemony is oppressing you. Come up with a better solution to filter bots and Google will hire you. Nobody is pushing this down your throat.

You sound like a robot.

A robot permanently stuck on the "outraged" setting.

Replicants are like any other machine. They're either a benefit or a hazard. If they're a benefit, it's not my problem.

I used to use selenium to crawl Google webmaster tools data. Despite randomizations and such, they still had me marked as a bot.

As google gets stronger in AI, this becomes less of a problem, no?

The slippery slope has gotten a bit steeper.

They've been doing this for a while. With recaptcha, you got easier captchas (a single street number instead of two words) if the system thought you were human. There was an official post about this a year ago. [1] Now they have probably improved the system enough to be confident in not showing captcha at all. It's nothing revolutionary. If it thinks you're a robot, you still get the old captcha. [2] [1] http://googleonlinesecurity.blogspot.com/2013/10/recaptcha-j... [2] http://i.imgur.com/pCKS8p5.png

I've noticed that I get the house numbers when I leave 3rd party cookies enabled, and a much harder, often impossible captcha when they're disabled. Since leaving them off doesn't break much else, I do, and just fill out the harder captchas when I come across one. By the way, you only have to fill out the "known" word, the one that's all twisted and usually impossible to read until you refresh the image 10 times. Even completely omitting the 2nd word, which is the unknown word that OCR couldn't figure out, it will still validate.

Am I being paranoid to think that offering this "free" service is a great way to track people across even more sites, including the most important conversion pages, which usually don't carry the usual Google display ads? I don't see a big technological innovation here, as it appears that mostly they are checking your cookies to see if they recognize you.

So they can track you better and provide better-targeted ads. That's where they get the money from. That's how you pay for visiting websites.

Every "free" service from Google has the end goal of serving you personalized ads. That's their business.

Certainly correct. I guess it is new for me that I'm requiring my users to give their data so they can be served personalized ads.

And there are a number of sites using Analytics, Adwords/Adsense, DFP and a number of other points of connection. They've already offered/bought recaptcha, all this does is make it easier for most people (who have cookies and JS enabled).

I think it also depends on the website? Some offer easy captchas all the time, some not.

One example: I have never seen a "hard" captcha here https://webchat.freenode.net/

It says that they do take mouse movement into account but the cookie part makes me feel a little uneasy:

> IP addresses and cookies provide evidence that the user is the same friendly human Google remembers from elsewhere on the Web.

If this becomes a trend then major commercial websites will become unusable for people who are not accepting (third-party) cookies. "Because those damn bots" is a straw man argument to make people trackable by assuming that there are no other usability-improving methods that don't track the user (which I think is highly unlikely).

It's only used to generate a confidence level.

"In cases when the risk analysis engine can't confidently predict whether a user is a human or an abusive agent, it will prompt a CAPTCHA to elicit more cues"

So if you have cookies disabled, you'll probably just get a regular captcha

that is the bait part of the bait&switch strategy.

Just ask anyone old enough to have worked with Microsoft et al. in the past.

Yeah, the company is nice now, but nobody can say anything about tomorrow. So do use their sane offerings, but be aware that you may have to change at a moment's notice, and try not to depend on it too much (i.e. always keep a 1% bucket with an alternative solution, lest you find yourself locked in when you "thought" you had an alternative if you "needed" one).

Sorry, can you expand on their long game here? They've just made it easier; is that the bait? Captcha was already everywhere, so it doesn't seem they needed bait to popularise it. It's also used by non-Google companies who will presumably stop including it if Google decides to make it an impassable lock, which would I guess be the switch? Making it harder would just be back to before, so an actually effective switch would just be a lock? Which doesn't make sense, for the obvious reason that Google makes more money if their services are accessible. What are you getting at here?

If they drive all other "are you a human" solutions out of the market (they already own reCAPTCHA, which most sites can't exist without unless they want to drown in spam) and then start to charge for it or show intrusive ads on it, then you have no option other than to accept it.

The regular CAPTCHA is already approaching the point of being unsolvable for mere humans. If the only people who have to use it are cutting into Google's profit margins by blocking tracking, that gives them even more incentive to make life miserable for those users.

okay, that's reasonable.

But then a bot need only copy a human's mouse movements and disable cookies and we're back to the status quo.

Sure, for the bot. Captchas are pretty effective against bots, what's wrong with presenting the status quo to a bot? This is meant to improve things for the real people. I will appreciate not having to decipher some strange text.

I'm just not convinced from what's been shown that they'll have a long-term ability to distinguish between human mouse movements and those of a bot. So I'm curious as to really how one can keep this in place before it's overrun by spam and you need to make it more difficult anyway. More power to them if it works, but they're pretty light on details that inspire confidence, IMO.

i'm sure no one at google thought about that

There's no call for sarcasm. Do you have more insight into what data could allow them to distinguish between bots and people? If you do please share, because the Wired article and Google's own marketing video don't provide much information - it's obviously more focused on marketing than the technology.

why would google voluntarily give up that information?

That's not reasonable, its yet another way google is monitoring everything I do online.

well yes and no. In a way it is a win-win situation: It is reasonable in that it doesn't aggravate the situation for those who block cookies. (yet) And those that allow cookies get at least some convenience in return for that.

However, looking from a different perspective you can say that they're taking advantage of people blind with greed who want maximum convenience when using the web.

What makes you think they weren't already monitoring?

I'm using a tablet device, thus no mouse, and it gives me a captcha whenever I'm not logged in..

Really is this the breakthrough of bot detection? They are just leveraging cookies --which is nice improv UX-- but why do I need to click? delay loading to relax servers?

Well, it's probably that most sites you get a captcha on, give one to everyone... this will only reduce that for real users.

On the flip side, there are other events to hook into... onfocus/onblur, keydown, etc, etc... which can all go into bot detection... if you fill out a form and didn't focus on anything, click on anything, or press any keys.. you're probably a bot... If you have JS disabled, you deserve what you get.

For now you'll just get a captcha if you don't match the standard behaviour. But it's that little bit easier to just use Chrome and stay signed in to Google all the time now. That's good enough for now, surely.

It also makes your cookies valuable to spammers.

It's definitely not only relying on cursor movements. A simple $('iframe').contents().find('.recaptcha-checkbox-checkmark').click() proved that I'm not a robot, without me touching the mouse.

The cursor movement story seems to be a smokescreen to dilute the fact they're actually running an internet wide monitoring network, tracking users from site to site and building profiles on them.

I tested it a few times in their demo, trying hard to do funny things with the mouse pointer. Still I was asked to type in some text every time.

Then I logged into my gmail account and yes it worked.

So you're probably right about that smoke screen and it has nothing to do at all with mouse movement.

My fx browser deletes cookies at exit and my IP changes frequently and I think that's the true explanation for the outcome of my little test.

I was confused about all of this until I read your comment. Makes perfect sense now!

Also makes me think that they could dump the whole charade and provide no security check at all, but that'd probably make the service providers uncomfortable, and they'd lose the user as a source of human intelligence for classifying things in Google image searches.

I'm not sure where cursor movement was mentioned, other than the comments.

I read it in the wired.com article about this also submitted to HN: "And [Vinay] Shet says even the tiny movements a user’s mouse makes as it hovers and approaches a checkbox can help reveal an automated bot."

Sounds like if I 'Tab' to the field & hit enter I must be a bot....

I guess it's a lot more complicated than it looks at a cursory glance.

It's almost as if Google is playing its cards close to its chest to avoid tipping off people intent on defeating its captchas!

Which is why Google Analytics is free for site operators.

To be fair, any service or site - Facebook, Twitter, Google Analytics, etc - that encourages your putting their code or widgets or whatever on your site is doing exactly that in addition to listing out which of your friends "like this".

Makes perfect sense. This is Google's response to the Facebook 'Like' button [tracker]

Then what is Google Analytics?

can be blocked without any impact on the browsing experience.

I tried this demo (https://www.google.com/recaptcha/api2/demo) in a new incognito window. Then in devtools:

I got an extra verification (enter two bits of text)

Probably the incognito window. It seems to rely heavily on cookies (likely linked to their analytics data.)

At best this would just trigger another arms race, with people developing bots to generate a series of "human-like" mouse events, which I'd imagine is a far simpler problem than OCR.

Not to mention that it could only be used as a heuristic and not a test; so, eventually the weight of that heuristic will just be reduced to zero once someone publishes humanlike_mouse_driver.js with carefully-tuned-to-look-statistically-human mouse interactions available out of the box.
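To make that concrete, here's a toy sketch of what such a driver might do (everything here is invented for illustration; a real one would need to be tuned against actual pointer telemetry). Instead of jumping straight to the target, it follows a curved path with jitter and irregular inter-event timing:

```javascript
// Hypothetical "human-like" mouse path generator: a quadratic Bezier curve
// between start and target, with per-point positional jitter and variable
// 5-20 ms delays between events, roughly mimicking a real pointer stream.
function humanlikePath(from, to, steps = 40) {
  // A random control point off the straight line makes the path curved.
  const ctrl = {
    x: (from.x + to.x) / 2 + (Math.random() - 0.5) * 100,
    y: (from.y + to.y) / 2 + (Math.random() - 0.5) * 100,
  };
  const points = [];
  let t = 0;
  for (let i = 0; i <= steps; i++) {
    const u = i / steps;
    // Quadratic Bezier interpolation.
    const x = (1 - u) ** 2 * from.x + 2 * (1 - u) * u * ctrl.x + u ** 2 * to.x;
    const y = (1 - u) ** 2 * from.y + 2 * (1 - u) * u * ctrl.y + u ** 2 * to.y;
    // Irregular inter-event delay, like a real input device delivers.
    t += 5 + Math.random() * 15;
    points.push({
      x: x + (Math.random() - 0.5) * 2, // ±1 px jitter
      y: y + (Math.random() - 0.5) * 2,
      t: Math.round(t),
    });
  }
  return points;
}
```

Each point would then be replayed as a synthetic mousemove event. Whether Google's classifier could be fooled by something this simple is exactly the open question.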

You know a peculiar thing that humans do? They do things slowly. You know what bot (programmers) hate doing? Doing things slowly. When you've got bots that are diving through thousands of registration forms per minute, and suddenly you need to slow them down to 15 seconds per form, well, that's already an enormous win for the site owners, even if it makes the new captcha fail (if the quoted statistic of 99.8% of captchas being solvable by bots is correct).
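The "slow them down" point needs nothing fancier than a server-side timestamp check. A minimal sketch (the 15-second floor, the token scheme, and the in-memory store are all made up here, not anything reCAPTCHA actually documents):

```javascript
// Toy server-side speed check: reject any form submission that comes back
// faster than a human could plausibly have filled it in.
const MIN_FILL_SECONDS = 15;
const issuedForms = new Map(); // formToken -> timestamp when the form was served

function serveForm(token, now = Date.now()) {
  issuedForms.set(token, now);
}

function acceptSubmission(token, now = Date.now()) {
  const served = issuedForms.get(token);
  if (served === undefined) return false; // never served, or already used
  issuedForms.delete(token);              // one submission per token
  return (now - served) / 1000 >= MIN_FILL_SECONDS;
}
```

Even if a bot solves every challenge, this alone caps it at four submissions a minute per token stream, which is the enormous win for site owners described above.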

Are you logged into Google? If you have their cookies it's one indication you're not a robot.

I wonder if you used a Google account in good standing for a bot, at what point would they start to detect it was a bot? I imagine if you used it for more than a few CAPTCHAs daily you'd probably end up having to do additional validation.

Which is a bad idea.

I actually have some bots which scrape Google sites (for the purpose of integrating stuff like Google Keep into KRunner), and they just use the Useragent of a regular phone, send normal POST data, etc. Works perfectly fine, and — I just checked — this bot is recognized as normal user by this captcha system. No Captcha input.

I tried it even with a new Google profile and just using cURL to log into Google, then started a new browser session and imported the cookies from cURL. Worked just as well.

I guess this makes it easier for malicious bot-authors...

Interestingly I logged out of my Google account in Chrome and immediately got an old style CAPTCHA. But not when I logged out in Safari (I very rarely use Safari).

Edit- Nevermind, it looks like Safari left some google.com cookie lying around while Chrome deleted it. Deleting it gave me the old CAPTCHA.

I strongly believe it is the bot detection. They may also have added extra checks for already-logged-in users.

Wouldn't it be possible to just tune bots easier now instead of having to work with OCR?

Who wants to place bets on the "potential robots" more likely to be those without Google cookies or a Google Account?

I can't think of many better uses for the menagerie of different tracking methods that they have planted on me, to be honest.

Well if you ever run for political office it's going to save them a boatload on campaign contributions.

Eh, you never know. Attitudes toward certain improprieties have moderated in recent years...

So what? Other once acceptable behavior is no longer tolerated.

There is always a center, just because it moves doesn't mean that people can't be a socially unacceptable distance from it.

I suspect you have mistakenly stumbled into a lighthearted thread with an overly serious mindset.

I've learned that I'm probably a robot, and it's probably because someone on my subnet hits the rate limits on Google APIs sometimes.

There was one day when it was so convinced, it was giving me impossible captchas just to use Google Search.

It's a poor betting opportunity as it's pretty much obvious it's the case. It also makes perfect sense.

Certainly seems to be true in my case.

Seems to be true for me too. When I reject cookies, I get more difficult recaptchas.

I recently added recaptcha to a site and got this version.

From an implementation standpoint it is utterly painless. The client side is copy/paste from Google's site and the PHP/server side was this:

      $recapchaURL = 'https://www.google.com/recaptcha/api/siteverify?secret=600SZZ0ZZZZZIZi-ZZ0ZEHZW1000Z_0ZZZ00QZZ&response=' . request_var('g-recaptcha-response', '') . '&remoteip=' . $request->server('REMOTE_ADDR');
      $recapchaRespone = file_get_contents($recapchaURL);
      $recapchaResponeJSON = json_decode($recapchaRespone);
      if (is_null($recapchaResponeJSON) || $recapchaResponeJSON->{'success'} !== true) {
            print("Recaptcha failed. <more error msg>");
            return;
      }
Most of the time it just gives you that one checkbox, but if you use the form multiple times (e.g. testing) it starts to give you the classical text entry box. I have no idea how it works fully and this article only sheds little light on it.

Is that your actual secret? you might not want to reveal that/you should get a new one.

It is not. I kept the length and style the same to give a better example, but replaced most of the characters with 0s and Zs.

So now spammers will use botnets… Oh wait, they already do.

They already have the botnets. Now they need to use those end-user machines as proxies, using the credentials already on the machine. They just need to figure out the other parameters. Maybe it's running JS code? Then you can use a browser engine (Selenium). Maybe it's the click pattern? Just generate the JSON data and send it. They can even apply the same machine learning techniques to figure out the best way to circumvent the captchas.

And the escalation continues.

Yeah, I feel like it won't be too long before spammers start finding ways to emulate users without having to solve any CAPTCHAs. Google is likely going to need to switch their 98%/2% to something more like 80%/20% (that is, 20% of users will still need to enter CAPTCHAs).

I am using a small tool that I wrote to integrate Google Keep and other Google stuff with KRunner and so on, and this tool (essentially being a dumb bot) also passes all the Captchas.

I’d say malicious authors would have it really easy now.

Computer Vision guy here. Okay so you've made some improvements for normal users.

The captchas are still the same old ones, just not shown every time. They can still be cracked with the latest neural net techniques. The visual matching stuff can be guessed 6/10 times.

You still have audio captchas, that can be cracked.

If all else fails you still have cheap labour from third-world countries. I don't see why this is revolutionary.

Google will now have their captchas present on every site and start logging user behavior in the name of identifying bots. Who says they won't use the data to drive their ad empire?

Even if the success rate of detecting robots stays the same, I would say this is still a win because the majority of humans won't have to mess with it any longer.

Even better: those with various disabilities won't have to mess with it. My parents' only disability that I know of is near-complete computer illiteracy and I can tell you from experience that every time they're presented with a normal CAPTCHA it's like somebody just handed them a Rubik's Cube and told them to solve it before they can create a profile. In every case I know of, they just turn the computer off and walk away. Now, these are what I would call normal humans (don't tell them I ever said that) so I can only imagine how aggravated those with visual and/or auditory problems get when presented with a crazy CAPTCHA. And when your revenue comes from getting people to submit these, I can see it still being a boon to the website, even if all they did was lower the barrier of entry for humans.

My guess is it's based on the tracking data they already collect on most people. I try to avoid it, so I get stuff like this:


I have no idea what that second word is supposed to be, so if you use this, I probably won't use your site.

It isn't user friendly, but often with that type it doesn't care what you enter for the tough-to-read word. The easy-to-read word is your test. The other is a way to harness people to do difficult OCR tasks. Some web sites have had fun organizing people to regularly enter dirty words for the tough-to-read ones, to mess with the results :)

It's actually the other way round for the Captcha posted and most recaptchas I've seen. The easy to read word is the OCR and the hard one is the real captcha.

This is fascinating. So they're harnessing the collective OCR powers of the internet surfing public? Diabolically clever.

"unklist", although I have no idea what it means.

When a Captcha contains two words, usually there is one "real" word, and one "fake" word.

Here's the JavaScript behind it: https://www.gstatic.com/recaptcha/api2/r20141202135649/recap...

It's hard to see what's sent over the wire (it's obfuscated), but the source gives you a good idea of what they're collecting. The biggie is the GA cookie, which is running on over 10 million sites. Like any CAPTCHA, this is still breakable -- just load your actual cookies into Selenium or PhantomJS and replay your mouse movements. Of course, once you do that more than a couple of times, you'll likely have to write a crawler to generate fresh cookies. At that point, you may as well just break the visual CAPTCHA, which is trivial anyway. I.e. you should still never use a CAPTCHA (http://www.onlineaspect.com/2010/07/02/why-you-should-never-...).

Captchas can also be useful as a differentiator between free/paid plans, or to slow down users (see 4chan)

In the long run, I think it's unavoidable that AI-type systems continue to improve, while humans don't, so this will become a harder and harder problem.

One helpful approach would be to separate out "why CAPTCHA" into preventing abuse (through high volumes) and "guaranteed one (or small number) per person" from "am I interacting directly with a live human", and using different things for each.

The naive solution to a lot of this is identity -- if FB profiles are "expensive" to create, especially old ones with lots of social proof, you can use something like FB connect. However, there are a lot of downsides to this (chief being centralization/commercial control by one entity, which might be a direct competitor; secondarily, loss of anonymity overall.)

One interesting approach might be some kind of bond -- ideally with a ZK proof of ownership/control, and where the bond amount is at risk in the case of abuse, but it's not linked to identity.

The tiniest mouse movements I make while tabbing to the checkbox and hitting my spacebar to check it? Or tap it on my touch screen? And why wouldn't this be vulnerable to replaying a real user's input--collected on, say, a "free" pornographic website? Their answer seems to be "security through obscurity".

"Security through obscurity" is a weak concept, however the goal here is not security but fraud detection.

Obscurity is a legitimate component of a fraud detection system, for the same reason that hiding your cards is an important part (but only a part!) of being a good poker player.

It's always security through obscurity.

f(obscurity, time, analysis) = clarity

The details of implementation are left to the reader as an exercise

I wish they would explain more about how the user's interaction with the whole reCAPTCHA leads them to know it's a person and not a robot, but maybe they're worried about people writing bots to get around their protections.

> However, CAPTCHAs aren't going away just yet. In cases when the risk analysis engine can't confidently predict whether a user is a human or an abusive agent, it will prompt a CAPTCHA to elicit more cues, increasing the number of security checkpoints to confirm the user is valid.

Probably using a combination of G+ and GA to check your 'history' to see the activity is like a normal human. Visits a couple news sites each day, checks their gmail, searches for random crap randomly, GA registered a 'conversion' for some company = probably a human
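Whatever the actual model is, the guesses in this thread (cookies, account state, mouse behaviour, timing) would combine naturally into a weighted confidence score with a "not sure, show a CAPTCHA" middle band, matching the blog post's description. A toy sketch with completely invented cues and weights:

```javascript
// Toy risk-analysis sketch: combine a few cues into a score and pick one of
// three outcomes, mirroring "confident human / unsure / confident bot".
// The cue names, weights, and thresholds are all made up for illustration.
function classifyUser(cues) {
  let score = 0;
  if (cues.hasLongLivedCookie) score += 0.4; // seen elsewhere on the web
  if (cues.loggedIntoAccount) score += 0.3;
  if (cues.mouseEntropy > 0.5) score += 0.2; // irregular, human-ish movement
  if (cues.secondsOnPage > 5) score += 0.1;
  if (score >= 0.7) return 'human';   // just the checkbox
  if (score >= 0.3) return 'captcha'; // elicit more cues
  return 'block';
}
```

Note how, in any scheme like this, blocking cookies alone drops you out of the "human" band, which is consistent with what people in this thread are reporting.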

I was thinking they may be looking at how long it takes for a user to click the "I'm not a robot" link. A robot would probably load the page and quickly, without delay, send the HTTP POST, but I have to imagine they thought of this already and bot writers would quickly add a sleep() call in there at some point... Yeah, I wonder about their internal logic too.

Or more likely they realize that explaining that your robot quotient is based on a statistical analysis of the last 6 hours of your browser traffic would probably freak everybody the fuck out.

They're almost certainly using the adwords cookies that get hit from 90% of the sites out there to figure out if you're a bot or not.

True enough, although it doesn't stop Google Now from coming up with helpful suggestions like "Hey, we noticed you were looking at this movie; would you like to see it on Amazon?"

Giving that away will give the bots a head start to figure out how to overcome this new roadblock.

In regards to the video, I feel like it has become a cliche to have upbeat, light ukulele music in the background of product demonstration videos. I instantly felt myself become annoyed when the music started.

And some unison whistling at the end to complete the cliche. Weird they would even have a video for something like this, leading me to be more suspicious of the mechanisms and data they are using behind the scenes to make this work.

God, I never put my finger on it until you mentioned it. So true.

What a misleading headline. Google will now look at your mouse movement but will really be checking whether they've been tracking you across the web. Anybody who is concerned enough about privacy to block/clean cookies will be assumed to be a non-human.

And then you'll do a captcha, like you would have done previously. So either you're at the status quo, or you get an improved experience if you're in the vast majority that doesn't change any defaults. What's wrong with that?

It's interesting that the adversarial nature of internet security is "breeding" an adversarial AI. Inevitably, people will start working on AI to beat this new captcha. I think in terms of parallels to biological evolution, security/fraud AI has the greatest evolutionary force behind it. Fun and scary to think where this particular breed of AI will lead.

I always assumed Google's use of reCAPTCHA was to augment the OCR used to digitize Google Books, particularly in results the software couldn't confidently match to a word. Is this true? It's interesting that it's still the fallback for the new method.

That was the original goal of the project.


"By presenting two words it both protects websites from bots attempting to access restricted areas[2] and helps digitize the text of books."

For some time, you could pass a reCAPTCHA test by just entering the more distorted word correctly.

This should be the top thread. I find the whole topic of crowdsourcing to compensate for the inadequacies of computer vision (and other inadequacies) fascinating. OCR was the first problem. We've been helping Google Maps identify house addresses for a while now with reCaptcha, and with this announcement it looks like Google is finally tackling the problem of image association. Computers suck at determining which pictures contain birds. By making users tag all of the images on the web, they're making image search much more powerful and will hopefully improve the entire field of computer vision.

When I tell my future robot to go get my coffee mug, I don't want it coming back with the PS5 controller.

I only ever enter the distorted one, works every time.

That was the original idea behind reCAPTCHA (which originated outside of Google, acquired in 2009), but my understanding is that they long ago ran out of actual text that needed human OCR'ing, and/or found other reasons that approach no longer was helpful.

The "help OCR while also spam protecting" thing isn't currently mentioned on Google's recaptcha product page.

It is:

> Creation of Value

> Stop a bot. Save a book.

> reCAPTCHA digitizes books by turning words that cannot be read by computers into CAPTCHAs for people to solve. Word by word, a book is digitized and preserved online for people to find and read.


Good catch.

I wonder where I heard/got the impression that it wasn't really being used for this much anymore. Maybe from when the recaptchas most of us saw switched from scanned books to Google Street View photo crops. And I was also surprised by the implication that Google's algorithms really needed human help for visual recognition of almost exclusively strings of 0-9. I would have thought that would be a pretty well solved problem.

Anyway, somehow I got the idea that recaptcha wasn't actually providing much OCR help anymore, but maybe I just made that up.

For the past few years the recaptchas I've seen were illegible text next to easy-to-read text. I think it's obvious that they've run out of the low-hanging fruit and now just have the worst of the worst as placeholders. The move to house numbers just proves that they're kinda running out of badly OCR'd text.

This move isn't too surprising. OCR based captchas have always been a hack and the "best" captchas are like having the best collection of duct tape and WD40. At a certain point you need to stop doing half-assed repairs and remodel.

They also used it to decode street-number addresses for Street View.

Those computer-vision challenges mentioned in the blog post aren't 100% clear. For the first one, my eyes went directly to the cranberry sauce, and I thought to myself "Wait... is that one supposed to be clicked, too?"

I thought it was unclear too for a different reason - the text says "that match this one" which I read as being actual identical matches. Sure it's obvious when you see the images but that wording feels really awkward.

Same, my first thought was to look for the identical cat photo, or maybe the same cat from a different angle, not just other cats.

Don't know if this works only for me, but here is a live example - https://www.google.com/cbk?cb_client=maps_sv.tactile&output=...

It gives me the new version as well, but it seems google is convinced that I am a bot. Getting a regular captcha after clicking the button and I have to say that this is a lot worse of an experience than regular old captchas. Now I have to wait for a few seconds after clicking a button, then still solve a captcha.

Hopefully it gets better with time.

Do you happen to browse incognito or with 3rd-party cookies blocked?

Looks like the new version needs an active and valid google cookie in order to tell if you're a robot or not.
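For what it's worth, the documented integration only covers the token hand-off; all of the cookie/behaviour analysis happens on Google's side before a token is ever issued. A minimal server-side sketch of that flow, in Python (SECRET_KEY is obviously a placeholder, and the endpoint/field names are the ones from Google's published siteverify API):

```python
# The widget watches the interaction (cookies, mouse, etc.) and hands the
# page a token in the "g-recaptcha-response" form field. The site's backend
# then forwards that token to Google's siteverify endpoint for a verdict.
import json
import urllib.parse
import urllib.request

SITEVERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"
SECRET_KEY = "your-secret-key"  # placeholder, not a real key

def parse_verify_response(raw_json):
    """Google answers with JSON like {"success": true} or
    {"success": false, "error-codes": [...]}; reduce it to a bool."""
    return bool(json.loads(raw_json).get("success"))

def is_human(token, remote_ip=None):
    """POST the widget's token back to Google and return its verdict."""
    fields = {"secret": SECRET_KEY, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip
    body = urllib.parse.urlencode(fields).encode()
    with urllib.request.urlopen(SITEVERIFY_URL, body) as resp:
        return parse_verify_response(resp.read().decode())
```

So from the site operator's point of view nothing about the "no captcha" logic is visible; you just get a token and a yes/no back, which is consistent with the cookie-dependent behaviour people are seeing here.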

Try this one:


It's very similar. I might go as far as saying that Google copied them.

I attempted tabbing to the checkbox and pressing spacebar (not moving the mouse at all) and it worked just fine. Impressive. But I guess the tell-all is how secure it is against bots, not how easy it is for humans to get through. For all we know it could just be letting everyone through :P

It worked for me, though Google thought I was a robot at first.

Probably because lots of users are currently visiting the link.

Got recaptcha'd, probably because of Ghostery.

I've been blocking third-party cookies for a while, and I noticed that I only get the old, hard to read, captchas, instead of the easier version with numbers.

Too bad this new version won't work for me either.

Obligatory XKCD: http://xkcd.com/810/

I just hope it also works for pen-tablets, where the "pointer" can suddenly jump from one location to the next when the pen comes near the surface of the tablet.

Or the much more common case of touch screens. I'm assuming it's fine - the fine pointer movements are just one aspect of it, and tapping/clicking with a pen are likely to produce small movements anyway (whether or not those movements are suppressed by the driver of whatever device you're using is another matter).

That, and what about those who fill in forms with tab navigation? No mouse involved there. It is just showing off...

I just tried the one ins0 linked above, and tabbing through, using space to select the checkbox, worked fine.

Using my iPhone, it thought I was a bot.

More appropriately stated: couldn't determine a priori that you were a human.

It's definitely not mouse based. I tried it in an incognito tab and it showed me the old form when I clicked the checkbox.

I tried the demo in two different browsers that I use regularly. On the one that stays logged into my google account, I was not challenged with a captcha. On the other browser, which I use quite a lot but not with my google account, the captcha appears.

I'd think that having a long-standing google account with a normal history of activity would be a good indication that one might be a human. If google is weighting that heavily for this test, that may create a new incentive for spammers and scammers to hijack people's google accounts.

On what page did you test the new system?

1. Tested on my normal chrome where I didn't delete any cookies and logged in to my google accounts, and no adblocker running. So plenty of evidence I'm human, with all those cookies from big G.

2. Tested in incognito mode: BAAM: I'm a bot, had to fill out the old captcha!

Wow, that's odd. I checked it out on mobile and they confirmed me human without giving me the image-set similarity question.

I don't think they can predict this from the way I touch the screen >.<

So, I tried it. With a Google Account that had ZERO activity for two years. Using a Java program to activate that site.

Passed the CAPTCHA. Without any further verification.

I mean, spammers are going to love it xD
