Google's new CAPTCHA security login raises 'legitimate privacy concerns' (businessinsider.com)
109 points by r0h1n on Feb 24, 2015 | hide | past | web | favorite | 46 comments

Does anyone have good alternatives to old/new ReCAPTCHA? I've been scratching the surface of academic research in the area, and it's all kind of messy.

(It's no great secret that CloudFlare would love to switch away from ReCAPTCHA, for a whole variety of reasons. It's one of the things Tor users complain about the most, but it's an issue for a lot more users than that. We're doing a lot of stuff to reduce reliance on CAPTCHAs overall throughout 2015, but we still need a good one for some checks.)

I wonder if some kind of prize (anti-Turing prize?) would help. There's the core algorithm/approach question, as well as the infrastructure and deployment model question. I'm a lot more comfortable answering the latter; the former is more black art than science.

You could disable CAPTCHAs for Tor at any point, but you don't, presumably because there actually is a lot of scraping and other abuse coming through it. I don't think replacing reCAPTCHA is going to make Tor users magically happy; they will just complain about whatever alternative is used.

One thing you could look in to (email me if you'd like more info on this) is the notion of using Bitcoin-based "anonymous passports". The idea here is that someone sacrifices some Bitcoins to miner fees in such a way that they effectively mint themselves a certificate over a public key, without paying a certificate authority. Seen one way, the block chain itself is the CA.

Once such a certificate/"anonymous passport" has been created the owner can sign challenges with it to prove ownership, and if the user is observed engaging in abusive activity it can be blacklisted - forcing another sacrifice of money if they want to keep going.
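The sign-a-challenge / blacklist-the-key flow (setting aside the Bitcoin part) can be sketched with only the standard library, using a toy hash-based one-time signature as a stand-in for the real scheme. All names here are illustrative, and the key size is shrunk for readability:

```python
import hashlib
import secrets

H = lambda b: hashlib.sha256(b).digest()
BITS = 32  # toy size; a real Lamport-style key would cover all 256 digest bits

def keygen():
    # Secret key: one random pair of values per challenge-digest bit.
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(BITS)]
    # Public key (the "certificate" the sacrifice transaction would commit to).
    pk = [(H(a), H(b)) for a, b in sk]
    return sk, pk

def sign(sk, challenge: bytes):
    digest = int.from_bytes(H(challenge)[:BITS // 8], "big")
    # Reveal one secret preimage per bit of the challenge digest.
    return [sk[i][(digest >> i) & 1] for i in range(BITS)]

def verify(pk, challenge: bytes, sig) -> bool:
    digest = int.from_bytes(H(challenge)[:BITS // 8], "big")
    return all(H(sig[i]) == pk[i][(digest >> i) & 1] for i in range(BITS))

blacklist = set()  # certificate IDs observed doing abusive things

def cert_id(pk) -> bytes:
    return H(b"".join(h for pair in pk for h in pair))

def accept(pk, challenge: bytes, sig) -> bool:
    # A blacklisted certificate fails even with a valid signature,
    # forcing the abuser to mint (pay for) a new one.
    return cert_id(pk) not in blacklist and verify(pk, challenge, sig)
```

Note the one-time caveat: each key here should sign only a single challenge, whereas the scheme described above would use an ordinary reusable signing key.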

The downside is that it of course relies on Bitcoin. However, there's a whole army of people working on the problem of obtaining bitcoins. There are local traders. There are Bitcoin ATMs being deployed throughout the world. If you want a private way to demonstrate some sacrifice of effort or wealth, I don't think there's a better alternative.

Currently there's no convenient GUI for making these certificates. If a major provider like CloudFlare were willing to get behind the concept, such a GUI could be easily built though. For instance I could add it to Lighthouse, which is a cross platform Bitcoin wallet app that specialises in smart contracts.

Why not just rate limit responses? That ends up costing bot makers about the same amount of money as captchas (which are often solved by workers earning slave wages). This arms race will never end and if the insistence is always to prove you're human, then humans will always be exploited for this proof. Imagine one day when we've automated the world and the only reason humans have to do any work is so that robots can prove they're human. This whole thing is ridiculous.

Specifically one way to rate limit would be a cookie value that changes on each request, the previous cookie value expires, and only the site knows what the next valid cookie value is. Bots will pay for the cost of waiting in terms of computing time, and in terms of memory if they get around this by parallelization. As these costs go down due to cheaper computers, then so too will the costs of serving the site.
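A minimal in-memory sketch of that rotating-cookie idea (names and the one-second floor are illustrative): each response hands out a fresh token, the previous token dies, and a request that arrives too soon, or with an unknown/replayed token, is refused.

```python
import secrets
import time

MIN_DELAY = 1.0  # seconds a client must wait between requests (illustrative)
_sessions = {}   # current valid token -> earliest time it may be used

def start_session() -> str:
    token = secrets.token_urlsafe(16)
    _sessions[token] = time.monotonic() + MIN_DELAY
    return token

def handle_request(token: str):
    """Return (ok, next_token). The presented token is consumed either way."""
    not_before = _sessions.pop(token, None)
    if not_before is None or time.monotonic() < not_before:
        return False, None            # unknown/replayed token, or too fast
    return True, start_session()      # rotate: only the new token works next
```

Bots pay in wall-clock time per session, and in memory if they parallelize; a legitimate user never notices a one-second floor between page loads.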

> Imagine one day when we've automated the world and the only reason humans have to do any work is so that robots can prove they're human. This whole thing is ridiculous.

... did you just solve the whole problem of on-going automation? Capitalism is saved! We will all work as CAPTCHA breakers. ;).

The article claims bots can solve the most distorted text with 99.8% accuracy. I'm not that accurate. Perhaps someone can write a captcha-breaking Chrome extension, so I don't have to bother.


The power of paying other people cents to fill in captchas.

An issue that I've run into is that 1) Google registers traffic from Tor proxies as suspicious (with some reason), it 2) puts Captchas in front of you (which are getting quite difficult to solve), and 3) if you're rate-limiting Tor proxies (around 6,000 - 7,000 worldwide as I was checking earlier), you're going to block a lot of legitimate Tor traffic.

Similarly for VPNs and other tools, whose use is fairly likely to increase as people start seeking ways to avoid ubiquitous surveillance.

There are other options, including a few tools that look at how to provide a fair and anonymous reputation system for Tor clients:



I think you can rate limit without tying it to IP address. If each page returns a session key only valid for the next page request, then you force bots to wait as long as you want and/or spend money on extra memory for parallel sessions. One problem with this is, e.g., if people come to your site from an indexed link and have no possible session yet. In that case you probably would want to add a delay after some amount of requests per IP, so you'd slow Tor users down but only on the first request to your site. If your page is JS or browser dependent in some way, then bots would probably need about 100 MB per thread. All of this is in the ballpark of paying people to solve captchas.

This was a problem before Tor anyhow. You can run a proxy for a few cents a day.

I just don't see how captchas are some awesome solution. In any anti-bot technology, the cost to circumvent it is pennies. It strikes me more as something like DRM which just makes content producers feel good, but really only punishes average people.

edit: sorry I hadn't read your links. Good points and hopefully someone like CloudFlare would make this easy for people to add to their sites.

You'd have to limit novel sessions to very low activity rates. That would require some sort of persistence token (not necessarily a cookie), and, if provided on an anonymised basis, one that's verifiable but not predictable or traceable to prior cookies. That's what many of the references I provided cover.

Sorting out a mechanism for allocating those tokens' seed values is difficult. FAUST requires an unblinded token request initially.

CAPTCHAs have been useful, though always problematic. The goal isn't perfection but imposing costs. The problem is that those costs keep falling.

Rate limiting alone would still leave open many categories of abuse.

I might even incorporate the request rate into a bot detection algo, maybe have it trigger temporary hellbans.
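That could be as simple as a sliding-window counter per identifier that trips a temporary ban (a sketch; the thresholds are made up):

```python
import time
from collections import defaultdict, deque

WINDOW, LIMIT, BAN_SECS = 60.0, 30, 600  # illustrative thresholds

_hits = defaultdict(deque)  # client id -> recent request timestamps
_banned_until = {}          # client id -> time the temporary hellban lifts

def allow_request(client_id: str, now: float = None) -> bool:
    now = time.monotonic() if now is None else now
    if _banned_until.get(client_id, 0) > now:
        return False                      # hellbanned: silently deny/degrade
    q = _hits[client_id]
    q.append(now)
    while q and q[0] < now - WINDOW:      # drop hits outside the window
        q.popleft()
    if len(q) > LIMIT:
        _banned_until[client_id] = now + BAN_SECS
        return False
    return True
```

The rate signal would presumably be one input among several in a real bot-detection score, not the whole thing.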

Request rate is definitely one thing you can limit, but it's tricky when attackers potentially control large numbers of IP addresses.

There's an annoying triangle here: wanting to preserve privacy (== unlinkability), machine-independence, and "working well for good traffic with limited resources, as well as blocking attackers with substantially more resources". Ideally it is "choose zero", I'd be happy if the state of the art were even at "choose one".

er, I meant choose two, and we're generally at zero or one.

Yep, definitely do, as you can probably guess from my username.

Full disclosure: I'm from the team behind a leading CAPTCHA alternative and our concerns are two-fold -- privacy and vulnerabilities.

a) New reCAPTCHA relying on a 'black box' to verify users is, of course, concerning privacy-wise.

b) The technology that has been implemented to cater to this black box has actually opened the door to more vulnerabilities.

Our Design Director explains the reasoning behind the concern regarding the 'black box' here: http://www.funcaptcha.co/2014/12/04/killing-the-captcha-with...

And I myself go into more detail about Egor Homakov's findings regarding the new vulnerabilities here: http://www.funcaptcha.co/2014/12/04/killing-the-captcha-with...

Apologies if this feels promotional - if you have any questions, I'd be happy to answer them. This is an area of web sec that we're, obviously, very dedicated to.

I looked into this research area as well. In fact, I considered writing a thesis about CAPTCHAs. The problem is really that there is little to no formal theory on the subject. CAPTCHAs remind me of how cryptography used to work: someone invented a cipher, someone else broke it, and so the cipher had to be improved to prevent that attack (well, it still works kinda like that, but the theoretical background is much better now). Much more an art than a science. What is a strong CAPTCHA? Again, almost no theory behind this. It's strong if programmers have a hard time figuring out how to break it with software. It is a frustrating situation, really, and computers aren't getting any dumber, so CAPTCHAs must get harder to the point where they're no longer solvable by humans.

Yes, but unlike crypto, a CAPTCHA doesn't need forward protection. If I deploy one, and it turns out to be weak, I can just upgrade and then not worry. I'm not worried about people time-traveling to the past.

I run a help site for an online game, and I used to use CAPTCHA's on our various forms. We had tons of bot problems. So, one day I switched the CAPTCHA to a simple trivia question that only people who play the game would know the answer to. (And helped the people who didn't/were newer by linking to the answer.)

Our bot problem disappeared over night, and we haven't had a problem since. Definitely not a solution for everyone, but it could be a great solution for some.
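That kind of domain-specific check is trivial to implement; a sketch (the question and accepted answers here are hypothetical, and a real site would rotate among many questions):

```python
# Hypothetical game-trivia challenge; only players (or readers of the
# linked answer) would know it, while generic bots would not.
QUESTIONS = {
    "What class starts with a Mining Laser equipped?": {"prospector"},
}

def normalize(text: str) -> str:
    # Lowercase and strip punctuation/whitespace so spelling variants pass.
    return "".join(ch for ch in text.lower() if ch.isalnum())

def check_answer(question: str, answer: str) -> bool:
    accepted = QUESTIONS.get(question, set())
    return normalize(answer) in {normalize(a) for a in accepted}
```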

I don't recommend doing something like this. Most often it will turn away a lot of people and won't be worth it if the question is too hard.

You should reduce reliance on IPs and instead give people unique IDs once they solve your captcha. Then any abusive traffic will be de-muxed based on ID, and legitimate users will be affected much less by the bad behavior of others.
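In other words, rate-limit per issued ID rather than per IP, so one abusive client behind a shared exit doesn't exhaust everyone else's budget. A token-bucket sketch (capacity and refill numbers are illustrative):

```python
import time

CAPACITY, REFILL_PER_SEC = 10.0, 0.5  # illustrative budget per issued ID

_buckets = {}  # captcha-issued ID -> (tokens remaining, last refill time)

def allow_id(user_id: str, now: float = None) -> bool:
    now = time.monotonic() if now is None else now
    tokens, last = _buckets.get(user_id, (CAPACITY, now))
    # Refill proportionally to elapsed time, capped at the bucket size.
    tokens = min(CAPACITY, tokens + (now - last) * REFILL_PER_SEC)
    if tokens < 1.0:
        return False                 # only this ID is throttled, not the IP
    _buckets[user_id] = (tokens - 1.0, now)
    return True
```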

Google's new CAPTCHA security login raises 'no concerns': http://pages.higg.im/2015/02/24/googles-new-captcha-security...

So if Google grabs all your profile data via the traditional reCAPTCHA but also makes you fill out a form, then it's all OK. But once it becomes obvious that they are collecting the data, and are using the (already collected) data so you don't have to type in the text on the picture, then it's a privacy concern?

Or do you honestly believe they bought and continued to operate the old reCAPTCHA out of the goodness of their hearts, never collecting all that data that everybody is upset about now?

"Or do you honestly believe"? Really? Your choice of wording spins the discussion in a way that people should recognise as a derailing tactic by now. It attempts to embarrass people who do indeed view things this way, and it's not even about beliefs to begin with. Please stop.

I agree. That was badly worded. Sorry. Unfortunately, by now I can't edit the comment any more.

However, there is no technical reason why the old system would not have had exactly the same means for tracking the user as the current system has.

If you consider that Google is mainly an advertising company and that Google's investment in reCAPTCHA must provide them with some value, that leads me to conclude that the old system was doing exactly the same tracking as the new system does now.

The advantage is that now I don't have to type in letters while the tracking stays the same.

While the old reCAPTCHA could run JavaScript to collect all sorts of data about typing and mouse movement, we expected it not to.

This new one tries to build a behavioural profile of you and uses that to determine your humanness. Maybe the old one did that too. Maybe every Google service does that. I don't know. The outrage is that now we know one is doing it.

And on a personal note, yes. I do believe that Google ran recaptcha just for the free annotations of data.

How does one run a reputation system without knowing whose reputation one is calculating?

The point here is that the new CAPTCHAs are hosted on Google's domains when used by other services.

Much as with Facebook trackers, this provides cross-site user tracking by Google.

"Perona told us: “The use of Google.com’s domain for the CAPTCHA is completely intentional, as that means Google can drop long-lived cookies in any device that comes into contact with the CAPTCHA, bypassing third-party cookie restrictions [like ad blockers] as long as the device has previously used any service hosted on Google.com.”"

(From TFA).

Google did use reCAPTCHA to recognize Street View house numbers and street names, so I think there are (somewhat) more innocuous reasons than just wholesale data collection.

The point is that the old reCAPTCHA (as I understand it) did not set & grab global Google cookies, and thus did not track you across your web browsing. Whereas the new one does, and the privacy policy is vague enough to allow Google pretty much any use of the data that they generate from this.

This sounds like Google could be violating EU laws about data protection. We've seen that the EU is happy to enforce stupid laws (cookie notifications; the right to be forgotten), so they need to be a bit careful. They at least need a robust rebuttal to researched concerns.

I've been presented with this new captcha about a dozen times, and "failed" it every single time, whereupon it falls back to the traditional squiggly text.

I run Ghostery, so perhaps passing it relies on possessing some tracking cookies? If so, I'm happy to continue failing it.

There is no AI or any of the other stuff this article talks about. It's just Google cookies, and nothing interesting about it: http://homakov.blogspot.com/2014/12/the-no-captcha-problem.h...

I'm blocking reCAPTCHAs completely, and on at least one occasion the site just let me in. Must be poor integration on those sites.

A little bit off-topic, but: if you use only <tab> to go to the checkbox and press space to select it, it brings up the good old "type the text in the image" verification (distorted text).

So it thought I was a robot, and the fallback is the old captcha. Well, I'm not sure this new captcha solves the problem it was intended to solve. Am I missing something?

This is quoted from their blog post about recaptcha -->

"However, our research recently showed that today’s Artificial Intelligence technology can solve even the most difficult variant of distorted text at 99.8% accuracy."

Here you can test it by yourself --> https://www.google.com/recaptcha/api2/demo

Oooops... It presents me with the Recaptcha image challenge but the top half of the challenge image isn't visible on the pop-up.

An 1800x1280 phone screen isn't sufficient, it appears.

I assumed this was how it worked. Tracking mouse/keyboard seemed a little phony, and google is hardly a stranger to tracking personal information. It really is a bummer that there aren't many great alternatives.

I can't imagine there are many sites using this that aren't already using Google Analytics, which is already deeply integrated with their ad platform and knows exactly who you are using cookies.

Friendly reminder that every request to any Google associated server comes at the price of having your privacy invaded. Yes, this means Google Search but also GMail, Android, reCAPTCHA, Maps (any website displaying a Google Map), Google fonts (any website using Google fonts), Google CDN (any website using Google CDN), G+ (any website using an integration) et cetera.

In fact, completing this list with all of Google's tentacles would probably break the character limit on HN. With very few exceptions, Google's scooping eye is there to learn more about you.

This isn't Reddit, and substance-free paranoia isn't welcome here. In particular, some of the properties you listed are hosted on cookie-less domains and have privacy policies that severely limit data collection.

I can't understand why anyone is surprised. Anything you load from Google's servers is used to gain more insight into your online habits.

It's their core business. To learn as much as possible about people online to be able to show them the most relevant ads.

Forgive my ignorance, is there a way I can use cookies for authentication tracking only on websites while blocking access to screen-size, CSS etc?

I'm confused as to why people would care. It's easy to block, and if so you have to do a harder test. No one said wanting more privacy wasn't hard work.

For the 99% of people who don't care, it's a great improvement.

And if someone wants to write a better system, who's stopping them?

Easy, as in

1) take a few months to study about web technologies to develop an intuition about why and how these technologies are tracking you

2) click a button to block their cookies

You and I already did (1), but most people haven't.

Great. How do I 'easily' block this on my iphone, please?

Well, I guess jailbreaking does not qualify as easy, does it?

So what? Kind of useless article, nothing new.

It's google. By this point, if anyone is really surprised they hate privacy, they have clearly been living without any internet or newspaper access for the last 10 years.
