The trick was that Google's backend registration logic validated neither the referrer of the signup form nor that the submitting IP address matched the one that downloaded the CAPTCHA. So I cloned the form on my own server and loaded it in an invisible iframe with all fields filled except the CAPTCHA. Then I served the CAPTCHA to the user, who solved it and clicked "submit" -- at which point the entire hidden form in the iframe was submitted, registering a Google account from the visitor's IP.
I was surprised that worked, because it was a huge vulnerability. Nowadays I would go for the bug bounty, but back then I tried to sell it for $5k -- unfortunately nobody believed I could do it and I couldn't prove it without revealing the method, so I ended up unable to take advantage of the code.
I did have about 60k accounts from random page views on blogspot blogs, but I never did anything with them. Perhaps that was the only time my inability to finish projects ironically saved me from some trouble. :)
good times :-P
At least that's what happened with me and my friend when we mined data on about a tenth of Facebook's users and crowdsourced a bunch of info from people, then made it public. (We had a pseudo-anonymous, no-login messaging feature, searchable by name, where people would share information that made you question what privacy even means when n-1 people are divulging everything about you.) If I weren't a US citizen, I'd probably do it better now that I know more about what I'm up against. Maybe the landscape will be more open in the US in the future.
People moan on and on about Facebook and companies like it as if they are all-powerful (they can seem that way if you play by their rules and submit to their jurisdiction for pragmatic reasons), but by challenging those assumptions, leveraging user behavior, and arbitraging legalities, one can probably crack some pretty big holes...
> The code you reversed is used to protect many sites’ registration process including Google and many others. We are concerned that having your code and analysis publicly available will make it easier to build registration automation tools which will result in a surge of spam in all the services protected by this code and will affect negatively many Internet users.
> This is why we kindly ask you to temporarily remove it from GitHub so your work won’t be used for a malicious purpose which we believe was never your intended goal.
As I wasn't aware before publishing my code that the botguard was also used for this purpose (separately from reCAPTCHA, in Gmail and other services), I removed the GitHub repository for now. I'm sorry for the honest security enthusiasts who didn't read the article, but I don't want to cause harm.
Google also invited me to visit them at their offices to discuss my work.
Google is essentially trying to run code on a user's computer without letting anyone know what it's running there, while doing nothing to stop the "bad guys" from doing their own analysis without publishing it. I'm not saying they're trying to do anything evil, but it strikes all the wrong notes for me when they try to suppress information about a system they should have known is wide open to analysis.
Edit: got one, just using GitHub's research.
Edit 2: had to patch the decompiler to run on Python 2.7.8, so that it understands that a long is an int.
I think reCAPTCHA is overused by services that should be publicly available for automation, especially in Brazil. This is a bad use case for reCAPTCHA or any other CAPTCHA system.
Anyway... congrats on your work!
"Google servers will receive and process, at least, the following information:
- Execution time, timezone
- Number of click/keyboard/touch actions in the <iframe> of the captcha
- The behavior of many browser-specific functions and CSS rules
- The rendering of canvas elements
- Likely cookies server-side (it's executed on the www.google.com domain)
- And likely other stuff..."
At least the old captcha was a simpler image.
And my UA string is randomized (unless I override it for a particular site), so that doesn't do anything either.
I wish someone could come up with a plugin that allowed a "safe" subset of JS to run without consent, though.
until there's a significant number of people doing the same thing, you're simply "that guy in <insert geoip lookup city here> with the randomized UA", and you're infinitely more fingerprintable than just about anybody else on the internet. go to EFF's Panopticlick to see how unique your fingerprint is. using an iPad with up-to-date software gets me the same fingerprint as about 18000 other people in my geoip region; i haven't been able to do better than that yet.
It's a random selection from a list of the most common browser UAs, weighted by the frequency of each UA.
So they can try to fingerprint all they want - all it'll do is clog up their database with useless entries.
or my personal favourite: sending the do-not-track header, something that only a small number of people send that makes you much easier to fingerprint.
And DNT is currently at around ~8%, so, although it does leak some information, it doesn't leak an absurd amount (~3.6 bits). (That's using data from here, which is FF-only. If you have a better source of data for this, please let me know.)
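The bit count follows from the surprisal formula: observing a signal that a fraction p of users send reveals -log2(p) bits. A quick check using the ~8% figure above:

```javascript
// Information revealed by observing a binary signal that a fraction p
// of the population sends: surprisal = -log2(p).
function surprisalBits(p) {
  return -Math.log2(p);
}

surprisalBits(0.08); // ~3.64 bits if you do send DNT
surprisalBits(0.92); // ~0.12 bits if you don't
```

So the 8% who send the header stand out far more than the 92% who don't, which is the fingerprinting complaint in a nutshell.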
You can also abuse parsing quirks to figure out which rendering engine is being used, or just try to use request-generating features that shouldn't be present in whatever browser you're claiming to be (<svg>, <video>, styling on engine-specific pseudo-elements, etc.)
Here's an example using just HTML+CSS that will request a different image depending on whether you use a webkit or gecko derivative. If you use neither, no image will be requested. Someone who says they're Chrome but requests Firefox's image is immediately outed as a liar.
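A minimal version of that probe might look like the following (the image paths are placeholders, and this relies on era-specific engine quirks such as `@-moz-document` being honored in content pages):

```css
/* Only WebKit/Blink evaluates this media query as true,
   so only those engines fetch webkit.png. */
@media screen and (-webkit-min-device-pixel-ratio: 0) {
  body { background-image: url(/probe/webkit.png); }
}

/* Only Gecko understands @-moz-document; other engines drop the
   whole block while parsing, so only Gecko fetches gecko.png. */
@-moz-document url-prefix() {
  body { background-image: url(/probe/gecko.png); }
}
```

The server just watches which of the two image URLs gets requested and compares that against the claimed UA.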
Same thing with something like `<img src="jar:http://example.com/ewwww_jar_uri!/baz">`: Gecko will make a request to http://example.com/ewwww_jar_uri while other browsers won't, since they don't support the jar: URI scheme.
I believe Mario Heiderich also posted some stuff using webkit's styleable scrollbars that could be used for fingerprinting screen sizes and how large certain elements are when rendered.
The list goes on, but my point is that fingerprinting at the rendering / layout engine level is trivial, so you're better off being legitimately ordinary if you're worried about fingerprinting.
Are your headers in the correct order for the given UA?
Correct capitalisation for the given UA?
Correct accept for the given UA?
Correct white space around or between values for the given UA?
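A server-side consistency check along those lines is easy to sketch. The Firefox header-order profile below is illustrative, not measured from a real browser; a real detector would capture profiles per browser version:

```javascript
// Hypothetical per-browser profiles: the order in which each browser
// emits common request headers.
const headerOrderProfiles = {
  firefox: ["host", "user-agent", "accept", "accept-language", "accept-encoding"],
};

// True if the headers the client sent (in wire order) appear in the
// same relative order as the profile for its claimed browser.
function headerOrderMatches(sentHeaders, profile) {
  const sent = sentHeaders
    .map((h) => h.toLowerCase())
    .filter((h) => profile.includes(h));
  const expected = profile.filter((h) => sent.includes(h));
  return sent.length === expected.length &&
         sent.every((name, i) => name === expected[i]);
}
```

A client claiming to be Firefox but sending `Accept` before `Host` fails the check, no matter how plausible its UA string is.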
It is far better to appear to be the same as everyone else if you want to be anonymous (i.e. to browse on an iPad) than it is to do anything to try and not be tracked.
Anonymity today is to be invisible within the crowd, not to stand out as you are the only sheep that is shorn.
With that fingerprint they can track your habits across multiple domains. Bots can look like browsers, but they can't necessarily browse the internet the same way a human can.
No cookies? No way of tracking.
Or if you want to see it like this: they are taking something standard and turning it into something that is under their control.
You have chosen a very interesting value of zero.
They have all your interaction with every Google property, tied to your account when you're logged in and semi-persistent pseudonymous identities when you are not. They have the clickstream data between every google property (most notably, Search) and the rest of your web experience, which can (fairly easily) reveal many websites you visit. They have ga.js or AdSense tracking code running on double-digit percentages of all pages on the Internet.
If asked to, Google could provide you with as accurate a record of my flights between Japan and the US as either nation's customs agency could, simply by looking at a time series of IP addresses. Their data got radically more accurate a few years ago when I started using Google Maps with the location permission turned on.
For added giggles, Google is a SQL join away from associating my extraordinarily-well-established-but-weakly-verified Internet identity with unique identifiers like e.g. my social security number. That would probably make someone in the Borg hesitate for a few minutes, but clearly they're OK with saying "At scale, we know the huge class of people which happens to include Patrick -- who we know intimately but prefer to avoid acknowledging that fact in social settings -- is identifiable by a vector of features, and a machine can very quickly cluster a random Internet user with Patrick versus with Spammy McSpamsalot. We can thereby organize data about this person to serve their needs better, for example by giving them access to resources only for trusted people, like captcha-free whatevers. Bonus: this is one more reason why it is fun and convenient to invite Google into even more areas of their life! It's a win-win!"
One could probably write a greasemonkey script that replaces the new captcha with the noscript version.
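Sketching that idea as a userscript: the selectors and fallback endpoint below are assumptions based on the 2014-era widget (a `div.g-recaptcha` carrying a `data-sitekey`, and a `/recaptcha/api/fallback` noscript URL), so treat this as a starting point rather than a working script:

```javascript
// Build the <noscript> fallback URL for a given site key
// (assumed endpoint, per the old noscript embed).
function fallbackUrl(sitekey) {
  return "https://www.google.com/recaptcha/api/fallback?k=" +
         encodeURIComponent(sitekey);
}

// Replace every scripted widget on the page with the fallback iframe.
function swapWidgets(doc) {
  for (const box of doc.querySelectorAll("div.g-recaptcha")) {
    const sitekey = box.getAttribute("data-sitekey");
    if (!sitekey) continue;
    const frame = doc.createElement("iframe");
    frame.src = fallbackUrl(sitekey);
    frame.width = "302";  // approximate size of the fallback widget
    frame.height = "352";
    box.parentNode.replaceChild(frame, box);
  }
}

// Greasemonkey runs this in page context; guard so it can load elsewhere.
if (typeof document !== "undefined") swapWidgets(document);
```

The catch is that the page's own JS still expects the `g-recaptcha-response` field the scripted widget fills in, so a real script would also have to wire the fallback's token back into the form.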
The Google post also gives the examples of Snapchat (https://support.snapchat.com/login2?next=/), WordPress (https://wordpress.org/support/register.php) and Humble Bundle (https://www.humblebundle.com/).
At the end of the day, it does not stop a determined individual from reverse engineering the code (and then publishing the technique).
BUT it does make it more difficult to understand, work with, etc. It simply reduces the number of individuals who have the skills necessary to follow everything through.
It will deter the less skilled people from attempting to "hack" the system....
... especially if slight changes are made. Google could change the smallest thing, run the lot back through their mangler, and the attacker would have to go through the whole de-obfuscation process all over again.
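To see why re-mangling is so cheap for the defender, consider a toy renaming pass: every analysis keyed to the old names breaks, while the code's behavior is untouched. (The `v_` prefix convention is invented for the example; real obfuscators rewrite the whole AST, not just names.)

```javascript
// Toy obfuscation pass: rename every v_-prefixed identifier to a
// seed-dependent fresh name. Change the seed and every name changes.
function mangle(source, seed) {
  let counter = 0;
  const renames = new Map();
  return source.replace(/\bv_[a-z]+\b/g, (name) => {
    if (!renames.has(name)) renames.set(name, "x" + seed + "_" + counter++);
    return renames.get(name);
  });
}

mangle("var v_total = v_count + v_count;", 7);
// -> "var x7_0 = x7_1 + x7_1;"
```

Running the same source through with a new seed yields an equivalent program whose identifiers no longer match anyone's published notes.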