Hacker News
Hacking Google ReCAPTCHA v3 Using Reinforcement Learning (arxiv.org)
55 points by ArtWomb 36 days ago | 43 comments

"Our proposed method achieves a success rate of 97.4% on a 100x100"

I think that is higher than what I get when I do it myself.

I can never get the street lights one, and then it gives me the bus one, which I can never get right either, and then finally it gives me the bus one. Which I am 100% at, maybe.

The bus, the lights, the hills have nothing to do with your success rate. It's all about using Chrome and being logged into google.

This is reCAPTCHA v3, which does not include any of the image recognition tasks. It's just a matter of clicking the checkbox.

Typically if you "fail" the checkbox (which somehow happens to me a lot) you get the bus and street lights and such that the other posters are referring to. What are those if not ReCAPTCHA v3?

reCAPTCHA v2 apparently. reCAPTCHA v3 is advertised as never interrupting users at all: https://developers.google.com/recaptcha/docs/v3

Which is kind of horrible, since it means that you might not be given an obvious opportunity to change your score if you fail.

This can be pretty frustrating, and we’re occasionally at a loss as to what to do, especially when there’s no apparent reason for the low score. E.g., https://github.com/google/recaptcha/issues/248

>reCAPTCHA v3 is advertised as never interrupting users at all

That's got to be a joke, since I have to pass the challenge like 99% of the time, not exaggerating. Of course, I have my browser configured in a privacy conscious way, so...

If you can see a challenge, it must be reCAPTCHA v2 instead.

So does v3 fall back to v2 if you fail it?

You fail at the bus and then get 100% on the bus?

ReCAPTCHA is one of the worst things that has happened to the Internet. Please consider an alternative if you are a webmaster that has a choice. It's overkill for most purposes.

Also: by using ReCaptcha, you're essentially giving Google free money, in the form of training samples for self-driving cars. Surely there are CAPTCHAs for things that benefit the public at large?

I never understood how it works. Do we give it free feedback and then it tests us?

It compares your answer against answers from other users, to make sure that they agree. This gives them a training set to use.
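Google has not published how reCAPTCHA aggregates answers, so the following is only a generic majority-vote illustration of the idea in this comment: a tile is graded only once enough users agree on it, and until then a user's answer is just another vote. The function names and the thresholds (`min_votes=3`, `min_agreement=0.7`) are hypothetical.

```python
from collections import Counter

def consensus_label(answers, min_votes=3, min_agreement=0.7):
    """Return the majority label for one image tile, or None if there is
    not yet enough agreement among other users. Thresholds are hypothetical."""
    if len(answers) < min_votes:
        return None
    label, count = Counter(answers).most_common(1)[0]
    if count / len(answers) < min_agreement:
        return None
    return label

def grade_user(user_answer, other_answers):
    """Grade a user against the crowd's consensus for a tile.
    Returns None when there is no consensus yet: the answer can't be
    graded and only serves as another training vote."""
    label = consensus_label(other_answers)
    if label is None:
        return None
    return user_answer == label
```

Under this scheme a small number of trolls mostly gets outvoted, which is one reading of the "wisdom of the crowd" question that follows.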

What if we troll it? Or is the wisdom of the crowd too strong?

What’s the alternative?

Fairly simple: not having CAPTCHAs. By including them you externalise your business costs onto your users.

So how do you stop bots from submitting fake data to sign up forms or trying to brute force password fields etc. on websites?

For passwords: offer 2FA (ideally WebAuthn, but it sadly hasn’t landed in enough browsers yet), use a reasonable password strength policy, and possibly consider OAuth login. Or, and I know this is crazy talk, allow users to opt out of the bloody captcha if they say they use a password manager, and then use some basic heuristic requirements on the password.

But stop making me click the friggin’ stop lights every time I log in to my own account.

> brute force password fields

rate limit

> stop bots from submitting fake data to sign up forms

I don't think there's a universal solution here. It depends on the application itself and why you consider fake signups an issue in the first place.

Rate limit by what? Username? IP?

Per user if you want to defend against targeted attacks, per IP if you want to prevent untargeted attacks. So, both.
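A minimal sketch of limiting "both" at once, assuming a single process with in-memory sliding windows. The class, the limits (5 attempts per user and 50 per IP per 5 minutes), and the helper names are hypothetical; a multi-server deployment would keep the counters in a shared store.

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most `limit` events per `window` seconds for each key."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.events = defaultdict(deque)  # key -> timestamps of recent events

    def allow(self, key):
        now = self.clock()
        q = self.events[key]
        # Drop timestamps that have fallen out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True

# Hypothetical limits: per user against targeted guessing of one account,
# per IP against one source spraying guesses across many accounts.
per_user = SlidingWindowLimiter(limit=5, window=300)
per_ip = SlidingWindowLimiter(limit=50, window=300)

def may_attempt_login(username, ip):
    # Evaluate both limiters (no short-circuit) so every attempt is
    # recorded against each key; the attempt must pass both limits.
    user_ok = per_user.allow(username)
    ip_ok = per_ip.allow(ip)
    return user_ok and ip_ok
```

Passing in a `clock` keeps the limiter testable without sleeping.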

Add an email reset for the limit so users can't be locked out of their accounts by a DoS.

Circuit-break after 5-10 failed attempts and require 30s or 1m before each attempt after the first 3 failed ones.
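The schedule in this comment can be written as a small policy function. This sketch assumes concrete values the comment leaves open: the circuit breaks at 10 failures (the top of the "5-10" range) and the delay is 30 seconds. How a locked account gets reset (for example by an email link) is not shown, and all names are hypothetical.

```python
FREE_ATTEMPTS = 3   # failures allowed with no delay
DELAY_SECONDS = 30  # wait required between attempts after that (assumed)
BREAK_AFTER = 10    # failures after which the circuit opens (assumed)

def required_wait(failed_attempts):
    """Seconds the client must wait before the next attempt,
    or None if the circuit is open (locked until some reset)."""
    if failed_attempts >= BREAK_AFTER:
        return None
    if failed_attempts >= FREE_ATTEMPTS:
        return DELAY_SECONDS
    return 0

def may_try_now(failed_attempts, seconds_since_last_attempt):
    """True if another attempt is allowed right now."""
    wait = required_wait(failed_attempts)
    return wait is not None and seconds_since_last_attempt >= wait
```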

To stop fake signups, require confirming the email address and only allow some number of signups per IP per day. It's not perfect but neither are CAPTCHAs and either way you can probably stop most spam, if it's even a problem for you.
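The "some number of signups per IP per day" part can be sketched as a fixed daily window keyed by `(ip, date)`. This is an in-memory illustration with a hypothetical class name; email confirmation is not shown, and the counter here is never pruned, so a real deployment would expire old keys (or use a store with a TTL).

```python
import datetime
from collections import Counter

class DailySignupQuota:
    """Allow at most `max_per_day` signups per IP per calendar day (UTC)."""

    def __init__(self, max_per_day):
        self.max_per_day = max_per_day
        self.counts = Counter()  # (ip, date) -> signups so far; never pruned here

    def try_register(self, ip, today=None):
        if today is None:
            today = datetime.datetime.now(datetime.timezone.utc).date()
        key = (ip, today)
        if self.counts[key] >= self.max_per_day:
            return False
        self.counts[key] += 1
        return True
```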

Something I have found to work well is simply adding a hidden input field to forms. Since just about every spam bot automatically fills every field and submits, you can just disregard any submission which has that field populated and there is zero interruption to normal users.
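On the server side, the honeypot check described here is a one-liner. A minimal sketch, assuming the submitted form arrives as a dict and that the hidden field is called `website` (a hypothetical name; the field itself would be rendered in the page but hidden from humans):

```python
HONEYPOT_FIELD = "website"  # hypothetical field name, hidden from human users

def is_probably_bot(form):
    """`form` maps submitted field names to values. Humans never see the
    honeypot field, so any non-blank value suggests an automated submission."""
    return bool((form.get(HONEYPOT_FIELD) or "").strip())
```

Submissions flagged this way can simply be dropped, which is what gives the zero-interruption property for normal users.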

I've used this tactic for years and sadly it's not remotely helpful for any of my websites anymore. The bots only get better over time. Many are also running PhantomJS, headless Chrome, or similar.

It was only a matter of time!

I imagine that using a standard text field and then hiding it with CSS works much better than setting a type="hidden" field. I also usually use something like name="phone" and then name the actual phone field something else, if needed.

Except then you have autofill and extensions that fill in forms for you.

Generating a simple math question and answer can work well enough to keep out non-targeted traffic (bots that aren't tailored to your site).
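A minimal sketch of such a math challenge, assuming a server-side store mapping a random challenge id to the expected answer. The in-memory dict stands in for a real session store, and the function names are hypothetical. Each challenge can be answered only once.

```python
import random
import secrets

_pending = {}  # challenge id -> expected answer (a real app would use its session store)

def new_math_challenge(rng=random):
    """Create a challenge; return (challenge_id, question text)."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    challenge_id = secrets.token_urlsafe(16)
    _pending[challenge_id] = a + b
    return challenge_id, f"What is {a} + {b}?"

def check_math_challenge(challenge_id, answer):
    """Check the submitted answer string; each challenge is single-use."""
    expected = _pending.pop(challenge_id, None)
    if expected is None:
        return False  # unknown or already used
    try:
        return int(answer.strip()) == expected
    except ValueError:
        return False  # not a number
```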

Amazon and Google itself (on login) use old squiggly-text captchas. Enter a wrong password 2-3 times to be greeted by one.

reCAPTCHA gives Google exclusive automated access to your website. Do you want Google to grow as a company because of this?

Any recommendations?

For most situations at most scales, there are a lot of simple open-source captchas that should work just fine. If you're working on something that has high potential for dedicated abuse, there are some paid ones that will work well too. I can edit this post with links in a moment. But a captcha is just one line of defense, and you can take a lot of other things into account, like the threat score of visitor IPs and verification methods like email, phone, etc.

If you simply want to limit the amount of unwanted traffic, implement a conventional image captcha with some minor twist. State-of-the-art bots do not (yet) have human-like AI, so you will be safe(r) until someone adapts all existing bots to solve your modification.

If you want to hinder a determined (but inept) adversary, impose a reverse time limit: make your captcha a bit complex and reject answers that arrive too fast. Legit users will spend a bit of time solving the captcha; machine-learning-driven bots will blaze through it. In addition to measuring how fast captchas are filled in, you can measure the amount of time users spend on other actions on your site, in the process making your bot detector increasingly similar to Google's reCAPTCHA.
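The reverse time limit can be sketched by recording when each form was rendered and rejecting submissions that come back too quickly. The 5-second minimum, the class name, and the in-memory token store are all hypothetical; the injectable `clock` is only there to make the sketch testable.

```python
import secrets
import time

MIN_SECONDS = 5.0  # hypothetical: answers faster than this look automated

class FormTimer:
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.issued = {}  # form token -> time the form was rendered

    def issue(self):
        """Call when rendering the form; embed the token in the page."""
        token = secrets.token_urlsafe(16)
        self.issued[token] = self.clock()
        return token

    def accept(self, token, min_seconds=MIN_SECONDS):
        """Accept a submission only if it took at least `min_seconds`."""
        started = self.issued.pop(token, None)
        if started is None:
            return False  # unknown or reused token
        return self.clock() - started >= min_seconds
```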

In general, look for behaviors that distinguish legitimate users from malicious ones. Hint: having a Google account might or might not indicate a legitimate user, but it is probably more efficient to ask users for it directly than in a roundabout way by using reCAPTCHA.

I don't have much data to back this up, but I've noticed that I get the reCAPTCHA challenge almost every single time I use Firefox, whereas with Chrome I only get it once after not using Chrome for a while.

Also, on Firefox mobile, not only do I get the challenge, but I get multiple challenges.

This is confusing because reCAPTCHA v3 is non-interactive: https://developers.google.com/recaptcha/docs/v3

I always assumed it used your IP, tracking history etc. to decide. Am I wrong about that?

I'm sure it takes that into account, plus whether you are logged into any Google account, with its associated meta information about you at the time.

It also learns typical usage on that page; it trains on usage patterns, mouse movements, etc.

We find it very effective in eliminating bot spam.

It's also very effective at excluding people like me. Even when I'm logged in to a Google account with a very benign history (mostly used to watch YouTube) and years (>4) of regular activity, Google won't[1] even attempt reCAPTCHA (any version) because they think my browser isn't one of the handful of specific browser versions they support[2]. The actual browser version is fine; it's configured similarly to the Tor Browser to minimize data leaks and browser fingerprinting. So I cannot use any site with reCAPTCHA, not because of any technical limitation, unless I "upgrade"[1] to a configuration that leaks a lot more data.

[1] https://imgur.com/9wT9yZ2 [ignore font/layout issues in that image - my usercss and fontconfig/freetype font rendering settings are very unusual, complex, and often very disruptive]

[2] https://support.google.com/recaptcha/?hl=en#6223828 "We support the two most recent major versions of the following: [Desktop: Chrome, Firefox, Safari, Edge]"

> We consider that the agent successfully defeated the reCAPTCHA if it obtained a score of 0.9
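For context on what that 0.9 means on the website's side: as I understand Google's v3 documentation, the site's backend POSTs the client's token to the `siteverify` endpoint, gets back JSON with fields such as `success`, `score` (0.0 to 1.0), and `action`, and then applies its own threshold (the docs suggest 0.5 as a starting point). A minimal sketch of that decision follows; the helper names are hypothetical, and `fetch_verdict` makes a real network call, so only `looks_human` is exercised offline.

```python
import json
import urllib.parse
import urllib.request

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"

def fetch_verdict(secret, token, remote_ip=None):
    """POST the client's token to siteverify and return the parsed JSON."""
    params = {"secret": secret, "response": token}
    if remote_ip:
        params["remoteip"] = remote_ip
    data = urllib.parse.urlencode(params).encode()
    with urllib.request.urlopen(VERIFY_URL, data=data, timeout=10) as resp:
        return json.load(resp)

def looks_human(verdict, expected_action, threshold=0.5):
    """Decide from a siteverify response; the threshold is the site's choice."""
    return (
        verdict.get("success") is True
        and verdict.get("action") == expected_action
        and verdict.get("score", 0.0) >= threshold
    )
```

A site that set `threshold=0.9` would be using the same bar the paper treats as a successful bypass.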

I wonder how to set up browser history, cookies, and IP address in Chrome. If you have any ideas, please share.

I'm somewhat glad to hear from these comments that I'm not the only one who has been subjected to extremely excessive reCAPTCHA tests, especially when I'm filling them out just to log in to websites where I'm a customer. I get it for ordering, but if you put this on your customers' login, it's like you're begging for cancellations. Google is the one determining your paying customers' user experience.

And if I pass your captcha, can you not cookie me with a signed token indicating that I already proved I was human for 30 days? It's like these lazy people can't handle bot login spam, so they just throw recaptcha on their login form and call it a day.
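What this comment asks for can be sketched with an HMAC-signed token stored in a cookie after a passed captcha. The token format (`user_id|expiry|signature`), the function names, and the secret handling are hypothetical; the point is that the server can later check the token without storing anything, and it stops working after 30 days or if any part is altered.

```python
import hashlib
import hmac
import time

THIRTY_DAYS = 30 * 24 * 3600

def issue_human_token(secret: bytes, user_id: str, now=None) -> str:
    """After a passed captcha, return 'user_id|expiry|signature' for a cookie."""
    now = time.time() if now is None else now
    expiry = int(now) + THIRTY_DAYS
    payload = f"{user_id}|{expiry}"
    sig = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{sig}"

def verify_human_token(secret: bytes, token: str, user_id: str, now=None) -> bool:
    """True if the token is authentic, belongs to user_id, and has not expired."""
    now = time.time() if now is None else now
    try:
        tok_user, expiry, sig = token.rsplit("|", 2)
        expiry_int = int(expiry)
    except ValueError:
        return False  # malformed token
    expected = hmac.new(secret, f"{tok_user}|{expiry}".encode(), hashlib.sha256).hexdigest()
    return (
        hmac.compare_digest(sig, expected)  # constant-time signature check
        and tok_user == user_id
        and now < expiry_int
    )
```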

If your login form requires paying customers to fill in recaptcha each time, you're doing it wrong. Please stop. Or go out of business faster.

The fastest way for me to get flagged is to request the audio test instead of the visual test. More than 7 out of 10 times it will halt and say my computer is sending automated queries and that I should try again later.

I've even gotten caught in a reCAPTCHA loop where I successfully complete the captcha only to have to redo it as soon as the page reloads.

It's probably related to credit card fraud.

There are large sets of stolen credit card numbers. Most of them are disabled. Crime groups automate the purchase process to determine whether a number is live.

So online stores really want to eliminate the automation.
