
Why can’t a bot tick the 'I'm not a robot' box? - grzm
https://www.quora.com/Why-can-t-a-bot-tick-the-Im-not-a-robot-box/answer/Oliver-Emberton?share=1
======
renlo
I’ve had a number of reCAPTCHA incidents where I could not pass the test for
tens of images; it was a very frustrating experience. Please do not use
reCAPTCHA.

The items one is supposed to select often overlap the grid squares, so it
becomes a kind of Keynesian Beauty Contest[1] at that point; I assume they
validate based on how closely you align with previous answers, so it becomes a
problem of “What nearby squares would a person reasonably select when there’s
overlap?” Or, you’re tasked with selecting <some_item>, you see <some_item>
in the distant background of the image you’re supposed to classify, and you
need to determine “How visible would <some_item> need to be for a reasonable
person to classify <some_item> as being in this image?”

On top of this, when you’re clicking through image after image after image,
the additional frustrating thing is that you’re helping train their
algorithms; you’re doing work because their service isn’t smart enough to know
you’re human, or, you’re seen as a marginal customer they can piss off by
forcing you to work for them for free.

It’s a frustrating experience when it fails, and my current strategy is to
leave the website when the ‘I’m not a robot’ checkbox fails.

[1][https://en.m.wikipedia.org/wiki/Keynesian_beauty_contest](https://en.m.wikipedia.org/wiki/Keynesian_beauty_contest)

~~~
retro64
You’re overthinking it. I have had a similar experience. The worst were the
“stop light” questions. Does it mean the pole, the light, the tiny corner
overlapping another square? I used to try to include everything as it was
technically true. Very frustrating – until I finally started to not care.
Click on the most obvious pictures. Click click click. Done. Get it wrong?
Click click click on the next one. Way faster with a much better success rate
doing so. It usually only takes a couple of tries now.

~~~
JoshTko
Just a suggestion not to say "You're overthinking it"; it can be considered
pretty dismissive.

~~~
retro64
Maybe a cultural thing? For me, "you're over thinking it" is a
familiar/friendly way of expressing an idea, similar to how you would approach
a friend. It was not intended to be dismissive.

~~~
vectorEQ
you are right. it is a cultural thing. that being said, the advice is sound.
the internet is a mishmash of cultures :) if i would bring my culture to the
internet, everyone would think i'm troll or horrible person. just because ppl
from my culture are a bit direct and cynical :D if you'd like people to
respect and take into account your culture it's good to do vice versa.

now that's over thinking :D

------
eridius
Lately I've been getting reCAPTCHA prompts all the time even though I'm not
browsing in incognito mode and haven't cleared cookies. All I'm doing is
running a very basic ad blocker, using Safari (which blocks third-party
tracking), and very rarely loading a Google site. The most interaction I have
with Google is when I end up having to use my corporate Google account as SSO
for some other site.

Given that I'm not doing anything unusual, it really feels to me like
reCAPTCHA, for all its complexity, boils down to "what's your history using
Google software? Oh you rarely use it? I'm gonna give you a captcha". It
didn't used to be this aggressive, but it's really ramped up in the past few
weeks.

~~~
blitmap
I have noticed this too. I've switched to DuckDuckGo for everything and I
haven't changed my habits. Started getting more captchas a couple weeks ago
and I know I answered several of them correctly (I'd get tested 3 times in a
row).

Possible fingerprinting?

~~~
wstuartcl
It is also plausible that, because Google Analytics runs on so many sites,
they could do something shady like put you in a pester segment if they see you
frequently coming from DuckDuckGo to other sites. It is not hard to imagine
using reCAPTCHA as a nuisance against other search-traffic providers.

~~~
pergadad
I doubt that many people would make a connection between their search engine
and seeing captchas on other sites. So limited gain for, if anything, many
unnecessary complaints.

------
bsamuels
Slightly related, but I have a fun conspiracy to share:

I'm convinced that part of the reason Google released headless Chrome is as a
honeypot for bot authors to use. The idea is that instead of going through the
effort of fingerprinting and identifying new bot software, release something
that bot authors will use instead that you have a capability to detect.

Somewhere inside of headless Chrome, there's one or more subtle changes that
make it so Google can detect whether you're using headless Chrome or normal
Chrome. There's no limit to how subtle the indicator could be - maybe headless
Chrome renders certain CSS elements slightly slower than normal Chrome, etc.
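The hypothetical above (a subtle, detectable timing difference) could be sketched as a server-side threshold check on rendering-timing samples. Every number and name here is invented for illustration; this is not a real Google mechanism.

```python
# Toy sketch: if headless Chrome rendered certain CSS slightly slower,
# a server could flag clients whose timing samples drift from the
# normal-Chrome baseline. All numbers are made up for illustration.
from statistics import mean

NORMAL_BASELINE_MS = 4.0   # hypothetical mean paint time in real Chrome
TOLERANCE_MS = 1.5         # allowed drift before a client looks suspicious

def looks_headless(paint_times_ms: list) -> bool:
    """Flag a client whose average paint time drifts past the baseline."""
    return abs(mean(paint_times_ms) - NORMAL_BASELINE_MS) > TOLERANCE_MS

print(looks_headless([3.8, 4.1, 4.2]))  # normal-looking client -> False
print(looks_headless([6.5, 6.9, 7.2]))  # consistently slower -> True
```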

It sounds pretty crazy/complicated but I could definitely see it being worth
it if it means detecting $X,000,000 worth of ad fraud every year

~~~
mpol
Interesting idea :)

I don't think spambots are currently using Chromium or even running
JavaScript. Using simple spamfilters in JavaScript still works fine on my
setups.

~~~
bsamuels
Most modern credential stuffers use headless browsers with all the bells and
whistles, html5, javascript, etc.

Login attempts are usually spread over a massive botnet of residential IPs as
well, where they'll only use each IP for one or two login attempts before
moving on to the next.

It's a very fascinating problem space
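One way defenders spot the "one or two attempts per IP" pattern described above is to pivot the counting: instead of rate-limiting per IP, count distinct source IPs per target account. A minimal sketch, with invented field names and an arbitrary threshold:

```python
# Hedged sketch: flag accounts hit from suspiciously many distinct IPs,
# which is how low-and-slow credential stuffing across a botnet shows up.
from collections import defaultdict

def flag_stuffed_accounts(login_attempts, max_distinct_ips=3):
    """Return accounts targeted from more than `max_distinct_ips` IPs."""
    ips_per_account = defaultdict(set)
    for attempt in login_attempts:
        ips_per_account[attempt["user"]].add(attempt["ip"])
    return {u for u, ips in ips_per_account.items()
            if len(ips) > max_distinct_ips}

attempts = (
    [{"user": "alice", "ip": f"10.0.0.{i}"} for i in range(6)]  # 6 IPs
    + [{"user": "bob", "ip": "10.1.0.1"}] * 4                    # 1 IP
)
print(flag_stuffed_accounts(attempts))  # {'alice'}
```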

~~~
Damogran6
In my experience, the botnet didn't upgrade their JVM...it was 18-24 months
out of date. THAT was what we filtered on at the F5 to blunt the attack.

------
partiallypro
Every time I fill one of these out I get the picture test, and I answer them
correctly...but am asked 3-5 times to identify which blocks contain a school
bus or stop light. It's very annoying.

~~~
ivanbakel
I think it's speculated that you're recorded as being a useful classifier if
you answer correctly on initial test captchas, so you get given Google's
datasets for machine learning. It would explain why you get picture tests even
after you should definitely pass the check.

~~~
kzzzznot
If that is true that is a huge breach of trust. Are these practices ever
audited?

~~~
Scene_Cast2
By whom? For what? (Meaning - probably not, unless you count a few Googlers
sanity checking launches)

------
hartator
At SerpApi.com, we built a bot to check these boxes and an AI to solve the
actual CAPTCHA.

Checking the box is actually not that hard. There are no advanced measurements
of your mouse and touch speed; that's an Internet myth. It's more a game of
cookies, making them age well, and having an organic set of headers.
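A minimal sketch of what "an organic set of headers" might mean: real browsers send a consistent, complete header set, while naive bots send sparse ones. The required-header list below is an assumption for illustration, not reCAPTCHA's actual checklist.

```python
# Crude heuristic: does the request carry the headers a browser would?
# The set of "organic" headers is an invented approximation.
ORGANIC_HEADERS = {"User-Agent", "Accept", "Accept-Language",
                   "Accept-Encoding", "Referer", "Cookie"}

def header_set_looks_organic(headers: dict) -> bool:
    return ORGANIC_HEADERS.issubset(headers.keys())

bot_request = {"User-Agent": "python-requests/2.31"}
browser_request = {
    "User-Agent": "Mozilla/5.0 ...",
    "Accept": "text/html,...",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Referer": "https://example.com/",
    "Cookie": "NID=...",  # the well-aged cookie the comment mentions
}
print(header_set_looks_organic(bot_request))      # False
print(header_set_looks_organic(browser_request))  # True
```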

~~~
SheinhardtWigCo
Aren’t you afraid of being sued by Google for selling their search results?

~~~
wstuartcl
My first thought as well, looks like the business is 100% based on breaking
TOS.

From their site: Is scraping legal? In the United States, scraping public
resources falls under the Fair Use doctrine, and is protected by the First
Amendment. See the LinkedIn Vs. hiQ scraper ruling for more information. This
does not constitute legal advice, and you should seek the counsel of an
attorney on your specific matter to comply with the laws in your jurisdiction.

ROFL, I guess if you are able to ignore the layers of other issues (TOS,
breaking of technology specifically meant to exclude your use case, etc.) and
are only willing to apply some very tangential case law to your reasoning, it
is "legal".

------
mrccc
The captcha always reminds me of The Stanley Parable:

> Employee #427's job was simple: he sat at his desk in room 427 and he pushed
> buttons on a keyboard.

> Orders came to him through a monitor on his desk, telling him what buttons
> to push, how long to push them, and in what order.

------
fabioborellini
Isn't this a typical Quora answer? Full of filler and shitty hard-to-verify
details that provide no value to the answer ("the language is encrypted
twice", what the hell), and very little effort on answering the actual
question (what is the purpose of CAPTCHA).

And the community rules try to block people from writing firm "you're full of
shit"-like answers, even though every other answer of Quora is full of lies
like "Linux is fast, because it was designed for 16-bit computers".

~~~
jaabe
I had my “wow, this place might not be that good” experience with Quora
yesterday, when I was trying to use Google to evaluate AWS WorkMail.

Quite a lot of the “extremely good looking” answers on Quora straight up said
that you couldn’t do e-mail in AWS. These were answers from after WorkMail was
a thing, by the way.

So I started looking at other Quora answers on stuff I wouldn’t normally need
an answer for, and it’s frightening how often completely wrong answers look
correct.

Don’t get me wrong, there are a lot of truly amazing answers as well, and it’s
entirely possible that I just suck at it, but I don’t think I can always tell
the amazing answer from the completely wrong one.

~~~
adventured
My experience with Quora has been that, more often than not, the older the
answer, the better it is. I find that answers in history are often better
than in tech. It always seems like the community that initially built Quora
stopped building it further several years ago, and now it's floating out in
space Wile E. Coyote style.

~~~
distant_hat
Quora went significantly downhill a few years back. It was a combination of
hordes of new users, bad moderation, and bad incentives (order in which
answers get shown etc).

------
djflutt3rshy
The box has made browsing using TOR insufferable! It fusses and makes me click
storefronts and traffic lights until I run out of patience and close out of
whatever webpage I was trying to visit. I assume it has to do with a lack of
Google cookies on the browser, essentially punishing me for trying to protect
my privacy.

~~~
Kalium
This might surprise you, but it actually has to do with what traffic coming
out of TOR looks like. Well in excess of 90% of traffic coming out of TOR is
spam, bots, malicious, or some combination!

Google isn't going out of their way to punish you for trying to protect your
privacy. They're trying to stop unwanted traffic. By unfortunate happenstance,
you appear to be disguising yourself in the exact same way a shocking amount
of bad traffic is.

~~~
mattlondon
Not just for Tor.

I use Firefox with a few basic extensions (Privacy badger, uBlock, Google
Container) yet every time I am presented with having to pick out traffic
lights over and over and over again. I usually have about 5 or 6 "challenges"
before I give up and use another site.

My timezone has not changed, my IP address and rough location has not changed,
my screensize has not changed, my broadband speed has not changed, and my
general computer dexterity has not changed, yet I am relentlessly targeted. On
chrome I never saw these challenges, but on firefox with the privacy plug-ins
I am always always always challenged.

At this stage I think the only signal it is using is "is there a google cookie
in this browser? and if so has the google cookie got some 'normal' looking
activity logged against it?" I.e. they are checking their server-side logs for
a given cookie ID and seeing if that looks normal or not (i.e. seen on google
search, seen on youtube, seen ads from a variety of third parties on various
different sites, mixed up with time of day and speed of viewing etc etc).
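The guess above (score a Google cookie ID by whether its server-side history "looks normal") could be sketched roughly like this. The signals, weights, and threshold are entirely hypothetical:

```python
# Hypothetical "does this cookie have normal-looking activity" scorer,
# mirroring the signals the comment speculates about.
def cookie_looks_normal(history: dict) -> bool:
    score = 0
    if history.get("search_visits", 0) > 0:
        score += 1
    if history.get("youtube_visits", 0) > 0:
        score += 1
    if history.get("third_party_ad_sites", 0) >= 3:
        score += 1
    return score >= 2  # arbitrary threshold

print(cookie_looks_normal({"search_visits": 40, "youtube_visits": 12,
                           "third_party_ad_sites": 9}))  # True
print(cookie_looks_normal({}))  # no history (containers/badger) -> False
```

Under this model, a containerized browser presents an empty history, so it gets challenged every time, which matches the behavior described.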

Since I have got Google in a container in Firefox, I am guessing that my
google cookie is not present when the captcha loads (due to the containers and
privacy badger et al) so there is no identity back in the mothership to
compare me against.

~~~
gcb0
for google, you are the enemy. not even bots.

captcha is google master blow against ad blockers.

a regular user, on whom they have all the info, gives them dollars per ad
impression. you, with your doNotTrack (ha! that was a joke) and privacy addons,
make them only cents per ad impression.

you are google's enemy. remember this when you get stuck in captcha hell (and
consequently censored from most sites until changing device/ip)

~~~
nine_k
IDK. I run Firefox on many OSes, everywhere with uMatrix that blocks known
trackers, ad networks and such. I don't see most ads (if any).

I rarely see the "I am not a robot" box, and haven't seen image-recognition
tasks in a long, long time.

~~~
raws
That also heavily depends on what kind of/which sites you visit.

~~~
gcb0
"that also depends if you have something to hide" was said of every police
state and censorship scheme.

------
miguelmota
As a user who's constantly clicking on the crosswalk or storefront images, you
can't help but think that you're essentially working for free, training
Google's machine learning models by providing them with supervised data
points.

~~~
littleweep
I've been thinking about this a lot lately. Where is our compensation? It's
our time and brain power training Google's AI that will one day be sold back
to us. I'm really not into this.

~~~
gingerbread-man
Because Google can extract value from captchas, it makes world-class captchas
and bot detection AI available to every webmaster for free. I don't know what
that level of service would otherwise cost, but it almost certainly wouldn't
be affordable for low-traffic blogs and the like, which would end up
vulnerable using weaker captchas or trying to roll their own. Everywhere else
the cost would just get passed on to users.

I don't love the compromise of paying for things with my data or by training
Google's AI, but it's hard to say users aren't getting anything out of it.
That said, I do miss the old reCaptcha.

~~~
JohnFen
> it almost certainly wouldn't be affordable for low-traffic blogs and the
> like

Very few low-traffic blogs that I see use (or need) CAPTCHAs. I know that the
ones I run don't.

> I don't love the compromise of paying for things with my data or by training
> Google's AI, but it's hard to say users aren't getting anything out of it.

I don't think they are getting much, if anything out of it -- aside from being
increasingly punished for defending themselves against being spied on by
Google.

~~~
jopsen
My personal blog has a spam filter for comments.. it's either that or
captcha.. or sign in with Google/Facebook.

~~~
johannes1234321
Often a trivial non-standard thing like "what's the name of the author" works
well enough. Especially outside the English language. Spammers won't spend the
time to bother adapting their scripts for that.

If this simple thing comes from a popular WordPress plugin, the equation for
the spammer changes, of course.
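The "trivial non-standard question" idea amounts to a hand-rolled form field that drive-by spam won't fill correctly. A minimal sketch; the question, field name, and accepted answers are of course site-specific inventions:

```python
# Custom-question spam filter: reject comments that don't answer a
# site-specific question (e.g. "what's the first name of the author?").
ACCEPTED_ANSWERS = {"johannes"}  # hypothetical accepted answer

def comment_allowed(form: dict) -> bool:
    answer = form.get("author_question", "").strip().lower()
    return answer in ACCEPTED_ANSWERS

print(comment_allowed({"body": "nice post",
                       "author_question": "Johannes"}))  # True
print(comment_allowed({"body": "buy pills"}))            # False
```

As the follow-up comment notes, this only holds until an attacker solves the question once and hard-codes the answer.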

~~~
hombre_fatal
There's certainly a period of time where that solution is sufficient as it
stops the lowest level of drive-by <form> spam.

But it also sucks the first day you get an attacker who solves it once and
then spams you thousands of times.

Modern spam tools are pretty impressive these days and minimize the targeted
work the human spammer needs to do in these cases. In the early 2000s, you
could set a custom question and then assume no attacker is going to manually
code for your little blog.

But even in 2008 I was using spam software (out of curiosity) where you could
import a massive blog list, and it would pause spamjobs with failed comment
submissions, let you pencil in a value for this unknown field, and then click
resume.

You could also choose other actions for that field like "prompt me each time"
and sit at your computer multiplexing your labor across hundreds of blogs. And
that was pretty polished ten years ago.

------
dudus
This is reCAPTCHA v2. There's even a v3 that does not have a checkbox at all.
It's just a JavaScript API that gives you a score between 0.0 and 1.0 for how
likely a user is to be a bot. I suspect it uses the same ideas as this one,
maybe more, since the article is a bit outdated.

[https://developers.google.com/identity/protocols/OAuth2#inst...](https://developers.google.com/identity/protocols/OAuth2#installed)
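For context on how a site consumes that score: the client-side JS yields a token, the server verifies it against Google's siteverify endpoint, and then applies its own threshold policy. A hedged sketch; the secret and the 0.5 cutoff are placeholders, and the escalation actions are just examples.

```python
# Server-side sketch of reCAPTCHA v3 verification plus a site-chosen
# threshold policy. SECRET/threshold values are placeholders.
import json
import urllib.parse
import urllib.request

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"

def verify_token(secret: str, token: str) -> dict:
    """POST the client token to Google; response JSON includes 'score'."""
    data = urllib.parse.urlencode({"secret": secret,
                                   "response": token}).encode()
    with urllib.request.urlopen(VERIFY_URL, data=data) as resp:
        return json.load(resp)

def action_for_score(score: float, threshold: float = 0.5) -> str:
    """Site policy: don't hard-block, just escalate friction."""
    if score >= threshold:
        return "allow"
    return "require_2fa"  # or flag for fraud review / comment moderation

print(action_for_score(0.9))  # allow
print(action_for_score(0.1))  # require_2fa
```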

~~~
JohnFen
> It's just a JavaScript API that gives you a score

Yes, this is the worst of them all, as it will completely lock me out of
websites that use it.

~~~
dudus
I don't think that's the goal. Nowhere do they suggest locking people out,
though it's definitely possible. The idea is that the website can choose to be
more cautious about that user, requiring 2FA, flagging for possible credit
card fraud, or holding comments for moderation. I think these are all good use
cases.

~~~
JohnFen
It will lock me out of the websites because it requires me to enable Google
Javascript code to execute, which is something I will not do. I allow very
little JS to execute at all, and I don't allow any from advertising companies
or entities that report to advertising companies.

I understand the reasons why sites may want to do this sort of thing, but
personally, the cost of allowing this to happen in my browsing is simply too
high.

------
CodeMage
Is there any place where I can find a comprehensive list of countermeasures to
stop Google from recording and analyzing all the stuff that the article lists?
According to the article:

 _It turns out they record and analyse:

- Your computer’s timezone and time

- Your IP address and rough location

- Your screen size and resolution

- What browser you’re using

- What plugins you’re using

- How long the page took to display

- How many key presses, mouse clicks, and tap/scrolls were made

And ... some other stuff we don’t quite understand._
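Signals like those can be folded into a single stable identifier, which is roughly how such attributes become a browser "fingerprint". An illustrative sketch (the field set mirrors the article's list; the hashing scheme is invented):

```python
# Fold the quoted signals into one stable fingerprint hash.
import hashlib

def fingerprint(signals: dict) -> str:
    keys = ["timezone", "ip", "screen", "browser", "plugins"]
    blob = "|".join(str(signals.get(k, "")) for k in keys)
    return hashlib.sha256(blob.encode()).hexdigest()[:16]

a = fingerprint({"timezone": "UTC-5", "ip": "203.0.113.7",
                 "screen": "1920x1080", "browser": "Firefox 68",
                 "plugins": ["uBlock"]})
b = fingerprint({"timezone": "UTC-5", "ip": "203.0.113.7",
                 "screen": "1920x1080", "browser": "Firefox 68",
                 "plugins": ["uBlock"]})
print(a == b)  # True: same signals, same identifier across visits
```

Countermeasures therefore come down to blocking the collection (no JS) or making the signals unstable/common, as the replies below discuss.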

~~~
pixl97
Turn off javascript, mostly. To hide your ip you need to use a VPN.

~~~
scrooched_moose
Is noscript still the best for that? I haven't looked at other options in a
long time.

~~~
kevingrahl
If you want more granularity I’d suggest giving uMatrix a try. You’ll
basically break every site at first and have to make adjustments for every
site you visit (whitelist certain resources on a per-domain basis), but I
think it’s well worth it.

NoScript (which is totally fine) just blocks all JS, uMatrix can block much
more.

~~~
jammygit
I found uMatrix required me to turn so many things on for the average site
that I wasn't sure it was blocking anything significant anymore. I suppose I
gave up after a while.

~~~
kevingrahl
I can see why you’d feel that way, but for me it still blocks a lot of stuff I
don’t want. I block some domains/companies via my hosts file and run a pihole,
but there’s always the odd advertising network that I, or my pihole, didn’t
know about that gets blocked by uMatrix.

------
dazhbog
Every time that thing asks

Select all images with traffic lights, I'm like, does the pole also count?

Select all images with cars, what about that car that is two pixels in the
next tile?

Do I click based on absolute truth, or how they expect an average user to?

------
xirdstl
I feel a sense of dread whenever I see this box. Is it going to let me
through, or am I going to spend the next few minutes futilely clicking signs
and lights, only to give up and leave the site?

~~~
clairity
just preemptively say no and leave the site. this is just another tracking
vector for google and it should be discouraged.

i'm generally against this type of gating, where the people doing the right
thing get punished disproportionately (even small slices of time add up to
wasting thousands of human-years over the population) just to combat the tiny
number of bad actors. target the bad actors directly.

it's the same for tsa security theater. let's put all those humans to work
training dogs of all sorts and filtering them through people at the airport.
the money for those privacy invading scanners can be put toward training and
housing the dogs. our collective time is not wasted on silliness and standing
in line, and we'd probably save a lot of tax dollars that way.

~~~
xirdstl
I'm getting closer to doing that. Lately, I check the box, and if I'm
presented with images, I leave.

I have also trained myself to wait a few seconds before clicking the box,
which seems to help assert my humanity.

~~~
afandian
Wait til it's standing between you and your bank account.

~~~
snazz
Phone banking hasn’t died yet, luckily.

~~~
afandian
Yeah it's ironic. It's more convenient to use the phone to get my balance,
even if you count the time spent listening to the recorded message telling me
how much better my life would be if I used Internet banking.

------
meritt
A bot absolutely can, you just need to use a more sophisticated bot. This
article [1] is from August 2017, so the arms race has escalated and techniques
improve, but the gist is the same: You just do a better job of simulating the
"human" characteristics they monitor. Gen 4 bots (bots that run on an infected
user's machine) can circumvent these measures as well.

[1] [https://intoli.com/blog/making-chrome-headless-undetectable/](https://intoli.com/blog/making-chrome-headless-undetectable/)
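Simulating the "human" characteristics includes mouse movement: instead of teleporting the cursor to the checkbox, a bot generates a curved, jittered path. A quadratic Bezier with noise is a common, if simplistic, approach; this sketch is illustrative, not any particular bot's method.

```python
# Generate a human-looking cursor path from start to target:
# a quadratic Bezier curve with a random control point plus jitter.
import random

def human_like_path(start, end, steps=20, wobble=3.0):
    (x0, y0), (x1, y1) = start, end
    # random control point bends the path like a real hand would
    cx = (x0 + x1) / 2 + random.uniform(-40, 40)
    cy = (y0 + y1) / 2 + random.uniform(-40, 40)
    points = []
    for i in range(steps + 1):
        t = i / steps
        x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * cx + t ** 2 * x1
        y = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * cy + t ** 2 * y1
        points.append((x + random.uniform(-wobble, wobble),
                       y + random.uniform(-wobble, wobble)))
    points[-1] = (float(x1), float(y1))  # land exactly on the target
    return points

path = human_like_path((0, 0), (300, 120))
print(len(path), path[-1])  # 21 points, ending at (300.0, 120.0)
```

Replaying such paths with variable timing is exactly the kind of thing defenders then try to model statistically, hence the arms race.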

------
wulfmann
This has been done:

[https://www.youtube.com/watch?v=fsF7enQY8uI](https://www.youtube.com/watch?v=fsF7enQY8uI)

~~~
mnorton
this should be at the top, dammit

------
jiveturkey
TFA is not very clear about what it's describing.

It is describing how the checkbox is collecting your browser's characteristics
(eg they go to great length describing the webGL fingerprint) and your own
characteristics (eg mouse behavior), such that when you click the box, you are
determined to be a person or a bot. If they think you are a person, you don't
have to do the CAPTCHA.

The whole bit about a double encrypted "VM" is overstating the case. The "VM"
is "just" a bytecode interpreter, which at the end of the day can't do
anything the browser's javascript engine can't do itself. Yes, it's some heavy
obfuscation, and what's more interesting than the interpreter itself is the
decision to spend what must have been lots of time/resources to develop it.
It's security by obscurity, and in this case it is delivered to the client so
obviously it's reversible. Maybe there's a deeper purpose.

EDIT: ah. the purpose is not to obfuscate. it is to fingerprint the CPU
characteristics. by running their own interpreter, and changing the opcodes on
the fly and such things, they can defeat JIT and learn something about the CPU
itself. if they have user info (google cookie) they can know what CPU/CPUs
that user typically uses and if "the checkbox" records something different
it's a signal.

------
ddebernardy
When trying to detect ad fraud, one problematic scenario is that of replay
attacks. This is basically when a scammer records human behavior on a site,
and then replays a mix of their actual sessions to fraudulently click an ad.

The Quora answer is interesting but it's not clear to me whether the "I'm not
a robot" box cannot be defeated in a similar manner.

~~~
amirhirsch
I work on bot detection at hCaptcha.com. (Hiring: reach out if you want to
apply machine learning to stop bots and help websites monetize their traffic
without ads)

In order to successfully execute a replay attack you would also need to pass
the Turing Test, i.e. click the correct images. If you design a bot that
starts a combinatoric attack by trying random guesses, we can easily confuse
it, so most attackers try to use a solver service.

We can also identify how you interact with semantic content in the images when
you click on the image and characterize your mouse interaction as human or
non-human. Since confidence increases as more results come in we can also run
them after the initial pass and then shadow-ban bots. (And notify the targeted
website that we have determined e.g. a particular signup is a bot.)

Ultimately, many techniques beyond simple correct/not-correct are required to
defend against the main attack vector: humans hired to solve captchas en-masse
and make thousands of fake accounts. Modern ML is pretty effective for these
kinds of problems. Browser obfuscation does not add real security, and today's
reCAPTCHA (all versions) is easily defeated in practice.

~~~
jammygit
This solution has more or less locked me out of certain accounts, except for
when I want to spend a whole evening solving captchas in order to log in. I
just don't use those services anymore, which means I've lost paid-for content
that I'm practically locked out of.

------
doubletgl
Doesn't that particular Captcha also work in an incognito browser? I don't see
where all this complexity comes in. You simulate the mouse movement and the
click. Your browser pretends to not be able to run webgl, so no ghost image.
Forcing the user to have a history with google services would lock everyone
out who's new. The user agent and other browser metadata is easy to fake.

------
yeutterg
After reading this thread yesterday, I had a nightmare in which I called 911
but was required to answer an endless stream of personal verification
questions, never able to report the problem.

~~~
squarefoot
This already happens with many automated service call centers: endless key
tapping before an actual carbon based lifeform who can understand the problem
picks up.

------
tptacek
I believe this is related to anti-spam work that Mike Hearn did, and described
at a high level on the ModernCrypto lists:

[https://moderncrypto.org/mail-archive/messaging/2014/000780.html](https://moderncrypto.org/mail-archive/messaging/2014/000780.html)

(search "Javascripts").

------
taftster
I'm not convinced that picking the pictures has much to do with actually
convincing Google whether you're a bot or not. I mean sure, it's an indicator.
But I _know_ that I can pick the right pictures of school buses and store
fronts every freaking time, so that's only a very small indicator.

More likely, the majority of the algorithm is devoted to the "fingerprint" of
your browser. If you have adblock running, you may not have a google ad
cookie. If you have a randomizing user-agent addon, you're going to get
blocked.

What we need, for captcha, is an addon that makes you look as human as
possible. Mouse click timings become random. Javascript fingerprint becomes
John Smith common. Third party cookies, temporarily enabled (and then pruned).
VPN traffic routed through a common looking gateway that few bots use. etc.
etc.

~~~
derefr
You're assuming that the picking-of-pictures validation logic cares solely
about what pictures you pick. What if it cares about your "mouse click
timings" _not_ being random, but rather looking like the mouse movements of a
human who is using eye saccades to examine and classify images, and then moves
their mouse only when they see and find one, and sloppily at that?

~~~
taftster
Right, that's what I'm saying. I was putting forth that humans are pretty
random with their timings. But the truth is, they are probably not and that we
all probably click in some uniform distribution.

But I am pretty sure, if you click on the images too quickly, you're going to
get stopped. Slowing down your clicks, moving around a little like you're
"thinking" seems to help. That's been my experience at least.

I'm personally rooting for the AI here. When the robots and crawlers become so
smart to become too hard for Google to distinguish between them and humans,
then we will all benefit. They are becoming smarter, which is why captchas are
becoming more difficult for humans.

------
speedplane
There are many services that can get around a Google reCAPTCHA. They are not
free, but cost roughly $2 per 1000 recaptchas. This means that reCAPTCHA
makes things more expensive, but still surmountable.

Services that use recaptcha should consider why they are using them.
Preventing spam, stealing proprietary data, and preventing actual harm are
legitimate reasons. On the other hand, stopping free information from becoming
truly free is bad. For example, numerous U.S. government agencies use captchas
to prevent scraping or analyzing of public documents. These government
organizations provide "public access" in the narrowest possible sense, making
it difficult to search for a specific record or analyze data in bulk. If they
can't do it, they should allow others to try.

------
taftster
Google runs a very profitable ad network. People who use adblockers don't
directly help support their revenue. I have a conspiracy theory that google is
making it hard for people running adblock to get through captchas. I want all
the code that is used by the captcha to think I'm running a pristine ad
displaying browser.

I wonder if recaptcha seeing your ad cookies (and other dark tracking
indicators) would be enough to help get through. Like we need a "recaptcha
profile" in your browser that would have just enough fingerprint to get you
through.

Robots don't watch the ads. Adblock blocks the ads. Google's revenue is mostly
ad revenue. I don't think it's coincidence.

------
blazespin
I never got this. Why not just run the browser in a VM, capture the screen and
do a mouse click? I mean, why bother with all the headless nonsense, which is
an arms race? How could Google possibly ever defeat something that never even
goes near the operating environment and just appears as a simple human mouse
click?

I suppose you’d have to simulate the human movement of the mouse over to make
it look like a human actually did it, but how hard can that be? Just train it
with a few 100 Turks moving mouse pointers to click on links.

Though an interesting counter measure might be to inject cpu spikes and
measure the impact in the mouse movements and the robot controlling it.

Must be a fun job, both sides.

------
hexo
Why can't I just run headless chrome or firefox and have my bot click it
there? "Aha, here it is, so I click using a system 'fake' mouse click." Where's
the catch?

~~~
wmf
The script can detect the difference between real Chrome and headless Chrome.

~~~
noir_lord
Point a webcam at the screen and wire a mouse to the computer controlling the
webcam, you'd have to simulate the computer moving the mouse like a human
would but I don't see why it wouldn't work.

~~~
joshuamorton
You've now forced spammers to purchase webcams. And write code or whatever to
make realistic mouse movements. This is expensive. Whatever they're doing
likely isn't worth it anymore.

~~~
cr0sh
fake-users-as-a-service business op?

I can see it being possible - racks of "robot arms" that move mice based on
whatever criteria is needed (in this case, reCaptcha).

It works for 3D printing, as well as device testing - so in theory, this could
be done too...

~~~
brokenmachine
I don't get why you'd need a physical robot arm? Just have a usb device that
simulates the mouse movements. Or just software to move the mouse virtually.

~~~
noir_lord
You don’t, you wouldn’t even need a webcam, you could just splice into the
video signal at output and interpret it directly (much as the HDCP bypasses
do).

------
seotut2
I think recaptcha and captcha in general are very overused today, as they
cause way too much inconvenience for the user. Why discriminate against robots
so much?

What's so wrong about crawling or using automated tools? With today's networks
and hardware performance most websites shouldn't concern themselves with
denial-of-service type of attacks, unless they're past a certain threshold of
popularity.

~~~
learc83
At work we have a scraper that likes to use a particularly expensive search
query, from hundreds of different IPs dozens of times per second, all so that
they can scrape data that is freely available from us as an XML feed at a
different URL.

Every time I found a way to fingerprint and block them, they'd change their
bot to avoid detection. Captcha for rate limiting seemed like the least bad
option.

However, eventually I decided that instead of blocking them, I'd return random
results from the db. They weren't checking the data too thoroughly because to
this day they haven't changed the bot to avoid my most recent pattern
detection. They're still merrily scraping useless data.
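The countermeasure described above (serve plausible junk once a request matches the bot's fingerprint, rather than blocking it) could be sketched as follows. The fingerprint match and data shape are stand-ins for the real detection and schema:

```python
# Poison matched scrapers with stable-looking random data instead of
# blocking them, so they don't notice and adapt.
import random

REAL_ROWS = [{"id": i, "price": 100 + i} for i in range(5)]

def handle_search(request: dict) -> list:
    if request.get("fingerprint") == "known-bad-bot":  # stand-in check
        rng = random.Random(request.get("query"))  # deterministic noise
        return [{"id": rng.randrange(10_000),
                 "price": rng.randrange(1, 999)} for _ in REAL_ROWS]
    return REAL_ROWS

print(handle_search({"query": "widgets"}) == REAL_ROWS)  # True
junk = handle_search({"query": "widgets", "fingerprint": "known-bad-bot"})
print(len(junk))  # same shape as real results, different contents
```

Seeding the noise with the query keeps repeated identical requests consistent, which makes the poisoning harder for the scraper to spot.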

~~~
seotut2
Rate limiting can be achieved without actually inconveniencing regular users.
Put an exponentially growing delay on the server's response for requests
coming in too-quick succession.
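
For illustration, one possible shape for such a throttle (window size, free
allowance, and base delay are arbitrary assumptions): a per-client sliding
window where the delay doubles for every request beyond the allowance:

```python
import time
from collections import defaultdict

class ExponentialThrottle:
    """Delay responses for clients sending requests in too-quick succession.

    The delay doubles for each request beyond `free_requests` within the
    window, and naturally resets once the client slows down.
    """
    def __init__(self, window=10.0, free_requests=5, base_delay=0.5):
        self.window = window
        self.free_requests = free_requests
        self.base_delay = base_delay
        self.history = defaultdict(list)  # client id -> request timestamps

    def delay_for(self, client_id, now=None):
        """Return how many seconds to sleep before answering this request."""
        now = time.monotonic() if now is None else now
        hits = [t for t in self.history[client_id] if now - t < self.window]
        hits.append(now)
        self.history[client_id] = hits
        excess = len(hits) - self.free_requests
        if excess <= 0:
            return 0.0
        return self.base_delay * (2 ** (excess - 1))
```

A normal user never trips the allowance and sees zero added latency, while a
scraper hammering the endpoint quickly hits multi-second delays.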

~~~
learc83
The problem was that with the sheer number of ip addresses they were using,
and with the rate that some normal users used that particular endpoint,
regular users would have been inconvenienced--either by being forced to wait
or by being forced to do a captcha.

I would have set it to only show the captcha if the delay was active, so users
would effectively have had a choice of wait or captcha.

------
bhntr3
I haven't seen any mention of keystroke (or mouse) biometrics
([https://en.m.wikipedia.org/wiki/Keystroke_dynamics](https://en.m.wikipedia.org/wiki/Keystroke_dynamics))
When the checkbox appeared I assumed that was what it was.

Keep a history of the biometrics of the devices, either on the account or in
a cookie. Use a sufficiently secure and obfuscated language to capture and
upload the data (maybe using steganography in an image, which would explain
the weird image uploads). Prompt when the biometrics don't match a known user
of the device/account (according to ML). Or, if the tracking data doesn't
exist (incognito), prompt using a backup model trained on the difference
between the biometrics of known bots and known humans. Keep a very loose
threshold here.
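
As a rough sketch of the raw features such a system might start from (a
simplification; real keystroke-dynamics models use many more signals): dwell
and flight times extracted from key-down/key-up timestamps:

```python
def keystroke_features(events):
    """Extract dwell and flight times from (key, down_ts, up_ts) tuples.

    Dwell = how long each key is held; flight = gap between releasing one
    key and pressing the next. The per-user distributions of these timings
    are the classic raw material for keystroke-dynamics models.
    """
    dwell = [up - down for _, down, up in events]
    flight = [events[i + 1][1] - events[i][2] for i in range(len(events) - 1)]
    return dwell, flight
```

Timestamps here are in milliseconds; a classifier would then compare these
distributions against the history stored for the account or device.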

That's how I'd do it if I were Google (and no one else because only Google can
afford that.) All the browser fingerprinting stuff mentioned is great but
doesn't really work as much as you'd hope in practice.

~~~
phire
The basic theory behind that checkbox is to attach an unmovable cookie to your
browser.

The majority of the client-side reCAPTCHA code is fingerprinting that makes it
impossible for spammers to steal cookies from legitimate users.

Once you have the immovable cookie, it is easy to run regular reCAPTCHA
challenges until you are sure that browser is being used by a regular human.

You will notice that if you ever move to a fresh OS install, or a different
browser, reCAPTCHA suddenly starts showing you image challenges again, and
this lasts for several weeks.

Keyboard/mouse biometrics is a nice theory. But that's all it is. It doesn't
work as a general CAPTCHA solution because it's so easy for bots to fake
human-looking input.

~~~
bhntr3
Great points. I agree the biggest innovation is building a secure environment
inside the browser. When I explored keyboard/mouse biometrics it was for
detecting account theft, which is a bit different.

If they have a way to create a secure, immovable cookie across browser
sessions even in incognito mode then they don't need biometrics. In the
absence of persistence, biometrics could serve as the cookie. Even with a
naive approach in a hackathon, a member of my team was able to get very high
precision identifying users based on a small sample of keyboard and mouse
movements. I'm sure Google can do better.

So it's not really about attackers being able to look like any human. It's
about being able to look like a specific human. Which is much harder.

But maybe you have more experience? We abandoned it because it seemed
intrusive and because we knew we couldn't invest in the secure environment.
Without that it doesn't matter. And with it, maybe there's an easier solution.
But I figured that since Google made it, they would be using keyboard/mouse
movements for user identification.

------
avodonosov
So how does it really work? The article spends a lot of words on obfuscated
code reading the browser fingerprint, but that's just a fingerprint; a bot can
run a real browser as well, and the fingerprint will not reveal it.

How does clicking the checkbox help? Do they measure the delay it takes me to
read the captcha request and react by clicking?

------
yalogin
Can anyone point me to the virtual machine and encryption mechanisms used by
Google, as alluded to in that link?

------
judge2020
If you often get this while using Tor -
[https://privacypass.github.io/](https://privacypass.github.io/)

For firefox -
[https://github.com/dessant/buster](https://github.com/dessant/buster)

------
untangle
I see a lot of conjecture and theory and not a lot of evidence. AI models?
Maybe for ad behavior but not for this. Google has you and your machine ID'ed
and fingerprinted. That's the core of the authentication. The rest is
subterfuge and obfuscation.

Is there any evidence that I'm wrong?

------
ThePhysicist
So basically they do very advanced browser fingerprinting? I wonder if they
keep that data around, as that would tell them who uses which third party
services (not that they would have trouble working this out by other means)
and would make a nice addition to their tracking efforts.

It’s quite depressing to see that almost all sites now require sending data to
Google just to log in. Not to mention that they help turn billions of users
into clickworkers annotating AI data. Tell me what you want, but I’m pretty
sure they show image captchas to users that they are absolutely sure are not
bots. I use Chromium on Linux and I’m logged into a Google account, and I
still have to solve three or four image captchas at times just to log in.

------
wolco
The bigger question: why do we treat bot visits differently? Automated and
manual submissions face the same rate-limiting controls that prevent more
submissions than expected. The content of a form shouldn't be acceptable or
not based on who sent it (human or machine); there needs to be another
verification process checking the submitted data against expected/acceptable
values.

Why couldn't a bot purchase a droplet or shoes? As a seller I would be happy
to sell to them. Purchases would be quicker, with fewer resources wasted on
humans browsing the same product pages for months before buying.

~~~
dragonwriter
> Why do we treat bot visits differently.

Because bots are used in _multitarget_ and _multisource_ spam attacks that
humans can’t do efficiently; rate limiting on any particular target site,
particularly for submissions from a particular source, cannot prevent such
attacks.

------
thowmeaway
It only catches obvious bots. I write stuff that gets past ReCaptcha all the
time. I'm just one guy and I am not even that good at this.

Plenty of other people get past it too:
[https://medium.com/@jsoverson/bypassing-captchas-with-
headle...](https://medium.com/@jsoverson/bypassing-captchas-with-headless-
chrome-93f294518337)

I am pretty sure Google is just doing an 80/20 rule here, or even a 99/1 rule
since there are so many simple bots that are easy to detect.

------
dazhbog
The other day I noticed that if you leave the tab open long enough, the images
become grainy...

I wonder if they also overlay another image that looks like noise to humans
but is actually an adversarial attack on neural networks[1][2]

[1] [https://youtu.be/SA4YEAWVpbk?t=34](https://youtu.be/SA4YEAWVpbk?t=34)

[2]
[https://medium.com/datadriveninvestor/8b966793dfe1](https://medium.com/datadriveninvestor/8b966793dfe1)

------
ryantgtg
Totally anecdotal: Back when I was using a vpn and would see this recaptcha
more often, I found that the recaptcha would often declare me a bot (and give
me a second chance) if I clicked the boxes too quickly. Like, the storefronts
would load and I'd immediately click+click+click+click then submit. But if I
slowed down and staggered my clicks, it would realize I'm just another
inefficient human who needs time to move a mouse and make decisions.

~~~
taftster
Right. I think I have seen similar behavior for mouse movement as well. Not
sure, but I think moving your mouse around randomly helps as well. The timing
of picking the images, as you say, helps. As does the order you pick them in.

Of course, I think the biggest thing is your browser's fingerprint. If you are
using a lot of privacy blocking addons, etc. you are going to be spending a
lot of time looking at captchas.

------
aasasd
Could anyone please translate this part to a normal technical description?:

> _Google’s invented language is decoded with a key that is changed by the
> process of reading the language, and the language also changes as it is
> read._

I feel like that's either some everyday cryptothing that I'm too tired to
realize right now (ahem hashing cough?), or some clever stuff that I want to
know quite a lot. Or Google cracked the secret of writing Malbolge.

------
_cs2017_
Why wouldn't a bot just use a proper browser (not headless), detect the "not a
robot" box at the pixel level, and click on it using browser automation or
some mouse movement script? At today's level of bot sophistication this seems
almost trivial. Sure it might cost more in resources, but I doubt it's
economically prohibitive when you're making at least a cent per fraudulent ad
click?
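
The pixel-level detection step is indeed simple in principle. A toy sketch
(operating on a plain 2D grid standing in for a screenshot; a real bot would
capture the actual screen first) that finds the centre of a target-coloured
region to hand to a click routine:

```python
def find_checkbox_center(pixels, target):
    """Return the (x, y) centre of all pixels matching `target`, or None.

    `pixels` is a row-major grid (list of rows). The returned coordinate
    is what a bot would pass to its mouse-automation layer to click.
    """
    hits = [(x, y) for y, row in enumerate(pixels)
                   for x, px in enumerate(row) if px == target]
    if not hits:
        return None
    xs, ys = zip(*hits)
    return (sum(xs) // len(xs), sum(ys) // len(ys))
```

Real checkbox detection would match on a template or colour range rather than
an exact pixel value, but the centroid-and-click idea is the same.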

~~~
judge2020
I believe the Quora answer is putting a little too much faith in Google using
"fair" factors when determining if a client is a robot.

I'm sure it plays a part in determining "hijacked extension" activity vs human
activity, but it's likely that the majority of the decision is how much recent
activity your signed-in Google account has, whether or not you're signed in on
Chrome (Firefox has a lot more stories of recaptcha challenges), and maybe
even if you have Google WiFi or Google Home linked to your account. I wouldn't
be surprised if they purely whitelist accounts that subscribe to Google Fiber
or Fi.

------
alt_f4
Judging by the amount of spam I receive from a reCAPTCHA v2-protected contact
form, I'm positive bots can tick the box or circumvent it.

------
ValleyOfTheMtns
I feel somewhat vindicated by this story. I was talking to a couple of work
colleagues about the "I'm not a robot" box and they were convinced it worked
purely as a time delay, to slow bots down. I had a feeling that there was far
more to it than that, but I didn't know how/where to get that information at
the time.

------
bluedino
They encrypt it twice? Forget about it, then.

~~~
orblivion
Triple distilled, with two scoops of raisins.

------
tyteen4a03
Is there an easy-to-implement, simple-to-use alternative solution out there
that delivers better UX, does not require i18n treatments and stops spambots
just as well?

I'm not seeing any except asking questions about the site, and even that takes
effort to translate. Numerical math questions I assume can be easily bypassed.

------
arendtio
Sometimes I get really angry about that captcha. For example, when I am trying
to buy something, having an account on the store with 100+ orders, and still
having to play 6 rounds or so of 'find the car' at the speed of the very slow
fading images.

So sophisticated and still such a pain in the a__ for the user.

------
mnm1
This explains everything except why Google itself won't let humans through
even when they've selected the right things a million times. All this
sophistication and it's still too stupid to do its basic job of knowing the
right answer which couldn't be more than a few equality checks away.

------
seanwilson
The linked post doesn't give any concrete answers in my opinion. Can't a bot
use a real browser to pass the CSS tests? Can't the behaviour of a human be
recorded and mixed in with the behaviour of the bot to seem more human like?
Why can't the encryption used be decoded?

------
ascii_only
unCaptcha2 claims 90% accuracy at beating reCaptcha.
[http://github.com/ecthros/uncaptcha2/blob/master/README.md](http://github.com/ecthros/uncaptcha2/blob/master/README.md)

~~~
executesorder66
This addon also works in a similar way:

[https://addons.mozilla.org/en-US/firefox/addon/buster-
captch...](https://addons.mozilla.org/en-US/firefox/addon/buster-captcha-
solver/)

------
ddtaylor
You certainly can click that box using chrome headless or a simple VNC setup.
It's insanely easy. Google has replaced the idea of a CAPTCHA (completely
automated public Turing test to tell computers and humans apart) with a basic
rate limiting heuristic.

------
klyrs
Okay how about a button that says "I'm a robot" and you only serve to sessions
that don't click it. "Sorry you said you were a robot. If you clicked it by
mistake, clear your cookies and uninstall at least 50% of your fonts"

------
jonathanstrange
I don't understand this. When I get the "I am not a Robot" box, in 95% of all
cases I only have to click the checkmark. That's it. A robot certainly can do
that, too.

In the remaining 5% I get a Captcha that is impossible to solve, e.g. it
states "Please mark all cars" and once I've marked all cars it states "Please
also mark all traffic signs" and when I've marked all traffic signs it states
"Please also mark the following traffic signs" and so on, for as long as I
bother to try.

I have never encountered a Google Captcha that worked in any other way: either
it works trivially with one click, or never.

My conclusion has always been that Google Captchas are simply broken for
anyone who runs an ad-blocker, and I don't bother with services for which it
doesn't work. Problem solved.

------
smadurange
"the language also changes as it is read" Lol. Quora and its nonsense.

------
nixpulvis
I feel like a "bot" with access to control your mouse and read the output from
the graphics system (aka get an image of the screen in real-time) could have
little trouble with these things.

~~~
brokenmachine
_> could have little trouble with these things_

That's a bit of an ambiguous sentence - are you saying they would have little
trouble, as in not much trouble, or the opposite?

------
robk
Our national mail carrier added this to their tracking page! Makes tracking a
parcel absolutely maddening and prevents third parties from doing simple (non
abusive) aggregation of tracking data.

------
michaco33
I wish they finally could so I could stop being held hostage through multiple
rounds of visual recognition tests -- how long is it going to take me to prove
I'm human when I'm senile?

------
not_a_cop
Oddly, they don't answer the harder question.

How can you prove that the form is being filled out by a human who genuinely
wants the information on the page? Imprisoned humans make great
CAPTCHA-defeating bots.

------
superlupo
I hate it because I always have to solve lots of storefront or traffic sign
images as I have disabled third party cookies because I do not want to be
tracked by Google all over the web.

------
walrus01
Turns out that the low cost way to bypass this is to hire a click farm of 25
people in Bangladesh, each running remote desktop sessions to about a dozen
virtual machines.

------
fubaron
Well, that's not completely true:

[https://www.youtube.com/watch?v=fsF7enQY8uI](https://www.youtube.com/watch?v=fsF7enQY8uI)

------
ainiriand
That answer is not an answer. It is just a way to look smart while not helping
at all. I've seen this behavior a few times, mostly at work.

------
ezioamf
[https://github.com/ecthros/uncaptcha](https://github.com/ecthros/uncaptcha)

------
kumarvvr
Wait, what is stopping me from using something like Selenium, plus mouse
automation tools in Python, and simply doing what a human does?
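
Nothing, in principle; the only subtlety is making the input look human. A
common trick (sketched below with made-up parameters) is to move along a
jittered Bezier curve rather than a straight line, then feed the points to
Selenium's ActionChains or pyautogui with small random sleeps:

```python
import random

def human_mouse_path(start, end, steps=30, wobble=5.0, rng=None):
    """Generate a curved, jittered point list between two screen positions.

    A straight, constant-speed line is a classic bot tell; a quadratic
    Bezier curve with small random wobble looks far more like a hand on a
    mouse.
    """
    rng = rng or random.Random()
    (x0, y0), (x1, y1) = start, end
    # A random control point bows the path off the straight line.
    cx = (x0 + x1) / 2 + rng.uniform(-100, 100)
    cy = (y0 + y1) / 2 + rng.uniform(-100, 100)
    path = []
    for i in range(steps + 1):
        t = i / steps
        x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * cx + t ** 2 * x1
        y = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * cy + t ** 2 * y1
        path.append((x + rng.uniform(-wobble, wobble),
                     y + rng.uniform(-wobble, wobble)))
    path[0], path[-1] = start, end  # pin the endpoints exactly
    return path
```

Whether that alone beats reCAPTCHA is another question, since (as discussed
above) much of the score comes from cookies and fingerprinting rather than
from the motion itself.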

------
sascha_sl
Yes.

You get punished for blocking Google tracking.

I love it. Totally not evil(tm).

Very tempted to cancel/stop using anything that uses this crap, but it's a bit
too ubiquitous.

------
IncandescentGas
It is difficult to prevent being tracked by google, when a website forces you
to expose your visit to a google property just to log in

------
1stranger
Protip: if you click on the headphone icon on recaptcha you get an audio
challenge which I find much simpler and less annoying.

~~~
executesorder66
And this addon makes it even easier:

[https://addons.mozilla.org/en-US/firefox/addon/buster-
captch...](https://addons.mozilla.org/en-US/firefox/addon/buster-captcha-
solver/)

------
danschumann
I'm sure it could.. if you had the minimum hardware and a smart enough bot.
It's always going to be an arms race.

------
peterwwillis
Back in the late 90s, my teenage friends and I were writing Visual Basic
programs to automatically click ad banners and surf webpages to generate
money. I doubt it's significantly more difficult to do the same now with a
captcha box.

------
jtth
I think less of whatever entity puts it in front of me.

------
modzu
the answer is basically wrong. robots can and do. captcha has become an AI
training tool and it now blocks _humans_ as a consequence. the convenient side
effect is a nudge to those humans to change their behaviour in such a way as
to never need a captcha, and thus support google's business: enable cookies,
log in to google, use unique identifiers (IP), etc

------
rotrux
[http://sikulix.com/](http://sikulix.com/)

~~~
jammygit
Does that actually work? I found the images had to be a pretty close match for
it to recognize something

~~~
rotrux
Probably not out of the box. It just seems like a more promising approach than
a browser API.

------
codeulike
Its getting weird when you have to fill that box in with pen and paper

 _Marci Robin was buying a Fiat 500X from a West Palm Beach, Florida
dealership, and was in the final stages of signing all the paperwork, when she
was presented with a strange but simple question: was she a robot?

This wasn’t online or anything, she was right there, in person, in front of
the sales person, who wanted her to check a box, with a pen, on real paper,
confirming that she was not, in fact, a robot. She claims she isn’t._

[https://jalopnik.com/dealership-makes-woman-sitting-right-
in...](https://jalopnik.com/dealership-makes-woman-sitting-right-in-front-of-
them-c-1826232532)

edit: and the original tweet:
[https://twitter.com/MarciRobin/status/998030243981033472](https://twitter.com/MarciRobin/status/998030243981033472)

~~~
zxcvbn4038
I had roughly the same problem. My Facebook career lasted all of thirty
seconds before they decided I was a robot and banned my account. I appealed,
and they asked me to send a photo. I sent them a photo and they said I was
still a robot and there was no further appeal. Very helpful. So my choices
from that point were to 1) shame them in the media and hope it's an
entertaining enough story that people like it, or 2) do something better with
my time than Facebook. I chose the latter.

~~~
defertoreptar
Just throwing this out there. Have you considered that maybe you are a robot?

~~~
echelon
I know this comment is in jest, but please don't joke on Hacker News as you
would on Reddit. Posts like this add little informational content, spawn a
thread of joke responses, and take time away from hundreds of people who are
here to read high signal commentary.

Sorry to be so profoundly un-fun.

edit: I'm being downvoted, but consider the comment chain this has generated.

~~~
jancsika
> I know this comment is in jest, but please don't joke on Hacker News as you
> would on Reddit. Posts like this add little informational content, spawn a
> thread of joke responses, and take time away from hundreds of people who are
> here to read high signal commentary.

Here's a challenge:

Create a bot capable of generating and posting content like the above that's
attributed to "zxcvbn4038."

The bot doesn't have to be able to get through spam filters or register
itself. It just has to take the current state of a forum thread as (part of
its) input and output a) content for a post and b) a position in the tree to
insert it.

The human bot-author can then try to insert the content into the thread at the
given position. If the post doesn't get detected as spam _and can avoid
downvotes_ , the bot wins.

The prize is the sheer joy of contradicting what I'm assuming is an HN mod:

> edit: I'm being downvoted, but consider the comment chain this has
> generated.

Edit: added rule about avoiding downvotes to retain "high signal commentary."

~~~
jancsika
And it looks like we may already be there:

[https://www.theguardian.com/technology/2019/feb/14/elon-
musk...](https://www.theguardian.com/technology/2019/feb/14/elon-musk-backed-
ai-writes-convincing-news-fiction)

Yet some more signal for this thread that started with a joke.

------
pluma
tl;dr: because Google massively invades your privacy to know whether you're a
bot or not before you tick the box, also the CAPTCHA is a bit more
sophisticated than just showing a checkbox to click on.

------
anth_anm
Bots that can defeat captcha easily are probably the AI development I'm most
excited for.

The amount of time I've wasted clicking on crosswalks and store fronts is way
too high.

------
wifirouterlogin
Interesting

------
_the_inflator
Put another way: basically this is another Google tool to track user behaviour
that validates the fingerprint Google already has. Sneaky!

~~~
scarejunba
Honestly, if there were a web standard where I could opt-in to all of this
tracking and it meant I would be ‘trusted’ I’d happily have my user agent send
them almost anything they want. I trust Google not to fuck me.

