Google reCAPTCHA is the absolute worst. It very often makes me solve several puzzles in a row, usually when I’m on a mobile network and not logged in with any Google account. It’s so frustrating that most of the time, when I find a reCAPTCHA, I give up before trying and just go elsewhere. For example, when a site uses reCAPTCHA for sign-up or after the first failed login, I’ll most likely skip it unless I absolutely need to access that website.
Glad to see I’m not the only one who’s getting tired!
I am starting to think that I may be a machine given how many times I fail that stupid test...
The traffic light one is absolutely the worst. Should I select the box that includes the vertical yellow pole? And what about the horizontal one between two opposite traffic lights? And if just a small fraction of the border of the traffic light is in a box, what should I do?
I can’t wait for a captcha solver as a service honestly...
I can't use Google search directly any more without being presented with a never-ending series of these things. Thus I use DuckDuckGo, and if it returns inadequate results I rerun it with "!g" at the end which sends it to Google. That seems to work without CAPTCHAs.
The worst part is that quite a huge percentage of the Internet relies on it! Soon we won't be able to use any sites whatsoever because of it. I don't like where this is going. :/
Indeed, I definitely am not happy with how much control Google has over the Internet in general. Everything from how they present and rank search results, Google Analytics scripts everywhere, the sometimes vaguely-political messages on their homepage, the ostensibly-anti-bot checks including CAPTCHAs and just plain banning you if you want to do more "advanced" searches (like the ones Fravia would've taught...), etc.
There's other (mysterious to me) stuff that sites call to Google for, apart from captchas and analytics.
As a uMatrix (and former NoScript) user, I've long noticed that many sites make calls to ajax.googleapis.com for I have no idea what. Quite often the site will refuse to work without that.
To be fair, a lot of sites make use of javascript from a lot of other sites as well: cloudfront and amazon are common.
As a uMatrix user as well, I have manually whitelisted ajax.googleapis.com. It is actually not tracking scripts, but hosted libs like jQuery, which many many sites use and will not function without. See here: https://developers.google.com/speed/libraries/
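(Incidentally, sites that pull jQuery from that CDN could keep working for blocker users with a local fallback. A rough sketch, assuming the site also hosts its own copy at /js/jquery.min.js; the version number is just an example:)

    <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js"></script>
    <script>
      // If the CDN request was blocked (uMatrix, NoScript, ...), window.jQuery is
      // undefined, so fall back to the locally hosted copy.
      window.jQuery || document.write('<script src="/js/jquery.min.js"><\/script>');
    </script>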
It's using you as human labor to label data. That's why you have to solve multiple. It's annoying but greatly helps accelerate research and development since data is king.
No, I don't enjoy Google Books, but that's really not the point.
If that's what I'm helping you do then tell me that and let me opt-in. Maybe I want to contribute. But, don't instead throw up a road-block to the service I'm actually trying to consume, under the guise of providing that service, then double my required effort to help you provide some other service.
It's actually implemented as a dark pattern. If you want cheap labor then use Mechanical Turk or otherwise. But, don't steal thousands of hours of your customers' time under misleading pretenses.
Google isn't providing a service to you, they are providing a service to the site that uses the captcha. And they haven't put any roadblocks, the site did - by installing a captcha.
You should complain to the site that decided to offload their captcha costs to you.
I don't see why they would. If some delivery service had an expensive pay-on-delivery model, and some ecommerce site decided to use it, but didn't tell you (the buyer) that they did, and so you got a surprising shipping bill on receiving the product, would you blame the delivery service?
Your relationship is with the site, and so it should be their responsibility to inform you of the "bill" you'll have to pay to Google for using the site's captcha-protected services.
The analogy doesn't hold. For this argument to be valid, we'd have to pretend that dark patterns don't exist.
And, Google is definitely employing a dark pattern here. This is designed to be a stealth tax, to go largely unnoticed by anyone, including the site operator.
Even Google's selling of the product doesn't make clear that users will be doing additional work; instead suggesting that saving the world will be a by-product of a normal captcha process. [0]
This isn't true any longer -- reCaptcha used to help digitize text but now it's all Google Maps (streetview) images they're using to train their own self-driving and Maps software.
I think it works by labelling something like half the images, but doesn't tell you which is which. At least that's how the older text recaptcha stuff worked - you could easily identify the pre-labelled text by the kind of distortions they used and just enter any old gibberish for the rest. So actually the training stuff is extra work beyond just classifying you as human
By showing the same images to thousands of other people and going with the consensus, I imagine. For any given label, if you pick a photo that few other people picked, it presumably considers that to be "wrong."
It's because Google no longer just tests whether you are a human; they also train their model. Try this (I've tried it successfully far too many times):
Enter your first captcha wrong, and about half the time it won't say you got the first one wrong (because the first was a learning set). Do the second one right and then you're good.
At times I deliberately mark the exact opposite of the right boxes in the first captcha; it takes me the same time, but at least I can say I delayed Skynet.
You are training it to better translate scanned books for Google Books. That's where the samples come from. They are words from actual books that Google isn't sure about. Stop messing up Google Books, please.
In my experience if you just press next captcha without solving them, a bunch of times, it will stop with those and give you one of the "pick all storefronts" with multiple pictures, that can actually be solved.
To me it seems that if you take your time and try to answer them properly, they will try to get you to annotate as much data as possible, while if you just click through fast it'll let you pass.
I got the 'pick all the storefronts' one just this week.
Was mildly infuriating because what constitutes a storefront seems pretty culture-specific, and I got shown a bunch of buildings where for half of them I had to go "well, that MIGHT be a storefront, it looks kind of commercial, but it doesn't appear to have much explicit signage... god dammit..."
Indeed, it needs to bloody stop. It's no different than a whites-only washroom in 1929. I'm not logged into Google tracking, so I'm not allowed to participate in a huge swath of the internet unless I spend 15 minutes a day training AI. When will UX be taken seriously? When will users (aka humans) be respected?
The most annoying thing for me is that I often encounter them on various Brazilian government websites when I want to get some official documents.
This is how things are done at the Federal University I'm currently attending. I have filed a complaint against the university on the grounds that they're using Federal resources (the students' labor, whose place there is being paid by the government) to enrich a private, international corporation (Google).
This is wrong on so many levels.
There's no decision yet, and the process is still running.
This is one of the major pains of building a platform: you also have to limit the damage that a "trusted" user can do, like if a moderator were to get hacked via username/password reuse.
I learned this the hard way on a forum I built and ended up painstakingly building a feature that lists all moderator actions and a "Reverse" button next to them. ...Among many other similar features.
It's funny to think back on the day I decided to build a forum: "How hard could it be?"
the word combination "fuck Google" has become very common in my thoughts when I encounter Google's reCAPTCHA... Google is breaking the web in many ways and this is one of them.
On my part it is a direct reaction to developers and their employers cutting corners and adding these challenges to login forms and anything else you can imagine. It's entirely reasonable to show a challenge after a couple of failed login attempts, but they should never be part of the default login flow. These decisions hurt users.
If you work on a product that shows a CAPTCHA while logging in, please discuss this issue with your team and consider not challenging your users during their first login attempt.
I do use a CAPTCHA, but not on a login form. It's solely for a "contact us" form. We do try to encourage just regular email with a mailto: href, but unfortunately, customers expect a form.
And, if I don't use the captcha, we get flooded with spam. We are using Google's "nocaptcha", which is usually unintrusive, but is a pain for anyone not logged into some Google property.
Also a pain for anybody using tracking blocking. I get captchas all day on all kinds of sites and I'm always logged into anywhere from 3-12 google accounts.
We do put a clickable mailto: url on the page as well. Unfortunately, unprotected forms get tons of automated spam. Emailing a verification doesn't work well...customers just don't pay attention to instructions :)
I just base64 encode my email address and decode it in an onload event handler. Most bots don't seem to execute JS because I haven't received a spam email yet.
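(A minimal sketch of that trick, for the curious; the element id is arbitrary and the encoded string is just base64 for a placeholder address, "me@example.com":)

    <a id="contact-link" href="#">email me</a>
    <script>
      window.addEventListener('load', function () {
        // atob() turns the base64 placeholder back into "me@example.com";
        // bots that don't execute JS only ever see the useless href="#".
        var addr = atob('bWVAZXhhbXBsZS5jb20=');
        var link = document.getElementById('contact-link');
        link.href = 'mailto:' + addr;
        link.textContent = addr;
      });
    </script>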
I remember reading about a blog that used this method something like 10 years ago, and wondered why I don't hear about it more often. Glad to know it still works just as well as back then.
I made a contact form that sends the data URL encoded if JS is disabled, or JSON if JS is enabled. If the back end gets url-encoded data, it's spam.
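(Roughly, the server side of that check might look like this on a bare Node backend; the route and field names here are made up. The plain HTML form arrives url-encoded, the JS path posts JSON, and only the latter is treated as a real message:)

    const http = require('http');

    http.createServer(function (req, res) {
      if (req.method !== 'POST' || req.url !== '/contact') {
        res.writeHead(404);
        return res.end();
      }
      let body = '';
      req.on('data', function (chunk) { body += chunk; });
      req.on('end', function () {
        const type = req.headers['content-type'] || '';
        if (!type.includes('application/json')) {
          // url-encoded submission: almost certainly a bot filling in the raw form
          res.writeHead(200);
          return res.end('ok');   // accept silently, discard the message
        }
        let msg;
        try { msg = JSON.parse(body); } catch (e) { res.writeHead(400); return res.end(); }
        // ...store or email msg.name / msg.email / msg.message here...
        res.writeHead(200);
        res.end('ok');
      });
    }).listen(3000);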
I also use the hidden field trick, but I label it phone number and fill it with zeros. Then I hide it with an external stylesheet. I don't actually want a phone field; it's a decoy. If it is changed from zeros, it is spam. Most spam bots don't seem to parse external stylesheets.
I've had zero contact form spam with this method. Mind you it is a very low volume site.
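(For anyone who wants to try it, a sketch of that decoy; the class name and zero value are arbitrary, and the hiding rule lives in an external stylesheet, e.g. .hp { display: none; }:)

    <!-- decoy field: humans never see it, naive bots alter or overwrite it -->
    <input class="hp" type="text" name="phone" value="00000000"
           autocomplete="off" tabindex="-1">

    // server side, after parsing the submitted fields:
    function looksLikeSpam(fields) {
      // if the prefilled zeros were changed, a bot touched the decoy
      return fields.phone !== '00000000';
    }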
I was just having fun with it, making a php-style contact form backend but using node.js. I read lots of those old blog posts you mention.
Add some text asking a question like "What is five plus two?" with an empty text field next to it, and on the backend check for the trivial right answer.
That should stop a reasonable amount of spam, still allow a field-based HTML mail sender, and eliminate a dependency on Google.
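(A sketch of how that could look server-side; the session object here is a stand-in for whatever per-visitor storage you already have, and the numbers could just as well be fixed:)

    // stand-in for a real per-visitor session store
    const session = {};

    // when rendering the form: pick two small numbers and remember the sum
    const a = 1 + Math.floor(Math.random() * 9);
    const b = 1 + Math.floor(Math.random() * 9);
    session.expected = a + b;
    const question = 'What is ' + a + ' plus ' + b + '?';

    // when handling the submission: anything but the right sum gets dropped
    function passesChallenge(fields) {
      return parseInt(fields.answer, 10) === session.expected;
    }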
For a contact form I think it's ok to show a challenge. It won't eliminate spam entirely or work against targeted attacks, but at least it keeps out the bulk of annoyances (at the cost of annoying some of your users :P).
EDIT: Have you considered putting the messages in a pending state instead? You could ask for their email in the contact form, and send a confirmation link that needs to be clicked for the message to be validated. Unvalidated messages could be deleted after a week, without any human interaction. I'd expect this to have comparable effectiveness against untargeted spam.
Yes, and Google also blocks the audio challenge sometimes, so people with impaired vision or with neurological issues that affect object recognition are effectively denied access to a large portion of the web.
Nocaptcha, aka reCAPTCHA v3, is the worst choice. At least with v2, users of Firefox/adblockers/etc. get a chance of passing the captcha by answering challenges; v3 takes that opportunity away from them.
The reason websites are CAPTCHA laden is because they want to make it harder for people to legally scrape their sites. For this reason, you'll be hard pressed to convince the owner of a site with a login CAPTCHA to remove it.
I think that stems from flawed security models. CAPTCHAs are trivial to bypass, and scrapers can keep their session cookies to never see the login challenge again. Regular users are the ones suffering from CAPTCHAs on login forms, not scrapers.
>It's entirely reasonable to show a challenge after a couple of failed login attempts, but they should never be part of the default login flow.
Nope, that would only work if there weren't botnets that multiplex millions of requests across thousands of websites.
Anyone who has tried IP-blocking bots has run into this where 50k+ IP addresses just need to send you one request per couple minutes.
Your posts in this thread show how easy it is to be against recaptcha without addressing why people use it, or you suggest it's a flawed security model when people use it. It's kind of hard to take your advice seriously when I think of real world websites fighting real world abuse.
In another post of yours, you recommend the web developer to just expend more and more effort to circumvent abuse without recaptcha. Like creating a whole pending message system for, say, a forum instead of just a contact form. In fact, you'll find that there's no shortage of work for you to do once you attract abuse. And that's an easy solution to prescribe when you have no skin in the game. God forbid the site doesn't even make any money.
I would have appreciated if your post would not have been a personal attack. I was suggesting limiting login attempts based on a particular user account, not an IP address, and I have never talked about a pending message system for forums.
Most brute-forcing at scale isn't for a specific account, it's for {uname,password} tuples. As in, you buy these combo lists online and rent a botnet to try them out on a list of sites.
What would your advice achieve?
I changed the "you" to "your advice", but I think that's a petty distraction since I'm making specific points against your advice, not you as a person. And this comment is an example of what seems to be a disconnect between your understood attack models and the attacks that people actually need to defend against.
Discord defends against combo lists by showing a CAPTCHA and asking for email confirmation if you log in from a new location, but no challenge is shown during the default login flow.
reCAPTCHA's tile fade-in doesn't punish bots; it punishes real humans. Your argument might be valid if reCAPTCHA were a true CAPTCHA, rather than a compliance/recruiting mechanism used by Google to abuse people until they relent and adopt Google products.
But yeah, it's free so you can't resist using it. Have you ever stopped to wonder why google gives this service out for free? When you integrate reCAPTCHA into your website, you're selling your users out.
Humans are the problem, not some mythical AI powering the bots. It costs a fraction of a cent to get a CAPTCHA solved. The only real counter being used is making it take longer to solve a CAPTCHA - which is exactly what the services like reCAPTCHA do, while minimizing the impact on heuristically 'good' users.
Making CAPTCHA solves take 30 seconds or a minute instead of 5 seconds is the state of the art.
The heuristic google uses for 'good' is 'compliance with the google surveillance system.'
And if throttling attempts is in fact the state of the art as you claim, you can do that without pulling in google code at all. I trust you are that competent at least. However the fact that the noscript version doesn't make you wait at all leads me to conclude your excuse for google's behavior is bullshit.
> And I SUCK at these to the point where I think I’m not getting the rules of the game.
I have a feeling that they've switched from a human source of truth (e.g. 3 other people got the same tile and consistently classified it as X) to a machine source of truth (e.g. their image classifier says it's X).
Now instead of thinking like a human to solve a captcha, you have to think like their sort-of-shitty "AI."
Also, the recaptcha puzzle I hate the most was the one where you had to classify street signs--but then it showed you foreign signs in foreign alphabets. Street sign conventions can vary so much around the world that the puzzle was manifestly unfair. I haven't seen this one in a while, so I do hope they retired it.
This has tripped me up as well. I’m somewhat sure now that only the bulbs matter. I have also had higher success rates when not marking tiles that contain only a very small part of what they are asking for. They are sort of teaching me to better approximate more careless people.
I'd guess they just make you solve an appropriate (TM) number of puzzles to be absolutely certain you really are not a robot.
Of course, being logged in to your google account, preferably in chrome, and not blocking any of their scripts or cookies, would also go a long way for that. wink
>"I'd guess they just make you solve an appropriate (TM) number of puzzles to absolutely certain you really are not a robot."
If it were a matter of security then they wouldn't allow people with javascript disabled (which could reasonably be considered a bot-ish characteristic...) to pass through after solving a single challenge. Given how much easier the noscript version is to solve, who would make a bot that requests and solves the normal version instead of the noscript version? Any bot making a semi-serious attempt to crack their captcha will try the noscript version first.
Multiple challenges aren't there for more security, they are there to get more free labor out of the human. And the tile fade-in is to punish the human for not using the web in a Google Approved™ manner.
I feel like I have more success if I deliberately wait a few seconds before clicking things. Have you watched an average computer user click things? If you're posting here you probably have extreme outlier mouse skill, and I think Google penalizes anything unusual.
You're doing it right. Google is gaslighting you; lying and telling you you've failed challenges when you actually solved them correctly. They do this to punish users who opt out of the google 'ecosystem' by not having a google account, not using chrome, using adblockers, etc.
The proof of this assertion comes when you manage to enable the noscript version of reCAPTCHA (which is only available on sites that have opted to use the lowest security setting). Once you start using noscript reCAPTCHA, you discover that your correct answers are accepted the first time every time. The challenges have the same format; click the cars, click the traffic lights, etc. There are two differences: the tiles don't fade in slowly, and the correct answers are always accepted.
Presumably when google implemented their dark patterns in reCAPTCHA, they couldn't be bothered to implement them when javascript wasn't available. I hesitate to draw attention to this since Google might correct their error, but I'd like for people to become more aware of their anti-social business practices.
(By the way, the noscript version will accept either sort of answer. Only the bulbs, or the entire enclosure. Both answers are accepted.)
I actually don't think that they are trying to punish people for not being logged in. If that were the case, they would make it clear that having a Google account and leaving it logged in, and using chrome would make the captchas appear less frequently or not at all.
In reality, I think they are telling people they were wrong in order to extract just a little bit more free machine learning data from them. And really, they don't actually tell you that you are wrong, they just keep feeding you new questions. This is most likely because they don't know whether your answer is correct or not. They are giving this specific question to a bunch of people, then comparing the answers to get a correct response for later.
So you might get a few training questions first, then end with a question that they already know the correct answer to so they can decide if you are trying or not.
It's quite brilliant and is going to really catapult Google ahead of the rest of the world when it comes to self driving cars. The cars will be able to use all the captcha data to not only train their ML models to recognize things like signs and whatever, but the ML won't even be fully necessary since they have the re-captcha volume to basically directly identify every roadside visual on the planet one by one.
I've seen those ones too, and was planning on writing a filter that replaces the "normal" ones with the noscript one, on the assumption that it was just someone not copying in a "<noscript>...</noscript>" fragment, but since you mention "is only available on sites that have opted to use the lowest security setting", I suspect that won't work.
Accessibility guidelines used to mandate that content was accessible without JS, which may be the reason why the noscript version exists, but it seems the latest revision has unfortunately removed that requirement. No, I will not run arbitrary code on my computer just to access your site...
I think people realize that, but is it much worse than proof of work that is helpful to nobody? It's easy to take a position against Google and Recaptcha. It's easy to take a position against something that inconveniences you.
What people actually don't seem to realize ITT is that abuse is becoming so easy and such a problem that we are becoming increasingly reliant on centralized services like Cloudflare and Google.
You used to be able to just generate your own captcha on the server with simple libraries, but Xrumer (mass website-spamming software) could crack those 10 years ago.
I'd like to see more comments addressing the ever-lowering barrier of online spam/abuse instead of opting for the low hanging fruit of condemning people for trying to save their websites/platforms from it.
> I liked the idea when they just showed images of words that an OCR couldn't read accurately.
Like I said, popular spamming software like Xrumer could crack those captchas ten years ago.
> And no, I disagree that we have no option but to rely on "centralised" services like cloudfare or Google.
Can you pitch alternatives, though? For example, an attacker can still spoof IP addresses in 2019 and create volumetric attacks that you certainly cannot endure without someone's help upstream (i.e. centralization). No need to bother with spoofing though since you can rent a botnet for peanuts. Attackers have decentralized attacks but there is increasingly only centralized defense.
In my mind the answer is in building decentralized apps/services.
A DDoS on a static site cached on just about any CDN that runs logic exclusively on the client is much harder to pull off successfully because it's so much cheaper (practically free?) to mitigate, and doesn't affect any existing users who would already have the necessary resources cached locally.
I thought that until my sites behind AWS' CloudFront were repeatedly DDoSed and I saw my bill.
AWS did let me report these DDoSes and they would reimburse me, but it felt wayyy too precarious and I ended up switching to Cloudflare (free).
And I think that should worry us all.
Also, only the most trivial sites can be 100% cached. And those are the sites who need Recaptcha the least (or need a server to get a challenge from). Abuse is not a simple issue to solve.
> I think people realize that, but is it much worse than proof of work that is helpful to nobody?
It pollutes the original intention with a different goal. Is the goal to easily distinguish between bots and humans, or is the goal to trick people into doing free classification work for Google? Different intentions lead to different incentives which may lead to different results.
My first reaction to your comment was that I would be pretty shocked if anyone here didn’t realize that, and wouldn’t you know I have now read more of the comments and here I am shocked.
The idea that this problem results from heightened security measures is wrong, but it’s not laughable; it’s just sad.
Whenever I browse with the TOR browser, it's been 100% impossible for me to verify myself as a human even if all my answers are correct.
I think they need to fix this bug, or at least give a message that the CAPTCHA won't be solved so we no longer waste our time.
Their No CAPTCHA's are very rare for me when I'm browsing logged in to my personal google account under the same session, on a normal browser.
They can be confident I'm not some kind of a bot, yet they still require me to solve on average two different tests to train their "AI".
Oh wow, yeah I use FF with uBlock Origin and they are not short by any means.
An aside from the corporate dystopia at hand: I do not know as much as I wish about how this works, but am intrigued at how the objects so often bleed into other “boxes” at just the right amount to demand multitudes more cognitive energy to negotiate with myself over what side of this false binary to place my bets on.
Back to corporate dystopia: my awareness of the procedure and intent is so blackboxed that I feel like a mule. Since I started approaching them with sloppy selection and minimal to no discernment, I'm doubtful that the length of the challenge has any correlation with a measurement of suspicion at all. Rather, those liable to be this considerate will similarly recognize the wrath of the cookie monster.
Proving one is human takes human effort. Given that, I'd rather do something marginally useful than something completely pointless.
It would be cool if they released the gained data as open source, but that might compromise the service, and I guess someone has to develop and host this thing, so keeping it to themselves is fair enough.
The only issue is the conflict of interest (they benefit from giving me more captchas), but they don't seem to be abusing it for that end. They sure abuse their position of power by hellbanning you when they suspect foul play, but I doubt that it's because they want you to do more work.
The problem is that as soon as you're getting value from users solving captchas, there's an incentive to show them more captchas to solve before accepting that they're human.
Incidentally, it's a mystery to me why so many of the Gutenberg/Internet Archive books have such ludicrously bad plain-text renderings, utterly unusable/unreadable, with often barely one word correct per page - since neural nets have been getting very high scores on MNIST handwriting recognition (for example) for a long time now, maybe since the 90s? It's a shame.
On any level, the data represents an average product of human visual calculation. A simple use case would be comparing that with the product of machine visual calculations to better understand and optimize the systems they are designing.
I find it strange how all the comments here are blaming Google. Isn't it obvious that CAPTCHAs have gotten difficult because AI got better at solving them? Soon bots will be better than humans at solving CAPTCHAs, and the system will fail completely. I predict that Google and Facebook will then completely block new user signup from Tor, VPNs, or browsers without cookies. Everyone else will require an existing Google, Facebook, or similar account to create an account.
We hate them for the same reason we hate airport security theater; they do not work and have a high burden on the people being subjected to them. Plus, as it’s Google doing it, you can hardly escape the goddamned things. So yeah, we blame Google for using us as data classifiers and adding hoops and hurdles to the open net, while accomplishing precisely dick.
I for one hope that AI gets to the point that it can effortlessly beat them, so we can stop dealing with them.
> We hate them for the same reason we hate airport security theater; they do not work
Well, they absolutely do work. They work so well that they've reduced bot actions by almost 100% on our sites.
We wouldn't use Recaptcha if there wasn't abuse on the internet. But, unfortunately, there is. There is a sobering amount of it.
I'm actually curious about all these posts suggesting that websites use Recaptcha for no real reason or for some trivial reason. To me, it suggests a massive misunderstanding that people have about the internet.
It's certainly something to worry about, but how about this angle: abuse is getting so cheap and hard to prevent that we're electing the aid of complicated systems engineered by large corporations like Google. That scares me, but not from a Google=bad standpoint. It indicates that the internet has fundamental problems that make abuse trivial, and that's a different discussion worth having, but it's a much harder one than Google=bad. Probably less cathartic, too.
The false negative rate may be very low, but this doesn't speak to the false positive rate. Considering the goal was supposed to be "telling humans and computers apart," both kinds of errors are important to consider. Google predicates their captcha primarily on the logged-in user's activity around the web; failing that, they use the IP. A human being who is not logged in and using an open proxy is likely to get shut out completely. It ceases to be a captcha, in the traditional sense, at that point. This may be a good thing, depending on your standpoint.
Sure, but that should be a chilling reminder of how bad abuse is on the internet.
Obviously it comes with downsides, but it's a trade-off. Nobody uses Recaptcha for fun.
As I reminded a sibling comment, even HN uses Recaptcha on its login/register page. There's no telling how many fewer spambots we have to deal with every day because of it, yet we're somehow here discussing whether Recaptcha serves a purpose while profiting from it. :)
> Well, they absolutely do work. They work so well that they've reduced bot actions by almost 100% on our sites.
> We wouldn't use Recaptcha if there wasn't abuse on the internet. But, unfortunately, there is. There is a sobering amount of it.
> I'm actually curious about all these posts suggesting that websites use Recaptcha for no real reason or for some trivial reason. To me, it suggests a massive misunderstanding that people have about the internet.
Or we’ve just managed to notice that CAPTCHAs don’t seem to keep millions of bots, spammers, dummy accounts, shills, etc off the net. I’m glad you’re having such perceived success, but it’s not a universal phenomenon.
Captchas do do that. They can't do everything, but they are an incredibly powerful tool.
Do you run a service that needs to prevent abuse at scale? What exactly is your Recaptcha replacement?
It's very easy to complain about the inconvenience of Recaptcha, but I'd like to see less of that and more constructive conversation about what all these supposed alternatives are, because all I see is "you don't need it" which is merely a reminder that, understandably, most people aren't running sites at a scale that attract abuse.
Even HN uses Recaptcha. Check out the login/register page.
It's not even bots that are breaking them much of the time. The spammers just get people in third world countries/using mechanical turk type services to break them. Captchas like this cannot stop a human determined to break the rules.
I think the main thrust is that the tool is not fit for purpose, but Google benefits from it anyways. They get the AI training data, but the website owners still get hammered by bots, and privacy conscious users and users who have the audacity to be from a country flagged by RECAPTCHA get to struggle. Living in Russia at the moment, I almost prefer the sites that just outright block me to the stupid RECAPTCHA that cycles me through 3-4 iterations of images only to ask me to try again.
RECAPTCHA is no longer doing what it promised, and that's the general thrust of the article; the bots are just getting too good for it.
It's not that; it's the fact that Google almost never asks you to solve one if you're using a Google account. That pretty much gives Google-account-havers a fast pass to the Internet, which gives Google even more power that it really shouldn't have. Also, they're using the captchas to train their AIs, which is an unfair extraction of labor.
Honestly, I'm finding Google's captchas quite difficult of late because they're Americanised. It asks me to identify crosswalks (oh... pedestrian crossings, I thought you meant the pavement), find traffic lights (I don't traditionally expect them to be above the road or on motorways), or identify storefronts, which are not always clear, maybe because I lack the cultural context.
And I'm from a major Western European city, which is about as close as I can get to American culture without actually being American. I wonder if they present the same captchas if they think you're from rural China or Uganda.
Every time a CAPTCHA thread comes up I have to point this out. By using one you’re externalising your business costs onto your users. You can make that choice, but if you do you’re far more likely to negatively impact the section of society that already has problems online: those who need to use assistive technologies.
I came here because that last point is getting lost in the discussion.
> While a bot will interact with a page without moving a mouse, or by moving a mouse very precisely, human actions have “entropy” that is hard to spoof, Ghosemajumder says.
It's bad enough that systems working from this (highly dubious, IMO) premise will force us all to use the mouse even if we're used to the tab and arrow keys; much worse is that there's no workaround for people who _can't_ use the mouse and rely on switch control. It sounds like an accessibility nightmare.
So, I use Vimium to interact with my browser with solely my keyboard 90% of the time. I wonder how much this in any way correlates with the absolutely infuriating amount of captcha challenges I get one after the other.
Not to mention that somewhere in the back of your head, as a user, there's a running calculation that this is an expensive website for you in terms of effort.
These really make my blood boil. I continually trip whatever it is that makes Google think I'm a bot (probably a VPN + ublock). Sometimes it takes upwards of 5 tries (each with 3 or 4 tests) to pass. After the first failure the audio one stops working, and sometimes that's unintelligible. I honestly wonder how anyone who's even slightly visually impaired is supposed to pass them.
I wouldn't be surprised if in the not-too-distant future they were hauled up before the courts on discrimination grounds, and not before time. There's something very wrong when a human consistently fails CAPTCHAs. For one thing, I've tried selecting all boxes containing parts of a traffic light/fire hydrant, and also only the ones that mostly contain parts of the object, and have failed both ways.
A few years back, Google changed reCAPTCHA to use a bunch of extra information to decide whether someone was a real person. This resulted in there sometimes not being any images at all: once you click to start the CAPTCHA, it just shows a checkmark and lets you through. What also seems to have happened is that they use additional information to make the process much harder sometimes.
Ublock never seems to matter too much for me, but the IP address used can matter a lot. Whether you're logged into gmail/Google seems to affect it positively also.
I suspect the specific VPN you're using is the source of the lion's share of the RECAPTCHA problems you're having. They are probably utilizing a range that was or is used in a way that RECAPTCHA doesn't like.
> What also seems to have happened is that they use additional information to make the process much harder sometimes.
Correct. Your reCAPTCHAv3 (which is the new completely challenge-less version that doesn't even make you click a checkbox) score is a good indication of how much reCAPTCHAv2 (the clickbox version) will fuck around with you.
In case anybody is wondering, v1 was the two-word OCR version and got shut down last year.
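For anyone wondering what that score looks like from the site's side: the page sends a token to the backend, which posts it to Google's siteverify endpoint and gets back a number it can threshold however it likes. A rough sketch (Node with node-fetch; the 0.5 cutoff is arbitrary):

    const fetch = require('node-fetch');

    async function recaptchaScore(token) {
      const params = new URLSearchParams({
        secret: process.env.RECAPTCHA_SECRET,   // the site's v3 secret key
        response: token,                        // token the page obtained client-side
      });
      const res = await fetch('https://www.google.com/recaptcha/api/siteverify', {
        method: 'POST',
        body: params,
      });
      const data = await res.json();           // { success, score, action, ... }
      return data.success ? data.score : 0;    // ~0.0 = "probably a bot", ~1.0 = "probably human"
    }

    // e.g. only bounce the user to an extra challenge when the score is low:
    // if (await recaptchaScore(token) < 0.5) { /* show fallback verification */ }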
I run Linux, use my own VPN, and use Firefox with uMatrix. CAPTCHAs are one of the most user-hostile things I experience on the web. I've had to give up on registering for sites, or signing into sites I'm already registered with, because after literally minutes I still hadn't gotten through. I actively try to avoid sites that use CAPTCHA but unfortunately it's not always possible.
Had the same setup, and I found that on Brave I can do them in 20 seconds compared to 3 minutes. Not saying I support them messing with FF, but at the end of the day I need to get things done.
If you're using an adblocker, you probably also don't have any positively identifying google cookies, which has become more and more synonymous with "bot" to them. They could defend it from the opposite direction: when you're logged in to google they trust they know you, so when you're not, there's a lack of trust. Not suspicion, no, but a factual lack of trust. There's plausible deniability if anyone ever claimed it's suspicion, and that they do it to punish people who want to remain independent and anonymous. But if that's the effect and they don't do anything to prevent it...
To me it feels similar to the duality of gas station "cash discounts," which have also been perceived as "credit card penalties." Or mobile providers' "free data for the music streaming service of our choice."
I use ublock. It doesn't block Google login cookies by default. I haven't noticed many issues with captchas lately, likely due to the fact that I stay logged in to my Gmail account in Firefox most of the time.
It is almost impossible to pass those CAPTCHAs behind Tor. Personally I am not sure if it is possible, I could never pass. It would just give me more and more CAPTCHAs to solve. I gave up after the 6th one.
I think there's a logic bug/feature where if the rate of attempted captcha solutions from your IP is too high, it won't tell you you've failed (it knows you didn't) but it'll keep giving you more and more captchas. Sometimes I eventually get through after 4, 6, 8 attempts. Maybe because the automated attempts slowed down, or maybe just by chance. Sometimes I end up getting the dreaded "too many automated requests from your network" message (the equivalent of Gandalf's "You Shall Not Pass!") which is fucking wonderful when it's not even google's website I'm trying to access.
For Google reCAPTCHA, simply install the Buster addon; it solves the captcha for you via speech-to-text.
For captchas in general, I think we should stop pretending that we can prevent bot traffic from a dedicated attacker without annoying the users.
A simple captcha from the 2000s (the ones with lines over a word or a string of letters and numbers) should be good enough to hold off basic script kiddies. Same for a basic TTS audio clip.
> A simple captcha from the 2000s (the ones with lines over a word or a string of letters and numbers) should be good enough to hold off basic script kiddies
The problem is that while the kiddies may not be sophisticated, their scripts can be: the solutions will work their way into the scripts sooner rather than later.
Fighting against AI cracking is the same battle that AV vendors have fought in the past decades. It ultimately ends by selling snakeoil.
Any dedicated attacker can nowadays circumvent your captcha solution; at best it's about getting the low-level background noise to go away, similar to blocking SSH from an IP that makes too many attempts.
If your site security relies entirely on the captcha not being broken, your site security needs an update.
The next release will use native messaging to send native user input events to the browser. It's already working well, I just need to finish the app installation bits.
This. Google just disables the captcha immediately. Also, I tried clicking the audio option manually; it immediately disables with a notice that "we are getting .... please try again later".
This is so eye-opening. I've been frustrated with these things for a while and I always figured it was me. When asked to click on the traffic signs, I'm never sure whether to click on just the panes that have a part of a sign or to include the cells that show the posts it is attached to. I finally got so discouraged, I tried the audio clues and have found that to be easier. I've found that, after listening closely, I only need to identify a single word, and that is usually relatively easy. All in all, however, I really do hate these things.
> The latest version, reCaptcha v3, announced late last year, uses “adaptive risk analysis” to score traffic according to how suspicious it seems; website owners can then choose to present sketchy users with a challenge, like a password request or two-factor authentication
eg: if you're not browsing the web signed into a Google account and allowing all their tracking. Fuck that.
I've actually started trying to see how wrong the newer captchas will let me be on purpose, either by not selecting all of them or by picking wrong ones. They let you through a lot of the time.
My working theory was that companies like Google were using the capchas mostly to generate AI data, so only a few of the images on any given test were actually already labeled. Any of the other images (particularly the really grainy ones) would accept any answer because they were genuine classification questions.
Reading this article, I wonder if it's not even that -- that companies like Google are assuming, "you're not going to get everything right, so we'll give you some leeway."
I remember when a forum I browsed introduced the text captcha. Users intentionally typed the same incorrect expletive for the word not known to the captcha; it was easy to tell which one that was by the font. The goal was to skew the algorithm.
Well, Google’s traffic CAPTCHA’s main purpose is to label a huge data set for Waze. At least it must be a hugely beneficial (to Google) side effect. Am I wrong?
So if it asks us to click on traffic lights, and a large enough group of people click on, say, red cars, can we make their self-driving cars stop when they see red cars on the road?
Come on HN, lets all do this for a few days, you know we can do it ;-)
The issue isn't just that humans struggle with them or that bots are getting better or what not, it's because there's no way to make a captcha that works across multiple websites like a standard 'library' and expect it to remain uncracked. Anything that becomes common will be attacked and defeated, because there becomes a financial incentive for spammers and no gooders to do so.
The solution is to make captchas that are bespoke to each site, since it means the same bot or script can't be used on every one and spammers have to go out of their way to crack each one. You can already see this right now; sites with their own systems generally get no spam at all.
But given that most people aren't programmers, they're stuck with mainstream captcha systems, which present a giant target to the internet's ne'er-do-wells.
Niche sites can avoid the issue with topic specific questions though.
1. It's not feasible for every website to implement its own custom CAPTCHA format. Building custom CAPTCHAs is a lot of work.
2. The custom CAPTCHA tasks wouldn't be that different from each other. As the article discusses, image/text/audio recognition are some of the only universal tasks that can work for CAPTCHA.
3. Nothing is stopping a malicious actor from implementing a "check which type of captcha" function and then selecting one of several CAPTCHA cracking functions. Fragmentation of CAPTCHA format just delays the cat and mouse game.
1. As I said, this is a huge reason stuff like Recaptcha exists, and why custom ones can't work here, even if they're probably better if done correctly.
2. You can also use stuff like timing how long it takes someone to fill in the field, hiding form fields with CSS or JavaScript, randomising field input names, checking the referrer, etc. All these come up in tutorials about captchas.
3. You could ask them niche specific questions instead of requiring them to do general tasks. This is what I do with all topical internet forums and sites; have a wide array of custom written questions on the topic in place of stuff a bot can easily figure out. For instance, all questions on Wario Forums are about Wario Land and WarioWare games, not things meant to be 'culturally neutral'.
#3 (a rotation of specific questions) is definitely a measure some sites could use, but as you point out, it's incredibly niche -- I've only even seen it on forums. For example, what questions could Reddit ask you? Wario Forums is pretty much the ideal on the niche spectrum, so it's not a very useful baseline for comparison.
I rotated questions on the /register page for a large forum I run, but as my forum became more popular and more of a spam magnet, my attackers simply built a lookup table of my questions->answers. I regressed back to Recaptcha.
Another problem is that I was surprised how many legit users would be pruned out by a simple question like the equivalent of "what color is Wario's hat?" for, say, a forum that covers games in general. I did basic stat tracking on the pass-rate per question to know which were bad ones, and it seemed pretty random which ones users had trouble with. Or they'd accidentally be riddles like (made-up example) "How many triangles in a triforce?" 3? 4? 5?
And people would finally register and complain on the forum that a seemingly trivial question was too hard. Or they didn't know what "the website footer" was.
At a point, especially if you're not so extreme on the niche/theme spectrum, Recaptcha was the better trade-off.
I've said this in another comment, but I'd love to see an HN submission where we discuss anti-spam/anti-abuse strategies instead of just doing the easy thing of bashing Recaptcha.
You're right, it's only a solution for niche sites rather than ones aiming at all users. Obviously, Reddit/Facebook/Google/YouTube/whatever are out of luck here, their audience is basically 'everyone on the planet' and they don't have any real way to test that.
And you've also got a point that a certain percentage of legitimate users would be pruned out by a simple, topical question. There are probably a few people who couldn't register on Wario Forums cause of this sort of thing, and there were probably a few who couldn't join my previous sites cause of it.
So your questions would have to be very much tied to the audience. General gaming site? Asking who this is with a picture of Mario, Link, Pikachu or Sonic the Hedgehog would work pretty well. Niche site? A bit more obscure, to go with the audience likely to be visiting there.
That said, I think a few things need to be kept in mind:
1. Firstly, a lot of niche sites already have fairly strict requirements to get in, and have a more drawn out approval process than the norm. For example, quite a few I know of have you required to post an intro in an 'approval' forum in order to get access to the rest of the site or server. So I suspect users on these sites may be more used to having to think/research the process to join a forum than those on Facebook.
2. To some degree, it also filters for people who are genuinely interested in the topic to a more than average degree, which may overlap well with 'people likely to stick around for the long run'. For example, the people likely to remember King K Rool's guises in Donkey Kong Country 2 and 3 may be good users at DK Vine, someone who could identify Rawk Hawk or Flavio would be more likely to be a good Mario RPG forum member, etc.
It's a bit like the comments I've heard about Ling's Cars... the only people who shop there really, really need a car.
Actually, maybe a bit like Hacker News too. The people most likely to 'tolerate' the old school design here are well, web developers, old school hacker types, etc.
Either way, it definitely all depends on how niche the site is.
If you need CAPTCHA I may suggest plain text CAPTCHA (preferably ASCII only) with entirely server side computation, meaning anyone can read it and has maximum compatibility. If necessary, make your own rather than using an existing package, since that makes it less likely that automated spam will get through if you use a different one for each thing.
However, you should never need CAPTCHA to login (except possibly anonymously; Fossil requires a CAPTCHA to login anonymously), or to do stuff while logged in. You should not require CAPTCHA to read public information either, or to download (since you may wish to use external download management; for example, I prefer to use curl to download files rather than using the web browser, and it seems that I may not be the only one).
Of course manually entered spam will still get through even if you do use CAPTCHA.
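(To make the "roll your own, plain text" idea concrete, here's one trivial shape such a thing could take; the noise scheme is made up, and the point is precisely that yours would look different from everyone else's:)

    // generate a challenge: a short random code shown with junk characters mixed in,
    // with the clean code remembered server-side for comparison on submit
    function makeTextCaptcha() {
      const code = Math.random().toString(36).slice(2, 7);   // e.g. "k3f9q"
      const noisy = code.split('').join(' ~ ');              // "k ~ 3 ~ f ~ 9 ~ q"
      return {
        prompt: 'Type these characters, ignoring the ~ signs: ' + noisy,
        expected: code,
      };
    }

    function checkTextCaptcha(answer, expected) {
      return answer.trim().toLowerCase() === expected;
    }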
I fear any test where a machine has to decide whether you're human enough, is always going to be easy to game for a machine. You can't replace humans with machines and then not expect machines to replace humans.
When you look at it that way, the whole captcha approach, no matter how clever, seems doomed to fail.
Why not simply allow bots? If it is because bots exhibit behaviour you don't want (like spamming), why not filter them based on the behaviour you don't want? Learn to recognise spam rather than fabricating some test. And when bots are truly indistinguishable from people, is it really a problem that they're not real people?
What's actually wrong with reCAPTCHA is that google has convinced so many sites all over the web to require it to use them, and all that free labor is going to just improve Google's machine learning programs.
Google: Hey, you don't have any Google login/session cookies? Not even one from a previous login (yeah, you can't fully log out)? That's wrong! Here, click on the traffic lights for 5 minutes! Or have one of those super slow fade-in fade-out captchas!
The fade-in fade-out captchas are especially infuriating. Captchas are already bad for UX but those which purposely slow down how fast you can solve them are on a whole different level.
Been testing captchas inadvertently quite a bit in fresh installs across multiple VMs. From what I can tell, it has nothing to do with whether you have cookies or privacy configurations; it comes down to whether you use Chromium or a shared IP. If you use Firefox, even without privacy configs, expect to spend 3x the time you would in Chrome, and that's not even counting the fact that those fading images load 5x slower on Firefox. If you use a shared IP (in my case a $6/mo VPN) plus Firefox, it's not even worth trying IMO, as it can take 3+ minutes to complete captchas on most sites; it's much quicker to just open Brave and complete it in 10 seconds. The number of tries you have to do also has nothing to do with getting all the pictures correct.
I use Firefox android with Ublock origin and am logged into Google (boo). I used to use a plugin to change my user agent to Chrome when on Google sites. This was necessary as you get the old style search results page if you use Firefox, but if you pretend to be Chrome you get the current page style which works perfectly. With that plugin enabled I always had to solve multiple recaptchas and my recaptcha v3 score was 0.1. Disabled it jumps to 0.9 and v2 gives me no puzzles. Guess I will have to live without shiny Google search results then.
> newegg lost some of my business recently after thinking it was a good idea to make me fill in a captcha before taking my money.
Probably they are combatting fraud, especially the variant "check if the credit card is still valid". There's not much defense against a botnet operator trying out a 100k dataset of stolen CC numbers other than captchas :(
For the interested, this kind of fraud simply orders cheap (on the order of 1-2$) stuff online to check if the card/cvv is valid. Doesn't draw much attention unless one of the victims has transaction notification active or diligently checks their CC bill.
Except when you're getting 3min+ captchas on FF they don't give you the speech recognition option, so it's either open up chrome and do it in 15 seconds or nothing.
So far it's always worked. If they don't give you speech recognition at all, that would be an ADA violation (the ADA was recently ruled to also apply to websites and apps).
In my experience, buster also works better when you don't use the Google Speech API but any of the other ones so Google can't correlate.
> Malenfant says that five to ten years from now, CAPTCHA challenges likely won’t be viable at all. Instead, much of the web will have a constant, secret Turing test running in the background.
I wonder how tracking-based captchas can be compatible with privacy regulations like the GDPR. Do you have to positively opt-in to a website seeing whether or not you're a robot?
We're basically moving towards a world where the venn diagram for the web and privacy no longer intersect.
I'm not convinced OCR is as good as humans. I recently did a project making an unauthorized copy of a rare ($2000) book from a university library. I scanned in every page, but Tesseract OCR really struggled with pages that started off straight but curved off. I tried lots of preprocessing techniques with limited success. My options were to type the text in by hand or rescan the page so the lines were straight.
What a weird way to compare OCR to humans. If a human can't see the writing because the page surface is curving away, they'd adjust the page surface. Likewise, getting decent scans is the most cost and effort effective way of getting good performance in OCR.
I personally found tesseract to be incredibly good, and have even used it in non-traditional OCR applications for doing things like reading signs.
I'm saying that I can read skewed/bending pages easily, but I'm having a super hard time getting tesseract to play nice with such images. I almost always need to rescan.
Tesseract is incredibly good... as long as your lines of words are straight.
I'd welcome a CashCaptcha that charged me $.05usd to click past a reCaptcha. They happen enough to be annoying, but not often enough to present a financial burden - but if I were a spammer trying to abuse automated access, the actual cost might finally outweigh the return.
It seems they are difficult because they're intentionally designed to de-anonymize users coming in over VPN/Tor, using the small variation in click timing. If you see one of these and want to stay anonymous, close the tab and walk away.
Is it not monopolistic behavior that Google favors their own customers in their captchas?
I hope the EU fines Google for leveraging their security library prevalence to coerce people to use Chrome and/or open Google accounts.
I also wonder if that’s GDPR compliant: unless you accept Google’s data collection terms on GMail and/or Chrome products, they will use their position as security authority to degrade your browsing experience on third party sites.
Each time I'm faced with a Google reCAPTCHA I think about how I, and so many people like me, are unwittingly helping to train our eventual robot overlords.
I've long suspected that Google quickly realizes when the user is human, but then serves up some more images for them to "solve" to get some extra training data for its pattern recognition AIs.
> Google wouldn’t say what factors go into that score, other than that Google observes what a bunch of “good traffic” on a site looks like
A few days ago, I signed up for some service on a new-ish laptop, and it made me pass the storefront captcha three separate times.
This is yet another example of the social credit score being implemented in the US; in this case punishing users for opting out of continuous tracking (which will in turn be used for price discrimination or worse).
The good news is that this is almost certainly going to lead to a massive backlash as it becomes more common.
While I see your point, social credit is not tracking, it is tracking with legal, economic, and political consequences. Given that the consequences to you of opting out of this tracking are little more than a minor inconvenience of your time, comparing it to the nightmare that is the social credit system is laughable.
And, I would add, you're in part trivializing the horrendous impact of the social credit system by making this comparison, because it gives others the impression that this is merely a difference of degree, rather than of substance. It allows people to make arguments like "Oh, the US credit score is just like China's social credit score, so the social credit system can't be that bad." Yeah, NO. You don't get denied freedom of movement between cities or states because you owe a few dollars, you don't have your passport revoked because you don't use Google cookies, you're not forced to sit in the back of the bus because of something vaguely political you posted on twitter, you don't get denied the ability to send your kids to certain schools because you rolled a stop sign.
The social credit system is not a _tracking system_, it is a _legal system_ (made possible by surveillance), and while the US may one day be there, to suggest they are anywhere even on the same planet yet is laughable. Your average person in the US still, even after decades of abuse, has innumerably more rights than your average Chinese citizen.
> ... little more than a minor inconvenience of your time
This is not entirely correct. I have seen recaptchas that simply deny access without giving any option to solve them when browsing with Tor. The message says something like: automated systems detected unusual activity, try again later
>People get very excited about China’s social credit system, a sort of generalization of the “permanent record” we use to intimidate schoolchildren. And ok, it does sound kind of dystopian. If your rating is too low, you aren’t allowed to fly on a plane. Think about that — a number assigned to every person, adjusted based on somebody’s judgement of your pro-social or anti-social behavior. If your number is too low, you can’t get on a plane. If it’s really low, you can’t even get on a bus. Could you imagine a system like that in the US?
>Except, of course, that we have exactly this system already. The number is called a bank account. The difference is simply that we have so naturalized the system that “how much money you have” seems like simply a fact about you, rather than a judgement imposed by society.
I find that if I use a VPN, Google will display one on search. In particular, when I use Opera's VPN I always get one. I decline to do them and search via Bing instead.
Forcing users to prove they're not bots is totally the wrong approach. They should be forcing bots to prove they're human so that real humans don't see this nonsense. Easier said than done, but that's not my problem.
>Forcing users to prove they're not bots is totally the wrong approach. They should be forcing bots to prove they're human so that real humans don't see this nonsense.
How do you tell who's the user? A bot can look like a user and a user can look like a bot.
You do understand that what you're asking for is literally impossible, not just "easier said than done"? Blocking bots until proven human is exactly the same as allowing humans until proven bot.
They already are forcing "bots" to prove they're human, to the best of their abilities. At some point their measures dictate that traffic from VPN = bot, until that "bot" can prove otherwise. If you're blocking whatever mechanisms they use to identify "human" then it shouldn't be a surprise that they can't differentiate you. Their only interface is whatever traffic happens between you and them, not any intention or motivation behind your actions.
They have a negative interest in blocking people from accessing anything, since pageviews = ad views = dollars. The only time they have an incentive to block anyone is for click fraud, or for any similar reputational damage from a person / bot / IP.
I agree with you, but, to be clear, the issue isn't VPNs, it's that the VPNs you are using are also used by spammers/bots/etc. or a large amount of other people. If you set up your own VPN somewhere that is just used by you and your family (for example), you will never run into this issue.
As a case in point, the same issue crops up with lots of users going through the same corporate proxy.
And it's the same reason that you can run Netflix (for example) through a personal VPN with no issue but will run into problems if you use a popular, retail VPN service.
I mean, unless you're going through additional effort, you're still identifiable through a VPN. Any web tracking, whether through cookies or through fingerprinting, still works just fine. At best you're merely making it more difficult.
There is also no reason, though it would be a PITA, that you can't add your own measures to a private VPN, whether that's rotating IPs or some other measure. Is it going to keep your illegal activities truly anonymous? No, but neither is a retail VPN. It is a matter of degree and what tradeoffs you're willing to make.
As far as uses for a private VPN, the most obvious is to ensure intermediate parties, particularly on the same subnets, can't snoop on your actual traffic _content_. This isn't going to keep you anonymous from the NSA, but it sure will help against corporations (ISPs and their numerous corporate parents/cousins/siblings). Another benefit is that by protecting against packet level inspection, you are protecting yourself from many current forms of traffic shaping and bandwidth metering/throttling, as well as from limits on services you are running or the content of files you are downloading, as well as from intermediaries (e.g. ISPs) from inserting ads or additional tracking or whatever else into your (mainly web) traffic. This also comes into play not just with your normal ISP but any you are using while traveling (coffee shops, airports, hotels, and other untrusted networks).
I run a VPN from home for using at coffee shops and the like. While the proliferation of HTTPS has made sketchy networks less of a problem, DNS is still leaky.
I route all traffic, including DNS, through my VPN when I'm not at home. I was just commenting on the leakiness of DNS to preempt people saying "HTTPS means you're safe!"
I used to run my own DNS server when I was on Comcast. Now that I have a real ISP run by people I trust who have the same opinions on privacy that I do, it's no longer worth the hassle.
Possibly, sure. Depends on what IP blocks it has and how well known they are. Ultimately there is great incentive to track any retail VPN service, even the less well known ones.
> Easier said than done, but that's not my problem.
Well, but it is. Spammers and abusers have directly made it so. "This is why we can't have nice things." It's not your fault that thieves exist, but you've decided it's your problem enough to put a lock on your door.
And if a website has deficient measures against spam/abuse, it becomes your problem again when you have to see it or deal with it. Turns out that Recaptcha works pretty well in a world where it's only becoming easier and easier to abuse web platforms. And that's become a problem for all of us who want to participate online.
It's easy to be dismissive here, especially because you can just put Google in your iron sights and fire away instead of acknowledging why people use Recaptcha. Or you can just say "I'm sure there's a better solution" and leave it as an exercise for the reader. But I think you're barking up the wrong tree.