I've never understood what happened to reCAPTCHA, it was originally so great and is now just so, so toxic.

Originally it was an awesome solution based on OCR'ing books that usually worked quickly on the first try, and almost never took more than two.

Then it turned into a single checkbox (analyzing mouse movement) so it was even faster... and I remember some simple image-based like "select the images of cats" that were also easy to get right. So even better.

But THEN... in the past couple of years, the image-matching started asking exclusively for analysis of street images, which has two huge problems:

1) The images are so blurry and ambiguous it's really hard to get right, it feels like a test designed to make you fail

2) You never know how far you have to go -- you keep clicking items, they keep replacing them with new ones, and there's zero indication of if you're almost done or if you're getting better or worse.

Once I did one for three minutes straight, neither passing nor failing, until I just gave up and left the page... if it's a bug, that should never happen. If that's supposed to be able to happen, that's the apex of asshole design. Either way, it's a failure in every way.

There's a third problem: quite a bit of the stuff they present is (almost) uniquely American and presents a recognition challenge in other cultural contexts. That yellow vehicle? Looks nothing like a bus in most other parts of the world. And so the rest of the world gets to learn what an American Bus looks like... Not, I think, what was intended.

Or it tells you to pick out pictures of cars and shows you a pickup truck. Now you have to figure out if people would call that a car or not. How about a delivery truck? A motorcycle?

Or it will ask for pictures of crosswalks, and you have to decide if 3 pixels of a crosswalk in the corner of one of the pictures counts.

If it makes you feel any better, I'm fairly sure the answers to those questions don't count. I know I've gotten some reCAPTCHAs "wrong" and gotten marked as a human. It's picking up on a lot of signals, not just whether or not you're "right". So, the good news is you can relax, and safely rewrite all the questions to "Do I think this is a store front?" or "Do I think this square counts as a crosswalk?" or whatever without loss.

My "favorite" is the one where you have to select the boxes with traffic lights. Does that mean just the actual lights, or the entire structure? More importantly, what does Google's AI think the answer is?

"Crosswalk" is also an American term for a pedestrian crossing.

I often get asked to identify store fronts. They are the worst.

The pictures are blurry and positioned at weird angles. There are lots of signs with East Asian letters (I'm not informed enough to guess what kind of alphabet they belong to) and I have no idea whether they are store fronts or not.

Is a sign to a dentist's office a store front? Generally it seems like anything with a sign above some sort of door or window qualifies as a store front.

Came here to say the same thing. It's literally impossible to distinguish a store from any other kind of business in many of those pictures. If Google wants to do behavioral fingerprinting they should just say so instead of pretending to do image recognition. But I guess some people just lie so much that they forget how to tell the truth.

What makes you think any store is not a store front? I realize that’s part of the problem, I’m just wondering why you wouldn’t assume the very literal “it is the front of a store” interpretation.

A commercial building with a sign on it might not be a store. They didn't ask for officefronts or warehousefronts. What about a bank or brokerage? A dental office or urgent-care center? Those can look a lot like storefronts, but whether they're considered such is pretty arbitrary.

I understand where you’re coming from and I’m having difficulty explaining the difference... it mostly comes down to what you consider a store (or a shop or whatever you call it). I know they could localize it more, but I feel like it should be pretty obvious what they’re talking about - a place of business selling goods to the general public. Whatever you call that, banks and dentists and warehouses and medical facilities don’t really apply.

So yes, it’s arbitrary, but it’s supposed to be. It’s about your gut feeling as a human because that’s the whole reason they’re showing you any of these images.

If it “looks a lot like” a storefront then you’ve really got the same problem as everyone else in the comments: they’re small, blurry, images and it’s hard to tell what it is. That’s also the whole point: their algorithms can’t tell, so they want a general consensus from users. There are images they know and use as a control, but some percentage of the ones you see they’re legitimately not sure about.
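For what it's worth, here is a rough sketch of how that control/unknown split could work. This is purely my guess at the shape of it, not Google's actual logic: the function name `grade_challenge`, the tile ids, and the data layout are all made up for illustration. Only the control tiles decide pass/fail; answers on the unknown tiles are just stored as votes, and only from users who got the controls right.

```python
from collections import Counter


def grade_challenge(tiles, clicked, votes):
    """Grade one image-grid challenge (hypothetical sketch).

    tiles:   dict of tile_id -> True/False for control tiles whose label is
             already known, or None for tiles the system is unsure about.
    clicked: set of tile ids the user selected.
    votes:   dict of tile_id -> Counter, updated in place with the user's
             answers on the unknown tiles.

    Returns True if every control tile was answered correctly.
    """
    # Pass/fail depends only on the tiles with a known label.
    passed = all(
        (tile_id in clicked) == label
        for tile_id, label in tiles.items()
        if label is not None
    )

    # Assumption: only users who pass the controls contribute votes.
    if passed:
        for tile_id, label in tiles.items():
            if label is None:
                votes.setdefault(tile_id, Counter())[tile_id in clicked] += 1

    return passed
```

Under this model, whether you click the ambiguous "3 pixels of crosswalk" tile can't fail you on its own; it only nudges the tally for that tile.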

E.g. “Spot the fire hydrant” - oh, it’s those things that cops drive over in Hollywood movies. I don’t know if other countries have them too, but it seems distinctly American and this captcha is oddly common.

Are you in America or using a VPN that shows as in America?

NZer here. The captchas are usually American places with American themes.

I have definitely seen the "fire-hydrant" one, and we don't have fire hydrants (they are underground below well marked covers that are illegal to park on or placed where you can't park).

And coming from a first-world Western country, I have definitely been flummoxed by at least one that was too American for me to decipher. I feel sorry for anyone that doesn't watch American media.

Huh, there are fire hydrants here in Brazil. Although they're not as common as they used to be!

I see that stuff too. Not American.

I am from India, not using a VPN. Except for storefronts, everything I get looks like it's from the US: traffic lights, cars, buses (including yellow school buses), crosswalks, etc.

That hasn't been my experience. Most of the "storefronts" are (from what I can tell) from Asia. I almost never see English signs. I'm still able to complete these challenges with only a little bit of difficulty.

Because it’s still created in an entirely American context. For example, the word storefront is an Americanism. The more commonly used word in the UK is shopfront, and in other English speaking countries they may just call them shops or stores, without the addition of the word front.

Fourth problem: How vague the instructions are. When I'm asked to click the boxes that contain signs, do I include the poles?

Yeah, this one puzzles me too. Generally, it seems like signs and traffic lights don't include supports, poles, etc.

Totally this. I'm British and am probably more exposed to American culture than other nationalities on average, and yet reCAPTCHA still sometimes leaves me clueless on some Americanism, that is when it's not driving me crazy with its infinite loop. For other nationalities it must be straight-up discrimination.

I sometimes wonder if these projects are actually internal astroturfing, someone trying to make people hate Google from the inside, it's so bad it must be intentional right?

Originally it didn't belong to Google; it was an acquisition. I remember seeing a TED talk about it.

To me it constantly feels like I'm working for Google for free on their AI projects, which is very annoying compared to helping a smaller company OCR books.

Trying to convince a robot that you aren’t a robot by teaching a robot how to look at pictures is a pretty absurd state of the world.

When they reboot the Matrix, instead of being used as batteries, the machines will keep humans around for machine learning test sets.

I think that was the original story for the matrix https://scifi.stackexchange.com/questions/19817/was-executiv...

Well, it might have been too close to the storyline of Hyperion Cantos (which probably got it from somewhere else).

You aren't working for free. You get access to a website and the publisher gets bot protection. It's a 3 way win-win-win transaction.

I think two things happened:

1) Computer vision got a lot better over the past few years. It's also become way easier for the average Joe bot operator to run cutting-edge stuff. OCR tasks don't cut it for distinguishing people from machines any more. Every time I see a blog post about a new computer vision architecture or how some random developer trained a neural network to get an X% result on benchmark Y, I think to myself CAPTCHAs are going to get more annoying.

2) The frequency at which most people have to solve a CAPTCHA has gone way down. In the beginning, I remember having to solve a CAPTCHA every single time I did anything on some sites. Now, I can't even remember the last time I had to do more than just check the checkbox. So, the amount of annoyance is amortized over a larger number of sessions, and Google probably feels like they can ask the user to complete more tasks as a result.

I've noticed the opposite on #2, especially in the last year or so. I've been solving a lot more captchas than I used to. I run Firefox with a lot of privacy focused add ons and I don't stay logged in to Google, I wonder if those have something to do with it.

Yes, they most likely do have something to do with it. If Google is unable to ID you in some way (e.g. browser fingerprint, cookies, IP, etc) and determine you're a good Internet citizen, they'll assume that you could be a bot and offer challenging Captchas. It's annoying, but on the bright side it proves that your privacy add-ons are working!

Same here. When this highly advertised service was launched ('just a click!') it worked perfectly. Slowly, over the past couple of years, they deliberately replaced that wonderful service with another one where we act as Google's unpaid workers.

Captcha data has been used to train ML models for a very long time. What's changed recently is that simple stuff like OCR has already been solved and democratized, so the simple puzzles no longer work.

I'm not talking about the simple puzzles or 'words' that reCaptcha initially used to show. I'm talking about their 'improved' way of testing whether you are a bot by just making you click a checkbox. That doesn't work anymore (most of the time).

The frequency goes down as Google identifies you with stronger confidence. Try browsing from a VPN and you will spend half your time solving CAPTCHAs.

I have also been getting way more captchas for at least the last 6 months. Exclusively using Firefox with clear everything on exit, multiple profiles, fingerprint flag on, some addons etc. No VPN. I get a captcha almost all the time, even for Google searches from the Firefox address bar (one out of 10 searches, I think). But I never get a captcha on Google websites (Gmail, YouTube, etc.).

2) isn't true at all for me. I've always loved captcha and it has become a huuuuuge annoyance as soon as I'm using a vpn, tor, a weird wifi, a non-typical device, etc.

It is so freaking slow. I sometimes lose 60s to complete a captcha.

An insightful remark about ReCaptcha on HN recently (I don't have a link) was that it went from being "are you human" to "which human are you".

Ha, ha, very accurate observation.

And if Google keeps the pressure and nothing hits them back, soon the answer will be "Number 17 of 312 still using Firefox".

I still can't believe how Google has changed their tune - from "don't be evil" to being worse than MS ever was, which is quite an achievement in itself.

Google is in some ways much more adverse in impact than MS, but I suspect that hiring a bunch of people under the "don't be evil" mantra (and baking that "we're the good guys" into culture) has helped hold them back from some bad behavior.

At the same time an implicit belief in "we're the good guys" (combined with indoctrination including interview hazing rituals) can enable bad behavior, because then: "of course whatever we do is good, by definition, because we're the good guys" and then not questioned. MS did some really underhanded and insidious things with its power, and it's easier to see some of Google's behavior as due more to hubris/brainwashing.

I've started to use the CS101 whiteboard hazing as a litmus test for whether there's any point in trying to do good at Google, for my own career. So long as they insist on subjecting everyone to that (starting with people having just spent 4 years and a quarter of a million dollars on a Stanford CS education, and then people with verifiable experience on top of that), and also considering having been caught on abusive hiring/mobility conspiracy at the executive level, I think the CS101 whiteboard ridiculousness is not a good sign for corporate ego and intentions. It's also not great when CS students focus on drilling for that, to the exclusion of other things. For myself, if I applied anyway, I'd be fooling myself that I wasn't mainly after the compensation package, rather than wanting to have positive impact.

> I still can't believe how Google has changed their tune - from "don't be evil" to being worse than MS ever was, which is quite an achievement in itself.

It's called "selling out".

It sounds funny but I don't get it. ReCaptcha doesn't identify you does it?

To the website? No. To Google? Almost certainly given how it works.

I can imagine that, if Google already knows enough about you, just clicking "I'm not a bot" would be enough. Though I wouldn't know.

It seems like another way to punish people for caring about privacy.

There’s also this to consider: Google knowing enough about you to know you’re a human, and then wanting to use you to train. That’s why in some cases you can get away with just spamming whatever the hell you want in the picture grid. Because it trusts you enough to train it.

> 1) The images are so blurry and ambiguous it's really hard to get right, it feels like a test designed to make you fail

On top of that, I think some of the training sets are wrong. Multiple times I've been asked to find traffic signs, but it would only let me pass when including street signs.

There's also the issue that it will lie to you if the algorithm decides it simply doesn't like you. Which means you'll end up doing at least a couple of rounds before it decides to let you through.

Rather, if it does like you (because you frequently get it right), it'll ask you to give it extra data.

Fascinating. Conspiracy theories around software. Might make for a fun sci-fi creative writing exercise.

I always envisioned their devious model to be something like:

- You want to train on an unlabeled dataset, label it along the way.

- You have a set of untrusted validators, some with no history, some with known credibility and accuracy scores. And you have a lot of them.

- You do kind of a zero-knowledge proof by showing the unlabeled dataset to validators that you know you can trust because of their historical high success rate, which you've already established through asking them to label a dataset that you already have high confidence on.

Kind of like how a blue-green colorblind person could find out which pen is blue, which pen is green if he is surrounded by people he can't fully trust. Ask people around you and maybe even show the same person the same pen (or a really dead-easy captcha) twice in a row. If they lie to you both times, they are not to be trusted.
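To make the model above a bit more concrete, here is a toy sketch of it. Everything in it is hypothetical: the `Validator` class, the `weighted_label` function, the smoothed trust score, and the `min_trust` threshold of 0.7 are my own assumptions, not anything reCAPTCHA is known to do. Each validator earns trust from answers on items whose label is already known, and only sufficiently trusted validators get a (trust-weighted) say on the unlabeled items.

```python
class Validator:
    """Tracks how often a validator agreed with known (control) labels."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def record_control(self, was_correct):
        """Record one answer on an item whose true label is already known."""
        self.total += 1
        self.correct += int(was_correct)

    @property
    def trust(self):
        # Laplace-smoothed accuracy: a validator with no history scores 0.5.
        return (self.correct + 1) / (self.total + 2)


def weighted_label(answers, validators, min_trust=0.7):
    """Combine yes/no answers on one unlabeled item (hypothetical sketch).

    answers:    list of (validator_id, bool) pairs.
    validators: dict of validator_id -> Validator.
    Returns True or False, or None if no trusted validator answered
    or the weighted votes cancel out exactly.
    """
    score = 0.0
    for validator_id, answer in answers:
        trust = validators[validator_id].trust
        if trust < min_trust:
            continue  # ignore validators without a proven track record
        score += trust if answer else -trust
    if score == 0.0:
        return None
    return score > 0
```

In the pen analogy, `record_control` is showing someone a pen whose color you already know, and `weighted_label` is asking about the pen you can't tell apart while listening only to the people who passed that check.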

If you use Chrome or Brave you can get multiple boxes wrong and still get through, I've found, even on a cheap VPN IP.

Here's a hint: VPNs do almost nothing to safeguard you from modern fingerprinting techniques. If you're using any browser [1] but Firefox or Safari, Google probably knows exactly who you are and is just doing the boxes for shits & giggles.

[1] except those that reCaptcha doesn't support.

You have to answer the way most people would answer, not what is the most technically correct.

I guess if your adversary is a dogmatic AI then that might be by design.

I keep expecting it to eventually ask me to "click on the pictures of terrorists" and them using it to train automatic drone targeting software.

They also changed it so that if you've seemed human in the past, they're able to determine if you're probabilistically a human now.

This data is a few years old but I imagine it's the same based on my experience.

They're using your cookie + IP + your account data to determine if you're probably a human.

A LOT of reCAPTCHA sites never prompt you. You only know if it's there because you're on Tor or something.

> A LOT of reCAPTCHA sites never prompt you.

That has only happened to me in Chrome, not Firefox or Safari. Which is the subject of this article.

I believe even worse than showing you new sets of images is when the reCAPTCHA system gives you a "low trust score" and intentionally fades out the selected images, but very slowly, and replaces them with new images of the same type. Just downright feels abusive to the end user. Good luck if you have tweaked any browser settings to be more amenable to privacy!

I wish more sites would implement a Jigsaw-puzzle-style similar to the Binance login captcha, but I can't speak to the efficiency of that in defeating bots.

Yea it was much better when it was run by Carnegie Mellon. I guess selling it to Google seemed like a good idea at the time.

Today I feel like Google uses it mostly for their self-driving-car computer vision projects.

Sometimes it is straight up wrong too. I once got a picture of a sign with a traffic light on it asking me to identify the traffic light. If you selected nothing it wouldn't let you go ahead. So I clicked the squares with the sign and it let me proceed. I don't even think it should be that difficult to see that it wasn't a traffic light since all colors were bright. A typical in use light will only show one color at a time.

>Originally it was an awesome solution based on OCR'ing books that usually worked quickly on the first try, and almost never took more than two.

People kept trolling it by typing the test word correctly, and random garbage instead of the OCR word. It was easy to spot which one was which. Source: I was one of these people.

It is made by Google to train their neural networks. Neural networks are evolving and need harder examples for training.

Because it is an adversarial system, the busters are getting better, so reCaptcha needs to catch up.

What happened? The spambot algorithms have gotten better and can now defeat the simple tasks. It's a perpetual arms race of you vs. the spambot developers.

They’re using the service to train self-driving cars to recognize traffic lights, bicyclists, etc.
