I've never understood what happened to reCAPTCHA, it was originally so great and...

mikro2nd · on June 10, 2019

There's a third problem: quite a bit of the stuff they present is (almost) uniquely American and presents a recognition challenge in other cultural contexts. That yellow vehicle? Looks nothing like a bus in most other parts of the world. And so the rest of the world gets to learn what an American Bus looks like... Not, I think, what was intended.

jandrese · on June 10, 2019

Or it tells you to pick out pictures of cars and shows you a pickup truck. Now you have to figure out if people would call that a car or not. How about a delivery truck? A motorcycle?

Or it will ask for pictures of crosswalks, and you have to decide if 3 pixels of a crosswalk in the corner of one of the pictures counts.

jerf · on June 10, 2019

If it makes you feel any better, I'm fairly sure the answers to those questions don't count. I know I've gotten some reCAPTCHAs "wrong" and gotten marked as a human. It's picking up on a lot of signals, not just whether or not you're "right". So, the good news is you can relax, and safely rewrite all the questions to "Do I think this is a store front?" or "Do I think this square counts as a crosswalk?" or whatever without loss.

profmonocle · on June 11, 2019

My "favorite" is the one where you have to select the boxes with traffic lights. Does that mean just the actual lights, or the entire structure? More importantly, what does Google's AI think the answer is?

swang · on June 11, 2019

"Crosswalk" is also an American term for a pedestrian crossing.

simongr3dal · on June 10, 2019

I often get asked to identify store fronts. They are the worst.

The pictures are blurry and positioned at weird angles. There are lots of signs with East Asian letters (I'm not informed enough to guess what kind of alphabet they belong to) and I have no idea whether they are store fronts or not.

Is a sign to a dentist's office a store front? Generally it seems like anything with a sign above some sort of door or window qualifies as a store front.

notacoward · on June 11, 2019

Came here to say the same thing. It's literally impossible to distinguish a store from any other kind of business in many of those pictures. If Google wants to do behavioral fingerprinting they should just say so instead of pretending to do image recognition. But I guess some people just lie so much that they forget how to tell the truth.

chrismeller · on June 11, 2019

What makes you think any store is not a store front? I realize that’s part of the problem, I’m just wondering why you wouldn’t assume the very literal “it is the front of a store” interpretation.

notacoward · on June 11, 2019

A commercial building with a sign on it might not be a store. They didn't ask for officefronts or warehousefronts. What about a bank or brokerage? A dental office or urgent-care center? Those can look a lot like storefronts, but whether they're considered such is pretty arbitrary.

chrismeller · on June 12, 2019

I understand where you’re coming from and I’m having difficulty explaining the difference... it mostly comes down to what you consider a store (or a shop or whatever you call it). I know they could localize it more, but I feel like it should be pretty obvious what they’re talking about - a place of business selling goods to the general public. Whatever you call that, banks and dentists and warehouses and medical facilities don’t really apply.

So yes, it’s arbitrary, but it’s supposed to be. It’s about your gut feeling as a human because that’s the whole reason they’re showing you any of these images.

If it “looks a lot like” a storefront then you’ve really got the same problem as everyone else in the comments: they’re small, blurry images and it’s hard to tell what they are. That’s also the whole point: their algorithms can’t tell, so they want a general consensus from users. There are images they know and use as a control, but some percentage of the ones you see they’re legitimately not sure about.
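To make the control-plus-unknown idea concrete, here is a minimal sketch in Python. It is only a guess at how such a scheme could work, not Google's actual implementation: the tile names, `grade_challenge`, `consensus`, and the thresholds are all made up for illustration.

```python
from collections import Counter

# Hypothetical challenge: tiles with a known answer act as controls;
# tiles mapped to None are the ones the system is unsure about.
KNOWN_ANSWERS = {"tile_a": True, "tile_b": False, "tile_c": None, "tile_d": None}

# Votes collected for the uncertain tiles: tile -> Counter({True: n, False: m}).
votes = {}


def grade_challenge(selected, known=KNOWN_ANSWERS, pass_ratio=1.0):
    """Grade a user only on control tiles; record their opinion on the rest."""
    controls = [t for t, answer in known.items() if answer is not None]
    correct = sum((t in selected) == known[t] for t in controls)
    passed = correct / len(controls) >= pass_ratio
    if passed:
        # Only users who got the controls right contribute to the consensus.
        for tile, answer in known.items():
            if answer is None:
                votes.setdefault(tile, Counter())[tile in selected] += 1
    return passed


def consensus(tile, min_votes=3):
    """Return the majority label once enough votes exist, else None."""
    tally = votes.get(tile, Counter())
    if sum(tally.values()) < min_votes:
        return None
    return tally.most_common(1)[0][0]
```

Under this assumption, whatever you click on the uncertain tiles never affects whether you pass, which would match the "it's arbitrary, go with your gut" experience described above.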

squidi · on June 10, 2019

E.g. “Spot the fire hydrant” - oh, it’s those things that cops drive over in Hollywood movies. I don’t know if other countries have them too, but it seems distinctly American and this captcha is oddly common.

BearsAreCool · on June 10, 2019

Are you in America, or using a VPN that shows you as being in America?

robocat · on June 11, 2019

NZer here. The captchas are usually American places with American themes.

I have definitely seen the "fire-hydrant" one, and we don't have fire hydrants (they are underground below well marked covers that are illegal to park on or placed where you can't park).

And coming from a first-world Western country, I have definitely been flummoxed by at least one that was too American for me to decipher. I feel sorry for anyone that doesn't watch American media.

vitorgrs · on June 11, 2019

Huh, there are fire hydrants here in Brazil, although they're not as common as they used to be!

jazoom · on June 10, 2019

I see that stuff too. Not American.

nonamechicken · on June 11, 2019

I am from India, not using a VPN. Except for storefronts, everything I get looks like it's from the US: traffic lights, cars, buses (including yellow school buses), crosswalks, etc.

josefresco · on June 10, 2019

That hasn't been my experience. Most of the "storefronts" are (from what I can tell) based in Asia. I almost never see English signs. I'm still able to complete these challenges with only a little bit of difficulty.

addicted · on June 11, 2019

Because it’s still created in an entirely American context. For example, the word storefront is an Americanism. The more commonly used word in the UK is shopfront, and in other English speaking countries they may just call them shops or stores, without the addition of the word front.

Nition · on June 11, 2019

Fourth problem: How vague the instructions are. When I'm asked to click the boxes that contain signs, do I include the poles?

mirimir · on June 11, 2019

Yeah, this one puzzles me too. Generally, it seems like signs and traffic lights don't include supports, poles, etc.

tomxor · on June 11, 2019

Totally this. I'm British and am probably more exposed to American culture than other nationalities on average, and yet reCAPTCHA still sometimes leaves me clueless on some Americanism, that is when it's not driving me crazy with its infinite loop. For other nationalities it must be straight up discrimination.

I sometimes wonder if these projects are actually internal astroturfing, someone trying to make people hate Google from the inside, it's so bad it must be intentional right?

fimdomeio · on June 10, 2019

Originally it didn't belong to Google; it was an acquisition. I remember seeing a TED talk about it.

To me it constantly feels like I'm working for Google for free on their AI projects, which is very annoying compared to helping a smaller company OCR books.

rhino369 · on June 10, 2019

Trying to convince a robot that you aren’t a robot by teaching a robot how to look at pictures is a pretty absurd state of the world.

When they reboot the Matrix, instead of being used as batteries, the machines will keep humans around for machine learning test sets.

Recursing · on June 10, 2019

I think that was the original story for the matrix https://scifi.stackexchange.com/questions/19817/was-executiv...

r00fus · on June 10, 2019

Well, it might have been too close to the storyline of Hyperion Cantos (which probably got it from somewhere else).

fjsolwmv · on June 10, 2019

You aren't working for free. You get access to a website and the publisher gets bot protection. It's a 3 way win-win-win transaction.

jonas21 · on June 10, 2019

I think two things happened:

1) Computer vision got a lot better over the past few years. It's also become way easier for the average Joe bot operator to run cutting-edge stuff. OCR tasks don't cut it for distinguishing people from machines any more. Every time I see a blog post about a new computer vision architecture or how some random developer trained a neural network to get an X% result on benchmark Y, I think to myself CAPTCHAs are going to get more annoying.

2) The frequency at which most people have to solve a CAPTCHA has gone way down. In the beginning, I remember having to solve a CAPTCHA every single time I did anything on some sites. Now, I can't even remember the last time I had to do more than just check the checkbox. So, the amount of annoyance is amortized over a larger number of sessions, and Google probably feels like they can ask the user to complete more tasks as a result.

MrMember · on June 10, 2019

I've noticed the opposite on #2, especially in the last year or so. I've been solving a lot more captchas than I used to. I run Firefox with a lot of privacy-focused add-ons and I don't stay logged in to Google; I wonder if those have something to do with it.

iliketosleep · on June 11, 2019

Yes, they most likely do have something to do with it. If Google is unable to ID you in some way (e.g. browser fingerprint, cookies, IP, etc) and determine you're a good Internet citizen, they'll assume that you could be a bot and offer challenging Captchas. It's annoying, but on the bright side it proves that your privacy add-ons are working!

piyush_soni · on June 11, 2019

Same here. When this highly advertised service was launched ('just a click!') it worked perfectly. Slowly, over the past couple of years, they deliberately replaced that wonderful service with another one where we act as Google's unpaid workers.

andromeduck · on June 11, 2019

Captcha data has been used to train ML models for a very long time. What's changed recently is that simple stuff like OCR has already been solved and democratized, so the simple puzzles no longer work.

piyush_soni · on June 11, 2019

I'm not talking about the simple puzzles or 'words' that reCaptcha initially used to show. I'm talking about their 'improved' way of testing whether you are a bot by just making you click a checkbox. That doesn't work anymore (most of the time).

mleonhard · on June 10, 2019

The frequency goes down as Google identifies you with stronger confidence. Try browsing from a VPN and you will spend half your time solving CAPTCHAs.

nonamechicken · on June 11, 2019

I have also been getting way more captchas for at least the last 6 months. Exclusively using Firefox with clear everything on exit, multiple profiles, fingerprint flag on, some addons etc. No VPN. I get a captcha almost all the time, even for Google searches from the Firefox address bar (one out of 10 searches, I think). But I never get a captcha for Google websites (Gmail, YouTube etc).

baby · on June 11, 2019

2) isn't true at all for me. I've always loved captcha and it has become a huuuuuge annoyance as soon as I'm using a vpn, tor, a weird wifi, a non-typical device, etc.

It is so freaking slow. I sometimes lose 60s to complete a captcha.

neilv · on June 10, 2019

An insightful remark about ReCaptcha on HN recently (I don't have a link) was that it went from being "are you human" to "which human are you".

amenod · on June 10, 2019

Ha, ha, very accurate observation.

And if Google keeps the pressure and nothing hits them back, soon the answer will be "Number 17 of 312 still using Firefox".

I still can't believe how Google has changed their tune - from "don't be evil" to being worse than MS ever was, which is quite an achievement in itself.

neilv · on June 10, 2019

Google is in some ways much more adverse in impact than MS, but I suspect that hiring a bunch of people under the "don't be evil" mantra (and baking that "we're the good guys" into culture) has helped hold them back from some bad behavior.

At the same time an implicit belief in "we're the good guys" (combined with indoctrination including interview hazing rituals) can enable bad behavior, because then: "of course whatever we do is good, by definition, because we're the good guys" and then not questioned. MS did some really underhanded and insidious things with its power, and it's easier to see some of Google's behavior as due more to hubris/brainwashing.

I've started to use the CS101 whiteboard hazing as a litmus test for whether there's any point in trying to do good at Google, for my own career. So long as they insist on subjecting everyone to that (starting with people having just spent 4 years and a quarter of a million dollars on a Stanford CS education, and then people with verifiable experience on top of that), and also considering having been caught in an abusive hiring/mobility conspiracy at the executive level, I think the CS101 whiteboard ridiculousness is not a good sign for corporate ego and intentions. It's also not great when CS students focus on drilling for that, to the exclusion of other things. For myself, if I applied anyway, I'd be fooling myself that I wasn't mainly after the compensation package, rather than wanting to have positive impact.

mirimir · on June 11, 2019

> I still can't believe how Google has changed their tune - from "dont be evil" to being worse than MS ever was, which is quite an achievement in itself.

It's called "selling out".

dataflow · on June 10, 2019

It sounds funny but I don't get it. ReCaptcha doesn't identify you does it?

Operyl · on June 11, 2019

To the website? No. To Google? Almost certainly given how it works.

mcv · on June 11, 2019

I can imagine that, if Google already knows enough about you, just clicking "I'm not a bot" would be enough. Though I wouldn't know.

It seems like another way to punish people for caring about privacy.

Operyl · on June 11, 2019

There’s also this to consider: Google knowing enough about you to know you’re a human, and then wanting to use you to train. That’s why in some cases you can get away with just spamming whatever the hell you want in the picture grid. Because it trusts you enough to train it.

Izkata · on June 10, 2019

> 1) The images are so blurry and ambiguous it's really hard to get right, it feels like a test designed to make you fail

On top of that, I think some of the training sets are wrong. Multiple times I've been asked to find traffic signs, but it would only let me pass when including street signs.

ChrisSD · on June 10, 2019

There's also the issue that it will lie to you if the algorithm decides it simply doesn't like you. Which means you'll end up doing at least a couple of rounds before it decides to let you through.

earenndil · on June 10, 2019

Rather, if it does like you (because you frequently get it right), it'll ask you to give it extra data.

scarejunba · on June 10, 2019

Fascinating. Conspiracy theories around software. Might make for a fun sci-fi creative writing exercise.

therein · on June 11, 2019

I always envisioned their devious model to be something like:

- You want to train on an unlabeled dataset, label it along the way.

- You have a set of untrusted validators, some with no history, some with known credibility and accuracy scores. And you have a lot of them.

- You do kind of a zero-knowledge proof by showing the unlabeled dataset to validators that you know you can trust because of their historical high success rate, which you've already established through asking them to label a dataset that you already have high confidence on.

Kind of like how a blue-green colorblind person could find out which pen is blue, which pen is green if he is surrounded by people he can't fully trust. Ask people around you and maybe even show the same person the same pen (or a really dead-easy captcha) twice in a row. If they lie to you both times, they are not to be trusted.
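The scheme sketched in those bullets can be written down in a few lines of Python. This is purely a guess at the idea, not anything Google has documented: `Validator`, `weighted_label`, the smoothing, and the `min_trust` cutoff are all hypothetical names and choices.

```python
from collections import defaultdict


class Validator:
    """An untrusted labeler whose credibility is estimated from known-answer items."""

    def __init__(self, name):
        self.name = name
        self.gold_seen = 0
        self.gold_correct = 0

    def record_gold(self, was_correct):
        """Update history after showing an item whose label is already known."""
        self.gold_seen += 1
        self.gold_correct += int(was_correct)

    @property
    def trust(self):
        # Laplace-smoothed accuracy: validators with no history start at 0.5.
        return (self.gold_correct + 1) / (self.gold_seen + 2)


def weighted_label(answers, min_trust=0.6):
    """Pick the label with the most trust-weighted support.

    `answers` is a list of (validator, label) pairs for one unlabeled item.
    Validators below `min_trust` are ignored entirely.
    """
    scores = defaultdict(float)
    for validator, label in answers:
        if validator.trust >= min_trust:
            scores[label] += validator.trust
    if not scores:
        return None  # nobody credible has answered yet
    return max(scores, key=scores.get)
```

With this weighting, one validator with a long clean record can outvote several with no history or a bad one, which is the pen example above in code form: the people who lied on the dead-easy question simply stop counting.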

tootahe45 · on June 10, 2019

If you use Chrome or Brave you can get multiple boxes wrong and still get through, I've found, even on a cheap VPN IP.

frenchy · on June 11, 2019

Here's a hint: VPNs do almost nothing to safeguard you from modern fingerprinting techniques. If you're using any browser [1] but Firefox or Safari, Google probably knows exactly who you are and is just doing the boxes for shits & giggles.

[1] except those that reCaptcha doesn't support.

jandrese · on June 10, 2019

You have to answer the way most people would answer, not what is the most technically correct.

I guess if your adversary is a dogmatic AI then that might be by design.

psadauskas · on June 10, 2019

I keep expecting it to eventually ask me to "click on the pictures of terrorists" and them using it to train automatic drone targeting software.

burtonator · on June 10, 2019

They also changed it so that if you've seemed human in the past, they're able to determine if you're probabilistically a human now.

This data is a few years old but I imagine it's the same based on my experience.

They're using your cookie + IP + your account data to determine if you're probably a human.

A LOT of reCAPTCHA sites never prompt you. You only know if it's there because you're on Tor or something.

kccqzy · on June 10, 2019

> A LOT of reCAPTCHA sites never prompt you.

That has only happened to me in Chrome, not Firefox or Safari. Which is the subject of this article.

djsumdog · on June 10, 2019

Yea it was much better when it was run by Carnegie Mellon. I guess selling it to Google seemed like a good idea at the time.

Today I feel like Google uses it mostly for their self-driving-car computer vision projects.

Macross8299 · on June 11, 2019

I believe even worse than showing you new sets of images is when the reCAPTCHA system gives you a "low trust score" and intentionally fades out the selected images, but very slowly, and replaces them with new images of the same type. Just downright feels abusive to the end user. Good luck if you have tweaked any browser settings to be more amenable to privacy!

I wish more sites would implement a Jigsaw-puzzle-style similar to the Binance login captcha, but I can't speak to the efficiency of that in defeating bots.

distant_hat · on June 11, 2019

Sometimes it is straight up wrong too. I once got a picture of a sign with a traffic light on it asking me to identify the traffic light. If you selected nothing it wouldn't let you go ahead. So I clicked the squares with the sign and it let me proceed. I don't even think it should be that difficult to see that it wasn't a traffic light since all colors were bright. A typical in use light will only show one color at a time.

antisemiotic · on June 11, 2019

>Originally it was an awesome solution based on OCR'ing books that usually worked quickly on the first try, and almost never took more than two.

People kept trolling it by typing the test word correctly, and random garbage instead of the OCR word. It was easy to spot which one was which. Source: I was one of these people.
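For anyone who never saw the two-word version, a minimal Python sketch of why that trolling worked and how agreement between users limits the damage. The function names and the `min_agreeing` threshold are assumptions for illustration, not the real reCAPTCHA code.

```python
def check_two_word_captcha(typed_control, typed_unknown, control_answer, guesses):
    """Pass or fail on the control word only; log the guess for the unknown word."""
    passed = typed_control.strip().lower() == control_answer.lower()
    if passed:
        # The unknown word cannot be graded, so any input is accepted here.
        guesses.append(typed_unknown.strip().lower())
    return passed


def accepted_transcription(guesses, min_agreeing=3):
    """Accept a transcription only when enough independent users agree on it."""
    for guess in set(guesses):
        if guesses.count(guess) >= min_agreeing:
            return guess
    return None
```

Random garbage passes the check but rarely matches anyone else's garbage, so under this assumption it only delays a word being accepted rather than corrupting it.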

vasili111 · on June 10, 2019

It is made by Google to train their neural networks. Neural networks are evolving and need harder examples for training.

xxxpupugo · on June 10, 2019

Because it is an adversarial system, the busters are getting better, so reCaptcha needs to catch up.

izacus · on June 10, 2019

What happened? The spambot algorithms have gotten better and can now defeat the simple tasks. It's a perpetual arms race of you vs. the spambot developers.

luxuryballs · on June 10, 2019

They’re using the service to train self-driving cars to recognize traffic lights, bicyclists, etc.