Originally it was an awesome solution based on OCR'ing books that usually worked quickly on the first try, and almost never took more than two.
Then it turned into a single checkbox (analyzing mouse movement) so it was even faster... and I remember some simple image-based ones, like "select the images of cats", that were also easy to get right. So even better.
But THEN... in the past couple of years, the image-matching started asking exclusively for analysis of street images, which has two huge problems:
1) The images are so blurry and ambiguous it's really hard to get right, it feels like a test designed to make you fail
2) You never know how far you have to go -- you keep clicking items, they keep replacing them with new ones, and there's zero indication of whether you're almost done, or whether you're getting better or worse.
Once I did one for three minutes straight, neither passing nor failing, until I just gave up and left the page... if it's a bug, that should never happen. If that's supposed to be able to happen, that's the apex of asshole design. Either way, it's a failure in every way.
Or it will ask for pictures of crosswalks, and you have to decide if 3 pixels of a crosswalk in the corner of one of the pictures counts.
The pictures are blurry and positioned at weird angles.
There are lots of signs with East Asian characters (I'm not informed enough to guess which writing system they belong to) and I have no idea whether they are store fronts or not.
Is a sign to a dentist's office a store front? Generally it seems like anything with a sign above some sort of door or window qualifies as a store front.
So yes, it’s arbitrary, but it’s supposed to be. It’s about your gut feeling as a human because that’s the whole reason they’re showing you any of these images.
If it "looks a lot like" a storefront then you've really got the same problem as everyone else in the comments: they're small, blurry images and it's hard to tell what they are. That's also the whole point: their algorithms can't tell, so they want a general consensus from users. There are images they know and use as a control, but some percentage of the ones you see they're legitimately not sure about.
I have definitely seen the "fire hydrant" one, and we don't have fire hydrants (they are underground, below well-marked covers that are illegal to park on, or placed where you can't park anyway).
And coming from a first-world Western country, I have definitely been flummoxed by at least one that was too American for me to decipher. I feel sorry for anyone that doesn't watch American media.
I sometimes wonder if these projects are actually internal astroturfing, someone trying to make people hate Google from the inside. It's so bad it must be intentional, right?
To me it constantly feels like I'm working for Google for free on their AI projects, which is much more annoying than helping a smaller company OCR books.
When they reboot the Matrix, instead of being used as batteries, the machines will keep humans around for machine learning test sets.
1) Computer vision got a lot better over the past few years. It's also become way easier for the average Joe bot operator to run cutting-edge stuff. OCR tasks don't cut it for distinguishing people from machines anymore. Every time I see a blog post about a new computer vision architecture, or about how some random developer trained a neural network to get an X% result on benchmark Y, I think to myself: CAPTCHAs are going to get more annoying.
2) The frequency at which most people have to solve a CAPTCHA has gone way down. In the beginning, I remember having to solve a CAPTCHA every single time I did anything on some sites. Now, I can't even remember the last time I had to do more than just check the checkbox. So, the amount of annoyance is amortized over a larger number of sessions, and Google probably feels like they can ask the user to complete more tasks as a result.
It is so freaking slow. I sometimes lose 60s to complete a captcha.
And if Google keeps the pressure and nothing hits them back, soon the answer will be "Number 17 of 312 still using Firefox".
I still can't believe how Google has changed their tune -- from "don't be evil" to being worse than MS ever was, which is quite an achievement in itself.
At the same time, an implicit belief in "we're the good guys" (combined with indoctrination, including interview hazing rituals) can enable bad behavior, because then whatever they do is good by definition -- they're the good guys, after all -- and so it goes unquestioned. MS did some really underhanded and insidious things with its power, and it's easier to see some of Google's behavior as due more to hubris/brainwashing.
I've started to use the CS101 whiteboard hazing as a litmus test for whether there's any point in trying to do good at Google, for my own career. So long as they insist on subjecting everyone to that (starting with people who have just spent four years and a quarter of a million dollars on a Stanford CS education, and then people with verifiable experience on top of that), and considering they were caught in an abusive hiring/mobility conspiracy at the executive level, I think the CS101 whiteboard ridiculousness is not a good sign for corporate ego and intentions. It's also not great when CS students focus on drilling for that, to the exclusion of other things. For myself, if I applied anyway, I'd be fooling myself that I wasn't mainly after the compensation package rather than wanting to have a positive impact.
It's called "selling out".
It seems like another way to punish people for caring about privacy.
On top of that, I think some of the training sets are wrong. Multiple times I've been asked to find traffic signs, but it would only let me pass when including street signs.
- You want to train on an unlabeled dataset, label it along the way.
- You have a set of untrusted validators, some with no history, some with known credibility and accuracy scores. And you have a lot of them.
- You do kind of a zero-knowledge proof by showing the unlabeled dataset to validators that you know you can trust because of their historical high success rate, which you've already established through asking them to label a dataset that you already have high confidence on.
Kind of like how a blue-green colorblind person could find out which pen is blue and which is green when surrounded by people he can't fully trust: ask the people around you, and maybe even show the same person the same pen (or a really dead-easy captcha) twice in a row. If they lie to you both times, they are not to be trusted.
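The control-question scheme described in those bullets can be sketched roughly like this (a hypothetical toy implementation, not Google's actual system -- the thresholds, names, and data shapes are all assumptions for illustration):

```python
from collections import defaultdict

TRUST_THRESHOLD = 0.8   # assumed accuracy cutoff for a "trusted" validator
MIN_CONTROLS = 5        # assumed minimum control answers before trusting anyone


class Validator:
    """Tracks one validator's accuracy on control items with known labels."""

    def __init__(self):
        self.correct = 0
        self.seen = 0

    def record_control(self, answer, truth):
        self.seen += 1
        if answer == truth:
            self.correct += 1

    @property
    def trusted(self):
        return (self.seen >= MIN_CONTROLS
                and self.correct / self.seen >= TRUST_THRESHOLD)


def label_dataset(responses, controls):
    """responses: list of (validator_id, item_id, answer) tuples.
    controls: dict mapping item_id -> known true label.
    Returns consensus labels for the unknown (non-control) items."""
    validators = defaultdict(Validator)

    # First pass: score every validator on the control items they answered.
    for vid, item, answer in responses:
        if item in controls:
            validators[vid].record_control(answer, controls[item])

    # Second pass: majority vote among trusted validators on unknown items.
    votes = defaultdict(lambda: defaultdict(int))
    for vid, item, answer in responses:
        if item not in controls and validators[vid].trusted:
            votes[item][answer] += 1

    return {item: max(counts, key=counts.get) for item, counts in votes.items()}
```

So a validator who lies on the "dead-easy" control images never gets a vote on the genuinely unknown ones, which is exactly the pen trick from the analogy above.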
except those that reCaptcha doesn't support.
I guess if your adversary is a dogmatic AI then that might be by design.
This data is a few years old but I imagine it's the same based on my experience.
They're using your cookie + IP + your account data to determine if you're probably a human.
A LOT of reCAPTCHA sites never prompt you. You only know if it's there because you're on Tor or something.
That has only happened to me in Chrome, not Firefox or Safari. Which is the subject of this article.
I wish more sites would implement a jigsaw-puzzle-style captcha similar to the Binance login one, but I can't speak to its efficiency in defeating bots.
Today I feel like Google uses it mostly for their self-driving-car computer vision projects.
People kept trolling it by typing the test word correctly, and random garbage instead of the OCR word. It was easy to spot which one was which. Source: I was one of these people.