Tell HN: In NSW, if Google doesn't track you, you can't pay Public School fees
129 points by mastazi 9 days ago | 85 comments
I have recently enabled the "resistFingerprinting" option in Firefox[1], in order to prevent tracking based on browser fingerprinting. However I have found out that once I've done that, Google's reCAPTCHA becomes almost impossible to solve.

Normally I wouldn't care too much about Google; the problem is that in Australia, reCAPTCHA is used by Westpac bank for processing payments on behalf of the Department of Education of New South Wales. In other words, you can't pay your child's public school fees online unless you agree to Google tracking you.

How to test:

Create a form with reCAPTCHA or just use a pre-existing one like [2], then try to solve the reCAPTCHA while resistFingerprinting is set to false (the default setting)[1]. Now change it to true, and try to solve the reCAPTCHA once again.

[1] https://support.mozilla.org/en-US/kb/firefox-protection-agai...

[2] https://patrickhlauke.github.io/recaptcha/

I hope Firefox turns this option on by default. Overnight millions will face a hard time with reCAPTCHA and Google might be forced to sit up and take note. Fantasy aside, people will simply switch to Chrome-based browsers contributing to Firefox's dwindling market share. A win-win situation for Google.

Most Google products provide me services in exchange for tracking me, offering a reasonable compromise. With reCAPTCHA, they exploit me as an unpaid worker helping classify and train their ML algorithms, cost me time and wreck privacy leaving Google and the captcha hosting website as the only beneficiaries. Google in this case is more like a corrupt gatekeeper preventing you from entering the town. The town can employ more friendly options, but they don't care as long as the undesirables are kept at bay.

For those who have experience with using reCAPTCHA, is it so easy to set up and deploy that more and more sites are switching to it? Are there no decent non-exploitative alternatives which are tough on bots but solvable in reasonable time for humans without being a test of patience?

It really annoys me when companies I do business with make me do extra work for Google by employing reCAPTCHA.

It truly is pervasive now, and the likelihood of being asked to select crosswalks or store fronts or signals subjectively seems to be rising. I question whether that would be the same if I capitulated and used Chrome.

What would once have been a certain no, I now question: is Google actively weaponizing reCAPTCHA in the new browser wars? After all, Chrome has such a large market share in the right places that I'd be surprised if the model didn't take the user agent into account to determine non-automated users.

The worst pattern I'm seeing now is when login forms decide to add it after just one incorrect password attempt. I completely understand registration forms -- but login forms?

I didn't even consider that it would be different on Chrome, but I guess that is because nowadays users are supposed to be "logged into Chrome"?

I can only hope that firms will eventually figure out that reCAPTCHA is a bad user experience.

On the login form thing: it's to prevent easy brute force attacks

Isn't it a bit lazy on the developers' side?

Limiting the number of errors before sending a link to reset your password (for example; I agree there might be different ways to deal with that) is not rocket science, and being dependent on a third party for such a trivial thing is, in my opinion, a bad idea.

It really isn't trivial. Rate limiting is great, but not enough. If you lock people out after a certain number of failed login attempts, you allow an adversary to DOS your users by constantly trying to log in as them.
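For reference, the rate limiting being argued about here could look roughly like the sketch below. It is only an illustration under assumptions of mine: the class name, the thresholds, and the choice to key the counter on (client IP, username) are all hypothetical and not from anyone's production code. Keying on the pair rather than on the account alone means a remote attacker throttles only their own address, which limits (but does not remove) the lock-out DOS described above.

```python
import time
from collections import defaultdict, deque


class FailedLoginThrottle:
    """Sliding-window counter of failed logins per key (hypothetical sketch)."""

    def __init__(self, max_failures=5, window_seconds=300.0):
        # Both limits are illustrative assumptions, not recommended values.
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures = defaultdict(deque)  # key -> timestamps of failures

    def _prune(self, key, now):
        # Drop failures that have aged out of the window.
        failures = self._failures[key]
        while failures and now - failures[0] >= self.window_seconds:
            failures.popleft()
        return failures

    def is_blocked(self, key, now=None):
        now = time.monotonic() if now is None else now
        return len(self._prune(key, now)) >= self.max_failures

    def record_failure(self, key, now=None):
        now = time.monotonic() if now is None else now
        self._prune(key, now).append(now)


# Usage: the key is (client IP, username), so one noisy address
# does not block the same account for everyone else.
throttle = FailedLoginThrottle(max_failures=3, window_seconds=60.0)
attacker = ("203.0.113.7", "alice")
for second in range(3):
    throttle.record_failure(attacker, now=float(second))
print(throttle.is_blocked(attacker, now=3.0))
print(throttle.is_blocked(("198.51.100.9", "alice"), now=3.0))
```

A distributed attacker with many addresses still gets through this, which is presumably why the parent says rate limiting alone is not enough.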

> It really isn't trivial. Rate limiting is great, but not enough.

Rate limiting alone isn't a solution. But it can be part of a solution that doesn't require reCAPTCHA.

> If you lock people out after a certain number of failed login attempts, you allow an adversary to DOS your users by constantly trying to log in as them.

That isn't how the pattern works. On next successful login you basically inform the user that they need to confirm it's them with an email token. It works well. ReCAPTCHA doesn't.
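A minimal sketch of the pattern described above, as I read it: failed attempts never lock the account, but once there have been enough of them, the next correct password also has to be confirmed with a token sent by email. Everything here is an assumption for illustration (the `Account` class, the threshold of 3, the return strings), and the plaintext password and missing email delivery are shortcuts to keep it short.

```python
import hmac
import secrets
from dataclasses import dataclass
from typing import Optional

FAILED_ATTEMPT_THRESHOLD = 3  # illustrative assumption


@dataclass
class Account:
    # Plaintext only to keep the sketch short; a real system would
    # store a salted password hash instead.
    password: str
    failed_attempts: int = 0
    pending_token: Optional[str] = None


def attempt_login(account: Account, password: str) -> str:
    """Never locks the account; suspicious history only adds an email step."""
    if not hmac.compare_digest(password.encode(), account.password.encode()):
        account.failed_attempts += 1
        return "denied"
    if account.failed_attempts >= FAILED_ATTEMPT_THRESHOLD:
        # Correct password after many failures: require an emailed token.
        # Actually sending the email is out of scope for this sketch.
        account.pending_token = secrets.token_urlsafe(16)
        return "email_confirmation_required"
    account.failed_attempts = 0
    return "ok"


def confirm_email_token(account: Account, token: str) -> bool:
    """Clears the failure history if the emailed token matches."""
    if account.pending_token is None:
        return False
    if not hmac.compare_digest(token.encode(), account.pending_token.encode()):
        return False
    account.pending_token = None
    account.failed_attempts = 0
    return True
```

The point of the design is that an attacker hammering the login form can at worst cost the real user one extra email round trip, not access to the account.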

> I hope Firefox turns this option on by default.

Unfortunately, it has side-effects like disabling site-specific zoom levels[1], since it can be used as a fingerprinting mechanism.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1369357

Millions? Firefox will have less than 1% market share next year.

It really doesn't matter at all, especially now that even Edge is switching to chromium.

Less than 1% next year? It was just under 9%. Do you mean it will drop 8 percentage points in less than 1 month, or in 12 months? Even so, I think you will find there is some core percentage it won't drop below as long as it is competitive with other browsers. I'm thinking 5% at worst.

Actually, the figure of just under 9% is for the desktop market only. If you also include the mobile market, it's already below 5%.

It is less than 9% of the desktop share, so around 5% if you take mobile into account.

I mean, I just don't know how much people care about mobile browsers. I've never even considered switching away from Safari on my phone. I don't really want any advanced features on a phone browser, I'm not running 20 tabs etc etc.

I could be the odd one out here - but whilst on desktop choice is fairly normal, web browser on a phone feels like a core OS tool. Realistically this is probably due to how shitty IE was, along with the enforced browser choice dialogs on Windows?

You have an iPhone, so all your browsers are essentially Safari. The other features they may compete on are syncing to your desktop and stuff like that.

On Android, Firefox Mobile has uBlock Origin, whereas Browser, Chrome and all the others do not.

Anyone not using an ad blocker is out of their mind, especially on mobile. The simplest way to do that is Firefox Mobile + uBlock Origin.

How effective do you find uBlock Origin on mobile? I find that half the time elements don't get blocked. Disabling most JS using uMatrix seems to be more effective.

Very effective. Depends, of course, on the lists you select (I select most of the general and a few regional that are relevant to me) - and your tolerance to having to whitelist specific domains when things don't work. Mine is very high - and if you're a uMatrix user, it seems yours is too.

(I use uMatrix on the desktop, but only surf casually on mobile, so I find uBO sufficient).

OK, so a lot of that drop can be ascribed to a market whose share of total browsers is rising, and Firefox having a lousy share of that market.

I don't know what to do about that. I use an iPhone and I see no benefit to getting FF on my phone, which I guess I would say is due to monopolistic behavior by Apple.

Google reCaptcha is one of the most pervasive nasty tracking devices because legitimate sites use it for its advertised spam protection without caring about anything else. Suddenly you have Google able to track you on government sites (like opting out of myhealthrecord), banks, exchanges, and a myriad of others. I hate it because they make you choose between letting Google track you fully, letting Google track you only a bit and spending forever trying to solve the captcha (i.e. if you are logged out of Google or have tracking protection on), or not using the service at all. :|

It's also the single largest inhibitor to being able to use Tor on a daily basis.

I really like the idea of Tor and would like to use it as my daily driver. When I try, nearly every website I visit (thanks, Cloudflare) forces me to spend several minutes trying to solve reCAPTCHAs. Despite my best efforts at solving them correctly, it usually takes several attempts.

I've written scripts that use the APIs of human-powered captcha solving services but even those can often take a couple minutes so I'm stuck sitting around waiting for a result.

I really wish recaptcha would die. I understand that they're intended to stop scrapers and bots (or at least it's marketed as such). I'd gladly pay a few satoshis to websites to bypass these things.

Yeah, Cloudflare is silently man-in-the-middling many websites too... Now imagine what could happen if, I don't know, there was some bug that leaked some of the sensitive information that Cloudflare deals with throughout the entire web...

Even the UX of reCaptcha has seriously gone downhill recently. I wonder how many hours per week I spend clicking pictures of busses and traffic lights. I would rather pay than complete them at this point.

> I wonder how many hours per week I spend clicking pictures

Where are you running into this problem? I spend an embarrassing amount of time online and I'd probably estimate that I average around 1 minute a month "clicking pictures".

Spend a day or two using Tor browser and you'll feel the pain that reCAPTCHA causes real human users. It's beyond frustrating.

I do a lot of crypto trading, and many sites probably have the reCaptcha security set to max.

If you were to pay, then you'd have to enable Google cookies to know that you've made the payment. If you enable Google cookies for those sites then you don't have to deal with the reCaptcha (as much).

I'm not a huge privacy freak - I have the default cookie settings on all my browsers and I'm signed into multiple Google accounts. They know who I am. I can only suspect my adblocker could be blocking something there, but other than that I should be about as regular a user as they come.

> make you choose between letting Google track you fully

It's not black and white. Turn off blocking when logging in and turn it back on once you have the access cookie. Yes, it might be a tracking event, but you are not forced into having your privacy blockers turned off all the time for Google.

Thank the spammers and abusers

The point is that there are plenty of captcha services and self hosted captcha software libraries that don't leak information to Google.

Are they still the 'squiggly letters' solution or have they done something better?

"Squiggly letters" are still fine. There's a lot of FUD around AI & ML breaking them but I have yet to find an off-the-shelf tool that can break them, and so have the spammers.

Sure, you might break them if you pay a team of computer vision scientists for a few months but that isn't profitable for spammers, so even though they are technically breakable, in practice they're still good enough to thwart spam & bruteforce.

Google’s reCaptcha must die. I’m not training your AI, Google.

Your reluctance to train the AI has been dully noted.

Is the note taker bored?

I'm kind of surprised that recaptchas are 'training AI' at google.

I went through a phase of trying to use Tor just for the principle of it. So I was hit with a lot of reCAPTCHAs. And I really couldn't see how they were testing AI directly.

Usually, it's 'pick pictures of road signs' and 'pick pictures of cars' and so on. Almost all images seem to be taken from Street View cars, or possibly Waymo cars.

So, the human doing the reCAPTCHA has to parse an image to extract the implicit depth information to pick out cars etc.

Whereas Google knows the exact location of the image, so it knows where the roads are etc, and probably has a lidar scan, so it knows the depth information etc too.

Picking a road sign out of a picture is kinda hard for a 2D image. Picking it out from a 4D scene created from all the different sensors their cars have, on the other hand, is a piece of cake.

Eh, I don’t think they’re training Waymo AI. They’re training their general image classification AI which doesn’t benefit from extra sensory data.

So if Google wants to train general image classification to pick out things, e.g. road signs:

Google can use its 4D datasets to select a gazillion real still images that contain what it _knows_ are road signs or not road signs or whatever.

And then it can feed these to its image classification stuff.

How do humans clicking on tiles help train their image classification AI?

I always deliberately try to mis-train it. But they can undoubtedly filter it out.

I can't prove it. But I have been hit by unsolvable captchas when I purposely give it wrong answers.

It's hard to know. I generally try "well enough" and provide a couple of incorrect positives and negatives. I was forced to do about 8 yesterday, and I messed up all 8 slightly. It might depend how overtly you try to game it. I've tried to be subtle, but I don't know whether or not it's effective.

Back in the days of book digitising reCaptcha it was fairly easy to distinguish which word was from the book vs computer generated and only answer half the captcha and pass.

I always inputted the optional word with slight errors. A totally wrong answer would be easy to filter out, but one or two letters being wrong while it's still a proper word would hopefully be harder to get out of the training data.

I don't get your argument.

I get the original post's argument of not willing to be tracked by Google, but what hurts you so much in training Google's AI?

Well, mainly because Google is a $1 trillion company, so why should you help them in exchange for nothing? It's not like they're open sourcing all their data for others to use. They use and control whatever they can in their own interest.

You aren't the customer. The website is. They're providing a valuable service to the website. The website is providing a valuable service to you. You get to choose whether or not doing one unit of Google AI training work is worth using the services of the website. Generally speaking, I find that trade to be quite fair.

> You get to choose whether or not doing N units of Google AI training work where N measures how much Google hates you is worth using the services of the website.


Well, they do release tons of datasets, and their open source contributions have been very important over the past 10-15 years. Sure, other companies do open source stuff too, I know. Even the massive improvements in computational photography in the Pixel phones are explained by them in highly detailed blog posts. They even built a special search engine just for datasets.

The AI you help train will be used to assist authoritarian governments (e.g. Dragonfly) imprison and murder millions of people so that Google can make a few extra bucks.

How do you see the car and traffic sign recognition AI being used to imprison and murder people?

Because the AI they're perfecting can and will be used for other purposes as well?

How do you see that data improving the AI for other purposes?

I think it's safe to assume that the code used to recognize cars and traffic signs can also be used to recognize other things, such as Muslims, people using apps the government can't spy on, or people wearing/holding objects that the government wants to eradicate (perhaps a banned book). I think it's also safe to assume Google benefits immensely from having millions of people ready to test their changes in order to optimize their code and algorithms. I'm also sure that they do more than just check if you picked the "right" squares, such as storing every square you've ever chosen along with other stats such as completion time and correctness. I wouldn't be surprised if the Chinese government (as well as others) had an interest in using such data to detect "rebels".

I would also be surprised if the work wouldn't be beneficial to other AI related work that can be used to build detailed profiles among other things.

Time and effort wasted. Work imposed on me by a hostile company with which I have no direct dealing and that financially doesn’t need this.

Agreed, AI is technological advancement, and Google makes most of its datasets available to everyone and goes to great lengths to explain the technology it creates for those who are willing to read about it. I don't understand what the issue is with semi-passively contributing to a potential technological advancement.

This is irrelevant when I want to access a service from a third party whom I'm paying to provide the service, or even worse, a service from public authorities.

> what's the issue with semi-passively contributing to a potential technological advancement.

You're vastly overestimating your contribution to technological advancement through this channel.

You're doing work for them for free. It's unpaid Mechanical Turking.

Devil's advocate: you are getting access to a service that uses Google's assistance for spam or abuse protection, so it's not entirely unpaid.

No, the company that integrates it makes me pay (by doing work for google)

Again, a worthy trade-off: it's not much work to solve one reCAPTCHA, and the service that you use remains available more reliably because it filters out spam bots that could otherwise knock it out of service.

Either way, you train AI and watch ads in exchange for content.

I wish EFF or some other privacy advocate would start making a fuss about recaptcha.

Recaptcha will often make you retry multiple times despite obviously correct answers. It really feels like Google is punishing users for trying to opt-out of their data collection.

Google tracks you anyway. Do that in a VM over a dedicated VPN. The problem is legal: your local law needs to prohibit non-consensual user tracking of any form.

I think that's why they removed their motto of "Don't be evil". [https://www.searchenginejournal.com/google-dont-be-evil/2540...] Google is now what Microsoft used to be. And developers like me ended up hating Microsoft for everything.

I don't know what Australian law says, but fortunately in France all I need to do is take a small screenshot or video, publish it somewhere, and phone in to report the problem (at least for now, every public administration has a human-operated public phone service). At that point others normally pick up the hot potato and do the work for me, as civil servants should...

However, I fear a future in which the Windows scenario of the '90s repeats itself with websites, which would be far worse given the current growth rate of IT and the general situation...

Tried enabling the resistFingerprinting option, and while it was (even more) annoying, it wasn't impossible. Seemed to require one or two additional screens. The slowly fading images are really frustrating, but I get hit with them normally.

I'm also curious about how this works internationally. Surely not everyone knows what a 'crosswalk' looks like.

Can you file a legal challenge? You are being forced to do for-profit work, for a for-profit company, to pay your public school fees.

Sure, it's just 1 cent worth of work in the grand scheme of things - but the distance from zero to anything is much larger than the distance between 1 cent and 1 dollar.

That's something that courts dislike much more than any tracking.

Are there any good alternatives to reCAPTCHA? If I remember correctly, it started as an academic/nonprofit project at Carnegie Mellon to digitize books, but then somehow got acquired by Google.

Have any academic groups looked at offering a replacement?

Computer puzzles are dead. AI is too good. Anomaly detection is the future. Track your users’ behaviour yourself, find outliers, handle them appropriately.
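To make "track behaviour, find outliers" a bit more concrete, here is a very rough sketch. It is my own toy illustration, not anything a real service is known to run: the single feature (requests per minute per client), the function name, and the 3.5 cutoff are all assumptions. It uses the median absolute deviation instead of mean and standard deviation so that one extreme bot does not drag the baseline along with it.

```python
from statistics import median


def find_outliers(requests_per_minute, threshold=3.5):
    """Return client ids whose request rate is far from the median.

    requests_per_minute: dict mapping a client id to its observed rate.
    threshold: cutoff on the modified z-score (3.5 is a common rule of
    thumb, used here purely as an assumption).
    """
    values = list(requests_per_minute.values())
    if len(values) < 3:
        return []  # too little data to call anything an outlier
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread to compare against
    # 0.6745 rescales the MAD so the score is roughly comparable
    # to a z-score for normally distributed data.
    return sorted(
        client
        for client, rate in requests_per_minute.items()
        if 0.6745 * abs(rate - med) / mad > threshold
    )


observed = {"a": 4, "b": 5, "c": 6, "d": 5, "e": 4, "bot": 300}
print(find_outliers(observed))
```

"Handle them appropriately" is the hard part this leaves out: a flagged client might get a challenge, a delay, or a manual review rather than a hard block.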

Repost of my comment above:

"Squiggly letters" captchas are still fine. There's a lot of FUD around AI & ML breaking them but I have yet to find an off-the-shelf tool that can break them, and so have the spammers. Sure, you might break them if you pay a team of computer vision scientists for a few months but that isn't profitable for spammers, so even though they are technically breakable, in practice they're still good enough to thwart spam & bruteforce.

The web has become fraudsters against bots, leaving humans excluded. The few humans left on the web are hugged to death by ad and tracking networks.

Sounds like some dystopian policy to undermine non-compliance in the general population.

Someone should write a fiction on it.

This isn't what 'Show HN' is for. Take a look at https://news.ycombinator.com/showhn.html

Sorry, I actually realised that shortly after posting but it was too late to edit, thanks to the mods for rectifying.

You can just email them if you need something edited or fixed, they're set up with email in their cages so they're quite responsive.

"Tell HN" would've been more appropriate.

"Tell HN" is kind of implied in every post and comment.

As someone that lives in NSW, I find this to be very sad and distressing. You should complain to Westpac and the government to let them know that this is not okay.

I was able to solve it with "resistFingerprinting" set to true but I had to unblock Google's domains in uMatrix and it took 3 submissions. I really dislike companies that use Google's reCAPTCHA...

I hope someone launches a captcha cracker soon as an extension.

And don't even think about doing that while in a country whose native language you don't know. The challenge that Google gives you will be in that language, with no way to change it to English.

Did you file a complaint?

To whom?

The Department of Education of New South Wales, I assume.

Are there any decent alternatives to reCaptcha?

