reCAPTCHA v2 is superseded by v3 because it presents a broader opportunity for Google to collect data, and do so with reduced legal risk.
Since reCAPTCHA v3 scripts must be loaded on every page of a site, you must send Google your browsing history and detailed data about how you interact with sites in order to access basic services on the internet, such as paying your bills, or accessing healthcare services.
It's needless to say that the kind of data that is collected by reCAPTCHA v3 is extremely sensitive. Those requests contain data about your motor skills, health issues, and your interests and desires based on how you interact with content. Everything about you that can be inferred or extracted from a website visit is collected and sent to Google.
If you refuse to transmit personal data to Google, websites will hinder or block your access.
It’s nonetheless a shame that it’s so universally misunderstood how ad-supported megacorps make their money that even highly sophisticated users of the web still talk about the value of personal data (source: I ran Facebook’s ads backend for years).
Much like the highest information-gain feature for the future price of a security is its most recent price: ad historical CTR and user historical CTR (called “clickiness” in the business) are basically the whole show when predicting user cross ad CTR. The big shops like to ham up their data advantage with one hand (to advertisers) while washing the other hand of it (to regulators).
As with so many things Hanlon’s Razor cuts deeply here: if your browsing history can juice CTR prediction then I’ve never seen it. I have seen careers premised on that idea, but I’ve never seen it work.
That may be the case for some people, but that is not my complaint, nor that of many folks I know.
I simply don't care how FB, Google and other surveillance outfits make money. I don't care about marketers' careers or their CTRs. I don't even care about putting a dollar value on my LTV to them.
I care about denying them visibility into my datastream. It is zero-sum. They have no right to it, and I have every right to try to limit their visibility.
Why? None of your business. Seriously - nobody is owed an explanation for not wanting robots watching.
But I will answer anyway. It is because of future risks. These professional panty sniffers already have the raw material for many thousands of lawsuits, divorces and less legal outcomes in their databases. Who knows what particular bits of information will leak in 10 years, or when FB goes bankrupt? I have no desire to be part of what I suspect will become a massive clusterfuck within our lifetimes.
If you're correct that this data has so little value, then it is more likely it will leak. FB and Google are the equivalent of Superfund sites waiting to happen, and storing that data should be considered criminal.
That's entirely fair! But also: You have no right to use my website, and I have every right to limit your access.
Recaptcha is simply part of this negotiation.
What will happen with v3 if I block gstatic.com? Will I be given the highest threat score?
> Recaptcha is simply part of this negotiation.
It is only a negotiation if I know it is there.
"We track all of your activities and provide third parties the ability to do so as well - to provide a better user experience - and we may or may not sell or distribute the collected data at our own discretion and continued use of this site grants us permission in perpetuity. Further, should you decide to sue us, you agree to binding arbitration at a venue chosen by us, conducted by an arbiter of our choosing, in which case you promise to lose regardless of outcome. If you disagree, please leave the site now but just know, by being here and reading this, you have already granted us this power and we've mostly already collected what we needed from you. Thank you. Stop wasting our bandwidth now. Fuck off!"
You're not going to get through a site with a properly implemented captcha just by blocking it.
And it's difficult to imagine legislating that away, as it's sort of fundamental to all network computing.
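To make the point about blocking concrete, here is a minimal sketch of a server-side gate. It assumes the site has already called the captcha provider's verification endpoint and parsed the JSON reply into a dict with `success` and `score` fields (the field names reCAPTCHA v3 documents). The function name `decide` and the 0.5 threshold are illustrative assumptions, not anything a particular site is known to use.

```python
from typing import Optional


def decide(verification: Optional[dict], threshold: float = 0.5) -> str:
    """Return "allow" or "deny" for one request.

    `verification` is the parsed reply from the provider's server-side
    verification call, or None when the client sent no token at all
    (for example because the captcha script was blocked in the browser).
    """
    if verification is None:
        # No token means there is nothing to score, so a strict site denies.
        return "deny"
    if not verification.get("success", False):
        # The provider rejected the token (expired, reused, malformed).
        return "deny"
    score = verification.get("score", 0.0)
    return "allow" if score >= threshold else "deny"
```

Under this sketch a blocked script never produces a token, so the request is denied without any score being computed. A more lenient site could fall back to a different challenge instead of denying outright.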
The idea that FB and Google are openly making a trade with users is ludicrous. I'm horrified that you either sincerely believe that there's a fair negotiation happening or that you don't care (given your employment history).
In the US for example you can't set up your web site in a way that accessing it discriminates against people with disabilities.
An even bigger issue, for me, is having my face added to their facial recognition algorithms, despite never once tagging myself in a photo. Is there a way to opt out of this?
It's as much of a negotiation as offering someone to pay through perpetually indentured servitude is: it's illegal and immoral.
Do I think Facebook/Google/etc are abusing my data right now? Probably not.
But do I think that large scale collection of my data could be abused in the future? Most definitely. If the Cambridge Analytica scandal has taught us anything, it’s that having access to this data is ripe for abuse and often it might happen in unexpected ways.
And do I owe an explanation for wanting some basic privacy? Absolutely not. If a random stranger stopped me in the street and asked me lots of personal questions there wouldn’t be an expectation that I have to respond. Yet the likes of Facebook and Google seem hell-bent on turning the discussion around when it’s data collected online.
I would think that in the EU, under GDPR, collecting, transmitting and storing that data is in fact criminal, or at least subject to heavy fines. And under GDPR it won't help to just note the data collection in the TOS or ask the user for permission (under threat to not allow access to the service). So I really wonder how Google plans to run this in Europe.
What use is "we are transparent with our users about the data we collect" when the user does not want you to collect the data in the first place? And they give you no option to opt out of such data collection? (And for what - just so that they can create a better ad network that can better exploit us with our own data?)
(And don't get me started on Safari spying and all their "anonymous" cookie collection crap without giving the user any choice in the matter, essentially forcing every one of their users to opt in to being profiled through their browsing history).
I don’t want to single you out personally but there’s a broad trend on HN of bitter-sounding commentary on the surveillance powers of these companies by people who can easily defeat any tracking that it’s economical for them to even attempt let alone execute that reeks of sour grapes that a mediocre employee at one of these places makes 3-20x what anyone makes (as a rank and file employee) anywhere else.
Again, you’re not likely part of that group, but seriously who hangs out on HN and can’t configure a VPN?
How do you stop using a service when you have little or no indication that it does something like this before hand, and afterwards the privacy is already gone?
If I use a site and view my profile page and the url contains an account id or username and some google or facebook analytics is loaded, or a like button is sitting somewhere, how am I to know that before the page is loaded? What if I'm visiting the site for the first time after it's been added?
It doesn't even matter if I have an account on Google or Facebook, they'll create profiles for me aggregating my data anyway.
> quarantine them to a VPN/incognito interaction
Which does very little. I spent a few hours this morning trying to get a system non-unique on panopticlick, but the canvas and WebGL hashing is enough to dwarf all the other metrics. There are extensions to help with that, but for the purpose I was attempting they were sub-optimal (and the one that seemed to do time-based salting of the hashes wasn't working right).
So, I don't have any confidence that a VPN and incognito really does much at all.
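For a rough sense of why a couple of hashes can dwarf everything else, here is a small sketch of the surprisal calculation that fingerprinting tests report as "bits of identifying information". The shares below are made-up numbers chosen for illustration, not measurements from panopticlick or any real population.

```python
import math


def surprisal_bits(share: float) -> float:
    """Bits of identifying information carried by an attribute value
    that a fraction `share` of all browsers have in common."""
    if not 0.0 < share <= 1.0:
        raise ValueError("share must be in (0, 1]")
    return -math.log2(share)


# Hypothetical shares: "1 in N browsers has the same value as mine".
shares = {
    "timezone": 1 / 8,
    "user_agent": 1 / 64,
    "webgl_hash": 1 / 2048,
    "canvas_hash": 1 / 4096,
}

bits = {name: surprisal_bits(share) for name, share in shares.items()}

# Summing treats the attributes as independent, which they are not
# (canvas and WebGL output both depend on the GPU), so this is an
# upper bound on the combined identifying information.
total_bits = sum(bits.values())
```

With these assumed shares the two rendering hashes contribute 23 of the 32 bits, and changing IP address or clearing cookies touches none of them.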
It is small comfort for the average user, but the way you do it is use noscript. It makes the web awful, sure, but it won't happen to you.
> It doesn't even matter if I have an account on Google or Facebook, they'll create profiles for me aggregating my data anyway.
I sort of wonder what you envision this actually meaning. If I spam your website and you add a DoS filter for my IP, should I complain you made a profile of me? If when a user tries to log in I check the referrer to see if it contains a proper URL, have I violated your privacy?
I mean it to respond to the common response people sometimes give in conversations like these, which is "that's why I don't use Facebook" or "that's why I stopped using Google services". For this conversation, whether you use Facebook or not is irrelevant, they still gather your information, and in the same way myriad other advertisers (or however they bill themselves) do through online tracking. Google and Facebook are large, and have a portion that's easily visible, but they are not the whole problem by a long shot.
> If when a user tries to log in I check the referrer to see if it contains a proper URL, have I violated your privacy?
No. Noting which door a customer came into your store seems fine to me. That by default customers come in wearing the logo of the last store they visited is weird, but entirely something they can control. Having people shadowing all your customers while in the store looking and listening for tidbits they can report back on to get more info about those people is pretty creepy. As you suggest, the way to get around most of that is to dress blandly and say nothing.
Here's the thing, we're a market economy. There's a transaction going on, where we're trading away something (our information and privacy) to a company for some product, or possibly the right to view a product we might consider buying. How many people are actually aware of this transaction? If they aren't aware of the transaction, there's a name for that when it's a regular good, and it's theft (or fraud). The difference here is that most of our government systems don't apply any rights of ownership to this information, so our regular rules don't apply. I admit, they may not make sense to apply entirely, but at the same time, it's obvious that something is lost in the transaction, whether the person losing it realizes it at the time, or views it as important enough to make a big deal about when they notice.
I meant more like in a literal sense, but okay. Point taken.
> No. Noting which door a customer came into your store seems fine to me. That by default customers come in wearing the logo of the last store they visited is weird, but entirely something they can control. Having people shadowing all your customers while in the store looking and listening for tidbits they can report back on to get more info about those people is pretty creepy. As you suggest, the way to get around most of that is to dress blandly and say nothing
These human metaphors are powerful, but don't map at all to basic analytics concepts. There is no person watching you. There is no intelligence judging you. There are a series of conditions in a deterministic system provoked by your actions. If we could have done this before now, we would have because it's a whole hell of a lot more ethical.
> Here's the thing, we're a market economy.
I dunno where you are but I'm in the US which is most definitely not "a market economy" without a whole hell of a lot of qualifiers.
> There's a transaction going on, where we're trading away something (our information and privacy) to a company for some product, or possibly the right to view a product we might consider buying. How many people are actually aware of this transaction?
Roughly as many, I imagine, as folks who realized the shopkeeper could see them enter and leave. Most folks know local proprietors can and will kick you out and put up a photo if you act up.
> The difference here is that most of our government systems don't apply any rights of ownership to this information, so our regular rules don't apply.
This is just flatly false. I don't know what you're thinking writing this, but it's clearly neglecting copyright and patents. For what it's worth, I think the latter is a bad system and the former is in desperate need of reform to sharply limit it.
> it's obvious that something is lost in the transaction, whether the person losing it realizes it at the time, or views it as important enough to make a big deal about when they notice.
I am trying to read your comment in the spirit it was intended rather than the literal delivery, so please forgive me if there is a subtle impedance mismatch here but...
Welcome to the future, I guess? The top 50% earners of the world have access to computers that would have once bankrupted a nation to produce, and the options are still surprisingly good for the next quartile. With that power, it means that the people around you are going to start noticing things and making decisions about them with the information they can now process.
Ideally, this will be a distributed thing, but right now due to the nature of our society, authority of this sort is highly concentrated. But the dam has broken. A total surveillance system for up to a modestly sized city, with realtime tracking and long term data storage, is well within the reach of anyone with $10000USD to spend on hardware. They can self-host it. The banality of this cannot be overstated. It's boring to do this now. It's not new ground. So much so that average people can monitor their homes with it, or know if their friends have gone missing with it.
To some extent, there is just no undoing this. Society will have fewer secrets and those secrets will be much more deliberate, and the only response that can work is to change your attitude.
I don't think it's creepy because there's a (theoretical) person watching me, I think it's creepy because they're cataloguing all my actions in a systematic way which pierces the veil of perceived privacy (mostly through anonymity).
> I dunno where you are but I'm in the US which is most definitely not "a market economy" without a whole hell of a lot of qualifiers.
I'm not sure how to respond to this without a specific criticism of how you think it's incorrect. That said, it's somewhat tangential to the point, even if it would be an interesting conversation.
> Roughly as many, I imagine, as folks who realized the shopkeeper could see them enter and leave.
I don't know. If every time I entered my local 7-eleven someone picked up a clipboard, flipped to a specific page, looked back at me, nodded to themselves and then marked something on the page, I might decide to go somewhere else, at least most of the time. If I knew the info was shared with all the other 7-elevens, and the local grocery chain, and some hardware stores, that makes me want to use all the places less.
> This is just flatly false. I don't know what you're thinking writing this, but it's clearly neglecting copyright and patents. For what it's worth, I think the latter is a bad system and the former is in desperate need of reform to sharply limit it.
I said "this" to qualify what I was referring to (personal information) and distinguish it from other types of protected information, of the type you reference.
> To some extent, there is just no undoing this. Society will have fewer secrets and those secrets will be much more deliberate, and the only response that can work is to change your attitude.
I don't think that's the only response that can work. It's the only one that works completely, as deciding to not care is always a solution to caring, if you can pull it off.
The alternative is new laws. Are they perfect? No. Will they solve the problem adequately? Likely not. Do they have a chance of making a positive difference across the board for massive amounts of people by empowering them with regard to their own information? I dunno. Maybe? I think it's worth pushing for though. Otherwise, why do we have minimum wage and labor laws? At some point we could have thrown our hands up and said "screw it" about that stuff, but people pushed for it, and while they aren't perfect, I think we're all better off for them.
I don't believe there will be any perfect solution to this ever, or even a good or acceptable solution all that soon. I do think it's still worth raising my voice over, because I think there are some possible futures that are better than others with regard to privacy and personal information, and I think that's worth pushing towards.
More sites than you'd expect work without js or with first-party js only. It's annoying when you need to read a news site, because those are usually bloated garbage. Not a huge loss.
What matters is that my personas using other VMs, through other VPNs or Tor, don't get linked to my meatspace identity, to Mirimir, or to my other personas. And that's doable, I think.
My main point is that the amount of effort you have to go through to achieve that is very high, and I wish it was considerably lower. There are technological changes that could help with this, and legal changes that could help with this.
I think a comfortable place would be if you visit the same online location using your main browser using one IP, and a private browsing instance of that same browser on another IP (through a VPN, proxy, or just new public lease), it would be nice if there was some expectation they didn't immediately have a high degree of certainty you were the same individual. For the general populace, this falls on its face.
Tor has quite a few mitigations to help here (e.g. simulated window/screen values), and Firefox has started to adopt some of them, but as mentioned here on HN frequently, Firefox sometimes has problems with CAPTCHAs and certain sites (I haven't had those problems, but I'm also not usually using it through a VPN), and I know Tor is sometimes blocked outright.
The point is that until most of these protections (technological and hopefully some legal) are mainstream, completely protecting yourself is a double-edged sword, since you also ostracize yourself from some sites and services. Tor is the equivalent of walking around in padded, baggy clothes and a ski-mask. Sometimes, like in the snow, it may seem fairly normal. Other times, like at the beach, it may preserve your privacy, but it's very uncomfortable and may cause people to avoid you, if not outright shun you and run you off. If everyone starts wearing masks and covering their hair, if you do the same you probably have a fairly high degree of anonymity and privacy through it.
In summary, I think Tor is a useful and necessary tool, but nowhere near sufficient for where I think we need to be generally.
That's true. However, it's mostly one-time effort. There are Linux and TrueOS workspace VMs, pfSense VMs as VPN gateways, and Whonix gateway and workspace VMs. All in VirtualBox.
There's ~no configuration required for the Whonix VMs. You just need to point the gateway VM to the pfSense VM that ends the desired nested VPN chain. And if there are multiple Whonix instances, rename the internal network that the gateway and workspace VMs share.
For the Linux and TrueOS workspace VMs, it's just like any OS install. You do have more machines to maintain, but mainly that's just keeping packages up to date. All of the devices are virtual, so you don't have driver issues.
Setting up the pfSense VMs is the hardest part. But once that's done, you can use them for years. pfSense is pretty good about preserving setup for OS upgrades. And there's a webGUI for changing VPN servers. But it's harder than using a custom VPN client.
So yeah, it's not so easy. However, someone could write an app that papered over most of the ugly parts. That even automated VM setup and management.
So, sure, a clean browser and IP and never logging into a site you've previously visited might be enough, but who does that, and doesn't that halfway defeat the purpose?
My meatspace identity uses a desktop that hits the Internet directly. It displays no interest in technical matters. Just banking, cards, shopping, general news, etc. It never accesses HN, or any of the other sites that Mirimir uses. Or that any of my other personas use.
Mirimir uses a VM, on a different host machine, and hits the Internet through three VPNs, in a nested chain. Some other personas use different VMs on the same host, connecting through different nested VPN chains. Some are Whonix instances, connecting via Tor, and reaching Tor through nested VPN chains.
So basically, each persona that I want isolated uses a different host machine and/or VM, a different browser, and a different IP address.
To me, that seems par for the course for any service that's generating profiles of browsing behavior and trying to make any sort of decisions based on it. It reduces cruft and duplicate profiles while also providing more accurate information. Why wouldn't it be done?
> the information-theoretic validity of your argument
The portion about canvas, WebGL and AudioContext hashing is not theory at all, it's well known practice from years ago. Just the other day here there was a story about some advertiser on Stack Overflow trying to use the audio hashing for tracking purposes.
Hell, if you get enough identifiable bits of entropy, you can probably assume weak to strong level matching using a bit-level Levenshtein distance that's low enough.
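As a sketch of what that kind of fuzzy matching could look like, assume each fingerprint has already been reduced to a string of '0'/'1' characters. The helper names and the `max_distance` threshold below are hypothetical, and nothing here claims that any real tracker works this way.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def likely_same_device(fp_a: str, fp_b: str, max_distance: int = 2) -> bool:
    """Treat two fingerprint bit strings as a match when only a few
    bits were flipped, inserted or dropped between visits."""
    return levenshtein(fp_a, fp_b) <= max_distance
```

For example, `likely_same_device("1011001110", "1011011110")` is true because the strings differ in a single bit, so a fingerprint that drifts slightly between visits would still be linked under this scheme.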
Cheat sheet: you can’t.
whose results would have you believe that one's footprint is very unique. I'd be interested in hearing more about why this is hard to implement into an efficient process.
I don’t see how this is related to the claim, since it doesn’t solve the problem. But the advertising company that I let run code on my website will certainly do the job pretty well, I’d say.
I would've thought that would be a pretty useful exercise
Are you serious? Have you tried not using their services? Try blocking Google Analytics, Tag Manager, ReCaptcha, fonts, gstatic,... What you will see is that you can no longer access much of the Internet. Want to participate in StackOverflow? Good luck if you block Google.
My beef is not with them trying to find my data when I'm on their site(s). They are however everywhere, on almost every site I visit. Coupled with their (impressive) technical prowess it is beyond creepy, and there is simply no way one can avoid them.
I don't know what the solution is or will be, but as far as I'm concerned, this should be illegal.
Blocking those two doesn't seem to break much, does it? I have uBlock Origin and/or Privacy Badger block them everywhere.
ReCaptcha on the other hand…
Just this week I needed it to complete the booking of an airline ticket and just now buying a high chair for my son. And today I've completed the blasted thing ten times in a row because of a game installer that was failing at a certain point (GTA V's Social Club thing); each attempt to figure out what was wrong meant completing the ReCaptcha again.
Fire hydrants, parking metres, pedestrian crossings, road signs, hills, chimneys, steps, cyclists, buses — that's what the internet looks like in 2019.
The right vehicle for this is antitrust, but if you think you can sell that in this climate then I’ve got a great deal for you on the London Bridge.
Edit: That is, ridiculously easy for new companies. Incumbents have been hoarding data for too long and it was actually harder for existing companies to become compliant.
When you’ve built a social consumer business in Europe that is profitable after compliance, send me a term sheet.
I didn't build a profitable social consumer business in Europe after compliance, but I was part of a team that implemented compliance for a long existing company within the US due to them having clients and clients' clients in Europe. They're profitable. Do you want my term sheet? Or are you weakly attempting to flex while complaining that people's basic right to privacy is preventing you from earning obscene amounts of money?
It feels like a regulatory moat for the big players who can afford it. Sorta like a complex VAT policy.
If you do everything right from the start, the costs are minuscule.
It literally is:
- you only store data you require to run your business
- you delete data if customer requested deletion
- you give the customer their data if they ask for it
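The three rules above can be sketched as a tiny in-memory store. Everything here is illustrative: `CustomerStore`, `REQUIRED_FIELDS` and the field names are made up, and a real system would also have to deal with backups, logs and third-party processors.

```python
from dataclasses import dataclass, field

# Hypothetical: the only fields this imaginary business needs to operate.
REQUIRED_FIELDS = {"email", "shipping_address"}


@dataclass
class CustomerStore:
    records: dict = field(default_factory=dict)

    def save(self, customer_id: str, submitted: dict) -> None:
        """Rule 1: store only the data required to run the business."""
        self.records[customer_id] = {
            key: value
            for key, value in submitted.items()
            if key in REQUIRED_FIELDS
        }

    def export(self, customer_id: str) -> dict:
        """Rule 3: give the customer a copy of their data on request."""
        return dict(self.records.get(customer_id, {}))

    def delete(self, customer_id: str) -> bool:
        """Rule 2: delete on request; True if anything was removed."""
        return self.records.pop(customer_id, None) is not None
```

The design choice that keeps the cost low is doing the filtering at write time: data that was never stored does not need to be exported or deleted later.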
If your profitable business is built upon selling customer data wholesale to third parties, then good riddance.
It's still early days. We'll see what will happen when the DPAs and the courts have fielded a few high profile cases.
These are societal problems. It's good to care about people beyond yourself, and to talk about the professional ethical responsibilities of software engineers with regards to corporate mass-surveillance.
No you can't. Facebook creates shadow profiles for every single person in the world. If any single one of your friends has WhatsApp, Facebook has your phone number. They have your phone number and the entire address book of your friend, who probably has friends in common. If two of your friends have WhatsApp and they both have your number...
You see where I'm going here? There are pictures of me on Facebook that I did not put there. From friends or friends of friends.
I'm not even scratching the surface of what Google knows with GPS and WiFi connections.
No one consented to any of this bullshit.
This is in a sense the worst kind of argument: superficially correct but really meant to tap into a popular groundswell of sentiment.
The question isn’t “can FB use an off-the-shelf CNN to identify me personally” but rather:
“If it weren’t FB who would be doing it instead?”
“Should cheap digital cameras be illegal?”
Those are a complete non-sequitur.
Facebook (and Google) analyse every single photo that goes through their system with state-of-the-art ML (it's so good that it almost beat humans at matching faces ~5 years ago). This is a scale of surveillance which the human race has never encountered before in our history[+], and is a serious problem that we (as a society) need to make a decision on. In many countries, car license plates are OCR'd and automatically tracked whenever they travel on almost any main public road. Facial recognition in public places and on public transport is becoming a prevalent problem. And wearing masks is illegal in many countries -- meaning there is no way of "opting out" of the pervasive surveillance in the physical world. None of these things were nearly as commonplace (or even technologically plausible) ~30 years ago.
Cheap digital cameras are a completely unrelated topic. And if such large-scale surveillance was made illegal then nobody would be doing it legally, and those doing it would be held accountable for the public health risk they pose. We don't let people build buildings with asbestos any more.
[+] The Stasi and KGB only really had filing cabinets for tracking people and physical surveillance measures. The Gestapo didn't even have that (the Third Reich had census data which was tabulated using IBM machines in order to track who was Jewish within the Third Reich).
The reason what these amazingly benevolent companies are doing and collecting matters is because the systems we build today are precisely what will power the dystopias of tomorrow. As the GP mentioned, Nazi Germany used census data to select and track their victims, aided by some primitive computational technology built for the Nazis by IBM. In spite of how primitive all of this technology was, it ended up being quite effective at enabling them to achieve their ends.
Now compare this to the systems we're building today. Genuinely bad people do, and will, manage to take power in any system. It's not a question of if, but when. And these systems that we're building will be at their disposal. It's the same reason that in politics if you're considering granting the government more power you shouldn't think about today, but about tomorrow. Not do I want "this" administration to have those powers, but do I want future administrations - whom I will vehemently disagree with, to have those powers?
Btw the argument you just made applies to any form of surveillance or censorship. Just because you can still find functional VPN services for China, is China's great firewall OK?
And what happens when web services start blocking VPNs?
Netflix does it quite successfully. And I'm sure Cloudflare could provide such a service for free.
I agree that there is a vast and almost impossible to regulate overreach by these companies. Your argument is extremely compelling.
But when HN users complain about being spied on I smell a FAANG rejection letter.
You’re projecting Ben.
I work at a FAANG: here’s my complaint about being spied on.
(This is a genuine question. Many of your comments have added to the discussion and I've upvoted them. But I've also downvoted many that haven't.)
This is not an argument and moreover not even true: there are companies that pay well and don’t collect reams of data on their users.
Even from the inside I didn’t see a way, but I’ve been wrong before.
You are wrong. I block the known IP blocks of the big surveillance shops and a lot of the small ones.
> sour grapes that a mediocre employee at one of these places makes 3-20x what anyone makes
Are you sincerely saying you believe people who are uneasy about surveillance are just jealous?
 Twitter is currently an exception, I was playing with something. But I'm going back to blocking them soon.
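Blocking by IP block boils down to a membership test against a list of networks. A minimal sketch using Python's standard `ipaddress` module follows; the CIDR ranges are reserved documentation ranges standing in for whatever list one actually maintains, not the address space of any real company.

```python
import ipaddress

# Hypothetical block list built from documentation-only ranges
# (RFC 5737 for IPv4, RFC 3849 for IPv6).
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32")
]


def is_blocked(address: str) -> bool:
    """True if `address` falls inside any blocked network."""
    ip = ipaddress.ip_address(address)
    # An IPv4 address is never "in" an IPv6 network and vice versa,
    # so mixed-version comparisons simply evaluate to False.
    return any(ip in network for network in BLOCKED_NETWORKS)
```

In practice the same list would more likely be loaded into a firewall or router than checked in application code, and it only helps for as long as the published ranges stay accurate.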
I wish I had stayed out of this from the beginning, I see no merit in arguing about whether HN has some themes. I’ve been watching it daily for a long time as you can tell from the age of the account.
If you want to do something that would be both a good call as a mod and a favor to a longtime user, just whack this whole thread. I was trying to chime in with some knowledge but just wound up pissing everyone off.
One thing I can offer from years here is: never underestimate the silent readership (I'd say silent majority but...associations). The vast majority of readers don't comment and most don't vote either. It doesn't mean they aren't following and getting a lot out of what you wrote. Usually it's only the most-provoked segment of the long tail that is motivated to respond. That's fine, it's the cycle of life on the internet—but it doesn't represent the whole community.
Please comment more.
Recaptcha tracks users / devices, not IPs. A VPN won't help, it'll only lower your score. At that point: not allowing them to track you just means you can't use large parts of the web.
"You don't want that GPS tracker installed into your skull? Well, we won't force you, of course, but public transportation, government services and most grocery stores can only be used by GPS-skull-people"
With invisible captchas, you can't even sit down and solve a higher number of riddles to prove that you're really human and know what a fire hydrant looks like even though you look kinda strange. If Google doesn't believe that you are human, tough luck. Unless you have a personal connection or a solid Twitter following that can amplify your concerns, nobody at Google cares. Does your government care? It makes their life easier and normal citizens never really had problems with it.
DHL makes me solve a captcha to login and buy postage stamps. There probably are, or will be, public transportation companies that use recaptcha. It helps them to combat voter fraud (crime, abuse, election meddling, fake news, lots of things) if they know where (on the web, for now) you've been in the last 6 months.
You don't like the "implanting" part, because that's unrealistic? Just wait 20 years, and it may not be your head, but an RFID chip in your hand (yeah, those exist already). Until then, carry your gps tracker around and install their software on it, so it can collect data on your behavior to make sure that you're not a criminal.
Then why is this data collected and archived in the first place?
The problem is that the individual bears a very small risk of something very bad happening: "consider the hypothetical case of a gay blogger in Moscow who opens a LiveJournal account in 2004, to keep a private diary. In 2007 LiveJournal is sold to a Russian company, and a few years later—to everyone's surprise—homophobia is elevated to state ideology. Now that blogger has to live with a dark pit of fear in his stomach."
The individual of course gets no benefit from the small chance that the company can monetize this data trove. So even though chances are they aren't harmed at all by this data collection, arguably the expected value of the benefit/harm to the individual is negative (harmful). But that doesn't change the data collector's calculation, of course. That's why government regulation is necessary.
It's mostly all collected because it's easier for them to collect it than not to collect it, and nobody is stopping them from collecting it.
My behavior on the web being tracked by corporations with little incentive to do right by me is worrisome.
The main topic we discuss is corporate surveillance. We are concerned about all the personal data that leaves our control. We are worried that evading this type of surveillance becomes increasingly difficult.
Some HN users may know how to mitigate these risks, but most people may not know how to defend themselves against corporate surveillance.
This is why we must speak up now, and not just for ourselves.
For the record I am inked all over with anti-equation group stuff: I agree that these companies are too big and powerful (and I would know).
I just don’t see a solution with the present judiciary. If anyone has a bright idea my email is in my profile.
I will thank you all in advance for not shooting the messenger.
For the record, many of your comments here have been thoughtful, and I've upvoted them. I've also downvoted many where instead of responding to other people's thoughtful comments, you just insult them instead. Those are also the ones that other people seem to be downvoting. I don't think anyone is shooting the messenger here.
You state yourself that Google/Facebook publicly claim to advertisers that personal data improves CTR prediction. So I have a hard time believing that personal data isn't useful.
Isn't demographic targeting exactly that, based on your browsing history? Will showing an ad for a car wash have the same CTR for people that liked car products as for people that did not like car products? Or is your point that it still has to be a human that inputs "this is about car things, please show it to people that like car things" and it's not a magic AI that optimizes it automatically? And in that case: isn't that just a matter of time? Build the profile today, build the tech that uses it tomorrow?
Can I believe you? ...even if you're telling the truth, big corps can hide their most malicious practices from most of their own employees.
To me, it doesn't matter how e.g. Facebook actually uses my data today, because even if they're telling the truth they could change their policies tomorrow, or get hacked, or some third party (incl. the gov't) could get hacked, etc. It's better as a user to try and prevent such data from ever existing in the first place.
That's great. So if my browsing history is useless then you won't mind not trying to snoop on it.
Why would anyone ever trust a goddamn thing you have to say about their data?
Unless they pay your salary and are asking you to give your expertise on hoarding and abusing user data, obviously.
If we allow users to harass and attack people who have genuine expertise for posting here, does that make HN better or worse? Obviously worse. Mob behaviors like this are incompatible with curiosity.
I have nothing to gain and everything to lose by shedding light on one of the most powerful entities in existence.
But TLDR it’s not as interesting as people like to think.
> If you'll refuse to transmit personal data to Google, websites will hinder or block your access.
I wonder how true this really is. 20% or so of web users have ad blockers, and most ad blockers block scripts like Google Analytics out of the box. It isn't hard to see that most of them will not make exceptions for a new Google tracking script. So any site that does any kind of testing at all is going to see that ~15% or so of their users drop off if they block users who don't have a reCaptcha v3 score. The only sane business decision in response to this is to go with some alternative.
(Of course, there will be some sites that continue to block users, it's just that they will mostly be the sites that already block users running ad blockers.)
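To make the back-of-the-envelope arithmetic concrete, here is a rough sketch. The 20% ad blocker share is the figure from the comment; the 75% of those users who would also block a reCAPTCHA v3 script is purely an assumed value, picked to show how you get from 20% to roughly 15%.

```python
def expected_dropoff(adblock_share: float, blocks_script_share: float) -> float:
    """Fraction of all visitors lost if the site turns away anyone without a score."""
    return adblock_share * blocks_script_share


# 0.20 comes from the comment above; 0.75 is an assumption, not a measurement.
loss = expected_dropoff(0.20, 0.75)
print(f"{loss:.0%} of visitors would be turned away")
```

Plug in your own site's numbers; the point is only that the loss scales with both factors.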
Or were you referring to the risk that individuals would sue Google for getting blocked from random, potentially essential websites?
You do bring up a good point about V3 being a potential antitrust issue, but that has always been a potential problem, even with earlier versions of reCAPTCHA. V3 also shifts the liability to the webmaster. The action that the website takes with the score is up to them; in the end it's just a number.
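As a rough illustration of "it's just a number": the v3 verification response carries `success`, `score` (0.0 to 1.0) and `action` fields, and everything after that is the site's own policy. The sketch below is hypothetical; the function name, the three outcomes and both thresholds are my own assumptions, not anything Google prescribes.

```python
def decide(verify_response: dict, expected_action: str,
           allow_at: float = 0.7, challenge_at: float = 0.3) -> str:
    """Map an already-parsed verification response to 'allow', 'challenge' or 'block'.

    The thresholds are illustrative assumptions; each site picks its own.
    """
    if not verify_response.get("success", False):
        return "block"  # token missing, expired or invalid
    if verify_response.get("action") != expected_action:
        return "block"  # token was issued for a different page action
    score = float(verify_response.get("score", 0.0))
    if score >= allow_at:
        return "allow"
    if score >= challenge_at:
        return "challenge"  # e.g. fall back to email confirmation
    return "block"


print(decide({"success": True, "score": 0.5, "action": "login"}, "login"))
```

A site that cares about low-score humans can route the middle band to some other check instead of blocking outright, which is exactly the decision that stays with the webmaster.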
Also, as a VPN user, I found that migrating to a more expensive, higher-grade VPN solved a lot of my problems.
In the end, it is neither privacy nor your VPN that matters from the service provider's point of view. What matters is that your IP address is spewing malicious garbage. I do not want to spend time sorting it out when I can focus on revenue-generating tasks. Harming some cheap VPN users in the process is collateral damage, but I'd rather accept that than build a form with perfect attack mitigation at 10x the cost.
I hope to see an alternative to reCAPTCHA that does not come with such strong privacy risks. hCAPTCHA https://www.hcaptcha.com/ seems interesting, also from a monetization point of view. But they are not yet a well-established company, and I do not know what other risks their approach would bring.
- Your ISP is a source of a lot of malicious traffic
- You have some browser extension or other adjustments that make it harder to analyse you as a genuine web browser
For example, using browser automation such as Selenium triggers the "hard" reCAPTCHA. Not sure if this is because of some automation API that Selenium exposes, or just because your browser profile looks virgin (no cookies), without any prior reCAPTCHA solves.
Also my IP address rarely changes and I don't think that any malicious traffic is coming from it.
And I have Comcast, so I hope that they didn't blacklist all of us...
(I did talk bad about Google a few times though, maybe that's it)
If you're actually concerned about that kind of data leakage, you want NoScript, full stop.
> To make this risk-score system work accurately, website administrators are supposed to embed reCaptcha v3 code on all of the pages of their website, not just on forms or log-in pages.
So if the article stated that websites were required to put the code on multiple pages (as the comment I replied to did) then the article is factually incorrect.
Running headless Chrome is trivial, so just having it sit on the one page where you need to check it won't help much. Collecting more data on the user's actions on your site will provide a much clearer picture, much like a video of somebody walking through a store will help you decide whether he's trying to steal something better than a single picture of him standing at the checkout will.
The important difference is that unlike Google Analytics, reCAPTCHA v3 is inescapable. You cannot prevent the collection of your personal data, because then you would lose access to large portions of the web.
And if you use something to prevent tracking - in my case Brave - reCAPTCHA is a huge pain that often takes dozens of clicks to make it through - delayed by Google to wait out bots.
Sometimes I think reCAPTCHA's main goal is to bring those opposing tracking back into the fold of Chrome with painful reCAPTCHAs.
please consider not using recaptcha.
Anyone got some URLs that I can block to stop all captcha attempts, or does it mean I also have to sinkhole www.google.com?
( I don't have a problem not being able to access captcha enabled sites. )
A quick check tells me I would have to banish this endpoint, which sucks because I'd have to parse the URL on every request and can't do it in DNS: https://www.google.com/recaptcha/api.js
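For what it's worth, the per-request check itself is small. Here is a minimal sketch, assuming whatever does the filtering (a browser extension or an intercepting proxy) can see the full URL. Only `www.google.com/recaptcha/api.js` comes from the check above; the other two hosts and the `/recaptcha/` path prefix are my assumptions about where the script and its assets are served, so verify them against your own traffic.

```python
from urllib.parse import urlsplit

# Assumed host list; only www.google.com is confirmed by the URL quoted above.
RECAPTCHA_HOSTS = {"www.google.com", "www.gstatic.com", "www.recaptcha.net"}


def is_recaptcha_request(url: str) -> bool:
    """Return True if the URL looks like a reCAPTCHA script or asset request."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Match on host AND path, so the rest of www.google.com stays reachable.
    return host in RECAPTCHA_HOSTS and parts.path.startswith("/recaptcha/")


print(is_recaptcha_request("https://www.google.com/recaptcha/api.js"))
```

This is why DNS alone can't do it: the decision depends on the path, and a DNS sinkhole only ever sees the hostname.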
Recital 47: “The processing of personal data strictly necessary for the purposes of preventing fraud also constitutes a legitimate interest of the data controller concerned…”
Recital 71: “decision-making based on … profiling should be allowed where expressly authorised by … law … including for fraud or tax evasion monitoring and prevention purposes”
The fact that a third party server handles this is a problem. Because then the publisher has to have a data processing agreement in place with the third party.
This is what makes Google Analytics problematic too. The collection of analytics for improving the service can be a legitimate interest; however, the data amendment for Google Analytics basically passes the blame on to the publisher. I don't think many publishers carefully read Google's data processing amendment, otherwise they would drop usage of Google Analytics. Actually, most publishers aren't even compliant with GDPR for more serious reasons, like not anonymizing the user's IP or sharing data with Google for the purposes of ads targeting.
And there are many questions to be asked here.
Is that data private, for the use of the publisher in question, or is this a shared pool of knowledge between publishers?
If the latter, then we have a problem, because even if there is a legitimate interest, it only applies to the publisher being visited. Can a user be blocked due to a profile that was built on another website? We are in murky waters.
Then there's always the question ... does the publisher really have a legitimate interest?
Claiming that you can have one under the law, doesn't mean you actually have it. There's a set of conditions that you have to comply with.
For example for the purposes of preventing fraud, at the very least you have to be able to show that fraud is possible. Just because you have a login form that's about managing the user's color preferences on the website doesn't mean that you can transmit the user's traffic to Google.
The requirements for legitimate interests are hard to comply with. And I have a hunch that in this case many websites won't comply.
I do all my mobile browsing on FF, yet when I try to use some websites I always get this "Recaptcha failed" error (1), while it works flawlessly on Chrome even though I rarely use it. Try it, maybe it will happen for you too.
Same happens on most sites which show you that "checking your browser" page via cloudflare too.
The web is very unusable unless you're using chrome because of such antics.
If only it was Google services alone. CloudFlare loves serving up a ReCAPTCHA for Tor users before they can even passively read site contents. That hugely expands the damage done.
The plugin requires "privacy passes". Those passes can be obtained by solving captchas, but when trying to do so, one is greeted with this message about being blocked: https://i.imgur.com/qXJfl6J.png
If it was developed in conjunction with Tor, how come it doesn't come bundled with the Tor browser or Tails?
Until software developers care -- nothing will happen.
Monopolies need to be broken up because they threaten the free market and consequently our way of life - not because employees revolt.
Juniper patented saying "No" to a client.
Yes I can and do. It's bad enough that some websites won't let you do certain things over Tor, but preventing access to the website entirely is unacceptable. I made this account and comment entirely over Tor.
I don't see how it's okay to block Tor. That generic claim is made, but how are your spam measures doing if you couldn't handle Tor spam?
>You might not be using it for abuse but a large volume of abuse originates from it.
There is infinitely more ''abuse'' coming from Google, and yet it seems most every page I visit contains Google malware.
On principle, I hold the idea that Tor should be a first-class citizen and not disadvantaged in any way. Notice that Google's ''HTTP/3'' is over UDP, which Tor doesn't work with; I don't find that a coincidence.
> like all IP addresses that connect to our network, we check the requests that they make and assign a threat score to the IP. Unfortunately, since such a high percentage of requests that are coming from the Tor network are malicious, the IPs of the Tor exit nodes often have a very high threat score.
Initially it was slow, yes. But it has been totally fine the last few years for normal browsing and reasonable downloads. Speedtest.net, speedtest.googlefiber & fast.com just now gave me 5, 6 & 10Mbps for whatever server in Ghana I got. Only the high ping causes loading times to still be a bit annoying.
But right now the biggest reason not to use Tor for anything "legit" is the many services blocking you, since indeed most current Tor users are not what those services want and the race to the bottom of Tor will continue, if we haven't reached it already.
My own connection doesn't go over 1.6MB download speed, and only if the weather is clear and I have the wind in the back.
You can now achieve 500KB/s or more on most Tor connections, which is enough for a comfortable browsing experience, imo.
The real downside is the Google captcha, which sometimes even denies you the chance to solve a captcha in the first place, on web pages where there is no user input.
Given that Tor is a tiny percentage of Internet traffic, most of the abusive volume out there has little to do with Tor.
edit: also didn't try it over tor
The walled garden approach worked for a while for Microsoft, and it's working for now for Google, but eventually, it stops working. Once people leave, walled gardens keep them away.
You can't just opt out of using half the Internet because you value privacy, and nor should you have to. This requires legislation to stop.
This of course doesn’t help explain why Firefox is so heavily targeted by what’s supposed to be a neutral utility like Google Analytics...
The idea of tracking your history across multiple reCAPTCHA loads across multiple domains to build a user profile is what sounds like a giant privacy red flag, even though it's entirely possible given the current implementation.
Additionally asking hosts to include JS directly onto their domain which sets 3rd party cookies/data across every page in addition to tracking referring domains is equally a bad idea. reCAPTCHA 2/3 does require loading 3rd party JS directly on page, which I'd imagine is necessary to create callbacks in the frontend upon verification (as iframe content messaging is very awkward):
Ideally the JS simply loads an iframe of the captcha HTML and handles the callbacks from events in the iframe. That's it. It shouldn't be touching anything else on your website. I'd be curious to see a reverse engineering to see how much the JS really does...
Yeah, no. It certainly can read non-google cookies on the page (not httpOnly cookies, though).
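To spell out the httpOnly distinction: a script included directly on the page runs in the page's own origin, so it can read whatever the page's own scripts can read through `document.cookie`, and that excludes cookies flagged `HttpOnly`. A minimal sketch using Python's standard `http.cookies` parser; the function name and the sample headers are made up for illustration.

```python
from http.cookies import SimpleCookie


def script_visible_cookies(set_cookie_headers: list[str]) -> dict[str, str]:
    """Return the cookies that page scripts could read, given Set-Cookie values."""
    visible = {}
    for header in set_cookie_headers:
        jar = SimpleCookie()
        jar.load(header)
        for name, morsel in jar.items():
            # The HttpOnly flag hides a cookie from scripts running in the page.
            if not morsel["httponly"]:
                visible[name] = morsel.value
    return visible


# Hypothetical headers a site might send.
headers = ["theme=dark; Path=/", "session=abc123; Path=/; HttpOnly; Secure"]
print(script_visible_cookies(headers))
```

So a session cookie that is properly flagged stays out of reach, but preference, tracking and other unflagged cookies do not.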
That said, I've no evidence one way or the other!
My understanding is that it comes down to information they can read about your browser (does this look like a bot environment?), and heuristically how the user has behaved since the JS has been loaded (mouse movements, time between actions, etc).
One trick that seems to help fool that awful piece of tech: click slowly on the images, as if you were thinking a second or two before each click. Maybe click a wrong image and deselect it again. In other words, behave like a slow human, and it seems to work better than if I solve it as quickly as possible.
Again, being slower and more error prone seems to be rewarded.
If this reduces the world Google allows me to access, it doesn't diminish mine because of it.
> "If you have a Google account it’s more likely you are human"
So, in the future, if we don't stay signed into our Google account (and let Google know every article we read and every website we browse), we'll be cut off from half of the internet or even more.
The amount of control a handful of companies have over the internet is suffocating to know!