Google tracks individual users per Chrome installation ID (github.com/w3ctag)
2292 points by rvnx on Feb 4, 2020 | 615 comments



Not endorsing this, but according to https://www.google.com/chrome/privacy/whitepaper.html#variat...

> We want to build features that users want, so a subset of users may get a sneak peek at new functionality being tested before it’s launched to the world at large. A list of field trials that are currently active on your installation of Chrome will be included in all requests sent to Google. This Chrome-Variations header (X-Client-Data) will not contain any personally identifiable information, and will only describe the state of the installation of Chrome itself, including active variations, as well as server-side experiments that may affect the installation.

> The variations active for a given installation are determined by a seed number which is randomly selected on first run. If usage statistics and crash reports are disabled, this number is chosen between 0 and 7999 (13 bits of entropy). If you would like to reset your variations seed, run Chrome with the command line flag “--reset-variation-state”. Experiments may be further limited by country (determined by your IP address), operating system, Chrome version and other parameters.


This is impressive doublespeak.

> This ... header ... will not contain any personally identifiable information

> a seed number which is randomly selected on first run ... chosen between 0 and 7999 (13 bits of entropy)

They are not including any PII... while creating a new identifier for each installation. 13 bits of entropy probably isn't a unique identifier if you only look at that header in isolation. But combined with at least 24 additional bits[1] of entropy from the IPv4 source address field, Google receives >=37 bits of entropy, which is almost certainly a unique ID for the browser. Linking that browser ID to a personal account is trivial as soon as someone logs in to any Google service.
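A back-of-the-envelope sketch of that arithmetic (the ~2 billion install count is my rough assumption, and real IP addresses aren't uniformly distributed, so treat this as an estimate, not a precise figure):

```python
import math

seed_states = 8000          # seed chosen between 0 and 7999 (~13 bits)
ip_states = 2 ** 24         # 24 usable bits of the IPv4 source address

combined = seed_states * ip_states
print(f"~{math.log2(combined):.1f} bits combined")  # ~37.0 bits

# Rough assumption: ~2 billion Chrome installs worldwide.
installs = 2e9
expected_sharers = (installs - 1) / combined
print(f"~{expected_sharers:.3f} other installs share your (seed, /24) pair")
```

In other words, the expected number of other installs worldwide that collide with yours on both values is well under one.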

> Experiments may be further limited by country (determined by your IP address)

They even admit to inspecting the IP address...

> operating system, Chrome version and other parameters.

...and many additional sources of entropy.

[1] Why 24 bits instead of 32? The last octet of the address may be zeroed if the packet is affected by Google's faux-"anonymization" feature ( https://news.ycombinator.com/item?id=15167059 )


> > Experiments may be further limited by country (determined by your IP address)

> They even admit to inspecting the IP address...

I don't think that sentence admits what you say? Chrome could be determining which experiments to run client-side.

Of course, when you visit a Google property, they must inspect your IP address at a minimum, simply to send a response back to you. That goes for any site you might choose to visit. The existence of sufficient entropy to personally identify a site visitor is not a state secret. They do not need this Chrome experiment seed to identify you, if that's a goal.


Yeah, it's not a "state secret" but it's not common knowledge either. Their privacy policy says that specific header can't be used to identify you, but fails to mention it can be combined with other information to make browser fingerprinting trivial.

If you don't know how all this works, which is true for most human beings, their privacy policy might give you the wrong impression.


> says that specific header can't be used to identify you

That's not what it says. It says the header won't contain PII, which is true. It can be linked to PII, but so can literally every bit of information you send to Google while logged into or otherwise using their services. A disclaimer to this effect would not have any purpose.


That's the whole point. Using any Google service means they can easily personally identify you, that's what the privacy policy should explain.

That's their policy toward privacy: you don't have any. For some reason I can't fathom, you claim mentioning this in their privacy policy "would not have any purpose". Instead of honesty, their privacy policy is a wonder of public relations, where it seems like they care deeply about protecting your privacy.


We disagree about the purpose of privacy policies. I believe that privacy policies should describe how data will be used, not how it could be used. I just don't think a policy describing how data could be used is very useful, because it's going to be the same for all services.

Under this formulation, Google's policy is (presumably, lacking any data to the contrary) honest with respect to this value.


"I believe that privacy policies should describe how the data will be used, not how it could be used."

Google's policy does not tell the user how her data will be used by Google's customers. The policy states Google will use the data to "provide better services". That is deliberately vague. That is the "purpose", but how exactly is the data used to achieve that purpose? There are no specifics to which a user could object.

Google does not only serve the search engine user, the email user, the YouTube user, etc. Its business is not free services. As such the policy is misleading as to what are the "Services" it may use the data to improve. Google's business is providing online ad services.

The truth is that Google collects data to provide better services to advertisers. The policy reads as if it only collects data to provide better services to users. The "free" services are just bait to draw users in. The data is collected to improve online ad services.


> The truth is that Google collects data to provide better services to advertisers.

I understand that that is what you believe, but I do not think this is factually true about the data collected from this Chrome header. I believe that Chrome team collects it in order to understand the impact of Chrome experiments on performance.


> I believe that privacy policies should describe how data will be used, not how it could be used.

This is key. If you subscribe to the "how it could be used" version, then even, say, possessing an Android phone would be a violation of the privacy policy. Which is absurd.


This is a fair distinction, though it does not include the option of discussing how the data _won’t_ be used.


Per your observation, I would argue that the intent of the privacy policy as quoted above is pretty clear. When the policy says that the identifier doesn't contain PII, I believe that is meant to convey that it will not be used to identify you. But it's true that that use is not explicitly excluded. I'm not a lawyer, so I couldn't tell you whether being weaselly in this way would count as fraud. On the other hand, I suspect that Google is actually abiding by the spirit of the policy they wrote, because honestly they have little to gain and much to lose by violating it.


If I log in to my Google account once, they can associate that browser id with my account. Even if I log out, clear my cookies (and probably use the incognito mode), Google will be able to identify and follow me all over the Web.

I don't know about your PII thing, but it's personal data under the GDPR.


AIUI GDPR restricts the handling and use of PII, not its existence. So it's PII under GDPR. Is Google misusing it? If so, that's an issue. If not, then it's kinda pointless to observe that it's PII under some possibly distinct legal definition than the one Google is using in its privacy policy.


You can't even log in to Gmail, at least from Firefox in incognito mode.


It works for me, at least with 2FA enabled.


So if you use a VPN service for example, they still know who you are because of this. I would say even if you’re visiting in private mode.

I see your point, but I also see how this will keep you identifiable.


I don't math very much, but I would guess the intersection of these sets of people is nil: people who 1) use VPN to avoid tracking by Google 2) still log in to Google services from one of their networks and not the other 3) use the same Chrome profile on both. But suppose some small number exist who adopt this illogical and contradictory pattern of behavior. If Google is using this token for the purpose of tracking this tiny set of people when the vast majority could be tracked more easily via conventional means, it would imply that they are far more competent than I give them credit for.


So, someone starting up a vpn and opening incognito mode?


> They are not including any PII... while creating a new identifier for each installation. 13 bits of entropy probably isn't a unique identifier iff you only look at that header in isolation. Combined with at least 24 additional bits[1] of entropy from the IPv4 Source Address field Google receives >=37 bits of entropy, which is almost certainly a unique ID for the browser. Linking that browser ID to a personal account is trivial as soon as someone logs in to any Google service.

Now this is interesting. Without those 13 bits of entropy, what would Google lose? Is it because of these 13 bits that Google is suddenly able to track what it couldn't before? If the IPv4 address, user-agent string, or some other behavior is already sufficient to reveal a great deal, we have a more serious problem than those 13 bits. I agree that the 13-bit seed is a concern. But I am wondering whether it is a concern per se, or only in combination with something else. Of course, how (and whether) Google retains this data also matters.


One clarification:

- By default it's much more than 13 bits of entropy

- If you disable usage statistics then you are limited to 13 bits of entropy


Actually, the low entropy provider is used for any field trials that get included in the header.

See: https://cs.chromium.org/chromium/src/components/variations/v...


>Now this is interesting. Without those 13 bits of entropy, what would Google lose? Is it because of these 13 bits that Google is suddenly able to track what it couldn't before?

At the very least, having those 13 bits of entropy along with a /24 subnet allows you to have device-level granularity, whereas a /24 subnet may be shared by hundreds of households.
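To make "device-level granularity" concrete, here's a hedged sketch (the 200-device figure is a made-up assumption for a busy /24):

```python
seed_states = 8000  # low-entropy seed space when telemetry is off (0..7999)

# Hypothetical: ~200 active Chrome installs behind one /24 subnet.
devices = 200

# Expected number of *other* devices in that subnet sharing your seed:
expected_sharers = (devices - 1) / seed_states
print(f"{expected_sharers:.4f}")  # ~0.025: within the subnet, the seed almost always singles out one device
```

So even though 8000 values are nowhere near globally unique, they are more than enough to separate the handful of devices that share one subnet.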


They have more than 13 bits of entropy

https://cs.chromium.org/chromium/src/components/metrics/entr...

Look at what the function is called: high-entropy source :)


But if you disable telemetry, they'll only have 13?


Plus: IP, browser version, some OS info, font info, screen resolution (well... viewport size), and more.


> This ... header ... will not contain any personally identifiable information

Except for everything you do on your browser. I'm so glad I haven't used Chrome for almost three years.


Yes, if you have enough bits you can come up with a fingerprint, but that's not what PII means.


It becomes PII the instant you can correlate that fingerprint with any PII.


This.

A bank account number is considered PII. Knowing the bank name & account number will uniquely identify the account holder's name, which is PII.


IP addresses are considered PII under both GDPR and CCPA.


... which is crazy unrealistic, since it's "PII" that can only stay "private" by the collective agreement of every node in the network. But no accounting for the reality of network architecture when passing laws, I guess.

Maybe anonymity while accessing a worldwide network of cooperative machines is something people should stop telling the public to expect?


Under GDPR you can use all the PII you reasonably need to provide expected services, you don't even need separate consent. But, if you have PII, the moment you use it for other purposes, or obtain/retain/share without proper cause, you are breaking the law.

IMHO, that is very reasonable.

Real-world example: giving your phone number and information to your car mechanic / doctor / bank teller / plumber is reasonable. Using that information to score girls or to ask for donations for a puppy shelter would be considered improper.


I totally agree, and I think the GDPR is also reasonable in that it allows you to use the IP address for essential security reasons, such as blocking bad actors based on IP address - it doesn't say "thou shalt not track IP addresses", it says you need consent if you're going to use it for anything that isn't essential for security or in your end user's best interest.


Or they can stay 'private' by not being stored or correlated with other user data. GDPR doesn't apply to the network itself, it applies to whoever is using it.


"Stored" is definitely the purpose of a router. "Correlated" can be necessary for debugging routing issues (or client-server connection issues that are tied to the intermediary fabric near the client doing something weird; hard to determine if an entire subnet is acting up if you aren't allowed to maintain state on errors correlated to IP address).


Where do you get the idea that GDPR doesn't allow you to process PII for the purpose of routing packets?


Don't forget that just about any registration requires recaptcha these days


>Linking that browser ID to a personal account is trivial as soon as someone logs in to any Google service.

Wat? You mean to tell me they can identify you if you log into their service?

Am I missing something here? Who cares?


I care. I care that even if I log off, even if I use a VPN, even if I go into incognito mode, they can still associate my requests with the account I initially logged in to.


The problem is any website can do that. Incognito-bypassing fingerprinting is difficult to prevent, unless you use something like uMatrix to disallow JavaScript from everything but a few select domains.

This is a collection of random-ish unique-ish attributes. Any collection of such things can be used to track you, like installed fonts, installed extensions, etc. If this were just a set of meaningless encoded random numbers, then it's essentially a kind of cookie, but that's not what it is. This is (claimed to be) a collection of information that's useful and possibly needed by some backends when testing new Chrome features. It tells servers what your Chrome browser supports. The information is probably similar to "optimizeytvids=1,betajsparser=1".

So, the only question is if Google is actually using this to help fingerprint users in addition to the pragmatic use case. It certainly could be used that way, and it's possible they are, but they have so many other ways of doing that with much higher fidelity / entropy if they want to. If this were intended as a sneaky undisclosed fingerprinting technique, I think they would've ensured it was actually 100% unique per installation, with a state space in the trillions, rather than 8000.

Yes, this could be so sneaky that they took this into consideration and made it low-entropy to create plausible deniability while still being able to increase entropy when doing composite fingerprinting, but I think it's pretty unlikely. Also, 99% of the time they could probably just use Google Analytics and Google login cookies to do this anyway.
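As an illustration of how such attributes compose into a fingerprint, a minimal sketch (every attribute value below, including the x-client-data string, is invented for illustration; real fingerprinting scripts collect far more signals):

```python
import hashlib

# Hypothetical signals a server could observe for one visitor.
attributes = {
    "x_client_data": "CIe2yQEIpbbJAQ==",  # made-up header value
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
    "screen": "1920x1080x24",
    "fonts": "Arial,DejaVu Sans,Noto Sans",
}

# Canonicalize (sorted key=value pairs) so the same attributes
# always hash to the same fingerprint across visits.
canonical = "|".join(f"{k}={v}" for k, v in sorted(attributes.items()))
fingerprint = hashlib.sha256(canonical.encode()).hexdigest()[:16]
print(fingerprint)  # stable for as long as no attribute changes
```

None of the individual attributes identifies you, but their combination behaves like a cookie that survives clearing cookies.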


Maybe one actually useful non-advertising usage could be reCAPTCHA? If you read carefully, it nowhere says the space is limited to 8000. That limit of 8000 applies only if you disable usage statistics / crash reports.


Sorry about that, too late to edit it now. That is an important detail. If there are 32 or more different feature flags, then that's 4 billion unique states, which would be an effective fingerprint.
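The arithmetic checks out; a quick hedged sanity check (the ~2 billion install count is my rough assumption, not a source figure):

```python
import math

flag_states = 2 ** 32                  # 32 independent on/off feature flags
print(f"{flag_states:,}")              # 4,294,967,296 distinct combinations

# Bits needed to give every install a unique value, assuming ~2e9 installs:
print(f"~{math.log2(2e9):.0f} bits")   # ~31 bits, so 32 flags would suffice
```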

I still think it's pretty unlikely they're using it in that way or would in the future, and I think Google fuzzing this for those who opt out of telemetry is probably a signal of good faith in this instance. They realize the privacy implications and provide a way to disengage, even if they don't intend to abuse the information.

But of course the potential for abuse always remains. And the potential for (arguably) non-abusive tracking, like the possibility of it being used for bot detection by reCAPTCHA, as you say.


reCAPTCHA is the most abusive type of tracking. Google simply denies you use of the captcha if you do not give them enough personal information. It doesn't matter if you enter the captcha correctly 20 times. It won't let you in.


This is part of the bot detection, though. It's probably not "not enough personal information"; it's "this truly seems unlikely to be a legitimate device/person", due to the huge datasets they're working with. Same with Cloudflare and Tor. Once you operate a security service anywhere near that scale, you start to understand that there are inherent challenges and tradeoffs like these.


reCAPTCHA increasingly doesn't even give me a captcha. Instead, they simply don't let me try; they send this instead of the challenge:

  <div>
    <noscript>
      Please enable JavaScript to
      get a reCAPTCHA challenge.<br>
    </noscript>
    <div class="if-js-enabled">
      Please upgrade to a
      <a href="[1]">supported browser</a>
      to get a reCAPTCHA challenge.
    </div>
    <br><br>
    <a href="[2]" target="_blank">
    Why is this happening to me?</a>
  </div>
They probably don't like my non-standard user agent string and they definitely don't like that I block a lot of their spyware, but reCAPTCHA used to work properly for many years with the same/similar browser configuration.

[1] https://support.google.com/recaptcha/?hl=en#6223828

[2] https://support.google.com/recaptcha#6262736


I mean, if you don't want Google to track you, then you probably shouldn't use their browser...


I believe someone else in the thread stated it's cleared for incognito; I don't remember if they meant it's not sent or that it's a new value.


Normally you would only expect to be identified and tracked when using Google services while logged in. The significance of this post is that they would be able to identify and track you across all your usage of that browser installation regardless of whether you've logged out, or are, say, in an incognito window.


Ah. So I was missing something. Thanks for clarifying. That is alarming.


Yes you are missing something important. Once they've tied the browser ID to your personal account they can track you across all google properties, even the ones that you didn't log into.


Unless you're running some extension that emulates Firefox's container tabs or something, logging into one Google service logs you into all of them. It would matter, though, if this header is still sent in incognito sessions.


I still don't understand. When I log into gmail, it logs me into all Google services. If I am worried about being tracked, surely my first mistake is logging in in the first place? Or visiting in the first place? After all, even if I click "log out," I'm only trusting Google that they unlinked the browser state from the account. If I trust them to do that, I don't see why I shouldn't trust them to ignore this experiment flag from Chrome, or at least not use it for tracking. If I don't trust them to avoid using the experiment state, I don't really see how you can trust them for anything.

Anyway, if you're not building Chrome from source, then you have to trust that they aren't putting anything bad in it. And if you are building chrome from source, you can observe that they only send this experiment ID to certain domains, and they already know who you are on those domains anyway.


>If I am worried about being tracked, surely my first mistake is logging in in the first place?

Good luck completing a Google captcha without a Google account or without using Chrome.


If you browse the internet, they could know what websites are visited by the same person, but not who they are exactly.

If you visit a load of websites, then also log into google, they connect the two and they know what websites were visited by you specifically.


He means they can continue to identify you after you log off.


I think the argument is that they have other methods, like cookies, they could also use. If you trust them not to use those methods, that trust extends to this form of tracking.


The key is in the wording: "If usage statistics and crash reports are disabled, this number is chosen between 0 and 7999 (13 bits of entropy)."

"If ... statistics are disabled."

In chrome://version you can see the active variations. The numbers seem too large to be insignificant, and so far I haven't observed duplicates.

Since the variations are assigned server-side, you just have to take their word for it, I guess? Plus, why would DoubleClick need it? :)


That's basically saying "even if you opt out, we'll still try to track you, just not as much." Very unpleasant, but then again I'm not surprised to see this attitude from Google.


Combine a few pieces of information like this and you get a decisively unique fingerprint.

For example identifying individuals at work behind the same ip address.


How many people will actually run Chrome with a CLI flag? It would be pretty impressive if every single person reading this thread did, but it probably won't even be that many. Most people don't even touch their settings.

13 bits of entropy is far from a UUID (and even getting it down to 13 bits requires disabling some settings, which again very few people do), but it's still plenty good enough to disambiguate individuals over time.


And Google is certainly in a position to disambiguate that uuid to an individual as soon as they login to gmail or any other Google property!


Is there a reason for only sending this header to Google web properties and not all domains?


It is an abuse of Chrome's position in the marketplace. Google is using their powerful position to give themselves tracking capabilities that other online players can't access. It is a major competitive advantage for Google.


Can't alternative browser makers who build on Chromium simply disable that portion? I expect identifying users was a key business concern in moving Edge to Chromium. Is there anything (other than the work involved) preventing them from making it report back to Microsoft-owned domains instead?


I'm using Vivaldi on MacOS and it doesn't send this header. I'm sure others like Brave don't send it either.


Is it because Google's webapps will have their own a/b tests which use experimental features only available in Chrome perhaps?

I mean personally I think they should do client-side feature detection and be back to being standards compliant and not creepy. The only reason why I'd consider such a flag is because they optimize the payload server-side to return a certain a/b test, but even with that they could do the default version first, do feature detection, and then set a session cookie for that domain only that loads the a/b test.

My other thought was that they are testing a feature implemented across Google's properties, e.g. something to do with their account management.


Isn't this what cookies are for?


Cross-site cookies are getting blocked by Chrome starting with Chrome 80, if I'm right (whereas this header isn't).


So they built a personal back door into a feature they've chosen to remove for everyone else because of its potential for abuse, yet the very same company is abusing it in a far more sinister way. Antitrust can't come soon enough.


Chrome 80 will only block cross-site cookies that aren't explicitly marked SameSite=None and Secure (i.e., sent over HTTPS); cookies without a SameSite attribute are now treated as SameSite=Lax. It's easy for trackers to use HTTPS and set SameSite=None. This Chrome change is mostly intended to protect against Cross-Site Request Forgery (CSRF) attacks, not to block trackers.
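A hedged sketch of that default in simplified Python, to show why trackers are largely unaffected (this simplifies Chrome's actual rules, which also special-case top-level navigations and a brief "Lax + POST" grace window):

```python
def sent_cross_site(samesite, secure):
    """Simplified model of Chrome 80's cookie rules for cross-site subresource requests."""
    if samesite is None:
        samesite = "Lax"              # no attribute now defaults to Lax
    if samesite.lower() == "none":
        return secure                 # SameSite=None is honored only with Secure (HTTPS)
    return False                      # Lax/Strict cookies aren't sent cross-site

print(sent_cross_site(None, True))    # False: a legacy attribute-less cookie is now blocked
print(sent_cross_site("None", True))  # True: trackers just add "SameSite=None; Secure"
```

So the change breaks old, carelessly configured cookies, but any tracker that updates two attributes keeps working.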


I can think of a hundred reasons why they do this. None of them make it right.


Err yeah, because it adds loads of data that can be used to track you.


Couldn't the Chrome installations receive a request from Google that says "Do you want to try out a new thing?", and couldn't each installation say yes with a certain probability? The only difference I can see is that the subset of guinea-pig users couldn't be kept the same across tests (if Google wanted it to be).


So they're tracking people and using them as guinea pigs, the lack of respect for users is astounding.


How does one apply the “--reset-variation-state” flag on a chromebook?


So it’s just a poor excuse to send an evercookie.


Everybody imagine going back 15 years and telling yourself that you're using a web browser made by the parent company of DoubleClick. Your 15-years-ago self would think you're a moron (assuming that 15 years ago you were old enough to know what DoubleClick was).


I always believed that tech-savvy people using Google Chrome are morons. It's the perfect blend: Google being evil in trying to force it on everyone, the browser being dumbed down for the masses so much it's missing the most basic features, and, I guess, the privacy concerns of using a browser from an advertising company.


That's a really great way of conceptualizing it, if you assume Google is basically DoubleClick at its core (which I think makes sense).


Doubleclick ads were, originally, what prompted me to seek an adblock extension.

I think it was around 2006 that I got the extension for Firefox; Google bought them about a year later.


Well, it depends. Do I get a funny animation following my cursor if I do it?


Kind of true. The whole internet was much more of a toy back then. Tracking was not viewed so maliciously as now. Heck I might have even been convinced by a hard sell "this will help your favorite sites maximize their revenue".


I can only speak for myself, but my 15-years-ago self would not have cared so strongly about the choice of browser. I believe I was using the newly ad-less Opera at the time, and knew/cared little about the company making it.


My 15 year ago self would have taken a double helping of DoubleClick if my only choices were that or Internet Explorer 6.


Firefox and Opera existed at the time.


Do you remember what browsers were like when Chrome came out? I switched from Firefox the day it came out, and was very happy with performance and UI.


I don’t use Chrome. Never have, never will. Why do you?


TL;DR I think whoever posted that is trying to bury the UA anonymizing feature by derailing the discussion.

What I'm seeing is an RFC for anonymizing parts of User-Agent in order to reduce UA based fingerprinting, which improves everyone's privacy, that's a good thing!

Then I see someone comment on how that could negatively impact existing websites or Chromium-derived browsers, comments which are totally fair and argue that this change may not be a good idea because of that.

Then someone mentions the _existing_ x-client-data headers attached to requests that uniquely identify a Chrome installation. Then a lot of comments on that, including here on HN.

To me that's derailing the original issue. If we want to propose that Chrome remove those headers we should do so as a separate issue and have people comment/vote on that. By talking about it on the UA anonymizing proposal we are polluting that discussion and effectively stalling that proposal which, if approved, could improve privacy (especially since it will go into Chromium so then any non-Chrome builds can get the feature without having to worry about x-client-data that Chrome does).


I think the concern is that this disarms Google's competitors while keeping them fully-armed.

Ads are a business, and they are Google's business. They are how Google makes money. And like any business, it is competitive. Tracking is a way to make more money off online advertising. By removing tracking from their competitors while keeping it for themselves, Google stands to make a lot of money off this change.

Their motivations are not honest, but they're pushing them as if this is the high road. It isn't. It's the dirty low road of dominating the online ad business, made possible by their dominance in the browser market. And it's always been the end-goal of Chrome browser.


I think this is a common strategy of big players at any industry.

First, they do some dirty thing to gain a competitive edge while the industry is still new and unregulated. Later, they develop an alternative way to achieve the same edge, and then criticize other players for doing it the old way, saying they should be "mature and responsible".


See also first world countries industrializing/modernizing & becoming rich/lifting people out of poverty using industrial techniques that pollute heavily, then "going green" and criticizing other players (India, China) for doing the same thing, saying they should be "mature and responsible".


Not really. "Going green" is a radical new concept for humanity that goes counter to all incentives and instincts and only recent developments have shown that painful measures are necessary. It was not a trick to get rich at other peoples expense.

India and China are suffering from their own pollution and have incentives to "go green" all by themselves, not because the West demands it.

Green technology is often high tech and tech that is accepted in Western markets and is helping to lift people out of poverty through market mechanism, not finger pointing.

Finally, the first world got rich several generations ago. We are not related in any way, shape, or form to any real or perceived sins of our grandfathers. Any such idea is Old Testament biblical theology.


"Going green" is a relatively new concept, even for Western countries. By 1960 the West was seriously polluting its environment, but also extremely affluent and highly developed. There was far more of a gap between the average American and someone in Asia or Africa compared to today. The West polluted the developing world in the same way it polluted the Hudson or Cuyahoga.

It's not really about pulling up the ladder, but a recognition that growth doesn't have to mean completely destroying the planet. It's more about where the standards are set and what is socially acceptable or understood.


> It's not really about pulling up the ladder, but a recognition that growth doesn't have to mean completely destroying the planet.

It's not recognizing anything, because it hasn't been demonstrated that growth is compatible with not completely destroying the planet. It's deciding not to destroy the planet, and if that means pulling up the ladder, then golly gee, sucks to be anyone who isn't up yet. We sure hope there are other ways up, but that's not much consolation.


They should be mature and responsible, the west should have been, too, and has a long way to go.

We are both probably distant cousins through Genghis Khan, but that doesn't mean I should be bitter that I can't make my fortune by pillaging half the world like my distant ancestor did to great effect. It might be easy to make a fortune pillaging, but that doesn't mean I deserve to pillage because someone else did (or does), and I think I am right to detest the scattered bands of warlords left in this world who do make their living by pillaging.


Apples to oranges: unlike software ecosystems which come and go, we’ve only got a single real one!


Just yesterday I had to disable the anti-fingerprinting I'd enabled in Firefox because, despite having a stable IP and existing cookies to log in to Google, its security system rejected me, even after I answered security questions. Turn off fingerprinting and I could log in.

So, this is a roundabout way of agreeing that the hidden dark patterns Google is bringing to the web must stop.


I have to log into Gmail just to pass captchas. Every time I do it I die a little inside.


All the more reason to keep bad actors in containers isolated from the rest of your web browsing. Google can fingerprint me all they want if that gets their rocks off, all they'd see is my gmail inbox that they see anyway.


Much of this discussion demonizes the company, but we need to look broader. Google is a public company, and its shareholders, since they own the company, also deserve some of the blame. Such behaviour is better discouraged by shareholders dumping shares, since Google could very well argue that if it didn't work to maximize ad revenue, it would not be operating according to fiduciary-responsibility principles. (IANAL .. just thinking out loud)


That is such short term thinking.

Doing unethical things because "We had to so the shareholders would make money" is such a cop-out. I see it just the opposite way. You have a duty to do things ethically so that in the long run customers continue to want to use your product. So that governments don't start going after you for the unethical things you do. So that other businesses will trust you and continue to work with you.

Here's an example: Huawei. They've reached out to me saying they'll pay me more than my employer and my commute will be shorter. No effing way. I'm sure I could make them a lot of money, but their history of unethical behaviour is an instant deal-breaker for me. Others will take the offer, sure, but in the labor market they're going to have a reduced supply because I'm surely not alone in this attitude.


I'm with you on the shareholders being complicit in the behaviour (through ignorance or inaction in a lot of cases), but unfortunately I'd guess 90% of said shareholders wouldn't be aware of the scummy tactics Google have undertaken, similar to Microsoft I'd say, outside of the IT/HN realm.

It's unfortunate. Profit of their shares is the only thing a lot of people look at (and willfully ignore anything else unless it slaps them in the face/becomes a major mainstream media event).


"I think the concern is that this disarms Google's competitors while keeping them fully-armed."

Pretty sure that was their main reason for helping push https-everywhere. A good idea generally, but hurt every other entity trying to do tracking more than it hurt Google.


> while keeping them fully-armed.

That's sort of a fragile assumption though. I mean, yes, there's enough specificity in this number that it could be used (in combination with other fingerprinting techniques) to disambiguate a user. And yes, only Google would be capable of doing this. So it's abusable, in the same way that lots of software like this is abusable by the distributor. And that's worth pointing out and complaining about, sure.

But it's not tracking. It's not. It's a cookie that identifies the gross configuration of the browser. And Google claims that it's not being used for tracking.

So all the folks with the hyperbole about user tracking for advertising purposes need to come out with their evidence that Google is lying about this. Occam says that, no, it's probably just a misdesigned feature.


> Google claims that it's not being used for tracking

> Occam says that, no, it's probably just a misdesigned feature.

Allow me to introduce to you "mabbo's razor": If someone can make money by doing X and it's impossible for anyone to tell whether or not they are doing X, then they are probably doing X or else will as soon as you believe they won't.


While I agree with some of your comment, I feel like it’s harsh to paint the whole chrome enterprise with that brush. Chrome was about freeing the world of a truly terrible web browser and a lot of devoted devs have spent a lot of time working on it. There’s an advertising aspect that it’s right to call out, but I think on the whole it was done to make the internet better, because the internet is google’s business too.

EDIT I just wanted to point out that a load of people have poured their lives into making Google Chrome the amazing bit of software that it is and suggesting that the end-goal has been entirely about supplying ads does a great disservice to their personal contributions.


These aren't mutually exclusive things. The people working on Chrome were and are highly motivated, intelligent and passionate people, some of whom I call friends, who want to see the web become a better place. In that regard they have succeeded massively.

But by this point, Google has dropped billions of dollars on salaries for those developers to build Chrome (call it >500 devs, >$200k salaries, >10 years). Google is not a charity. They didn't build Chrome with the intent to lose money on it. Everything else Google made that wasn't profitable is gone now, and yet here Chrome stands. Because it is an indirect profit center.

And you've pointed out the real issue: Chrome was about freeing the world of a truly terrible web browser. 'Was'. But it did that! So what is it about now? Why would Google continue to pour money into it if they didn't expect to extract more money out of it in the future?

You can make the world better and make money while doing it. Ideally, that's what we all are doing.


It wasn’t some noble mission to free the world. Chrome was always about Google controlling the client side of the web to guarantee their advertising access to web users. The ability to extract additional data from the user was a nice bonus.


The way I see it, both of these can be (and most likely are) true. Intentions of the company aren't usually the same as intentions of individual contributors (or even whole teams). The Web is Google's business - the more stuff happens on the Web, the more money they can eventually make off it. Advertising is how they make most of that money, so this is what they're protecting. But beyond that, Chrome answered a real need and a lot of hard-working people made it into a best-in-class browser.


"Chrome was about freeing the world of a truly terrible web browser "

Chrome is about establishing more control over the web to further the business objectives of Google and Alphabet.

The problem with this belief of Google as some kind of 'benevolent actor' is a function of the new kind of branding they helped introduce, something that an entire generation of particularly young people are being duped by.

'Brand' used to be the image that companies presented - it was a decision, a marketing tactic, usually invented by agencies. Google was one of the first to change that, to effectively 'internalize' the brand so that they (staff, even leaders) really kind of believed their own kool-aid. There's an incredible aura of 'authenticity' to this; when leaders really believe their own schtick, it rings more powerfully. (This is an issue for another thread.)

But Google has proven that in the long run, they're just a regular company. I don't think they are bad actors, and in the big picture, they're better than most. But, they're just a self-interested entity: they will do whatever is in their power and which is also legal, to leverage their incumbency and stymie competition.


> The problem with this belief of Google as some kind of 'benevolent actor'

You put 'benevolent actor' in quotes as if the comment you are replying to contained that. It didn't.


Stress quotes. That is just one of the possible devices for achieving emphasis.

I see a lot of that here: people misunderstanding basic speech/writing conventions. Giving the OP the benefit of the doubt, and assuming s/he knows what s/he is doing, can help avoid some of those misreadings.


>which improves everyone's privacy, that's a good thing!

Except it does not affect Google, because Google has this install ID to use both for tracking and preventing ad-fraud.

Which means Google competitors are terribly disadvantaged, as they cannot use that.

Which not only reduces market diversity (contrary to TAG philosophy) but represents a significant conflict of interest for an organization proposing a major web standard change.

These issues are very relevant to the original proposal, especially in light of the fact that no one outside of Google is terribly interested in this change. Any time a dominant player is the strongest (or only) advocate for a change that would coincidentally and disproportionately benefit its corporate interests, the proposal should be viewed very skeptically.


> Except it does not affect Google, because Google has this install ID to use both for tracking and preventing ad-fraud.

So when Apple releases a privacy feature, that doesn't affect them as a business, we praise the feature or we say "except it doesn't affect Apple" and somehow try to argue how the feature is less valuable because of that?


Of course we'd say "except it doesn't affect Apple"...

If there's a privacy gap, (and Apple is actively exploiting that gap)

When Apple patches it, (while leaving it open for themselves)

They'll get called out.


Apple is not engaged in illegal data harvesting to gain a competitive advantage over other services in the same space. Google's collection of personal data with the x-client-data header without user consent is illegal under GDPR.


This relies on the (unfounded) assumption that this pseudonymous ID is being used for tracking purposes and that Google is actively lying about it.


GDPR treats an IP address as personal data. The data is not transmitted through an anonymizing network, so Google has access to the user's IP address when they receive the data.

Anything that is associated with personal data also becomes personal information, therefore Google is transmitting personal data without user consent, which is illegal.

Asking for consent is not required under GDPR when the data collection is needed for a service to function. This is not the case here, Google services function without receiving that header, the data is used by Google to gain a technical advantage over other web services.


> GDPR treats an IP address as personal data.

No it doesn't. GDPR only treats an IP address as personal data if it is associated with actual identifying information (like a name or address). Collecting an IP address alone, and not associating it with anything else, is completely fine (otherwise nginx's and apache's default configs would violate GDPR, and through them basically every website would).

Edit: and furthermore, even if it did (I see conflicting reports), if you collect IP Address and another pseudonymous ID and don't join them, the ID isn't personal data.

IOW, the theoretical capability to make changes to a system to use info in a non-GDPR compliant way doesn't make the information or system noncompliant. You actually have to do the noncompliant things.


An IP address is itself personal data, it does not have to be associated with other personal data.

https://ec.europa.eu/info/law/law-topic/data-protection/refo...

> Collecting IP address alone, and not associating it with anything else, is completely fine (otherwise nginx and apache's default configs would violate GDPR), and through them basically every website would violate GDPR.

See my comment about consent not being required when the data is needed to provide a service. Logging is reasonably required to provide a service.

> and furthermore, even if it did (I see conflicting reports), if you collect IP Address and another pseudonymous ID and don't join them, the ID isn't personal data.

The transmission of data is already covered by GDPR, you don't have to store the data to be bound by the law.


See my edit. There's conflicting information on this. A dynamic IP, for example, isn't directly related to or relatable to a specific natural person without other context.

But even if that's the case, if you don't tie the pseudonymous ID to the IP, it isn't personal data. As far as I can tell, the transfer rules you reference are about transferring data out of the EU, and can be summarized as "you can't transfer data to a non-EU country and then process it in a way that violates the GDPR". Article 46 notes that transferring data is fine as long as appropriate safeguards are in place[1], and article 47[2] defines what constitutes those safeguards (in general, contractually/legally binding agreements with appropriate enforcement policies).

This goes back to what I said before: The theoretical capability to do noncompliant things doesn't make a system GDPR-noncompliant. You have to actually do noncompliant things to not comply.

[1]: https://gdpr-info.eu/art-46-gdpr/

[2]: https://gdpr-info.eu/art-47-gdpr/


> > and furthermore, even if it did (I see conflicting reports), if you collect IP Address and another pseudonymous ID and don't join them, the ID isn't personal data.

> The transmission of data is already covered by GDPR, you don't have to store the data to be bound by the law.

This cannot be the actual correct interpretation of the GDPR, because under this logic _all_ IP packets on the public internet (made by/to EU citizens) are covered by the GDPR because you are transmitting data alongside an IP address.


To help other readers:

"The European Commission maintains this website to enhance public access to information about its initiatives and European Union policies in general."

https://ec.europa.eu/info/law/law-topic/data-protection/refo...

"Home > Law > Law by topic > Data protection > Reform > What is personal data?"

"Examples of personal data

...

- an Internet Protocol (IP) address;"


There has been an EU court ruling on this exact question of whether dynamic IP addresses count as personal data even in contexts where the website operator in question does not have the means to associate it with an individual but another party (such as an ISP) does. The Court of Justice of the European Union has ruled on this and it does count as personal data. [1]

Furthermore, GDPR itself specifically refers to online identifiers in Article 4 as falling under the definition of personal data[2] and then clarifies in Recital 30[3] that IP addresses count as online identifiers in this context. There seems to be no legal ambiguity in the EU on this topic at this point, but I would be not surprised to see parties who are not GDPR compliant pretend otherwise indefinitely.

[1] https://curia.europa.eu/jcms/upload/docs/application/pdf/201...

[2] https://gdpr-info.eu/art-4-gdpr/

[3] https://gdpr-info.eu/recitals/no-30/


Interesting, TIL. That doesn't change the major point I was making, though, which is that an anonymized identifier (such as the 13-bit ID under discussion) isn't personal info, even if it might have originally been collected alongside data which is personal info. If I give you said 13-bit ID, you need other info to back out a single person; the anonymous ID corresponds to multiple IPs.


I think you're still missing the point. Google transmits personal data to their servers without user consent. The value of x-client-data is personal data, because it is associated with an IP address during transit, due to how HTTP requests work. The nature of the data, what is being done with it on the server, and the location of the server are all irrelevant in this instance, the only important part is that personal data has left the browser in the form of a request, and it reached a Google server.

This data collection would only be exempt from GDPR if the data would be required for the service to function, but that is not the case with x-client-data.


> The value of x-client-data is personal data, because it is associated with an IP address during transit, due to how HTTP requests work.

This is not correct. The x-client-data is not personal data. x-client-data associated with an IP address is personal data. As soon as you separate the client-data from the IP, the client data stops being personal data. IOW, the tuple (x-client-data, IP) is personal data. But x-client-data on its own isn't personal data, because it cannot be used to infer the IP on its own.

I don't know where you're getting this "if two pieces of data ever touch and one of them is personal data the other one is now also contaminated as personal data". It's not true. That would make the existence of anonymous data (which the GDPR specifies as a thing) practically speaking impossible to have on the web, since all requests are associated with the IP on receipt. (or actually even worse, it would make the process of anonymizing data impossible in general, since the anonymization process associates the anonymized data with the original personal data).

To be precise, the GDPR defines anonymized data as "data rendered anonymous in such a way that the data subject is not or no longer identifiable.". The x-client-data header is exactly that. The subject of the header is not identifiable by the x-client-data header alone. Therefore the header is anonymous and not subject to strong GDPR reqs.

For the client data header to be personal data, you'd need to describe a scheme such that, given an x-client-data header, and only an x-client-data header, you could identify one (and only one) unique person to whom that header corresponds. You're welcome to come up with such a scheme, but my intro CS classes taught me that bucketed hashing is irreversible, and with 8192 buckets, you're not going to be able to uniquely identify anyone specific.
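To make the bucketing argument concrete, here's a toy Python sketch. The `bucket_for` function is hypothetical (it is not Chrome's actual variations-seed logic); the point is only that an 8192-bucket value is a many-to-one mapping, so the bucket alone cannot be inverted to a single install.

```python
import hashlib
from collections import Counter

NUM_BUCKETS = 8192  # 13 bits, per the Chrome whitepaper

def bucket_for(seed: int) -> int:
    # Illustrative many-to-one mapping; NOT Chrome's actual scheme.
    digest = hashlib.sha256(str(seed).encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_BUCKETS

# Simulate a million installs: every bucket ends up shared by many installs,
# so a bucket value on its own cannot point back at one specific install.
counts = Counter(bucket_for(s) for s in range(1_000_000))
print(len(counts), min(counts.values()))  # all 8192 buckets occupied, each holding many installs
```

Of course, this only holds as long as the bucket value really does stand alone, which is exactly what the rest of the thread disputes.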


The Chrome whitepaper is written in a way to make you believe there are only 8000 possibilities.

But read carefully what they say: there are only 8000 possibilities if the crash-reporting functionality is disabled (which it is not by default).

Otherwise the marker is a huge differentiator (I haven't seen any duplicates personally)
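Even in the restricted 8000-value case, the arithmetic made upthread shows how little extra entropy is needed. A back-of-the-envelope sketch (the 24-bit IPv4 figure is the thread's assumption, not a measurement):

```python
import math

seed_bits = math.log2(8000)     # ~12.97 bits when usage stats/crash reports are off
ip_bits = 24                    # the thread's conservative IPv4 source-address estimate
combined = seed_bits + ip_bits  # total identifying entropy available to the server
world_bits = math.log2(7.8e9)   # bits needed to single out one person on Earth

print(round(combined, 2), round(world_bits, 2))  # 36.97 32.86
```

The combined figure clears the world-population threshold by about four bits, i.e. the pair (seed, IP) is almost certainly unique per browser.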


> That would make the existence of anonymous data practically speaking impossible to have on the web

For almost every type of data that is true. Transforming or substituting data doesn't make it anonymous; the patterns in the data are still present. To produce actually anonymous data you have to do what the GDPR instructs: corrupt the data ("rendered anonymous") severely enough that the "data subject is ... no longer identifiable". You need to do something like aggregate the data into a small number of groups such that individual records no longer exist. Techniques like "differential privacy" let you control precisely how "anonymous" your data is by e.g. mixing in carefully crafted noise.

> 8192 bucket

While others have pointed out that this isn't actually limited to 13 bits of entropy for most people, there are at least two reasons that field is still very personally identifying. First, "x-client-data on its own" never happens. Google isn't wasting time and money implementing this feature to make an isolated database with a single column. At no point will the x-client-data value (or any other type of data they capture) ever sit in isolation. I used the IPv4 Source Address as an example because it will necessarily be present in the header of the packets that transport the x-client-data header over the internet. Suggesting that Google would ever use this value in isolation is almost insulting to Google; why would they waste their expensive developer time to create, capture, and manage data that is obviously useless?

However, let's say they did make an isolated system that only ever received 13-bit integers stripped of all other data. Surely that wouldn't be personally identifiable? If they store it with a locally generated high-resolution timestamp, they can re-associate the data with personal accounts by correlating the timestamps with their other timestamped databases (web server access logs, GA, reCAPTCHA, etc).

> you'd need to describe a scheme such that, given an x-client-data header, and only an x-client-data header, you could identify one (and only one) unique person to whom that header corresponds

You should first describe why google would ever use that header and only that header. Even if they aren't currently using x-client-data as an identifier or as additional fingerprintable entropy, simply saving the data gives Google the option to use it as an identifier in the future.

[1] https://www.youtube.com/watch?v=pT19VwBAqKA https://en.wikipedia.org/wiki/Differential_privacy
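The timestamp-correlation point above can be illustrated with a hypothetical sketch (all names, timestamps, and seed values are made up): two logs that share nothing but arrival times can still be joined.

```python
# Two "isolated" logs that share only timestamps.
isolated_seed_log = [  # (timestamp, seed) -- supposedly anonymous
    (1000.001, 4242),
    (1000.105, 1337),
]
access_log = [  # (timestamp, account) -- an ordinary web server log
    (1000.002, "alice@example.com"),
    (1000.104, "bob@example.com"),
]

def correlate(seeds, accesses, window=0.01):
    """Link seeds to accounts whose requests arrived within `window` seconds."""
    links = {}
    for ts, seed in seeds:
        for ats, account in accesses:
            if abs(ts - ats) <= window:
                links[seed] = account
    return links

print(correlate(isolated_seed_log, access_log))
# {4242: 'alice@example.com', 1337: 'bob@example.com'}
```

At real traffic volumes a single timestamp is not this decisive, but repeated co-occurrence across many requests narrows candidates quickly, which is why stripping the "other" columns is not, by itself, anonymization.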


> You need to do something like aggregate the data into a small number of groups such that individual records no longer exist. Techniques like "differential privacy" let you control precisely how "anonymous" your data is by e.g. mixing in carefully crafted noise.

Correct, and another anonymization technique (in place of differential privacy) is k-anonymity. In k-anonymity schemes, you ensure that in any given table no row corresponds to any fewer than k individuals. Why is this useful? Well, let's say you have some 10-15 bit identifier. You can take a request from a user that contains information that might, when combined, be identifying. Say: coarse-ish location (state/country), device metadata (browser version, OS version), and coarse access time (the hour and day of week). Combining all 3 (or 4 if you include the pseudonymous ID) is enough to uniquely identify at least some users. Then let's say you also track some performance statistics about the browser itself.

But any single piece of data (plus the pseudonymous ID) is not enough to identify any specific user. So if you use the pseudonymous ID as a shared foreign key, you can join across the tables and get approximate crosstabs without uniquely identifying any specific user. Essentially, if you want to ask if there are performance differences between version N and version N+1, you can check the aggregate performance vs. the aggregate count of new vs. old browser versions, and with 8K samples you're able to draw reasonable conclusions. And in general you can do this across dimensions or combinations of dimensions that might normally contain enough pieces of info to identify a single user.

This is essentially the same idea as differential privacy, although without the same mathematical precision that differential privacy can provide. (By this I don't mean that the data can be re-identified, just that differential privacy can be used to provide tighter bounds on the anonymization, such that the statistical inferences you can gather are more precise. k-anonymity is, perhaps, a less mathematically elegant tool).

Specifically, I'm describing k-anonymity using x-client-data as a Quasi-identifier in place of something like IP or MAC address. You can find those terms in the "See Also" section of the differential privacy wiki page you linked. Google is mentioned in those pages as a known user of both differential privacy and k-anonymization in other tools.

Hopefully that answers your question of why Google would want such a thing.

> simply saving the data gives Google the option to use it as an identifier in the future.

Yes, but that doesn't mean that they're currently in violation of the GDPR, which is what a number of people keep insisting. I'm not claiming that it's impossible for Google to be doing something nefarious with data (although I will say that in general I think that's an unreasonably high bar). Just that the collection of something like this isn't an indication of nefarious actions, and is in fact likely the opposite.
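A minimal sketch of the k-anonymity property described above (the function name, columns, and rows are all hypothetical): a table is k-anonymous with respect to a set of quasi-identifiers if every combination of their values covers at least k rows.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) >= k

rows = [
    {"seed": 4242, "os": "linux", "version": 80},
    {"seed": 4242, "os": "linux", "version": 80},
    {"seed": 4242, "os": "mac",   "version": 79},
]
print(is_k_anonymous(rows, ["seed", "os"], k=2))  # False: the (4242, "mac") group has only one row
print(is_k_anonymous(rows, ["seed"], k=3))        # True: seed 4242 covers three rows
```

This is the check that determines whether a given column combination is safe to publish or join on; adding more quasi-identifiers shrinks the groups, which is exactly why combining the seed with other dimensions becomes identifying.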


This is the equivalent of a protest, people are objecting to Google's illegal data harvesting practices in places that receive engagement, since that's the most effective way to get the word out and warn others.

Google's reasoning that this is not personal data is meaningless in the face of GDPR, which considers an IP address personal data. Google has access to the IP address when they receive the data, therefore they are transmitting personal information without user consent and control, which is illegal.


It could be argued that a similar violation is present (since March 2019) in Chromium for the Widevine CDM provisioning request, see https://github.com/bromite/bromite/issues/471

Basically all users opening the browser will contact www.googleapis.com to get a unique "Protected Media Identifier", without opening any web page and even before any ToS/EULA is accepted (and there is no user consent either).


I think the Widevine CDM request is needed for the service to function, though they could certainly delay it until a website requires DRM. GDPR allows the use of personal data without consent when it is required to provide a service for the user.

The personal data collected with the x-client-data header is not required for Google sites to function. Google uses the data to gain a technical advantage over other sites on the web, this is why the data collection in this case requires consent.


Whether consent is legally required or not, as a user I want that service, whatever it is, to not work until I consent to the exposure of my personal data. Given that it apparently has something to do with DRM, I would be disabling the service anyway.


> Whether consent is legally required or not

Let's not guess; let's file a complaint and see if we can get Google sued for n billion euros.


The poster is the author of the Kiwi browser, which unfortunately is closed source [0], but I have reason to believe he is familiar - as I am through the Bromite project - with all the (sometimes shady) internals of the Chromium codebase. It is indeed off-topic to discuss the header issue there, but I would say that there is no explicit intention to derail it (and no advantage to doing so), just incorrect netiquette.

[0]: https://github.com/kiwibrowser/android/issues/12#issuecommen...


The Google employee argues that through UA-CH Google wants to disincentivise "allow" and "block" lists.

After many years of testing HTTP headers, IMO this really is a non-issue. Most websites return text/html just fine without sending any UA header at all.

What is an issue are the various ways websites try to coax users to download, install and use a certain browser.

Another related issue with Google Chrome is users getting better integration and performance when using Chrome with Google websites than they would with other clients. ^1 Some draw the analogy to Microsoft, where it was common for Microsoft software to integrate and perform better on Microsoft Windows, whereas third-party software integrated and performed noticeably worse on that OS.

This leads to less user agent diversity. Users will choose what works best.

UA diversity is really a more important goal than privacy, or privacy in Chrome. The biggest privacy gains are not going to come from begging Google to make changes to Chrome. They could however come from making it easier for users to switch away from using Chrome and to use other clients. That requires some cooperation from websites as well as Google.

Those other clients could theoretically be written by anyone, not just large companies and organisations that are dependent on the online ad sales business. It would be relatively easy to achieve "privacy-by-design" in such clients. There is no rule that says users have to use a single UA to access every website. There needs to be choice.

For example, HN is a relatively simple website that does not require a large, complex browser like Chrome, Safari, Firefox, etc. to read. It generates a considerable amount of traffic and stands as proof that simpler websites can be popular. Varying the UA header does not result in drastic differences in the text/html returned by the server.

1. Recently we saw Google exclude use of certain clients to access Gmail.


https://cs.chromium.org/chromium/src/components/google/core/...

Just thinking out loud.

What happens, let's say, if someone malicious buys youtube.vg and puts an SSL certificate on it? Will they be able to collect the ID?

I guess so ?
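A simplified sketch of the kind of TLD-agnostic matching the linked google_util code performs (the regex and function below are illustrative, not the actual Chromium logic): if the check matches "youtube.<any TLD>", then youtube.vg would be treated as a Google property.

```python
import re

# Illustrative host matcher: any subdomain of google.* or youtube.* under any TLD.
GOOGLE_HOST_RE = re.compile(r"^(?:[a-z0-9-]+\.)*(?:google|youtube)\.[a-z]{2,}$")

def looks_like_google_property(host: str) -> bool:
    return bool(GOOGLE_HOST_RE.match(host))

print(looks_like_google_property("youtube.vg"))     # True: the TLD is not checked
print(looks_like_google_property("www.google.de"))  # True
print(looks_like_google_property("example.com"))    # False
```

(The real code handles multi-part TLDs like .co.uk, which this toy regex does not; the point is only that ownership of the domain under an arbitrary TLD is never verified by the client-side match.)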


Yes, but they would also need a valid TLS certificate?

A country's government could also take over the TLD and grab its traffic overnight.


The original issue is supposedly fingerprinting and privacy related.

If that's true then Google should be called out for their poor behaviour.


As long as web developers continue to create (app-)sites that only work in the latest versions of Chrome(and Chromium-ish) browsers, giving users little effective choice over what browsers they can use, this sort of abusive behaviour will continue. The sort of "feature-racing" that Google engages in is ultimately harmful for the open web. Mozilla struggles to keep up, Opera surrendered a while ago, and more recently, Microsoft seems to have already given up completely.

I feel like it's time we "hold the Web back" again. Leave behind the increasingly-walled-garden of "modern" appsites and their reliance on hostile browsers, and popularise simple HTML and CSS, with forms for interactivity, maybe even just a little JavaScript where absolutely necessary. Something that is usable with a browser like Dillo or Netsurf, or even one of the text-based ones. Making sites that are usable in more browsers than the top 2 (or 1) will weaken the amount of control that Google has, by allowing more browsers to appear and gain userbases.


This proposal would not accomplish what you intend. By slowing the adoption of open web technologies, developers and users would lean more heavily on mobile apps, which are also under Google's control considering Android's huge market share.

Developers who want to level the playing field need to develop sites that fully support Firefox and other browsers that are not based on Chromium. Users who want to see a more open web need to use Firefox and non-Chromium browsers, and complain to developers who don't properly support them.


I'm talking about the vast majority of things people use websites for, which do not need a webapp much less a mobile app.


I wish, but that's not what most people want. Hell, it's not what designers want. Thinking back to the Myspace days, people would have the worst websites imaginable. Granted, that was all done with little more than HTML and javascript, but I have little doubt what they would have done with things like HTML5 and even more javascript.


I have to agree with this.

The last decade or so has really reinforced to me that we all ignore or are ignorant of fundamental structural problems with most of the systems we rely on - with us wanting them to "just work."

We're all guilty of this, we just see it up close for the things that we're building and chide others who don't care. Meanwhile we ignore other fundamental structures of modern society.


There's got to be a balance between every website looking exactly the same and fading out of my memory with one identical hamburger menu after another and dancing babies on geocities.


Are there really that many popular extensions not available on Firefox? I may be just one anecdote, but I think I'm pretty typical, and I've found the transition to Firefox to be quite pleasant, and uneventful.


Popular - no. Essential - yes. Case in point: my bank (top 5 in my country), which uses a Chrome plugin for security purposes; you need it to create a digital signature. So once a year I HAVE to install Chrome (the key expires every year) and then delete it. I've also found at least one payment processor not working in Firefox, my city's portal for public transport, and several small sites. The worrying thing is the trend - with Firefox's share dropping below 10% recently, it will be abandoned more and more.


In those cases, have you tried IE instead rather than installing Chrome?


Installing Chrome was strictly needed only for banking plugin. Didn't have a chance to check yet with a new Chrome-Edge but will definitely try it.


Firefox is really good.

My issue is with certain sites that typically either use non-standard JavaScript APIs that only work in Blink or rely on non-standard behavior of standard components (numeric form inputs were mentioned here yesterday).


It doesn't happen often but sometimes, when a website doesn't work, I switch to chrome and it works there.


HTML is not enough. It’s why templating languages / libraries were invented, and it’s why SPAs are so popular. There’s a difference between “sites” and applications. The web has been trending toward supporting applications more and more for a very long time.

The only thing that will make people who want to preserve the content-web happy is if we split the protocols somehow, and that will never happen. This is not likely to change ever.


I haven't had JS on by default in years. Using a JS-enabled browser is a drastically worse experience.

suckless surf lets you enable JS with a hotkey on a per-process basis if you really want it for something, but 90% of the time I just close the tab that wants to waste my time.


I think we at HN have a particular responsibility to keep the web free and open. This really is an arms race, and only those of us building the tech have the power to curtail FAANG's overreach. It might be time to choose a side and firmly push your work toward open-web-friendly tech.


> [...] this sort of abusive behaviour will continue.

Can you elaborate what exactly is abusive behavior?

> [...] reliance on hostile browsers, [...]

What exactly is a hostile browser?


What is mentioned in the title of this article.


What article? The link is a GitHub issue. And it's not like you referenced any of it anyway. It's more like it just triggered you to output a general rant. So again: care to elaborate?


> Making sites that are usable in more browsers than the top 2 (or 1) will weaken the amount of control that Google has

You do realize/remember that Google is also a search-engine company, one that only stands to benefit (in terms of increased advertising-targeting capability) from a web that's simpler, and therefore more machine-legible?


I’m not so sure about that. Google has the resources, and a simpler web makes it easier for competitors. Google already seems quite competent at machine-reading just about everything, even things you sometimes can't find/visit. Domination by web apps is the equivalent of widening the moat.


It's fine for Google to benefit from things that everyone benefits from.


Credit to the ungoogled-chromium project [0] for the patch [1], which has also been used in Bromite since 15 February 2018 to prevent this type of leak; see also my reply here: [2]

[0]: https://github.com/Eloston/ungoogled-chromium

[1]: https://github.com/bromite/bromite/blob/79.0.3945.139/build/...

[2]: https://github.com/bromite/bromite/issues/480#issuecomment-5...


You can see all the domains they add the header to here: https://chromium.googlesource.com/chromium/src/+/master/comp...

Previous discussion: https://news.ycombinator.com/item?id=21034849



This seems like a cut-and-dried case of getting caught in monopolistic behavior. The code is right there. The Chrome codebase has special features for Google’s own web properties.

I hope all these AGs suing google have some good tech advisors. It’s hard to keep track of all the nefarious things google has been up to over the past decade.


Perhaps you can send a summary to them, including the evidence?


> This seems like a cut-and-dry case of getting caught in monopolistic behavior. The code is right there.

???

Is "Darn, their browser only gets to track me on their own websites; if Google were playing fairly, they'd send the tracking header to all websites so I can be tracked more and have less privacy" the argument you're making here?

And it's debatable that this header is actually serving a tracking purpose at all. Being limited to their own web properties cements it as a diagnostic to me. What use is a tracking header that only gets sent to domains they already know you're visiting?


You realize that whenever a user visits a page that uses AdWords, AdSense, or login via Google, they download a script file from one of those domains, right?

So a user can log into Google and then log out, tying that header data to whatever PII Google has attached to them, and future visits to any sites using those and probably other services can be attached to the individual, despite them having intended to be logged out of Google services.
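The joining step described here can be sketched in a few lines. This is purely an illustration of the linking logic the comment describes, not Google code; the header value and account name are made up:

```python
# Illustration only: how a header value seen once alongside a login can
# re-identify later "logged out" requests carrying the same value.
seen = {}

def on_request(x_client_data, logged_in_user=None):
    if logged_in_user is not None:
        # One authenticated request ties the header value to a person.
        seen[x_client_data] = logged_in_user
    # Any later request with the same header resolves to that person,
    # logged in or not.
    return seen.get(x_client_data)

# A single login while the header is present...
on_request("SGF5S9VI", logged_in_user="alice@example.com")
# ...re-identifies a subsequent logged-out request.
print(on_request("SGF5S9VI"))  # alice@example.com
```

The point is that the header only needs to co-occur with a login once; after that, its mere presence is identifying.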


All I’m saying is the optics are not good. This is the kind of code you could show a jury. A high schooler who took “intro to CS” could understand what it’s doing.

It’s literally a conditional attached to a list of strings comprised solely of google advertising domains and hosts that distribute scripts from those domains.

When you’re talking about anti-trust, it doesn’t look good. Will this be a nail in the coffin? Unlikely. Will it help Google with its legal trouble? Definitely not.


Security flaw? Surely some entity is squatting youtube on some TLD?!

If there is a country TLD of X where Google owns google.X but entity Y owns youtube.X then entity Y gets the X-CLIENT-DATA header information. See usage of IsValidHostName() in code.
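A minimal sketch of why name-based matching is risky. This is a simplification of the idea, not the actual Chromium IsValidHostName() code, and the pattern list is an assumption:

```python
def matches_google_property(host):
    # Hypothetical simplification: strip one label at a time and check
    # whether the remainder looks like "google.<tld>" or "youtube.<tld>".
    # Ownership of the domain is never verified, only its name.
    labels = host.lower().rstrip(".").split(".")
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        name, _, tld = candidate.partition(".")
        if name in ("google", "youtube") and tld and "." not in tld:
            return True
    return False

print(matches_google_property("www.youtube.vg"))  # True - squatted TLD still matches
print(matches_google_property("example.com"))     # False
```

If the real check works anything like this, a squatter on youtube.X would receive the header just as youtube.com does.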


Note this would be a privacy flaw which is not covered by the Chrome Rewards program (which only covers security flaws) so I haven’t bothered logging it as a bug since I don’t want to waste my time verifying it for nothing!

https://chromium.googlesource.com/chromium/src/+/master/docs...


Like youtube.vg, which is available?


According to this source code [0], it looks like this is in Chromium as well. Does that mean this affects Electron applications?

[0]: https://chromium.googlesource.com/chromium/src/+/master/comp...


Electron maintainer here. Electron does not send this header.


Thanks for clarification.


Edge ("Edgium") doesn't appear to send this header. Neither does Chrome in Incognito or Guest mode.


I checked: Vivaldi doesn't seem to send this header either.


If you strace chrome on linux it also picks up /etc/machine-id (or it did back when I looked), which is a 32 byte randomly generated string which uniquely identifies you and on some systems is used as the DHCP ID across reboots.


First I thought reading /etc/machine-id would be expected if Chrome uses D-bus or pulseaudio libraries which depend on D-bus, and /etc/machine-id is part of D-bus. But no, they really use it for tracking purposes.

And in a sick twist they have this comment for it:

  std::string BrowserDMTokenStorageLinux::InitClientId() {
    // The client ID is derived from /etc/machine-id
    // (https://www.freedesktop.org/software/systemd/man/machine-id.html). As per
    // guidelines, this ID must not be transmitted outside of the machine, which
    // is why we hash it first and then encode it in base64 before transmitting
    // it.


In fairness, the guidelines they reference suggest you do exactly what the comment says they're doing (assuming they're keying the hash). The guidelines seem explicitly written with the idea that unique identifiers _derived from_ this value are not similarly quarantined, provided that you cannot take the derived value and "reverse" it back to the original identifier.

Quoting from https://www.freedesktop.org/software/systemd/man/machine-id....:

This ID uniquely identifies the host. It should be considered "confidential", and must not be exposed in untrusted environments, in particular on the network. If a stable unique identifier that is tied to the machine is needed for some application, the machine ID or any part of it must not be used directly. Instead the machine ID should be hashed with a cryptographic, keyed hash function, using a fixed, application-specific key. That way the ID will be properly unique, and derived in a constant way from the machine ID but there will be no way to retrieve the original machine ID from the application-specific one.


> Instead the machine ID should be hashed with a cryptographic, keyed hash function, using a fixed, application-specific key.

Reading https://cs.chromium.org/chromium/src/chrome/browser/policy/b..., I do not see it being hashed with a key, just unkeyed SHA-1.
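For comparison, here is what the freedesktop guideline asks for versus a plain unkeyed hash. The machine-id is an example value and the key is hypothetical:

```python
import base64
import hashlib
import hmac

machine_id = "0123456789abcdef0123456789abcdef"  # example /etc/machine-id contents

# What the guidelines ask for: a keyed, application-specific hash.
# Without the key, no third party can recompute or cross-link this value.
app_key = b"hypothetical-app-specific-key"
keyed = base64.b64encode(
    hmac.new(app_key, machine_id.encode(), hashlib.sha256).digest()).decode()

# An unkeyed hash, as the linked code appears to use: anyone who knows
# (or enumerates) machine-ids can recompute it and link identifiers
# across every application that hashes the same way.
unkeyed = base64.b64encode(hashlib.sha1(machine_id.encode()).digest()).decode()

print(keyed != unkeyed)  # True - but only the keyed form resists linking
```

The difference matters: the keyed form is only meaningful to the holder of the key, while the unkeyed form is a stable global identifier in disguise.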


I think it doesn't make much sense to protect it, because in popular Linux distributions an unprivileged user can access identifiers such as MAC addresses of network interfaces, HDD serial numbers, etc.

> If a stable unique identifier that is tied to the machine is needed for some application,

Ideally there should be no stable identifiers accessible to untrusted applications.


Now go and read http://jdebp.uk./Softwares/nosh/guide/commands/machine-id.xm... and RFCs 3041 and 4941.


What else is going to break if one randomises that ID (per boot or per hour, say)?


What about running Chrome inside a container?


What about not running Chrome?


> which is why we hash it first and then encode it in base64 before transmitting it.

This made me chuckle. "As per the rules, we'll put on a boxing glove before we punch your lights out." You won't get privacy, but at least there is some security!


> As per the rules, we'll put on a boxing glove before we punch your lights out

This also made me chuckle


"Tracking purposes" is such a weasel word, when we're really talking about device management in an enterprise setting, and this code only gets activated if the root/administrator user has installed a token file on your computer.


That really is a cynical comment. It almost bothers me more than this header.


Which (among many other things) can be faked with firejail, if you absolutely have to run Chromium (e.g. for testing):

    --machine-id
        Spoof id number in /etc/machine-id file - a new random id is generated inside the sandbox.
    
        Example:
        $ firejail --machine-id


Chromium doesn't seem to read that file.


When puppeteer first came out I was nervous to use it for scraping because I could totally see Chrome pulling tricks like this to help recaptcha in identifying the bots. I’m still not convinced they aren’t.


Firefox / Tor also read this file


What does tor do with it? Maybe pass it along in packet timing intervals, or something ... ;o)



True; more precisely - 16 bytes, 32 hex characters. Your link is in agreement: "The machine ID is usually generated from a random source during system installation or first boot and stays constant for all subsequent boots." And see https://wiki.debian.org/MachineId - at least one distro uses it for the DHCP ID.


"At least one distro" is not correct either. It's used by systemd-networkd, specifically.

* http://jdebp.uk./Softwares/nosh/guide/commands/machine-id.xm...


Now you are nitpicking. Your new link says exactly this: "This broadcasts the machine ID (hashed with a known fixed salt) over the LAN as the unique client identifier part of the DHCP protocol. (Other DHCP clients tend to use MAC addresses for this.) It also broadcasts the machine ID locally on each link as part of Ethernet LLDP, if enabled."


It is far from nitpicking to point out the gross inaccuracy of conflating one particular software with an entire operating system. systemd-networkd is not Debian.


Nobody is conflating these two things, you are interpreting it that way.


And this is a legal thing to do?


I'm surprised this hasn't gotten any mainstream tech press attention. Chrome's Privacy Whitepaper describes a number of privacy-questionable nonstandard headers which are only sent to Google services. Just try searching for X- here:

https://www.google.com/chrome/privacy/whitepaper.html

And for ease of reading, a few others:

> On Android, your location will also be sent to Google via an X-Geo HTTP request header if Google is your default search engine, the Chrome app has the permission to use your geolocation, and you haven’t blocked geolocation for www.google.com (or country-specific origins such as www.google.de)

> To measure searches and Chrome usage driven by a particular campaign, Chrome inserts a promotional tag, not unique to you or your device, in the searches you perform on Google. This non-unique tag contains information about how Chrome was obtained, the week when Chrome was installed, and the week when the first search was performed. ... This non-unique promotional tag is included when performing searches via Google (the tag appears as a parameter beginning with "rlz=" when triggered from the Omnibox, or as an “x-rlz-string” HTTP header).

> On Android and desktop, Chrome signals to Google web services that you are signed into Chrome by attaching an X-Chrome-Connected and/or C-Chrome-ID-Consistency-Request header to any HTTPS requests to Google-owned domains. On iOS, the CHROME_CONNECTED cookie is used instead.
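If you want to check what your own browser attaches, you can run traffic through an intercepting proxy and filter requests on the X- prefix. Here is a self-contained sketch of just the filtering step (the host patterns are assumptions, and you would wire this into the proxy of your choice):

```python
def google_bound_x_headers(host, headers):
    """Return nonstandard X- request headers sent to Google-ish hosts."""
    # Host patterns are illustrative guesses, not an authoritative list.
    if not any(s in host for s in ("google.", "youtube.", "doubleclick.")):
        return []
    return [(k, v) for k, v in headers.items() if k.lower().startswith("x-")]

example = google_bound_x_headers(
    "www.google.com",
    {"X-Client-Data": "CIe2yQEIprbJAQ==", "Accept": "*/*"})
print(example)  # [('X-Client-Data', 'CIe2yQEIprbJAQ==')]
```

Watching a few minutes of real traffic this way makes the whitepaper's list much more concrete.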


Holy rotten metal batman... those are pretty bad. Why in the world isn't everyone up in arms over this?....


The concept of PII is not the same for everyone/everywhere. For the GDPR we have:

> Article 4(1): ‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;

If this chrome browser ID is matched against a (for example) google account, then they can track every single person. And that is just a couple of IDs, let alone all the quantity of data they have.

It's against GDPR to not be clear about this kind of ID. If my browser has an unique ID that is transmitted, then this ID can be coupled with other information to retrieve my identity and behavior, so it should be informed (in the EU).

EDIT: TL;DR: hiding behind "there is no PII in that ID" is not enough.


Who's going to raise this issue though? And what if they put this in the browser's T&C?


I thought they needed explicit consent. T&Cs ain't that.


> Who's going to raise this issue though?

I'm sure there is someone out there who takes these kind of things seriously. Not me. I use firefox for that matter.

> And what if they put this in the browser's T&C?

Then the rest of GDPR applies: a clear message about the browser sending this info has to be shown, explaining why, with who they'll share it, the time they will keep this info, plus no auto opt-ins, the possibility of asking Google (or whatever) all the info relative to this ID and the option to cancel all the data, etc.


This is why I consider the GDPR to be unrealistically broad in its definition of PII; it denies even innocuous feature-mode-distinguishing headers intended to allow for bug-identification of massively-distributed software installs.

If I'm given a forced choice between "more privacy" and "better software quality" I'm going to lean towards "better software quality."


Me too. Then a breach happens and someone with a straight face tells you: "we take your privacy very seriously", asking apologies, because the breach used some of your data to push some political campaign or to bother you with spam/extortions because that night you were watching some porn.

Programmers should stop pushing buggy or incomplete software as is, and start releasing software that works. Otherwise upper levels have an excuse to do all this "experience" telemetry, and we all are smart enough to see the consequences of a data breach.


> Programmers should stop pushing buggy or incomplete software as is, and start releasing software that works

If you demand a perfection-of-function guarantee from something as complicated as a web browser, you'll never get a web browser with more features than the ones released in the '90s (and I'm not even sure we'd be that far along by now).

If I'm given a forced choice between "more privacy" and "the software ever having the features I want to use" I'm also going to lean towards "the software ever having the features I want to use." And we know this is true for users in general because of the number of users who had Flash installed back-in-the-day in spite of the fact that it allowed a total bypass of the browser security model, because it had features that the browser lacked otherwise.


Instead of giving my privacy away, I prefer software like the things you installed from a CD-ROM back in the '90s that didn't need a weekly update. Games, 3D Studio, AutoCAD (to name a few) were more complex than a web browser (even a modern one) and didn't need a weekly update or hunger for user-requested features, let alone dialing home. The world worked relatively fine without the up-to-date wankery we see today.


I remember them.

They were also buggy and could crash their resident OSs all the way to a stuck state, and if they did, the solution was "Try not to trigger that bug again."

Software quality has significantly improved in the era of easy patch access and auto-patching.


Holy Jesus. Those things were chock full of security holes. If you used a web browser that arrived on a CD ROM you'd be advertising massive pwnability.

In fact, you could easily simulate this by using last year's Firefox.


Firefox, chrome, linux ... all are full of unnecessary complexity. The point being - we need daily patches to keep it from falling apart.

I have links (or lynx) on an old SuSE, maybe even a Mandriva CD. Would they be massively pwnable?


Hard to say, but not necessarily a great example; exploits on software are a function both of attack surface / complexity and installed userbase (i.e. nobody bothers to see if lynx is pwnable because a zero-day against that browser will be worth, what, twenty bucks to gain access to the five people who use it?).


Perhaps. Perhaps not. As a thought experiment:

How long would it be safe to go without browser updates with a browser of the complexity/capabilities of links, if 50% of people used it?

With many people combing through it, would it become effectively unexploitable?


Probably not very long. Even with a small attack surface, if half the world uses it, the zero-days are valuable. Links is still vulnerable to

* application-layer attacks (it is still an HTTP client and HTML parser, and the protocols themselves are complicated to implement soundly, even if the newest features aren't included)

* protocol attacks (is links immune to buffer-overruns triggered by intentionally-malformed queries? Probably not, since it has no total-soundness verification. And the source code isn't open-source, so it can't be widely audited.)

* dependency attacks (it uses svgalib [https://www.cvedetails.com/vulnerability-list/vendor_id-84/p...], and every third-party library is a potential attack vector)

* good old-fashioned UI spoofing (is links' UI design immune to allowing web pages to show an image that tricks the user into thinking they're looking at the links UI itself?)

In this thought experiment, any successful attack has massive value so we can expect bad actors to be hammering on the system and finding most such exploits available on the application.


> .. source code isn't open-source ..

Not sure what you mean, but then what is this: http://links.twibright.com/download/

> In this thought experiment, any successful attack has massive value so we can expect bad actors to be hammering on the system and finding most such exploits available on the application.

Precisely, and because of that, with 50% people using it, an orders of magnitude smaller attack surface and a mostly fixed feature set (you could at least have a LTS version), just how many vulnerabilities are there to find? How many man-years of work until there is nothing¹ left to find? Do you think that just any code has exploitable vulnerabilities, you just need to look hard enough? And with each fix, you can repeat that ad nauseam?

With the current browser development efforts, would we end up with a 100% formally verified browser, including its dependencies, networking, and maybe even relevant parts of a linux kernel?

Judging by the change log[2], links is currently developed by 1 developer and occasional contributions.

¹ Nothing of sufficient importance, frequency and lack of reasonable mitigations like not clicking on browser look-alikes, server-side CSRF protections, etc.

[2] http://links.twibright.com/download/ChangeLog


> you'll never get a web browser with more features than the ones released in the '90s

I would actively prefer a web browser that lacks the features added since the '90s.


That's understandable, but it isn't what most people want---developers or users alike.

Browsers aren't just thin-clients to support HTTP protocol and HTML rendering. They've grown to adopt a new distributed computing paradigm, not unlike UNIX and its descendants grew to support a new multi-user-cum-multi-process paradigm. The things web development offers---location agnosticism, platform agnosticism, combined multimedia interaction, a workable security model for multi-source aggregate-component content---are eating software development, and the browser is becoming the OS of the modern era. We know users want this because users were willing to use Flash (even though Flash broke out of the security model of the old browser).

There'll always be a place for small text-based pages much as modern computing will always have a place for command-line tools, but the genie is out of the bottle and it won't be put back in.


> it isn't what most people want---developers or users alike.

I'm fully aware of this, and this, at heart, is why I'm certain that the day will come when I can no longer use the web at all.


The mozilla suite in 1998 included a browser, an email/newsgroup client, an IRC client, an address book and an html editor.

Modern browsers for all their bloat actually have less features.


Such suites are not browsers. They include a browser.


> This is why I consider the GDPR to be unrealistically broad in its definition of PII

And I consider it far too narrow.

> If I'm given a forced choice between "more privacy" and "better software quality" I'm going to lean towards "better software quality."

Fair enough. I would go for "more privacy", personally. There is no technical reason why both of our preferences couldn't be honored.


Well why does Chrome send this special header to only Google properties like YouTube and search and not the rest of the internet.

It really seems fishy and a lot of double speak. I really don’t trust Google here.


> and not the rest of the internet

Privacy issues aside, this might not help an antitrust case if one is brought against them.


This is outrageous. Browsers are user-agents, not advertising accelerators. They should hide as much personally identifiable information as possible. This is exactly why using a browser from an advertising company is not a good idea. They use it to improve their service... The lie gets old...

This comment was sadly written in Chrome, since I need it for testing...

edit: pretty much exactly 10 years ago they already tried their shit with a unique id. We should have learned from that experience.


Well when the browser is created by an advertising company...


>This comment was sadly written in Chrome, since I need it for testing...

You realize you can have multiple different browsers installed, right?


According to https://www.google.com/chrome/privacy/whitepaper.html

"We want to build features that users want, so a subset of users may get a sneak peek at new functionality being tested before it’s launched to the world at large. A list of field trials that are currently active on your installation of Chrome will be included in all requests sent to Google. This Chrome-Variations header (X-Client-Data) will not contain any personally identifiable information, and will only describe the state of the installation of Chrome itself, including active variations, as well as server-side experiments that may affect the installation."

While this header may not contain personally identifiable information, its presence will make every request by this user far more unique and thus easier to track. I do not see Google saying they won't use it to improve their tracking of people.


One click while logged into any Google property will be enough for them to permanently associate this GUID with your (shadow) account, they know it, and they know you know it too


So, an extremely unique identifier for tracking purposes, that effectively no one knows exists, and no one knows can be changed at all?

With an obscure white paper that allows Google to claim they comply with the law because "they totally offer a way to change that and they even published that information to the web for anyone to find"?

Gotcha.


Don't be evil...

Until we are deployed enough that users don't have a choice...

Now that Google has cornered the market for Internet browsing, they're using that foothold to change how it works to suit their dominance. This is why they are not concerned about per-site tracking that Google Analytics does, as long as THEY as a company have direct browser-based tracking, they no longer need to provide tracking services to other private companies to know what is trending everywhere. This is also probably why they're trying to kill ad blockers and certain browser privacy extensions.... But they won't really matter to Google if everything is done at the browser level to begin with from now on. :/

If they make moves to scale back [free] Google Analytics, which they probably will at some point, it will only highlight this ideal... They may turn to selling their privately collected metrics and qualitative studies to companies after Google Analytics is rendered useless, and then that's unadulterated monopolistic profit for them and shareholders...

Diabolical.


True. But luckily you actually have a choice. Many opt for DuckDuckGo on Firefox, for instance.


You are right, but they also know most people won't switch. They have an entire generation of folks that don't even think about privacy.


There's also the subset of all of us who must use Chrome because <solution X> needed for work requires said browser. Google's dominance through Chrome extends to the whole ecosystem. Same thing with Apple inside their own (which is nowhere near a monopoly at 10-15% market share worldwide, thus totally fair game by comparison).


On the other hand, people hate ads, so going to Firefox might actually be better option for new users.


You can probably be identified on Firefox too: https://amiunique.org


They might, and I used to be one of them, but now I use Google on Firefox instead, because DuckDuckGo no longer yields useful results. The number of times I don't go "oh ffs, fine, !g" has been in steady decline over the last year, and at this point I've given up.


Why do people still dredge up Google's historical "don't be evil"? It's not been applicable for half a decade now, and even in 2015 when it was officially removed from the last company documents, it was already a dead phrase.

Google had already cornered the market back in 2012, when it surpassed every other browser, with an absolute majority dominance (>50% market share) achieved way back in 2015.

Google has been in control for a long time now.


Because of the deep irony? If you have a motto that binary and later decide to remove it, what is the world to infer?


> Why do people still dredge up Google's historical "don't be evil"?

Historical? It's not like it was 50 years ago.


In a world where broadband internet hasn't even been available for 2 decades, 5 years is a bloody long time.


Please don't post blatantly false statements that are trivial to refute.

wikipedia.org/wiki/Don't_be_evil


Reminds me of this.

"There’s no point acting all surprised about it. All the planning charts and demolition orders have been on display in your local planning department in Alpha Centauri for fifty of your Earth years, so you’ve had plenty of time to lodge any formal complaint and it’s far too late to start making a fuss about it now"


Beware of the leopard!


Are you talking about the same thing? Because the identifier above is claimed to have 13b of entropy. Is there another high entropy identifier?


13b, if usage statistics are disabled (not the default). Otherwise, unspecified amount of entropy.


Actually, the low entropy provider is used for any field trials that get included in the header.

See: https://cs.chromium.org/chromium/src/components/variations/v...


thanks. and ugh.



13b plus IP is already huge, but browsers leak so much more than that.
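The back-of-the-envelope arithmetic from the top comment is easy to check. Both inputs are rough assumptions (the 24-bit IP figure comes from upthread, and the install count is an assumed order of magnitude):

```python
import math

# Rough figures from upthread (both are assumptions):
seed_bits = math.log2(8000)   # variations seed drawn from 0..7999 ≈ 12.97 bits
ipv4_bits = 24                # entropy attributed to an IPv4 source address

total = seed_bits + ipv4_bits              # ≈ 37 bits combined
needed = math.log2(2_000_000_000)          # ≈ 31 bits to separate ~2B installs

print(round(total), round(needed))  # 37 31 - near-unique in combination
```

So while 13 bits alone is far from unique, the combination comfortably exceeds what's needed to distinguish every Chrome install on Earth.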


By default it's much more than 13b. Seems to be 13b only if you disable analytics/crash reports.


Your comment is factually incorrect.

13 bits of entropy is not an extremely unique identifier.

The first three letters of your first name have more bits of entropy than that. It would be quite a trick to uniquely identify you by the first three letters of your first name.


I fear the factual incorrectness isn't mine: the random string uses 13 bits of entropy only if usage statistics are disabled, which isn't the case by default. By default, it uses an unspecified amount of entropy (and you can bet real dollars that it'll be more than 13 bits' worth).

