Protecting Against Browser-Language Fingerprinting (brave.com)
138 points by sam_lowry_ on June 16, 2022 | 77 comments



As a non-English speaker, I find this pretty useless. I always want to get the English version of any website, not some poor translation, and honestly the fingerprinting entropy is negligible. This seems to be PR-driven development, just to score some points with the privacy-aware community.

Every time I see tech being proposed for privacy instead of legislation I wonder how the topic is kept so vague. There are a handful of companies that can track you across multiple websites, so any real solution has to start by enumerating and addressing those companies.

It feels like we're always discussing curing "diseases" without explicitly saying that malaria, TB, etc. are the targets.


> There are a handful of companies that can track you across multiple websites, so any real solution has to start by enumerating and addressing those companies.

I think it's more like hundreds than handfuls. But they're all connecting your behavior across sites using the same few techniques:

* Explicit methods: cookies, link decoration, and other browser-supported ways of adding entropy. Browsers are working on removing these, but if they move too aggressively here then adtech just moves to:

* Fingerprinting: using existing browser entropy. Generally worse than explicit methods because the user doesn't have control (ex: shared fingerprint between successive private browsing sessions). Browsers are also working on reducing this, see the article, but it's very hard because the number of techniques is large and they generally use features users/sites depend on.

* Timing attacks (pretty sure no one is doing this commercially yet)
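A minimal sketch of the fingerprinting point, assuming a tracker that simply hashes a few passively exposed attributes. The attribute names and values below are hypothetical, and real scripts combine far more signals. Because nothing is stored on the client, two private browsing sessions with the same attributes yield the same identifier:

```python
import hashlib


def fingerprint(attributes: dict[str, str]) -> str:
    """Derive a stable identifier from browser-exposed attributes (hypothetical tracker).

    Keys are sorted so the same attributes always produce the same digest,
    regardless of dict insertion order.
    """
    canonical = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Two "private browsing sessions": no shared cookies, identical attributes.
session_a = {
    "accept_language": "en-US,en;q=0.9,ko;q=0.8",
    "timezone": "Asia/Seoul",
    "screen": "2560x1440",
}
session_b = dict(session_a)

print(fingerprint(session_a) == fingerprint(session_b))  # True: same ID without any cookie
```

Trimming the Accept-Language value, as the linked article describes, reduces how much this kind of hash can distinguish between users.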

You might be interested in https://github.com/michaelkleber/privacy-model

(Disclosure: I used to work on ads at Google)


> I think it's more like hundreds than handfuls.

Could you be a bit more explicit here? I'd be curious how these entities coordinate, since I don't have hundreds of cookies set by any website, so there must be only a few networks correlating this data (FB, Google, some IAB groups, who else?). Going after those networks would seem like the obvious next step, then.

Thanks for the link. It seems like an honorable direction, but it's a bit nebulous about how browsers would be incentivized to implement it in good faith.


> I don't have hundreds of cookies set by any website

Are you sure? These are third-party cookies, and it's not easy to get a full list. One way to do it is to go to a major publisher (NYT, CNN, etc) with devtools open and networking enabled. Filter to third party requests and look for ones sending cookies. Trying this on the NYT front page I saw 3p requests with cookies to amazon-adsystem.com, doubleclick.net, prebid.media.net, rubiconproject.com, adnxs.com, 3lift.com, openx.net, google.com, scorecardresearch.com, casalemedia.com, pubmatic.com, bluekai.com, adsrvr.org, bing.com, twitter.com, everesttech.net, criteo.com, dotomi.com, bidswitch.net, mfadsrvr.com, agkn.com, pswec.com, adtdp.com, demdex.net, bidr.io, adition.com, brand-display.com, intentiq.com, w55c.net, pippio.com, rlcdn.com, and adsymptotic.com before I got bored and stopped counting. Some of these might not be for personalized advertising, but most of them look like it.
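The devtools check described above can also be scripted against a HAR export of the Network tab, which most browsers offer. A rough sketch, assuming the HAR 1.2 fields `log.entries[].request.url`, `.cookies` and `.headers`; the last-two-labels domain rule is a deliberate simplification that misclassifies suffixes like `.co.uk`, and the sample entries are made up:

```python
from urllib.parse import urlsplit


def site_of(host: str) -> str:
    """Crude registrable-domain guess: keep the last two DNS labels.

    Simplification: wrong for multi-label suffixes such as .co.uk; a real
    tool should consult the Public Suffix List.
    """
    return ".".join(host.lower().split(".")[-2:])


def third_party_cookie_sites(har: dict, first_party_host: str) -> list[str]:
    """Distinct third-party sites whose requests carried cookies in a HAR log."""
    first_party = site_of(first_party_host)
    sites = set()
    for entry in har["log"]["entries"]:
        request = entry["request"]
        host = urlsplit(request["url"]).hostname or ""
        if not host or site_of(host) == first_party:
            continue  # skip first-party requests
        sent_cookie_header = any(
            header["name"].lower() == "cookie" for header in request.get("headers", [])
        )
        if request.get("cookies") or sent_cookie_header:
            sites.add(site_of(host))
    return sorted(sites)


# Made-up sample; for a real export use: har = json.load(open("page.har"))
sample_har = {
    "log": {
        "entries": [
            {"request": {"url": "https://www.nytimes.com/",
                         "cookies": [{"name": "session", "value": "1"}], "headers": []}},
            {"request": {"url": "https://ads.doubleclick.net/pixel",
                         "cookies": [{"name": "id", "value": "abc"}], "headers": []}},
            {"request": {"url": "https://cdn.example-static.com/app.js",
                         "cookies": [], "headers": []}},
        ]
    }
}
print(third_party_cookie_sites(sample_har, "www.nytimes.com"))  # ['doubleclick.net']
```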

> browsers would be incentivized to implement with good intentions.

Browsers compete on privacy, and what they do is open source. So while their incentives aren't perfect, external groups (and competing browsers!) can help keep them honest by paying attention and calling attention to bad decisions.

A great example of this was Mozilla's thorough and careful privacy analysis of FLoC (https://blog.mozilla.org/en/privacy-security/privacy-analysi...), and looking at Topics (https://github.com/patcg-individual-drafts/topics) Chrome seems to have spent a lot of time addressing that feedback.


The data from multiple sources is collected by aggregators. In theory, it's "anonymous" because it doesn't have your name and address on it. But in reality it contains hundreds of data points, including everything from your TV watching habits (from your smart TV and streaming services) to your credit card history (sold by the card companies and retailers) and much more.

Customers who match the criteria selected by advertisers can be targeted for other ads -- on different websites, streaming channels, by mail, etc. So this is how advertisers have access to information about you which they, themselves, did not collect.


I hope timing attacks will not be commercialized or we will see quite some performance degradation in Brave ;-)


Sometimes legal solutions are what you want, but... For starters, one can't legislate for the whole world, so this kind of legislation will further fragment the internet (unlike purely technical solutions). Legislation targeting particularly dynamic technical fields may not age well. Also, legislation is a product of various compromises, which often produces a minefield of consequences.


Counterpoint: https://en.wikipedia.org/wiki/Brussels_effect If a market like the EU requires you to set-up specific capabilities to be GDPR compliant, like data export and opt-in flows for cookies, it is often cheaper just to treat everyone like EU citizens instead of trying to differentiate. Or, at least to build those capabilities to be able to use them where relevant. So while the EU can't make Apple use USB-C on all the products they sell around the world, they CAN make the alternative (having dual models for EU and other markets) undesirable.


Apple previously made a dual SIM iPhone for the Chinese market. Depending on the regulation that eventually gets enacted, they will either make a port-less iPhone and force wireless charging, or make a special USB-C iPhone for the EU market.


See, that's a somewhat bad example because it's a small implementation detail. Put a different, double-sided SIM enclosure in the phone and it's suddenly a Chinese dual-SIM model.

Imagine the pain of selling accessories for two types of iPhones, USB-C and Lightning. The engineering needed to accommodate two ports. The amount of people turning to the grey market to get a USB-C iPhone.

I'd be shocked if Apple ever releases two models. My bet is the first iPhone that falls under the European mandate will have no ports.


> My bet is the first iPhone that falls under the European mandate will have no ports.

I'd give that maybe a 75% chance. I'm not totally sure they're quite ready to release a portless iPhone, which I believe would be a very unpopular change overall. Angering users hasn't always been a big concern for Apple, but I think it's more of a concern than it used to be. Wireless charging just isn't a good fit for a lot of charging scenarios.

Selfishly, I do hope that they don't release a portless iPhone anytime soon. I'd have to choose between upgrading my iPhone -- something I do every year or two -- and having CarPlay work in my car (which needless to say I very infrequently upgrade). I suppose a dongle or attachment for this purpose would be inevitable.


I'm sure Apple engineers are hard at work replicating the throughput of USB 2.0 speeds on a magnetically locked induction connection. Then they can offer a dongle so we don't have to change our cars!


> The amount of people turning to the grey market to get a USB-C iPhone

And the amount of people turning to the gray market to get a Lightning iPhone. I personally live in the EU, and I'll strongly consider importing my new phone if Apple decides to go with two separate models.


I see what you mean, but I don't think GDPR is really a valid example: data protection had become a global concern at that point, so quite a lot of companies were ready for it, there was an expectation that many jurisdictions would follow suit anyway, and generally it wouldn't have been good form to resist it. There are lots of very visible cases where it doesn't work like that: search, app markets, availability of news sources, medical information. What you see depends on your location, and sometimes even on your passport color, and as I see it, that more often restricts you as a user than protects you.


I wish we could rely on legal means to prevent tracking, rather than break useful functionality.

It's already annoying that developers assume my language based on my location, with features like this my real preference will be harder to determine.


In this case that functionality is broken right from the start. There should be a visible lang switch on a website. All other solutions are too technical, and/or have unexpected consequences, and/or lack flexibility.


There should be both: picking a user-specified default on first visit to address the common case, but still provide a switch to change the language manually.

As a user, what I actually miss is the ability to configure language preferences in the browser per ccTLD.
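A sketch of what per-ccTLD preferences could look like. The table and function names are hypothetical, not an existing browser setting or API. Note that the header would then vary from site to site, which is itself something a tracker could observe:

```python
from urllib.parse import urlsplit

# Hypothetical user configuration: ccTLD -> ordered language preferences.
TLD_PREFERENCES = {
    "de": ["de-DE", "de", "en"],
    "fr": ["fr-FR", "fr", "en"],
}
DEFAULT_PREFERENCES = ["en-US", "en"]


def languages_for(url: str) -> list[str]:
    """Pick the language list to send for a URL based on its top-level domain."""
    host = urlsplit(url).hostname or ""  # hostname is already lowercased
    tld = host.rsplit(".", 1)[-1]
    return TLD_PREFERENCES.get(tld, DEFAULT_PREFERENCES)


print(languages_for("https://www.example.de/"))  # ['de-DE', 'de', 'en']
print(languages_for("https://example.com/"))     # ['en-US', 'en']
```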


The HN title is unnecessarily editorialized; it should be replaced with the original title: "Protecting Against Browser-Language Fingerprinting".

And regarding the editorial comment of the submitted title:

- What it does is report only the most preferred language by default

- You can turn it off (i.e., just the language obfuscation) if you want to report all language preferences


If you speak an uncommon language, this is effectively almost the same as disabling it completely, because so few websites will support your language. They will probably default to whatever the default language of that website is, which may or may not be a language you speak.

I understand that it's easy to disable it, but it feels like a default that is strongly biased in favour of speakers of English and other widely spoken languages.

Maybe there should be a way to request all languages together and let the client pick whichever one it wants (I'm sure the text on a typical page, compressed, even times 50, is still negligible compared to the 50 MB of JavaScript frameworks it is probably pulling in). That would not sacrifice usability for privacy.


It would be good if the default let you set the requested language based on the domain name and TLD. I would normally want German as the default for all .de sites and would choose English for the rest.

But I see no added privacy here in a normal setting if I don't use a VPN, because 90% of IP addresses from Germany would report the same two languages. Only if I travel, e.g. to Japan, does it make me quite fingerprintable.

So I think the ideal would be if some entropy score could be displayed or predicted based on context (e.g. the source or target address, as above), and I could dynamically choose the trade-off between a bit of privacy and convenience.

The funny thing is that if you are in a country with a non-English-speaking majority that has English as a second language, reporting just one of the two languages puts you in a smaller subgroup, which makes no sense.
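The "entropy score" idea maps onto the standard surprisal measure: a value shared by a fraction p of users reveals -log2(p) bits. A sketch with invented counts for one region; a real score would need data on the actual population of visitors:

```python
import math

# Invented counts of Accept-Language values seen from visitors in one region.
observed = {
    "de-DE,de;q=0.9,en;q=0.8": 900,
    "de-DE,de;q=0.9": 80,
    "ja-JP,ja;q=0.9,de;q=0.8": 20,
}


def surprisal_bits(value: str, counts: dict[str, int]) -> float:
    """Identifying information in `value`: -log2(share of visitors sending it)."""
    total = sum(counts.values())
    return -math.log2(counts[value] / total)


for value in observed:
    print(f"{surprisal_bits(value, observed):5.2f} bits  {value}")
```

With these made-up numbers, the common German combination reveals about 0.15 bits while the rare one reveals about 5.6 bits: the same languages that blend in where they are common become identifying where they are rare.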


That would be amazing. Report German for all .de sites, just English for everything else. Not because of privacy, but because that’s what will probably give me the best quality ;)


It would make sense if websites had a standardized response with the language preferences of _their content_, and browsers simply responded with whatever their user chooses among the available ones.

Something like

{ 0: de, 1: [en,fr], 2: [ru,es], 999: [zh,ja,...] }

where the keys are an arbitrary 'priority' score chosen by the server. 0 would be the original language, then in this example maybe 1 could be full human translation, 2 partial human translation, 999 machine translation.

The browser could keep its language preferences client-side and simply request their favourite language among the available ones.
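A sketch of the client side of this proposal, assuming the server publishes a priority map in the format sketched above (using `ja`, the ISO 639-1 code for Japanese). How the map would be delivered is not specified, and this is not an existing web standard. The user's full preference list never leaves the browser; only the chosen language would be requested:

```python
# Hypothetical server offer: priority -> languages
# (0 = original, 1 = full human translation, 2 = partial human translation,
#  999 = machine translation).
OFFER: dict[int, list[str]] = {
    0: ["de"],
    1: ["en", "fr"],
    2: ["ru", "es"],
    999: ["zh", "ja"],
}


def choose_language(
    offer: dict[int, list[str]], preferences: list[str], max_priority: int = 2
) -> str:
    """Pick a language from the offer using preferences kept on the client.

    First pass: the user's most preferred language with priority <= max_priority.
    Second pass: allow anything offered (e.g. machine translation).
    Otherwise: fall back to the language under the lowest priority key.
    """
    priority_of = {lang: prio for prio, langs in offer.items() for lang in langs}
    for allow_any in (False, True):
        for lang in preferences:
            prio = priority_of.get(lang)
            if prio is not None and (allow_any or prio <= max_priority):
                return lang
    return offer[min(offer)][0]


print(choose_language(OFFER, ["ja", "en"]))  # en: human translation beats machine
print(choose_language(OFFER, ["ja"]))        # ja: machine translation as a last resort
```

Holding machine translations back until no human-written version matches is just one possible policy; a browser could equally let the user set the threshold.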


Yes. 'Breaks' is quite judgmental. It would be like saying "Gloves break fingerprinting for more anonymity"

We could replace 'breaks' with 'fixes', and we'd have the same kind of problem, but with the opposite bias: "Brave fixes language reporting in browser for more anonymity"


Changed now. (Submitted title was "Brave breaks language reporting in browser for more anonymity in strict mode".) Submitters: please follow the site guidelines, which ask: "Please use the original title, unless it is misleading or linkbait; don't editorialize."

https://news.ycombinator.com/newsguidelines.html


I have a hard time remembering the last time I’ve seen a site using that header for its intended purpose…


All public-facing sites I've ever worked on used that header for its intended purpose. Europe has a lot of different languages in a relatively small area, so if you want to do ecommerce there, you'll really need it.


I've worked on and written a few sites like that too, but in my experience quite a few larger sites use the IP address for this. It's pretty annoying if you don't speak the local language well or just prefer the English version. Google is a major "offender" for example, but there are many others too. PornHub even machine translates stuff.


Tell me about it, as a native English speaker living in Germany. This happens even when the website has an English language version. I don't understand how they think this is helpful. My browser literally told you the language I want to be served in its first header!

Related gripe: Twitter will only offer the report form (for e.g. a harmful tweet) inside Germany, in German. I do understand conversational German but not German legalese; I will not bother to select which exact subparagraph of the communications legislation the tweet violates, I will just close the tab and let someone else report it.


RE:Twitter, I suspect that's because Germans get a different report form than other visitors - Germany has some specific laws (e.g. NetzDG) that apply to visitors from Germany, rather than visitors who speak German.


What's wrong with showing some flags? I speak a small language and regularly buy things from shops in neighboring countries. But even when I don't speak that language at all, I can just look for a British flag or a language dropdown or similar. I have no idea what language my browser reports, and I hope it reports a single "English" and no fallbacks (despite that not being my first language).


Which flag should you use for the English language?

Union Jack? English Flag? USA Flag? Some horrible hybrid between them?

None of them is a perfect fit. As a techie I'd say ISO codes ('en', 'en-gb', 'fr', 'de') but I'm not sure how much those are understood, and probably not so hot for non-latin scripts.


The Union Jack for en-GB, but I'd say the UK flag is useful regardless, even for selecting en-US! No one outside native English-speaking countries really cares deeply how color is spelled in a web shop, and everyone recognizes the flag(s). Having a flag doesn't preclude showing "en" or "English" next to it either. But I wouldn't confuse users with full ISO codes like en-GB or en-US; just "en", and it can be the natural English variant of the site's location. Everyone expects en to be en-US for a US site.


And it's not 1-1 the other way; there are many countries with more than one official language. The best way IMHO is a drop-down with each language name written in that language (e.g. "Deutsch"), with the default based on the Accept-Language header.


What decides the accept-language header though? Where do modern browsers pick it from? The OS language?


At least in Firefox, it's in the browser settings, under General > Languages > Choose your preferred language for displaying pages > Choose...

This lets you specify an ordered list of whatever languages you prefer. If you haven't set it manually, usually it defaults to whatever your system language is.


It's configurable from within the browser, though it moves around a lot and has been getting hidden lately. I think default is based on the OS language, but of course that can vary between browsers and versions.


Flags represent countries, not languages.


I always set it to US English because I don’t like translations, yet I pretty often get translations.


It's so standard in web frameworks and CMSes that you must see it in action every single day without realizing it, as long as the sites are using the built-in i18n capabilities - which in almost all cases they are. And in Apache/Nginx configs that involve header-based redirections to a local language domain/subdomain, same deal.
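For reference, the server-side half of this is proactive content negotiation. A simplified sketch, loosely following the lookup idea in RFC 4647 (exact tag first, then the primary subtag); it is not any particular framework's implementation, and real libraries handle more edge cases:

```python
def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Parse an Accept-Language header into (tag, q) pairs, best first."""
    entries = []
    for part in header.split(","):
        tag, *params = [piece.strip() for piece in part.split(";")]
        if not tag:
            continue
        q = 1.0  # weight defaults to 1 when no q parameter is given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat malformed weights as "not acceptable"
        if q > 0:  # q=0 explicitly means "not acceptable"
            entries.append((tag.lower(), q))
    # sorted() is stable, so equal weights keep their order from the header.
    return sorted(entries, key=lambda entry: -entry[1])


def negotiate(header: str, available: list[str], default: str) -> str:
    """Return the best available language: exact tag, then primary subtag."""
    supported = {lang.lower(): lang for lang in available}
    for tag, _q in parse_accept_language(header):
        if tag == "*":
            return default
        if tag in supported:
            return supported[tag]
        primary = tag.split("-")[0]  # e.g. "de-at" -> "de"
        if primary in supported:
            return supported[primary]
    return default


print(negotiate("de-AT,de;q=0.9,en;q=0.8", ["en", "de"], "en"))  # de
print(negotiate("ko-KR,ko;q=0.9,de;q=0.8", ["en", "de"], "en"))  # de
print(negotiate("ko-KR,ko", ["en", "de"], "en"))                 # en
```

The last two calls also illustrate a concern raised later in the thread: once the header carries only the top language, a site that lacks it but has the user's second language falls straight through to its default.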


Plenty of sites do, at least in Europe.


Meanwhile, I set intl.accept_languages in Firefox (via about:config) to en-AU,en-GB,en,en-US. I doubt being so specific has ever actually helped, but I’m stubborn. If you’ve ever seen that particular Accept-Language sequence in your logs… it’s very probably me.
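Roughly how an ordered preference list becomes the header string. This sketch assumes evenly spaced q-values rounded half-up to one decimal place; browsers use their own schemes, so the exact numbers are illustrative. Whatever the scheme, an unusual list produces an equally unusual string:

```python
import math


def build_accept_language(preferences: list[str]) -> str:
    """Turn an ordered preference list into an Accept-Language header value.

    Assumption: q falls in equal steps of 1/len(preferences) from 1.0, rounded
    half-up to one decimal and clamped at 0.1 (q=0 would mean "not acceptable").
    """
    count = len(preferences)
    parts = []
    for index, tag in enumerate(preferences):
        if index == 0:
            parts.append(tag)  # the first entry carries the implicit q=1
            continue
        q = math.floor((1.0 - index / count) * 10 + 0.5) / 10
        parts.append(f"{tag};q={max(q, 0.1):.1f}")
    return ",".join(parts)


# The intl.accept_languages value from the comment above.
print(build_accept_language(["en-AU", "en-GB", "en", "en-US"]))
# en-AU,en-GB;q=0.8,en;q=0.5,en-US;q=0.3
```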


Congratulations, you've made yourself unique and thus easy to track. Use the resistFingerprinting preference in Firefox or just use Librewolf - they automatically enumerate a standardised set of accepted languages that a large userbase all share.


I feel it's a good thing, as Safari also does this and it hasn't seriously broken the web.


Not the Strict mode change that sends only English


Randomizing q sounds like it will make the Brave browser stand out, but it probably already does (?)


There are two ways of appearing anonymous.

In one approach, you can look like everyone else. This is hard to maintain, as any singular value can make you stand out from the crowd, making it easier to sift and isolate you. However, it is easier to implement, because you just need a bunch of constants in the software.

The other approach is that every single time, you look like someone completely unique. Whilst difficult to get right, this approach does mean you look unique and you do stand out. But every single connection has that feature, which makes it rather difficult to take two completely unique profiles and determine whether they belong to the same person.


It's not either/or. Tor Browser has many defaults that it asks you not to change, so that all Tor users look the same. The changing IP address however makes you look like a different user.


I didn't think about that, good point.


The setting that enables this appears to be causing some issues:

https://github.com/brave/brave-browser/issues/23093


Control-F for "canvas" - it's not discussed in the article. Canvas fingerprinting undoes all of these "protections" which Brave is talking up. Their mitigations are about as effective as changing your UA string these days - not at all. Most of Brave is like this - sounds technical and impressive, but it's mostly meaningless gibberish when viewed in full context. I mean, a "privacy browser" based on Chromium - I feel like I'm taking crazy pills.


They actually do handle canvas fingerprinting (I don't know to what extent):

https://github.com/brave/brave-browser/wiki/Fingerprinting-P...

and a bunch of other fingerprinting methods.


That's just one of many ways to fingerprint you. This blog post is only concerned with showing you how they are eliminating a different one.


Cool, just enabled it right now. Will be great to have similar stuff for timezone.


Is it just me, or is Brave (especially on Linux) getting slower and more power-hungry than simply using Chromium? If anyone has any fixes or tips on how to reduce resource usage, they'd be much appreciated.


Am I wrong here that all the emphasis on fingerprint protection is largely moot when not using a proxy? The entire design of IP networking means you’re fingerprintable by IP, doubly so on IPv6?


You still have a fingerprintable IP with a proxy. It's just a slightly different set of integers.


This is one of the reasons for Apple to build "iCloud Private Relay" [1], so you at least have a new IP address every so often.

[1] https://support.apple.com/en-us/HT212614


Multiple people can share one IP


I mean yes, with NAT, especially on IPv4, but fingerprinting at the household level still seems exceedingly valuable.


There is also CGNAT, which is used by mobile providers and increasingly on fixed-line internet.


Most people are on dynamic IPs.


They’re usually not actually that dynamic. I’m on Comcast and my IP hasn’t changed since before COVID/wfh started. I know because I have to update my work’s whitelist for the VPC every time it does.

Even if it changed weekly, that’s valuable data. How many websites do you access in that time?


Using a common household internet access here in Germany, i.e. consumer level, one gets assigned a new IP roughly every 24 hours.


Yeah. I could get a static IP for 5€/month, but besides that everything is dynamic.


This greatly varies between ISPs. We too have such a whitelist for wfh. Some people have had the same IP for years. Others have a different IP every week, sometimes even more frequently.


I’ve stopped using Brave on iOS since I can no longer log into Twitter, Reddit, or Hacker News.


I'll take a look later today. What are the issues or errors preventing the logins on those sites?


Looks like the login is not accepting my valid username and password.

You supply the username/password, click the SUBMIT button, and they all return to the same login page, still not logged in.


Logins work under Edge/iOS, Firefox iOS, Chrome/iOS, Safari, Firefox Focus, Tor+VPN/iOS, Aloha/iOS, DuckDuckGo/iOS, and Orion/iOS.


I’ve tried logging in with:

- Brave Shields Up
- Shields Up, all 16 combinations
- Shields Down

Since login fails even with Shields down, I’ve stopped all Shields-related testing.

Version 1.38 (22.5.13.17)


That's going to be fun for users of web sites that set language based on that header...


From TFA:

> By default, Brave will only report your most preferred language. So, if your language preferences are “English (United States)” first, and Korean second, the browser will only report “en-US,en.”[1] Brave will also randomize the reported weight (i.e., “q”) within a certain range.

> If fingerprinting protections have been set to Strict, Brave will instead always report the language preference as “English,” which ensures the largest available anonymity set[2]. And here, too, Brave will randomize the reported weight (i.e., “q”) within a certain range.

> ...Brave users who wish to share more information about their language preferences with websites can easily configure Brave to do so. Users can disable the font / language protections by visiting brave://settings/shields and toggling off Reduce the identifiability of my language preferences.
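A sketch of the behaviour quoted above, not Brave's actual code. The quoted text does not give the q range or the exact tag sent in Strict mode, so `Q_RANGE` and `STRICT_LANGUAGE` below are assumptions, as is drawing a fresh random value on every call:

```python
import random
from typing import Optional

STRICT_LANGUAGE = "en-US"  # assumption: the quote only says "English"
Q_RANGE = (0.5, 0.9)       # assumption: the quote only says "a certain range"


def reduce_accept_language(
    preferences: list[str],
    strict: bool = False,
    rng: Optional[random.Random] = None,
) -> str:
    """Report only the top language (or English in strict mode) with a random q."""
    if rng is None:
        rng = random.Random()
    top = STRICT_LANGUAGE if strict or not preferences else preferences[0]
    primary = top.split("-")[0]
    if primary == top:
        # No region subtag, so there is no fallback entry to attach a weight to.
        return top
    q = rng.uniform(*Q_RANGE)
    # Every other preference is dropped; only the top tag and its primary subtag remain.
    return f"{top},{primary};q={q:.1f}"


rng = random.Random(2022)
print(reduce_accept_language(["en-US", "ko-KR", "ko"], rng=rng))            # en-US,en;q=<random 0.5-0.9>
print(reduce_accept_language(["de-DE", "de", "en"], strict=True, rng=rng))  # en-US,en;q=<random 0.5-0.9>
```

Passing a seeded `random.Random` makes the output reproducible, which is also one way a browser could keep the value stable per site or per session instead of per request.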


I don't think you read more than the (misleading and unnecessarily editorialized) headline on hacker news.


I did. It will still wreck sites that don't have the primary language available, but where the secondary might be.


I started to use Brave after DDG went "rogue" and I have to say, it has been a good run so far.


What did DuckDuckGo do exactly?


Their browser doesn't block Microsoft tracking scripts (but blocks everything else) due to a contractual agreement. IMO DDG did a poor job of explaining this, choosing to hide behind marketing buzzwords.

https://news.ycombinator.com/item?id=31490515





