Identifying People by Their Browsing Histories (schneier.com)
64 points by signa11 on Aug 31, 2020 | 21 comments

So researchers at Mozilla have replicated the findings of a 2012 study showing that users can be identified by collating data from third-party trackers as they browse popular websites.

Question: Was Mozilla replicating using a "vanilla" browser (e.g. no adblocking/tracker protection)? Or is this even after tracking mitigations are put into place? Mozilla's own Firefox now has built-in "tracker" protection in all versions of its browser, so it seems like they would be well-positioned to test whether the de-anonymization is thwarted with tracking protection toggled on or off.

Neither. They used a plugin to anonymously track people who opted in to the experiment.


Not sure about Firefox's built-in tracker protection. But I recall reading that ad/tracker blocking could actually make a user easier to uniquely identify, because the browser now behaves differently from most: https://panopticlick.eff.org/about. My hope is that built-in tracker protection will make ad-blocking browser traffic more ubiquitous.

Pretty sure mine is highly distinctive for exactly that reason: no cookies + no JS. But if all you can tell about me is when I'm online, what useful info will that give you to sell me to advertisers? Nothing. I'm worthless. Track away.

"High uniqueness hold seven when histories are truncated to just 100 top sites." This is similar to the app finger printing they do with mobile phones, identify you by the unique assortment of apps on your device.

Should the top concern be about identification or deep collection of browsing history?

The point is that browser history collection is the same as cross-site tracking. Any 3rd party analytics operation like Google Analytics is able to access your browser history. To such a point that whether they do or not shouldn't matter and couldn't be proven anyways.

Does iOS allow access to that information?

Making a request to any 3rd party domain on every page of the internet is what gives people access to that information.

It's not talking about a browser.getHistory() API. Owning the 3rd party resource (CDNs, analytics) that is loaded on most big websites is far better than that.
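
A rough sketch of why owning a widely embedded resource amounts to having the history: assume each request to the third party carries a stable identifier (a cookie, say) and a Referer header naming the embedding page. `ThirdPartyLog`, the cookie value, and the URLs are all hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit

class ThirdPartyLog:
    """What a script/CDN/analytics host could record from its own access logs."""

    def __init__(self):
        # cookie id -> hosts seen, in order of first appearance
        self.seen = defaultdict(list)

    def record(self, cookie_id, referer):
        # The Referer tells the third party which site embedded it.
        host = urlsplit(referer).hostname
        if host and host not in self.seen[cookie_id]:
            self.seen[cookie_id].append(host)

log = ThirdPartyLog()
# Hypothetical page loads, each of which pulls in the same third-party resource.
for page in [
    "https://news.ycombinator.com/item?id=1",
    "https://www.schneier.com/blog/",
    "https://news.ycombinator.com/newest",
]:
    log.record("cookie-123", page)

history = log.seen["cookie-123"]
```

No history API is involved; the list falls out of ordinary request logs.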

Page 11 of the original doc has "Theoretical third-party reidentifiability rates" by company: https://www.usenix.org/system/files/soups2020-bird.pdf

I'm surprised how many companies (Facebook, Verizon, Adobe, Oracle, Twitter) are almost matching Google's tracking networks. Google's makes sense based on the amount of Adsense / Analytics trackers there are out there, but I hadn't realized these other companies are just as pervasive.

Edit: typo.


I get that a browsing history C is unique, but if I clear it, how can you identify that my new history D is tied to C and not unique?

This isn’t about the history stored on your computer; it's about the browsing habits observed in real time by, e.g., ad agencies, who get lots of information about where you have been because their ads run everywhere.

Because you visit the same domains across sessions. Throw in some less common domains like HN (i.e. top 1000 instead of top 100 websites), and you're trivial to reidentify. (This is what "reidentification" refers to in the part of the paper quoted in TFA.)

Also, even if, say, you and I visit the exact same 100 sites: order, time on site, subpages visited, and the time of day I use the sites can all matter. Ad networks can also leak data, as you may see a different set of ads than I do.
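
A minimal sketch of linking a fresh history D back to an old one C by domain overlap. Jaccard similarity is used here purely as an illustrative assumption, not necessarily the metric the paper uses; the profile names and domains are made up.

```python
def jaccard(a, b):
    """Overlap between two domain sets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def best_match(new_history, known_profiles):
    """Name of the stored profile most similar to the new history."""
    return max(known_profiles,
               key=lambda name: jaccard(new_history, known_profiles[name]))

# Hypothetical profiles collected before the user cleared everything.
known_profiles = {
    "C": {"news.ycombinator.com", "example-bank.test", "lobste.rs", "wikipedia.org"},
    "X": {"wikipedia.org", "youtube.com", "facebook.com"},
}
# Hypothetical history observed afterwards under a fresh identifier.
D = {"news.ycombinator.com", "lobste.rs", "wikipedia.org", "youtube.com"}

match = best_match(D, known_profiles)
```

Here D shares the less common domains with C, so C wins even though D also overlaps X on the popular ones.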

Now I feel even better about disabling my browser history in Firefox.

The only thing that does is prevent you from getting anything useful out of your history. It does not prevent the topic at hand.

Exactly, for this they'd need to run an ad blocker and a script blocker, such as uBlock Origin and NoScript.

Indeed! The day I started ignoring my bank statement was the day I became truly secure.

I doubt you could accurately identify a specific person

33 yes-no questions that each split the audience in half uniquely identify slightly more than the world population.

But any given website is much tighter than that: a regular visitor to Cambridge Evening News is unlikely to be based in राजनांदगांव, and vice versa.

Someone is regularly accessing the website of one local take-away restaurant in Larnaca, a gay men-only dating website, and Ars Technica? That’s probably already got you down to 2-15 people out of the ~3e9 on the internet, with just three specific websites in their history.
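
The arithmetic behind the 33-question claim, as a quick check. The world population figure is a rough 2020 estimate and an assumption on my part; the 3e9 and 15 come from the comment above.

```python
import math

WORLD_POPULATION = 7_800_000_000  # assumption: rough 2020 estimate

# 33 independent yes/no answers distinguish 2**33 different people.
combinations = 2 ** 33

# Smallest number of perfectly halving questions that covers everyone.
bits_needed = math.ceil(math.log2(WORLD_POPULATION))

# Narrowing ~3e9 internet users to at most 15 candidates is worth
# log2(3e9 / 15) bits, obtained here from just three sites.
bits_from_three_sites = math.log2(3e9 / 15)
```

So three well-chosen sites already supply most of the bits a full identification needs.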

Why do you doubt this? Even if you can't always trust the 'unique' part, it is still information which can be combined to produce a more accurate profile.
