
Identifying People by Their Browsing Histories - signa11
https://www.schneier.com/blog/archives/2020/08/identifying_peo_9.html
======
AdmiralAsshat
So researchers at Mozilla have replicated the findings of a 2012 study showing
that users can be identified by collating data from third-party trackers as
they browse popular websites.

Question: Was Mozilla replicating using a "vanilla" browser (e.g. no
adblocking/tracker protection)? Or is this even _after_ tracking mitigations
are put into place? Mozilla's own Firefox now has built-in "tracker"
protection in all versions of its browser, so it seems like they would be
well-positioned to test whether the de-anonymization is thwarted with tracking
protection toggled on or off.

~~~
pflock
Not sure about Firefox's built-in tracker protection. But I recall reading
that ad/tracker blocking could actually make a user easier to uniquely
identify, because the browser now behaves differently from most:
[https://panopticlick.eff.org/about](https://panopticlick.eff.org/about). My
hope is that built-in tracker protection will make ad-blocking browser traffic
more ubiquitous.

~~~
throwaway_pdp09
Pretty sure mine is highly distinctive for exactly that reason (no cookies, no
JS), but if all you can tell about me is when I'm online, what useful info
does that give you to sell to advertisers? Nothing. I'm worthless. Track
away.

------
rvrabec
"High uniqueness holds even when histories are truncated to just 100 top
sites." This is similar to the app fingerprinting done on mobile phones, which
identifies you by the unique assortment of apps on your device.

Should the top concern be about identification or deep collection of browsing
history?

~~~
elliekelly
Does iOS allow access to that information?

~~~
hombre_fatal
Making a request to any 3rd party domain on every page of the internet is what
gives people access to that information.

It's not talking about a browser.getHistory() API. Owning the 3rd party
resource (CDNs, analytics) that is loaded on most big websites is far better
than that.
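
A minimal sketch of that point, assuming a hypothetical third party whose resource is embedded on many sites and which logs a cookie ID plus the `Referer` of each request. The log format, IDs, and the `build_histories` name are made up for illustration; real trackers have other signals too.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def build_histories(requests):
    """Group the first-party hosts seen in Referer headers by cookie ID.

    `requests` is an iterable of (cookie_id, referer_url) pairs, a
    hypothetical log format chosen for this sketch.
    """
    histories = defaultdict(set)
    for cookie_id, referer in requests:
        host = urlsplit(referer).hostname
        if host:  # skip requests with a missing or malformed Referer
            histories[cookie_id].add(host)
    return dict(histories)


# Illustrative log entries, not real data.
log = [
    ("u1", "https://news.ycombinator.com/item?id=1"),
    ("u1", "https://www.schneier.com/blog/"),
    ("u2", "https://www.schneier.com/blog/"),
]
histories = build_histories(log)
print(histories)
```

No history API is involved: the embedding pages hand over the visited hosts one request at a time.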

------
sct202
Page 11 of the original doc has "Theoretical third-party reidentifiability
rates" by company:
[https://www.usenix.org/system/files/soups2020-bird.pdf](https://www.usenix.org/system/files/soups2020-bird.pdf)

I'm surprised how many companies (Facebook, Verizon, Adobe, Oracle, Twitter)
almost match Google's tracking network. Google's reach makes sense given the
number of AdSense / Analytics trackers out there, but I hadn't realized these
other companies are just as pervasive.

Edit: typo.

------
natcombs
ELI5?

I get that a browsing history C is unique, but if I clear it, how can you
identify that my new history D is tied to C and not unique?

~~~
hombre_fatal
Because you visit the same domains across sessions. Throw in some less common
domains like HN (i.e. a top-1000 rather than a top-100 website), and you're
trivial to reidentify. (This is what "reidentification" refers to in the
quoted part of the paper in TFA.)
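
A rough sketch of how history D can be tied back to history C, assuming domain sets are compared with Jaccard similarity. The metric, the profile IDs, and the domain lists here are illustrative assumptions, not a description of the paper's exact pipeline.

```python
def jaccard(a, b):
    """Jaccard similarity of two domain sets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def reidentify(new_history, known_profiles):
    """Return the ID of the stored profile most similar to new_history."""
    return max(
        known_profiles,
        key=lambda uid: jaccard(new_history, known_profiles[uid]),
    )


# Hypothetical stored profiles from an earlier session.
known = {
    "C": {"google.com", "youtube.com", "news.ycombinator.com", "lwn.net"},
    "X": {"google.com", "youtube.com", "facebook.com", "espn.com"},
}
# History D, collected after the user cleared everything.
new_d = {"google.com", "news.ycombinator.com", "lwn.net", "wikipedia.org"}

best = reidentify(new_d, known)
print(best, jaccard(new_d, known[best]))
```

The popular domains barely help separate the two profiles; the overlap on the less common ones is what pulls D toward C.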

~~~
sumtechguy
Also, even if you and I visit the exact same 100 sites, the order, time on
site, subpages visited, and time of day can all matter. Ad networks can also
leak data, since you may see a different set of ads than I do.

------
ravenstine
Now I feel even better about disabling my browser history in Firefox.

~~~
reificator
The only thing that does is prevent _you_ from getting anything useful out of
your history. It does not prevent the topic at hand.

~~~
squeezingswirls
Exactly, for this they'd need to run an ad blocker and a script blocker, such
as uBlock Origin and NoScript.

------
ScannerSparkly
I doubt you could accurately identify a specific person

~~~
ben_w
33 yes-no questions that each split the audience in half uniquely identify
slightly more people than the world population.

But any given website is much tighter than that: a regular visitor to
Cambridge Evening News is unlikely to be based in राजनांदगांव, and vice versa.

Someone is regularly accessing the website of one local take-away restaurant
in Larnaca, a gay men-only dating website, and Ars Technica? That’s probably
already got you down to 2-15 people out of the ~3e9 on the internet, with just
three specific websites in their history.
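
The arithmetic behind the 33-question claim can be checked with a short sketch. The world population figure is an assumed rough 2020 value; the ~3e9 internet figure is the one quoted above.

```python
import math


def bits_needed(population):
    """Number of perfectly halving yes-no questions needed to single out one person."""
    return math.ceil(math.log2(population))


WORLD_POPULATION = 7_800_000_000  # assumption: rough 2020 estimate
INTERNET_USERS = 3_000_000_000    # the ~3e9 figure from the comment

world_bits = bits_needed(WORLD_POPULATION)
internet_bits = bits_needed(INTERNET_USERS)
print(world_bits, 2 ** world_bits)
print(internet_bits, 2 ** internet_bits)
```

Real websites do not split the audience evenly, which is the point of the examples above: a rare site carries far more than one bit.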

