
Is preventing browser fingerprinting a lost cause? (2012) [pdf] - cpeterso
http://www.w3.org/wiki/images/7/7d/Is_preventing_browser_fingerprinting_a_lost_cause.pdf
======
rblatz
When the EFF's Panopticlick says my iPhone running the latest version of iOS
is basically unique, I start to get suspicious. There are millions of phones
just like mine out there.

~~~
saurik
I just tried it out, and 1) their definition of "nearly unique" was "1 in
2068276.0 browsers", which I agree is not good but also seems to be a non-
intuitive rating scale, and 2) I swear from looking at the results that they
are failing to take into account cross-variable correlations (but have not dug
into this enough to be sure, so don't quote me on this or anything: do your
own research), by which I mean that each entropy source is considered in
isolation, even though my screen size and user agent (which tells you I have
an iPhone running iOS 8.4) should massively discount the screen size statistic
(which is effectively just telling you I have an iPhone 6), as well as my
"browser plugin details" (as they are likely the same for every single
iPhone), and probably to some lesser extent even my time zone (I live in
California, where I would imagine having a semi-recent iPhone would be more
common, though I could see an argument that my using an out-of-date firmware
is less common ;P). FWIW, if this is not yet including any knowledge of my IP
address, then one in millions is brutal, as I am probably one of only a small
handful of people in Isla Vista who have this specific iPhone configuration.

~~~
jrapdx3
It is disconcerting to see the browser "fingerprint" and realize it's very
difficult to significantly reduce its "uniqueness" based on all the factors
listed.

I guess that's the point: each element in the list contributes to the
uniqueness "profile". I don't think it matters that certain elements just add
up to identifying an iPhone 6 or whatever we're using.

I gather the profile is strictly by the numbers, here tallied as bits of
entropy. The statistical inference of uniqueness reflects the ability to use
the indexed bits as a pattern to identify the particular browser and
presumably the individual user (if not by the user's name).
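
To make the "bits of entropy" tally concrete: a fingerprint shared by one in N browsers carries log2(N) bits of identifying information. Below is a minimal Python sketch of that conversion. The function names are hypothetical, and the only figure taken from this thread is the "1 in 2068276.0 browsers" result quoted above.

```python
import math


def bits_from_one_in(n: float) -> float:
    """Bits of identifying information for a fingerprint shared by 1 in n browsers."""
    return math.log2(n)


def one_in_from_bits(bits: float) -> float:
    """Inverse conversion: how many browsers share a fingerprint worth `bits` bits."""
    return 2.0 ** bits


# The "1 in 2068276.0 browsers" figure reported earlier in the thread.
print(round(bits_from_one_in(2068276), 2))
# Each extra bit halves the crowd you can hide in.
print(one_in_from_bits(15), one_in_from_bits(22))
```

Under this conversion the quoted figure works out to just under 21 bits.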

It's one case where using common, generic components offers the greatest
anonymity. I'd guess the HN crowd is more likely than average to go for exotic
or bleeding edge configurations which no doubt will have the most distinct
fingerprints of all.

~~~
saurik
Yes, I understand that; my point is that if "one in ten people are using a
device with this exact screen resolution" and "one in a thousand people have
this user agent" then it is not true that "one in ten thousand people have a
device like this one" if "half the people with this user agent have this exact
screen resolution": it is only then true that "one in two thousand people
have a device like this one". If everyone with that user agent has that screen
resolution, the result would just be "one in a thousand people have a device
like this one".
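
The arithmetic here is the chain rule: P(user agent and screen) = P(user agent) × P(screen | user agent). Below is a small Python sketch using the illustrative rates from the comment above. The function name is made up for this example. Note that with a conditional probability of one half, the chain rule gives one in two thousand.

```python
from fractions import Fraction


def one_in(p_user_agent: Fraction, p_screen_given_user_agent: Fraction) -> Fraction:
    """Return N such that 1 in N people match both the user agent and the screen."""
    joint = p_user_agent * p_screen_given_user_agent  # chain rule
    return 1 / joint


p_ua = Fraction(1, 1000)    # "one in a thousand people have this user agent"
p_screen = Fraction(1, 10)  # "one in ten people ... this exact screen resolution"

# Independent attributes: P(screen | UA) equals the overall screen rate.
print(one_in(p_ua, p_screen))        # 10000
# Half of the people with this user agent have this screen.
print(one_in(p_ua, Fraction(1, 2)))  # 2000
# Everyone with this user agent has this screen: it adds no information.
print(one_in(p_ua, Fraction(1)))     # 1000
```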

I am concerned that they are calculating entropy incorrectly as I feel very
confident that there are pretty obvious correlations between many of these
variables: given my user agent you can guess my screen resolution with a
probability much better than random chance (as you know I have an iPhone, and
there are only so many screen resolutions for an iPhone, and some of these
iPhone configurations are more common than others in the wild) and it might be
the case that you can guess my "plugin configuration" with 100% accuracy.

~~~
jrapdx3
OK, IIUC I'd agree that the "entropy" or uniqueness calculation would be
misleading if in fact all user agents with a particular string identifier had
the same screen resolution. IOW in that case it's redundant information, and
therefore would represent only about 10 bits of entropy vs. about 13.
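
A toy illustration of that redundancy, as a hedged Python sketch: the population below is invented for the example (it is not Panopticlick data), and `naive_bits` / `joint_bits` are hypothetical helper names. Summing per-attribute surprisal treats the attributes as independent, while counting the whole record together does not double-count a screen size that the user agent already implies.

```python
import math
from collections import Counter

# Invented toy population of (user_agent, screen) records; not real data.
POPULATION = (
    [("iphone", "375x667")] * 1
    + [("desktop", "1920x1080")] * 3
    + [("desktop", "1366x768")] * 4
)


def surprisal(count: int, total: int) -> float:
    """Bits of information in observing a value seen `count` times out of `total`."""
    return -math.log2(count / total)


def naive_bits(record, rows) -> float:
    """Sum the surprisal of each attribute separately (assumes independence)."""
    total = len(rows)
    return sum(
        surprisal(Counter(row[i] for row in rows)[value], total)
        for i, value in enumerate(record)
    )


def joint_bits(record, rows) -> float:
    """Surprisal of the whole record, which accounts for correlated attributes."""
    return surprisal(Counter(rows)[record], len(rows))


me = ("iphone", "375x667")
print(naive_bits(me, POPULATION))  # 6.0: 3 bits for the user agent + 3 for the screen
print(joint_bits(me, POPULATION))  # 3.0: the screen adds nothing the user agent didn't
```

In this toy case the naive sum reports twice the bits the record actually carries, which is the kind of overestimate under discussion.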

However I imagine keeping track of all correlations of this type would be a
big burden for the identifying algorithm. Perhaps it's more practical to
sacrifice some accuracy for the sake of expediency.

If the goal is determining the probability that this browser is the same
browser that previously connected to my server, the estimate just has to be
good enough. Additional refinement may simply not have a big enough payoff to
make it worth the trouble.

Just my view from the sideline...

~~~
saurik
The "goal" of this website is to educate people about browser fingerprinting,
not to provide a practical means for you to fingerprint people on your
website. I do not think it is OK if their algorithm is flawed (and I stress
"if", as I wanted to open a discussion about whether or not their algorithm is
flawed, and was somewhat shocked to end up in an argument that somehow it
doesn't matter that the algorithm is flawed ;P), as then they would
essentially be using pseudo-science to scare people into believing something;
and even if the fear is correct, I'm really a strong believer that the
knowledge leading to the fear needs to be accurate or you end up with people
making really dumb decisions either at a personal or federal level to try to
mitigate the misunderstood problem.

~~~
jrapdx3
I sure didn't intend to be argumentative, and as I said, I do think you are
right: the fingerprinting example is flawed just as you point out.

I was only considering the calculations in the abstract, as representing how
such algorithms might be used by a real entity attempting to track its users.
I was putting myself in the place of the tracker and imagining how I'd deal
with the issues you brought up. I certainly _do not do, and never have done,_
any such thing on the servers I run in real life.

A year or so ago, when I first became aware of the Panopticlick site, I was
impressed with how easy it was to more or less uniquely identify my browser. I
tried numerous ways to make it less unique. Best I could do at the time was
reduce the "entropy" from 22 to 15 or so. Even though I knew these results
were hypothetical, it was still educational.

So the Panopticlick site makes its point even if it's not 100% accurate. How
much it scares people is hard to say, but probably not a lot. Bottom line is
I've been aware that fingerprinting could be happening, but also that there is
not a whole lot I can do to reduce my browser's uniqueness. Obviously, it
hasn't stopped me from getting on the web anyway.

------
mnot
For the current thinking about this in the W3C (or at least in the TAG), see:
[http://www.w3.org/2001/tag/doc/unsanctioned-tracking/](http://www.w3.org/2001/tag/doc/unsanctioned-tracking/)

There's also a document being put together by the Privacy Interest Group:
[https://w3c.github.io/fingerprinting-guidance/](https://w3c.github.io/fingerprinting-guidance/)

------
schoen
Does anyone know who the author of this presentation is? It doesn't seem to be
credited to anybody within the presentation itself.

~~~
dsp1234
It appears to be Brad Hill[0] according to
[https://www.w3.org/2002/09/wbs/1/tpac2012followup/results](https://www.w3.org/2002/09/wbs/1/tpac2012followup/results)

[0] -
[https://twitter.com/hillbrad?lang=en](https://twitter.com/hillbrad?lang=en)

------
finid
Based on test results using Panopticlick given at
[http://linuxbsdos.com/2015/12/18/trying-to-prevent-browser-f...](http://linuxbsdos.com/2015/12/18/trying-to-prevent-browser-fingerprinting-the-odds-are-against-you),
it appears so, though using Tor
Browser Hardened tilts the odds in your favor slightly

------
taf2
An interesting point that I think is easy to forget: at least with browsers,
we have the option to study and understand how the browser can be used to
track us. For the pro-app crowd, the tracking capabilities of a native app are
not only far superior, they also require significantly more work to
understand. It is also not a matter of trying to understand 3 or 4 browsers
but rather a near-infinite number of apps...

------
CyberDildonics
Has anyone made a VM specifically for this? I would think you could randomly
clock the VM speed up and down to get around profiling-based fingerprinting.

------
alfiedotwtf
With CSS media queries, can we finally get rid of the user agent string?

~~~
greggman
No, because too many features are still not easily discoverable. My latest
example: whether or not the Web Audio API can analyse streaming audio data.
Currently iOS WebKit and Android Chrome cannot, but there is no _easy_ way to
detect that :(

~~~
alfiedotwtf
How about falling back to JavaScript to detect it?

