

With a Few Bits of Data, Researchers Identify ‘Anonymous’ People - aficionado
http://bits.blogs.nytimes.com/2015/01/29/with-a-few-bits-of-data-researchers-identify-anonymous-people/?_r=0

======
dang
Discussed yesterday:
[https://news.ycombinator.com/item?id=8970129](https://news.ycombinator.com/item?id=8970129).

------
jaynos
Published study for those who want it.
[http://www.sciencemag.org/content/347/6221/536.full.pdf](http://www.sciencemag.org/content/347/6221/536.full.pdf)

~~~
chubot
So basically it's saying that removing the names and addresses isn't
sufficient to anonymize a data set. Didn't we already know that 10 years ago?

They are saying you just need to know 4 times and places that a person has
been, and you have a 90% chance of identifying their entire history via the
user_id in the data set. And if you know the price of a transaction they made,
the probability goes up. Is that very surprising?
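
To make the "4 known points" idea concrete, here is a minimal sketch that estimates how often k known (time, place) points match exactly one user_id. It runs on synthetic data and is not the study's code or method; `unicity`, `make_toy_traces`, and all sizes are hypothetical names and numbers chosen for illustration.

```python
import random


def make_toy_traces(n_users=500, n_points=30, n_times=48, n_places=40, seed=1):
    """Build synthetic traces: user_id -> set of (time, place) tuples.

    All sizes are made-up assumptions, not figures from the study.
    """
    rng = random.Random(seed)
    return {
        uid: {(rng.randrange(n_times), rng.randrange(n_places))
              for _ in range(n_points)}
        for uid in range(n_users)
    }


def unicity(traces, k, trials=200, seed=0):
    """Estimate the fraction of users singled out by k of their own points."""
    rng = random.Random(seed)
    # Only users with at least k points can be attacked with k known points.
    users = [u for u, pts in traces.items() if len(pts) >= k]
    unique = 0
    for _ in range(trials):
        target = rng.choice(users)
        # The attacker knows k random points from the target's trace.
        known = set(rng.sample(sorted(traces[target]), k))
        # Every user whose trace contains all the known points is a candidate.
        matches = [u for u, pts in traces.items() if known <= pts]
        if matches == [target]:
            unique += 1
    return unique / trials


if __name__ == "__main__":
    traces = make_toy_traces()
    for k in (1, 2, 4):
        print(k, unicity(traces, k))
```

On sparse data like this, the estimate should climb quickly with k, which is the effect the comment describes. Real traces are far less uniform than this toy generator, so the exact numbers here mean nothing.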

~~~
DanBC
It's a bit surprising that they get such accuracy from so little information.

The obvious case is that a lot of information can identify people with
greater-than-chance probability.

~~~
chubot
But it requires the attacker to know a lot of information in the first place:
4 time/place pairs.

In other words, if you already know a lot of information about a person, you
can get even more information from the "anonymous" data set. Why is that
surprising?

------
beagle3
You only need 33 bits of information to uniquely identify a living person (and
probably no more than 38 bits to identify any person who ever lived).

Everything provides some information, 0.01 bits here, 3 bits there - e.g. a
fact such as "understands English" is already 2.5 bits. It's just a matter of
integrating all those observations into one coherent estimate.
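
A rough sketch of that arithmetic, for anyone who wants to check it. The population figures are assumed round estimates (about 7.2 billion alive in 2015, about 108 billion ever born), not numbers from this comment, and the function names are made up for illustration.

```python
import math

# Assumed round figures, not taken from the comment above.
LIVING_PEOPLE = 7_200_000_000
PEOPLE_EVER_BORN = 108_000_000_000


def bits_to_single_out(population):
    """Bits needed to pick one member out of a population of this size."""
    return math.log2(population)


def bits_from_fact(fraction):
    """Bits gained by learning a fact that is true for `fraction` of people."""
    return -math.log2(fraction)


def remaining_candidates(population, bits):
    """Expected number of people still consistent after `bits` of information."""
    return population / 2 ** bits


if __name__ == "__main__":
    print(bits_to_single_out(LIVING_PEOPLE))     # a bit under 33
    print(bits_to_single_out(PEOPLE_EVER_BORN))  # comfortably under 38
    # A 2.5-bit fact is one that holds for 2**-2.5 of the population.
    print(2 ** -2.5)
    # Bits from independent facts add up, shrinking the candidate pool.
    print(remaining_candidates(LIVING_PEOPLE, 2.5 + 3 + 0.01))
```

The adding-up step only holds when the facts are independent; correlated observations contribute fewer bits than their individual totals suggest.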

------
jdawg77
EFF.org did something similar a while back; also, browsing habits have clearly
been a "fingerprint" for years, same as the keystrokes, depth, etc., that were
found recently on a monitor.

Even better (or creepier) was the government study 15 years ago that could
_identify people by how they walk_, using only video camera surveillance.

Or remember the AOL study? That was only IP addresses, and many people were
"unmasked"; it used only metadata / search logs and still identified
individuals. That was a decade back. This article seemingly only uses
citations that are 0-4 years old, but it's been a well-trodden issue for a
while, even in journalistic circles.

~~~
lepht
I believe this is the EFF project you're referencing, for the curious:
[https://panopticlick.eff.org/](https://panopticlick.eff.org/)

