
When Anonymous Isn’t Really Anonymous - chmars
http://brooksreview.net/2014/01/i-see-you/
======
lotharbot
Over 60%, and potentially up to 87% [0], of the US population can be uniquely
identified from just gender, zip code, and birth date. This isn't all that
surprising [1]. The EFF's browser fingerprinting project, Panopticlick, shows
that a similar percentage of browsers are uniquely identifiable through fonts,
plugins, etc. [2] If you're trying to identify someone, every "common" trait
they have still eliminates a large number of people from consideration.
Gender eliminates roughly 50% of the population; birth state eliminates
85-99.8% (depending on whether you were born in California or Wyoming); birth
date eliminates over 99%, and much more if you have the year. Pretty soon
you're not one face in 300 million, you're one face in a thousand, then one
face in ten, and then you're just you. And whatever made someone want to
figure out who you were is now out in the open.

[0] Depending on whether
[http://www.citeulike.org/user/burd/article/5822736](http://www.citeulike.org/user/burd/article/5822736)
or
[http://www.truststc.org/wise/articles2009/articleM3.pdf](http://www.truststc.org/wise/articles2009/articleM3.pdf)
is more accurate

[1] [http://godplaysdice.blogspot.com/2009/12/uniquely-identifying-people-by-birth.html](http://godplaysdice.blogspot.com/2009/12/uniquely-identifying-people-by-birth.html)

[2] [https://panopticlick.eff.org/](https://panopticlick.eff.org/) and the
explanation at [https://panopticlick.eff.org/browser-uniqueness.pdf](https://panopticlick.eff.org/browser-uniqueness.pdf),
particularly sections 4 and 5.2.
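As a rough illustration of that narrowing, here's a sketch that multiplies the population by the share of people who match the target on each trait. It assumes the traits are independent and evenly distributed, which real populations are not, and the shares are illustrative picks rather than figures from the cited studies: 1/2 for gender, 1/365 for birth day and month, about 0.2% for a small birth state like Wyoming, and 1 in 80 for birth year (a hypothetical spread of plausible ages).

```python
import math

US_POPULATION = 300_000_000

# Illustrative share of the population matching the target on each trait.
# Assumption: traits are independent and uniformly distributed (they aren't, in reality).
TRAITS = [
    ("gender", 1 / 2),
    ("birth day and month", 1 / 365),
    ("birth state (Wyoming-sized, ~0.2%)", 0.002),
    ("birth year (1 of ~80 plausible years)", 1 / 80),
]


def narrow(population, traits):
    """Apply each trait filter in turn.

    Yields (trait name, bits of information the trait contributes,
    expected number of people still matching after this filter).
    """
    remaining = population
    for name, share in traits:
        remaining *= share
        # A trait shared by a fraction `share` of people carries -log2(share) bits.
        yield name, -math.log2(share), remaining


for name, bits, remaining in narrow(US_POPULATION, TRAITS):
    print(f"after {name}: {bits:4.1f} bits, ~{remaining:,.0f} people left")
```

With these assumed numbers the pool drops to roughly 800 people after birth state and to about 10 after birth year, the same "one face in a thousand, then one face in ten" progression described in the comment.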

~~~
yaddayadda
I had a doctor who used first name, last name, birth month, and birth day as
their primary identifiers. I was checking in one year for my physical, and was
asked for these four data points. The next question was about current
medication. The receptionist nurse asked me the question, but before I could
answer, she got a really, really odd look on her face, then started glancing
between me and the computer. Finally, she asked, with a bit of a smirk on her
face, what year I was born. When I responded, she chuckled. It turned out the
only record that came up was for someone 30 years my senior, on a medication
that only someone that age would be on. The receptionist nurse dug around in
the system and was finally able to find my records, but if she hadn't noticed
that the listed prescription didn't match my apparent age, my checkup info
would have ended up in someone else's record, and it could have gotten messy.

~~~
msrpotus
That seems like it should violate medical records confidentiality rules (I
don't know if it does, but it should).

~~~
yaddayadda
Based on the interaction, I could obviously deduce that there was someone else
with the same four data points who saw the same doctor and was 30 years
older. I don't know if that, in and of itself, breaks confidentiality.

If they had started entering my data into that person's record and
subsequently realized the error, then they might have had to break some
confidentiality in order to clean up our conjoined record.

------
gamerdonkey
I agree that the things you can do with these large datasets are pretty cool.
However, people ignore the exceedingly high error rates when they try to apply
these techniques to "important" problems (e.g. identifying potential
terrorists).

Take his example of the quiz that identifies your home region by dialect quirks.
It's really an exercise in confirmation bias. For those who get an accurate
result, the quiz is amazing, and they tell all their friends. For people, like
me, who were told they grew up across the country from where they actually
did, it's just another silly, easy to forget quiz.

It's the same for the 20-questions device. We're amazed when it guesses the
right answer, but so quickly forget the ones it screws up on.

That's why, when I hear about how we just need a big enough dataset to
identify threats to our nation or accurately predict the stock market, I worry
about that 10-30% error rate.
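To put that range in perspective, here's a back-of-the-envelope sketch. It assumes, purely for illustration, that a 10-30% error rate applies uniformly to everyone in a population of 300 million being screened; real error rates vary by task and aren't uniform across people.

```python
US_POPULATION = 300_000_000


def expected_errors(population, error_rate):
    """Expected number of people misjudged, assuming the error rate
    applies uniformly to everyone screened (an illustrative simplification)."""
    return population * error_rate


for rate in (0.10, 0.30):
    wrong = expected_errors(US_POPULATION, rate)
    print(f"{rate:.0%} error rate: ~{wrong:,.0f} people misjudged")
```

Even at the low end of that range, that's tens of millions of people.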

------
bad_alloc
Something that annoys me about how the NSA scandal is treated, even here on HN,
is the extremely self-centered approach. Not all 300 million US citizens are
identifiable by looking at word choice, since not all of them are native
English speakers. Most of the 6.7 billion other people on the planet aren't
native English speakers either, and they probably can't be traced as easily,
or may even be located in the wrong place (my non-native English was placed in
New Jersey by the test page the article mentioned).

Now the EFF, Doctorow, etc. have started thedaywefightback.org, where they
recommend putting up quotes from Benjamin Franklin and calling spying
"un-American". Yet when Obama told us that the US was "only spying on
foreigners", there was almost no reaction to be found in the US media (while
the rest of the world was rightfully pissed).

Yes, it's an American agency that is spying, but it's spying on everybody, and
not only US citizens want to do something about that. It's sad that the
people raving about how great the internet is are missing this chance to use
_global_ outrage to keep it what it is supposed to be: a free communication
channel for people all over the world. Not just Americans.

</rant>

~~~
dredmorbius
It's worth noting that Doctorow is neither a US citizen nor resident.

------
gojomo
You only need 32.6 bits of information to uniquely identify everyone:

[http://33bits.org/about/](http://33bits.org/about/)
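The 32.6 figure is the base-2 logarithm of the world population: every bit of information halves the pool of candidates, so picking one person out of N equally likely candidates takes log2(N) bits. The sketch assumes a world population of about 6.6 billion, the value that reproduces 32.6; the 300 million US figure is the one used earlier in the thread.

```python
import math


def bits_to_identify(population):
    """Bits needed to single out one person among `population`
    equally likely candidates: each bit halves the pool."""
    return math.log2(population)


print(f"{bits_to_identify(6.6e9):.1f}")  # 32.6 (world population of ~6.6 billion)
print(f"{bits_to_identify(300e6):.1f}")  # 28.2 (US population of ~300 million)
```

Traits like gender (1 bit) or birth day and month (about 8.5 bits, since log2(365) ≈ 8.5) each chip away at that total, which is why a handful of them goes such a long way.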

------
amjaeger
Does it really matter if I'm not anonymous online?

~~~
chestnut-tree
_" Does it really matter if I'm not anonymous online?"_

Even if you're not anonymous online, consider the following scenario:

You visit website A; you're not anonymous and that's fine for you.

You then visit website B. Again you're not anonymous and that's fine too.

Then you visit website C, followed by website D.

Again, you're not anonymous and that's fine because you know that website A
doesn't know you visited website B and website B doesn't know that you visited
website C.

Website C does know that you visited website D, but it doesn't know what you
did once you got there. So although you're not anonymous online, you don't
feel like someone is watching over your shoulder looking at everything you do.

But unknown to you (or maybe even known to you) company X has a little bit of
analytics code on each of these websites and does know that you visited
website A, and website B, and also website C, D, E, F, G, H, I, J, K, L and
more.

So maybe it's a question of the degree to which you can be anonymous online.
On individual websites, you might be fine not being anonymous, as long as
those websites don't know what you're doing elsewhere on the web. But is
anonymity important if you know everything you do online is being tracked and
joined together from all your disparate journeys?
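Here's a toy sketch of why that third-party snippet matters. It assumes a hypothetical company X whose snippet sets one tracking cookie and reports every page load back to its server; the cookie IDs and site names are made up. Each site on its own sees only the visits that landed on it, but company X can group every report sharing a cookie ID into a single browsing history.

```python
from collections import defaultdict

# Hypothetical log of analytics reports received by "company X".
# Each entry: (tracking cookie ID set by company X, site that loaded the snippet).
beacon_log = [
    ("uid-42", "website-a.example"),
    ("uid-42", "website-b.example"),
    ("uid-7", "website-b.example"),
    ("uid-42", "website-c.example"),
    ("uid-42", "website-d.example"),
]


def what_each_site_sees(log):
    """Each site's own view: only the visitors who landed on that site.
    Simplification: we reuse company X's cookie ID as the site's visitor ID."""
    seen = defaultdict(set)
    for uid, site in log:
        seen[site].add(uid)
    return dict(seen)


def what_the_tracker_sees(log):
    """Company X's view: every visit with the same cookie ID, joined into
    one ordered browsing history across all sites carrying the snippet."""
    history = defaultdict(list)
    for uid, site in log:
        history[uid].append(site)
    return dict(history)


print(what_each_site_sees(beacon_log)["website-a.example"])  # {'uid-42'}
print(what_the_tracker_sees(beacon_log)["uid-42"])
# ['website-a.example', 'website-b.example', 'website-c.example', 'website-d.example']
```

The join itself is trivial once the same identifier shows up everywhere; what makes it possible is that one party's code runs on all of the sites.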

