

Netflix Prize 2: (Privacy) Apocalypse Now? - lnguyen
http://arstechnica.com/tech-policy/news/2009/09/netflix-prize-2-privacy-apocalypse-now.ars

======
randomwalker
As one of the people who broke the anonymity of the first Netflix Prize
dataset, here's my take on why this is an issue we should worry about.

First, some links:

Paul Ohm's paper on the consequences of re-identification for privacy laws and
public policy:
<http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1450006>

Ohm's article about Netflix Prize 2:
<http://freedom-to-tinker.com/blog/paul/netflixs-impending-still-avoidable-multi-million-dollar-privacy-blunder>

Our FAQ about the re-identification attack on the first dataset:
<http://www.cs.utexas.edu/~shmat/netflix-faq.html>

An excerpt from our paper, showing the kind of information you can learn about
a person just from movie preferences:

 _First, his political orientation may be revealed by his strong opinions
about “Power and Terror: Noam Chomsky in Our Times” and “Fahrenheit 9/11,” and
his religious views by his ratings on “Jesus of Nazareth” and “The Gospel of
John.” Even though one should not make inferences solely from someone’s movie
preferences, in many workplaces and social settings opinions about movies with
predominantly gay themes such as “Bent” and “Queer as folk” (both present and
rated in this person’s Netflix record) would be considered sensitive. In any
case, it should be for the individual and not for Netflix to decide whether to
reveal them publicly._

A quote from Paul Ohm on why this is problematic even if no one cares about
movie privacy:

 _The "accretion problem" is this: once an adversary has linked two anonymized
databases together, he can add the newly linked data to his collection of
outside information and use it to help unlock other anonymized databases. ...
Because of the accretion problem, every reidentification event, no matter how
seemingly benign, brings people closer to harm. Had Narayanan and Shmatikov
not been restricted by academic ethical standards (not to mention moral
compunction), they might have connected people to harm themselves._

He then goes on to postulate the "database of ruin":

 _It is as if reidentification and the accretion problem join the data from
all of the databases in the world together into one, giant, database-in-the-
sky, an irresistible target for the malevolent._

On a first reading, this might all sound like science fiction, but it's a lot
more plausible than most people think. On my blog <http://33bits.org/> I have
several more examples of re-identification. And let's not forget that there
are companies such as Acxiom and Choicepoint which already specialize in
aggregating every available piece of information about people, and then
selling it.
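
To make the linking step concrete, here is a minimal sketch of matching a few publicly known ratings against "anonymized" records. It is a deliberately simplified stand-in, not the scoring method from our paper: the names `similarity` and `best_match`, the thresholds, and the toy records are all assumptions made up for illustration.

```python
from typing import Dict, Optional

Ratings = Dict[str, int]  # movie title -> rating on a 1-5 scale


def similarity(known: Ratings, candidate: Ratings, tolerance: int = 1) -> float:
    """Fraction of the known ratings that the candidate matches within `tolerance`."""
    if not known:
        return 0.0
    hits = sum(
        1
        for movie, rating in known.items()
        if movie in candidate and abs(candidate[movie] - rating) <= tolerance
    )
    return hits / len(known)


def best_match(
    known: Ratings,
    anonymized: Dict[str, Ratings],
    min_score: float = 0.75,  # hypothetical threshold
    min_gap: float = 0.25,    # hypothetical margin over the runner-up
) -> Optional[str]:
    """Return the record id that clearly stands out, or None if no record does."""
    scored = sorted(
        ((similarity(known, record), record_id) for record_id, record in anonymized.items()),
        reverse=True,
    )
    if not scored:
        return None
    top_score, top_id = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else 0.0
    if top_score >= min_score and top_score - runner_up >= min_gap:
        return top_id
    return None


# Toy data, invented for this example.
anonymized = {
    "user_17": {"Bent": 5, "Fahrenheit 9/11": 4, "Napoleon Dynamite": 2},
    "user_42": {"Mamma Mia": 4, "Napoleon Dynamite": 5},
    "user_99": {"Fahrenheit 9/11": 1, "Jesus of Nazareth": 5},
}
known = {"Fahrenheit 9/11": 5, "Napoleon Dynamite": 2}  # e.g. ratings posted publicly

match = best_match(known, anonymized)
# Accretion: everything else in the matched record is now "outside information".
accreted = {**known, **anonymized[match]} if match else dict(known)
```

With this toy data, two known ratings are enough to single out one record, and the merged `accreted` profile then contains a rating the attacker did not start with. That growth of the attacker's knowledge with each link is what the accretion problem describes.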

~~~
jonmc12
"...one, giant, database-in-the-sky, an irresistible target for the
malevolent."

I'm not sure, but is this such a bad thing if this giant database is open? I
mean, the most incentivized and most malevolent of people can already buy my
information - and they will continue to figure out how to buy more of it as it
becomes available.

The important thing to me personally is that if a company has data about me,
they will let me see what it is, and then allow me to correct and (in some
cases) redact this information.

I guess what I am saying - put that database in the sky so we can all see it.
Then once the info gets tracked back to my identity, have it subject to some
kind of regulation so that I can reasonably control what is available. The
database should not have any trouble finding my contact information to tell me
that it knows something about me.

------
dfranke
My contest entry:

1\. Use the given demographic data and LexisNexis to deanonymize the data.

2\. Search Facebook, Twitter, message boards, etcetera for movies that they've
talked about but haven't rated yet on Netflix.

3\. Identify keywords in these posts that indicate positive or negative
sentiments.

4\. Recommend the movies that they've spoken favorably about.
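
Steps 3 and 4 above could be sketched roughly as follows. This is a toy illustration only: the keyword lists, the function names `sentiment` and `recommend`, and the sample posts are all hypothetical, and real sentiment analysis needs far more than keyword counting.

```python
import re
from typing import Dict, List, Set

# Hypothetical keyword lists; a real system would need much more than this.
POSITIVE = {"loved", "great", "favorite", "amazing"}
NEGATIVE = {"hated", "boring", "awful", "terrible"}


def sentiment(post: str) -> int:
    """Count positive keywords minus negative keywords in one post."""
    words = re.findall(r"[a-z']+", post.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def recommend(posts_by_movie: Dict[str, List[str]], already_rated: Set[str]) -> List[str]:
    """Return unrated movies with net-positive chatter, most positive first."""
    scores = {
        movie: sum(sentiment(p) for p in posts)
        for movie, posts in posts_by_movie.items()
        if movie not in already_rated
    }
    return sorted((m for m, s in scores.items() if s > 0), key=lambda m: -scores[m])


# Sample input, invented for this example.
posts = {
    "Napoleon Dynamite": ["I loved it, great movie"],
    "Mamma Mia": ["so boring and awful"],
    "Bent": ["amazing"],
}
picks = recommend(posts, already_rated={"Bent"})
```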

~~~
JacobAldridge
Or just track down their phone numbers, call them up, and ask them for help.

"Hi, this is dfranke. Netflix gave me your details ... kinda ... and I'd like
to know what you thought of Napoleon Dynamite?"

You could even split the million bucks, and buy them all lattes or something.

~~~
dfranke
I am so submitting this proposal. In cartoon form.

------
jrockway
Sounds like this problem could be eliminated if Netflix simply asked if it was
OK to reveal your choices, perhaps in exchange for a free month of service or
something. I would be happy to publish my movie preferences -- if someone
wouldn't like me because of the movies I've watched and liked, I don't want to
know them anyway. (Same goes for anything else; hire me, sleep with me,
whatever.)

~~~
frossie
I agree. I think there are really two levels of privacy. I would like to keep
my Netflix preferences private in the sense of not posting it on my website
(or Facebook or whatever). But, if someone was interested in sinking enough
effort to deanonymise me so that they can embarrass me by revealing that
despite my sophisticated world-cinema sub-titled exterior I did, in fact, rent
Mamma Mia last weekend - well, I would just pity them for not having a life.

In other words, privacy through obscurity is probably enough for most people
in this case, and so asking people whether they want to opt out of the program
is good enough - I think most people would stay in. Netflix can always pick
another person in that age/gender/geographical bucket to replace a person who
decided to opt out.

~~~
jrockway
Ah. I would put it on my personal website... but I don't think anyone would
actually care enough to read it or investigate it. With 6 billion people on
Earth, why would anyone care which movies jrockway watches?

------
jpwagner
Does anyone care to keep their movie preferences private?

(Not to mention that this groups people broadly into demographic clusters, and
that "preferences" are defined vaguely as a rating between 1 and 5.)

~~~
wmf
Some people would be embarrassed to admit that they watched certain movies.
Also, maybe Netflix can tell if you're gay.

~~~
Shamiq
It's a shame that being gay is embarrassing.

~~~
sofal
There are reasons for wanting to hide that you're gay that don't come from
feeling ashamed. For example, you could have close friends and family members
who have severe homophobia, but whom you actually love and respect (people are
multidimensional). It may or may not be inevitable that they'll find out at
some point, but you'd rather have some control of when and how. Throwing up
your hands and yelling "if you don't accept me for who I am then screw you!"
is not the best way to handle these things.

I'm not trying to sidetrack the conversation into gay issues. This applies
without loss of generality to other personal identity issues that conflict
with one's surrounding culture.

------
rw
This is very convincing. Why did Netflix do it?

~~~
jonmc12
Well, they haven't actually started the new contest yet, so the Netflix legal
dept. likely hasn't given its final stamp of approval.

