
As Data Collecting Grows, Privacy Erodes - peter123
http://www.nytimes.com/2009/02/16/technology/16link.html
======
tokenadult
"The way Mr. Rodriguez’s positive steroid test result became public followed a
path increasingly common in the computer age: third-party data collection. We
are typically told that personal information is anonymously tracked for one
reason -- usually something abstract like making search results more accurate,
recommending book titles or speeding traffic through the toll booths on the
thruways. But it is then quickly converted into something traceable to an
individual, and potentially life-changing."

That's troubling even for someone who is not an A-list celebrity like A-Rod.

~~~
akd
If it troubles you, it's still very easy to avoid being surveilled by most of
these systems. Pay cash instead of using a credit card. Don't use EZ-Pass.
Turn off your phone when you're not using it. Yes, it is inconvenient, but if
you trade privacy for convenience, you don't value it very much. People got
along fine before EZ-Pass came along.

Until we have mandatory legislated tracking devices in cars, etc. (not a far-
fetched idea at all, and something that must be fought), you are largely the
keeper of your own privacy.

~~~
Harkins
In Illinois, you pay twice as much for tolls without EZ-Pass; a literal cost
of privacy.

And it's getting significantly harder to avoid these kinds of surveillance.
You forgot to mention: never have a utility bill, or a mortgage, or a car...

"Do it yourself" is no longer an effective option.

------
aneesh
Most of us recognize the value of having such detailed data, and also the
consequences of it (like in this article). I would prefer not to have to
"trust" the company storing the data, and I think leaks like this make it
clear this can't be the approach in the future. So how can this problem be
solved from a technology perspective?

I'm aware of k-anonymization, where you generalize or suppress values so that
it's mathematically impossible to narrow any individual record down to fewer
than k matching entries. For example,

    Age  Weight  Disease
    18   150     Cancer
    45   203     Diabetes
    37   197     Heart Disease

becomes

    Age      Weight     Disease
    *        *          Cancer
    [36-45]  [195-210]  Diabetes
    [36-45]  [195-210]  Heart Disease

for k=2. Now it's impossible to know for sure which disease the 45-year-old
has, whereas you could read it straight off the original records.
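A minimal Python sketch of the check behind this example, assuming Age and Weight are the quasi-identifiers and that a fully suppressed row (`*`) is treated as withheld rather than as a group of its own. The function name `is_k_anonymous` and the record layout are illustrative, not from any particular library.

```python
from collections import Counter

SUPPRESSED = "*"


def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every non-suppressed combination of quasi-identifier
    values is shared by at least k rows.

    Assumption: rows whose quasi-identifiers are all suppressed ("*") are
    treated as withheld and excluded from the check.
    """
    classes = Counter(
        tuple(row[q] for q in quasi_identifiers)
        for row in rows
        if not all(row[q] == SUPPRESSED for q in quasi_identifiers)
    )
    return all(count >= k for count in classes.values())


original = [
    {"age": 18, "weight": 150, "disease": "Cancer"},
    {"age": 45, "weight": 203, "disease": "Diabetes"},
    {"age": 37, "weight": 197, "disease": "Heart Disease"},
]

generalized = [
    {"age": "*", "weight": "*", "disease": "Cancer"},
    {"age": "36-45", "weight": "195-210", "disease": "Diabetes"},
    {"age": "36-45", "weight": "195-210", "disease": "Heart Disease"},
]

qi = ("age", "weight")
print(is_k_anonymous(original, qi, k=2))     # False: every row is unique
print(is_k_anonymous(generalized, qi, k=2))  # True: the two ranged rows match
```

Without the suppressed-row assumption the check would fail, because the lone `*` row forms a group of one; tools differ on how they count suppressed records.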

Another approach, used by some major search engines for ad targeting, is to
insert random noise into the data.
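The comment doesn't say which noise scheme those search engines use. As one well-known instance of the idea (a swapped-in technique, not attributed to any company), here is a Python sketch of randomized response: each yes/no answer is randomly perturbed before it is stored, yet the population rate can still be estimated. `p_truth` and the 30% hypothetical population are illustrative assumptions.

```python
import random


def randomized_response(truth: bool, p_truth: float, rng: random.Random) -> bool:
    """Report the true answer with probability p_truth; otherwise report a fair coin flip."""
    if rng.random() < p_truth:
        return truth
    return rng.random() < 0.5


def estimate_true_rate(reports: list[bool], p_truth: float) -> float:
    """Undo the noise in aggregate.

    P(report True) = p_truth * true_rate + (1 - p_truth) * 0.5,
    so true_rate = (observed - (1 - p_truth) * 0.5) / p_truth.
    """
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth) * 0.5) / p_truth


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 100_000
truths = [i < 30_000 for i in range(n)]  # hypothetical population: 30% "yes"
reports = [randomized_response(t, p_truth=0.5, rng=rng) for t in truths]
# Any single stored report is deniable, but the aggregate stays close to 0.3.
print(estimate_true_rate(reports, p_truth=0.5))
```

Lowering `p_truth` gives each individual more deniability at the cost of a noisier aggregate estimate.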

What other solutions are there? If you build software that captures sensitive
data, how do you deal with it?

~~~
pierrefar
One set of data is not the issue.

It's when you start compiling knowledge (I almost want to say "evidence") from
multiple data sets that privacy is really invaded. Couple your anonymous
dataset with the search histories of the people involved, and you'll likely
get your answer.

Or couple it with their travel info to see which hospital they are likely to
have visited so regularly. Does that hospital have a specialist diabetes or
heart disease unit?
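A minimal Python sketch of that kind of linkage, reusing the k=2 table from the parent comment plus an invented auxiliary record. The person "Alice", her travel-derived hospital units, and the helper names are all hypothetical. Neither dataset reveals her diagnosis on its own, but intersecting them does.

```python
# The k=2 release from the parent comment.
anonymized = [
    {"age": "*", "weight": "*", "disease": "Cancer"},
    {"age": "36-45", "weight": "195-210", "disease": "Diabetes"},
    {"age": "36-45", "weight": "195-210", "disease": "Heart Disease"},
]

# Hypothetical auxiliary dataset: known age/weight plus the specialist
# units of the hospital this person's travel history shows them visiting.
auxiliary = {
    "Alice": {"age": 45, "weight": 203, "units": {"Diabetes"}},
}


def in_range(bucket: str, value: int) -> bool:
    """True if value falls in a generalized "lo-hi" bucket.

    A suppressed "*" value could be anyone, so it matches everything.
    """
    if bucket == "*":
        return True
    lo, hi = (int(x) for x in bucket.split("-"))
    return lo <= value <= hi


def candidate_diseases(person: str) -> set[str]:
    known = auxiliary[person]
    # Step 1: the quasi-identifiers alone leave several possible diagnoses.
    candidates = {
        row["disease"]
        for row in anonymized
        if in_range(row["age"], known["age"])
        and in_range(row["weight"], known["weight"])
    }
    # Step 2: the second dataset removes the remaining ambiguity.
    return candidates & known["units"]


print(candidate_diseases("Alice"))  # {'Diabetes'}
```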

------
aneesh
I remember a research talk where the speaker said that, empirically, there are
3 reasons consumers give up their privacy: convenience, (the perception of)
security, and money.

