
Website data leaks pose greater risks than most people realize - tonicb
https://www.seas.harvard.edu/news/2020/01/imperiled-information
======
ThePhysicist
Most companies still don’t know what anonymization means and confuse
anonymized with pseudonymized or masked data.
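Here is a sketch of why that distinction matters. Replacing an email with a keyed hash is a common "anonymization" step, but the mapping is stable, so any two tables keyed by the same pseudonym can still be joined. This is minimal Python and assumes HMAC-SHA256 as the hash. The key, the addresses and both toy tables are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key held by the data owner; illustration only.
SECRET_KEY = b"example-key-not-for-production"


def pseudonymize(email: str) -> str:
    """Replace an email address with a keyed HMAC-SHA256 token.

    The mapping is stable: the same (normalized) address always yields the
    same token, so records remain linkable. That is pseudonymization.
    """
    normalized = email.strip().lower().encode("utf-8")
    return hmac.new(SECRET_KEY, normalized, hashlib.sha256).hexdigest()[:16]


# Two separately released toy tables, both keyed by the pseudonym.
purchases = [(pseudonymize("alice@example.com"), "prescription refill")]
visits = [(pseudonymize(" Alice@Example.com"), "clinic on Main St")]

# A plain equality join on the token reattaches both facts to one person.
linked = [
    (token, item, place)
    for token, item in purchases
    for other, place in visits
    if token == other
]
```

Keeping the key secret does not help here. The join needs only token equality, not the ability to reverse the hash.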

Part of the problem is that there are still no good criteria available to
define anonymity. Concepts like differential privacy are a step in the right
direction but they still provide room for error, and in many cases they are
either too restrictive (transformed data is not useful anymore) or too lax
(transformed data is useful but can be easily re-identified).
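For reference, this tension maps onto the standard definition of ε-differential privacy and its best-known instance, the Laplace mechanism. ε bounds how much one person's presence can change the probability of any output. For a simple count the expected error is 1/ε. A small ε therefore gives a strong guarantee but noisy, less useful answers. A large ε gives accurate answers but only a weak guarantee.

```latex
% epsilon-differential privacy: D and D' differ in one person's data
\Pr[\mathcal{M}(D) \in S] \le e^{\varepsilon} \Pr[\mathcal{M}(D') \in S]
  \quad \text{for all output sets } S

% Laplace mechanism for a numeric query f with sensitivity \Delta f
\mathcal{M}(D) = f(D) + \mathrm{Lap}\!\left(\tfrac{\Delta f}{\varepsilon}\right),
  \qquad \Delta f = \max_{D \sim D'} \lvert f(D) - f(D') \rvert

% expected absolute error; for a count \Delta f = 1
\mathbb{E}\,\lvert \mathcal{M}(D) - f(D) \rvert = \frac{\Delta f}{\varepsilon}:
  \quad \varepsilon = 0.1 \Rightarrow 10, \qquad \varepsilon = 10 \Rightarrow 0.1
```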

~~~
ravenstine
It's not that most of them don't know what anonymization is or are confused
about it.

Society is a tapestry of bullshit and low-level swindling is generally
tolerated or quickly forgotten about. Thus, there's nothing to prod the
unprincipled in charge to do the right thing. As long as something _seems_ to
be good (anonymized, in this case), and problems can be hidden behind the
corporate veil long enough, the unwritten rule is to half-ass security
solutions because, well, security is boring and there are other things to
devote company time and resources to (things that will advance upper
management).

Security measures, especially those that protect the users, don't make money.
At best, they're insurance against the fallout that might occur when it's
revealed that your company has been silently screwing people over. Like most
human beings, businesses often put off serious consideration of the future in
order to enjoy quick and immediate gain.

I wouldn't put it past most companies to screw up an approach like
differential privacy. Not enough people actually care that much.

~~~
dfxm12
_Security measures, especially those that protect the users, don't make
money._

This is why the government has to make regulations with teeth in this space
(of course, the government could be the "unprincipled in charge" you referred
to).

~~~
ravenstine
> of course, the government could be the "unprincipled in charge" you referred
> to

Not specifically, but I suppose I wouldn't say that politicians are more or
less principled than corporate executives. I know some would argue otherwise,
but I'm too black-pilled at this point to have faith in any "public servant".

Nevertheless, government regulation is probably the way to actually address
these issues. Government may lack competence or will, but at least it provides
us _some_ leverage, little as it may be.

------
inciampati
Differential privacy provides a system that can allow the sharing of databases
without allowing an external observer to determine if a particular individual
was included.

If companies were required to aggregate information in this way and throw away
their logs, perhaps leaks would be much less risky for their users.
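Here is what "aggregate and throw away the logs" could look like, as a minimal Python sketch. Per-category counts are released with Laplace noise of scale 1/ε, and then the raw per-user rows are discarded. Three things are assumptions: the (user_id, category) log format, the fixed category list, and the one-row-per-user cap. Together they give each count a sensitivity of 1.

```python
import random
from collections import Counter
from typing import Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    # Difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def release_noisy_counts(
    log: list[tuple[str, str]],
    categories: list[str],
    epsilon: float,
    seed: Optional[int] = None,
) -> dict[str, float]:
    """Turn a raw (user_id, category) log into an epsilon-DP histogram.

    Assumptions: `categories` is a fixed, public list (not derived from the
    data), and each user is counted at most once, so adding or removing one
    person changes a single bin by 1 (sensitivity 1).
    """
    rng = random.Random(seed)
    seen = set()
    counts = Counter()
    for user, category in log:
        if user in seen:
            continue  # cap each user's contribution at one row
        seen.add(user)
        counts[category] += 1
    noisy = {c: counts[c] + laplace_noise(1 / epsilon, rng) for c in categories}
    log.clear()  # throw away the per-user records once the aggregate exists
    return noisy
```

A smaller ε means more noise. Negative noisy counts can be clamped to zero afterwards, because post-processing a differentially private output does not weaken the guarantee. Python's `random` module is not a cryptographically secure source, so a real deployment would want a vetted DP library rather than this sketch.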

Today this might seem far-fetched, but it could come to pass in the future,
when people raised in this environment and able to understand the implications
and technical aspects come to political power.

[https://www.cis.upenn.edu/~aaroth/privacybook.html](https://www.cis.upenn.edu/~aaroth/privacybook.html)

[https://en.wikipedia.org/wiki/Differential_privacy](https://en.wikipedia.org/wiki/Differential_privacy)

~~~
bostik
Differential privacy provides a lot less protection than you would think (or
want to believe). A few months ago I saw a talk by E. Kornaropoulos, about his
paper "Attacks on Encrypted Databases Beyond the Uniform Query
Distribution"[0].

The main take-away from the talk - and in fact all the talks I saw on the same
day - was that while DP is touted as a silver bullet and the new hotness, in
reality it cannot protect against the battery of information-theoretic
attacks advertisers have been aware of for a couple of decades, and intelligence
agencies must have been doing for a lot longer. Hiding information is _really
hard_. Cross-correlating data across different sets, even if each set in
itself contains nothing but weak proxies, remains a powerful deanonymisation
technique.

After all, if you have a huge pool of people and dozens or even hundreds of
unique subgroups, the Venn-diagram-like intersection of just a handful will
carve out a small and very specific population.

0: [https://eprint.iacr.org/2019/441](https://eprint.iacr.org/2019/441)
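A toy illustration of the Venn-diagram point, in Python. It builds a synthetic population in which every attribute on its own is shared by thousands of people, then narrows it one attribute at a time. The attributes, value ranges and population size are all made up.

```python
import random

ATTRIBUTES = ("zip3", "gender", "car", "birth_year")


def build_population(n: int, seed: int = 0) -> list[dict]:
    """Synthetic people; each attribute on its own is shared by many others."""
    rng = random.Random(seed)
    return [
        {
            "id": i,
            "zip3": rng.choice(["021", "022", "023", "024", "025"]),
            "gender": rng.choice(["F", "M"]),
            "car": rng.choice(["sedan", "suv", "truck", "ev"]),
            "birth_year": rng.randint(1950, 2000),
        }
        for i in range(n)
    ]


def narrow(population: list[dict], known: dict) -> tuple[list[dict], list[int]]:
    """Intersect the subgroups matching each known attribute, one at a time."""
    candidates = population
    sizes = [len(candidates)]
    for attr in ATTRIBUTES:
        candidates = [p for p in candidates if p[attr] == known[attr]]
        sizes.append(len(candidates))
    return candidates, sizes


population = build_population(100_000)
target = population[42]
candidates, sizes = narrow(population, target)
```

With 100,000 people and 5 × 2 × 4 × 51 = 2,040 attribute combinations, four weak attributes leave roughly 50 candidates on average. Each further attribute divides that number again.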

~~~
DarthGhandi
The Australian government released "anonymised" healthcare data to researchers.
Within months a good chunk of it was deanonymised, including celebrities and
some politicians themselves.

There's a lot of privacy snakeoil out there and even large govt departments
fall for it.

[https://pursuit.unimelb.edu.au/articles/the-simple-process-o...](https://pursuit.unimelb.edu.au/articles/the-simple-process-of-re-identifying-patients-in-public-health-records)

~~~
sroussey
This has happened with NIH data in the US as well. There is a preprint
available.

------
mjevans
I've considered how I would like, e.g., GPS / driving apps to anonymize data.

For freeways, use lots of small segments and fuzz the timestamps to commingle
users. Where there's a stoplight, snap the intersection crossing time to the
(guessed) green-light time for everyone in the queue.
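Both ideas can be sketched in Python, under some loud assumptions. Timestamps are coarsened by flooring to a fixed bucket. The signal is modeled as a fixed cycle whose green phase starts at `green_offset_s + k * cycle_s`. Real signals are often actuated or adaptive, so the green-light time really is a guess, as described above. The function names and parameters are hypothetical.

```python
import math


def coarsen(t: float, bucket_s: float) -> float:
    """Floor a timestamp to its bucket so everyone in the bucket reports the same value."""
    return math.floor(t / bucket_s) * bucket_s


def snap_to_green(t: float, cycle_s: float, green_offset_s: float) -> float:
    """Replace an intersection crossing time with the start of the green phase it fell in.

    Assumes a fixed-cycle signal whose green phase begins at
    green_offset_s + k * cycle_s for integer k (a guessed model).
    """
    k = math.floor((t - green_offset_s) / cycle_s)
    return green_offset_s + k * cycle_s


# Three cars from the same queue crossing at slightly different times.
crossings = [101.5, 104.0, 118.2]
snapped = {snap_to_green(t, cycle_s=90.0, green_offset_s=30.0) for t in crossings}
```

All three crossings collapse to a single reported time, so the timestamps no longer separate one car in the queue from the others.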

The anonymity would come from breaking up both requests and observed telemetry
to fragments too small to tie back to a single user or session (and thus form
a pattern; I hope).

Do NOT record end times, only an intended route. Do NOT associate that
movement with any particular user or persistent session (ideally keep it in
memory on the mobile device only, not saved, though it could save favorite
routes locally). Sending transition times between freeway exits in packages
would generally add to anonymity.
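A minimal Python sketch of the "packages of transition times" idea. It assumes a trip is a list of (exit_id, time) pairs recorded on the device. Each consecutive pair becomes a standalone record carrying only the two exits and a rounded travel time. There is no user id, no session id and no end time, and the bundle is shuffled before upload. `Transition`, `to_transitions` and `package` are hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Transition:
    # Deliberately no user id, session id, or absolute timestamps (no end time).
    from_exit: str
    to_exit: str
    duration_s: int  # travel time between consecutive exits, rounded


def to_transitions(trip: list[tuple[str, float]], round_to_s: int = 30) -> list[Transition]:
    """Split one trip [(exit_id, time), ...] into independent exit-to-exit records."""
    return [
        Transition(a, b, int(round((tb - ta) / round_to_s) * round_to_s))
        for (a, ta), (b, tb) in zip(trip, trip[1:])
    ]


def package(trips: list[list[tuple[str, float]]], seed: Optional[int] = None) -> list[Transition]:
    """Bundle transitions from one or more trips and shuffle away their order."""
    rng = random.Random(seed)
    bundle = [t for trip in trips for t in to_transitions(trip)]
    rng.shuffle(bundle)
    return bundle
```

Shuffling hides order, not content. Transitions that share an exit (E1-to-E2, E2-to-E3) can still be chained back into a route when a bundle holds only one or two trips. The anonymity therefore depends on how many trips get mixed together.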

That would also be part of generally improving the UI for the user. The
application on the device should be making most of the decisions, by asking
about the traffic in a given region on a grid. I also want it to show me (the
driver) the data (heatmap) on the rejected routes so I know what isn't a good
option.
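The device-side design could be sketched like this in Python. Coordinates are quantized to coarse grid cells. The device requests congestion for every cell touched by any candidate route and then scores the routes locally. `fetch_congestion` stands in for a hypothetical server endpoint, and the 0.05-degree cell size is an arbitrary assumption.

```python
import math
from typing import Callable

Cell = tuple[int, int]


def grid_cell(lat: float, lon: float, cell_deg: float = 0.05) -> Cell:
    """Quantize a coordinate to a coarse grid cell (a few km across)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def choose_route(
    candidate_routes: dict[str, list[tuple[float, float]]],
    fetch_congestion: Callable[[set[Cell]], dict[Cell, float]],
) -> tuple[str, dict[str, float]]:
    """Pick a route on the device; the server only ever sees grid cells.

    `fetch_congestion` is a hypothetical server call mapping cells to a
    congestion score. All candidates' cells are requested together, so the
    query does not reveal which route was chosen.
    """
    route_cells = {
        name: [grid_cell(lat, lon) for lat, lon in points]
        for name, points in candidate_routes.items()
    }
    congestion = fetch_congestion(set().union(*route_cells.values()))
    scores = {
        name: sum(congestion.get(cell, 0.0) for cell in cells)
        for name, cells in route_cells.items()
    }
    best = min(scores, key=scores.get)
    return best, scores  # rejected routes' scores can feed the heatmap
```

Because the request covers every candidate, the server learns a region, not a choice. The per-route scores returned alongside the winner are what a heatmap of rejected routes would be drawn from.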

------
redis_mlc
Largely true, but there are HHS rules and guidelines that are accepted in the
US healthcare space:

[https://www.hhs.gov/hipaa/for-professionals/privacy/special-...](https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html)
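For a concrete sense of the HHS Safe Harbor method, here is a partial Python sketch covering three of its rules:

- ZIP codes are truncated to three digits, or `000` for three-digit areas of 20,000 people or fewer.
- Dates are reduced to the year.
- Ages over 89 are collapsed into one category.

It is not a compliance tool. Safe Harbor lists 18 identifier types, and the set of restricted ZIP prefixes has to come from Census data, so here it is simply a parameter. The record fields are hypothetical.

```python
from datetime import date


def zip3(zip_code: str, restricted_zip3: set[str]) -> str:
    """First three ZIP digits, or '000' for 3-digit areas of 20,000 people or fewer.

    `restricted_zip3` must be derived from current Census data; here it is
    simply passed in (an assumption of this sketch).
    """
    prefix = zip_code[:3]
    return "000" if prefix in restricted_zip3 else prefix


def age_on(birth: date, today: date) -> int:
    """Whole years between birth and today."""
    return today.year - birth.year - ((today.month, today.day) < (birth.month, birth.day))


def deidentify(record: dict, restricted_zip3: set[str], today: date) -> dict:
    """Allow-list a few fields and generalize them along Safe Harbor lines.

    Direct identifiers (name, email, ...) are dropped simply by not copying them.
    """
    age = age_on(record["birth_date"], today)
    over_89 = age > 89
    return {
        "zip3": zip3(record["zip"], restricted_zip3),
        "age": "90+" if over_89 else str(age),
        # Birth year would reveal an age over 89, so it is withheld in that case.
        "birth_year": None if over_89 else record["birth_date"].year,
        "admit_year": record["admit_date"].year,  # dates reduced to year only
        "diagnosis": record["diagnosis"],
    }
```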

~~~
kube-system
HIPAA data is not immune to a data leak... not even the organization that
wrote those guidelines is immune:

[https://www.deccanchronicle.com/technology/in-other-news/201...](https://www.deccanchronicle.com/technology/in-other-news/201018/us-cms-says-75000-individuals-files-accessed-in-data-breach.html)

There's tons of PHI on the internet: your local hospital's online medical
chart, your insurance company's bill-pay, etc.

------
SiempreViernes
The title refers to _claims by marketing companies_ that they have
appropriately anonymised the data, and is not an attack on the concept of
anonymisation itself.

------
akavel
What does "computer science concentrator" or "statistics concentrator" mean?
This is the first time I've seen such a title.

~~~
hwbehrens
Harvard calls their fields of study "concentrations", not majors [0]. Thus, a
CS concentrator is an undergraduate student who is majoring in CS.

[0]: [https://en.wikipedia.org/wiki/Academic_major](https://en.wikipedia.org/wiki/Academic_major)

------
ComodoHacker
Students have found data enrichment techniques exist and can be effectively
applied to breach datasets. Good for them.
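"Data enrichment" here amounts to merging records from different sources on a shared key. Below is a minimal Python sketch. It assumes the leaked datasets are lists of dicts that each carry an email field. The datasets and field names are hypothetical.

```python
from collections import defaultdict


def enrich(*datasets: list[dict], key: str = "email") -> dict[str, dict]:
    """Merge records from several datasets that share a join key.

    Keys are normalized (trimmed, lower-cased) so trivially different
    spellings still match. On field conflicts, later datasets overwrite
    earlier ones.
    """
    profiles: dict[str, dict] = defaultdict(dict)
    for dataset in datasets:
        for record in dataset:
            raw_key = record.get(key)
            if not raw_key or not raw_key.strip():
                continue  # nothing to join on
            fields = {f: v for f, v in record.items() if f != key}
            profiles[raw_key.strip().lower()].update(fields)
    return dict(profiles)
```

Each source alone holds little, such as a username in one and a ZIP code in another. It is the merged profile that makes a person findable.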

~~~
ghostpepper
Yeah, I was a bit surprised when I read this was a project for a first-year
course, Privacy and Technology (CS 105). I don't see it being reported anywhere
other than Harvard's own website.

------
ansmithz42
I think this should be sent to the government officials they were able to
find in their research; it might get them to wake up and stop treating it so
lightly.

------
lwb
Relevant XKCD: [https://xkcd.com/792/](https://xkcd.com/792/)

------
kache_
Is it just data leaks? How about Google's reports on how busy a certain area
is (restaurants, malls)? That is pretty much telling a potential terrorist the
optimal time to target an area. We leak data everywhere, and all we need is a
single bad actor to utilize it for a catastrophe to occur.

