
Pseudonymisation is helping firms comply with GDPR - tormeh
https://www.economist.com/news/science-and-technology/21740165-stripping-our-identifying-information-they-are-still-able-do
======
jypepin
Is pseudonymisation really a new thing? We do this with prod data for our dev
and staging databases: a subset of a dump is processed, and names, emails and
other PII are replaced with random strings, etc.

Not only does it make the data handling safe and anonymised, it also avoids crazy
stuff like mistakenly sending a batch email to prod users while you are
testing stuff in dev/staging (been there, done that).
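A minimal Python sketch of the kind of scrubbing described above. The column names (`name`, `email`, `phone`), the every-tenth-row sampling and the address pattern are illustrative assumptions, not the commenter's actual pipeline. Rewriting addresses onto the reserved `.invalid` top-level domain (RFC 2606) is one way to make the accidental batch-email scenario harmless, since such addresses can never be delivered.

```python
import secrets

# Hypothetical PII columns; a real schema needs its own list.
PII_FIELDS = ("name", "email", "phone")


def scrub_row(row: dict) -> dict:
    """Return a copy of `row` with PII fields replaced by random strings."""
    clean = dict(row)
    for field in PII_FIELDS:
        if field not in clean:
            continue
        token = secrets.token_hex(8)
        if field == "email":
            # .invalid is a reserved TLD (RFC 2606), so this address never delivers.
            clean[field] = f"user-{token}@example.invalid"
        else:
            clean[field] = f"{field}-{token}"
    return clean


def scrub_dump(rows, sample_every: int = 10):
    """Keep every `sample_every`-th row of the dump and scrub PII from it."""
    return [scrub_row(r) for i, r in enumerate(rows) if i % sample_every == 0]


# Hypothetical production rows.
prod = [
    {"id": i, "name": f"Alice {i}", "email": f"alice{i}@corp.example", "plan": "pro"}
    for i in range(30)
]
dev = scrub_dump(prod)
print(len(dev), dev[0]["email"].endswith("@example.invalid"))  # 3 True
```

Note that this only replaces direct identifiers; the untouched columns can still single people out.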

I found that clever when I first saw we were doing that, but it seemed simple
enough that I just assumed every company did it.

~~~
roel_v
It's not as simple as that. What's the k-anonymity on your datasets? I'd be
mightily impressed if anyone at your company had ever calculated it, yet
'pseudonymisation' is meaningless without quantitative assessments of its
results.

From what I've seen in research, when you get an 'anonymized' dataset from
e.g. a government institute, hospital or school, someone will have replaced
the names in the 'Name' column with 'Subject 1', 'Subject 2' and so on, and if
you're lucky they'll have removed the DoB column. How many people are there in
the average organization who are even qualified to have an opinion on whether
a dataset is sufficiently anonymous for a certain purpose?
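To make the k-anonymity question concrete, here is a small Python sketch. It treats k as the size of the smallest group of rows sharing the same quasi-identifier values; the toy table, its column names and the choice of quasi-identifiers are made up for illustration. Renaming rows to 'Subject N' leaves the quasi-identifiers untouched, so k stays at 1 until the DoB column is dropped.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values.

    k == 1 means at least one row is unique on those columns.
    """
    groups = Counter(tuple(row[c] for c in quasi_identifiers) for row in rows)
    return min(groups.values())


# Hypothetical 'anonymised' release: names replaced, DoB still present.
release = [
    {"subject": "Subject 1", "zip": "1011", "dob": "1980-04-02", "sex": "F", "diagnosis": "A"},
    {"subject": "Subject 2", "zip": "1011", "dob": "1980-04-02", "sex": "F", "diagnosis": "B"},
    {"subject": "Subject 3", "zip": "1012", "dob": "1975-11-30", "sex": "M", "diagnosis": "A"},
    {"subject": "Subject 4", "zip": "1012", "dob": "1962-01-15", "sex": "M", "diagnosis": "C"},
]

print(k_anonymity(release, ["zip", "dob", "sex"]))  # 1: Subjects 3 and 4 are unique
print(k_anonymity(release, ["zip", "sex"]))         # 2: after dropping DoB
```

The arithmetic is the easy part; deciding which columns count as quasi-identifiers is the judgment call the comment is asking about.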

The first few years of GDPR lawsuits are going to be about obvious things;
hopefully we'll get to see a few more interesting ones about stuff like this
once that basic stuff is settled :)

~~~
saltcured
And, if you are interested in ethics rather than loopholes, things like
k-anonymity are not properties of datasets but of datasets with respect to an
assumed corpus of other information. You have to consider how this new data
could be integrated with other previous or future data releases and what level
of anonymity remains after that integration.

Eventually, if you work this analysis through sincerely enough, you will
probably abandon all hope of maintaining k-anonymity in real data management
practices. Then, you face your real ethical decision as you either drastically
restrict the data or turn to lawyers and loopholes to assuage your guilt as
you continue with the naive levels of protection demanded by your application
and organization...
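The dependence on other data can be illustrated with a hypothetical composition attack, sketched in Python. Two releases that share a pseudonym column are each harmless on their own, but joining them narrows an attacker's candidate set to one row. All data, column names and the attacker's background knowledge are invented for illustration.

```python
def candidates(rows, **known):
    """Rows consistent with everything the attacker already knows about the target."""
    return [r for r in rows if all(r[k] == v for k, v in known.items())]


# Hypothetical release 1: pseudonym, zip, sex and a sensitive attribute.
release_1 = [
    {"pid": "p1", "zip": "1011", "sex": "F", "diagnosis": "A"},
    {"pid": "p2", "zip": "1011", "sex": "F", "diagnosis": "B"},
    {"pid": "p3", "zip": "1012", "sex": "M", "diagnosis": "A"},
    {"pid": "p4", "zip": "1012", "sex": "M", "diagnosis": "C"},
]
# Hypothetical later release 2: same pseudonyms, adds birth year only.
release_2 = {"p1": 1980, "p2": 1991, "p3": 1975, "p4": 1962}

# The attacker knows the target is a man in zip 1012 born in 1975.
before = candidates(release_1, zip="1012", sex="M")
joined = [dict(r, birth_year=release_2[r["pid"]]) for r in release_1]
after = candidates(joined, zip="1012", sex="M", birth_year=1975)

print(len(before), len(after), after[0]["diagnosis"])  # 2 1 A
```

Release 1 alone is 2-anonymous on zip and sex; the anonymity is lost only once the two releases are integrated.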

------
tephra
I think it is worth noting that pseudonymization is not just some big loophole
in the GDPR: pseudonymized data can still be considered personal data and
fall under the GDPR's scope.

Pseudonymisation != Anonymization. And as the Article 29 Working Party has
concluded [0], it may sometimes not be enough to protect users' privacy.

[0] [http://ec.europa.eu/justice/article-29/documentation/opinion...](http://ec.europa.eu/justice/article-29/documentation/opinion-recommendation/files/2014/wp216_en.pdf)
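One common way pseudonymised data stays personal data: an unkeyed hash of an identifier can be reversed by anyone who can guess the inputs, simply by hashing candidates and comparing. A minimal Python sketch, with made-up addresses and a hypothetical `naive_pseudonym` helper:

```python
import hashlib


def naive_pseudonym(email: str) -> str:
    # Unkeyed SHA-256: deterministic, so anyone can recompute it.
    return hashlib.sha256(email.strip().lower().encode()).hexdigest()


# Hypothetical pseudonymised table handed to analysts.
table = {naive_pseudonym("alice@example.com"): {"purchases": 12}}

# Attacker: hash a list of candidate addresses (e.g. from some other leak) and match.
candidate_emails = ["bob@example.com", "alice@example.com", "carol@example.com"]
lookup = {naive_pseudonym(e): e for e in candidate_emails}
reidentified = {lookup[h]: row for h, row in table.items() if h in lookup}

print(reidentified)  # {'alice@example.com': {'purchases': 12}}
```

A keyed hash (HMAC) with the key kept away from the data users blocks this particular attack, but whoever holds the key can still re-link records, so the result is pseudonymous rather than anonymous.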

------
neonate
[http://archive.is/e40uP](http://archive.is/e40uP)

------
pcunite
_The result is a new set of data that contains no personal information, but
retains the format and statistics of the original. The only way that each
field in the new data set can be returned to its old state is by applying the
key used to generate the hash_

_these keys are held by the accounts teams. The development teams working on
the pseudonymous data never see them_
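The article does not spell out the exact scheme. The Python sketch below assumes a keyed hash (HMAC-SHA256) and a hypothetical `KeyHolder` class standing in for the accounts team. Since a hash is one-way, "applying the key" to reverse it is modelled here as the key holder keeping its own token-to-value table; a real system might instead use reversible (e.g. format-preserving) encryption. Unlike the article's description, this sketch does not preserve the original field format.

```python
import hashlib
import hmac
import secrets


class KeyHolder:
    """Hypothetical stand-in for the accounts team: owns the key and the way back."""

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)
        self._reverse: dict[str, str] = {}  # token -> original; never shared with developers

    def pseudonymise(self, value: str) -> str:
        # Keyed hash: same input -> same token, but unguessable without the key.
        # Truncated to 16 hex chars (64 bits) for readability.
        token = hmac.new(self._key, value.encode(), hashlib.sha256).hexdigest()[:16]
        self._reverse[token] = value
        return token

    def reidentify(self, token: str) -> str:
        return self._reverse[token]


accounts = KeyHolder()
customers = [
    {"email": "alice@example.com", "spend": 120},
    {"email": "bob@example.com", "spend": 80},
]

# What developers receive: tokens in place of e-mail addresses, other fields unchanged.
dev_copy = [dict(c, email=accounts.pseudonymise(c["email"])) for c in customers]

print(accounts.reidentify(dev_copy[0]["email"]))  # alice@example.com
```

Because tokens are deterministic, developers can still join and count per customer, which is also why the data remains personal data.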

Right... but I would feel better if _I_ held this hash/key and supplied it to them. I
understand I can request erasure, but I would like the option to request
"hashed" (or a more user-friendly term) when I want to keep my data on their
server but stay in control of it.

~~~
jypepin
You supply this key? So you are implying that you store the key and the
company doesn't have access to it, hence doesn't have access to your data?
Then is there any reason for the company to store your data?

What if you lose the key? You can't restore your account? I'd be interested
to know the % of users who lose their password and need to use the "forgot
password" feature. How would that work with a key you own?
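pcunite's proposal and the lost-key problem can both be seen in a small Python sketch: the data is encrypted under a key derived from a passphrase only the user knows, so the server stores an opaque blob, and a forgotten passphrase means nobody can recover the data. The `seal`/`unseal` helpers are hypothetical, and the HMAC-based cipher is a stdlib-only toy for illustration; a real system should use an audited authenticated cipher such as AES-GCM from a maintained library.

```python
import hashlib
import hmac
import os

SALT_LEN, NONCE_LEN, TAG_LEN = 16, 16, 32


def _subkeys(passphrase: str, salt: bytes) -> tuple[bytes, bytes]:
    # Stretch the user's passphrase (iteration count is an illustrative choice),
    # then split it into independent encryption and authentication keys.
    master = hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt, 100_000)
    enc_key = hmac.new(master, b"encrypt", hashlib.sha256).digest()
    mac_key = hmac.new(master, b"authenticate", hashlib.sha256).digest()
    return enc_key, mac_key


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # HMAC-SHA256 as a PRF in counter mode: a toy stream cipher, not a vetted one.
    blocks = [
        hmac.new(key, nonce + i.to_bytes(8, "big"), hashlib.sha256).digest()
        for i in range((length + 31) // 32)
    ]
    return b"".join(blocks)[:length]


def seal(passphrase: str, plaintext: bytes) -> bytes:
    """Runs on the user's side; the server only ever stores the returned blob."""
    salt, nonce = os.urandom(SALT_LEN), os.urandom(NONCE_LEN)
    enc_key, mac_key = _subkeys(passphrase, salt)
    stream = _keystream(enc_key, nonce, len(plaintext))
    ct = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(mac_key, salt + nonce + ct, hashlib.sha256).digest()  # encrypt-then-MAC
    return salt + nonce + ct + tag


def unseal(passphrase: str, blob: bytes) -> bytes:
    """Needs the original passphrase; there is no server-side reset."""
    salt, nonce = blob[:SALT_LEN], blob[SALT_LEN:SALT_LEN + NONCE_LEN]
    ct, tag = blob[SALT_LEN + NONCE_LEN:-TAG_LEN], blob[-TAG_LEN:]
    enc_key, mac_key = _subkeys(passphrase, salt)
    expected = hmac.new(mac_key, salt + nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong passphrase or corrupted data")
    stream = _keystream(enc_key, nonce, len(ct))
    return bytes(c ^ s for c, s in zip(ct, stream))


stored_on_server = seal("correct horse", b"Alice, 1980-04-02")
print(unseal("correct horse", stored_on_server))  # b'Alice, 1980-04-02'

# A forgotten passphrase cannot be reset: the tag check fails and the data stays sealed.
try:
    unseal("forgot it", stored_on_server)
except ValueError as err:
    print(err)  # wrong passphrase or corrupted data
```

Adding a recovery path (for example, escrowing a recovery key with the company) means someone other than the user can decrypt again, which is exactly the trade-off the question raises.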

