
How will the GDPR impact machine learning? - jonbaer
https://www.oreilly.com/ideas/how-will-the-gdpr-impact-machine-learning
======
gerdesj
Well, you have to hand it to the EU (Brit here): privacy has been legislated
for in a way that is absolutely breathtaking. The GDPR is the first decent
effort to curb the dreadful abuse of privacy that many massive multinationals
see as their right (om nom nom data).

I am extremely proud to be associated with the GDPR (I'm still a European for
now). It is an absolute belter of a set of regulations. If you read it, it is
actually pretty concise for a legal thingie. It is also very prescriptive
which is pretty odd for a legal thingie. It is up there with the "quietly
enjoy your own property" basic right that is sort of enshrined in English Law
nowadays (IANAL but a search would tell you what I'm on about).

If complying with GDPR is considered a problem then good luck with monetising
that weakness.

~~~
henrikschroder
Agree.

Not very long ago there were a bunch of articles posted on HN about how ML and
deep neural networks produce knowledge without humans being able to explain
any of the steps, asking whether this knowledge is still good.

The GDPR answers this question for business decisions regarding humans: No.
It's not good enough. Your business is absolutely allowed to use ML for
business decisions, but if a person asks how you reached that decision, you
can no longer hide behind the machine, you can't treat it as some kind of
oracle in a black box that spits out correct answers without any human knowing
how the mind of the oracle works.

And I think that's an important foundation to have going forward. It's like
how math tests work: you can't just write down the correct answer, you have to
show your work as well. Because the alternative is a dystopian "the computer
is never wrong" future, and that's gonna suck for everyone.

~~~
dogma1138
You do not need to say how you reached a decision; at most you may be required
to identify which information was used to reach the decision and how it was
obtained.

Businesses may be required to state the reason why a decision was made under
other regulations, but in those cases the reasons are essentially one-liners
that do not provide any real explanation, e.g. "credit score does not meet our
threshold".

There is, however, a real problem with GDPR and ML, and by “real” I mean not
that you’re getting fined out of business, but that no one has a good answer
for it: if I use your information to train a model and you then tell me to
stop and delete all your information, do I need to retrain my network? (While
most would say no, it’s not that simple.)

Because, for example, there are other questions, like: does a trained model
constitute anonymization or not? And again, before you say “yes”, there are
ways to reconstruct the training data from models, e.g.
[http://www.deeplearningbook.org/contents/autoencoders.html](http://www.deeplearningbook.org/contents/autoencoders.html)
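As a toy illustration of why the anonymization question is genuinely open: some model classes store their training records verbatim, so "the model" and "the personal data" are the same object. A minimal Python sketch with hypothetical data (a nearest-neighbour classifier, not any specific production system):

```python
# Toy illustration: some "trained models" literally contain the training data.
# A 1-nearest-neighbour classifier keeps every training record verbatim, so
# erasing a user's data means editing the trained model itself.

class OneNearestNeighbour:
    def __init__(self):
        self.examples = []  # (features, label) pairs: the model IS the data

    def fit(self, rows):
        self.examples = list(rows)

    def predict(self, x):
        # Return the label of the closest stored training point.
        closest = min(self.examples,
                      key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], x)))
        return closest[1]

    def forget(self, predicate):
        # "Right to erasure": drop every stored record matching the predicate.
        self.examples = [ex for ex in self.examples if not predicate(ex)]


model = OneNearestNeighbour()
model.fit([((30, 50_000), "approve"), ((22, 12_000), "decline")])
print(model.predict((28, 45_000)))           # prediction uses stored records directly
model.forget(lambda ex: ex[0] == (22, 12_000))
print(len(model.examples))                   # one record left after erasure
```

For a model like this the erasure question has a clean answer (delete the record); the hard case is a deep network, where the training data is smeared across the weights rather than stored as rows.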

There’s also the issue of using the data for decision making once a request to
stop processing has been issued: does processing happen only during training,
or also during inference?

~~~
henrikschroder
> There is however a real problem with GDPR and ML [...]

You are pretty much just repeating the questions and issues from the original
article now. It addresses them better than I can.

~~~
dogma1138
I’ll go read the article then :)

But again, the fact that a blog post says X isn’t enough, and while I have no
doubt that eventually sense will prevail, it still doesn’t fix the current
problem: the lack of prescriptive guidance from the 28 DPAs is problematic
because it’s not a small risk.

What people don’t seem to understand is that most people who have problems
with the GDPR don’t have problems with its principles, but rather with the
fact that it has been rolled out with almost no clear guidelines and rulings.
If you can’t be 100% sure that your business model is compliant with the GDPR,
it can be a pretty big risk to take on, and this ambiguity is much more
disruptive than the requirements themselves.

BTW, the machine learning problem isn’t unique to GDPR; HIPAA also has this
problem. Most companies elected to classify their trained models as patient
data to be on the safe side, unless they were trained only with public
information.

------
narrator
China is not going to care. Their complete obliviousness to privacy has made a
lot of their machine learning projects move ahead at a breathtaking pace. I
was at a presentation by a Chinese surveillance camera company last year and
the amount of automated behaviors they can recognize that allow automatic
crime detection, etc. is astounding! All the data processing will probably end
up running through China as Chinese courts will ignore any GDPR violation
fines for local companies, etc.

~~~
gerdesj
Why on earth would a Chinese court consider GDPR? GDPR is an EU thing.

If you want to do business with the EU then GDPR will probably apply in some
way but not for say US to China relations.

However, the GDPR is a damn fine set of standards to live up to - bear in mind
that you personally are a person. Would you not want your rights as an
individual protected in the same way? Go on ... live a little (and be decent).

GDPR may be what everyone needs.

~~~
biztos
I don't think it matters whether a Chinese court would consider it; the point
is that if a Chinese company is doing business in the EU, or to any
significant extent with EU nationals, then GDPR is an issue for them, under
threat of fines and worse for their EU subsidiaries.

No idea how much that's really an issue, but I'm pretty sure lots of people
order from AliBaba in the EU (I have).

------
tw1010
If I ignore GDPR for the next year or so, waiting until this all simmers down
to a level of abstraction below the surface level I care about, will I be any
worse off?

~~~
zitterbewegung
I think Jacques Mattheij has some thoughts about this; you can think about it
in the abstractions that he gives. Note that IANAL and YMMV.

EDIT: Also, Jacques is not a lawyer either.

1: [https://jacquesmattheij.com/gdpr-hysteria](https://jacquesmattheij.com/gdpr-hysteria)

~~~
oldcynic
The most rational piece on GDPR I've yet seen.

So of course you are being downvoted.

~~~
zitterbewegung
I have been downvoted for both reasons on HN. I don't feel bad about it at
all; I think it's funny.

------
downandout
Companies outside of GDPR-land _will_ have a competitive advantage over those
inside of it, especially in the area of machine learning.

From the article: _"...one of the first major distinctions the GDPR makes
about ML models is whether they are being deployed autonomously, without a
human directly in the decision-making loop. If the answer is yes—as, in
practice, will be the case in a huge number of ML models—then that use is
likely prohibited by default."_

If users explicitly consent to it, then it's permitted under GDPR, but what
percentage of users are going to do that? Most will be hitting the "decline"
button by default on all websites, even if a given application of ML will be
beneficial to them.

This is a showstopper for most ML in the EU. If you have an ML startup in the
EU, either close up shop or move.

~~~
jimnotgym
If you are using ML to make automated decisions for, say, insurance risk, then
you will have to say so explicitly. There is a pay-off for a customer to
agree, and that is convenience. If all insurance companies are doing this,
then a customer will likely have to consent, or not get the fast out-of-hours
insurance decision that they seem to want. I think the big difference under
GDPR is that the user will be able to request that the decision is made by a
human, but in that case they will not get to set up their insurance at 10PM
for tomorrow morning like they can in the automated system. It is completely
reasonable that a customer would have to wait a couple of days for this.

In practice I wonder if insurance companies (for instance) are actually doing
a live ML evaluation on customers, or do they just use ML to develop some
'bandings' that they fit you into? They could still do this, of course,
because they can use anonymous data to build a model.
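The 'bandings' idea above can be sketched in a few lines of Python. The thresholds and premiums here are made up; the point is only the shape: an ML model is used offline, on anonymized data, to derive the bands, while the live decision for a customer is a plain, explainable threshold lookup:

```python
# Sketch of the "bandings" approach: the offline modelling step produces a
# small table of score thresholds, and the runtime decision is a transparent
# lookup against that table. All thresholds and premiums are hypothetical.

# Output of the (offline) modelling step: (upper score bound, band name).
RISK_BANDS = [
    (0.2, "low"),
    (0.5, "medium"),
    (1.0, "high"),
]

# Premium per band, also fixed offline.
PREMIUMS = {"low": 200, "medium": 350, "high": 600}


def risk_band(score: float) -> str:
    """Map a customer's risk score in [0, 1] to a band via fixed thresholds."""
    for upper, band in RISK_BANDS:
        if score <= upper:
            return band
    return "high"


def quote(score: float) -> dict:
    # The runtime decision is fully explainable: both the band boundaries and
    # the per-band premium are known, fixed values.
    band = risk_band(score)
    return {"band": band, "annual_premium": PREMIUMS[band]}


print(quote(0.35))
```

Because no customer record touches the model at decision time, the "how was this decision made" answer is just the band table, which is easy to disclose.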

I don't see where the competitive advantage you mention lies? Yes a company in
the US could use ML on US citizens, but not EU ones. Presumably a company in
France could set up a server in AWS US-West region and do ML on US citizens,
but not EU citizens? Surely it is a matter of where the data-subject lives,
not where the company is based?

~~~
downandout
_I don't see where the competitive advantage you mention lies?_

Well, if you can't perform ML on data about the population that is most
accessible to you, you are at a competitive disadvantage to those who can. I
can't think of many EU companies that control large amounts of data on US
citizens. Though I'm sure they exist, there are _many more_ US companies that
would have larger volumes of data on US citizens.

~~~
jimnotgym
So you are arguing that US firms operating in the US are in competition with
UK firms operating in the UK? I don't see how they are in competition at all?

~~~
downandout
Humans are humans, regardless of what country they live in. If I have a larger
dataset than you, and I can do more things with it, then I can predict human
behavior in a given situation better than you can. That gives me a competitive
advantage over you. I may be able to then go compete with you in the EU
without ever having to violate the ML provisions of GDPR, because I know what
humans - whether in the EU or US - will do.

Granted, there are some country-specific behaviors that this will not apply
to, but then nobody will be finding out what those are in the EU because it's
now illegal to do there.

------
andrenotgiant
I want to live in a world where the individual has control over their data,
but I don't think GDPR will help create that world.

Imagine you are a company that makes $500k+/year off of user data. You are not
just going to stop doing it because of GDPR, you are going to spend up to
$500k on lawyers figuring out how to get around it.

~~~
gkya
That world will come around when a majority of people understand that running
untrusted code on your computer is just madness (i.e. JavaScript). I wish some
mainstream browser like Firefox would create an easy UI for JS whitelists,
and/or a dialog box asking consent for JS execution per FQDN, like it does for
location etc. Among the big players, only Google has major reasons not to do
such a thing, but FF, IE and Safari can try.

------
MichaelMoser123
Is it possible to cover GDPR requirements with another line in an EULA that
specifies consent?

~~~
tombone12
No, you have to ask and receive consent explicitly for the processing you want
to do, not as part of some larger package.
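The "per processing purpose, not one blanket line" requirement can be illustrated with a toy consent record. The purpose names and the API shape here are hypothetical, just a sketch of the data structure such consent implies:

```python
# Sketch of purpose-specific consent, as opposed to one blanket EULA clause:
# each processing purpose is granted, checked, and withdrawn independently.
# Purpose names and the API are illustrative, not any real library.

from datetime import datetime, timezone


class ConsentRecord:
    def __init__(self):
        # purpose -> UTC timestamp of when consent was granted
        self.purposes = {}

    def grant(self, purpose: str):
        self.purposes[purpose] = datetime.now(timezone.utc)

    def withdraw(self, purpose: str):
        # Withdrawal must be possible per purpose, not all-or-nothing.
        self.purposes.pop(purpose, None)

    def allows(self, purpose: str) -> bool:
        return purpose in self.purposes


consent = ConsentRecord()
consent.grant("automated_underwriting")
print(consent.allows("automated_underwriting"))  # granted explicitly
print(consent.allows("marketing_profiling"))     # never bundled in
consent.withdraw("automated_underwriting")
print(consent.allows("automated_underwriting"))  # withdrawn again
```

Storing a timestamp per purpose also matters in practice: demonstrating *when* consent was obtained is part of demonstrating that it was obtained at all.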

~~~
MichaelMoser123
a modal pop-up dialog asking for consent will be required?

~~~
erichocean
yes, that's the intent of the law

------
outside2344
I can't wait to issue denial of service requests to every EU company when it
comes into effect. I am going to send a nightmare letter to all of them.

------
zerostar07
Does GDPR apply to animals too? I mean, if it's going to be hard to manage
human data, perhaps replacing some with monkeys would be OK.

~~~
tombone12
GDPR applies to "natural persons", which I think mostly excludes animals (I
base this on us killing animals all the time and it doesn't count as murder).

