
Will GDPR Make Machine Learning Illegal? - Vaslo
https://www.kdnuggets.com/2018/03/gdpr-machine-learning-illegal.html
======
ThePhysicist
Personally, I think many people and organizations currently exaggerate the
changes and risks (for companies) that the GDPR brings, often with the goal
of instilling fear (and generating business) or getting press coverage.

The relevant article ([https://gdpr-info.eu/art-22-gdpr/](https://gdpr-info.eu/art-22-gdpr/)) clearly states that automated decision making is
allowed under the GDPR; it just gives the data subject the right to demand a
manual assessment of that decision. In practice, this means that you're
absolutely free to use any kind of automated algorithm (AI-based or not) to
make decisions about individuals, but if people complain and demand an
explanation, you need to review the decision manually.

Addendum: If you want to learn more about the motivations behind article 22 as
well as articles 13 & 14 (that concern the transparency requirements), I
highly recommend the official document of the Article 29 working party:

[http://ec.europa.eu/newsroom/article29/item-detail.cfm?item_id=612053](http://ec.europa.eu/newsroom/article29/item-detail.cfm?item_id=612053)

~~~
montrose
If you have to be able to manually review any decision that people complain
about, that is significantly limiting. It means you can only write software to
do things humans can do.

I don't think it's alarmist to worry about the GDPR. It's an extremely far-
reaching law covering types of technology that are changing very rapidly. Our
default reaction should be skepticism, because the default outcome of that
combination is disaster.

~~~
raverbashing
> to review any decision manually that people complain about, that is
> significantly limiting. It means you can only write software to do things
> humans can do

So are you happy with a black box that just says "computer says no" without
any explanation or way to review this decision?

"Sorry, your insurance premium is 300EUR higher next year; computer says so."

~~~
geocar
A better example: an insurer might use a sophisticated anti-fraud algorithm to
reject claims, but they actually have to be rejecting claims for reasons of
fraud, not because "computer says so".

------
matthewmacleod
No.

It may make it illegal, via the much-debated "right to explanation", to make
decisions about people solely on the basis of machine learning models. That
is, "we fed some data into a computer and it said 'no', so too bad" is not
going to be a valid explanation.

That sounds like a fucking amazing outcome to me.

~~~
John_KZ
But unfortunately it's very easy to find a way around it: Make all decisions
based on ML models, and only when a customer manages to go through the
bureaucratic, expensive process of requesting an explanation, have a human
review the case and come up with a plausible reason.

It's definitely a whole lot better than not having that law, but we're not
quite there yet.

~~~
protomyth
> only when a customer manages to go through the bureaucratic, expensive
> process of requesting an explanation, have a human review the case and come
> up with a plausible reason.

That doesn't sound legal, as the actual reason had to exist at the time of the
decision, and an after-the-fact rationalization would not be the truth.

Don't the ML models log why they made decisions in the first place?

~~~
dogecoinbase
The whole problem of interpretability in ML is that the "reasons" a model
arrived at a decision/classification/etc. are incomprehensible (and, given
e.g. the existence of adversarial examples in image classification, this
should be expected -- no reasonable answer exists for "why" the model thought
this static was a cat).

~~~
philipodonnell
I have heard of approaches that train a simpler ML model to predict why a
complex ML model made a decision -- decision-tree models approximating neural
nets. PayPal or Square, I believe.
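
The idea (often called a surrogate or distillation model) can be sketched in a few lines. This is a toy illustration, not how PayPal or Square actually do it: the "black box" is a hypothetical stand-in function, and the simple model is a one-rule decision stump fit to the black box's predictions rather than to ground truth.

```python
import numpy as np

def black_box(X):
    """Hypothetical stand-in for an opaque complex model (e.g. a neural net).
    We can't see inside it, but we can query it on any input."""
    return (np.sin(3 * X[:, 0]) + X[:, 1] > 0.5).astype(int)

def fit_stump(X, y):
    """Fit the simplest interpretable surrogate: a single threshold on a
    single feature, chosen to best mimic the labels it's given."""
    best = (0, 0.0, 0.0)  # (feature index, threshold, agreement)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            agree = ((X[:, j] > t).astype(int) == y).mean()
            if agree > best[2]:
                best = (j, t, agree)
    return best

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))
# Key step: the surrogate is trained on the black box's *predictions*,
# not on the original labels, so it learns to mimic the complex model.
feature, threshold, agreement = fit_stump(X, black_box(X))
print(f"surrogate rule: feature {feature} > {threshold:.2f} "
      f"(agrees with black box on {agreement:.0%} of inputs)")
```

The resulting rule is only an approximation: wherever the surrogate disagrees with the black box, the "explanation" it offers is wrong.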

~~~
vinn124
How does this solve anything? If a simple decision tree could predict the
outputs of more complex deep nets, why not use the decision tree in the first
place? Also, what do you do when a decision tree isn't powerful enough, as in
many interesting problems such as speech, computer vision, etc.?

~~~
philipodonnell
The use cases I heard described were around fraud detection systems, where you
need to provide a reason for the flag.

I don't have to know the complexities of how all the factors are interacting
within a neural net to be able to tell someone that if they increased their
credit score by 100 points, they'd be approved.

------
stochastic_monk
No. This is an absurd victim roleplay. People's right to control what happens
to their information is not going to make machine learning illegal. It'll just
give users a little more privacy and make selling their data a little less
lucrative.

Besides, there's been plenty of work on improving interpretability. Stop
pushing the myth that we don't understand anything about neural networks and
that they're the biggest, blackest box since Schwarzschild.

~~~
gnud
GDPR might make certain uses of machine learning more difficult.

A hypothetical insurance company that uses ML to decide who gets offered which
rate might run into trouble when a customer asks for "meaningful information"
about the logic behind the decision.

But the same insurance company using ML to flag possible fraud cases, and then
having a human review them, seems not to pose a problem. All decisions are
made by humans; the machines just show where a decision is needed.

Of course, IANAL.

~~~
disgruntledphd2
Insurance companies are already prevented by law from using methods which
cannot be explained, so they will be fine.

I actually think that most ML will be fine, as the outcomes are trivial. The
particular set of results that Google shows you on Google.com is not a
decision about you, and I can't see those needing to be interpreted.

Anything more consequential (i.e. job hiring) probably will, but also has laws
that prevent the use of indiscriminate data already.

So, I don't see this as an issue.

~~~
maksimum
> The particular set of results that Google shows you on Google.com are not a
> decision about you

I think that's easily demonstrated to be untrue: when I search for something
like `luigi` in a regular window, I get the GitHub link first. When I search
for it in an anonymous window, the GitHub link is third. Try it for yourself
with a technical term that also has a common non-technical meaning.

I believe that search personalization is useful, but it may be a privacy
issue.

~~~
disgruntledphd2
See I get you, but I disagree. My example is python, whereby biologists and
such-like people get snakes, people hanging out here get the programming
language and so on.

I personally believe (with no evidence) that the number of potential SERPs is
much lower than the number of users, as I reckon they use some kind of
dimension reduction technique and pick vectors of results "close" to yours.

But I completely accept that I could be entirely wrong on this (I still think
such a decision isn't consequential enough to trigger any GDPR provisions, but
again, I could be wrong on that too :) )

------
singron
Regardless of what the GDPR actually says, and even if ML is a complete black
box, I would be very happy at least knowing my input features to ML
algorithms.

e.g. If I'm denied a loan, I want to learn the various "facts" about me that
led to the ML coming up with the denial. Then I can fact-check it (e.g.
"actually I don't already have $X of debt") and come up with an appeal.
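
A minimal sketch of what that could look like: bundle the decision with the exact input "facts" it was based on, so the subject can dispute each one. The model and field names here are hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DecisionRecord:
    """A decision plus the exact input features it was based on, so the
    data subject can fact-check them and appeal."""
    outcome: str
    features: dict
    made_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

def decide_loan(features):
    # Hypothetical stand-in for the real (possibly black-box) model.
    outcome = "denied" if features["existing_debt"] > 20_000 else "approved"
    # Snapshot the inputs at decision time; these are what the
    # applicant gets to see and dispute.
    return DecisionRecord(outcome=outcome, features=dict(features))

record = decide_loan({"existing_debt": 25_000, "income": 50_000})
print(record.outcome, record.features)
# The applicant can now say "actually I don't have that debt" and appeal.
```

Even with a fully opaque model, recording the inputs like this costs nothing and makes the fact-checking appeal possible.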

~~~
maxxxxx
Agreed. The more automated algorithms with multiple inputs we use, the more of
this faceless bureaucracy we build up.

I know someone who had to deal with identity theft, and it's a sheer nightmare
to figure out why you get denied a rental car or airplane boarding. The
customer-facing staff know that the person is blocked, but they don't know why
and have no way to find out.

------
hn_throwaway_99
What's that saying about "the answer to any headline that is posed as a
yes-or-no question is 'no'"?

~~~
mitchty
Betteridge's law of headlines I believe.

[https://en.wikipedia.org/wiki/Betteridge%27s_law_of_headlines](https://en.wikipedia.org/wiki/Betteridge%27s_law_of_headlines)

------
tzs
In the case of things like being rejected by a machine learning algorithm for
a mortgage, it should be pretty easy to automate coming up with an explanation
for the applicant.

Run the application again with $5k higher income. Keep raising income by $5k
until you find an income level that results in approval. If that income level
is reasonable for people in general who were approved for similar mortgage
amounts and whose other inputs are in the same ballpark as the rejected
application, tell the applicant the rejection was for income too low.

You can do a similar thing with other parameters, such as credit rating,
length of employment, and whatever else is input to your algorithm that is
under the applicant's control. In some cases you might not be able to find a
good single parameter tweak for approval and will have to resort to
combinations. There will usually be many different ways then to tweak for
approval. Use comparisons to similar applications that were approved to pick a
reasonable combination of tweaks to base the rejection explanation on.
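
As a sketch, the income-raising search described above might look like this. The `approve` function is a hypothetical stand-in for the real model; the point is only the re-run-with-perturbed-input loop.

```python
def approve(income, debt, credit_score):
    """Hypothetical stand-in for the lender's opaque approval model."""
    return income > 4 * debt and credit_score >= 650

def explain_rejection(applicant, step=5_000, max_income=500_000):
    """Re-run the model with progressively higher income to find the
    smallest increase that flips the decision."""
    income = applicant["income"]
    while income <= max_income:
        if approve(income, applicant["debt"], applicant["credit_score"]):
            return (f"approved at income {income}: "
                    "the rejection was for income too low")
        income += step
    return "income alone does not explain the rejection"

applicant = {"income": 40_000, "debt": 15_000, "credit_score": 700}
print(explain_rejection(applicant))  # finds approval at income 65000
```

The same loop generalizes to credit score, length of employment, and so on. The harder part -- checking the counterfactual against similar approved applications so the explanation isn't absurd -- is omitted here.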

~~~
paulddraper
The only reason I could see this not working is if your explanation had to be
a "good" one.

E.g. if you lived 10 miles to the west, had the same income, had a mortgage
before 2005, and didn't apply for credit between Sept 2010 and June 2011, then
you would have been accepted. (There are other situations that would have
resulted in acceptance; this is just one.)

------
emilfihlman
I find it extremely concerning that people, especially on this platform, don't
care about what the law says and choose to just hand wave it as "it doesn't
apply to us / they don't enforce it like that".

The letter of the law matters even more than its spirit, because it allows
governments, corporations, and other entities to simply say "this is what the
law says"; being small, you don't have the money or power to challenge that
reading even when the spirit of the law disagrees with them. Not to mention
that laws are _laws_: they are not really meant to be interpreted (having to
do so is a byproduct of needing a functioning society).

It's an extremely slippery slope to not care what the law says. If we don't,
we'll have some dystopian laws that allow for anything quite soon.

------
polskibus
I have a related question: can an EU citizen under the GDPR demand that a
company "unlearns" its machine learning models from his/her data? I'm not
asking whether this is technically feasible (you could do it by removing the
old model, removing the user's data from the training set, and then rerunning
training). I'm asking whether the GDPR will give citizens the power to demand
that.

~~~
amarkov
GDPR only requires erasure of personal data on request, where "personal data"
is defined as information relating to some identified or identifiable person.
Machine learning models trained on personal data aren't going to be personal
data themselves.

~~~
yxhuvud
Well, unless they can be used to identify the person.

~~~
emilfihlman
But we are (this reminds me of Black Mirror...) entitled to having knowledge
and memories. Corporate entities will start to claim that you cannot ask them
to remove their "memories" (for example, identifying people in images).

------
JepZ
Does nobody else have a problem with the mix-up of 'Machine Learning' and
'Deep Learning' here?

Last time I heard those terms, 'Machine Learning' was a much broader category
than 'Deep Learning'. It's like someone saying 'Next year's Porsches will be
illegal' and the next person shouting 'Next year's cars will be illegal'?!?

------
akshatpradhan
It's in your best interest to reduce your risk by putting the data you collect
through a de-identification process:

>De-identification is adopted as one of the main approaches of data privacy
protection. It is commonly used in fields of communications, multimedia,
biometrics, big data, cloud computing, data mining, internet, social networks
and audio–video surveillance.

[https://en.wikipedia.org/wiki/De-identification](https://en.wikipedia.org/wiki/De-identification)
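
As a minimal sketch of what such a process can involve (dropping direct identifiers and pseudonymizing the join key; the field names are hypothetical):

```python
import hashlib

DIRECT_IDENTIFIERS = {"name", "email", "phone"}  # dropped outright

def deidentify(record, salt):
    """Drop direct identifiers and replace the user ID with a salted hash,
    so records can still be linked to each other, but not back to a person
    without the salt."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    token = hashlib.sha256((salt + str(record["user_id"])).encode()).hexdigest()
    out["user_id"] = token[:16]
    return out

record = {"user_id": 42, "name": "Alice", "email": "a@example.com",
          "purchases": 3}
print(deidentify(record, salt="rotate-me-regularly"))
```

Note that under the GDPR this alone is pseudonymization, not anonymization (pseudonymized data is still personal data), so serious risk reduction also needs techniques like aggregation or k-anonymity.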

------
petercooper
Rather than explaining how the machine learning _system_ works as an
"explanation", perhaps auditing could be added to machine learning algorithms
so that you very much _could_ produce a very long but accurate description of
the _process_, a la pages full of "X was compared to Y, X was larger, and
thus we move on to step 261". A bit like disassembling machine code.

Of course, in machine learning, the datasets backing up the comparisons could
not be shared as they contain variations of confidential and personal data,
but you might still end up with a legally tolerable record of the algorithmic
steps involved in the decision making process even if they're not useful to
see.
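
A sketch of what such instrumentation might look like: a logger threaded through a (hypothetical) scoring routine, recording each elementary comparison as it happens.

```python
class AuditLog:
    """Records every elementary comparison, producing a long but literal
    trace of the decision process, a bit like a disassembly listing."""
    def __init__(self):
        self.steps = []

    def compare(self, name_a, a, name_b, b):
        larger = name_a if a > b else name_b
        self.steps.append(
            f"step {len(self.steps) + 1}: compared {name_a} to {name_b}; "
            f"{larger} was larger")
        return a > b

def score(applicant, log):
    """Hypothetical scoring routine instrumented with the audit log."""
    points = 0
    if log.compare("income", applicant["income"], "income threshold", 30_000):
        points += 1
    if log.compare("debt", applicant["debt"], "debt limit", 10_000):
        points -= 1
    return points

log = AuditLog()
score({"income": 45_000, "debt": 5_000}, log)
print("\n".join(log.steps))  # the warehoused "explanation"
```

Only names and orderings end up in the trace; the confidential data behind each comparison stays out of it, as suggested above.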

~~~
piqalq
I see what you're getting at, but this would be a very lengthy and arduous
process. Not to mention, many ML and DL algorithms are so mathematically
complex that describing them in literal step-by-step detail sounds like hell.

~~~
petercooper
It would be if you had to be involved, but I'm suggesting algorithms could
have some sort of instrumentation so such "explanations" could be
automatically generated and thrown into a data warehouse for possible future
use. (This is all a cynical attempt to meet legal requirements rather than
anything actually useful for the user, of course.)

------
gaius
Serious question: is there a web based application of ML that isn’t about
exploiting or deceiving customers/users?

ML has its legit uses, sure, but in practice it always seems to be used to
outwit people.

~~~
amarkov
Well, I mean, what counts as exploitation? If I learn a lot about what shoes
people like in my job as a shoe salesman, and I use that experience when new
people come in to make recommendations, I think everyone would call that just
good customer service. But when Amazon sets up an ML model to do the same
thing, many people argue they're exploiting your personal data for the sake of
profit.

Off the top of my head, I can't think of a widespread web-based ML application
that some people don't consider exploitative. But is that a property of the ML
or the people?

~~~
gaius
I’d argue that “people who bought X also bought Y” is too simplistic to be
considered ML. But say, charging a little extra for X to people who previously
bought Y would be exploitative.

------
reificator
As per tradition: No.

[https://news.ycombinator.com/item?id=9232419](https://news.ycombinator.com/item?id=9232419)

------
nmca
For those interested in the cutting edge of interpretability, check out
[https://github.com/marcotcr/anchor](https://github.com/marcotcr/anchor). I'm
pretty sure this sort of approach will be how most businesses handle the
problem.

