The relevant article (https://gdpr-info.eu/art-22-gdpr/) clearly states that automated decision-making is allowed under the GDPR; it just gives the data subject the right to demand a manual assessment of that decision. In practice, this means you're absolutely free to use any kind of automated algorithm (AI-based or not) to make decisions about individuals, but if people complain and demand an explanation, you need to review the decision manually.
Addendum: if you want to learn more about the motivations behind Article 22, as well as Articles 13 & 14 (which concern the transparency requirements), I highly recommend the official document of the Article 29 Working Party:
I don't think it's alarmist to worry about the GDPR. It's an extremely far-reaching law covering types of technology that are changing very rapidly. Our default reaction should be skepticism, because the default outcome of that combination is disaster.
If you create an algorithm that by pure coincidence happens to discriminate against a protected class then any judge will take a dim view of you claiming to be unable to explain it.
I’d like to encourage everyone, even if just for fun, to insist on an explanation for every algorithmic decision they may be subjected to.
So are you happy with a black box that just says "computer says no" without any explanation or way to review this decision?
"Sorry, your insurance premium is now 300EUR higher than next year, computer says so"
The entire debate is, as the name suggests, about regulating how personal information can be used in an economic context. Unsurprisingly, some use cases that previously existed in a legal loophole are now illegal.
To simplify, framing the question this way is similar to asking whether gun-ownership rights are infringed by outlawing shooting people.
For the things computers can do, they are typically way faster and way more reliable, but I don’t see how there could be anything software can do that humans can’t (quantum computing may change that).
Also, being able to explain what the computer did doesn’t imply being able to do it.
As a (silly) example, “We hashed the c.v. you sent us, getting a 150-digit number N. The N-th digit in the hexadecimal expansion of pi isn’t a zero, so you didn’t get invited for an interview” would be a perfectly good explanation of your hiring practices, even if you don’t know how to compute any of those digits.
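A minimal sketch of that silly rule (Python, all names hypothetical), just to show how a decision procedure can be trivial to state yet hopeless to evaluate: even the BBP formula, which yields hex digits of pi directly, needs on the order of N operations, and here N is around 10^150.

    import hashlib

    def hiring_decision(cv_bytes: bytes) -> str:
        # A SHA-512 digest is 128 hex digits, i.e. an integer N of
        # roughly 150 decimal digits.
        n = int(hashlib.sha512(cv_bytes).hexdigest(), 16)
        # Stated rule: invite iff the N-th hex digit of pi is zero.
        # The rule is precise; actually computing that digit is infeasible.
        return "rejected unless hex digit %d of pi is zero" % n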
There are some scenarios where that isn't possible, but... if nobody knows why decisions impacting people's lives are made, and it isn't possible to determine what they were based on, is that something we really want?
We do this every day with human cognition and nobody complains.
For example, for a classification task we usually train an algorithm on labeled data (e.g. good debtor vs. bad debtor). Once trained, we apply the algorithm to ALL future decisions on whom to give credit. This is what makes algorithms so dangerous compared to humans: if 10 out of 1000 bankers are racist and discriminate against certain groups of people, we still have 990 who don't. If our algorithm is trained on data that contains racism, it will discriminate in every single case where it has the data to do so.
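A minimal sketch of that failure mode (file and column names hypothetical): one model, trained once on labels that encode past human judgments, then applied uniformly to every future applicant.

    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    history = pd.read_csv("past_loans.csv")      # hypothetical historical data
    X = history[["income", "debt", "postcode"]]  # hypothetical numeric features
    y = history["was_good_debtor"]               # labels encode past human decisions

    model = LogisticRegression().fit(X, y)

    def approve(applicant: pd.DataFrame) -> bool:
        # applicant: single-row frame with the same columns as X.
        # Every future applicant is scored by this one model, so any bias
        # baked into the labels recurs in 1000 out of 1000 decisions.
        return bool(model.predict(applicant)[0])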
And not only do we do this, we have to.
Generally speaking, individuals do not make decisions. They may influence decisions or find novel ways to apply rules, but anything organizationally important is written down somewhere.
Typically for legal purposes.
It's unlikely that a human would be able to produce the exact same ordering, for any possible search term, across all the pages and pages of results that Google shows.
Did I misunderstand?
> The controller should find simple ways to tell the data subject about the rationale behind, or the criteria relied on in reaching the decision without necessarily always attempting a complex explanation of the algorithms used or disclosure of the full algorithm. The information provided should, however, be meaningful to the data subject.
There's also a detailed example in the document that should make it clearer what kind of explanation is required (and what is not required). Again, personally I understand that it's not clear what exactly will be required here, but titling an article "Will GDPR Make Machine Learning Illegal?" is just an attempt to garner attention by instilling (unfounded) fear.
>>> The information provided should, however, be meaningful to the data subject.
Although non-binding, this makes the intent very clear. AI: "You have been refused insurance." Me: "Why?" Insurer: "Because our AI reached that conclusion based on these data: X, Y, Z." Looks perfectly fine to me, because with enough explanations like this we can form an opinion about how the AI is working, which in turn will allow the power between me and the insurer to be balanced some more. That looks good and balanced to me (note that this argument doesn't consider the cost of implementing the GDPR, just the way the interests of the parties are better balanced).
"Articles 13-15 provide rights to 'meaningful information about the logic involved' in automated decisions."
Your scenario doesn't explain the logic. Saying "that's the AI's choice and we're going with it because it's 99.9% accurate" isn't the logic involved in the decision.
You need an interpretable model to ensure that the AI isn't discriminating based on a protected class (race/gender/etc). "You were denied a loan because the AI determined that you're Polish, and we don't like Polish people" is partly what this law wants to prevent.
Forcing models to be explainable makes sure that we aren't illegally discriminating, so we need to be able to tell why the AI made its choice, not just what the choice was.
In my job, we decide whether or not to grant people assistance. We could use some kind of AI to give, for example, a "pre-decision". That AI would be trained on our current data, but in the end it would interpret the profile of the person. So basically, it'd say "based on the profile of X, we've decided that ...". Now if nationality, for example, were in the list of data in the profile, I'm 100% sure we'd have a lawyer at our door (rightfully).
The need for the explanation is that an AI can learn to discriminate against protected classes even if they aren't explicitly part of the dataset. You might not have included race in the inputs, but you did include the applicant's name, and it figures out that people named "Jakub" should be declined for a loan. The AI can't say that it's because they are Polish, but it learned to discriminate against Polish-sounding names because of all the racism in the training data. We could uncover that if the AI was able to explain that it denied the loan mostly because of the name, and that the other pieces Y and Z did not factor into the decision as heavily. Just saying X, Y, and Z doesn't help us figure out which of those pieces were the important parts for denying the loan.
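One way to surface such a proxy, sketched assuming a fitted scikit-learn model and hypothetical held-out data X_test / y_test: permutation importance shuffles one feature at a time and measures how much the score drops, i.e. how heavily the model leans on it.

    from sklearn.inspection import permutation_importance

    result = permutation_importance(model, X_test, y_test,
                                    n_repeats=30, random_state=0)
    for feature, score in zip(X_test.columns, result.importances_mean):
        print(f"{feature}: {score:.3f}")
    # If a name-derived feature dwarfs income and credit history here,
    # the model is probably keying on ethnicity by proxy.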
It may make it illegal, via the aggregate "right to explanation", to make decisions about people solely on the basis of machine learning models. That is, "we fed some data into a computer and it said 'no', so too bad" is not going to be a valid explanation.
That sounds like a fucking amazing outcome to me.
It's definitely a whole lot better than not having that law, but we're not quite there yet.
(Alternatively is your argument "but companies might blatantly and intentionally break the law"?)
Well, I mean in aggregate yes, but that's just because there are so damn many of them; that's no different from saying "people murder each other all the time". Presumably you mean normalized to the number of companies/people involved/something.
Also yes, if you mean "misinterpret civil law in their favor", or just "not quite comply with some regulation, in the regulator's opinion".
But not in a case like this, where the law appears fairly unambiguous and is criminal law you could go to jail for. When that happens, it's newsworthy.
Besides which, even if that were true, the right response isn't to create more laws on top of this GDPR thing; it's to create it, enforce it, and prosecute fraud. Once you send a few hundred people to jail for fraud, I imagine the rest would fall in line.
That doesn't sound exactly legal, as the actual reason had to exist at the time of the decision, and an after-the-fact reason would not be the truth.
Don't the ML models log why they made decisions in the first place?
If the outcome of an algorithm/model can't be explained by its regulators, then it shouldn't be making any decisions of any significance or importance to human lives.
When engineers fail to address cracking in a bridge and people die when it falls, humans are held to account and made to explain why that decision was made. If machine learning statistical models can't do the same, they'd better not be used to scan any bridges for structural cracking where I live and vote.
I don't have to know the complexities of how all the factors are interacting within a neural net to be able to tell someone that if they increased their credit score by 100 points, they'd be approved.
That's almost certainly never going to happen. Our most accurate models are unintelligible, and our most intelligible models are inaccurate. There's a trade-off here and without some sort of magic I don't see us transcending that trade-off.
Going further, how in the heck do they justify using something like this on credit applications or anything else that can adversely affect their fellow humans? The ethics of such a lack of knowledge should give pause and frankly I hope they get sued if this is the truth of the matter. That is monstrous.
They understand that by evaluating the model on a "test dataset" that is, hopefully, representative of the real world. This doesn't allow you to explain why a given model makes the decisions it makes; it only allows you to understand how well it performs on the data you feed it.
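A minimal sketch of what that kind of evaluation looks like (assuming hypothetical arrays X, y and an unfitted scikit-learn model): the held-out score says how often the model is right, nothing about why any individual prediction came out the way it did.

    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)
    model.fit(X_train, y_train)
    # One aggregate number; zero insight into individual decisions.
    print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))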
Using an inaccurate or biased model to assign credit ratings is indeed immoral. This is orthogonal to interpretability: you can have a model that is both inaccurate and interpretable (i.e. it's wrong and you can say why it's wrong); you can also have a model that is highly accurate but not interpretable (i.e. it works, but you cannot explain why).
It's sometimes possible to extract a sort of conceptual understanding, e.g. you might say that this layer performs edge detection or whatever. But that's not much of an explanation.
They do it because it works. You trust your Uber driver to get you to your destination even though you have no idea how his neurons do it. People are going to trust machine intelligence in the same way.
Given the xenophobic history of the human race, I have some severe doubts about that. Frankly, the first data scientist who ends up in court because the automated car decided to kill some kid on the sidewalk to keep the death count down is going to find out real quick what a jury thinks of machine intelligence.
Thinking about it, I can see the black box in a car having to record the last minute of instructions run by the car's CPU. The whole idea of "unknowable" isn't exactly going to sit well with the NTSB. I can truly see this happening if data scientists testify in a Congressional hearing that they don't understand how their creation came to its decision.
I guess I am in awe of folks who do not have tools to figure out what model all the learning has built. Where is the DTrace for ML? If your neural network has 50 million parameters, then how the heck did the data scientist have a dataset to teach it from that is in any way complete enough to trust it?
I just can't see how it can ethically be unleashed upon people in a final go/no-go decision affecting their lives or livelihoods. After reading this thread, I now hope people who are rejected for loans ask for an actual explanation and the exact manner in which the decision to reject them was reached.
Also, accuracy is hard to talk about if you don't have the whole distribution you are estimating from, which generally we don't... Many highly accurate models (in the sense of working on the test set and under n-fold cross-validation) underperform in production.
Besides, there’s been plenty of work on improving interpretability. Stop pushing the myth that we don’t understand anything about neural networks and that they’re the biggest, blackest box since Schwarzschild.
A hypothetical insurance company that uses ML to decide who gets offered which rate, might run into trouble when a customer asked for "meaningful information" about the logic behind the decision.
But for the same insurance company, using ML to flag possible fraud cases, and then having a human review them, seems not to pose a problem. All decisions were made by humans, the machines just showed where a decision was needed.
Of course, IANAL.
I actually think that most ML will be fine, as the outcomes are trivial. The particular set of results that Google shows you on Google.com are not a decision about you, and I can't see those needing to be interpreted.
Anything more consequential (e.g. job hiring) probably will, but also already has laws that prevent the indiscriminate use of data.
So, I don't see this an issue.
I think that's easily demonstrated to be untrue: when I search for something like `luigi` in a regular window, I get the GitHub link 1st. When I search for it in an anonymous window, the GitHub link is 3rd. Try it yourself with a technical term that also has a common non-technical meaning.
I believe that search personalization is useful, but it may be a privacy issue.
I personally believe (with no evidence) that the number of potential SERPs is much lower than the number of users, as I reckon they use some kind of dimension reduction technique and pick vectors of results "close" to yours.
But I completely accept that I could be entirely wrong on this (I still think such a decision isn't consequential enough to trigger any GDPR provisions, but again I could be wrong on that too :) )
e.g. If I'm denied a loan, I want to learn the various "facts" about me that led to the ML coming up with the denial. Then I can fact-check it (e.g. "actually I don't already have $X of debt") and come up with an appeal.
I know someone who had to deal with identity theft and it's a sheer nightmare to figure out why you get denied a rental car or airplane boarding. The people facing the customer know that the person is blocked but don't know why and have no way to figure out why.
Run the application again with $5k higher income. Keep raising income by $5k until you find an income level that results in approval. If that income level is reasonable for people in general who were approved for similar mortgage amounts and whose other inputs are in the same ballpark as the rejected application, tell the applicant the rejection was for income too low.
You can do a similar thing with other parameters, such as credit rating, length of employment, and whatever else is input to your algorithm that is under the applicant's control. In some cases you might not be able to find a good single parameter tweak for approval and will have to resort to combinations. There will usually be many different ways then to tweak for approval. Use comparisons to similar applications that were approved to pick a reasonable combination of tweaks to base the rejection explanation on.
E.g. if you lived 10 miles to the west, had the same income, had a mortgage before 2005, and didn't apply for credit between Sept 2010 and June 2011, then you would have been accepted. (There are other situations that would have resulted in acceptance; this is just one.)
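A hedged sketch of that probing procedure; predict_approval stands in for the lender's black-box model, and all names are hypothetical.

    from copy import deepcopy

    def find_income_threshold(application: dict, predict_approval,
                              step=5_000, max_income=1_000_000):
        # Raise income in $5k steps until the model approves; return the
        # first approving income level, or None if none was found.
        probe = deepcopy(application)
        while probe["income"] <= max_income:
            if predict_approval(probe):
                return probe["income"]
            probe["income"] += step
        return None

The same loop works for credit rating, length of employment, and so on; if no single tweak flips the decision, probe combinations and base the explanation on whichever reasonable combination matches similar approved applications.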
The letter of the law is even more important than the spirit of it, because it allows governments, corporations, and other entities to simply say "this is what the law says", and because you are small you don't have the money/power to challenge that reading even if the spirit of the law disagrees with them. Not to mention that laws are _laws_; they are not really meant to be interpreted (interpretation is a byproduct of needing a functioning society).
It's an extremely slippery slope to not care what the law says. If we don't, we'll have some dystopian laws that allow for anything quite soon.
Last time I heard those terms, 'Machine Learning' was a much broader category than 'Deep Learning'. It's as if someone says 'Next year's Porsches will be illegal' and the next person calls it 'Next year's cars will be illegal'?!?
>De-identification is adopted as one of the main approaches of data privacy protection. It is commonly used in fields of communications, multimedia, biometrics, big data, cloud computing, data mining, internet, social networks and audio–video surveillance.
Of course, in machine learning, the datasets backing the comparisons could not be shared, as they contain all sorts of confidential and personal data, but you might still end up with a legally tolerable record of the algorithmic steps involved in the decision-making process, even if they're not useful to see.
ML has its legit uses, sure, but in practice it always seems to be used to outwit people.
People are (rightfully) upset by the amount of spam and harassment happening on platforms like Twitter. Machine learning is a good way to tackle this problem at scale, in an automated way, by training models to spot and auto-moderate bad posts.
Under GDPR, anyone can demand an explanation for why they were marked as spam / abusive. This will allow trolls and spammers to improve their workarounds, by demanding an explanation for why their posts were blocked, then tailoring their later posts to avoid these signifiers. It will also let them impose a tax on platforms that try to block them by flooding them with demands for explanations under GDPR.
Off the top of my head, I can't think of a widespread web-based ML application that some people don't consider exploitative. But is that a property of the ML or the people?