Will GDPR Make Machine Learning Illegal? (kdnuggets.com)
111 points by Vaslo on March 18, 2018 | 95 comments

Personally, I think many people and organizations currently exaggerate the possible changes and risks (for companies) brought on by the GDPR, often with the goal to instill fear (and generate business) or get press coverage.

The relevant article (https://gdpr-info.eu/art-22-gdpr/) clearly states that automated decision making is allowed under the GDPR, it just gives the data subject the right to demand a manual assessment of that decision. In practice, this means that you're absolutely free to use any kind of automated algorithm (AI-based or not) to make decisions on individuals, but if people complain and demand an explanation you need to review the decision manually.

Addendum: If you want to learn more about the motivations behind article 22 as well as articles 13 & 14 (that concern the transparency requirements), I highly recommend the official document of the Article 29 working party:


If you have to be able to review any decision manually that people complain about, that is significantly limiting. It means you can only write software to do things humans can do.

I don't think it's alarmist to worry about the GDPR. It's an extremely far-reaching law covering types of technology that are changing very rapidly. Our default reaction should be skepticism, because the default outcome of that combination is disaster.

> If you have to be able to review any decision manually that people complain about, that is significantly limiting. It means you can only write software to do things humans can do

If you create an algorithm that by pure coincidence happens to discriminate against a protected class then any judge will take a dim view of you claiming to be unable to explain it.

I’d like to encourage everyone, even if just for fun, to insist on an explanation for every algorithmic decision they may be subjected to

The HN ranking algorithm ranked this comment below others. I demand a human explanation.

What significant legal effects do you believe the HN ranking algorithm of that comment has concerning you directly?


Do you need significant legal effects (not quite sure what that means)?

Yes. Article 22[1] gives you the right to not be subject to a decision made solely by automated processing which produces legal effects concerning you or similarly significantly affects you.

[1]: https://gdpr-info.eu/art-22-gdpr/

Arguably, suppressing someone's voice on the internet can qualify.

Speaking in terms of engineering practice only, that genuinely seems like an entirely reasonable request—the mods might decline to answer for fear of informing voting rings how to be effective, but they should at least have an answer for themselves.

As you said, it's often not possible to explain exactly why an AI-based algorithm took a given decision (if it were, we wouldn't need such a complicated algorithm in the first place), but this isn't necessary either. What you need to explain to the data subject is the purpose of the automated decision making, the data you use for it and the basic logic structure of your algorithm. Again, I can only recommend reading the guidelines issued by the Article 29 working party, as they will guide the implementation and enforcement of the regulation in the member states. An overview can be found here (the relevant document is "wp251"):


> to review any decision manually that people complain about, that is significantly limiting. It means you can only write software to do things humans can do

So are you happy with a black box that just says "computer says no" without any explanation or way to review this decision?

"Sorry, your insurance premium is now 300EUR higher than last year, computer says so"

A better example: An insurance claim might use a sophisticated anti-fraud algorithm to reject claims, but they actually have to be rejecting claims for reasons of fraud, and not because "computer says so".

I've asked auto insurance how they arrived at that number, they always give me some vague generic answer.

You forget to mention that this isn't a regulation covering a technology. It's about how a specific technology is utilised in a very specific use case.

The entire debate is, like the name suggests, about the regulations of how personal information can be used in an economic context. Unsurprisingly, some of these use cases are illegal now and existed in a loophole before.

To simplify, framing a question this way is similar to asking if rights of gun ownership are infringed by outlawing shooting people.

”It means you can only write software to do things humans can do.”

For the things computers can do, they typically are way faster and way more reliable, but I don’t see how there could be anything software can do that humans can’t (quantum computing may change that)

Also, being able to explain what the computer did doesn’t imply being able to do it.

As a (silly) example, “We hashed the c.v. you sent us, getting a 150-digit number N. The N-th digit in the hexadecimal expansion of pi isn’t a zero, so you didn’t get invited for an interview” would be a perfectly good explanation of your hiring practices, even if you don’t know how to compute any of those digits.

I don't think that's quite correct, many (most?) kinds of AI achieve things which are difficult or impossible for humans to do, but that doesn't mean you can't figure out why a decision was made afterwards.

There are some scenarios where that isn't possible, but... if nobody knows why decisions impacting people's lives are made and it isn't possible to determine what they were done for, is that something we really want?

>if nobody knows why decisions impacting people's lives are made and it isn't possible to determine what they were done for, is that something we really want?

We do this every day with human cognition and nobody complains.

It is not exactly true that nobody complains. There are a lot of discrimination complaints to go around. But since we do not have the means to open the brain and check what was the reason for this or that decision, deducing motivations is hard (unless you are commenting on the internet, where everybody knows everybody else's motivations and is not afraid to explain them voluminously to each other). It would also be hard with algorithms, so basically the approach seems to be "reducing the problem to one we already know" - i.e. switching algorithmic inscrutability for human inscrutability and hoping that humans would do better or at least won't be worse than what we have already been dealing with for millennia.

I think one of the reasons that we have a higher standard for algorithmic decision making is the fact that algorithms are used at scale:

For example, for a classification task we usually train an algorithm on labeled data (i.e. good debtor vs. bad debtor). When trained, we apply the algorithms to ALL future decisions on whom to give a credit. This makes algorithms so dangerous as compared to humans, because if 10 out of 1000 bankers are racist and discriminate against certain groups of people we still have 990 that don't. If our algorithm is trained on data that contains racism it will discriminate in every single case if it has the data to do so.

It's too bad your comment is buried so far down in the conversation, because this is probably the most insightful point I've seen on the internet today.

And not only do we do this, we have to.

I don't think that's an accurate analogy, at least at the enterprise level.

Generally speaking, individuals do not make decisions. They may influence decisions or find novel ways to apply rules, but anything organizationally important is written down somewhere.

Typically for legal purposes.

Yes we do this everyday, but there is lots of complaining.

You can develop whatever you want, just don't subject people to arbitrary binding decisions without appeal. The default attitude is one of skepticism, skepticism towards technology that could potentially affect human lives in very nasty ways. If you want your technology adopted instead of banned, it might be a good idea to listen to the people who will ultimately have to deal with the consequences of such technology. Otherwise, why should society adopt a technology that they cannot understand, that possibly no one understands or can explain or will explain? How can society trust you as the creator of such technology when it's a black box?

Your basic premise, that society as a whole ought to approve of new technologies before anyone is allowed to use them, seems very strange.

This isn't a case of ought or ought not to, it's just what happens. Society does indeed either approve or disapprove of certain technologies, not always before they are used. Say you develop ai to replace judges. Don't you think society needs to approve this before we replace human judges? What if someone trained the ai to give all non white people sentences 20% longer than white people? Society will, hopefully, balk at this idea and such a system will not be implemented or scrapped if it has been. If, on the other hand, they don't know the system is biased because its decisions are hidden from the public, it'll be a system that society might accept on false pretenses, leading to mass suffering. For technology that could potentially be used to oppress people, I think people have the right to know how it works and to ask for a human to verify or carry the actual task out. Otherwise it wouldn't be surprising that they would want to ban or severely restrict something so potentially harmful.

How so? Organizations such as the FDA do the exact same thing.

There is nothing in it about approval, only adoption.

I’m not seeing a problem. There should never be a situation where a human cannot review the decision and reverse it.

> Why is x 73rd on Google search ranking?

It's unlikely that a human would be able to produce the exact same ordering, for any possible search term for all the pages and pages of results that Google shows.

Since when are search results personal data?

They are affected by personal data. E.g. if you performed a search, and your friend next to you performed the same search and got different results there is probably no way Google could give a concise or exact explanation as to why your results were different.

I would guess personalized search results went mainstream in 2005 or so, as Google became popular, but perhaps some of the portals Google replaced (Ask Jeeves and Yahoo) had personalized results before that

It's not just reversing; you should be able to explain how it arrived at that decision in the first place.

It sounds like they're suggesting that, assuming those rights exist in these situations (they claim there's ambiguity there), that there are big questions around what qualifies as an acceptable explanation.

Did I misunderstand?

Yes, but reading the official guidelines should clarify this ([1], p. 14 ff.):

> The controller should find simple ways to tell the data subject about the rationale behind, or the criteria relied on in reaching the decision without necessarily always attempting a complex explanation of the algorithms used or disclosure of the full algorithm. The information provided should, however, be meaningful to the data subject.

There's also a detailed example in the document that should make it clearer what kind of explanation is required (and what is not required). Again, personally I understand that it's not clear what exactly will be required here, but titling an article "Will GDPR Make Machine Learning Illegal?" is just an attempt to garner attention by instilling (unfounded) fear.

[1]: http://ec.europa.eu/newsroom/just/document.cfm?doc_id=47963

This :

>>> The information provided should, however, be meaningful to the data subject.

Although non-binding, this makes the intent very clear. "AI: you have been refused insurance. Me : Why ? Insurer : Because our AI has reached that conclusion based on these data : X,Y,Z". Looks perfectly fine to me. Because with enough explanations like this, we can form an opinion about how the AI is working, which in turn will allow us to balance the powers between me and the insurer some more. That looks good and balanced to me (notice that this argument doesn't consider the cost of implementation of GDPR, just the way the interests of the parties are better balanced)

Expand that a bit:

"Articles 13-15 provide rights to 'meaningful information about the logic involved' in automated decisions."

Your scenario doesn't explain the logic. Saying "that's the AI's choice and we're going with it because it's 99.9% accurate" isn't the logic involved in the decision.

You need an interpretable model to ensure that the AI isn't discriminating based on a protected class (race/gender/etc). "You were denied a loan because the AI determined that you're Polish, and we don't like Polish people" is partly what this law wants to prevent.

Forcing models to be explainable makes sure that we aren't illegally discriminating, so we need to make sure that we can tell why the AI made its choice, not just what the choice was.

100% agree, that's why I wrote "that conclusion based on these data : X,Y,Z". The important word is "data". I said data because with AI, the decision process may be quite a black box. The only thing you know is what data you put in. So to me, input data is part of the answer.

In my job, we grant decisions to help people or not. We could use some kind of AI to give, for example, a "pre decision". That AI would be trained on our current data but, in the end, it would interpret the profile of the person. So basically, it'd say "based on the profile of X, we've decided that ...". Now if nationality, for example, was in the list of data in the profile, I'm 100% sure that we'd have a lawyer at our door (rightfully).

My point is that just saying "we have data X,Y,Z" for a person doesn't explain the logic. It allows you to check that the input data is correct, but you don't understand the decision from it. What you need is an explanation saying something like "X is too low, and we think that Y in the presence of Z is a significant risk factor."

The need for the explanation is because an AI can learn to discriminate against protected classes even if they aren't explicitly part of the dataset. You might not have included race in the inputs, but you did include their name, and it figures out that people named "Jakub" should be declined for a loan. The AI can't say that it's because they are Polish, but it learned to discriminate against Polish sounding names because of all the racism in the training data. We could uncover that if the AI was able to explain that it denied the loan mostly because of the name, and that the other pieces Y and Z did not factor into the decision as heavily. Just saying X Y and Z doesn't help us figure out which of those pieces are the important parts for denying a loan.
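To make the "Jakub" scenario concrete, here is a rough sketch of one crude way to see which input actually drove a rejection: swap each input for a neutral value, one at a time, and check whether the decision flips. Everything here is hypothetical (the `toy_loan_model`, its weights, the neutral values), and this is a simple occlusion-style probe, not SHAP, LIME, or anything a regulator has endorsed.

```python
from typing import Any, Callable

Applicant = dict[str, Any]


def toy_loan_model(applicant: Applicant) -> bool:
    """Hypothetical biased model: the first name outweighs everything else."""
    score = 0
    if applicant["income"] >= 50_000:
        score += 1
    if applicant["years_employed"] >= 2:
        score += 1
    if applicant["first_name"] == "Jakub":  # stands in for a learned proxy
        score -= 3
    return score >= 1


def decisive_features(
    model: Callable[[Applicant], bool],
    applicant: Applicant,
    neutral_values: Applicant,
) -> list[str]:
    """Return the features whose replacement by a neutral value flips the decision."""
    original = model(applicant)
    decisive = []
    for feature, neutral in neutral_values.items():
        probe = {**applicant, feature: neutral}  # change one input only
        if model(probe) != original:
            decisive.append(feature)
    return decisive


applicant = {"first_name": "Jakub", "income": 80_000, "years_employed": 5}
neutral = {"first_name": "Anna", "income": 60_000, "years_employed": 3}
print(decisive_features(toy_loan_model, applicant, neutral))  # ['first_name']
```

Listing X, Y and Z as inputs would not reveal this; the probe shows that only the name changes the outcome, which is exactly the kind of signal an auditor would want to see.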

That's only meaningful if they can assure in writing that they have not approved for anyone with x,y,z reasons.


It may make it illegal, via the aggregate "right to explanation", to make decisions about people solely on the basis of machine learning models. That is, "we fed some data into a computer and it said 'no', so too bad" is not going to be a valid explanation.

That sounds like a fucking amazing outcome to me.

But unfortunately it's very easy to find a way around it: Make all decisions based on ML models, and only when a customer manages to go through the bureaucratic, expensive process of requesting an explanation, have a human review the case and come up with a plausible reason.

It's definitely a whole lot better than not having that law, but we're not quite there yet.

You're right, additional properties are desirable. The ACM has recently published a [set of guidelines](https://www.acm.org/binaries/content/assets/public-policy/20...) for algorithmic decision making. "Access" and "Auditability" are properties that mean it shouldn't be difficult to get the explanation, and that it shouldn't be possible to 'bullshit' an explanation afterwards.

That sounds a lot like fraud to me... are you sure this would be at all legal?

(Alternatively is your argument "but companies might blatantly and intentionally break the law"?)

I can't speak for the OP, but don't companies blatantly and intentionally break the law all the time?


Well I mean in aggregate yes, but that's just because there are so damn many of them, that's not different from saying "people murder each other all the time". Presumably you mean normalized to the number of companies/people involved/something.

Also yes if you mean "misinterpret civil law in their favor", or just "not quite comply with some regulation - in the regulator's opinion".

But not in a case like this where it would appear to be fairly unambiguous and criminal law which you could go to jail for. When that happens it's newsworthy.

Besides which, even if that were true, the right response isn't to create more laws on top of this GDPR thing, it's to create it and then enforce it, along with the existing laws against fraud. Once you send a few hundred people to jail for fraud I imagine that the rest would fall in line.

> only when a customer manages to go through the bureaucratic, expensive process of requesting an explanation, have a human review the case and come up with a plausible reason.

That doesn't sound exactly legal as the actual reason had to exist at the time of the decision and an after the fact reason would not be the truth.

Don't the ML models log why they made decisions in the first place?

The log is we multiplied an 80x25x3 matrix of your planet's photograph by 30 40x40 matrices of other numbers and then fed it through a neural net which consisted of adding and multiplying a few hundred other numbers and we got this number, which when run through a tanh function, classified your planet as uninhabitable space garbage. That's why the Vogon constructor fleet arrived and cleared it to make room for an intergalactic highway.

The point of interpretability in ML is that the "reasons" that model arrived at a decision/classification/etc are incomprehensible (and, given e.g. the existence of adversarial examples in image classification, this should be expected -- no reasonable answer exists for "why" the model thought this static was a cat).

It's not magic. It amazes me how many "data scientists" can't explain the basic mathematics behind their modelling.

If the outcome of algorithm/model can't be explained by its regulators then it shouldn't be making any decisions of any significance or importance to human lives.

When engineers fail to address cracking in a bridge and people die when it falls, humans are held to account and explain why this decision was so. If machine learning statistical models can't do the same, they better not be used to scan any bridges for structural cracking where I live and vote.

I have heard approaches described that train a simpler ML algorithm to predict why a complex ML algorithm made a decision. Decision-tree models predicting neural net models. Paypal or Square I believe.

how does this solve anything? if a simple decision tree could predict the outputs of more complex deep nets, why not use the decision tree in the first place? also, what do you do when a decision tree isn't powerful enough, as in the case of many interesting problems such as speech, computer vision, etc.

The use cases for this were around fraud detection systems where you need to provide a reason for the flag.

I don't have to know the complexities of how all the factors are interacting within a neural net to be able to tell someone that if they increased their credit score by 100 points, they'd be approved.
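The surrogate idea mentioned a few comments up can be sketched in a few lines. This is a minimal illustration, not what Paypal or Square actually do: the `black_box` model is invented, and the surrogate is a single-threshold "stump" rather than a full decision tree. The point is that the simple model only has to agree with the complex one often enough to be useful as an explanation, and you can measure that agreement (fidelity).

```python
def black_box(applicant):
    """Hypothetical opaque model: a nonlinear mix of two inputs."""
    return applicant["credit_score"] ** 2 + 2 * applicant["income"] >= 450_000


def fit_stump(model, samples, feature):
    """Find threshold t such that `feature >= t` agrees with `model` most often.

    Returns (threshold, fidelity), where fidelity is the share of samples
    on which the stump and the model give the same answer.
    """
    labels = [model(s) for s in samples]

    def fidelity(t):
        agree = sum((s[feature] >= t) == label for s, label in zip(samples, labels))
        return agree / len(samples)

    candidates = sorted({s[feature] for s in samples})
    best = max(candidates, key=fidelity)
    return best, fidelity(best)


# Probe the black box on a grid of made-up applicants.
samples = [
    {"credit_score": score, "income": income}
    for score in range(300, 851, 50)
    for income in range(20_000, 120_001, 20_000)
]
threshold, fidelity = fit_stump(black_box, samples, "credit_score")
print(f"surrogate rule: approve if credit_score >= {threshold} "
      f"(agrees with the black box on {fidelity:.0%} of probes)")
```

The fidelity comes out below 100%, which is one answer to "why not use the decision tree in the first place": the surrogate is a lossy summary used to explain, while the complex model still makes the decision.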

Hold up, data scientists don't know why their ML is deciding things and companies are using it to decide important things in people's lives? Sounds like it should be banned until companies can actually explain the decisions and some data scientists need to take an ethics class.

>until companies can actually explain the decisions

That's almost certainly never going to happen. Our most accurate models are unintelligible, and our most intelligible models are inaccurate. There's a trade-off here and without some sort of magic I don't see us transcending that trade-off.

This is where I don't understand. If the data scientists don't know the why of the decision making and the software cannot explain itself then how do they understand that it is accurate?

Going further, how in the heck do they justify using something like this on credit applications or anything else that can adversely affect their fellow humans? The ethics of such a lack of knowledge should give pause and frankly I hope they get sued if this is the truth of the matter. That is monstrous.

> This is where I don't understand. If the data scientists don't know the why of the decision making and the software cannot explain itself then how do they understand that it is accurate?

They understand that by evaluating the model on a "test dataset" that is, hopefully, representative of the real world. This does not allow you to explain why a given model makes the decisions it makes - it only allows you to understand how well it performs on the data you feed it.
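As a rough sketch of what "evaluating on a test dataset" means in practice (the model and the four labelled examples are made up for illustration):

```python
def accuracy(model, labelled_examples):
    """Share of held-out examples where the model's output matches the label."""
    correct = sum(1 for features, label in labelled_examples if model(features) == label)
    return correct / len(labelled_examples)


def opaque_model(features):
    """Hypothetical stand-in for a trained model we cannot inspect."""
    return features["credit_score"] + features["income"] // 1_000 >= 700


# Held-out examples: (inputs, whether the person actually repaid).
held_out = [
    ({"credit_score": 720, "income": 30_000}, True),
    ({"credit_score": 640, "income": 90_000}, True),
    ({"credit_score": 580, "income": 40_000}, False),
    ({"credit_score": 690, "income": 20_000}, False),
]

print(accuracy(opaque_model, held_out))  # 0.75
```

Note that `accuracy` only ever compares outputs with labels. It tells you the model was right three times out of four, and nothing about why it got the fourth one wrong.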

Using an inaccurate or biased model to assign credit ratings is indeed immoral. This is orthogonal to interpretability: you can have a model that is both inaccurate and interpretable (i.e. it's wrong and you can say why it's wrong); you can also have a model that is highly accurate but not interpretable (i.e. it works, but you cannot explain why).

Ok, I get the training and verifying via test data sets, but I really don't get how a piece of software cannot be explained. It runs on a computer. It executes instructions that have to be visible in a trace. Someone had to write and debug the initial code. The data for the customer flows through the system, so how the heck can it be a black box? This is not mystical, it's computer instructions. I can accept that data scientists have built software that doesn't output enough information to model the decisions, but I cannot accept that an artificial system cannot be traced to show the decisions. If this is the type of software that is going to be running anything that can kill (e.g. autonomous cars) or fry (e.g. medical equipment) a human then I hope someone makes it illegal as heck until the model can be reversed in every case. How in the heck do developers / data scientists think they can get away with a black box with no accountability?

If a neural network has let's say 50 million parameters, it doesn't matter if you can trace any of the calculations, you're never going to actually understand how the model operates.

Sometimes it's possible to extract a sort of conceptual understanding in some cases, eg you might say that this layer performs edge detection or whatever. But that's not much of an explanation.

They do it because it works. You trust your Uber driver to get you to your destination even though you have no idea how his neurons do it. People are going to trust machine intelligence in the same way.

> People are going to trust machine intelligence in the same way.

Given the xenophobic history of the human race, I have some severe doubts about that. Frankly, the first data scientist who ends up in court because the automated car decided to kill some kid on the sidewalk[1] to keep the death count down is going to find out real quick what a jury thinks of machine intelligence.

Thinking about it, I can see the black box in a car having to record the last minute of instructions run by the car's cpu. The whole idea of unknowable isn't exactly going to sit well with the NTSB. I can truly see this if data scientists testify in a Congressional Hearing that they don't understand how their creation came to its decision.

I guess I am in awe of folks who do not have tools to figure out what model all the learning has built. Where is the DTrace for ML? If your neural network has 50 million parameters then how the heck did the data scientist have a data set to teach it from that is in any way complete enough to trust it?

I can just not see how it ethically can be unleashed upon people in a final go / no go decision affecting people's lives or livelihood. After reading this thread, I now hope people who are rejected for loans ask for an actual explanation and the exact manner the decision for their rejection was reached.

1) https://www.usatoday.com/story/money/cars/2017/11/23/self-dr...

What's the uplift between your super accurate model and 1-r? What's the uplift vs a 50 node decision tree?

Also accuracy is hard to talk about if you don't have the whole distribution you are estimating from; which generally we don't... Many highly accurate models (in the sense of working on the test set and n-fold xvalidation) underperform in production.

No. This is an absurd victim roleplay. People’s rights to control what happens to their information is not going to make machine learning illegal. It’ll just give all users a little more privacy and make selling their data a little less lucrative.

Besides, there’s been plenty of work into improving interpretability. Stop pushing the myth that we don’t understand anything about neural networks and they’re the biggest, blackest box since Schwarzschild.

GDPR might make certain uses of machine learning more difficult.

A hypothetical insurance company that uses ML to decide who gets offered which rate might run into trouble when a customer asks for "meaningful information" about the logic behind the decision.

But for the same insurance company, using ML to flag possible fraud cases, and then having a human review them, seems not to pose a problem. All decisions were made by humans, the machines just showed where a decision was needed.

Of course, IANAL.

Insurance companies are already prevented by law from using methods which cannot be explained, so they will be fine.

I actually think that most ML will be fine, as the outcomes are trivial. The particular set of results that Google shows you on Google.com are not a decision about you, and I can't see those needing to be interpreted.

Anything more consequential (i.e. job hiring) probably will, but also has laws that prevent the use of indiscriminate data already.

So, I don't see this as an issue.

> The particular set of results that Google shows you on Google.com are not a decision about you

I think that's easily demonstrated as not true: when I search for something like `luigi` in a regular window, I get the github link 1st. When I search for it in an anonymous window the github link is 3rd. Try it for yourself with a technical term that also has a common non-technical meaning.

I believe that search personalization is useful, but it may be a privacy issue.

See I get you, but I disagree. My example is python, whereby biologists and such-like people get snakes, people hanging out here get the programming language and so on.

I personally believe (with no evidence) that the number of potential SERPs is much lower than the number of users, as I reckon they use some kind of dimension reduction technique and pick vectors of results "close" to yours.

But I completely accept that I could be entirely wrong on this (I still think such a decision isn't consequential enough to trigger any GDPR provisions, but again I could also be wrong on that :) )

The insurance example was just the first thing I could think of. Replace with phone company, utilities company, what have you.

Would publishing a deep learning model on GitHub satisfy that requirement? You want to know how it makes decisions? Here you go!

Regardless of whatever the GDPR actually says and even if ML is a complete black box, I would be very happy at least knowing my input features to ML algorithms.

e.g. If I'm denied a loan, I want to learn the various "facts" about me that led to the ML coming up with the denial. Then I can fact-check it (e.g. "actually I don't already have $X of debt") and come up with an appeal.

Agreed. We are building up this level of faceless bureaucracy the more automated algorithms with multiple inputs we are using.

I know someone who had to deal with identity theft and it's a sheer nightmare to figure out why you get denied a rental car or airplane boarding. The people facing the customer know that the person is blocked but don't know why and have no way to figure out why.

What's that saying about "The answer to any headline that is posed as a yes-or-no question is 'no'."

Betteridge's law of headlines I believe.


In the case of things like being rejected by a machine learning algorithm for a mortgage, it should be pretty easy to automate coming up with an explanation for the applicant.

Run the application again with $5k higher income. Keep raising income by $5k until you find an income level that results in approval. If that income level is reasonable for people in general who were approved for similar mortgage amounts and whose other inputs are in the same ballpark as the rejected application, tell the applicant the rejection was for income too low.

You can do a similar thing with other parameters, such as credit rating, length of employment, and whatever else is input to your algorithm that is under the applicant's control. In some cases you might not be able to find a good single parameter tweak for approval and will have to resort to combinations. There will usually be many different ways then to tweak for approval. Use comparisons to similar applications that were approved to pick a reasonable combination of tweaks to base the rejection explanation on.
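The single-parameter version of this is short enough to sketch. The `toy_mortgage_model` and its numbers are invented; a real system would also need the sanity check against comparable approved applications described above, which is left out here.

```python
from typing import Callable, Optional

Application = dict[str, int]


def toy_mortgage_model(application: Application) -> bool:
    """Hypothetical stand-in for an opaque approval model."""
    return application["income"] + 100 * application["credit_score"] >= 140_000


def income_needed_for_approval(
    model: Callable[[Application], bool],
    application: Application,
    step: int = 5_000,
    max_steps: int = 40,
) -> Optional[int]:
    """Re-run a rejected application with income raised in `step` increments.

    Returns the first income level that gets approved, or None if no income
    within `max_steps` increments flips the decision.
    """
    for n in range(1, max_steps + 1):
        candidate = {**application, "income": application["income"] + n * step}
        if model(candidate):
            return candidate["income"]
    return None


rejected = {"income": 40_000, "credit_score": 650}
needed = income_needed_for_approval(toy_mortgage_model, rejected)
print(needed)  # 75000
```

A `None` result is the cue to try another parameter (credit score, length of employment) or a combination of them, as the comment suggests.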

The only reason I could see this not working is if your explanation had to be a "good" one.

E.g. if you lived 10 miles to the west, had the same income, had a mortgage before 2005, and didn't apply for credit between Sept 2010 and June 2011, then you would have been accepted. (There are other situations that would have resulted in acceptance; this is just one.)

I find it extremely concerning that people, especially on this platform, don't care about what the law says and choose to just hand wave it as "it doesn't apply to us / they don't enforce it like that".

The letter of the law is even more important than the spirit of it because it allows governments, corporations and other entities to simply say "this is what the law says" and because you are small you don't have the money/power to challenge this rendition if the spirit of the law disagrees with them. Not to mention that laws are _laws_, they are not meant to be really interpreted (it is a byproduct of having to do so to have a functioning society).

It's an extremely slippery slope to not care what the law says. If we don't we'll have some dystopian laws that allow for anything quite soon.

I have a related question - can an EU citizen under GDPR demand that a company make all of its machine learning models "unlearn" his/her data? I'm not asking if this is technically feasible (you could do that by removing the old model, removing the user's data from the training set and then rerunning the learning). I'm asking whether GDPR will give citizens the power to demand that.

GDPR only requires erasure of personal data on request, where "personal data" is defined as information relating to some identified or identifiable person. Machine learning models trained on personal data aren't going to be personal data themselves.

Machine learning models leaking private information is a real problem though, unless mechanisms to ensure something like differential privacy are applied. Those approaches may still be too theoretical and hard to reason about for many machine learning practitioners. Will be interesting how that plays out.
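For readers who have not met differential privacy before, the basic building block is less exotic than it sounds. The sketch below is the textbook Laplace mechanism applied to a simple counting query, assuming made-up data and an arbitrary privacy budget `epsilon`; it is not a recipe for privately training a model, which needs considerably more machinery.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-differential privacy.

    Adding or removing one person changes a count by at most 1
    (sensitivity 1), so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for value in values if predicate(value))
    return true_count + laplace_noise(1 / epsilon, rng)


ages = [23, 35, 41, 52, 67, 29]  # hypothetical records
rng = random.Random(0)
noisy = private_count(ages, lambda age: age > 40, epsilon=1.0, rng=rng)
print(f"noisy count of people over 40: {noisy:.2f} (true count is 3)")
```

Smaller `epsilon` means more noise and stronger privacy; the released number is useful in aggregate while saying little about whether any one person is in the data.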

Well, unless they can be used to identify the person.

But we are (reminds me of Black Mirror...) entitled to having knowledge and memories. Corporate entities will start to claim that you cannot ask them to remove their memories (for example, identifying people in images).

Nobody else has a problem with the mix up of 'Machine Learning' and 'Deep Learning' here?

Last time I heard of those terms, 'Machine Learning' was a much broader category than 'Deep Learning'. It's like someone says 'Next year's Porsches will be illegal' and the next person calls out 'Next year cars will be illegal'?!?

It's in your best interest to reduce your risk by putting the data you collect through a de-identification process:

>De-identification is adopted as one of the main approaches of data privacy protection. It is commonly used in fields of communications, multimedia, biometrics, big data, cloud computing, data mining, internet, social networks and audio–video surveillance.


Rather than explaining how the machine learning system works as an "explanation", perhaps auditing could be added to machine learning algorithms so that you very much could produce a very long but accurate description of the process, a la pages full of "X was compared to Y, X was larger, and thus we move on to step 261". A bit like disassembling machine code.

Of course, in machine learning, the datasets backing up the comparisons could not be shared as they contain variations of confidential and personal data, but you might still end up with a legally tolerable record of the algorithmic steps involved in the decision making process even if they're not useful to see.
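
A rough sketch of what such an audit trail could look like for the easiest case, a small decision tree. The tree, its features, and its thresholds are invented for illustration; a deep network would need very different instrumentation and would produce a far longer, far less readable log.

```python
# Hypothetical tree: each inner node compares one feature to a threshold.
TREE = {
    "feature": "income", "threshold": 40000,
    "below": {"decision": "reject"},
    "at_or_above": {
        "feature": "debt", "threshold": 10000,
        "below": {"decision": "accept"},
        "at_or_above": {"decision": "reject"},
    },
}

def decide_with_trail(tree, features):
    """Walk the tree and record every comparison made along the way."""
    trail = []
    node = tree
    while "decision" not in node:
        name, threshold = node["feature"], node["threshold"]
        value = features[name]
        branch = "below" if value < threshold else "at_or_above"
        trail.append(
            f"step {len(trail) + 1}: {name}={value} vs threshold {threshold} -> {branch}"
        )
        node = node[branch]
    trail.append(f"step {len(trail) + 1}: decision is {node['decision']}")
    return node["decision"], trail

decision, trail = decide_with_trail(TREE, {"income": 50000, "debt": 2000})
print("\n".join(trail))
```

The trail records only the applicant's own values and the model's thresholds, so it could be stored per decision without exposing anyone else's training data.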

I see what you’re getting at, but this would be a very lengthy and arduous process. Not to mention, many ML and DL algorithms are so mathematically complex that describing them in literal step-by-step detail sounds like hell.

It would be if you had to be involved, but I'm suggesting algorithms could have some sort of instrumentation so such "explanations" could be automatically generated and thrown into a data warehouse for possible future use. (This is all a cynical attempt to meet legal requirements rather than anything actually useful for the user, of course.)

Serious question: is there a web based application of ML that isn’t about exploiting or deceiving customers/users?

ML has its legit uses, sure, but in practice it always seems to be used to outwit people.

Catching fake / spamming / abusive users on social media.

People are (rightfully) upset by the amount of spam and harassment happening on platforms like Twitter. Machine learning is a good way to tackle this problem at scale, in an automated way, by training models to spot and auto-moderate bad posts.

Under GDPR, anyone can demand an explanation for why they were marked as spam / abusive. This will allow trolls and spammers to improve their workarounds, by demanding an explanation for why their posts were blocked, then tailoring their later posts to avoid these signifiers. It will also let them impose a tax on platforms that try to block them by flooding them with demands for explanations under GDPR.

Well, I mean, what counts as exploitation? If I learn a lot about what shoes people like in my job as a shoe salesman, and I use that experience when new people come in to make recommendations, I think everyone would call that just good customer service. But when Amazon sets up an ML model to do the same thing, many people argue they're exploiting your personal data for the sake of profit.

Off the top of my head, I can't think of a widespread web-based ML application that some people don't consider exploitative. But is that a property of the ML or the people?

I’d argue that “people who bought X also bought Y” is too simplistic to be considered ML. But say, charging a little extra for X to people who previously bought Y would be exploitative.
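
To illustrate how little is going on in the simplest version of "people who bought X also bought Y": it is co-occurrence counting with no training step at all. A sketch with made-up order data (production recommenders are generally more elaborate than this):

```python
from collections import Counter

def also_bought(orders, item, top_n=3):
    """Items most often found in the same basket as `item`."""
    counts = Counter()
    for basket in orders:
        if item in basket:
            counts.update(set(basket) - {item})
    return [other for other, _ in counts.most_common(top_n)]

orders = [
    ["shoes", "socks"],
    ["shoes", "socks", "laces"],
    ["hat"],
    ["shoes", "laces"],
    ["shoes", "socks"],
]
print(also_bought(orders, "shoes"))  # -> ['socks', 'laces']
```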

Spotify uses ML to give music recommendations: https://hackernoon.com/spotifys-discover-weekly-how-machine-...

Sure. How about the basic e-commerce use case of "based on your interest in product X, you may also be interested in product Y"?

For those interested in the cutting edge of interpretability, check out https://github.com/marcotcr/anchor I'm pretty sure this sort of approach will be how most businesses handle the problem.
