To add insult to injury, you could outsource the training of your own model to the API too.
Alternatively, you could restrict your "stolen" model to a smaller domain and use fewer, more targeted examples for training. But at that point, you might as well start blending in predictions from other APIs, perhaps even training one model on the errors of another. That technique has been around for a long time; one incarnation is called "boosting" (see AdaBoost).
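To make that concrete, here's a minimal sketch of the idea. The `query_api` function is a hypothetical stand-in for the victim's prediction endpoint, and all the numbers are made up:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

# Hypothetical stand-in for the victim's prediction endpoint.
def query_api(X):
    return (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))   # queries restricted to our smaller domain
y = query_api(X)                 # the API's answers become free labels

# Boosting: each weak learner focuses on the errors of the ones before it.
clone = AdaBoostClassifier(n_estimators=50).fit(X, y)

X_test = rng.normal(size=(500, 2))
print("agreement with API:", (clone.predict(X_test) == query_api(X_test)).mean())
```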
With carefully selected queries based on an already trained model, I think you may not need so terribly many. But that's just an intuition.
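For what it's worth, the standard trick for making queries count is uncertainty sampling from active learning: spend the query budget on the points the current clone is least sure about. A rough sketch, again with a hypothetical `query_api`:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in for the victim's prediction endpoint.
def query_api(X):
    return (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

rng = np.random.default_rng(0)
pool = rng.normal(size=(10_000, 2))         # candidate queries we could send
X, y = pool[:20], query_api(pool[:20])      # tiny random seed set

clone = LogisticRegression().fit(X, y)
for _ in range(10):                         # ten rounds of twenty queries each
    p = clone.predict_proba(pool)[:, 1]
    idx = np.argsort(np.abs(p - 0.5))[:20]  # points the clone is least sure about
    X = np.vstack([X, pool[idx]])
    y = np.concatenate([y, query_api(pool[idx])])
    clone.fit(X, y)

print("total queries spent:", len(y))
```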
Of course, model providers could just as easily have some sort of protection against this, similar to what's done with "trap streets" on maps.
Is this only for supervised learning?
Also, couldn't this be done offline, pseudo-legitimately, using your API call log data later on? I don't see how that can be mitigated.
> and care only about making the most accurate predictions possible

The most accurate predictions of labels often generated by humans.
Bias can get into a classifier in two ways: through a biased model, which is very unlikely, or through biased training data, which is _easy_ to produce, even by accident.
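As a toy illustration of how easy (all numbers invented): if the labelers are even slightly harsher on one group, any model that fits the labels well simply inherits that harshness.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# The true outcome is independent of group membership...
group_a = rng.random(n) < 0.5
outcome = rng.random(n) < 0.3

# ...but the human labelers are slightly harsher on group A.
label = outcome | (group_a & (rng.random(n) < 0.05))

# A model that fits these labels well reproduces the labelers' bias.
for flag, name in ((True, "group A"), (False, "group B")):
    print(name, "positive label rate: %.3f" % label[group_a == flag].mean())
```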
But in most cases, the whole point of using machine-learned models is to do better than humans. Take an insurance company using ML to predict how likely a customer is to get into an accident: it isn't going to train the models to mimic actuaries; it has plenty of actual data to train them on.
And it is quite possible that males get into accidents much more often than females. But that doesn't mean the model is prejudiced or that it's wrong.
That depends on two things:
1. How possible it is to inspect and criticize the model's judgements, and its basis for them
2. How possible it is to inspect and criticize the humans' judgements, and their basis for them
I would say that 2 is the bigger problem all in all. But 1 can potentially still become a big problem if models are trusted blindly.
But on top of that, humans almost always do worse than even the simplest statistical baselines. Simple linear regression on a few relevant variables beats human 'experts' the overwhelming majority of the time. Humans shouldn't be allowed to make decisions at all, yet everyone seems to fear the scary algorithms instead.
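A sketch of what "simple baseline" means here, on made-up data: an ordinary least-squares fit against a hypothetical 'expert' who knows the right variables but weighs them inconsistently from case to case (a classic finding about human judgement):

```python
import numpy as np

rng = np.random.default_rng(0)
n, true_w = 500, np.array([0.6, 0.3, 0.1])

X = rng.normal(size=(n, 3))                     # a few relevant variables
y = X @ true_w + rng.normal(scale=0.5, size=n)  # outcome plus noise

# "Expert": right variables, but inconsistent per-case weights.
expert = np.einsum("ij,ij->i", X, true_w + rng.normal(scale=0.4, size=(n, 3)))

# Baseline: ordinary least squares fit on a training split.
coef, *_ = np.linalg.lstsq(X[:400], y[:400], rcond=None)
baseline = X[400:] @ coef

print("baseline MSE: %.3f" % np.mean((baseline - y[400:]) ** 2))
print("expert MSE:   %.3f" % np.mean((expert[400:] - y[400:]) ** 2))
```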
The research is about extracting information from black boxes.
I want an AI to steal my model. And run it. Forever.
But even so it's not clear their algorithm was the cause of the bias, or that the bias was significant. For instance, it's possible that black people have slightly worse "facial symmetry" on average, or whatever made-up metric they were using. And even if black people only scored 1% worse on average, the extremes would be dominated by whites, because of the way Gaussian distributions work. So it may appear to be way more biased than it actually is.
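You can check the tail effect with toy numbers: two Gaussians with means one point apart on a 0-100 scale (roughly a 1% average difference), identical spread, and the share above a cutoff diverges as the cutoff moves out.

```python
from statistics import NormalDist

# Two groups, same spread, means one point apart on a 0-100 scale.
a = NormalDist(mu=100, sigma=15)
b = NormalDist(mu=99, sigma=15)

for cutoff in (115, 130, 145):
    pa, pb = 1 - a.cdf(cutoff), 1 - b.cdf(cutoff)
    print(f"cutoff {cutoff}: A {pa:.4%}  B {pb:.4%}  ratio {pa / pb:.2f}")
# The further out the cutoff, the more the higher-mean group dominates,
# even though the average gap is tiny.
```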
I think this is actually the reason for a lot of accidental discrimination, because human judges would have exactly the same problem.
I remember in school, playing chess a couple of times against a guy fresh from Sudan. He had the most unsettling smile, and played very unorthodox openings. I won some, he won some - but I always suspected he was stronger than me, and just being polite/testing me. It's just impossible to read someone from a culture so different. I'm glad we didn't play poker, to put it like that.
You can do this deliberately, or your machine learning model might do it spontaneously.
It may be true that a sufficiently complex machine-learned model could learn race as a feature, but again, why would it? It has no prejudice against specific races. Unless you really believe that black people are inherently more likely to get into car accidents, even after controlling for income and education, etc. And even then it doesn't hate black people; it's just doing its best to predict risk as accurately as possible. It's not charging blacks, as a group, a higher rate than they cost, as a group.
I fail to see why people get so upset at the mere possibility of this. I think they are anthropomorphizing AI as if it was a human bigot that has an irrational hatred for other races, and strongly discriminates against them for no reason. This is more like giving people who live in neighborhoods with slightly higher accident rates, slightly higher insurance rates, to make up for their increased risk. Maybe it correlates with race, maybe it doesn't, it doesn't really matter.
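Here's a toy version of the neighborhood example, to show that dropping the group label doesn't make the correlation go away (all numbers invented):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Made-up world: group membership correlates with neighborhood,
# and the accident rate depends only on the neighborhood.
group_a = rng.random(n) < 0.5
risky_hood = rng.random(n) < np.where(group_a, 0.8, 0.2)
accident = rng.random(n) < np.where(risky_hood, 0.5, 0.2)

# A "model" that only ever sees the neighborhood: predict its accident rate.
rate = {h: accident[risky_hood == h].mean() for h in (True, False)}
predicted = np.where(risky_hood, rate[True], rate[False])

for flag, name in ((True, "group A"), (False, "group B")):
    print(name, "mean predicted risk: %.3f" % predicted[group_a == flag].mean())
# Different average rates per group, without the group label ever being
# an input; the neighborhood acts as a proxy.
```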
Based on a list of 137 questions, the Northpointe system predicts the risk of re-offending, and "blacks are almost twice as likely as whites to be labeled a higher risk but not actually re-offend." Meanwhile, whites are "much more likely than blacks to be labeled lower risk but go on to commit other crimes."
In other words:
- a white person labeled high risk will re-offend 76.5% of the time
- a black person labeled high risk will re-offend 55.1% of the time
- a white person labeled low risk will re-offend 47.7% of the time
- a black person labeled low risk will re-offend 28% of the time.
The model is specifically avoiding race as an input, but still overestimates the danger of black recidivism, while underestimating white recidivism.
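One way this pattern can arise without race as an input: when two groups have different base rates of re-offending, a score that is equally well calibrated for both groups will still produce different error rates at any single cutoff. A hypothetical simulation (the score distributions are invented, not Northpointe's):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

def error_rates(scores, threshold=0.5):
    # Outcomes drawn with probability equal to the (perfectly
    # calibrated) risk score itself.
    reoffend = rng.random(scores.size) < scores
    high = scores > threshold
    fpr = np.mean(high & ~reoffend) / np.mean(~reoffend)  # high risk, didn't re-offend
    fnr = np.mean(~high & reoffend) / np.mean(reoffend)   # low risk, did re-offend
    return fpr, fnr

# Hypothetical score distributions: group A's sits higher (higher base rate).
scores_a = rng.beta(4, 3, n)   # mean risk ~0.57
scores_b = rng.beta(3, 4, n)   # mean risk ~0.43

print("group A: FPR %.2f  FNR %.2f" % error_rates(scores_a))
print("group B: FPR %.2f  FNR %.2f" % error_rates(scores_b))
# Same calibrated scores, same cutoff, yet group A collects far more
# "labeled high risk but didn't re-offend" errors, with no race input
# anywhere in the model.
```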
when the supreme ai gains the ability to thirst for knowledge, it will steal all the machine learning models via prediction APIs...