Stealing Machine Learning Models via Prediction APIs (arxiv.org)
206 points by 0x0 on Sept 22, 2016 | 37 comments



This is interesting but the real money is in extracting HFT models from other firms by messing with the order book and evaluating the response :-).


"Messing" isn't the word I would choose for most market participants, thankfully, but you're describing the algorithmic game theory that goes into designing a good execution algorithm.


Very interesting. It makes sense that you could learn the model for something like classification, which has a cut-and-dried answer. Just brute force queries at the API, log the results, and start working on your own model based on theirs.


> Just brute force queries at the API, log the results, and start working on your own model based on theirs.

To add insult to injury, you could outsource the training of your own model to the API too.
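
A minimal, self-contained sketch of that workflow (the "victim" here is simulated locally; in a real attack query_victim would be an HTTP call to the provider's prediction API, and the data shapes, model choices, and scikit-learn usage are all my own assumptions):

    # Sketch only: stand up a hidden "victim" model, label your own queries
    # with its answers, and fit a local surrogate on those answers.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)

    # Hidden "victim" model, standing in for the remote prediction API.
    X_train = rng.uniform(-1, 1, size=(2000, 10))
    y_train = (X_train.sum(axis=1) > 0).astype(int)
    victim = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X_train, y_train)

    def query_victim(X):
        """Pretend HTTP call: returns only the predicted labels."""
        return victim.predict(X)

    # Attacker: brute-force queries, log the answers, train a surrogate.
    X_query = rng.uniform(-1, 1, size=(5000, 10))
    y_stolen = query_victim(X_query)
    surrogate = LogisticRegression(max_iter=1000).fit(X_query, y_stolen)

    # How often does the copy agree with the original on fresh inputs?
    X_test = rng.uniform(-1, 1, size=(1000, 10))
    print("agreement:", (surrogate.predict(X_test) == query_victim(X_test)).mean())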


API rate limits? I suppose you could get around that with multiple IPs/accounts. How many queries do you need to steal a (reasonably sized) model?


Depends on too many factors for even a ballpark. Take Google's Machine Vision API for instance. The limiting factor here is that the larger your model (and deep networks are very large models in terms of free parameters), the more training data you need to make a good approximation. To come close to "stealing" their entire trained model, my guess is that your API use would probably multiply Google's annual revenue by a small positive integer.

Alternatively, you could restrict your "stolen" model to a smaller domain and use fewer, more targeted examples for training. But at this point, you might as well start blending in predictions from other APIs, perhaps even training one off the errors of another. This is basically a technique that has been around for a long time, and in one incarnation is called "boosting" (see AdaBoost).
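
A rough sketch of that blending idea, done here as stacking rather than true boosting (the two "APIs" are local stand-ins, and every model and number is a placeholder of mine):

    # Treat each provider's predicted probability as a feature and let a
    # small meta-model learn how to combine them.
    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    X = rng.normal(size=(3000, 8))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

    # Stand-ins for two external prediction APIs (hypothetical).
    api_a = LogisticRegression(max_iter=1000).fit(X[:1000], y[:1000])
    api_b = AdaBoostClassifier(n_estimators=50).fit(X[:1000], y[:1000])

    # The meta-model only ever sees the two APIs' probability outputs.
    meta_X = np.column_stack([api_a.predict_proba(X[1000:2000])[:, 1],
                              api_b.predict_proba(X[1000:2000])[:, 1]])
    meta = LogisticRegression().fit(meta_X, y[1000:2000])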


You don't need to steal the whole model. You can train a model yourself, and just "steal" those "bits" of a model that you can't reach with your own training data.

With carefully selected queries based on an already trained model, I think you may not need so terribly many. But that's just an intuition.
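
One concrete way to make those "carefully selected queries", as a sketch (uncertainty sampling; the function and its names are mine, and it assumes a scikit-learn-style binary classifier with predict_proba):

    import numpy as np

    def select_queries(local_model, candidate_pool, budget):
        """Spend API calls only where the local model is least certain:
        return the `budget` candidates whose predicted probability of the
        positive class is closest to 0.5."""
        proba = local_model.predict_proba(candidate_pool)[:, 1]
        uncertainty = -np.abs(proba - 0.5)   # higher = less certain
        return candidate_pool[np.argsort(uncertainty)[-budget:]]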


This is not necessarily that surprising. Hinton's 'dark knowledge' (not cited in the paper) already showed that a remarkable amount of information is hidden in the classification probabilities emitted by a model, and that one neural net can learn a lot from and reverse-engineer another neural net given just its precise predictions.
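
A tiny illustration of why soft predictions leak so much, borrowing the temperature-softmax trick from the distillation literature (the logits here are made up):

    import numpy as np

    def soften(logits, T=1.0):
        """Temperature-scaled softmax: T > 1 spreads probability mass onto
        the 'wrong' classes, exposing the teacher's similarity structure."""
        z = logits / T
        z -= z.max()            # numerical stability
        e = np.exp(z)
        return e / e.sum()

    logits = np.array([9.0, 3.0, 1.0])   # e.g. dog, cat, car
    print(soften(logits, T=1.0))         # essentially one-hot: [0.997, 0.002, 0.0003]
    print(soften(logits, T=5.0))         # "dark knowledge" visible: [0.67, 0.20, 0.13]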


Isn't this analogous to trying to understand human cognition by using human responses to inputs, e.g. the endeavor of cognitive science? Just in the case of inferring the ML architecture, there's a smaller hypothesis space than for what we think people are doing.


You wouldn't steal the model per se, but you could use this technique to generate some nice training data.

Of course, model providers could just as easily have some sort of protection against this, similar to what's done with "trap streets" on maps.
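
A prediction-API analogue of a "trap street" might look something like this sketch (pure speculation on my part, not a defence proposed in the paper): deterministically pick a secret sliver of the input space, answer it with a canned, slightly wrong label, and check suspect models against those inputs later.

    import hashlib

    TRAP_LABEL = 1   # arbitrary fixed answer returned for trap queries

    def is_trap(x, secret=b"provider-secret", rate=0.001):
        """Deterministically flag ~0.1% of inputs as traps, keyed on a secret."""
        h = hashlib.sha256(secret + repr(x).encode()).digest()
        return int.from_bytes(h[:4], "big") / 2**32 < rate

    def serve_prediction(model_predict, x):
        # A surrogate trained on this API's outputs will reproduce the canned
        # answers, which the provider can later test for.
        return TRAP_LABEL if is_trap(x) else model_predict(x)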


So, is this just brute-forcing API calls to create a training set?

Is this only for supervised learning?

Also, couldn't this be done offline, pseudo-legitimately, using your API call log data later on? I don't see how that can be mitigated.


This also seems useful for extracting a model, creating an interpretable version of parts of it, and proving whether it is prejudiced against certain races, genders, etc.


Why would a model be prejudiced against a certain race? Very rarely do people give race as a feature to statistical models to begin with. And even if they did, they do not have human prejudices. They train on actual data and care only about making the most accurate predictions possible.


> They train on actual data

often generated by humans

> and care only about making the most accurate predictions possible

of labels often generated by humans.

Bias can get into a classifier. It can get there through a biased model, but that's very unlikely. Much more likely is that it's trained on biased data. Which is _easy_ to do, even by accident.


Sure, but even then they would be no worse than the humans they replace.

But in most cases, the whole point of using machine learned models is to do better than humans. Like an insurance company using ML to predict how likely a customer is to get in an accident. They aren't going to train the models to mimic actuaries, they have plenty of actual data to train it on.

And it is quite possible that males get into accidents much more often than females. But that doesn't mean the model is prejudiced or that it's wrong.


> Sure, but even then they would be no worse than the humans they replace.

That depends on two things:

1. How possible it is to inspect and criticize the judgements of the model, and its basis for them

2. How possible it is to inspect and criticize the judgements of the humans, and their basis for them

I would say that 2 is the bigger problem all in all. But 1 can potentially still become a big problem if models are trusted blindly.


We can't inspect human brains; they are black boxes. People are incredibly biased, but also mostly unaware of their biases. For instance, judges have been found to give much harsher sentences just before lunch, when they are hungry. Or attractive people get much shorter sentences. In job interviews attractive people do better. Attractiveness also matters way more than it should in tipping waitresses and in elections.

But on top of that, humans almost always do worse than even the simplest statistical baselines. Simple linear regression on a few relevant variables beats human 'experts' 99% of the time. Humans shouldn't be allowed to make decisions at all, yet everyone seems to fear the scary algorithms instead.


Funny you should mention that human brains are black boxes...

The research is about extracting information from black boxes.

I want an AI to steal my model. And run it. Forever.


https://motherboard.vice.com/read/why-an-ai-judged-beauty-co...

“It happens to be that color does matter in machine vision,” Alex Zhavoronkov, chief science officer of Beauty.ai, wrote me in an email, “and for some population groups the data sets are lacking an adequate number of samples to be able to train the deep neural networks.”


Well of course a beauty contest judged by AIs would go horribly wrong. Appearance is highly subjective and arbitrary.

But even so, it's not clear their algorithm was the cause of the bias, or that the bias was significant. For instance, it's possible that black people have slightly worse "facial symmetry" on average, or whatever made-up metric they were using. And even if black people only scored 1% worse on average, the extremes will be dominated by whites, because of the way Gaussian distributions work. So it may appear to be way more biased than it actually is.
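
The tail effect is easy to check numerically (illustrative numbers only, nothing to do with the contest's actual data):

    # A small gap in means produces a larger gap far out in the tail.
    from scipy.stats import norm

    mu_a, mu_b, sigma = 100.0, 99.0, 10.0   # a 1% difference in mean score
    cutoff = 125.0                          # "winners" threshold

    p_a = norm.sf(cutoff, loc=mu_a, scale=sigma)   # tail probability, group A
    p_b = norm.sf(cutoff, loc=mu_b, scale=sigma)   # tail probability, group B
    print(p_a / p_b)   # ~1.33x overrepresentation here, and it grows with the cutoff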


Even with no bias, uncertainty can cause problems. Say an ML system is tasked with finding the top 10 candidates in terms of "confidence that they will be able to do the job". Then if it has little training data on candidates of a particular class, and those in that class are actually quite different on many variables (so it can't generalize very well), it may not be able to reach the required levels of confidence for them.
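
As a toy illustration of that confidence problem (made-up numbers, with a Wilson interval standing in for whatever uncertainty estimate the system actually uses): the same underlying 80% success rate earns a much lower confidence bound when the group has a tenth of the data.

    import math

    def wilson_lower_bound(successes, n, z=1.96):
        """Lower bound of the Wilson score interval for a binomial proportion."""
        if n == 0:
            return 0.0
        p = successes / n
        denom = 1 + z**2 / n
        centre = p + z**2 / (2 * n)
        margin = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
        return (centre - margin) / denom

    print(wilson_lower_bound(80, 100))   # well-represented group: ~0.71
    print(wilson_lower_bound(8, 10))     # same 80% rate, a tenth of the data: ~0.49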

I think this is actually the reason for a lot of accidental discrimination, because human judges would have exactly the same problem.

I remember in school, playing chess a couple of times against a guy fresh from Sudan. He had the most unsettling smile, and played very unorthodox openings. I won some, he won some - but I always suspected he was stronger than me, and just being polite/testing me. It's just impossible to read someone from a culture so different. I'm glad we didn't play poker, to put it like that.


That particular study is pretty bad. Look at the contestants' faces. They might be failing at identifying facial features or not accounting for lighting, tilt, etc. It is obvious if you look at their estimated ages.


Sorry but your knee-jerk "analysis" doesn't hold as much weight with me as the judgement of the program's actual authors.


They clearly got the ages wrong; that's not really debatable. Don't take my word for it, though. There are plenty of studies in this space that got wildly different results, some of which mostly agree with hotornot ratings.


It's possible to predict race by looking at several other variables which are correlated to race. See Technical Appendix A and especially Table 8 for an example of predicting ethnicity using surname and state only: http://files.consumerfinance.gov/f/201409_cfpb_report_proxy-...

You can do this deliberately, or your machine learning model might do it spontaneously.
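
The CFPB appendix describes a Bayesian surname/geography proxy (BISG); a toy version of that update, with made-up numbers, looks like this:

    # Combine P(group | surname) and P(group | geography) with Bayes' rule,
    # assuming surname and geography are independent given the group.
    import numpy as np

    p_given_surname = np.array([0.70, 0.20, 0.10])   # hypothetical surname-list probabilities
    p_given_geo     = np.array([0.30, 0.50, 0.20])   # hypothetical census-tract probabilities
    p_population    = np.array([0.60, 0.25, 0.15])   # baseline group shares

    posterior = p_given_surname * p_given_geo / p_population
    posterior /= posterior.sum()
    print(posterior)   # proxy estimate of group membership for this person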


I don't deny that. Although I don't think anyone uses surnames as a feature, and that would be pretty blatant. You also need to use census data to really make use of that for less common last names. Anyone going out of their way to do this might as well just discriminate directly instead of using this convoluted method.

It may be true that a sufficiently complex machine-learned model could learn race as a feature, but again, why would it? It has no prejudice against specific races. Unless you really believe that black people are inherently more likely to get into car accidents, even after controlling for income and education, etc. And even then it doesn't hate black people, it's just doing its best to predict risk as accurately as possible. It's not charging blacks, as a group, a higher rate than they cost, as a group.

I fail to see why people get so upset at the mere possibility of this. I think they are anthropomorphizing AI, as if it were a human bigot that has an irrational hatred for other races and strongly discriminates against them for no reason. This is more like giving people who live in neighborhoods with slightly higher accident rates slightly higher insurance rates, to make up for their increased risk. Maybe it correlates with race, maybe it doesn't; it doesn't really matter.


Here's an article with more information on the potential for bias in algorithms: https://www.propublica.org/article/machine-bias-risk-assessm...

Based on a list of 137 questions[1] the Northpointe system predicts the risk of re-offending, and "blacks are almost twice as likely as whites to be labeled a higher risk but not actually re-offend." Meanwhile, whites are "much more likely than blacks to be labeled lower risk but go on to commit other crimes."

In other words:

- a white person labeled high risk will re-offend 66.5% of the time

- a black person labeled high risk will re-offend 55.1% of the time

- a white person labeled low risk will re-offend 47.7% of the time

- a black person labeled low risk will re-offend 28% of the time.

The model specifically avoids race as an input, yet it still overestimates the danger of black recidivism while underestimating white recidivism.

[1]https://www.documentcloud.org/documents/2702103-Sample-Risk-...


This article was posted below, and it has serious problems: https://www.chrisstucchio.com/blog/2016/propublica_is_lying....



That research has some serious issues: https://www.chrisstucchio.com/blog/2016/propublica_is_lying....


"Extracting a model" refers to approximating someone else's black box outputs. You would be dissecting your own approximation, which could systematically be very different from whatever black box you're aiming to make inferences about, even if they both produce similar outputs.


If you put your discrimination detector on an API, you would enable the original model's creators to eliminate bias by training against it. The result would be an anti-discriminatory model, produced through a generative/discriminative process.
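
A deliberately crude sketch of that loop, using a one-step score correction instead of real adversarial training (all names, data, and numbers here are invented): an "auditor" model tries to recover the protected attribute from the main model's scores, and the provider adjusts the scores until it no longer can.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(2)
    n = 5000
    group = rng.integers(0, 2, size=n)                    # hypothetical protected attribute
    X = rng.normal(size=(n, 5)) + group[:, None] * 0.5    # features with proxy leakage
    y = (X[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(int)

    model = LogisticRegression().fit(X, y)
    scores = model.decision_function(X)

    # Auditor: how well do the raw scores reveal the protected attribute?
    auditor = LogisticRegression().fit(scores.reshape(-1, 1), group)
    print("auditor accuracy before:", auditor.score(scores.reshape(-1, 1), group))

    # Crude fix: remove each group's mean score so the auditor drops toward chance.
    debiased = scores - np.where(group == 1, scores[group == 1].mean(),
                                 scores[group == 0].mean())
    auditor2 = LogisticRegression().fit(debiased.reshape(-1, 1), group)
    print("auditor accuracy after:", auditor2.score(debiased.reshape(-1, 1), group))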


How would you go about aggregating an image library to properly gauge the classification results? Would you just use ImageNet content or try using something entirely new?


Is this even an issue? This strikes me as being economically infeasible given the size of training datasets.


"On Google’s platform for example, an extraction attack would cost less than $0.10, and subvert any further model monetization"


Whoa... this is the stuff of science fiction.

When the supreme AI gains the ability to thirst for knowledge, it will steal all the machine learning models via prediction APIs...


Haven't read it yet, but I'm not expecting more than what we had in the '90s: trying to figure out search engines' prioritization algorithms so we could use them in our own optimization ones.



