
Attacking Machine Learning with Adversarial Examples - Chris2048
https://blog.openai.com/adversarial-example-research/
======
craigching
Previous discussion:
[https://news.ycombinator.com/item?id=13662165](https://news.ycombinator.com/item?id=13662165)

------
mholt
Since this blog post was written, the same authors have published further
research on transferability, which is what really makes these attacks hard to
defend against. They even present a proof-of-concept task that is resistant
to transfer (although it is not very practical):
[https://arxiv.org/abs/1704.03453](https://arxiv.org/abs/1704.03453)

If it interests anyone, a few weeks ago I wrote up a bit about this and other
threats related to machine learning models:
[https://matt.life/papers/security_privacy_neural_networks.pdf](https://matt.life/papers/security_privacy_neural_networks.pdf)

~~~
Sniffnoy
A note -- if you're linking to arXiv, it's better to link to the abstract
([https://arxiv.org/abs/1704.03453](https://arxiv.org/abs/1704.03453)) rather
than directly to the PDF. From the abstract, one can easily click through to
the PDF; not so the reverse. And the abstract allows one to do things like see
different versions of the paper, search for other things by the same authors,
etc. Thank you!

~~~
mholt
Good point, I updated my link.

------
plafl
From what little I have read on the topic, adversarial attacks are successful
against neural networks because, in order to train them successfully, we
design them to operate in a linear regime as much as possible. This linearity,
compounded with the high dimensionality of the problem (images), is what makes
it easy to find small perturbations that make the output swing wildly. My
point is that "attacking machine learning" should maybe be renamed "attacking
neural networks".

------
quinnftw
I wonder if one could introduce a secondary classifier that is immune (or
more resistant) to this kind of attack as a fail-safe. One idea that comes to
mind is to back the neural net with a random forest, which I imagine would be
very hard to trick with this kind of attack, since a collection of
independent (that part is key) weak learners is trained on the data. To trick
a random forest, you would have to trick the majority of the trees within it.
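
A minimal scikit-learn sketch of that idea (the classifier choices and the
plain agreement rule are assumptions for illustration, not a tested defense):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

def train_pair(X_train, y_train):
    """Train the primary net and the fail-safe forest on the same data."""
    net = MLPClassifier(hidden_layer_sizes=(256,), max_iter=300)
    forest = RandomForestClassifier(n_estimators=100)
    return net.fit(X_train, y_train), forest.fit(X_train, y_train)

def predict_or_flag(net, forest, x):
    """Accept the net's label only when the forest agrees; otherwise flag."""
    x = np.atleast_2d(x)
    net_label = net.predict(x)[0]
    if forest.predict(x)[0] == net_label:
        return net_label
    return None  # disagreement: treat the input as suspect
```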

~~~
derEitel
Here is a paper from Bosch in that direction; it uses a second network to
classify examples as adversarial:
[https://arxiv.org/abs/1702.04267](https://arxiv.org/abs/1702.04267)

Using a fail-safe network is hard because adversarial examples are usually
assigned to the wrong class with high confidence, so a confidence threshold
in the main network wouldn't work. Using a network as described in the paper
and then a different kind of classifier might be worth trying. But it has
also been shown that adversarial examples transfer to different kinds of
models (I don't know if random forests have been tried as well).
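
For a rough idea of the detector approach in its simplest form: craft
perturbed copies of the training set, label clean inputs 0 and perturbed
ones 1, and fit a second classifier on that. This standalone version over
raw inputs is my simplification; the linked paper, as I recall, attaches its
detector to intermediate network activations instead:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_detector(X_clean, X_adv):
    """Binary detector: 0 = clean input, 1 = adversarially perturbed."""
    X = np.vstack([X_clean, X_adv])
    y = np.concatenate([np.zeros(len(X_clean)), np.ones(len(X_adv))])
    return LogisticRegression(max_iter=1000).fit(X, y)

# At inference time, only inputs the detector calls clean reach the main
# classifier; flagged inputs are rejected or sent for human review.
```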

------
eggie5
Apple has a big problem with this in their personalized App Store ranking
(a recommender system): people gaming it.

