
Adversarial Learning for Good: On Deep Learning Blindspots - doener
http://blog.kjamistan.com/adversarial-learning-for-good-my-talk-at-34c3-on-deep-learning-blindspots/
======
daenz
The video example of a turtle being misclassified as a rifle is pretty scary
[https://www.youtube.com/watch?v=piYnd_wYlT8](https://www.youtube.com/watch?v=piYnd_wYlT8)

We're approaching a world where AI will be more and more relied upon in
dangerous situations. Imagine someone getting killed for something ridiculous
like inadvertently holding an adversarial example. Public trust would have a
hard time recovering.

~~~
oh_sigh
Would adding random noise to an image before running it through the classifier
mitigate these kinds of attacks?

~~~
yorwba
No, ironically because classifiers are trained to be robust to noise.
Adversarial examples are generated by altering the original along one specific
direction in a high-dimensional space. Random noise is vanishingly unlikely to
undo that carefully chosen perturbation, so the classification will stay wrong.
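
A rough sketch of that intuition, assuming PyTorch, with an untrained toy
classifier and a random image purely so the snippet runs (in practice you'd
use a trained model and a real photo): one FGSM step produces the adversarial
input, and Gaussian noise added on top of it almost never lands back in the
original class.

    # Sketch: random input noise rarely undoes an adversarial perturbation.
    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    model = torch.nn.Sequential(torch.nn.Flatten(),
                                torch.nn.Linear(28 * 28, 10)).eval()

    x = torch.rand(1, 1, 28, 28)            # stand-in for a real image
    label = model(x).argmax(dim=1)          # the model's current prediction

    # One FGSM step: move the input along the gradient sign to raise the loss.
    x_adv = x.clone().requires_grad_(True)
    F.cross_entropy(model(x_adv), label).backward()
    x_adv = (x_adv + 0.1 * x_adv.grad.sign()).clamp(0, 1).detach()

    # Random noise moves the input in an arbitrary direction, which almost
    # never cancels the specific direction the attack picked.
    recovered = 0
    for _ in range(100):
        noisy = (x_adv + 0.1 * torch.randn_like(x_adv)).clamp(0, 1)
        recovered += int(model(noisy).argmax(dim=1).item() == label.item())
    print(f"{recovered}/100 noisy copies classified as the original label")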

------
torthrw
Re: adversarial ML & steganography. There were 2 papers at this year's NIPS
studying these ideas:
[http://papers.nips.cc/paper/6802-hiding-images-in-plain-sight-deep-steganography](http://papers.nips.cc/paper/6802-hiding-images-in-plain-sight-deep-steganography)
and
[http://papers.nips.cc/paper/6791-generating-steganographic-images-via-adversarial-training](http://papers.nips.cc/paper/6791-generating-steganographic-images-via-adversarial-training)

------
pyvpx
Video can be found here:
[https://media.ccc.de/v/34c3-8860-deep_learning_blindspots](https://media.ccc.de/v/34c3-8860-deep_learning_blindspots)

------
beefield
I wonder how sensitive these results are. I mean, if you run one more learning
round of the network, does the rifle turn back into a turtle in the eyes of the
network, so that you would need to generate a new turtle for every single
network? Or does that turtle look like a rifle to all current neural networks?
Or, most likely, is it somewhere in between?

~~~
marcosdumay
If you don't change the data, why would the network change its results with
just some more learning iterations? It's already converging.

~~~
beefield
Because if I understood anything correctly, the method tries to find the
smallest possible changes that cause the network to make the incorrect
classification. And those smallest possible changes might be very sensitive to
whatever small random ripples in the weights a network has. But I am far from a
neural network expert, so I can't really answer.
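
One way to poke at that, as a rough sketch assuming PyTorch (toy, untrained
networks used here just so it runs; trained models on real data would be
needed for a meaningful answer): craft the example against one network, then
check whether it still fools the same network after one more training step,
and whether it fools a separately initialized network at all.

    # Sketch: how stable is an adversarial example under weight changes?
    import torch
    import torch.nn.functional as F

    def toy_net(seed):
        torch.manual_seed(seed)
        return torch.nn.Sequential(torch.nn.Flatten(),
                                   torch.nn.Linear(28 * 28, 10))

    net_a, net_b = toy_net(0), toy_net(1)

    x = torch.rand(1, 1, 28, 28)
    label = net_a(x).argmax(dim=1)          # treat net_a's prediction as "turtle"

    # One FGSM step against net_a gives the "rifle" version of the input.
    x_adv = x.clone().requires_grad_(True)
    F.cross_entropy(net_a(x_adv), label).backward()
    x_adv = (x_adv + 0.05 * x_adv.grad.sign()).clamp(0, 1).detach()

    def fooled(net):
        return net(x_adv).argmax(dim=1).item() != label.item()

    print("fools net_a:           ", fooled(net_a))

    # (a) one more training step on net_a with the clean input and its label
    opt = torch.optim.SGD(net_a.parameters(), lr=0.01)
    opt.zero_grad()
    F.cross_entropy(net_a(x), label).backward()
    opt.step()
    print("fools net_a after step:", fooled(net_a))

    # (b) a separately initialized network
    print("fools net_b:           ", fooled(net_b))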

------
leastangle
I think it would be beneficial to explicitly mention model extraction attacks
since they (kind of) enable such attacks:
[https://news.ycombinator.com/item?id=12557782](https://news.ycombinator.com/item?id=12557782)
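
For context, a heavily simplified sketch of how extraction feeds into such
attacks, assuming PyTorch (the black_box function is a hypothetical stand-in
for a remote prediction API, and the victim here is just a toy model so the
snippet runs): query the target for labels only, train a local substitute on
those answers, then run a white-box attack against the substitute and hope it
transfers.

    # Sketch: extract a substitute model, then attack it with gradient access.
    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    victim = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))

    def black_box(batch):
        # Stand-in for a remote API: returns labels only, no gradients.
        with torch.no_grad():
            return victim(batch).argmax(dim=1)

    # 1. Query the black box on synthetic inputs and record its labels.
    queries = torch.rand(512, 1, 28, 28)
    answers = black_box(queries)

    # 2. Train a local substitute to imitate those answers.
    substitute = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
    opt = torch.optim.Adam(substitute.parameters(), lr=1e-3)
    for _ in range(200):
        opt.zero_grad()
        F.cross_entropy(substitute(queries), answers).backward()
        opt.step()

    # 3. Craft an adversarial input against the substitute (white-box FGSM)
    #    and check whether it transfers to the black box.
    x = torch.rand(1, 1, 28, 28)
    y = black_box(x)
    x_adv = x.clone().requires_grad_(True)
    F.cross_entropy(substitute(x_adv), y).backward()
    x_adv = (x_adv + 0.1 * x_adv.grad.sign()).clamp(0, 1).detach()
    print("transfers to black box:", black_box(x_adv).item() != y.item())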

