A Guide to Synthesizing Adversarial Examples (anishathalye.com)
123 points by anishathalye on July 25, 2017 | 22 comments



Last week, I wrote a blog post (https://blog.openai.com/robust-adversarial-inputs/) about how it's possible to synthesize really robust adversarial inputs for neural networks. The response was great, and I got several requests to write a tutorial on the subject because what was already out there wasn't all that accessible. This post, written in the form of an executable Jupyter notebook, is that tutorial!

Security/ML is a fairly new area of research, but I think it's going to be pretty important in the next few years. There's even a very timely Kaggle competition about this (https://www.kaggle.com/c/nips-2017-defense-against-adversari...) run by Google Brain. I hope that this blog post will help make this really neat area of research slightly more approachable/accessible! Also, the attacks don't require that much compute power, so you should be able to run the code from the post on your laptop.
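
If you want a feel for the core attack before opening the notebook, here's a rough sketch of a targeted projected-gradient attack. This is illustrative PyTorch rather than the TensorFlow code in the post, and `model`, `x`, and the hyperparameters are placeholders, not the notebook's exact values:

    import torch
    import torch.nn.functional as F

    def targeted_pgd(model, x, target_class, eps=8/255, step=1/255, iters=100):
        """Search the L-infinity eps-ball around x for an image classified as target_class."""
        target = torch.tensor([target_class])
        x_adv = x.clone().detach()
        for _ in range(iters):
            x_adv.requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), target)
            grad, = torch.autograd.grad(loss, x_adv)
            with torch.no_grad():
                x_adv = x_adv - step * grad.sign()        # step toward the target label
                x_adv = x + (x_adv - x).clamp(-eps, eps)  # project back into the eps-ball
                x_adv = x_adv.clamp(0, 1)                 # keep it a valid image
        return x_adv.detach()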


Thank you very much! I requested this, and I think I might have come off as rude or entitled, but I'm so happy you donated your time to create this tutorial!


Oh, I didn't take it that way at all. In fact, I thought it was a great idea! :)


Thanks so much for taking the time to write this post. I am a graduate student in computer security interested in branching into adversarial ML, and I have to echo your opinion that the material available today is not too accessible.


I'm a dinosaur with 20 years in infosec and interested in the same thing. The role of the defender is going to become very interesting very quickly.


Does it take a lot longer to synthesize the robust version than the naive version?


It does take longer to synthesize robust adversarial examples - how much longer depends on the distribution of transformations. The rotation example in this blog post was pretty quick to create, maybe a minute or two on a Titan X. The printed-out version in the OpenAI blog post took something like 20 minutes on a Titan X.
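
For reference, the robust version essentially averages the attack loss over sampled transformations before each gradient step. Roughly something like this (a PyTorch sketch with made-up helper names and parameters, not the notebook's code):

    import math
    import torch
    import torch.nn.functional as F

    def rotate(x, angle_deg):
        """Differentiably rotate a batch of images (N, C, H, W) by angle_deg degrees."""
        t = math.radians(angle_deg)
        theta = torch.tensor([[math.cos(t), -math.sin(t), 0.0],
                              [math.sin(t),  math.cos(t), 0.0]])
        theta = theta.unsqueeze(0).repeat(x.size(0), 1, 1)
        grid = F.affine_grid(theta, x.shape, align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)

    def robust_step(model, x_adv, x, target, eps=8/255, step=1/255, n_samples=10):
        """One attack step, averaging the loss over randomly sampled rotations."""
        x_adv = x_adv.clone().detach().requires_grad_(True)
        loss = 0.0
        for _ in range(n_samples):
            angle = float(torch.empty(1).uniform_(-30, 30))  # sample a transformation
            loss = loss + F.cross_entropy(model(rotate(x_adv, angle)), target)
        grad, = torch.autograd.grad(loss / n_samples, x_adv)
        with torch.no_grad():
            x_adv = x_adv - step * grad.sign()
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # stay close to the original image
            return x_adv.clamp(0, 1).detach()

Sampling more transformations per step gives a more robust result but costs proportionally more compute, which is where the extra time goes.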


> This adversarial image is visually indistinguishable from the original, with no visual artifacts. However, it’s classified as “guacamole” with high probability!

May "guacamole" become as prominent as "Alice and Bob".


This is delightful. As someone who uses AI/ML/MI/... for security, I find there is not nearly enough understanding of how attackers can subvert decision systems in practice.

Keep up the good work!


I was introduced to this from a Defcon panel I went to in 2016. See https://www.youtube.com/watch?v=JAGDpJFFM2A . It gives a good conceptual overview.


I have the feeling that the fact that imperceptible perturbations can change the labels means that our networks/models don't yet look at the "right" parts of the input data.

Hopefully, this means research will focus on more robust classifiers based on the weaknesses identified by adversarial approaches!


Is the next step generating adversarial examples and injecting them into the training pipeline?


I think there have been some papers on that already. Sorry, I don't know them off the top of my head. It's definitely a good idea.


Yes:

> Adversarial training seeks to improve the generalization of a model when presented with adversarial examples at test time by proactively generating adversarial examples as part of the training procedure. This idea was first introduced by Szegedy et al. [SZS13] but was not yet practical because of the high computation cost of generating adversarial examples. Goodfellow et al. showed how to generate adversarial examples inexpensively with the fast gradient sign method and made it computationally efficient to generate large batches of adversarial examples during the training process [GSS14]. The model is then trained to assign the same label to the adversarial example as to the original example—for example, we might take a picture of a cat, and adversarially perturb it to fool the model into thinking it is a vulture, then tell the model it should learn that this picture is still a cat. An open-source implementation of adversarial training is available in the cleverhans library and its use illustrated in the following tutorial.

http://www.cleverhans.io/security/privacy/ml/2017/02/15/why-...

https://arxiv.org/abs/1312.6199

https://arxiv.org/abs/1412.6572
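
For concreteness, the training loop boils down to something like this (illustrative PyTorch in the spirit of [GSS14], not the cleverhans implementation; the epsilon and names are placeholders):

    import torch
    import torch.nn.functional as F

    def fgsm(model, x, y, eps):
        """Fast gradient sign method: one signed-gradient step away from the true label."""
        x = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        grad, = torch.autograd.grad(loss, x)
        return (x + eps * grad.sign()).clamp(0, 1).detach()

    def adversarial_training_step(model, optimizer, x, y, eps=8/255):
        """Train on clean and adversarial examples, keeping the original labels."""
        x_adv = fgsm(model, x, y, eps)
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
        return loss.item()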


I was really inspired by this paper at USENIX [1]. This looks like very very early research, but the outline it provides leaves lots of room for adversarial ML research.

Bonus, if you tackle this problem you get several semi-orthogonal technologies for "free".

1 - https://www.usenix.org/system/files/conference/cset16/cset16...


If it is so easy to fool deep learning, why is it so hyped? It seems like a serious security risk.


Many machine learning and reinforcement learning models are susceptible to adversarial attacks; it's not unique to deep learning. However, because so many systems that are currently deployed in applications use deep learning, it's under particular scrutiny.


Then it seems the hype around machine learning is not well founded. Machine learning in general is a big risk if it is so easily fooled.


It's hyped because it can guess the right answer "most of the time". As long as it's working with a "random" input (not someone intentionally trying to fool it), it works well, and that's all most people care about.


This seems like a direct argument against camera only systems for autonomous vehicles.


Robustly synthesizing and delivering adversarial input to cameras in the wild in real time is a very different problem. It's easier to just blind them with a laser, which you could also do to humans, if you're adversarial enough.


But randomly "running into" a naturally occurring adversarial input in the wild should be a more common problem.

The end result will vary depending on whether the adversarial input is a child or the rear end of a truck illuminated by sunshine at a certain angle.

So far only the second one has been tested IRL, but for some reason I'm not really fond of the idea that we should be gathering more field data on adversarial inputs...

My main fear with current autonomous driving applications of ML is that they might not be ready for prime time and that we are only a few deadly accidents away from a major setback in public trust in autonomous driving.



