Fooling Neural Networks in the Physical World with 3D Adversarial Objects (labsix.org)
243 points by anishathalye on Nov 1, 2017 | 72 comments

Hi HN! I'm one of the researchers that produced this result: we figured out how to make 3D adversarial objects (currently fabricated using full-color 3D printing technology) that consistently fool neural networks in the physical world. Basically, we have an algorithm that can take any 3D object and perturb it so it tricks a classifier into thinking it's something else (for any given target class).

Recently, there's been some debate about whether or not adversarial examples are a problem in the real world, and our research shows that this is a real concern, at least with current neural network architectures (nobody has managed to solve the problem of white-box adversarial examples yet).

I'm happy to answer any questions that anyone has!

This was a wondrous read. I had not expected that these things would be significant in a real-world context with a free viewpoint.

Have you looked at whether the perturbations tend to affect networks other than the one they're optimized for? If I ran, for example, VGG-19 and ResNet-50 alongside Inception V3, could I still be reasonably sure of getting a correct best-of-three vote if I point the camera at your models?

Do you think your algorithm would support optimizing to fool several known networks in the same way simultaneously?

What we propose in our paper is a white-box attack, so given access to the classifier gradient, we can make adversarial examples. We don't investigate the "transferability" of our adversarial examples to other networks, though other work in this area has explored that problem (though, of course, not in the EOT / 3D adversarial example case).
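To make the EOT (Expectation Over Transformation) idea concrete: instead of fooling the classifier on one fixed view, the attack minimizes the target-class loss averaged over a distribution of transformations while keeping the perturbation small. The sketch below is a toy illustration only, not the authors' code. It assumes a linear softmax classifier in place of Inception, a made-up brightness/contrast family (`sample_transform`) in place of rendered 3D poses, and a signed-gradient step with an l_infinity projection; every function name and parameter here is hypothetical.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_of(W, b, x):
    # Toy linear classifier standing in for a real network: logits = W x + b.
    return [sum(wi * xi for wi, xi in zip(row, x)) + bk for row, bk in zip(W, b)]


def sample_transform(rng):
    # Hypothetical transformation family: random contrast a and brightness c.
    # In the real setting this would be a random 3D pose, lighting, camera, etc.
    return rng.uniform(0.8, 1.2), rng.uniform(-0.1, 0.1)


def eot_targeted_attack(W, b, x0, target, eps=0.3, step=0.05, iters=200, samples=16, seed=0):
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        # Monte Carlo estimate of the gradient of E_t[-log p(target | t(x))].
        grad = [0.0] * len(x)
        for _ in range(samples):
            a, c = sample_transform(rng)
            p = softmax(logits_of(W, b, [a * xi + c for xi in x]))
            for k, row in enumerate(W):
                # d(-log p[target]) / d logit_k = p[k] - 1[k == target]
                coef = p[k] - (1.0 if k == target else 0.0)
                for i in range(len(x)):
                    # Chain rule through t(x) = a * x + c contributes the factor a.
                    grad[i] += coef * row[i] * a / samples
        # Signed descent step ((g > 0) - (g < 0) is the sign of g), then project
        # back into the l_inf ball of radius eps around the original input.
        x = [xi - step * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]
        x = [min(max(xi, oi - eps), oi + eps) for xi, oi in zip(x, x0)]
    return x
```

With `W` the 2x2 identity, `b = [0, 0]`, and `x0 = [0.5, 0.3]` (class 0), `eot_targeted_attack(W, b, x0, target=1)` returns an input inside the eps-ball that this toy model assigns to class 1 for every contrast and brightness setting in the sampled range, not just one.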

I expect it wouldn't be difficult to defeat an ensemble, though (e.g. if you're averaging predictions, you can just differentiate through the averaging).
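Here is what "differentiating through the averaging" can look like. The sketch assumes hypothetical ensemble members that are linear softmax models `(W, b)`, averages their class probabilities, and computes the gradient of log(mean target probability) with respect to the input by the chain rule; the attacker then steps along it exactly as for a single model. Names and parameters are illustrative, not from the paper.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_target_grad(models, x, target):
    """Return (mean target probability, gradient of its log w.r.t. x)."""
    probs = []
    for W, b in models:
        logits = [sum(wi * xi for wi, xi in zip(row, x)) + bk for row, bk in zip(W, b)]
        probs.append(softmax(logits))
    K = len(models)
    p_avg = sum(p[target] for p in probs) / K
    grad = [0.0] * len(x)
    for (W, _), p in zip(models, probs):
        for i in range(len(x)):
            # d p[target] / d x_i = p[target] * (W[target][i] - sum_j p[j] * W[j][i])
            mean_w = sum(p[j] * W[j][i] for j in range(len(W)))
            # Chain rule through the 1/K average and the log.
            grad[i] += p[target] * (W[target][i] - mean_w) / (K * p_avg)
    return p_avg, grad


def attack_ensemble(models, x0, target, eps=0.5, step=0.05, iters=100):
    x = list(x0)
    for _ in range(iters):
        _, g = ensemble_target_grad(models, x, target)
        # Signed ascent on log(mean target probability), projected to the l_inf ball.
        x = [xi + step * ((gi > 0) - (gi < 0)) for xi, gi in zip(x, g)]
        x = [min(max(xi, oi - eps), oi + eps) for xi, oi in zip(x, x0)]
    return x
```

The averaging is just one more differentiable layer, so the ensemble gets no special protection from the attacker's point of view.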

Apologies if I bring up things that can be found in the paper on a more careful read, but I probably won't have time to go through it properly before this HN post goes cold. I think I have a fairly good notion of what you're doing, though.

I share your expectations for an averaging ensemble, which is after all still for most purposes a single model, but let's say I'm concerned precisely about people trying to fool my networks like this and one of the things I do is check the consistency of answers between different models and if they mismatch over a certain threshold I might flag that for an extra check by releasing the hounds or something.
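A consistency check of the kind described might be sketched as follows, assuming each model has already produced a top-1 label. `flag_disagreement` and its threshold are hypothetical. Two caveats from the replies apply: a white-box attacker can differentiate through every member at once and push them toward the same wrong label, and a disagreement flag only says that some model erred, not which one.

```python
from collections import Counter


def flag_disagreement(predictions, threshold=0.2):
    """Return (majority label, fraction of models disagreeing with it, flag)."""
    label, votes = Counter(predictions).most_common(1)[0]
    disagreement = 1 - votes / len(predictions)
    # Flag the input for an extra check when too many models dissent.
    return label, disagreement, disagreement > threshold
```

For example, `flag_disagreement(["turtle", "turtle", "rifle"])` reports a disagreement of 1/3 and raises the flag, while three matching labels give 0 and no flag.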

In that context, I think it's of interest how the perturbation features you develop affect networks created with similar technology but different choices of architecture and hyperparameters. Are the foreign perturbations neutral there, or do they have an effect? If there is an effect, to what extent is it consistent? To what extent can they be superposed in a way that is manageable for getting predictable results for different networks simultaneously? What fraction of the available texture area do you need to modify to get a reliable misclassification, and what is the 'perturbation capacity' of the available area? That last one I think is particularly interesting in your case, where presumably you put much more constraint on the texture by requiring that it work from multiple viewpoints.

I totally respect if you, or indeed anyone, can't answer those questions yet, because of focus and stage of research. Personally I have only followed adversarial attacks very superficially so far, because IMO before what you just released it was a point of concern for the mechanics of the ANNs (and inspiration for some good ideas) but for practical purposes more of a curiosity than a demonstrated real concern in applications. (If you're allowed to show people deceptively crafted scenes from exactly the right perspective point they fail too. Just look at Ames rooms. But good luck making that into a significant real-world exploit on humans.)

Any publications you'd care to recommend in the transferability subfield?

I agree - it's a surprising and cool paper. There has been some work done on fooling network ensembles by constructing a Bayesian posterior over weights using dropout [0]. This is an ensemble of weights for the same network, not over different architectures, however.

The basic idea here is that most of the time, each member of the ensemble will misclassify the adversarial example in a different way. This means that the posterior predictive distribution for adversarial examples ends up much broader, and you can detect them this way.
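One simple way to measure how "broad" the predictive distribution is: run several stochastic (dropout-enabled) forward passes, average the class probabilities, and take the entropy of that mean. This is a sketch under that assumption; the exact statistic used in [0] may differ, and the probability vectors in the example are invented.

```python
import math


def predictive_entropy(sample_probs):
    """Entropy (in nats) of the mean of several stochastic forward passes.

    sample_probs: list of class-probability vectors, one per dropout sample."""
    n = len(sample_probs)
    mean = [sum(p[j] for p in sample_probs) / n for j in range(len(sample_probs[0]))]
    entropy = -sum(m * math.log(m) for m in mean if m > 0)
    return entropy, mean
```

Four samples that all say `[0.9, 0.1]` give about 0.33 nats, while samples alternating between `[0.9, 0.1]` and `[0.1, 0.9]` average to `[0.5, 0.5]` and give ln 2 (about 0.69 nats), the maximum for two classes. A detector would threshold this value; per [1], a white-box attacker can still optimize against it.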

Surprisingly, even this can be beaten in the white-box case [1], although it's by far the hardest to beat of the current adversarial defences, and needs much more distortion. It's beaten exactly as the GP says, by differentiating through the averaging. AFAIK no-one's tried architecture ensembling, but I expect it would be vulnerable to the same technique.

[0] https://arxiv.org/abs/1703.00410
[1] https://arxiv.org/abs/1705.07263

Amazing results! To be fair, the baseball looks like crema or coffee-stained milk foam even to me... ;-)


The unperturbed baseball also looks like that -- the model we found is of a dirty baseball :P

We probably should have included a photo of that in our blog post, but you can see our paper if you're interested in seeing what the original looks like.

Awesome! Can I buy an adversarial T-shirt?

It would be a neat trick to fool face scanners, for bonus points the shirt should just have a face of Thomas Crown: https://www.youtube.com/watch?v=C3rDWENRI7c

Or I guess it's more effective to fool the scanners to think you're a dog/cat instead of a potential terrorist (because everyone is one nowadays, right NSA?).

Congratulations to you and your team for inventing what I'd call Neural Network Trolling (NNT), and thus giving us humans a way to escape the tyranny of automated decision systems.

The consequences for the future can be enormous, seriously.

White box adversarial examples are not new. The novelty here is producing physical objects that behave like adversarial examples.

Hi there!

What do you think is currently the best method to fight adversarial weakness?

Would simply generating an augmented training set that includes adversarial examples be sufficient? (In your case, for example, you would include random poses of the adversarial 3D models in the training set.) Or do you think a totally new architecture or training method is necessary to deal with it?

And if it's the latter, do you think this is a symptom of a more general shortcoming of current architectures, or a localized issue?

Thanks, great work btw!

Another author here - adversarial training is not sufficient to protect against white-box attacks but it seems to be the best method we have so far (https://arxiv.org/abs/1706.06083).
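Adversarial training, as in the linked paper, is a min-max problem: minimize the loss on inputs that an adversary has perturbed, within an l_infinity ball, to maximize that loss. The linked work uses projected gradient descent for the inner maximization on deep networks; the sketch below swaps in a logistic-regression toy where the worst-case l_infinity perturbation has a closed form (`-eps * y * sign(w)`), so it only illustrates the shape of the training loop. All names and hyperparameters are hypothetical.

```python
import math


def sign(v):
    return (v > 0) - (v < 0)


def worst_case(x, y, w, eps):
    # For a linear model, the l_inf-bounded perturbation that most decreases the
    # margin y * (w.x + b) moves every coordinate by -eps * y * sign(w_i).
    return [xi - eps * y * sign(wi) for xi, wi in zip(x, w)]


def adversarial_train(data, eps=0.1, lr=0.1, epochs=200):
    """Logistic regression trained on worst-case perturbed inputs (labels in {-1, +1})."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            x_adv = worst_case(x, y, w, eps)                      # inner maximization
            margin = y * (sum(wi * xi for wi, xi in zip(w, x_adv)) + b)
            # Outer minimization: SGD step on log(1 + exp(-margin)).
            coef = -y / (1.0 + math.exp(margin))
            w = [wi - lr * coef * xi for wi, xi in zip(w, x_adv)]
            b -= lr * coef
    return w, b


def robust_accuracy(w, b, data, eps):
    correct = 0
    for x, y in data:
        x_adv = worst_case(x, y, w, eps)
        correct += y * (sum(wi * xi for wi, xi in zip(w, x_adv)) + b) > 0
    return correct / len(data)
```

On a small separable dataset whose margin exceeds `eps`, the trained model should classify every point correctly even after the worst-case perturbation; for deep networks no such closed form exists, which is why the inner step is done iteratively there.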

It seems that all current state of the art architectures are vulnerable to adversarial examples; to the best of my knowledge there are no image classification networks that researchers have failed to reliably produce adversarial examples for.

The best method for making neural networks robust to l_infinity-bounded iterative adversarial attacks is this: https://openreview.net/forum?id=S18Su--CW (although it is currently still under review).

Looks like a very nice paper, thanks.

I see. I ask because there's a common observation that a human (and perhaps an AGI) would never really even run the risk of confusing, say, a turtle with a rifle, which I tend to agree with.

Part of it is mistaking the forest for the trees. It may be just an artifact from the requirements we place on image classifiers (which are very lax) and the way we train them, not anything fundamental.

Indeed, I believe we tend to think with more solid logic, especially when the decision becomes difficult. A DNN will look at the statistics of a feature set and make a judgement based on that. A human can categorically reject certain hypotheses from definitional requirements: a rifle is a weapon. It must have a barrel to guide the projectile and a muzzle for it to exit. It must have some firing mechanism (usually a trigger). Even if at a glance we get confused about exactly what an image is picturing, we can make quick logical judgements on sub-features that make epsilon-misclassifications almost impossible.

A network that acted that way would need some recursive behavior (to implement variable-time classification efficiently): a recursive "logic module" or "language module" plugged into the end of naive feature classification.

You're assuming we couldn't make adversarial examples with white box access to your brain. It's entirely possible that such examples do exist.

We don't need white box access: https://i.imgur.com/mOTHgnfl.jpg

That's an adversarial picture, but not nearly as dramatic as the epsilon-adversarial examples -- and in particular there are no logical inconsistencies (confusing a cat for guacamole or a turtle for a rifle). I don't think anyone doubts that some form of camouflage or isomorphic deception is unavoidable.

I don't doubt there are weaknesses, or even fundamental lack of interpretability for classification spaces that overlap (i.e. have a "morphing sequence"). One example that sticks out from my childhood is this image:

What do you see here?


For my entire childhood I saw a weird face (without thinking too much about it). This is the logo of a Brazilian beer brand. When I was a teenager I saw an ad with two penguins, and then it clicked.

Optical illusions are well documented too; some classic examples:




But it's quite probable we'd have found any glaring adversarial issues by now. After all, artists can conduct a semi-white-box, mostly black-box adversarial optimization of illusions (I'm sure some process like this is how they came up with the Old/Young lady illusion). Note, however, that even those are particular errors like incorrect brightness estimation, or a near-complete dichotomy (it's not that either the young or the old interpretation is incorrect in some sense). An epsilon-failure seems much more difficult to come up with. The distinction seems to lie mostly in the ability to apply basic logic on top of pure pattern recognition, greatly enhancing the decision boundaries through recursive thought. Eventually logical features (a quick "proof" of sorts) win. You "prove" that what you're seeing indeed can't be a rifle; it must be a weird turtle.

This logical approach probably has gradations in power. In general, it might even be algorithmically undecidable whether an image is logically valid. Generalizations of Escher illusions come to mind. In practice, we cut off the decision process once we've found the main logical features and connections, so unless the image inspires superficial uncertainty, a deep logical inconsistency could slip by (one that a more intelligent person might not miss by universally applying a deeper consistency check).

How are you optimizing over textures? Do you have a differentiable renderer? Or are you not calculating those gradients via backpropagation?

Yeah, we implemented a differentiable renderer.
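A differentiable renderer lets the classifier's gradient flow back to the texture. The toy sketch below is not the labsix renderer (which had not been released at the time of this thread); it assumes the mesh and pose are fixed, so each pixel simply samples a 1D texture at a precomputed coordinate `u` with linear interpolation. Because the shape never changes, only the texture receives gradients, and the backward pass is the transpose of the sampling weights. `render`, `render_backward`, and the coordinates are hypothetical.

```python
import math


def render(texture, uv):
    """Render pixels by linearly interpolating a 1D texture at fixed coordinates.

    uv: one coordinate per pixel in [0, len(texture) - 1]; in a real renderer
    these would come from rasterizing the fixed mesh under a sampled pose."""
    image = []
    for u in uv:
        i = min(int(math.floor(u)), len(texture) - 2)
        f = u - i
        image.append((1 - f) * texture[i] + f * texture[i + 1])
    return image


def render_backward(grad_image, uv, n_texels):
    """Map dL/d(image) to dL/d(texture) by scattering each pixel's gradient
    back to the two texels it was interpolated from (same weights as forward)."""
    grad_tex = [0.0] * n_texels
    for g, u in zip(grad_image, uv):
        i = min(int(math.floor(u)), n_texels - 2)
        f = u - i
        grad_tex[i] += (1 - f) * g
        grad_tex[i + 1] += f * g
    return grad_tex
```

In a full attack, `grad_image` would come from backpropagating the classifier loss to the rendered pixels, and the resulting texture gradient would feed an EOT-style update over many sampled poses.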

Please release it! There are a lot of applications for a differentiable renderer that's compatible with DNN libraries.

Yeah, totally agree that it could be useful to the community -- open-sourcing it under a permissive license is on our todo list :)

It's a reasonable amount of effort to disentangle it from the rest of our code and build a nice API, so it will be several weeks till we get around to this.

Please do; this could be a huge contribution to the community.

A very interesting result with big implications for the future AI arms race :) Congratulations and thanks for sharing!

Do you know how exactly the distorted object differs from the original? Is it based on color? lighting? Some subtle difference in texture or material?

I know that for 2D images there currently exists something like this where some sort of pixelated noise (color?) is added to the image to force an incorrect classification.

We only modify the texture; we retain the original shape of the object.

If you look closely and compare the unperturbed original with the rifle turtle, you can spot some differences.

If the adversarial turtle is far enough from the camera that it cannot discriminate the detail of the texture, does it get recognized as a turtle?

Was it an MCor printer? Which prints did you use to correct for color artifacts in the print process?

We got all our prints through zverse.com, and we used full-color CMYK printing.

We didn't correct for color artifacts; we just used a wide distribution with EOT (we just modeled color inaccuracy rather than trying to make a color map or something).
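Modeling color inaccuracy inside EOT, rather than calibrating a color map, might look like the sketch below. It assumes a per-channel affine error model with made-up spreads; the distribution actually used in the paper may differ, and `sample_color_transform` and `apply_color_transform` are hypothetical names. During optimization, one such transform would be sampled per step and applied to the rendered colors before the classifier sees them.

```python
import random


def sample_color_transform(rng, gain_spread=0.15, offset_spread=0.05):
    # Hypothetical printer-error model: each RGB channel gets its own random
    # gain and offset, out = gain * in + offset. The spreads are made-up values.
    gains = [rng.uniform(1 - gain_spread, 1 + gain_spread) for _ in range(3)]
    offsets = [rng.uniform(-offset_spread, offset_spread) for _ in range(3)]
    return gains, offsets


def apply_color_transform(rgb, gains, offsets):
    # Clip to the displayable range [0, 1]; inside that range the map is
    # differentiable (d out / d in = gain), so it can sit inside an EOT loop.
    return [min(max(g * c + o, 0.0), 1.0) for c, g, o in zip(rgb, gains, offsets)]
```

A wider distribution makes the texture robust to more printer error, at the cost of a harder optimization problem.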

If you added the y-axis from your cat figures to your turtle videos, what would we see?

What is your team mascot and why is it Winnie the Pooh dressed as a rain cloud?

The Inception network tested is not trained for multilabel classification. I believe it would be harder to make this work against a multilabel network, such as Inception trained on the Open Images dataset.

What "high risk" real world systems use deep learning?

Maybe self-driving car vision systems? I'm not sure if this is how they're actually implemented, but it would make sense to me.

Well the article asserts this, so I'm asking for clarification on what systems they were thinking of.

I wonder why the network picks a rifle.

Do you have any explanation about that?

It's a targeted adversarial example, we can do this for _any_ target class. We chose for our demo example to turn the turtle into a rifle, which was chosen uniformly at random from the 1000 ImageNet classes.

In our paper, we have many more examples with different target classes, all chosen uniformly at random.

OK. Many thanks for the details.

Have you tried a black-box attack using transfer learning on a production system?

So someday, Hypothetical Nation#1 captures one of Hypothetical Nation#2’s optically-guided missiles that uses a neural network to distinguish friend from foe. N#1 technicians download the network weights and use this to generate perturbatory paintjobs for their fighter jets, making N#2’s missiles recognize N#1’s planes as various models of N#2’s civilian aircraft. Before N#2 can refit their missiles with a retrained neural network, N#1 launches a massive offensive and decisively takes control of the Hypothetical South China Sea, or something.

Do I have that right?

In theory that's possible, but every ML researcher is aware of adversarial examples. In any high-stakes use case with a high chance of adversaries, I doubt people will use neural networks if classical algorithms will suffice, and if they do, they will definitely test and design for the existence of adversarial examples.

Or Uber technicians will download weights from a Google car, and derive an adversarial paintjob for billboards or other Uber cars.

It's more likely something that would make civilian smart tech useless against the top 5-10 militaries.

A military guidance system would overtrain a network using a supercomputer and then cryptographically make it stupider so they could give every missile 5 individually generated networks that vote, forcing attackers to compromise 3 networks per missile. (There may be some back and forth for a while, but here you can create a diversity of networks that makes the ensemble hard to attack.)
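The voting scheme described above could be sketched like this, assuming five independently trained members that each emit a label; `majority_vote` and `flips_needed` are hypothetical helpers. With a strict majority, an attacker has to fool `n // 2 + 1` members (3 of 5), although, as noted upthread, a white-box attacker who holds all the weights can optimize against every member at once.

```python
from collections import Counter


def majority_vote(labels):
    """Return the strict-majority label, or None if no label has more than half the votes."""
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes > len(labels) // 2 else None


def flips_needed(n_models):
    """Minimum number of members an attacker must fool to control a strict-majority vote."""
    return n_models // 2 + 1
```

For five members, two compromised networks leave the vote intact, while a third flips it.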

So they'll just shoot them down with lasers cause that's less energy intensive than running the compute node to crack missiles in real time.

You just need a second neural net that classifies adversarial and non-adversarial objects.

And of course a third network to classify the adversarial objects that slip through the second.

Reminds me of the record player chapter from GEB [1]

[1] https://genius.com/Douglas-hofstadter-contracrostipunctus-an...

After a couple of iterations, an adversarial object will become an adversarial object for humans too. That is, a military plane disguised as a civilian plane. That is probably forbidden by some convention.

I am beginning to realize that neural networks have their own class of “vulnerabilities” that are not the same as other software bugs (implementation errors, etc) but are at the same time serious functional flaws. Like “oh I found the bug in your program! Here you import an older CNN, which last year was found to silently fail under this specific set of lighting and lens conditions. You need to update to the latest version and the problem will go away.”

One of the major structural problems with using neural networks in production is that all failures are silent failures. You could create an ensemble of models that base their conclusions on reliably different features and report on disagreements with the ensemble decision, but that doesn't actually tell you which models were in error, just that some models were in error. Also, reliably determining that models are using different features is difficult.

Get rich slow scheme: take out a patent for clothing embedded with adversarial objects. Fashion which confuses our robot overlords is almost certain to become chic one day in the not too distant future.

On the flip side, someone could use this as a feature: you can create hidden messages in 3D objects that are only revealed through a neural net's wrong classification.

Please correct me if I am interpreting this incorrectly. I read the paper and it sounds like you retrained the softmax layer on Inception to classify the 3-D printed turtle as a rifle. In that case, you would have overwritten Inception's original representation of what a rifle looks like. Did you test out what would happen if you put a picture of a rifle in front of the camera? How would the network now classify the rifle?

They're not changing the original network. That would not be very interesting. They're generating objects that fool the correctly trained network.

>given access to the classifier gradient, we can make adversarial examples

It seems like they are finding little "inflection points" in the trained network where a small, well-placed change of input can flip the result to something different. With the rise of "AI for AI", I imagine this is something that could be optimized against.
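One standard way to see why a small, well-placed change can flip the result is the linearity argument behind the fast gradient sign method (FGSM): in high dimensions, many tiny per-coordinate changes aligned with the gradient add up. The sketch assumes a toy linear classifier; `linear_score` and `fgsm_linear` are hypothetical names, and real attacks on deep networks use the network's actual gradient.

```python
def sign(v):
    return (v > 0) - (v < 0)


def linear_score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def fgsm_linear(w, x, y, eps):
    """One fast gradient sign step against a linear classifier, label y in {-1, +1}.

    The gradient of the margin y * (w.x + b) w.r.t. x is y * w, so moving each
    coordinate by -eps * y * sign(w_i) lowers the margin by eps * sum(|w_i|),
    the largest drop achievable with an l_inf step of size eps."""
    return [xi - eps * y * sign(wi) for xi, wi in zip(x, w)]
```

With 1000 inputs, `w_i = 0.01`, and `x_i = 0.001`, a per-coordinate change of only 0.002 moves the score from about +0.01 to about -0.01, flipping the label.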

In the turtle example, it seems that Google's classifier has found that looking for a couple of specific things (mostly a trigger, in this case) identifies a gun better than looking for the entirety of a gun. Perhaps optimizing against these inflection points will force the classifier to develop a better understanding of the objects it is classifying and lead to better results in non-adversarial situations.

Reading this article alongside the following one is striking: https://blogs.nvidia.com/blog/2017/11/01/gtc-dc-project-mave...

Crazy to think we've built optical software smart enough to suffer from its own kind of optical illusions, which is effectively what these models are.

> smart enough

I'd suggest that if it was "smart enough", it wouldn't be mistaking a turtle for a rifle.

Well, we are the greatest intelligence yet discovered in the universe and can't tell what color a dress is [1].

[1]: https://en.wikipedia.org/wiki/The_dress

Couldn't you have obtained the same result by painting a rifle on the back of the turtle?

If I stick a picture of a dog on a car and my neural net detects a dog instead of a car, can I claim that I've invented an adversarial generator?

A bunch of armchair devil's advocating here, but is it really the NN that's fooled or the humans? The adversarial turtle isn't a real turtle, so the human is wrong in judging it as that. The NN is presumably seeing features of a rifle camouflaged in the surface of the object - which are really there but our human brain decides the turtle-ness is more important and is very confident that it's only a turtle despite having a rifle stock on it. Since a real turtle would never have those markings, it's not obvious to me that this object should be called a turtle. The NN could be doing a super-human job of detecting that it's not a turtle, but fails in identifying what it really is. Maybe this weakness of the NN would actually make it perform better than a human at picking out camouflaged objects where humans are distracted by the shape of the outline but the NN looks more at the texture.

> is it really the NN that's fooled or the humans?

Given that the object is clearly more like a turtle than a rifle in every regard, I'm gonna give a win to team humans on this one.

'Course, I can't help but be a biased referee...

> Maybe this weakness of the NN would actually make it perform better than a human at picking out camouflaged objects

Well, it failed just now, against a camouflaged turtle.

To make your experiment more interesting, you could theoretically (without torturing it) paint over a real, living turtle. Who would be more wrong then? But this work gives a bunch of hope for fooling future security-check machines - these will see your rifles as turtles.

Would you say the same after being shot by a pistol painted to classify as a lunchbox?

Please don't talk about murdering the person you're talking to. It's intended to provoke painful emotion.

I'll clarify my comment though. The object is really a model that's shaped like a turtle but with pictures of rifle parts on it. It's neither a rifle nor a turtle. Both human and computer are too confident in their classification and both are just as wrong.

Using your analogy: you could actually hide a gun inside a lunchbox and fool humans.

I did not intend to provoke painful emotion.

>It's neither a rifle nor a turtle.

I disagree. Would you also say the pistol is no longer a pistol?

That doesn't follow because the pistol painted to classify as a lunch box is still a pistol.

The "Turtle" is a plastic replica of a turtle not a real turtle. It's the treachery of images idea - "Ceci n'est pas une pipe."

Humans see its form and recognize the plastic replica to be a representation of a turtle because we prioritize its shape over its textured image which seems more correct to us, but I'm not sure that it really is more correct in some objective way. In this case I suppose you could say it is because a turtle is what we mean for it to represent, but the test seems rigged in favor of human visual classification.

I think an interesting question is what adversarial attacks exist on human vision that may not affect a machine (certain optical illusions?). If we're also vulnerable to this kind of manipulation, then it may not be something unique to computer vision; we may just be picking test cases that we're better at. Then it's just a matter of tradeoffs and deciding when we want human-style classification.

It's a plastic replica of a turtle with an artificial rifle texture.

The human's error is in missing the texture. The computer's error is worse, it misses the turtle and thinks the texture is an actual rifle.

I agree, but it's an unfair test - it was designed to confuse the computer and not the human.

For a counter example - imagine that you make a toaster that looks exactly like a pistol, but it actually just toasts bread.

A human would think it's a pistol when looking at it (so would the machine in this case). There may be adversarial examples where the human classification is worse than the machine if you specifically try and make examples that are bad for the human visual system.

Can you explain why it thinks the turtle is a rifle?

If you freeze the frame and look at the shell closely, there's a warped trigger and stock. On the underside, the fins have gun features as well.
