
Fooling Neural Networks in the Physical World with 3D Adversarial Objects - anishathalye
http://www.labsix.org/physical-objects-that-fool-neural-nets/
======
anishathalye
Hi HN! I'm one of the researchers who produced this result: we figured out
how to make 3D adversarial objects (currently fabricated using full-color 3D
printing technology) that consistently fool neural networks in the physical
world. Basically, we have an algorithm that can take any 3D object and perturb
it so that it tricks a classifier into thinking it's something else (for any
given target class).
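
In case it's helpful, here's roughly what the optimization looks like. The key
idea (Expectation Over Transformation, EOT) is to optimize the perturbation
against the classifier's expected output over a whole distribution of
viewpoints and lighting conditions, not a single rendering. A heavily
simplified PyTorch-style sketch, not our actual implementation; `render`,
`sample_transform`, and the hyperparameters are stand-ins:

```python
import torch

def eot_attack(model, texture, render, sample_transform, target_class,
               steps=1000, batch=8, lr=1e-2, eps=0.05):
    # `model` is a differentiable classifier (image -> logits, batch of 1),
    # `render(texture, t)` is a differentiable renderer, and
    # `sample_transform()` draws a random pose/lighting/camera transform.
    original = texture.detach().clone()
    texture = texture.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([texture], lr=lr)

    for _ in range(steps):
        # Monte Carlo estimate of the *expected* target-class log-prob
        # over random transformations; this is what makes the result
        # hold up from many viewpoints instead of just one.
        loss = 0.0
        for _ in range(batch):
            t = sample_transform()
            logits = model(render(texture, t))
            loss = loss - torch.log_softmax(logits, dim=-1)[0, target_class]
        opt.zero_grad()
        (loss / batch).backward()
        opt.step()
        # Keep the texture close to the original so the object still looks
        # right to a human (a simple L-inf clamp here; the real constraint
        # can be softer / perceptual).
        with torch.no_grad():
            texture.clamp_(original - eps, original + eps)
    return texture.detach()
```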

Recently, there's been some debate about whether or not adversarial examples
are a problem in the real world, and our research shows that this is a real
concern, at least with current neural network architectures (nobody has
managed to solve the problem of white-box adversarial examples yet).

I'm happy to answer any questions that anyone has!

~~~
etiam
This was a wondrous read. I had not expected these things to be significant in
a real-world context with a free viewpoint.

Have you looked at whether the perturbations tend to do something to networks
other than the one they're optimized for? If I ran, for example, VGG-19 and
ResNet-50 alongside Inception v3, could I still be reasonably sure of getting
a correct best-of-three vote if I pointed the camera at your models?

Do you think your algorithm would support optimizing to fool several known
networks in the same way simultaneously?

~~~
andrewilyas
What we propose in our paper is a white-box attack: given access to the
classifier gradient, we can make adversarial examples. We don't investigate
the "transferability" of our adversarial examples to other networks, though
other work in this area has explored that problem (although not, of course, in
the EOT / 3D adversarial example case).

I expect it wouldn't be difficult to defeat an ensemble, though (e.g. if
you're averaging predictions, you can just differentiate through the
averaging).
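
For example, if the ensemble averages softmax outputs, the average is itself
differentiable, so you can attack it like any single model. A toy PGD-style
sketch (not from our paper; the hyperparameters are made up):

```python
import torch

def attack_averaging_ensemble(models, x, target_class,
                              steps=100, lr=0.01, eps=0.03):
    # Treat the whole ensemble as one differentiable function:
    # the mean of the members' softmax outputs.
    x_orig = x.detach().clone()
    x_adv = x.detach().clone().requires_grad_(True)
    for _ in range(steps):
        probs = torch.stack(
            [torch.softmax(m(x_adv), dim=-1) for m in models]
        ).mean(dim=0)
        loss = -torch.log(probs[0, target_class])
        loss.backward()
        with torch.no_grad():
            x_adv -= lr * x_adv.grad.sign()           # step toward target class
            x_adv.clamp_(x_orig - eps, x_orig + eps)  # keep perturbation small
            x_adv.clamp_(0, 1)                        # stay a valid image
        x_adv.grad = None
    return x_adv.detach()
```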

~~~
etiam
Apologies if I bring up things that could be found in the paper on a more
careful read, but I probably won't have time to go through it properly before
this HN post goes cold. I think I have a fairly good notion of what you're
doing, though.

I share your expectations for an averaging ensemble, which is after all still
for most purposes a single model. But let's say I'm concerned precisely about
people trying to fool my networks like this, and one of the things I do is
check the consistency of answers between different models; if they mismatch
over a certain threshold, I might flag that for an extra check by releasing
the hounds or something.

In that context I think it's of interest how the perturbation features you
develop affect networks created with similar technology but different choices
of architecture and hyperparameters. Are the foreign perturbations neutral
there or do they have an effect? If there is an effect, to what extent is it
consistent? To what extent can they be superposed in a way that is manageable
for getting predictable results for different networks simultaneously? What
fraction of the available texture area do you need to affect to get a reliable
misclassification, and what is the 'perturbation capacity' of the available
area? That last one I think is particularly interesting in your case where
presumably you put much more constraint on the texture by requiring that it
works for multiple viewpoints.

I totally respect it if you, or indeed anyone, can't answer those questions
yet, because of focus and the stage of the research. Personally, I have only
followed adversarial attacks very superficially so far, because IMO, before
what you just released, they were a point of concern for the mechanics of ANNs
(and inspiration for some good ideas), but for practical purposes more of a
curiosity than a demonstrated real concern in applications. (If you're allowed
to show people deceptively crafted scenes from _exactly_ the right perspective
point, they fail too. Just look at Ames rooms. But good luck making that into
a significant real-world exploit on humans.)

Any publications you'd care to recommend in the transferability subfield?

~~~
tMcGrath
I agree - it's a surprising and cool paper. There has been some work done on
fooling network ensembles by constructing a Bayesian posterior over weights
using dropout [0]. This is an ensemble of weights for the same network,
however, not over different architectures.

The basic idea here is that most of the time, each member of the ensemble will
misclassify the adversarial example in a different way. This means that the
posterior predictive distribution for adversarial examples ends up much
broader, and you can detect them this way.

Surprisingly, even this can be beaten in the white-box case [1], although it's
by far the hardest to beat of the current adversarial defences, and needs much
more distortion. It's beaten exactly as the GP says, by differentiating
through the averaging. AFAIK no-one's tried architecture ensembling, but I
expect it would be vulnerable to the same technique.
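
To make the detection idea concrete, here's a rough sketch of the
dropout-ensemble test (the sample count and the `model.train()` shortcut are
illustrative, not the exact method in [0]):

```python
import torch

def mc_dropout_entropy(model, x, n_samples=50):
    # Monte Carlo dropout: keep dropout active at test time and average
    # the softmax output over several stochastic forward passes.
    # (model.train() is a shortcut that also affects batchnorm; in
    # practice you'd enable only the dropout layers.)
    model.train()
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(n_samples)]
        ).mean(dim=0)
    # A broad (high-entropy) predictive distribution means the sampled
    # members disagree, which hints the input may be adversarial.
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
```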

[0] [https://arxiv.org/abs/1703.00410](https://arxiv.org/abs/1703.00410)
[1] [https://arxiv.org/abs/1705.07263](https://arxiv.org/abs/1705.07263)

------
maxander
So someday, Hypothetical Nation#1 captures one of Hypothetical Nation#2’s
optically-guided missiles that uses a neural network to distinguish friend
from foe. N#1 technicians download the network weights and use this to
generate perturbatory paintjobs for their fighter jets, making N#2’s missiles
recognize N#1’s planes as various models of N#2’s civilian aircraft. Before
N#2 can refit their missiles with a retrained neural network, N#1 launches a
massive offensive and decisively takes control of the Hypothetical South China
Sea, or something.

Do I have that right?

~~~
cortesoft
You just need a second neural net that classifies adversarial and non-
adversarial objects.

~~~
kkon
And of course a third network to classify the adversarial objects that slip
through the second.

Reminds me of the record player chapter from GEB [1]

[1] [https://genius.com/Douglas-hofstadter-contracrostipunctus-an...](https://genius.com/Douglas-hofstadter-contracrostipunctus-annotated)

~~~
red75prime
After a couple of iterations, an adversarial object would become adversarial
for humans too: that is, a military plane disguised as a civilian plane. That
is probably forbidden by some convention.

------
TaylorAlexander
I am beginning to realize that neural networks have their own class of
“vulnerabilities” that are not the same as other software bugs (implementation
errors, etc) but are at the same time serious functional flaws. Like “oh I
found the bug in your program! Here you import an older CNN, which last year
was found to silently fail under this specific set of lighting and lens
conditions. You need to update to the latest version and the problem will go
away.”

~~~
dodobirdlord
One of the major structural problems with using neural networks in production
is that all failures are silent failures. You could create an ensemble of
models that base their conclusions on reliably different features and report
on disagreements with the ensemble decision, but that doesn't actually tell
you which models were in error, just that some models were in error. Also,
reliably determining that models are using different features is difficult.
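
A sketch of what that disagreement check could look like (the threshold and
model list are hypothetical); note it only tells you the ensemble disagrees,
not who's right:

```python
import torch

def flag_disagreement(models, x, max_disagree=1):
    # Compare each member's predicted label against the majority vote and
    # flag the input for extra review if too many members dissent.
    with torch.no_grad():
        labels = [m(x).argmax(dim=-1).item() for m in models]
    majority = max(set(labels), key=labels.count)
    dissent = sum(1 for l in labels if l != majority)
    return dissent > max_disagree, majority
```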

------
Tomminn
Get rich slow scheme: take out a patent for clothing embedded with adversarial
objects. Fashion which confuses our robot overlords is almost certain to
become chic one day in the not too distant future.

------
blancotech
On the flip side, someone could use this as a feature: you could create hidden
messages in 3D objects that are only revealed by a neural net's
misclassification.

------
rgoti
Please correct me if I am interpreting this incorrectly. I read the paper and
it sounds like you retrained the softmax layer on Inception to classify the
3-D printed turtle as a rifle. In that case, you would have overwritten
Inception's original representation of what a rifle looks like. Did you test
out what would happen if you put a picture of a rifle in front of the camera?
How would the network now classify the rifle?

~~~
readams
They're not changing the original network. That would not be very interesting.
They're generating objects that fool the correctly trained network.

~~~
jon_richards
>given access to the classifier gradient, we can make adversarial examples

It seems like they are finding little "inflection points" in the trained
network where a small, well-placed change of input can flip the result to
something different. With the rise of "AI for AI", I imagine this is something
that could be optimized against.

In the turtle example, it seems that Google's classifier has found that
looking for a couple of specific things (mostly a trigger, in this case)
identifies a gun better than looking for the entirety of a gun. Perhaps
optimizing against these inflection points will force the classifier to have a
better understanding of the objects it is classifying and lead to better
results in non-adversarial situations.
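
The standard version of "optimizing against these inflection points" is
adversarial training (Goodfellow et al.): generate adversarial examples on the
fly and train on them. A minimal FGSM-based sketch (hyperparameters invented):

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, opt, x, y, eps=0.03):
    # 1. Find a worst-case perturbation of the batch (one FGSM step).
    x_pert = x.detach().clone().requires_grad_(True)
    loss = F.cross_entropy(model(x_pert), y)
    grad, = torch.autograd.grad(loss, x_pert)
    x_adv = (x_pert + eps * grad.sign()).clamp(0, 1).detach()

    # 2. Train on the perturbed batch, smoothing out the "inflection
    #    points" in the immediate neighborhood of the data.
    opt.zero_grad()
    F.cross_entropy(model(x_adv), y).backward()
    opt.step()
```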

------
scalablenotions
Reading this article alongside the following one is striking:
[https://blogs.nvidia.com/blog/2017/11/01/gtc-dc-project-mave...](https://blogs.nvidia.com/blog/2017/11/01/gtc-dc-project-maven-jack-shanahan/)

------
munificent
Crazy to think we've built optical software smart enough to suffer from its
own kind of optical illusions, which is effectively what these models are.

~~~
zimpenfish
> smart enough

I'd suggest that if it was "smart enough", it wouldn't be mistaking a turtle
for a rifle.

~~~
munificent
Well, we are the greatest intelligence yet discovered in the universe and
can't tell what color a dress is [1].

[1]:
[https://en.wikipedia.org/wiki/The_dress](https://en.wikipedia.org/wiki/The_dress)

------
amelius
Couldn't you have obtained the same result by painting a rifle on the back of
the turtle?

------
tomdre
If I stick a picture of a dog on a car and my neural net detects a dog instead
of a car, can I claim that I've invented an adversarial generator?

------
averagewall
A bunch of armchair devil's advocating here, but is it really the NN that's
fooled or the humans? The adversarial turtle isn't a real turtle, so the human
is wrong in judging it as that. The NN is presumably seeing features of a
rifle camouflaged in the surface of the object - which are really there but
our human brain decides the turtle-ness is more important and is very
confident that it's only a turtle despite having a rifle stock on it. Since a
real turtle would never have those markings, it's not obvious to me that this
object should be called a turtle. The NN could be doing a super-human job of
detecting that it's not a turtle, but fails in identifying what it really is.
Maybe this weakness of the NN would actually make it perform better than a
human at picking out camouflaged objects where humans are distracted by the
shape of the outline but the NN looks more at the texture.

~~~
hnaccy
Would you say the same after being shot by a pistol painted to classify as a
lunchbox?

~~~
averagewall
Please don't talk about murdering the person you're talking to. It's intended
to provoke painful emotion.

I'll clarify my comment though. The object is really a model that's shaped
like a turtle but with pictures of rifle parts on it. It's neither a rifle nor
a turtle. Both human and computer are too confident in their classification
and both are just as wrong.

Using your analogy: you could actually hide a gun inside a lunchbox and fool
humans.

~~~
hnaccy
I did not intend to provoke painful emotion.

>It's neither a rifle nor a turtle.

I disagree. Would you also say the pistol is no longer a pistol?

~~~
fossuser
That doesn't follow because the pistol painted to classify as a lunch box is
still a pistol.

The "Turtle" is a plastic replica of a turtle not a real turtle. It's the
treachery of images idea - "Ceci n'est pas une pipe."

Humans see its form and recognize the plastic replica as a representation of a
turtle because we prioritize its shape over its texture, which seems more
correct to us, but I'm not sure that it really _is_ more correct in some
objective way. In this case I suppose you could say it is, because a turtle is
what we mean for it to represent, but the test seems rigged in favor of human
visual classification.

I think an interesting question is what adversarial attacks exist on human
vision that may not affect a machine (certain optical illusions?). If we're
also vulnerable to this kind of manipulation then it may not be something
unique to computer vision we may just be picking test cases that we're better
at. Then it's just a matter of tradeoffs and deciding when we want human style
classification.

~~~
hnaccy
It's a plastic replica of a turtle with an artificial rifle texture.

The human's error is in missing the texture. The computer's error is worse: it
misses the turtle and thinks the texture is an actual rifle.

~~~
fossuser
I agree, but it's an unfair test - it was designed to confuse the computer and
not the human.

For a counterexample, imagine that you make a toaster that looks exactly like
a pistol, but actually just toasts bread.

A human would think it's a pistol when looking at it (so would the machine in
this case). There may be adversarial examples where the human classification
is worse than the machine's, if you specifically try to make examples that are
bad for the human visual system.

------
Danihan
Can you explain why it thinks the turtle is a rifle?

~~~
nametube
If you freeze the frame and look at the shell closely, there's a warped
trigger and stock. On the underside, the fins have gun features as well.

