Recently, there's been some debate about whether or not adversarial examples are a problem in the real world, and our research shows that this is a real concern, at least with current neural network architectures (nobody has managed to solve the problem of white-box adversarial examples yet).
I'm happy to answer any questions that anyone has!
Have you looked at whether the perturbations tend to do anything to networks other than the one they're optimized for?
If I ran, for example, VGG-19 and ResNet-50 alongside Inception V3, could I still be reasonably sure of getting a correct best-of-three vote if I pointed the camera at your models?
Do you think your algorithm would support optimizing to fool several known networks in the same way simultaneously?
I expect it wouldn't be difficult to defeat an ensemble, though (e.g. if you're averaging predictions, you can just differentiate through the averaging).
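To make that concrete, "differentiating through the averaging" amounts to running a standard iterative attack where the loss is taken on the ensemble's averaged prediction. A rough sketch (the `models` list, the `pgd_on_ensemble` helper, and the attack parameters are all hypothetical, just to illustrate the idea):

```python
import torch
import torch.nn.functional as F

def pgd_on_ensemble(models, x, target, eps=8/255, alpha=1/255, steps=40):
    """Targeted PGD where the loss is computed on the averaged softmax of all members."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        # Average the predictive distributions -- exactly what the ensemble reports.
        probs = torch.stack([F.softmax(m(x_adv), dim=1) for m in models]).mean(0)
        loss = F.nll_loss(torch.log(probs + 1e-12), target)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Step towards the target class, then project back into the epsilon-ball.
        x_adv = x_adv.detach() - alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)
        x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()
```

Since the average is differentiable, the ensemble behaves just like one bigger model as far as the attack is concerned.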
I share your expectation for an averaging ensemble, which is after all still, for most purposes, a single model. But let's say I'm concerned precisely about people trying to fool my networks like this, and one of the things I do is check the consistency of answers between different models; if they mismatch over a certain threshold, I might flag that for an extra check, release the hounds, or something.
In that context I think it's of interest how the perturbation features you develop affect networks created with similar technology but different choices of architecture and hyperparameters. Are the foreign perturbations neutral there, or do they have an effect? If there is an effect, to what extent is it consistent?
To what extent can they be superposed in a way that is manageable for getting predictable results for different networks simultaneously?
What fraction of the available texture area do you need to affect to get a reliable misclassification, and what is the 'perturbation capacity' of the available area?
That last one I think is particularly interesting in your case, where presumably you place much stronger constraints on the texture by requiring that it work for multiple viewpoints.
I totally respect if you, or indeed anyone, can't answer those questions yet, because of focus and stage of research. Personally I have only followed adversarial attacks very superficially so far, because IMO, before what you just released, it was a point of concern for the mechanics of ANNs (and inspiration for some good ideas) but for practical purposes more of a curiosity than a demonstrated real concern in applications. (If you're allowed to show people deceptively crafted scenes from exactly the right perspective point, they fail too. Just look at Ames rooms. But good luck making that into a significant real-world exploit on humans.)
Any publications you'd care to recommend in the transferability subfield?
The basic idea here is that most of the time, each member of the ensemble will misclassify the adversarial example in a different way. This means that the posterior predictive distribution for adversarial examples ends up much broader, and you can detect them this way.
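Roughly, the check looks like this (hypothetical helper; the entropy threshold would have to be calibrated on clean data):

```python
import torch
import torch.nn.functional as F

def flag_suspicious(models, x, entropy_threshold=1.0):
    """Flag inputs whose averaged ensemble prediction is spread over many classes."""
    with torch.no_grad():
        probs = torch.stack([F.softmax(m(x), dim=1) for m in models]).mean(0)
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
    # High entropy means the members disagree, i.e. the posterior predictive is broad.
    return entropy > entropy_threshold
```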
Surprisingly, even this can be beaten in the white-box case, although it's by far the hardest of the current adversarial defences to beat, and needs much more distortion. It's beaten exactly as the GP says, by differentiating through the averaging. AFAIK no-one's tried architecture ensembling, but I expect it would be vulnerable to the same technique.
The unperturbed baseball also looks like that -- the model we found is of a dirty baseball :P
We probably should have included a photo of that in our blog post, but you can see our paper if you're interested in seeing what the original looks like.
Or I guess it's more effective to fool the scanners into thinking you're a dog/cat instead of a potential terrorist (because everyone is one nowadays, right NSA?).
The consequences for the future can be enormous, seriously.
What do you think are currently the best methods to fight adversarial weakness?
Would simply generating an augmented training set including adversarial examples be sufficient? (In your case, for example, you would include random poses of the adversarial 3D models in the training set; I sketch roughly what I mean at the end of this comment.) Or do you think a totally new architecture or training method is necessary to deal with it?
And if it's the latter, do you think this is a symptom of a more general shortcoming of current architectures, or a localized issue?
Thanks, great work btw!
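To be concrete about the kind of augmentation I mean, roughly something like this, where `fgsm_perturb` is just a stand-in for whatever attack (or adversarial-pose rendering) you'd actually generate the examples with:

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, eps=8/255):
    """One-step adversarial perturbation of a clean batch (stand-in for a real attack)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    return (x + eps * grad.sign()).clamp(0, 1).detach()

def train_step(model, optimizer, x, y):
    """Train on the clean batch and its adversarially perturbed copy together."""
    x_adv = fgsm_perturb(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(torch.cat([x, x_adv])), torch.cat([y, y]))
    loss.backward()
    optimizer.step()
    return loss.item()
```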
It seems that all current state of the art architectures are vulnerable to adversarial examples; to the best of my knowledge there are no image classification networks that researchers have failed to reliably produce adversarial examples for.
Part of it is mistaking the forest for the trees. It may be just an artifact from the requirements we place on image classifiers (which are very lax) and the way we train them, not anything fundamental.
Indeed, I believe we tend to think with more solid logic, especially when the decision becomes difficult. A DNN will look at the statistics of a feature set and make a judgement upon that. A human can categorically reject certain hypotheses from definitional requirements: a rifle is a weapon. It must have a barrel to guide the projectile, and a muzzle for it to exit. It must have some firing mechanism (usually a trigger). Even if at a glance we get confused about exactly what an image is picturing, we can make quick logical judgements on sub-features that make epsilon-misclassifications almost impossible.
A network that would act that way would need some recursive behavior (to implement the variable-time classification efficiently): a recursive "logic module" or "language module" plugged into the end of naive feature classification.
What do you see here?
My entire childhood I saw a weird face (without thinking too much about it). This is the logo of a Brazilian beer brand. When I was a teenager I saw an ad with two penguins, and then it clicked.
Optical illusions are well documented too; some classic examples:
But it's quite probable we'd have found any glaring adversarial issues by now. After all, artists can conduct a semi-whitebox, mostly blackbox adversarial optimization of illusions (I'm sure some process like this is how they came up with the Old/Young Lady illusion). Note, however, that even those are particular errors like incorrect brightness estimation, or a near-complete dichotomy (it's not that either the young or the old interpretation is incorrect in some sense). An epsilon-failure seems much more difficult to come up with. The distinction seems mostly to be in the ability to apply basic logic on top of pure pattern recognition, sort of greatly sharpening the decision boundaries through recursive thought. Eventually the logical features (a quick "proof" of sorts) win: you "prove" that what you're seeing can't be a rifle, so it must be a weird turtle.
This logical approach probably has gradations in power. In general it might even be algorithmically undecidable whether an image is logically valid. Generalizations of Escher illusions:
come to mind. In practice we cut off the decision process when we've found the main logical features and connections, so unless the image inspires superficial uncertainty, a deep logical inconsistency could slip by (one that a more intelligent person might not miss by universally applying a deeper consistency check).
It's a reasonable amount of effort to disentangle it from the rest of our code and build a nice API, so it will be several weeks till we get around to this.
I know that for 2D images there currently exists something like this where some sort of pixelated noise (color?) is added to the image to force an incorrect classification.
If you look closely and compare the unperturbed original with the rifle turtle, you can spot some differences.
We didn't correct for color artifacts; we just used a wide distribution with EOT (modeling color inaccuracy rather than trying to make a color map or something).
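To unpack "wide distribution with EOT" a little: you optimize the expected loss over transformations sampled fresh at every step, and colour inaccuracy just becomes one more transformation in that distribution. A minimal sketch, with an illustrative transformation distribution and parameters rather than the exact ones we used:

```python
import torch
import torch.nn.functional as F

def random_transform(x):
    """Crude stand-in for the rendering/printing pipeline: random brightness
    scaling plus a random per-channel colour shift."""
    brightness = 1.0 + 0.3 * (torch.rand(x.size(0), 1, 1, 1, device=x.device) - 0.5)
    colour_shift = 0.1 * (torch.rand(x.size(0), 3, 1, 1, device=x.device) - 0.5)
    return (x * brightness + colour_shift).clamp(0, 1)

def eot_attack(model, x, target, eps=16/255, alpha=1/255, steps=200, samples=10):
    """Targeted attack minimizing the expected loss over sampled transformations."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        # Monte Carlo estimate of the expectation over the transformation distribution.
        loss = sum(F.cross_entropy(model(random_transform((x + delta).clamp(0, 1))), target)
                   for _ in range(samples)) / samples
        grad = torch.autograd.grad(loss, delta)[0]
        with torch.no_grad():
            delta -= alpha * grad.sign()   # move towards the target class
            delta.clamp_(-eps, eps)        # keep the perturbation bounded
    return (x + delta).detach().clamp(0, 1)
```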
Do you have any explanation about that?
In our paper, we have many more examples with different target classes, all chosen uniformly at random.
Do I have that right?
A military guidance system would overtrain a network using a supercomputer and then cryptographically make it stupider so they could give every missile 5 individually generated networks that vote, forcing attackers to compromise 3 networks per missile. (There may be some back and forth for a while, but here you can create a diversity of networks that makes the ensemble hard to attack.)
So they'll just shoot them down with lasers cause that's less energy intensive than running the compute node to crack missiles in real time.
Reminds me of the record player chapter from GEB 
Given that the object is clearly more like a turtle than a rifle in every regard, I'm gonna give a win to team humans on this one.
'Course, I can't help but be a biased referee...
> Maybe this weakness of the NN would actually make it perform better than a human at picking out camouflaged objects
Well, it failed just now, against a camouflaged turtle.
I'll clarify my comment though. The object is really a model that's shaped like a turtle but with pictures of rifle parts on it. It's neither a rifle nor a turtle. Both human and computer are too confident in their classification and both are just as wrong.
Using your analogy: you could actually hide a gun inside a lunchbox and fool humans.
>It's neither a rifle nor a turtle.
I disagree. Would you also say the pistol is no longer a pistol?
The "Turtle" is a plastic replica of a turtle not a real turtle. It's the treachery of images idea - "Ceci n'est pas une pipe."
Humans see its form and recognize the plastic replica to be a representation of a turtle because we prioritize its shape over its textured surface, which seems more correct to us, but I'm not sure that it really is more correct in some objective way. In this case I suppose you could say it is, because a turtle is what we mean for it to represent, but the test seems rigged in favor of human visual classification.
I think an interesting question is what adversarial attacks exist on human vision that may not affect a machine (certain optical illusions?). If we're also vulnerable to this kind of manipulation, then it may not be something unique to computer vision; we may just be picking test cases that we're better at. Then it's just a matter of tradeoffs and deciding when we want human-style classification.
The human's error is in missing the texture. The computer's error is worse: it misses the turtle and thinks the texture is an actual rifle.
For a counterexample, imagine that you make a toaster that looks exactly like a pistol, but it actually just toasts bread.
A human would think it's a pistol when looking at it (so would the machine in this case). There may be adversarial examples where the human classification is worse than the machine's if you specifically try to make examples that are bad for the human visual system.
It seems like they are finding little "inflection points" in the trained network where a small, well-placed change of input can flip the result to something different. With the rise of "AI for AI", I imagine this is something that could be optimized against.
In the turtle example, it seems that Google's classifier has found that looking for a couple of specific things (mostly a trigger in this case) identifies a gun better than looking for the entirety of a gun. Perhaps optimizing against these inflection points will force the classifier to have a better understanding of the objects it is classifying and lead to better results in non-adversarial situations.
I'd suggest that if it was "smart enough", it wouldn't be mistaking a turtle for a rifle.