
Universal adversarial perturbations - legatus
https://arxiv.org/abs/1610.08401
======
legatus
Abstract: Given a state-of-the-art deep neural network classifier, we show the
existence of a universal (image-agnostic) and very small perturbation vector
that causes natural images to be misclassified with high probability. We
propose a systematic algorithm for computing universal perturbations, and show
that state-of-the-art deep neural networks are highly vulnerable to such
perturbations, albeit being quasi-imperceptible to the human eye. We further
empirically analyze these universal perturbations and show, in particular,
that they generalize very well across neural networks. The surprising
existence of universal perturbations reveals important geometric correlations
among the high-dimensional decision boundary of classifiers. It further
outlines potential security breaches with the existence of single directions
in the input space that adversaries can possibly exploit to break a classifier
on most natural images.
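
The paper's procedure is iterative: it loops over images, and whenever the current perturbation v fails to fool an image, it adds the minimal extra perturbation that pushes that image across the decision boundary, then projects v back onto a small lp ball. A rough Python-style sketch of that outer loop (the `classify` and `minimal_perturbation` routines are placeholders, not the authors' code):

    import numpy as np

    def project_lp(v, xi, p=np.inf):
        # Project v onto the l_p ball of radius xi (the paper uses p = 2 or inf).
        if p == np.inf:
            return np.clip(v, -xi, xi)
        return v * min(1.0, xi / (np.linalg.norm(v.ravel()) + 1e-12))

    def universal_perturbation(images, classify, minimal_perturbation,
                               xi=10.0, p=np.inf, target_fooling_rate=0.8,
                               max_epochs=10):
        # classify(x) -> label; minimal_perturbation(x) -> a small image-specific
        # perturbation that flips the label of x (e.g. a DeepFool-style attack).
        v = np.zeros_like(images[0])
        for _ in range(max_epochs):
            np.random.shuffle(images)
            for x in images:
                if classify(x + v) == classify(x):    # v does not fool this image yet
                    dv = minimal_perturbation(x + v)  # push x + v across the boundary
                    v = project_lp(v + dv, xi, p)     # keep the aggregate perturbation small
            fooled = np.mean([classify(x + v) != classify(x) for x in images])
            if fooled >= target_fooling_rate:
                break
        return v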

~~~
kough
Super interesting. I'm on mobile and haven't had time to read the whole paper
yet - would it be feasible to continuously compute these perturbation vectors
during training and include them as part of a larger heuristic? For instance,
to incorporate the objective of maximizing the size of the perturbation vector
necessary for misclassification? The goal being to end up with a net that is
more resistant to such perturbations.

~~~
sherjilozair
Short answer: No. Computing these perturbations requires an expensive
optimization with multiple passes through the dataset, and this would be
prohibitively expensive to do in the inner-most loop of training.

There is other work in the literature describing faster algorithms for
computing these perturbations, which makes it possible to use them while
training. See, e.g.: [https://arxiv.org/abs/1412.6572](https://arxiv.org/abs/1412.6572)
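
For context, the fast gradient sign method in that reference derives a per-example perturbation from a single gradient computation, which is cheap enough to run inside the training loop. A minimal PyTorch-style sketch (the model, optimizer, batch, and the clean/adversarial loss weighting are all assumptions, not anything from the papers):

    import torch
    import torch.nn.functional as F

    def fgsm_adversarial_step(model, optimizer, x, y, eps=0.03):
        # One training step that mixes clean and FGSM-perturbed examples
        # (the "fast gradient sign method" of arXiv:1412.6572).
        x = x.clone().requires_grad_(True)
        loss_clean = F.cross_entropy(model(x), y)
        grad = torch.autograd.grad(loss_clean, x)[0]
        x_adv = (x + eps * grad.sign()).clamp(0, 1).detach()  # assumes inputs in [0, 1]

        optimizer.zero_grad()
        loss = 0.5 * F.cross_entropy(model(x), y) + \
               0.5 * F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
        return loss.item()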

~~~
joshuawarner32
See also: [https://arxiv.org/abs/1511.04599](https://arxiv.org/abs/1511.04599)

IMO, (at least) two pieces of research on the subject mean that the short
answer really is "yes". Maybe not the exact technique used in the paper in the
original post, but conceptually similar techniques.

~~~
sherjilozair
It's easier to find fooling perturbations for a single image, but not for the
whole dataset. I assumed the question was whether we can use the universal
perturbations for robust training. The answer to that is still "no", I think.

------
danbruc
This seems to imply that the features learned by neural networks are very
different from the features humans use to distinguish the same objects,
because the networks are thrown off by distortions that barely interfere with
the features humans rely on.

~~~
danieltillett
One thing to keep in mind is that neural networks are much smaller than human
brains and most likely have far fewer overlapping, redundant systems. If you
had three separately trained neural networks that voted on a consensus, you
might find it much harder to find adversarial inputs.
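
A minimal sketch of that consensus idea (the models and their training are placeholders; an adversarial input would then have to fool a majority of them):

    from collections import Counter

    def consensus_predict(models, x):
        # Majority vote over separately trained classifiers; `models` is a list
        # of callables x -> label. Ties fall back to the first model's answer.
        votes = [m(x) for m in models]
        label, count = Counter(votes).most_common(1)[0]
        return label if count > 1 else votes[0]

That said, the abstract notes the perturbations generalize well across networks, so a vote over similar architectures may help less than intuition suggests.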

~~~
pizza
This reminds me of signal attenuation/gain/feedback (via neurotransmitter
release [I want to say dopamine...]) due to error in the visual cortex.
Hopefully someone who's studied that has something to share.

~~~
emcq
I believe the parent was referring to how an ensemble of separately trained
networks can reduce variance and perhaps avoid issues like these.

Attention, localized gain, etc would not have this effect, but they tend to
allow a smaller network to perform more sophisticated tasks.

~~~
pizza
Like so?
[https://en.wikipedia.org/wiki/Stochastic_resonance_(sensory_neurobiology)#Multi-unit_systems_of_model_neurons](https://en.wikipedia.org/wiki/Stochastic_resonance_(sensory_neurobiology)#Multi-unit_systems_of_model_neurons)

------
thisisdave
Several of the universal perturbation vectors in Figure 4 remind me a lot of
Deep Dream's textures.

I wonder what it is about these high-saturation, stripy-spiraly bits that
these networks are responding to.

Is it something inherent in natural images? In the training algorithm? In our
image compression algorithms? Presumably, the networks would work better if
they weren't so hypersensitive to these patterns, so finding a way to dial
that down seems like it could be pretty fruitful.

~~~
zo7
My _intuition_ is that these patterns "hijack" the ReLU activations in the
lower layers, causing important features not to fire, or features that
shouldn't fire to do so. Usually the lower layers learn very primitive shapes
like lines and curves, and _I think_ (although I'd need to double check) that
they usually pass through entire color channels rather than nuanced mixes of
colors. (So one feature would pass through all of red, or all of blue, or all
of both, rather than just 66% red, 47% blue, and 33% green -- if it did the
latter it wouldn't generalize well.) The error then propagates through the
network, where later activations start firing in the wrong places, causing
the misclassification.

(This is totally unsubstantiated though)
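
A toy numpy illustration of that intuition (numbers are made up, not taken from the paper): a tiny per-pixel change aligned with a first-layer filter accumulates across the whole patch and can push the unit's pre-activation across zero, switching the ReLU feature on.

    import numpy as np

    rng = np.random.default_rng(0)
    w = rng.normal(size=64)            # one first-layer filter (flattened patch)
    x = rng.normal(size=64)            # an image patch
    b = -(w @ x) - 0.1                 # bias chosen so the unit sits just below threshold

    relu = lambda z: max(z, 0.0)
    eps = 0.05
    delta = eps * np.sign(w)           # tiny perturbation aligned with the filter

    print(relu(w @ x + b))             # 0.0 -> the feature is off on the clean patch
    print(relu(w @ (x + delta) + b))   # > 0 -> eps * ||w||_1 easily exceeds the 0.1 margin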

~~~
dkarapetyan
No intuition necessary

> The surprising existence of universal perturbations reveals important
> geometric correlations among the high-dimensional decision boundary of
> classifiers. It further outlines potential security breaches with the
> existence of single directions in the input space that adversaries can
> possibly exploit to break a classifier on most natural images.

The paper unpacks that explanation pretty well, with actual pictures of the
perturbations and how they relate to the classification boundary.

------
pfortuny
This is really great and interesting research: (very roughly) how to compute a
very small mask which, when applied to most natural images, makes the neural
network misclassify them, while humans would notice no essential difference.

Quite remarkable.

~~~
hammock
It says these universal vectors generalize across different classifiers. Why
would that be?

~~~
morenoh149
I'm not an expert, but it seems these perturbations are exploiting the
fundamental approach of a NN, namely that it works in layers. So these
perturbations must be messing up the lowest layers, and then the higher layers
end up generating the wrong features, so that ultimately the model
misclassifies. See
[http://i.stack.imgur.com/jpYdN.png](http://i.stack.imgur.com/jpYdN.png)

------
dkarapetyan
This is why I'm never driving a car that is classifying stuff with neural
networks. Some dust, some shitty weather conditions and that pigeon becomes a
green light.

~~~
asperous
This wouldn't affect that, because the perturbations were specially picked to
mess up the network. They wouldn't just happen naturally.

Also, self-driving cars have distance sensors and wouldn't just drive into
oncoming traffic because of one sensor anomaly.

~~~
rcthompson
Ok, so some guy invents a device that tricks every car at an intersection into
seeing a green light, and maybe blinds them to the presence of other cars.

~~~
ipunchghosts
I don't think you quite understand the problem. This can't happen.

~~~
zo7
I would imagine you could print the pattern onto a film that you stick on the
lens of a camera to throw off its classifier.

~~~
Hondor
It won't be able to focus on something so close. It'll just slightly darken
the whole image uniformly.

------
jmount
In signal processing you often have to pass the data through some sort of low-
pass filter before attempting your analysis. I would be surprised if that
isn't one of the methods being tried to protect deep neural nets from some of
these attacks. Obviously there are some issues (needing to train on similar
data, and such blurring interfering with first-level features that emulate
edge-detection and so on).
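
A hedged sketch of that kind of preprocessing (the classifier is a placeholder, and as noted the network would likely need to be trained on similarly filtered data):

    from scipy.ndimage import gaussian_filter

    def classify_with_lowpass(classify, image, sigma=1.0):
        # Blur only the spatial axes of an H x W x C image, not the color channels,
        # hoping to wash out high-frequency adversarial structure before classifying.
        smoothed = gaussian_filter(image, sigma=(sigma, sigma, 0))
        return classify(smoothed)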

------
nullc
So what happens when you stick this procedure in the training loop? Do you get
networks which are robust against doubly-universal perturbations?

------
dTal
What happens if you include the perturbations in your training data?

~~~
dandermotj
If my understanding is correct, the perturbations are inherent in the model,
not the data. It's a vulnerability in the high-dimensional decision boundary
of neural nets.

------
jonathanyc
Reminds me a little bit of the short story BLIT [1], where scientists have
accidentally created images that crash the human brain. Cool stuff!

[1]:
[https://en.wikipedia.org/wiki/BLIT_(short_story)](https://en.wikipedia.org/wiki/BLIT_\(short_story\))

~~~
ccvannorman
"Snowcrash" is the more realistic Neal Stephenson version where it gets at the
eye-brain-embedded hardware. And of course the original, "the joke so funny
that if read or heard would make you laugh yourself to death".

Humans seem really good at being imprevious to these, due to millions of years
of ignoring things..

------
amiramir
I'm guessing it won't be long until someone uses this technique to compute and
apply perturbation masks to pornographic imagery, making NN-based porn
detectors/filters (like the one Yahoo recently open-sourced) a lot less
effective.

------
yodon
Is there reason to think the human visual system is sufficiently well modeled
by deep neural nets that our brains might exhibit this same behavior? My first
thought was the perturbation images would need to be distinct per person, but
photosensitive epilepsy like the Pokémon event [0] might suggest the
possibility of shared perturbation vectors.

[0]
[https://en.m.wikipedia.org/wiki/Photosensitive_epilepsy](https://en.m.wikipedia.org/wiki/Photosensitive_epilepsy)

~~~
nhaliday
What I find interesting is that the labels for the perturbed images aren't
completely off in all cases, e.g., wool for a shaggy dog.

~~~
morenoh149
That image also seems very dark (the dog takes up most of the frame), so the
perturbation probably didn't have much to work with. Also, the perturbation is
"universal", so for this image it could simply have landed close to the
original classification.

------
javajosh
My science-fiction brain is, of course, interested in this as a method to
defeat face-detection _in a way humans can't see_. I'd like to think that the
crew of the Firefly used this technology to avoid detection when they did jobs
in the heart of Alliance territory.

------
oh_sigh
Could you just add noise to any image before passing it through a NN to defeat
this kind of attack?
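
One way to try that (a sketch under the assumption that inputs are scaled to [0, 1]; whether it actually defeats a fixed perturbation is an empirical question) is to average predictions over several independently noised copies of the input:

    import numpy as np

    def noisy_predict(predict_probs, x, sigma=0.05, n_samples=8, seed=0):
        # predict_probs(x) -> vector of class probabilities (placeholder model).
        rng = np.random.default_rng(seed)
        probs = [predict_probs(np.clip(x + rng.normal(0, sigma, x.shape), 0, 1))
                 for _ in range(n_samples)]
        return int(np.argmax(np.mean(probs, axis=0)))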

------
yodon
Can someone help with a notation question? In section 4 of the paper, the norm
of the perturbation is constrained to a maximum of 2'000, which presumably is
"small", but I don't know how to parse an apostrophe like that.

~~~
yodon
Update: later in the paper, the authors mention that 2x10^4 is an order of
magnitude larger than 2'000, so perhaps this is just a way of writing a
thousands separator without cultural ambiguity over whether it's a thousands
separator or a decimal separator?

~~~
danbruc
It is a thousands separator, used for example in Switzerland and on
calculators with LCD displays.

------
bmh100
My intuition is that the existence of adversarial images with barely
perceptible differences but a high-confidence misclassification will lead to a
new NN architecture for image classification.

------
mathgenius
This is like Gödel incompleteness for deep learning.

~~~
ktphy
Why?

