
Keras reimplementation of "One pixel attack for fooling deep neural networks" - gk1
https://github.com/Hyperparticle/one-pixel-attack-keras
======
agitator
This is really interesting, and it points to something of key importance in
training neural nets: designing your dataset and training procedure to
maximize generalization. For example, when training a neural network for
something highly safety-critical, like an autonomous vehicle, it's important
for vehicle and pedestrian detection to be as generalized as possible. In
order to achieve high confidence in all sorts of lighting, weather, and angle
conditions, the training data is augmented, meaning it is manipulated randomly
with color, saturation, contrast, blurring, shifting, mirroring, added noise,
etc. So adding random pixel corruption, e.g. to protect against dead pixels,
is also a great idea.
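
A rough sketch of what that kind of augmentation could look like in Keras
(parameter values are just examples, the pixel-corruption helper is made up,
and it assumes images scaled to [0, 1]):

    import numpy as np
    from keras.preprocessing.image import ImageDataGenerator

    def corrupt_random_pixels(image, n_pixels=3):
        """Set a few randomly chosen pixels to random colors, mimicking dead/hot pixels."""
        h, w, _ = image.shape
        for _ in range(n_pixels):
            y, x = np.random.randint(h), np.random.randint(w)
            image[y, x] = np.random.uniform(0.0, 1.0, size=3)
        return image

    datagen = ImageDataGenerator(
        rotation_range=15,            # small random rotations
        width_shift_range=0.1,        # random horizontal shifts
        height_shift_range=0.1,       # random vertical shifts
        horizontal_flip=True,         # mirroring
        channel_shift_range=0.1,      # crude color jitter
        preprocessing_function=corrupt_random_pixels,
    )
    # e.g. model.fit_generator(datagen.flow(x_train, y_train, batch_size=64), ...)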

~~~
tomrod
Jumping on top comment (which I completely agree with) to ask:

Why wouldn't a K-fold cross validation enable catching this? I'm curious if
the attack adds doubt, in that the prediction algorithm is _close_ to truth
but gets confused (likelihood of horse slightly less than dog), versus
incorrect certitude (the horse is definitely a dog). One could then attach a
weighting, perhaps based on the max RGB/CMYK vector norm between two pixels across
the image, to the folds' difference in top two certitudes.

I don't know, something like that.

~~~
nightcracker
I don't believe that many people are using K-fold cross validation at all with
deep learning, as the computational overhead is massive.

~~~
tomrod
I must be misspeaking then. I'm not suggesting it run during training, but
rather running the classification algorithm X number of times over an input
image with chunks/areas removed or suppressed from the data.
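
A minimal sketch of that idea, assuming a trained Keras `model` and an image
scaled to [0, 1] (the helper name and patch size are made up):

    import numpy as np

    def occluded_predictions(model, image, n=8, patch=8):
        """Classify n copies of the image, each with one random patch zeroed out."""
        h, w, _ = image.shape
        batch = np.repeat(image[np.newaxis], n, axis=0)
        for copy in batch:
            y0 = np.random.randint(0, h - patch)
            x0 = np.random.randint(0, w - patch)
            copy[y0:y0 + patch, x0:x0 + patch, :] = 0.0  # suppress one chunk/area
        return model.predict(batch)  # (n, num_classes); compare the copies' top certitudes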

~~~
yorwba
If I understand correctly, you want to modify an input image at application
time to get multiple different classifications, and then compare them to find
out how certain the model really is.

While that would likely improve results a bit, it would also multiply the
model runtime. That's why the other replies directly jump to talking about
training data augmentation, since that can give you similar benefits without
the runtime penalty.

However, random augmentation can't fully protect against adversarial examples.
The number of input variables is simply too large, and there are exponentially
many directions in which they could be modified. Data augmentation can't cover
all of them, and a single modification that confuses the model slightly can be
amplified into an adversarial example that causes a total misclassification.

~~~
tomrod
> That's why the other replies directly jump to talking about training data
> augmentation, since that can give you similar benefits without the runtime
> penalty.

Exactly. It's a fix that doesn't work, apparently, so that's why I'm thinking
towards the runtime.

> it would also multiply the model runtime.

Predictably so, I would think? Such an approach could scale decently since
it's not adding a dimension to the runtime, just a multiple.

~~~
yorwba
Predictably slow is still slow. You could parallelize it over multiple GPUs,
but GPUs are expensive, so nobody is going to do that. Requiring lots of
training resources is fine, but inference needs to be cheap.

More problematic is that your approach isn't going to actually work, since
CNNs are just too flexible (they can learn even completely random labels) and
only generalize by accident. No input augmentation technique that doesn't
cover every possible modification is going to be robust against adversarial
examples, and getting that amount of coverage requires an exponential blowup
in runtime. The adversary has the advantage of being able to choose one
modification, while the model needs to defend against all of them.

------
xvedejas
I wonder how well capsule networks could deal with this, considering they're
more robust than traditional CNNs against other sorts of adversarial attacks.
My guess is that one-pixel changes are going to do very little to alter pose
information (though they will still alter the confidence of the feature
existing), and so caps nets should be more robust here as well. Does anyone
know if my intuition is correct?

Link to the capsule network paper, for those who haven't heard of it:
[https://arxiv.org/abs/1710.09829](https://arxiv.org/abs/1710.09829)

~~~
dahart
I suspect the same, that this kind of attack won't work on capnets. Note that
"test on capnets" is on the list of not-yet-complete milestones.

~~~
hyperparticle
Author of the repo here. I'm training a capsule network right now, will report
back on results in the README in 24-48 hours. My hypothesis is that capsule
networks are vulnerable as well, but we'll see by how much.

~~~
hyperparticle
So I did some preliminary testing with CapsNet, and it seems that while it was
harder than all the other CNNs to find adversarial pixels to fool the network,
it was still vulnerable to attack. See the README for quantitative results.

------
7dare
Isn't the fact that it's one pixel of a 32x32 image relevant? I'd be more
impressed to see a neural network successfully attacked by a single pixel (or
a few dozen pixels) on a full-res image.

~~~
xvedejas
The fact that a human isn't fooled by the attack (we can still recognize the
32x32 images for what they are) points to an interesting gap in the abilities
of conventional convolutional neural nets.

~~~
deegles
That's only because the attack is designed to target that particular network.
Just wait until we understand real brains better and can generate tailored
attacks...

~~~
lbearl
Isn't that basically what an optical illusion is?

~~~
smsm42
Yes, and also this:
[https://en.wikipedia.org/wiki/Dazzle_camouflage](https://en.wikipedia.org/wiki/Dazzle_camouflage)

It hacks the human brain rather efficiently.

~~~
thaumasiotes
> Dazzle was adopted by the Admiralty in the UK, and then by the United States
> Navy, with little evaluation. Each ship's dazzle pattern was unique to avoid
> making classes of ships instantly recognisable to the enemy. The result was
> that a profusion of dazzle schemes was tried, and the evidence for their
> success was at best mixed. So many factors were involved that it was
> impossible to determine which were important, and whether any of the colour
> schemes were effective.

~~~
smsm42
It is true that the battlefield efficiency of such camouflage is unknown - but
I think one can see the effect it has on the brain without conducting a proper
rigorous study. The question here is not whether the effect exists - which is
IMO obvious - but whether it's enough to make a difference in actual combat.

------
derivt
This kind of attack relies on a low-margin DNN; see (1): a low spectral norm
of the input-output Jacobian matrix guarantees good generalization error. So a
one-pixel attack exploits a weak eigenvalue (small absolute value) of the
Jacobian matrix.

So to create a one-pixel attack: 1) compute the eigenvalues of the
input-output Jacobian matrix, 2) take the smallest eigenvalue lambda_1, 3)
compute or approximate the function lambda_1 = f(input), 4) compute j =
argmax_{i=1..n} d(lambda_1)/d(input_i) at the point where the spectral norm is
maximal.

So to create the attack, change the j-th pixel at the points of the training
set that have a maximal (or high) Jacobian norm.
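
As a rough illustration only - this simplifies the recipe above by ranking
pixels by the column norm of the Jacobian instead of following an eigenvalue,
and it assumes TensorFlow 2 with a trained Keras classifier `model`:

    import numpy as np
    import tensorflow as tf

    def most_sensitive_pixel(model, image):
        """Index of the pixel whose Jacobian column has the largest norm."""
        x = tf.convert_to_tensor(image[np.newaxis], dtype=tf.float32)   # (1, H, W, C)
        with tf.GradientTape() as tape:
            tape.watch(x)
            logits = model(x)                                           # (1, K)
        jac = tape.jacobian(logits, x)                                  # (1, K, 1, H, W, C)
        jac = tf.reshape(jac, (logits.shape[-1], -1))                   # (K, H*W*C)
        col_norms = tf.norm(jac, axis=0)                                # sensitivity per input value
        per_pixel = tf.reduce_sum(tf.reshape(col_norms, image.shape), axis=-1)  # sum over channels
        return int(tf.argmax(tf.reshape(per_pixel, [-1])))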

(1)
[https://arxiv.org/pdf/1605.08254.pdf](https://arxiv.org/pdf/1605.08254.pdf)

------
SurgeonOfDeath
Can we use the one-pixel attack to train the network? We would generate
adversarial examples to train the network on.

Having the ability to create a training set that maximizes the learning signal
for the NN sounds amazing, but I think we would just run into other
adversarial examples.

~~~
haraldurt
Augmenting your training dataset with adversarial examples is known as
adversarial training, see e.g. [0] for a recent overview with empirical
results. This seems to be a good first step in defending against such attacks,
though the most naive approach of adversarial training doesn't work as well as
you'd expect.

[0] [https://openreview.net/forum?id=rkZvSe-RZ](https://openreview.net/forum?id=rkZvSe-RZ)
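
For reference, the naive version looks roughly like this (an FGSM-style
sketch in TensorFlow 2/Keras, not the one-pixel attack itself; names and
hyperparameters are illustrative):

    import tensorflow as tf

    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    optimizer = tf.keras.optimizers.Adam()

    def fgsm(model, x, y, eps=8.0 / 255):
        """One-step fast-gradient-sign perturbation of a batch x with labels y."""
        with tf.GradientTape() as tape:
            tape.watch(x)
            loss = loss_fn(y, model(x))
        grad = tape.gradient(loss, x)
        return tf.clip_by_value(x + eps * tf.sign(grad), 0.0, 1.0)

    def adversarial_train_step(model, x, y):
        """Train on a 50/50 mix of clean and adversarial examples (naive adversarial training)."""
        x_mix = tf.concat([x, fgsm(model, x, y)], axis=0)
        y_mix = tf.concat([y, y], axis=0)
        with tf.GradientTape() as tape:
            loss = loss_fn(y_mix, model(x_mix, training=True))
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        return loss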

------
knolan
I’ve mentioned this before but a median filter would nuke the single outlying
pixel without too much of an effect on the input image. Is there an attack
that can get past such a basic preprocessing step?
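
For concreteness, the kind of preprocessing meant here could be as simple as
this sketch (using scipy's median filter; `model` is a stand-in for any
trained classifier):

    import numpy as np
    from scipy.ndimage import median_filter

    def defend(image):
        """3x3 median filter per color channel, removing isolated outlier pixels."""
        return median_filter(image, size=(3, 3, 1))

    # probs = model.predict(defend(x)[np.newaxis])   # classify the filtered image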

~~~
chillee
Assuming you're talking about adversarial examples in general (a one-pixel
attack would definitely be stopped by a median filter), yes. Median filters,
Gaussian filters, Gaussian noise - none of them provide a significant barrier
against attack.

------
derivt
We need to define the derivative of a deep model - I mean a way to measure how
a model changes when we change one pixel in the training data. Since pixel ->
feature -> margin, we need to define the derivative with respect to a natural
parameter, and the natural parameter of the model has to be defined ad hoc for
every application. Perhaps the natural parameter encodes an uninformative
prior. The intuition is to use information theory to see how the
discriminative power of the model changes when the training data is perturbed.
So we need to measure the derivative of the added information. Fisher
information seems to be related to this.

~~~
derivt
This paper
[https://openreview.net/forum?id=HJC2SzZCW](https://openreview.net/forum?id=HJC2SzZCW)
suggests that sensitivity is related to poor generalization power. To define
the derivative we need to use a natural parameter in such a way that it
measures sensitivity and also allows us to use methods from calculus and
manifolds, such as parallel transport of features. How does a DNN label a cat
when it is catching a rat?

~~~
derivt
If, in a DNN for labelling a cat, we explore the group of movements of the
animal cat (realistic movements available to a cat), we could relate the
discriminative power of the DNN to the energy of the cat. The energy of the
cat is related to the volume of the group of movements. A cat with zero energy
has the identity group of movements (no movement); a hulk cat is able to alter
many of her features, so a very powerful model is needed to identify a hulk
cat. A person full of rage is able to change the color of her face; again,
energy alters the training space. Sorry for using HN for thinking.

~~~
derivt
Given that DNNs are deep, what a one-pixel attack means is that a one-pixel
change propagates through the map of features: one pixel => 0-level feature
change -> one 1-level feature change. So this attack relies on weak features
that can easily propagate to the next level of features. Hence, to defend
against this attack the model should put a threshold on the ratio (sensitivity
of features)/(number of pixels) and keep features with high sensitivity from
propagating easily to the next level of the DNN. If features are not linearly
related to the input set, then correlation is not a measure of feature
sensitivity and has nothing to say about the full DNN effect of such a change
in a pixel.

------
icc97
But does this mean that edge detection is also failing?

If the neural network thinks a truck is a frog, is it not recognising the
vertical edges?

Seeing the intermediate layer images would be interesting to see where in the
process it failed.

I keep thinking how kids often learn through labelled cartoon images. There
the outline is more important.

Perhaps we could pre-train networks first on outlines of images. Make sure
that these are capable of handling adversarial techniques and then build from
there.

------
goldenkey
Aren't these attacks just proof of the fact that even deep neural networks are
approximating a high dimensional function by clamping the entropy of the
formula, rather than truncating the range of input/output values?

For a 32x32 image, the space of 1-pixel attacks is 0xFFFFFF * 32 * 32 =
17,179,868,160 ≈ 2^34.

Expecting an input space as large as that to not poke through the entropically
deprived network is destined to fail.

~~~
eximius
I mean, you don't even need a proof of that. The latter is impossible since
the range of input/output values is untruncated by construction?

~~~
tomhallett
I'm very new to ML, so I understand about 50% of what @eximius and @goldenkey
are saying, but definitely not 100%. Can anyone explain it in a _bit_ more
detail? (I'm assuming "entropy" is the key concept I need to put on my
learning queue.)

“approximating a high dimensional function by clamping the entropy of the
formula, rather than truncating the range of input/output values”

“not poke through the entropically deprived network is destined to fail”

“the range of input/output values is untruncated by construction”

~~~
madez
The set of all mappings between an input set of N elements to some output set
with M elements has M^N elements.

If you wanted to be able to represent in some way any arbitrary mapping for
given sets of input and output, then you would need at least log_2(M^N) = N x
log_2(M) bits.

In the case of an input set of 32x32 pixel images with 3 bytes per pixel (one
for each channel) we have N = 2^8 x 2^8 x 2^8 x 2^5 x 2^5 = 2^34.

In the case of an artificial neural network we have at the last level an
output. There will be at least one node with at least one bit of output, so M
>= 2. In general, to have anything else but the trivial map that maps every
input to the same output, we always have M >= 2.

So, we need at least 2^34 x log_2(2) = 2^34 bits to represent an arbitrary
function between the input and the output. That is 2 gibibytes!
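
The same counting in plain Python, mirroring the numbers above:

    from math import log2

    N = 2**8 * 2**8 * 2**8 * 2**5 * 2**5   # 2**34 possible inputs, as counted above
    M = 2                                   # at least a binary output
    bits = N * log2(M)                      # lower bound for an arbitrary mapping
    print(bits / 8 / 2**30)                 # -> 2.0 (gibibytes)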

Since the models don't need 2 gibibytes, something is going on. The magic here
is that we are able to encode subsets of possible mappings very efficiently by
using the execution logic of a computer. The compressed representation of the
mappings in the restricted subset is the learned weights (the code to
evaluate the model is also needed, but that requires fewer bits than what we
save). We are, in a way, compressing functions, not data. Hence the "clamping
of entropy of the formula". [0]

The restriction of the set of possible functions will lead to new, interesting
phenomena. Think of them as compression artifacts, but on functions rather
than on images or audio.

To make a model resistant to attacks by someone knowledgeable about these
artifacts, I would add noise to the input such that the artifacts are not
predictable, hence not practically attackable.

[0] The same basic phenomenon happens with block ciphers in cryptography. A
block cipher on one block is just a permutation of the set of all different
input blocks. If you have a blocksize of 64 bits, representing an arbitrary
permutation would need log_2(2^64 !) bits, where the exclamation mark stands
for the factorial. That number is huge, bigger than 2^69. We can't represent
arbitrary permutations of blocks of 64 bits. Yet, block ciphers are
permutations. What happens here is that once again we find subsets of the
possible permutation we can represent efficiently. The compressed
representation is the key.

~~~
eanzenberg
I’m not sure adding noise to the inputs of an equally complex model will
change the information load of the NN. Because of the compression of the NN I
think there will still exists new input pertubations which generate attacks.

~~~
madez
My idea is that without added noise, knowing the model and the input allows
crafting a perturbation that leads to an erroneous output of the model.

With noise added there is less correlation between the input and the output.
At the extreme, with 100% randomness added, there is no correlation anymore
between any perturbation of the input and the output. However, there is
unfortunately also no correlation anymore between the input and the output.

What happens if you add a bit of noise? The more noise, the smaller the
correlation between the perturbations and the output. At what point is the
probability of a successful attack sufficiently small?

To clarify, I mean adding the noise not in the training phase or to the stored
images themselves, but at the input stage of the model. That way even repeated
input of the same image would result in different inputs to the model.

I'm not sure this type of protection is efficient and effective, but it's an
idea.
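
One way to realize that, as a minimal sketch (assuming a trained Keras `model`
and inputs scaled to [0, 1]; the helper name is made up):

    import numpy as np

    def noisy_predict(model, image, sigma=0.05, n=16):
        """Average predictions over n independently noised copies of the same image."""
        batch = np.repeat(image[np.newaxis], n, axis=0)
        batch = np.clip(batch + np.random.normal(0.0, sigma, batch.shape), 0.0, 1.0)
        return model.predict(batch).mean(axis=0)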

~~~
eanzenberg
Adding noise is important for generalization and predictive power, but it may
just shift the attacking pixel from one location to another, since the model
itself is inherently compressed, to a degree inversely related to its
complexity.

------
mcintyre1994
Does this work if you use data augmentation (transforms) so that the pixel
isn't always seen in the same spot? You probably wouldn't do any
panning/rotation etc. for CIFAR-10, but for most practical purposes you
probably would do more augmentation, and I wonder if that defeats this?

------
wickedlogic
How well does this fare over several frames?

------
abledon
Does anyone else wonder if their usage of the word "THICC"[1] in their meme
inadvertently comes off as sexist?

[1]
[https://www.urbandictionary.com/define.php?term=Thicc](https://www.urbandictionary.com/define.php?term=Thicc)

Great work! Risky intro picture, though.

~~~
jamesgeck0
Urban Dictionary has prominent offensive definitions for most entries; it's
not a great source for what you're trying to demonstrate.

