Hacker News
Keras reimplementation of "One pixel attack for fooling deep neural networks" (github.com)
212 points by gk1 on Feb 23, 2018 | 77 comments

This is really interesting, but it highlights something important in training neural nets: designing your dataset and training to maximize generalization. For example, when training a neural network for something highly safety-critical, like an autonomous vehicle, it's important for vehicle and pedestrian detection to generalize as well as possible. To achieve high confidence across all sorts of lighting, weather, and viewing-angle conditions, the training data is augmented, meaning it is randomly manipulated with color, saturation, and contrast changes, blurring, shifting, mirroring, added noise, etc. So adding random pixels, potentially to protect against dead pixels, is also a great idea.
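A minimal sketch of that kind of random augmentation, assuming grayscale images stored as lists of rows with values in [0, 1]. The function name `augment` and the jitter ranges, noise level, and dead-pixel probability are illustrative choices, not values from any particular pipeline.

```python
import random

def augment(image, rng, dead_pixel_prob=0.01, noise_std=0.02):
    """Return a randomly perturbed copy of a grayscale image (list of rows, values in [0, 1])."""
    out = [list(row) for row in image]            # copy so the original is untouched
    if rng.random() < 0.5:                        # random left-right mirroring
        out = [row[::-1] for row in out]
    contrast = rng.uniform(0.8, 1.2)              # contrast jitter around mid-grey
    brightness = rng.uniform(-0.1, 0.1)           # brightness shift
    for r, row in enumerate(out):
        for c, v in enumerate(row):
            v = (v - 0.5) * contrast + 0.5 + brightness
            v += rng.gauss(0.0, noise_std)        # additive Gaussian noise
            if rng.random() < dead_pixel_prob:    # simulate a dead (0) or stuck (1) pixel
                v = rng.choice((0.0, 1.0))
            out[r][c] = min(1.0, max(0.0, v))     # clip back into [0, 1]
    return out

# Usage: draw a few augmented variants of one (constant) 8x8 image.
rng = random.Random(0)
image = [[0.5] * 8 for _ in range(8)]
batch = [augment(image, rng) for _ in range(4)]
```

Each call draws fresh random parameters, so every epoch sees a slightly different version of the same training image.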

Jumping on top comment (which I completely agree with) to ask:

Why wouldn't K-fold cross-validation catch this? I'm curious whether the attack adds doubt, in that the prediction algorithm is _close_ to the truth but gets confused (likelihood of horse slightly less than dog), versus incorrect certitude (the horse is definitely a dog). One could then attach a weighting, perhaps based on the max RGB/CMYK vector norm between two pixels across the image, to the folds' difference in top-two certitudes.

I don't know, something like that.

Because it's too computationally expensive. Attacks like these are found when you have access to the model: you take an input image, change its pixel values, and see when an anomaly occurs. It's much faster to score N images than to build N models.
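A minimal sketch of that query-only search, assuming a black-box `predict` function and grayscale images as lists of rows. It exhaustively tries every position with a tiny two-value palette; the original paper uses differential evolution over position and colour instead, so this is a simplified stand-in. The `brittle_predict` toy model is hypothetical.

```python
def one_pixel_search(image, true_label, predict, palette=(0.0, 1.0)):
    """Try every single-pixel change (position x palette value) and return the first that
    changes the predicted label, plus the number of model queries spent.

    Only forward passes of an already-trained model are needed; nothing is retrained.
    """
    queries = 0
    for r in range(len(image)):
        for c in range(len(image[0])):
            for value in palette:
                candidate = [list(row) for row in image]
                candidate[r][c] = value
                queries += 1
                if predict(candidate) != true_label:
                    return (r, c, value), queries
    return None, queries

# Hypothetical, deliberately brittle classifier: one bright pixel at (2, 3) flips it.
def brittle_predict(img):
    return "frog" if img[2][3] > 0.9 else "truck"

image = [[0.5] * 4 for _ in range(4)]
print(one_pixel_search(image, "truck", brittle_predict))   # ((2, 3, 1.0), 24)
```

The cost is one model evaluation per candidate, which is why scoring many perturbed images is cheap compared with training many models.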

How do you combat it? Well, that's an open research question. IMO the most promising techniques make the system harder to attack (ensemble models, more complex models, randomizing the input slightly and dropping outliers), but it's not a guarantee. Like in security, it would be great to verify that a model is safe against, say, 200 years of brute-force search for attacks, or whatever the bar may be.

What about applying dropout to the input image?

I don't believe that many people are using K-fold cross validation at all with deep learning, as the computational overhead is massive.

I must be misspeaking, then. I'm not suggesting running it during training, but running the classification algorithm X times over an input image with chunks of the image removed or suppressed.
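A minimal sketch of that occlusion idea, assuming grayscale list-of-rows images and a black-box `predict`. The patch size, fill value, and the brittle toy classifier are hypothetical; low agreement across the occluded runs is treated as a sign that the prediction hinges on a small region.

```python
from collections import Counter

def occlusion_votes(image, predict, patch=2, fill=0.0):
    """Classify the image once per occluded square patch and tally the labels."""
    h, w = len(image), len(image[0])
    votes = Counter()
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            masked = [list(row) for row in image]
            for r in range(top, min(top + patch, h)):
                for c in range(left, min(left + patch, w)):
                    masked[r][c] = fill           # suppress this patch
            votes[predict(masked)] += 1
    return votes

def agreement(votes):
    """Majority label and the fraction of occluded runs that produced it."""
    label, count = votes.most_common(1)[0]
    return label, count / sum(votes.values())

# Hypothetical brittle classifier and an image carrying a single bright "attack" pixel.
def brittle_predict(img):
    return "frog" if img[2][3] > 0.9 else "truck"

attacked = [[0.5] * 4 for _ in range(4)]
attacked[2][3] = 1.0
print(agreement(occlusion_votes(attacked, brittle_predict)))   # ('frog', 0.75)
```

Note the runtime cost: one forward pass per patch (four here), which is the multiplier discussed in the replies.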

This is very common. It's an easy way to improve accuracy on a model with a fixed amount of data. They typically crop the image as well as rotate and scale it in order to get a larger data set.

It does make the model more robust, but doesn't seem to help much with finding adversarial examples in the model.

Generating adversarial examples and training on that might be a better approach to solving this.

If I understand correctly, you want to modify an input image at application time to get multiple different classifications, and then compare them to find out how certain the model really is.

While that would likely improve results a bit, it would also multiply the model runtime. That's why the other replies directly jump to talking about training data augmentation, since that can give you similar benefits without the runtime penalty.

However, random augmentation can't fully protect against adversarial examples. The number of input variables is simply too large, and there are exponentially many directions in which they could be modified. Data augmentation can't cover all of them, and a single modification that confuses the model slightly can be amplified into an adversarial example that causes a total misclassification.

> That's why the other replies directly jump to talking about training data augmentation, since that can give you similar benefits without the runtime penalty.

Exactly. It's a fix that doesn't work, apparently, so that's why I'm thinking towards the runtime.

> it would also multiply the model runtime.

Predictably so, I would think? Such an approach could scale decently since it's not adding a dimension to the runtime, just a multiple.

Predictably slow is still slow. You could parallelize it over multiple GPUs, but GPUs are expensive, so nobody is going to do that. Requiring lots of training resources is fine, but inference needs to be cheap.

More problematic is that your approach isn't going to actually work, since CNNs are just too flexible (they can learn even completely random labels) and only generalize by accident. No input augmentation technique that doesn't cover every possible modification is going to be robust against adversarial examples, and getting that amount of coverage requires an exponential blowup in runtime. The adversary has the advantage of being able to choose one modification, while the model needs to defend against all of them.

There is a way to do this in TensorFlow, where you expand your training data by altering the images, moving them around, etc. However, this is not used with large training sets.

Not as massive as producing a bum model. And most of the overhead in ML is the cost of data acquisition, cleaning, normalization, and tagging.

>I'm curious if the attack adds doubt, in that the prediction algorithm is _close_ to truth but gets confused (likelihood of horse slightly less than dog), versus incorrect certitude (the horse is definitely a dog).

While I can't speak for this attack in particular, there exist algorithms that can fool a neural network into generating high-confidence incorrect predictions for images that are visually indistinguishable from ones on which the network performs just fine. That's the biggest issue with these adversarial images.

There have been some fascinating steps taken in that direction, e.g. [0] and its related work. There, they transfer input images back and forth between domains (think of a street imaged in summer transferred to its winter appearance and back, or a synthetic GTA image transferred to the real world and back; these are the examples in [0]). Doing this while holding the semantic content of the input unchanged, using a GAN-type strategy, seems to be a way to coerce the neural net's internal representations to capture what we want them to, instead of idiosyncrasies of the dataset.

[0] https://arxiv.org/abs/1711.03213

This is a truly excellent point. It applies to people, too: what will someone do if they are driving through the snow for the first time EVER, if they had lived in a temperate climate and just moved?

They will go very slowly, and it is perhaps not an exaggeration to say they will re-learn. Humans are learning constantly. In fact, if there were some natural disaster (a lava flow) and a human saw another car drive across some hardened lava on the way out of town (as more lava rushes toward them), the human would go ahead and follow after seeing the other car make it through. If they saw another car try to cross but get stuck, they might take a detour and find an intact bridge or some other way to pass.

Actually, what you call "general" might be as much as general intelligence...

Walk around in the city with a giant yellow square costume and observe the mayhem...

Dressing up as traffic signs might become a thing...

I'm pretty sure that's already illegal in many countries.

I recently saw a yellow banana-man in a moshpit at a metal show.

Don’t put breaking expectations past humans. We are adversarial by nature.

Ian Goodfellow and others have done a lot of great work exploring the properties of adversarial attacks. While all the things you’ve mentioned are great practices and important to avoid overfitting, adversarial attacks are not due to overfitting (nor are they random) and none of the things you described are sufficient as defenses.

It would be interesting to try and generate this kind of data from 3D models: entire intricate scenes, where the camera can go around and take thousands of images in various lightings, weathers and vantage points. ICBW, but I feel like this modeling may have been tried and didn't translate well to real-life at the time. I can't see why it wouldn't be possible with enough modeling and physics precision, though.

Shifting and mirroring are already standard in papers with crop-10: five crops (top-left, top-right, middle, bottom-left, bottom-right) times two (each crop plus its mirror image). Colour shifting is also done via PCA.
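A minimal sketch of the ten-crop scheme, assuming images as lists of rows and a square crop `size`; the function name is illustrative. At test time the model's predictions on the ten crops are typically averaged.

```python
def ten_crop(image, size):
    """Four corner crops plus the centre crop, followed by the left-right mirror of each."""
    h, w = len(image), len(image[0])
    origins = [(0, 0), (0, w - size), (h - size, 0), (h - size, w - size),
               ((h - size) // 2, (w - size) // 2)]
    crops = [[row[left:left + size] for row in image[top:top + size]] for top, left in origins]
    mirrored = [[row[::-1] for row in crop] for crop in crops]
    return crops + mirrored

# Usage on a 4x4 image whose pixel values encode their position.
image = [[4 * r + c for c in range(4)] for r in range(4)]
crops = ten_crop(image, 2)
```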

One thing that humans have is that young children watch the pages of a book turning, so they see basic images at all sorts of extreme angles.

You make it sound a bit too easy: "just add some random colours". But there are theoretical hardness results from the 90s on learning NNs; the problem is, in the jargon, inherently unpredictable. These hardness results were made somewhat obsolete by big data. I believe that what we see happening with adversarial inputs is a shadow of those hardness results.

This made me realize that Waymo's training data from Google street view imaging and recaptcha based labeling, whilst noisy in many natural ways, is exclusively daytime data. Seems a glaring hole.

It also doesn't work for adversarial attacks. "Adversarial training" helps somewhat but is still very susceptible to adversarial attacks.

I wonder how well capsule networks could deal with this, considering they're more robust than traditional CNNs towards other sorts of adversarial attacks. My guess is that one-pixel changes are going to do very little to alter pose information (though they will still alter the confidence of the feature existing), and so caps nets should be more robust here as well. Does anyone know if my intuition is correct?

Link to the capsule network paper, for those who haven't heard of it: https://arxiv.org/abs/1710.09829

I’m not sure capsules will change the security much. We’re talking about pixel level attacks while capsules are trying to generalize at larger length scales (rotational/translational invariance).

I suspect the same, that this kind of attack won't work on capnets. Note that "test on capnets" is on the list of not-yet-complete milestones.

Author of the repo here. I'm training a capsule network right now, will report back on results in the README in 24-48 hours. My hypothesis is that capsule networks are vulnerable as well, but we'll see by how much.

So I did some preliminary testing with CapsNet, and it seems that while it was harder than all the other CNNs to find adversarial pixels to fool the network, it was still vulnerable to attack. See the README for quantitative results.

... We should do this. I learn capnets, you get feather in cap?

Isn't the fact that it's one pixel of a 32x32 image relevant? I'd be more impressed to see a neural network successfully attacked by a single pixel (or a few dozen) on a full-resolution image.

The fact that a human isn't fooled by the attack (we can still recognize the 32x32 images for what they are), points to an interesting gap in the abilities of conventional convolutional neural nets.

A better analogy would be stimulating an individual receptor on the retina, in an eye with only 32^2 such receptors. When we see these pictures, we've got a much larger set of inputs to work with.

That's only because the attack is designed to target that particular network. Just wait until we understand real brains better and can generate tailored attacks...

Isn't that basically what an optical illusion is?

Yes, and also this: https://en.wikipedia.org/wiki/Dazzle_camouflage

Hacks the human brain rather efficiently.

> Dazzle was adopted by the Admiralty in the UK, and then by the United States Navy, with little evaluation. Each ship's dazzle pattern was unique to avoid making classes of ships instantly recognisable to the enemy. The result was that a profusion of dazzle schemes was tried, and the evidence for their success was at best mixed. So many factors were involved that it was impossible to determine which were important, and whether any of the colour schemes were effective.

It is true that battlefield efficiency of such camouflage is unknown - but I think one can see the effects it does on the brain without conducting a proper rigorous study. The question here is not whether the effect exists - which is IMO obvious - but whether it's enough to make difference in actual combat.

Plus cognitive biases. Those are already well understood and used in a variety of ways.

The one-pixel attack works for pretty much any machine learning model, not just one single neural network.

500,000 years ago our eyes and vision were probably significantly worse than they are today. As our eyes evolved to capture the world better our brains also evolved to correct the errors from our eyes. On the other hand, we feed into our neural networks high quality images. It's true that they are low resolution but they don't contain noticeable noise or artifacts. The attack described here is a smart application of salt and pepper noise. It's ineffective on humans because our vision evolved to filter it out, but a network which has seen only noiseless images is helpless.

I'm curious whether training the network with noise and other mutations added to the training set would make it more resilient to these attacks. In other words, is it the training set or the network architecture that's vulnerable here?

>I'm curious whether training the network with noise and other mutations added to the training set would make it more resilient to these attacks. In other words, is it the training set or the network architecture that's vulnerable here?

This is called adversarial training and is currently the most popular technique for protecting neural networks against this type of attack. That being said, it doesn't work as well as one would hope: the adversarially trained models are usually still vulnerable to other attacks.

Not a new concept in general, just a new approach for it with a single pixel change.

There are other concerns of practicality as well, such as dependence on ability to rerun samples through the original network.

The authors of the paper have successfully conducted the attack on 227x227 images as well. The pixels are much harder to see with the human eye.

It seems that larger images increase the search space linearly with the number of pixels. That is to say, it takes more time to find such pixels, but they are still relatively common.

There are attacks where all pixels are changed only slightly and the classification is completely wrong. Think of changing every pixel by just the least significant bit. These changes are still invisible to (my) human eye. Basically, every distance function has attacks at a very small distance.
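A minimal sketch of why tiny per-pixel changes can add up, using the gradient-sign idea (as in Goodfellow et al.'s fast gradient sign method) on a toy linear model, where the gradient of the score is just the weight vector. The weights, bias, and image are hypothetical. Moving every value by eps shifts the score by eps times the sum of |w_i|, which grows with the number of inputs.

```python
def sign(v):
    return (v > 0) - (v < 0)

def score(w, b, x):
    """Linear decision score; positive means class A, negative means class B."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def sign_perturbation(w, x, eps):
    """Move every input by eps against the sign of its weight (the gradient of the score)."""
    return [xi - eps * sign(wi) for wi, xi in zip(w, x)]

def distances(x, y):
    diffs = [abs(a - b) for a, b in zip(x, y)]
    return {"L0": sum(d > 0 for d in diffs), "Linf": max(diffs)}

n = 32 * 32 * 3                     # one 32x32 RGB image, flattened
w, b = [0.01] * n, -15.3            # hypothetical linear model
x = [0.5] * n                       # score(w, b, x) is about +0.06
x_adv = sign_perturbation(w, x, eps=1 / 255)   # one intensity step per value
```

Here the score drops to about -0.06, so the label flips, even though no value moved by more than 1/255. A one-pixel attack is the opposite corner: L0 distance of 1 but a possibly large change in that one pixel.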

In the sources of the single pixel paper I found a reference to a prior work which does this: http://www.shivakasiviswanathan.com/CVPR17W.pdf

This kind of attack relies on a low-margin DNN. As shown in (1), a low spectral norm of the input-output Jacobian matrix guarantees good generalization error. So a one-pixel attack exploits a weak eigenvalue (small absolute value) of the Jacobian matrix.

So, to create a one-pixel attack:
1) compute the eigenvalues of the input-output Jacobian matrix;
2) take the smallest eigenvalue, lambda_1;
3) compute or approximate the function lambda_1 = f(input);
4) compute j = argmax_{i=1..n} d(lambda_1)/d(input_i) at the point where the spectral norm is maximal.

Then, to carry out the attack, change pixel j in the training points whose Jacobian matrix has the maximum (or a high) spectral norm.
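A simpler, related heuristic than the eigenvalue-derivative recipe above (swapped in here, not the commenter's procedure): estimate the input-output Jacobian by finite differences and pick the input whose column has the largest norm, i.e. the pixel the outputs are most sensitive to. The three-input, two-output `model` is hypothetical; a real network would use automatic differentiation instead of finite differences.

```python
def jacobian(f, x, h=1e-6):
    """Forward-difference Jacobian J[k][i] ~ d f_k / d x_i (rows: outputs, columns: inputs)."""
    fx = f(x)
    columns = []
    for i in range(len(x)):
        xp = list(x)
        xp[i] += h
        columns.append([(a - b) / h for a, b in zip(f(xp), fx)])
    return [list(row) for row in zip(*columns)]

def most_sensitive_input(J):
    """Index of the input whose Jacobian column has the largest Euclidean norm."""
    n_inputs = len(J[0])
    norms = [sum(row[i] ** 2 for row in J) ** 0.5 for i in range(n_inputs)]
    return max(range(n_inputs), key=lambda i: norms[i])

# Hypothetical 3-input, 2-output model.
def model(x):
    return [2 * x[0] + 0.1 * x[1], x[0] - 5 * x[2]]

J = jacobian(model, [0.5, 0.5, 0.5])
```

For this model the column norms are about 2.24, 0.1, and 5, so input 2 is the one to perturb.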

(1) https://arxiv.org/pdf/1605.08254.pdf

Can we use the one-pixel attack to train the network? We would generate adversarial examples and train the network on them.

Having the ability to create a training set that maximizes the learning factor for a NN sounds amazing, but I think we would run into other adversarial examples.

Augmenting your training dataset with adversarial examples is known as adversarial training, see e.g. [0] for a recent overview with empirical results. This seems to be a good first step in defending against such attacks, though the most naive approach of adversarial training doesn't work as well as you'd expect.
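A minimal sketch of the data side of that idea: append an attacked copy of each example while keeping its true label. The `attack` callable and the toy attack below are hypothetical placeholders; in practice the attacked copies are usually regenerated against the current weights as training proceeds, rather than computed once.

```python
def adversarial_augment(dataset, attack):
    """Return the dataset plus one attacked copy of each example, keeping the *true* label.

    `attack(x, y)` returns a perturbed input, or None if it found no successful perturbation.
    """
    augmented = list(dataset)
    for x, y in dataset:
        x_adv = attack(x, y)
        if x_adv is not None:
            augmented.append((x_adv, y))
    return augmented

# Hypothetical attack: saturate the first pixel, but give up on examples labelled "skip".
def toy_attack(x, y):
    if y == "skip":
        return None
    return [1.0] + x[1:]

data = [([0.1, 0.2], "cat"), ([0.3, 0.4], "dog"), ([0.5, 0.6], "skip")]
train_set = adversarial_augment(data, toy_attack)
```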

[0] https://openreview.net/forum?id=rkZvSe-RZ

There is a nearly infinite number of forgery possibilities. The best solution seems to be building multiple networks with different approaches and training sets. Use a consensus, or refer to a human if there is none. Finding holes should become harder with this approach.
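A minimal sketch of the consensus rule, assuming the per-model predictions are already computed; the 0.75 agreement threshold is a hypothetical choice. `None` stands for "send to a human".

```python
from collections import Counter

def consensus(predictions, min_agreement=0.75):
    """Majority label across models, or None (escalate to a human) if agreement is too low."""
    label, count = Counter(predictions).most_common(1)[0]
    return label if count / len(predictions) >= min_agreement else None

print(consensus(["horse", "horse", "horse", "dog"]))   # horse
print(consensus(["horse", "dog", "frog", "horse"]))    # None
```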

I’ve mentioned this before but a median filter would nuke the single outlying pixel without too much of an effect on the input image. Is there an attack that can get past such a basic preprocessing step?

Assuming you're talking about adversarial examples in general (a one pixel attack would definitely be stopped by a median filter), yes. Median filters, gaussian filters, gaussian noise, all don't provide significant barriers against attack.
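A minimal sketch of a 3x3 median filter on a grayscale list-of-rows image, showing why an isolated outlier pixel disappears. As the reply above notes, this says nothing about robustness against attacks that spread changes over many pixels.

```python
def median_filter3(image):
    """3x3 median filter; border pixels use only the neighbours that exist
    (even-sized windows take the upper of the two middle values)."""
    h, w = len(image), len(image[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            window = sorted(
                image[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
            )
            row.append(window[len(window) // 2])
        out.append(row)
    return out

image = [[0.5] * 5 for _ in range(5)]
image[2][2] = 1.0                       # single outlying "attack" pixel
filtered = median_filter3(image)
```

Every 3x3 window contains at most one outlier, so the median is always the surrounding value.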

This is interesting. I guess then you’d want to focus not on a single pixel but a collection that will still retain some value once filtered.

It would be an interesting experiment.

We need to define the derivative of a deep model, i.e. a way to measure how a model changes when we change one pixel in the training data. Since pixel -> feature -> margin, we need to define the derivative with respect to a natural parameter, and the natural parameter of the model has to be defined ad hoc for every application. Perhaps the natural parameter encodes an uninformative prior. The intuition is to use information theory to see how the discriminative power of the model changes when the training data is perturbed, so we need to measure the derivative of the added information. Fisher information seems to be related to this.

This paper, https://openreview.net/forum?id=HJC2SzZCW, suggests that sensitivity is related to poor generalization power. To define the derivative we need a natural parameter that measures sensitivity and also allows us to use methods from calculus and manifolds, such as parallel transport of features. How does a DNN label a cat when it is catching a rat?

If, in a DNN for labeling cats, we explore the group of movements of the animal (the realistic movements available to a cat), we could relate the discriminative power of the DNN to the energy of the cat. The energy of the cat is related to the volume of its group of movements. A cat with zero energy has the identity group of movements (no movement); a hulk cat is able to alter many of her features, so a very powerful model is needed to identify a hulk cat. A person full of rage is able to change the color of her face; again, energy alters the training space. Sorry for using HN for thinking.

Given that DNNs are deep, a one-pixel attack means that a one-pixel change propagates through the feature maps: one pixel -> level-0 feature change -> level-1 feature change. So this attack relies on weak features whose changes easily propagate to the next level of features. Hence, to defend against this attack, the model should put a threshold on the ratio (sensitivity of features)/(number of pixels) and prevent highly sensitive features from easily propagating to the next level of the DNN. If features are not linearly related to the input, then correlation is not a measure of feature sensitivity and says nothing about the full DNN effect of such a pixel change.

But does this mean that edge detection is also failing?

If the neural network thinks a truck is a frog is it not recognising the vertical edges?

It would be interesting to look at the intermediate layer activations to see where in the process it failed.

I keep thinking how kids often learn through labelled cartoon images. There the outline is more important.

Perhaps we could pre-train networks first on outlines of images. Make sure that these are capable of handling adversarial techniques and then build from there.
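A minimal sketch of extracting such outlines with a Sobel edge detector, assuming grayscale list-of-rows images; the threshold is a hypothetical choice. Whether pre-training on outlines actually helps against adversarial pixels is the open idea above, not something this code demonstrates.

```python
def sobel_magnitude(image):
    """Gradient magnitude from 3x3 Sobel kernels; the one-pixel border is left at 0."""
    kx = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
    ky = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = sum(kx[i][j] * image[r + i - 1][c + j - 1] for i in range(3) for j in range(3))
            gy = sum(ky[i][j] * image[r + i - 1][c + j - 1] for i in range(3) for j in range(3))
            out[r][c] = (gx * gx + gy * gy) ** 0.5
    return out

def outline(image, threshold):
    """Binary outline map: 1 where the gradient magnitude exceeds the threshold."""
    return [[int(v > threshold) for v in row] for row in sobel_magnitude(image)]

# A vertical edge: dark left half, bright right half.
image = [[0.0] * 3 + [1.0] * 3 for _ in range(6)]
edges = outline(image, threshold=1.0)
```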

Aren't these attacks just proof of the fact that even deep neural networks are approximating a high dimensional function by clamping the entropy of the formula, rather than truncating the range of input/output values?

For a 32x32 image, the space of 1-pixel attacks is 0xFFFFFF * 32 * 32 = 17179868160 ≈ 2^34 ≈ e^23.6
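The arithmetic can be checked directly. This sketch assumes 24-bit colour, so each pixel has 2^24 - 1 = 0xFFFFFF alternative values; the function name is illustrative. It also shows the linear growth with pixel count for the 227x227 images mentioned earlier in the thread.

```python
import math

def one_pixel_search_space(height, width, bits_per_pixel=24):
    """Images that differ from a given one in exactly one pixel: positions x (colours - 1)."""
    return height * width * (2 ** bits_per_pixel - 1)

small = one_pixel_search_space(32, 32)     # 17179868160
large = one_pixel_search_space(227, 227)
print(math.log2(small), math.log(small))   # just under 34, about 23.57
print(large / small)                       # about 50.3, i.e. linear in the pixel count
```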

Expecting an input space that large not to poke through the entropically deprived network is destined to fail.

I mean, you don't even need a proof of that. The latter is impossible since the range of input/output values is untruncated by construction?

I'm very new to ML, so I understand about 50% of what @eximius and @goldenkey are saying, but definitely not 100%. Can anyone explain it in a bit more detail? (I'm assuming "entropy" is the key concept I need to put on my learning queue.)

“approximating a high dimensional function by clamping the entropy of the formula, rather than truncating the range of input/output values”

“not poke through the entropically deprived network is destined to fail”

“the range of input/output values is untruncated by construction”

The set of all mappings between an input set of N elements to some output set with M elements has M^N elements.

If you wanted to be able to represent in some way any arbitrary mapping for given sets of input and output, then you would need at least log_2(M^N) = N x log_2(M) bits.

In the case of an input set of 32x32 pixel images with 3 bytes per pixel (one for each channel), each image consists of 32 x 32 x 3 = 3072 bytes, so N = (2^8)^3072 = 2^24576.

In the case of an artificial neural network we have at the last level an output. There will be at least one node with at least one bit of output, so M >= 2. In general, to have anything other than the trivial map that maps every input to the same output, we always have M >= 2.

So, we need at least 2^24576 x log_2(2) = 2^24576 bits to represent an arbitrary function between the input and the output. Written out, that number of bits has more than 7,000 decimal digits; even 2^34 bits would already be 2 gibibytes!

Since the models are nowhere near that size, something is going on. The magic here is that we are able to encode subsets of possible mappings very efficiently by using the execution logic of a computer. The compressed representation of the mappings in the restricted subset is the learned weights (the code to evaluate the model is also needed, but that requires fewer bits than what we save). We are, in a way, compressing functions, not data. Hence the "clamping of entropy of the formula". [0]
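The counting argument can be checked in log space. This sketch counts distinct 32x32 RGB images as 2^(8·3·32·32) = 2^24576 inputs; the helper names are illustrative. It also contrasts that with 32·32·2^24 = 2^34, which counts one-pixel (position, colour) choices rather than whole images.

```python
import math

def log2_num_images(height, width, channels=3, bits_per_channel=8):
    """log2 of the number of distinct images: each of height*width*channels values has 2**bits states."""
    return height * width * channels * bits_per_channel

def log2_bits_for_arbitrary_mapping(log2_num_inputs, num_outputs):
    """log2 of N * log2(M): the bits needed to single out one of the M**N mappings (M >= 2).

    Works in log space because N itself is far too large for a float.
    """
    return log2_num_inputs + math.log2(math.log2(num_outputs))

# Tiny sanity check: 2x2 one-channel, one-bit images give N = 16; with M = 2 an
# arbitrary mapping needs 16 bits (log2 = 4).
print(log2_bits_for_arbitrary_mapping(log2_num_images(2, 2, 1, 1), 2))   # 4.0
# 32x32 RGB images, 2 labels: 2**24576 bits.
print(log2_bits_for_arbitrary_mapping(log2_num_images(32, 32), 2))        # 24576.0
# One-pixel (position, colour) choices, not images.
print(math.log2(32 * 32 * 2 ** 24))                                        # 34.0
```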

The restriction of the set of possible functions leads to new, interesting phenomena. Think of them as compression artifacts, not on images or audio, but on functions.

To make a model resistant to attacks by someone knowledgeable about these artifacts, I would add noise to the input such that the artifacts are not predictable, hence not practically attackable.

[0] The same basic phenomenon happens with block ciphers in cryptography. A block cipher on one block is just a permutation of the set of all different input blocks. If you have a block size of 64 bits, representing an arbitrary permutation would need log_2(2^64 !) bits, where the exclamation mark stands for the factorial. That number is huge, bigger than 2^69. We can't represent arbitrary permutations of blocks of 64 bits. Yet, block ciphers are permutations. What happens here is that once again we find subsets of the possible permutations that we can represent efficiently. The compressed representation is the key.
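The size of log_2(2^64 !) can be checked with the log-gamma function, since lgamma(n + 1) = ln(n!). A minimal sketch; the helper name is illustrative.

```python
import math

def log2_factorial(n):
    """log2(n!) computed as lgamma(n + 1) / ln 2, which stays finite for huge n."""
    return math.lgamma(n + 1) / math.log(2)

# Bits needed to index an arbitrary permutation of all 2**64 possible 64-bit blocks.
bits = log2_factorial(2.0 ** 64)
print(math.log2(bits))   # about 69.97, so bits > 2**69
```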

I'm not sure adding noise to the inputs of an equally complex model will change the information load of the NN. Because of the compression in the NN, I think there will still exist new input perturbations that generate attacks.

My idea is that, without added noise, knowing the model and the input allows crafting a perturbation that leads to an erroneous output of the model.

With noise added there is less correlation between the input and the output. At the extreme with 100% randomness added, there is no correlation anymore between any pertubations of the input and the output. However, there is unfortunately also no correlation anymore between the input and the output.

What happens if you add a bit of noise? The more noise, the smaller the correlation between the perturbations and the output. At what point is the probability of a successful attack sufficiently small?

To clarify, I mean adding the noise not in the training phase or to the images themselves, but at the input stage of the model. That way, even repeated input of the same image would result in different inputs to the model.
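A minimal sketch of noise injected at the model's input stage, assuming a black-box `predict` over flat lists of pixel values. The noise level, sample count, and threshold model are hypothetical. Majority voting over several noisy copies is one way to limit the accuracy loss that noise causes, at the price of several forward passes per query.

```python
import random
from collections import Counter

def noisy_predict(x, predict, rng, noise_std=0.05, n_samples=15):
    """Majority label over several copies of x, each with fresh Gaussian noise added.

    Because new noise is drawn on every call, repeating the same image does not
    repeat the same model input.
    """
    votes = Counter()
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, noise_std) for xi in x]
        votes[predict(noisy)] += 1
    return votes.most_common(1)[0][0]

# Hypothetical model: a simple threshold on the mean intensity.
def mean_threshold(v):
    return "bright" if sum(v) / len(v) > 0.5 else "dark"

rng = random.Random(0)
print(noisy_predict([0.8] * 100, mean_threshold, rng))   # bright
```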

I'm not sure this type of protection is efficient and effective, but it's an idea.

Adding noise is important for generalization and predictive power but it may just shift the attacking pixel from one to another, since the model itself is inherently compressed, inversely correlated with its complexity.

If I have a function f(x, y, z) = <some expression> \in R^5 (i.e., f: R^3 -> R^5) that is hard to compute, I can train a neural network to approximate it, with three input neurons x', y', z', some hidden layers doing the approximation, and 5 output neurons representing the approximation of the output.

By construction, the domain and codomain are not constrained. Both the original and our approximation using NN take any three real values and return any five real values.

Next, consider a sample of points from some function. I can perfectly fit n such points (with distinct x values) using a polynomial of degree n - 1, e.g. via Lagrange interpolation. If, however, I approximate the function by lowering the degree of the polynomial, I remove information (entropy) from the formula. It is no longer a perfect match, but it might be very close. Or, if the underlying distribution is of low dimensionality, it might still be an exact match (i.e., picking any number of points from a straight line doesn't mean you need a high-degree polynomial to fit them!).
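A minimal sketch of the interpolation point, using Lagrange's formula for the unique polynomial of degree at most n - 1 through n points. The sample points are made up for illustration; points taken from a line produce an interpolant that is itself that line.

```python
def lagrange(points):
    """The unique polynomial of degree <= n - 1 through n points with distinct x values."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if j != i:
                    term *= (x - xj) / (xi - xj)   # Lagrange basis factor
            total += term
        return total
    return p

wiggly = lagrange([(0, 0), (1, 1), (2, 0), (3, 1)])   # needs degree 3, fits all 4 points
line = lagrange([(0, 1), (1, 3), (2, 5), (3, 7)])     # points on y = 2x + 1
print(line(10))   # close to 21: the degree-3 fit collapses to the line
```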

I presume you have a regression background. If so: "minimizing entropy" is the same paradigm as "minimizing the sum of squares", in a different context.

If not: reducing entropy means finding weights/coefficients in a supplied functional form that minimize some objective function applied to the problem.

Usually the jargon applies to Shannon entropy from signal theory, or some derivation thereof like transfer entropy.

Entropic estimates take a form similar to

$$ H = -\sum_j p(x_j) \log p(x_j) $$

where j ranges over the event space (e.g. heads or tails on a coin flip).
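A minimal sketch of that formula, measuring entropy in bits by default and treating 0 log 0 as 0; the function name is illustrative.

```python
import math

def shannon_entropy(probs, base=2):
    """H = -sum_j p_j log p_j, with zero-probability events contributing 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))    # fair coin: 1 bit
print(shannon_entropy([0.9, 0.1]))    # biased coin: about 0.47 bits
```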

Brick shattering through window hypothesis finally has evidence

No seriously - am I missing something?

"Recent research has revealed that the output of Deep Neural Networks (DNN) can be easily altered by adding relatively small perturbations to the input vector."

"Submitted on 24 Oct 2017 "

My guess here is that we need to learn more about adding noise to datasets before they go into a neural network. Lots of existing work applies alterations like transposing, rotating, flipping and so on. Adding noise, including known attacks, would be part of this.

Noise will definitely help if random noise is added to every input image before the neural network sees it, in production. Essentially, it is a pre-input pipeline stage that a bad actor cannot predict, which would neuter single-pixel deception entirely.

I think what needs to be done here is to add a threshold on the correlation between input pixels. The problem is that a 1-pixel change made in just the right devious way can be equivalent to changing multiple pixels in the proper way, as measured by the derivative of the cost function. So there needs to be a way to design or tell the network that a 1-pixel change cannot be nearly as strong as changing the relationships between multiple pixels, in terms of the change in cost-function value.

From what I can gather, the only way to accomplish this is to make sure the number of nodes in the hidden layers is strictly monotonically decreasing. By using the last layer as a "grab bag" for classification, with hundreds more nodes than the previous layers, the network becomes vulnerable to single-pixel attacks. There have to be ways to design classification-style networks without the fan-out.

Does this work if you use data transforms augmentation so that the pixel isn't always seen in the same spot? You probably wouldn't do any panning/rotation etc. for CIFAR 10, but for most practical purposes you probably would do more augmentation and I wonder if that defeats this?

How well does this fare over several frames?

Does anyone else wonder if their usage of the word "THICC"[1] in their meme, inadvertently comes off as sexist?

[1] https://www.urbandictionary.com/define.php?term=Thicc

Great work! Risky intro picture.

Urban Dictionary has prominent offensive definitions for most entries; it's not a great source for what you're trying to demonstrate.
