The Flaw Lurking in Every Deep Neural Net (i-programmer.info)
127 points by bowyakka on Nov 1, 2014 | 59 comments


(a) The OP links to a blog post that discusses and extensively quotes another blog post [1], which in turn discusses an actual paper [2]. Naturally, the paper is where the good stuff is.

(b) Both blog posts somewhat understate the problem. The adversarial examples given in the original paper aren't just classified differently from their parent images; they're created to receive a specific classification. In figure 5 of the arXiv version, for example, they show clear images of a school bus, temple, praying mantis, dog, etc., which all received the label "ostrich, Struthio camelus".
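To make "created to receive a specific classification" concrete, here is a rough sketch of the idea (not the paper's actual box-constrained L-BFGS procedure; the model, tensors, and hyperparameters here are purely illustrative): you optimize a small perturbation so the network assigns a chosen target label.

```python
import torch
import torch.nn.functional as F

def targeted_adversary(model, x, target_class, steps=200, lr=0.01, reg=1e-2):
    """Find a small perturbation r so that model(x + r) predicts target_class."""
    r = torch.zeros_like(x, requires_grad=True)        # the perturbation we optimize
    opt = torch.optim.Adam([r], lr=lr)
    target = torch.tensor([target_class])
    for _ in range(steps):
        # push the prediction toward the target label while keeping r small
        loss = F.cross_entropy(model(x + r), target) + reg * r.norm()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (x + r).detach()   # e.g. a "school bus" image now labeled "ostrich"
```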

(c) The blog post at [1] wonders whether humans have similar adversarial inputs. Of course it's possible that we might, but I suspect that we have an easier time than these networks in part because: (i) We often get labeled data on a stream of 'perturbed' related inputs by observing objects over time. If I see a white dog in real life, I don't get just a single image of it. I get a series of overlapping 'images' over a period of time, during which it may move, I may move, the lighting may change, etc. So in a sense, human experience already includes some of the perturbations that ML techniques have to introduce manually to become more robust. (ii) We also get to take actions to get more or better perceptual data. If you see something interesting or confusing or just novel, you choose to focus on it, or get a better view, because of that interestingness or novelty. The original paper talks about the adversarial examples as being in pockets of low probability. If humans encounter these pockets only rarely, it's because when we see something weird, we want to examine it, after which that particular pocket has higher probability.

[1] http://www.i-programmer.info/news/105-artificial-intelligenc...

[2] http://arxiv.org/abs/1312.6199 or http://cs.nyu.edu/~zaremba/docs/understanding.pdf


I'm sure someone else has considered this before, but perhaps optical illusions are similar examples of data that causes pathological behaviour?

Many of them constrain our viewpoint or the sequence of images viewed so we'd be unable to take the actions you mention to handle them.


Yes. But more importantly, we are continuously training our network. There is no terminating "training set" other than the set of all classifications we have ever considered. Our discovery that we were "wrong" about a thing is our network continually adjusting. We also have the notion of ignorance. These classifiers are often forced to come to a conclusion, instead of having an "I don't know, let me look at it from another angle" kind of self-adjustment process. "Aha, it is a cat!" moments do not happen for AI. In us, such a failure would create a whole new layer wrapping the classifier in some uncertainty logic. We would be motivated to examine the reasons behind our initial failure, and use the conclusions to train this new layer, further developing strategies to adapt to those inputs.


Exactly. Reading the paper, I have to say I disagree. Human minds are stable, for a very simple reason: they randomize inputs. They implicitly Haar-cascade their inputs simply because the sensors are inaccurate (they add time-variant noise like any real-world sensor) and are mounted on a moving platform that constantly shifts position. The observation here should be that brains don't filter out noise; they use it to their advantage, to improve their performance.

So it's simple: both problems referred to in the text don't exist for noisy data, and it's easy to improve an ANN's classification performance through things akin to adding random noise and Haar-cascade-like approaches (shifting the input image slightly in x, y, rotation, white balance, ...), then taking the prediction you saw most often. You can even make neural nets that do this implicitly (though that's even more expensive).
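Roughly what I mean, as a toy "jitter and vote" sketch (the `model.predict` call and the parameter values are just placeholders, nothing from the paper):

```python
import numpy as np

def jittered_predict(model, image, n_views=16, max_shift=3, noise_std=0.02):
    """Classify several perturbed copies of `image` and return the majority label."""
    votes = []
    for _ in range(n_views):
        dx, dy = np.random.randint(-max_shift, max_shift + 1, size=2)
        view = np.roll(image, shift=(dy, dx), axis=(0, 1))                 # small translation
        view = view + np.random.normal(0.0, noise_std, size=view.shape)    # sensor-like noise
        votes.append(model.predict(view))                                   # assumed to return one label
    labels, counts = np.unique(votes, return_counts=True)
    return labels[np.argmax(counts)]
```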

Anecdotally, I do think my own mind has this "problem". There are a large number of things I recognize immediately, but there are also quite a few things I have to look at for a few seconds to even a minute or two (usually geometrical stuff, network plans, or the like) before it "clicks" in my mind and I know what it is. Sometimes that is because I have to wait for the noise level to go down (e.g. exit a tunnel or a building into full sunlight), but usually it's not. I think it's very possible that at such times my mind is simply waiting until the noise in the input kicks it over some decision boundary.

Also, I find people often reclassify things after looking at them a little while longer.


I thought that more or less all AI researchers had long since admitted that NNs aren't human brains and shouldn't be reasoned about as if they were, because they are essentially different things. They can be used to study some particular behaviour of intelligence, to make something clearer about how the actual brain works, but one must always remember that they are not an actual brain. A recent post by LeCun mentioned that as well.

Anyway, this seems natural to me. Consider the numbers in the rightmost image in the OP link (I'll refer to that blog post as [0]). The program recognises them "correctly". Would you? Well, I recognize some, but certainly not all of them, and I guess it would be easy to make a dataset on which I would lose flat-out to that NN.

That makes almost all of [0] pretty much nonsense to me. The author leans on common-language words that lack technical meaning, like "similar". He points out that two images similar to a human are not similar to the algorithm, but we should keep in mind that similarity is a property with respect to some observer (i.e. a human, or a neural net). And it is natural that what is important to that NN is different from what is important to most of us humans. After all, it often happens that you don't recognize something in a photo but your friend does, and as far as you know you are both human. More than that, after he tells you what it is, you go "Oh, right, I see it now! Silly me!" So it's basically common sense that for any two different classifiers you can find two objects that will be classified as similar by one and different by the other. And yes, humans have more in common with each other than with NNs; no big surprise there either. After all, your internal image classifier doesn't receive exact pixel values as input, so of course one can find two images whose difference you can't even see but that the NN fails to describe as similar.

So essentially it is saying only that "every particular NN doesn't think like a human", which is nothing new. The nice catch is that you can easily construct such a counter-example manually, but that doesn't seem like "the biggest news in neural networks since the invention of the backpropagation algorithm". More than that, if the first claim (about the absence of meaningful features for individual neurons) is completely true, it doesn't explain why deep learning is so successful.

So, I'm confused. If it's really some "backpropagation-sized discovery", I'm waiting for a comment from some expert we all know and trust who can explain things clearly. You know, Hinton, LeCun. I just don't see what's so important about that paper.


I'm not trying to pretend that the way ANNs work is at all closely tied to how actual brains work. But the tasks we're trying to give to ANNs are generally about seeing some dataset the same way we do. The whole point of training ANNs on images with human-created class labels is to teach them / encode in them a representation of the mental concepts that we apply to the world, which is supposed to be generalizable. We evaluate them by testing their outputs on new inputs for which we have a human-provided label. You say that "similar" is relative to some observer, but if the trained ANNs differ wildly from humans in judging what is similar, then maybe they're not doing the job as well as we hoped.


Depends on how the task is defined. Surely you wouldn't expect a Naïve Bayes classifier to behave the same way humans do, because it's simple and you have no illusions about what it can do. So you wouldn't be surprised by the fact that you can forge the input of an NB classifier so that it gives you a false negative while staying essentially the same to a human, because humans are not Naïve Bayes classifiers, and you know it. Yet given the right task, the right set of input parameters and the right approach, an NB classifier can yield very good results on practical datasets, so, given that performance, you would declare that it is "doing the job well".

The same thing shouldn't surprise (or even upset) you about NNs, if you remember that humans are not neural nets. Just don't be fooled by the misleading similarity in names between NN concepts and human brain components, because they are not the same and thus don't act the same. It's just a name. So this shouldn't be anything new.

One more tricky moment is where you speak of a "human-provided label". Right, but what is the meaning of the label? I mean, if the label for all cats were "chair", and the NN then labeled a cat as "chair", it would be doing its job correctly, and labeling the cat as "cat" would be a mistake. Why am I saying such obvious nonsense? Because labels are somewhat arbitrary, and there's an important difference between labeling objects by the principle "recognized by [some specific] human as X" and "has the origin of X". For example, you can have a photo of a dog (bad lighting, bad focus, bushes, fog, whatever) that wouldn't be recognized as "dog" by 9 out of 10 humans you showed it to, but nevertheless it is an image of a dog. Now is it a good result or a bad result if your NN classifies it as "dog"? Depends on how the task is defined.

Anyway, what I was pointing out isn't that NNs' false negatives are not really false negatives or something. Yes, they are. But every fuzzy classifier (including a human) will give false negatives sometimes if labels are given "by origin"; that's pretty obvious. What I was pointing out is that similarity between a correctly matched object and an incorrectly matched object is relative to the observer, and is directly linked to the technology the observer works on. And we know that the technology of NNs is completely different from the technology of the human brain. Once again, the "cat classifier" inside the human brain doesn't get exact pixel values as input, and your NN does. So it's no surprise that what looks exactly the same to you looks like (and in fact is!) two different things to your NN.

So, you are right, currently existing ANNs are not as similar to humans as we'd like them to be (for some purposes, that is), but that is no news. Essentially, with my previous comment I was trying to say this: if you are not surprised by the fact that NNs often correctly (by the criterion of origin) recognize something that is totally unrecognizable to you (and that is common knowledge; take any modern handwritten-digit recognizer), you should not be surprised by the fact that they don't recognize some things that are clear as day to you, because these two effects are essentially the same one. So I'm not denying that NNs are imperfect (in some sense); I'm saying it's no news, and for sure not a "backpropagation-invention sensation".


If you consider the neural pathway from the retina onwards, it does start with a pixel accurate input... You might have to resort to a short bright flash of a cat photo.

The neural pathway does many transforms so that things like scale and movement can be left out for some tasks.

An artificial neural network should evolve to do some similar transformations. It has been done.


> it does start with a pixel accurate input

Not really. The eye doesn't operate on exact numeric values: not discrete ones, not even an analogue electric signal. Even if we imagine that the "image" is what is projected onto the retina (which isn't exactly true in our case), it couldn't be described as an array of pixels, because, as mentioned earlier (and as everyone should know already anyway), a real neuron isn't even close to being a "single number value storage", but an entity that is much more complex. So, no, even then it wouldn't be pixel-accurate input.

In our case, however, the "pixel-accurate input" is that digital image of yours, an array of numbers, which is processed directly by the ANN, but not by you. To be processed by your brain it goes through some pre-processing in the computer, is projected onto your not-so-perfect display, mixed with light from all the sources around you, and only then is it projected onto your retina, which also isn't a perfect pixel matrix. So it is actually very possible that two images with very close but different pixel values (which are objectively different to the computer) do indeed stimulate your retina in exactly the same way, and thus are objectively the same image to your eye, which passes the signal on to your brain. It just isn't fair to compare it to an ANN, because your eye isn't sensitive to pixel forging of this sort, but is sensitive to other kinds, which the NN handles more easily.

You may just think of it as having several filters of some sort in the path of the visual signal on its way to your internal "cat classifier".


People seeing faces in all kinds of things is a very common mistake, which is probably analogous.


Not really. That, if anything, is like overfitting.


If you can slightly distort an image to make a NN produce any classification you want, that opens many interesting options for steganography.

The funniest application would be distorting individual characters of printed text so that an OCR engine and a human would "see" two totally different but meaningful messages. There's likely not enough complexity in an OCR NN to do that, but who knows.


My interest is not really the existence of borderline-recognizable inputs, but how you would reach them. It's trivial to make any system misclassify if you can add arbitrary amounts of noise. But how often can you trick a human by taking a full-frame broad-daylight image and perturbing each pixel up to some limit? I expect the number to be pretty low, even if you're depriving them of the ability to perform more examinations of the object.


There are some possible applications of this:

Better captchas that are optimized to be hard for machines, but easy for humans.

Getting around automated systems that discriminate content. Like detecting copyrighted songs.

Training on these images improves generalization. Essentially these images add more data, since you know what class they should be given. But they are optimal in a certain sense, testing the things that NNs are getting wrong, or finding the places where they have bad discontinuities.


Bad-quality food optimized to trick the human senses into preferring it to actually nutritious stuff. You can also look at makeup, rhetoric, politics, etc...


"Better captchas that are optimized to be hard for machines, but easy for humans."

Nope, not gonna work. You'd have to have the classifier/ANN parameters in the first place in order to locate its adversarial counterexamples. Otherwise, the perturbations would likely be irrelevant noise.


The discovery of the paper was that these adversarial examples worked on other neural networks. Including ones trained on entirely different datasets. They are not specific to a single NN.


Well... Not really... They split the MNIST data set and trained on disparate halves. Which is to say I wouldn't generalize from two networks trained on far less than 10x their parameter counts all the way to all neural networks in existence, but of course, your opinions may vary...


Nobody has pointed this out yet: it would be very interesting to keep finding such perturbations that mess up the classification and repeatedly add the newly found examples to the training set, retraining the model in the process. I wonder whether, after a finite number of iterations, the resulting model would be near-optimal (impossible to perturb without losing its human recognizability), or, if that is impossible, whether we could derive some proof of precisely why it is impossible.
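Something like the loop below, roughly. This is only a sketch: `find_adversarial_examples` is a hypothetical stand-in for whatever perturbation search you run against the current model, and `model.fit` is assumed to retrain from the given data.

```python
import numpy as np

def retrain_on_adversaries(model, train_x, train_y, rounds=10):
    for _ in range(rounds):
        model.fit(train_x, train_y)
        # hypothetical helper: search for perturbed inputs the current model
        # misclassifies while a human would still assign the original label
        adv_x, adv_y = find_adversarial_examples(model, train_x, train_y)
        if len(adv_x) == 0:          # nothing found: maybe we're near-optimal?
            break
        train_x = np.concatenate([train_x, adv_x])
        train_y = np.concatenate([train_y, adv_y])
    return model
```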


The article points out that this is the approach taken by the researchers who found the effect.


I don't find it so surprising that, out of the vast number of possible small perturbations, there are a few that cause the image to be misclassified. I suppose it is interesting that you can systematically find such perturbations. But is there anything here which suggests that a neural network which does well on a test set won't continue to do well so long as the images given to it are truly "natural"?


The interesting thing is that it gets misclassified across networks.

According to the blog post, I can build two NNs with different structures and train them on random subsets of a collection of dog and cat pictures. Distort a random picture until network A misclassifies it, and according to the article network B will also misclassify it, despite having a different structure and a different training set.

I don't think it's obvious that network B will fail as well.


You're right. I guess I just don't like the fact that it's titled as "The Flaw Lurking In Every Deep Neural Net", when in fact neural nets will continue to classify new data as well as ever.

I agree that what you point out is very interesting.


More like "The Flaw Lurking in Every Machine Learning Algorithm with a Gradient"(1) IMO. For example, in a linear or logistic classifier, the derivative of the raw output with respect to the input is the input itself while the derivative of the input is the weight. Knowing this one can use the weights of any classifier to minimally adjust the input to produce a pathological example.

As for humans, I submit we have all sorts of issues like this. It's just that we have a temporal stream of slightly different versions of the input and that keeps inputs like this from having any significant area under the curve. Have you never suddenly noticed something that was right in front of you all along?

(1) And probably those that don't too, but it's harder to find cases like that without a gradient (not that it can't be done, because I've found them myself for linear classifiers using genetic algorithms, simulated annealing, and something that looked just like Thompson Sampling but wasn't).


Maybe it's a function of the fact that I'm not an AI expert, but I never thought that specialization for features (whether semantically meaningful or not) was localized to individual neurons rather than the entire net. Why would we think otherwise?


I suppose it was assumed it was working in a "divide and conquer" manner, since that usually leads to a complexity reduction of algorithms? (and it was then assumed that the division was a clear region over the previous layer)

Of course there's no real need for the network to work that way, and perhaps this interpretation can be made if we assume that divisions are "fuzzy"/arbitrary.


I assumed it is because the number of output neurons is small compared to the number of inputs. For example, a digit OCR network takes maybe 10,000 pixels as input, but has only 10 possible outputs. I'm no expert either; any confirmation or refutation would be very welcome :)


What happens when you add random noise to the inputs to the neural net?


I like this idea. Maybe it ought to be applied not just to the inputs of the ANN, but throughout the entire network, like drop-out turning a fraction of the values to 0 some of the time. The network would have to work really hard to generalize to deal with all the noise. I'm sure tuning such a noise function would be critical too.
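Purely as a sketch of that "noise everywhere, not just at the input" idea (layer sizes and the noise level are arbitrary; this is PyTorch-style code I made up, not anything from the article):

```python
import torch
import torch.nn as nn

class GaussianNoise(nn.Module):
    """Adds zero-mean Gaussian noise to activations during training only."""
    def __init__(self, std=0.1):
        super().__init__()
        self.std = std

    def forward(self, x):
        if self.training:
            return x + self.std * torch.randn_like(x)
        return x

net = nn.Sequential(                      # noise injected between layers, dropout-style
    nn.Linear(784, 256), GaussianNoise(0.1), nn.ReLU(),
    nn.Linear(256, 10),
)
```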


You would probably not be able to distinguish an adversarial image with added noise from a normal image with added noise.

While it is difficult to locate the adversarial examples by random perturbation, they do not appear to be extremely specific. The paper even suggests that they exist within specific regions of the input space. So depending on the size of that region, adding noise may just lead you to a similarly adversarial image.

Regardless, they propose a better way of fixing the problem: modifying the training algorithm to penalize networks whose structure allows this kind of error.

Neural networks can still perform arbitrary computations, despite this result, so there is no reason to try and manually fix up bad inputs when you can train the network to do it.


If you rephrase your question to "What happens when I add random noise to the inputs of a neural network and try to teach it to output a 'denoised' version of the input?", you've just invented denoising autoencoders.

http://www.iro.umontreal.ca/~lisa/publications2/index.php/pu...
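For what it's worth, a bare-bones version of that setup looks something like this (sizes and noise level are arbitrary choices of mine, not the architecture from the linked paper; training minimizes reconstruction error against the clean input):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenoisingAutoencoder(nn.Module):
    def __init__(self, dim=784, hidden=256):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.decode = nn.Sequential(nn.Linear(hidden, dim), nn.Sigmoid())

    def forward(self, x, noise_std=0.3):
        corrupted = x + noise_std * torch.randn_like(x)   # corrupt the input
        return self.decode(self.encode(corrupted))        # reconstruct the clean x

def reconstruction_loss(model, x):
    return F.mse_loss(model(x), x)   # loss is against the *clean* input, not the corrupted one
```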


Alright, that's neat. Not at all what I was suggesting, but neat nonetheless.



The perturbations in the study were not random; they had to be crafted.

To the GP: noise is often added to training datasets for exactly the reason you're suggesting. One of the novel things the cited paper discusses, however, is that even if you feed the adversarial perturbations back in as additional training data, there are yet new ways to subtly perturb the inputs and get incorrect results.

Misclassification is a pretty fundamental consequence of dimensionality reduction, of course, but the surprise is how close those misclassifications are in input space. This isn't mistaking a box for a square because it's viewed head-on; it's mistaking a bus for an ostrich because some of the pixels in the image changed to a slightly different shade of yellow.


I'm not an expert in any sense, just a curious bystander. Assuming the ratio of perturbations that cause misclassifications to ones that don't is extremely low, couldn't you perturb the image in the time dimension, so that "dog" misclassifications would be odd blips in an otherwise continuous "cat" signal, with some sort of smoothing applied to average those blips away? And in fact wouldn't that be the default case in some real-world implementations, such as biological NNs or driverless-car ones? The input to the NN would be a live video feed captured via light focused on some kind of CCD, which is going to be inherently noisy.
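One way to picture that temporal smoothing, as a sketch only (the `predict_proba` call and the window size are made-up placeholders):

```python
from collections import deque
import numpy as np

class SmoothedClassifier:
    """Average per-class scores over the last few video frames before deciding."""
    def __init__(self, model, window=10):
        self.model = model
        self.scores = deque(maxlen=window)

    def classify(self, frame):
        self.scores.append(self.model.predict_proba(frame))     # per-class probabilities
        return int(np.argmax(np.mean(self.scores, axis=0)))     # single-frame blips average away
```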


Of course, having data that contradicts the patterns the neural network is looking for is going to make it err, even when the contradiction is subtle. Humans have it easier because we handle more abstract concepts. One way to solve this is "simplifying" the data: in practical terms (for images) that means applying a bilateral filter [0], also known in Photoshop as "surface blur".

[0] http://en.m.wikipedia.org/wiki/Bilateral_filter
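If anyone wants to try it, OpenCV exposes this directly. The parameter values below are just a reasonable starting point, not something tuned for this purpose:

```python
import cv2

def surface_blur(image_bgr):
    # expects an 8-bit BGR image; arguments are (src, d, sigmaColor, sigmaSpace):
    # d is the pixel neighbourhood diameter, the sigmas control how much fine
    # texture is smoothed away while strong edges are preserved
    return cv2.bilateralFilter(image_bgr, 9, 75, 75)
```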


That's not the interesting part, the interesting part is:

> What is even more shocking is that the adversarial examples seem to have some sort of universality. That is a large fraction were misclassified by different network architectures trained on the same data and by networks trained on a different data set.


Yeah, it is the interesting part. Because with sufficiently large inputs (even from different sources) the networks are going to be looking for the same patterns.


But why would different structures of networks fit to different data result in exactly the same type of "overfitting"?


Are you sure this really negates the problem? (doesn't that essentially make it equivalent to a downsampled input?)

I think a more important question than trying to come up with a workaround is to consider whether those cases really matter. If the probability of those "adversarial noises" is low enough, who cares? Those cases become a curiosity. We have noisy systems that may fail catastrophically operating everywhere with some low-probability noise vector, but we manage those probabilities.

Unless of course it is an adversarial application and the network is made public; but then I think there are other simple ways of defeating the attack, e.g. some random smoothing (so that the smoothing isn't also attacked), adding some more noise, etc.


There are the practical flaws, like "okay, how reliable can this be if it decides that this picture, which is clearly a car, is instead a camel?" But you're right, we can study those empirically and say whether or not they're actually a practical problem.

But I think that the more interesting question is, what the hell is actually going on here? What is the neural network actually detecting, that you can change what it is detecting with such innocuous-to-human-eyes changes? Is there indeed some kind of pattern that is genuinely characteristic of camels or whatever that is invisible to us, but "visible" to the neural net?

The temptation is to say it's just overtraining, picking up some noise that happens to be the same in various training set photos but is genuinely meaningless in the real world. But the fact that multiple different algorithms have the same or similar adversarial examples, and the fact that deep learning neural networks do appear to perform reasonably well in the real world, argues against that.

So if we provisionally think that it's not overtraining, what is it? Could there be a feature of camels that we can't see? Could that tell us something about camels, or vision, or something?

And vice versa: what if our goal isn't a narrow task like "identify the existence of camels", but instead we're trying to get a computer to identify the same features in camels as we do (perhaps because the goal is not simply to identify camels, but to identify camels and then reason about the camel in some way)? How do we get a deep learning neural net not to focus on some invisible feature, and instead learn to identify a camel based on features that humans DO perceive?


It's not about vision, it's about emergent patterns that arise from noise. If just one piece of the pattern (deemed important) differs from the expectation, then the classification fails. Therefore surface blurring does fix the issue, because the number of patterns is reduced (when it is applied to both the source samples and the user input).

Here is a diff of the borders of the pictures (instead of the pixels, which is less useful), the one on the left with surface blur and the one on the right without: http://i.imgur.com/3bihGVA.png (the parts marked as completely different are the white ones).


Yeah, dude, I know. But why is that piece deemed important? Is it truly important? If it is not truly important, then the algorithm is overtrained; that is, it has found "patterns" that exist in the training set only by random chance, and which would not exist in a fresh sample.

But as I mentioned, there seem like there are reasonable reasons to suspect that this is not just overtraining.


What does adding surface blur really fix? Why could you not add similarly subtle network-breaking noise to a blurred image? If you think of blurring as downsampling, it is obvious that you also need downsampled noise to trick the NN.


The noise is not random; one can see exactly where the artifacts were added (the white parts of the image). And for this solution to work the downsampling must happen inside the program, not in the file system, so all network-breaking noise would already have been discarded.


> And for this solution to work the downsampling must happen inside the program, not in the file system, so all network-breaking noise would already have been discarded.

I see, maybe I just didn't get your point the first time.

Anyway, what I was trying to say is that if you do view the downsampling as part of the network/program, you could apply the optimization procedure mentioned in the paper to a network that blurs its input. I assume that this would then generate network-breaking patterns that are imperceptible to the human eye, in the same way as happens in the paper.


The way people classify hybrid images is a function of how far away they are. I wonder if these are essentially hybrid images for neural nets. It seems like the noise being added is very high frequency. Given that, I would bet that neural nets classify typical hybrid images the same way as they would the sharper image component.


Humans have a constantly changing view of things, so any adversarial examples from one angle quickly change when viewed from another angle.

This issue can be framed another way: something incorrectly classified could become easily classified with very minor changes in perspective.


It's not clear how resistant the changes are to random noise, or how easy it would be to modify their procedure to create images which work even with random noise.

I do suspect noise would help, but some of the changes are things like blurring edges and lines that NNs are sensitive to. Adding noise would just make that worse.


I'm not talking about noise or blurring, though that's another point. Something like a self-driving car doesn't operate on a single image; it operates on a time series of images from multiple angles and points in time. I think an adversarial example that persists for a prolonged period of time across multiple angles would be rare, if not impossible.

It needs to be investigated further.


I've always thought that our brains are more complex than we can model at the moment. There is some fundamental concept we are missing, and it allows the brain to classify things that we can't classify with even our best neural nets today.


A side question I've always wondered about: if you train a NN to recognize a particular person's face from photos, will it be able to recognize a drawing / cartoon of that person?


No, unless you've explicitly trained it to match photos to cartoon images. The article is a bit sensationalistic when it says 'what if a NN would misclassify a pedestrian crossing as an empty road?'. The truth is you can't compare a simple photo recognizer with human perception. Human perception obviously can do far more advanced things than just matching photos to memories of photos, it's not even very good at matching photos. We have depth perception, we have object isolation, we can remember and abstract shapes, we can extrapolate and interpolate, we have loads and loads of context. We don't see the world in RGBA. We see the world as a continuous stream of related information.

All these systems together make sure a human would never[1] mistake a pedestrian crossing for an empty road, and allow us to match abstract paintings to realistic images. Any serious artificial autonomous agent would similarly consist of many independent but contextualized systems.

edit: [1] Never as in, never unless the pedestrian makes a good effort to look like an empty road to all of those systems.


I took Yann LeCun's machine learning class at NYU. He demo'ed a system that learned how to recognize human faces from photographs. When he pointed it at a comic book (system was only trained on photos, not drawings), it also recognized the faces in it.


I would say that the NN learns to recognize what the training photos of this person have in common, so if the drawing is similar in the respects that you have trained for, it might be recognized.


Should be easy to counter: just show several dithered versions of the image to the NN. Of course, if there is a deliberate attack this may not be sufficient.


I am immediately forced to think of Gödel's incompleteness theorems. Can it be proved that these examples always exist within some bounds of manipulation?


Url changed from http://thinkingmachineblog.net/the-flaw-lurking-in-every-dee..., which points to (actually, does a lot more than just point to) this.


This is a really interesting find. At the same time, I have this lurking fear that it will be misappropriated by the anti-intellectual idiots to marginalize the AI community, cut funding, et cetera. Another AI winter, just like the one after the (not at all shocking) "discovery" that single perceptrons can't model XOR.

If anything, this shows that we need more funding of machine learning and to create more jobs in it, so we can get a deeper understanding of what's really going on.




