Hacker News
Deep Neural Networks Are Easily Fooled (arxiv.org)
183 points by sinwave on Dec 9, 2014 | 60 comments

These arguments were introduced by Szegedy et al. earlier this year in this paper: http://cs.nyu.edu/~zaremba/docs/understanding.pdf. Geoff Hinton addressed this matter in his Reddit AMA last month.

The results are not specific to neural networks (similar techniques could be used to fool logistic regression). The problem is that a trained network ultimately relies heavily on certain activation pathways, which can be precisely targeted (given full knowledge of the network) to make it misclassify inputs that, to a human, look imperceptibly different from correctly classified ones. It is important to understand adversarial cases, but unreasonable to get carried away with sweeping pronouncements about what this does or doesn't say about all neural networks, let alone intelligence generally or the entire enterprise of AI research, as seems to happen after a splashy headline.

Thanks for the explanation!

I agree that it doesn't seem to be a huge cause of concern for the general use case, but it does make me question the use of machine learning algorithms for computer security, an idea which I've seen a few companies pursuing nowadays.

Entire fields of finance exist for this. Find the hole in the model, and stuff billions of bonds and derivs in it. This is especially an issue when the model sellers optimize off of it.

Also a reason why it's good to be leery when people say, "I don't need to know what's going on in the model, the results are good enough."

I'd characterize this differently, and as a lot more interesting than that. understanding.pdf can be viewed as a sort of dual to this paper, but they're not covering the same thing. In Szegedy et al., they constructed invisibly perturbed images that caused previously correctly classified images to be misclassified. Here, a search produced images that bear little to no visual similarity to typical members of the class they are assigned to.

In a way this is interesting because it's a sort of visualization of what the network views as important in discriminating between different objects. It's also interesting as a display of how alien the learned model's view of the world is.

Take optical illusions. They are remotely similar to this sort of exploit, although the sort of scene modeling we do is a lot more complex than recognition or decomposition. Anyways, illusions exploit cues that result in distorted recognition, but not drastically so, unlike the case for these networks. My guess is that this is due to animal vision using a lot more high-level cues (cues that are also useful in a natural setting) depending on things like size, color, shade, lines, context and so on. Visual systems are also a lot more proactive: they filter out things that don't make sense, fudge color at the edges of vision, smooth out shades, and generally make inferences and deductions about what they should be seeing and how things are "supposed" to be. In fact, a good number of illusions exploit those aspects of vision.

In the case of these networks, the cues are incomprehensible, having no natural counterpart, so we see most of them as noise. But sometimes they make a kind of sense, as in the starfish, baseball and sunglasses examples. Based on the observations in the paper, I would guess that only a handful of activations strongly associated with each feature are responsible for each susceptibility.

With animal brains the distortions usually end up in a slightly transformed space, a different scaling or something. It's useful to match a bit overzealously and get something like pareidolia but it also makes sense to have the conflations actually be like something you might run into. The ANNs have no such incentive.

Their paper also wonders about whether this is unique to discriminative classifiers. Would a generative classifier, with access to a proper distribution, be so susceptible? That'd be very interesting to see.

They also mention some real-world consequences, some of which I disagree with. Neural networks are good at interpolating between examples, so if your training data has good coverage of what is to be expected, then it'll work very well. And in the era of big data this isn't really a problem (that they don't generalize as we do might explain some of why they have trouble with abstract images), so I'm skeptical an image search solution would be thrown off by textures.

A better example, however, is facial or speaker recognition. For example, you could train a network to distinguish between faces or voices and then evolve a pattern against it. That pattern could then be used to get matched to some arbitrary individual in a target database. Not good. Driverless cars are also mentioned, but those typically rely on more than just vision. Personally, I'd add medical scans to the list of things to be careful with.

Finally, it's worth mentioning that some of the evolved images are inspired works of art. And a few of the images optimized (not evolved) with an L2 penalty are recognizable without the label, and for a few more you can see why the network gave the label it did.

Your offhanded dismissal was unwarranted IMO.

The whole point of this paper is that these 'illusions' look like noise to humans and make no sense in relation to real-life objects, and you're hand-waving it away as if it was some insignificant, tangential technicality.


I'm willing to bet that in late 70s many people also thought that rule-based systems could lead to real AI if they were "more complex" and had more computing power available.

Hello, you write: "With animal brains the distortions usually end up in a slightly transformed space, a different scaling or something."

Do you have any references to support this?

I know very little about neural networks beyond the conceptual level, so I could be way off here, but couldn't an algorithm be defined to look for meta-signals surrounding the activation pathways, which could be regularly 'audited'?

What I mean is, humans have a 6th sense that tells us when things are 'off' and require further examination. This comes from having 5 senses that are highly tuned to the world around us and work incredibly well together. In some ways, physical hardware has limitations on its ability to 'sense' the outside world, but in other ways, its ability to analyze and collect hard data is significantly more powerful than what humans can achieve, even at a subconscious level. Do current neural network algorithms not have this kind of failsafe?

Humans don't have a very good conceptual failsafe against known exploits. Everything from optical illusions to political messaging to closeup magic can get past our perceptual filters.

In fact optical illusions are probably the best example of this; even knowing that what you perceive is not correct doesn't change your perception.

That's a great point about human conceptual failsafes, but also, the failsafe I would be referring to in your optical illusion example would be the fact that you actually know what you perceive is incorrect, receiving that information from another data source - someone told you what you are seeing is incorrect. There is never a 100% foolproof failsafe, humans can still easily be manipulated and exploited, but manipulation doesn't work 100% across the board for humans either. Is there value in leveraging failsafes across a network? (I'm just throwing ideas out here)

"The results are not specific to neural networks (similar techniques could be used to fool logistic regression)."

Do you have any references or examples where such behavior has been demonstrated on non-NN algorithms (like logistic regression)?

I can verify this is true. It is not very difficult to build a multinomial regression model and find "adversarial" images for the trained model. Of course, you're just taking my word for it, but you could very well try it yourself. If you use Python, for instance, you don't need any special software beyond SciPy and sklearn.
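To see why this is not specific to neural networks, here is a minimal sketch in plain Python (no SciPy or sklearn, so it runs anywhere) of a gradient-sign perturbation against a logistic-regression-style linear model. Assumptions: the weights and the "image" are random stand-ins rather than a model trained on real data, and the sizes (`d = 1000` pixels, a per-pixel budget `eps = 0.02`) are illustrative choices. For a linear model the gradient of the logit with respect to the input is just the weight vector, so nudging every pixel by `eps` against `sign(w)` shifts the logit by roughly `eps` times the sum of the absolute weights.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(w, b, x):
    return b + sum(wi * xi for wi, xi in zip(w, x))

def sign(v):
    return (v > 0) - (v < 0)

rng = random.Random(0)
d = 1000                                          # number of "pixels" (illustrative)
w = [rng.uniform(-1.0, 1.0) for _ in range(d)]    # stand-in for trained weights
x = [rng.uniform(0.0, 1.0) for _ in range(d)]     # stand-in image, pixels in [0, 1]
b = 5.0 - sum(wi * xi for wi, xi in zip(w, x))    # bias chosen so x is confidently class 1

eps = 0.02  # maximum change allowed per pixel
# The logit's gradient w.r.t. x is w, so step each pixel against sign(w), clipped to [0, 1].
x_adv = [min(1.0, max(0.0, xi - eps * sign(wi))) for wi, xi in zip(w, x)]

p_orig = sigmoid(logit(w, b, x))
p_adv = sigmoid(logit(w, b, x_adv))
max_change = max(abs(a - c) for a, c in zip(x_adv, x))
print(f"p(class 1): {p_orig:.3f} -> {p_adv:.3f}, max pixel change {max_change:.3f}")
```

With many input dimensions, per-pixel changes far too small to notice add up to a large shift in the model's output, which is one way to see why linear classifiers can be fooled as well.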

OK, I found where Hinton claimed that[1], but it would be nice to have a better reference than a Reddit comment if anyone knows of one.

[1] http://www.reddit.com/r/MachineLearning/comments/2lmo0l/ama_...

Can you explain how this might be related to the concept of "overfitting"?

It's not specifically related to overfitting. It's just a general property of the abstraction process (overfitting can be thought of as over-abstraction).

Financial accounting can be thought of as a system that tries to make it easy to distinguish irregularities from normal operation using a few summary documents. The summary documents, like balance sheets, are abstractions of the original records, and you can normally detect financial problems easily using them.

But if you have an intelligent adversary within the accounting process who tries to forge the result, you must look into the original records and go through them to find the forgery.

Question: do these blind spots of neural nets persist as the data set is expanded?

I understand that if you have a fixed training data set of, say, 10 million images, there are probably going to be adversarial examples that will be specific to a net trained on that set. (These may be the same artifacts that are picked up by overfitting, or they could be local underfitting artifacts due to early stopping.) But are they specific to the data set?

In other words, are the same adversarial examples going to "break" a neural net trained on a separate (but identically generated) set of 10 million images? Or is this something that can be smoothed away with, say, ensemble methods and multiple data sets?

I guess this boils down to something similar to approximating a continuous function by polynomials: given any finite number of values (x_i, y_i) with distinct x_i, you can always find a polynomial passing through them. However, there are continuous functions which are nowhere differentiable and which have unbounded variation between x_i and x_{i+1} for each i.

The "polynomials" are the "nice images" and the "continuous with unbounded variation" are the "noisy ones".

Disclaimer: I am neither a lawyer nor an expert on NNs.

Edit: I would like to be corrected if the analogy is wrong.
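The interpolation half of the analogy can be made concrete. The sketch below (plain Python; the sample points are arbitrary, made-up data) builds the Lagrange polynomial through a few points, then adds a term that vanishes at every sample `x_i`. The second function agrees with the first at every sample yet differs by as much as we like between them. Here the added term is itself a polynomial, which keeps the example simple; a nowhere-differentiable function would make the same point more dramatically.

```python
def lagrange(points):
    """Return the polynomial of degree < n through n points (x_i must be distinct)."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p

samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 2.0), (3.0, 5.0)]  # arbitrary example data
p = lagrange(samples)

def wild(x, amplitude=1000.0):
    """Equals p(x) at every sample x_i, but the added term is large between samples."""
    vanishing = 1.0
    for xi, _ in samples:
        vanishing *= x - xi
    return p(x) + amplitude * vanishing

for x in (0.0, 1.0, 1.5, 2.0):
    print(x, p(x), wild(x))
```

Both functions "classify" every training point identically; only inputs between the samples reveal how differently they behave.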

We humans are as brilliant at pattern matching as we are at finding patterns that aren't really there, not just with our vision but with our understanding of probability, randomness, and even cause & effect. Thankfully, our brains are very complicated machines that can recognize a stucco wall or a cloud and, based on that context, invalidate the false identification of a face or unicorn or whatever.

With that in mind, is it really surprising that [m]any of our attempts at emulating intelligence can be easily fooled? An untold number of species have evolved to do exactly the same thing: exploit the pattern matching errors of predators to disguise themselves as leaves or tree branches or venomous animals that the predator avoids like the plague. DNNs seem to be relatively new and we've got a long ways to go, so is this a fundamental problem with the theoretical underpinning or do we just need to train them with far more contextualized data (for lack of a better phrase)?

Is there any chance of us having accurate DNNs if we can, as if gods during the course of natural selection, peek into the brain of predators (algorithms) and reverse engineer failures (disguises for prey) like this?

What I don't get about AI optimism: we humans are very fallible and can easily make mistakes. Computers are better than us when it comes to problems that are clearly defined and decidable (we can implement an algorithm to solve any instance of the problem). But we need AI when the problems are fuzzier, like recognizing a lion in a picture.

How can we build a mostly automated future, if the AIs that are supposed to do our jobs turn out to be very fallible as well? They won't - supposedly - have the problem of being self-aware and being able to follow their emotions rather than their own best judgement and reasoning. But it seems that some problems are inherently prone to making mistakes. Can it be avoided at all? And if so, who do we blame when an AI makes a "mistake" like that? The training set?

Just some back-of-the-envelope estimations (which could be completely wrong):

* AIs can focus all their capacities on a single task for an unlimited time, while a human can focus for a couple of hours each day. (That’s 5-6X thinking time each day.)

* More importantly, AIs have faster access to knowledge systems, other AIs and computation resources (e.g. for simulations and prototyping). An AI might take only on the order of 10^-3 to 10^-2 s to query and interpret information, while humans are more in the ballpark of 10^0 to 10^2 s.

* Another advantage is that AIs could fork modified versions of themselves, which could result in exponential evolution. The rate of evolution could be about 10^8 times higher than for humans (several seconds vs. 25 years per generation).
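A quick check of the last estimate, taking the comment's 25-year human generation and a few seconds per AI "generation" (the 1, 5 and 10 s values are illustrative assumptions standing in for "several seconds"):

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600      # Julian year, in seconds
human_generation = 25 * SECONDS_PER_YEAR   # the comment's 25-year human generation

for ai_generation in (1.0, 5.0, 10.0):     # "several seconds" per AI generation (assumed values)
    ratio = human_generation / ai_generation
    print(f"{ai_generation:>4.0f} s per AI generation -> ratio {ratio:.1e}")
```

Twenty-five years is about 7.9e8 s, so "several seconds" per generation does land on the order of 10^8.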

> if the AIs that are supposed to do our jobs turn out to be very fallible as well?

That AIs can be foiled by a specifically crafted adversarial attack does not mean the AI is very fallible. Under normal circumstances it still outperforms humans. But let's say the error rate of a net is on average 3%, and an attack works only if it is crafted for a specific net (where the weights are known etc.). Just as a team of humans can come up with better solutions than its individual members, an ensemble of nets (usually) outperforms any of its individual members. For instance, the final model generalizes better, because it uses information and predictions from all the nets. Foil one out of ten nets, and the other nine will cancel out the bad prediction with their votes.
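A minimal sketch of the voting argument, assuming (as the comment does) that a crafted input fools only the one net it was built against. The labels are hypothetical, and each net is reduced to the label it outputs. If the attack transfers to several members of the ensemble, the vote protects correspondingly less.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common label; among tied labels, the one seen first wins."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical ensemble of ten nets: the crafted input fools only one of them.
votes = ["lion"] * 9 + ["robin"]
print(majority_vote(votes))
```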

In an automated future, jobs will be taken over by AI. A security guard then becomes a supervisor: hundreds of nets will try to detect disturbances in a mall, and when they find one, they notify the supervisor. A judgment call is then made by a human. This is already happening with legal expert systems: a judge inputs the case and gets a predicted punishment, then makes a final adjustment to it, looking at the context of the case. Such a system prevents racially based sentences (the AI will not care about race, but look at precedent), while still giving emotions, judgment and reasoning the final say.

> it seems that some problems are inherently prone to making mistakes. Can it be avoided at all?

Some problems are uncomputable, like finding the shortest program that reproduces a given string. In others, like lossy compression, errors may be introduced into the representation of the original information that humans are unable to spot. I think this is a very interesting but difficult question to answer.

> who do we blame when an AI makes a "mistake" like that? The training set?

We blame the AI researcher who built the net :). And she will blame the training set :).

People's fallibility goes down as you throw more of them at a task—not because the majority will be right, but because the signal adds up while the noise cancels out. This is what the efficient market hypothesis, "wisdom of crowds", etc. are basically about.

If you train 1000 AIs on different subsets of your training corpus, their ensemble will be much "hardier" than one AI trained on the entire corpus. The automated future comes from the fact that you didn't need 1000 full training corpora to get this effect, nor do 1000 AIs cost much more than one to run, once you've built out enough hardware for one.

In other words, AI makes the application of "brute-force intelligence" to a large problem cheap enough to be feasible, in the same way slave labor made building pyramids by brute force cheap enough to be feasible.
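The "signal adds up while the noise cancels out" argument can be simulated in a few lines. This is a sketch under a strong assumption: each of the 1000 hypothetical AIs is reduced to a single numeric estimate equal to the true value plus its own independent Gaussian noise. With independent noise, the error of the average shrinks roughly like 1/sqrt(N); if the errors are correlated (everyone trained on the same data, or a crowd chasing the same experts), averaging removes much less of them.

```python
import random
import statistics

rng = random.Random(42)
TRUE_VALUE = 10.0
N_MODELS = 1000

# Each hypothetical model = true signal + its own independent noise (the key assumption).
estimates = [TRUE_VALUE + rng.gauss(0.0, 1.0) for _ in range(N_MODELS)]

typical_single_error = statistics.median([abs(e - TRUE_VALUE) for e in estimates])
ensemble_error = abs(statistics.mean(estimates) - TRUE_VALUE)
print(f"typical single error {typical_single_error:.3f}, ensemble error {ensemble_error:.3f}")
```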

There are many examples where crowd behavior exhibits less "wisdom". Take any market bubble for example. Have we ever looked at a tough scientific problem and come to the conclusion that the best path forward was to collect as many random people off the street and shove them in a room to solve it?

Also, bootstrapping or model parameter selection techniques are already heavily used in AI and have not yet brought us this future. I believe the model you presented has been simplified a bit too much, ignoring a lot of important variables.

> examples where crowd behavior exhibits less "wisdom"

When crowds act irrationally there is usually a problem in communication. The crowd would still be able to solve problems better after these communication channels are repaired. For instance, some rocket launches failed because information from engineers lower in the chain of command did not work its way up to the decision makers. The group was too compartmentalized, but launching a rocket is necessarily a group effort. 9/11 could have been prevented, or the aftermath lessened, if communication between intelligence agencies had been better. In a market crash we often see a single actor making a decision or prediction, and there is little to no reward for people down the chain to disagree with that prediction, or even adjust it (leading to insufficient variance in the predictions). Everyone is blindly chasing the experts, while in a good group setting there is no need to chase the experts.

> Have we ever looked at a tough scientific problem and came to the conclusion that the best path forward was to collect as many random people off the street and shove them in a room to solve it?

Has a scientist ever solved a tough problem after growing up in isolation from other scientists? I consider "standing on the shoulders of giants" to be a form of group intelligence. But yes: we have done something similar at the RAND Corporation. The problem was to forecast the impact of future technology on the military. The solution was to collect experts (not random people), put them in a room with an experiment leader, and gradually converge on the best forecast, using anonymous feedback every round. It's called the Delphi technique and it is still in use.

Also, there is an experiment running right now that takes random civilians, has them answer intelligence questions ("Will North Korea launch a nuke within 30 days?") and weights their answers according to previous results. This way, random civilians who individually beat a team of trained intelligence analysts, simply using their gut or Google, trickle up to the top. It's called the Good Judgment Project. Put ten of those civilians in a room and you have an intelligence unit that is not afraid to be wrong, does not have a reputation to uphold, and does not care about any group pressure, authorities or restrictive protocols that may hamper a group of real intelligence analysts.
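The comment doesn't say how the project weights answers, so what follows is only a hypothetical scheme, not the Good Judgment Project's actual method: score each forecaster's past probability forecasts with the Brier score (mean squared error against 0/1 outcomes, lower is better) and weight their new forecasts by its inverse. The names, numbers and the `floor` parameter are made up for illustration.

```python
def brier(forecasts, outcomes):
    """Mean squared error between probability forecasts and 0/1 outcomes (lower is better)."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(outcomes)

def weighted_forecast(history, outcomes, new_forecasts, floor=1e-3):
    """Average new forecasts, weighting each forecaster by 1 / (past Brier score)."""
    weights = {name: 1.0 / max(brier(past, outcomes), floor) for name, past in history.items()}
    total = sum(weights.values())
    return sum(weights[name] * p for name, p in new_forecasts.items()) / total

# Hypothetical track records on three past yes/no questions.
outcomes = [1, 0, 1]
history = {"alice": [0.9, 0.1, 0.8], "bob": [0.5, 0.5, 0.5]}
print(weighted_forecast(history, outcomes, {"alice": 0.2, "bob": 0.6}))
```

Here the forecaster with the better track record pulls the combined forecast close to their own 0.2 (the result is about 0.23).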

> Also bootstrapping or model parameter selection techniques are already heavily used in AI

I believe the parent was talking about model ensembling/model averaging, not the ensembling techniques used inside single models, like boosting, or the bagging that random forests use. If you have a single attack crafted for a single model as input, then a voting ensemble of three models (let's say: random forests with Gini split, regularized greedy forests and extremely randomized trees) will not be foiled.

Except, of course, that the pyramids are a marvel of workmanship and engineering probably built by a relatively small force of skilled workers. Their construction methods are about as far from "brute-force" as you could possibly get.


"One machine can do the work of 50 ordinary men. No machine can do the work of one extraordinary man." Elbert Hubbard

There is a huge class of cognitive mistakes our brains make that we are aware of, but we can't really train them out of ourselves effectively. Since we can't rewire our own brain, the hope is to wire up something that would not exhibit those known mistakes and biases.

The good thing about machine learning is that there are multiple ways to do it.

The textbook solution for your problem is to throw multiple methods at a problem, and then when they disagree let a human use their judgement.

This is the best of both worlds: computers do the easy, repetitive stuff that humans find boring. Then the tricky things humans use their judgement on.

AIs may be fallible, but much less so than humans because we can eliminate many sources of fallibility that humans have. A car-driving AI will not fail because it didn't get enough sleep, was distracted by an attractive person on the sidewalk, or drank too much.

I'd think all you can do is design for failure (as you would now) and use the failures as training data. The only real advantage is that you don't pay AIs and they don't get bored; they're consistently good or bad.

I have nothing intelligent to say without reading the full paper...

...But, how different is this from the various optical illusions humans fall for? I mean we can't exactly tell the difference between a rabbit and duck ourselves[1] so isn't it just a universal property of all neural-network like systems that there will be huge areas of mis-classifications for which there hasn't been specific selection?

[1] http://mathworld.wolfram.com/Rabbit-DuckIllusion.html

I believe the point of the article is that the triggers for optical illusions are totally different in humans and in ANNs. I don't know how valid this statement is; humans sometimes recognize "shapes" in white noise too.

The vulnerability exploits imperfections in the NN weights. To avoid this kind of mismatch all you need to do is shift the same image by 1 pixel (assuming recognition is done per pixel), and you can cross check results to check if an error occurred.

The human brain recognizes better because it can sample the image many times from many slightly different angles. There's a reason saccades (http://en.wikipedia.org/wiki/Saccade) exist.

This is highly oversimplified. The previous example of antagonistic samples being found works even if the training data was perturbed as you described. The reasons for saccades are also a lot more complex, involving photoreceptor adaptation etc.

There was a similar result a few months ago for another type of machine learning. (That's note 26 in this paper.) The problem seemed to be that the training process produces results which are too near boundaries in some dimension, and are thus very sensitive to small changes. Such models are subject to a sort of "fuzzing attack", where the input is changed slightly and the output changes drastically.

There are two parts of this process that are kind of flaky. The problem above is one of them. The other part is feature extraction, where the feature set is learned from the training set. The features thus selected are chosen somewhat randomly and are very dependent on the training set. It's amazing to me that it works at all. Earlier thinking was to have some canonical set of features (vertical lines, horizontal lines, various kinds of curves, etc.), the idea being to mimic early vision, the processing that happens in the retina. Automatic feature choice apparently outperforms that, but may not really be working as well as previously believed.

It's great seeing all this progress being made.
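The "fuzzing attack" idea described above can be turned around into a robustness check: apply many small random perturbations to an input and count how often the predicted label flips. This is a toy sketch, assuming a hand-written linear classifier with all-ones weights and arbitrary points; a real check would call the trained model instead. A point sitting just past the decision boundary flips often, while one deep inside its class never does.

```python
import random

def predict(w, b, x):
    """Toy linear classifier: returns 1 if w.x + b > 0, else 0."""
    return int(b + sum(wi * xi for wi, xi in zip(w, x)) > 0)

def flip_rate(w, b, x, eps, trials, rng):
    """Fraction of random perturbations (each coordinate within +/-eps) that change the label."""
    base = predict(w, b, x)
    flips = 0
    for _ in range(trials):
        noisy = [xi + rng.uniform(-eps, eps) for xi in x]
        flips += predict(w, b, noisy) != base
    return flips / trials

rng = random.Random(1)
w, b = [1.0] * 10, 0.0
near = [0.001] * 10   # w.x = 0.01: barely on the positive side of the boundary
far = [0.5] * 10      # w.x = 5.0: deep inside the positive region
print(flip_rate(w, b, near, 0.05, 500, rng), flip_rate(w, b, far, 0.05, 500, rng))
```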

Automatic feature choice can actually lead to a set of features that resembles V1 receptive fields (as demonstrated by Olshausen and Field in https://courses.cs.washington.edu/courses/cse528/11sp/Olshau...).

I recently attended a talk by Geoff Hinton on "capsules." He pointed out that the max pooling used in convolutional neural networks effectively disregards information about relationships among features. Instead, he proposed a network composed of "capsules" that each estimate whether an implicitly defined intermediate feature is present, along with its pose. The idea is that an object is present only if its intermediate features are present and their poses agree. He showed some neat results from these models (some published in http://arxiv.org/pdf/1412.1897v1.pdf, and some from http://www.cs.utoronto.ca/~tijmen/tijmen_thesis.pdf). Notably, these models can evidently learn to classify MNIST with >98% accuracy given only 25 labeled examples. (I am not sure how many unlabeled examples were used.) I don't have any experience with these models, but given that most of these images look like a single feature embedded in noise, or like a texture, I would not be surprised if a capsule-based network were less susceptible to them.
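A small illustration of the max-pooling point, in plain Python with made-up 4x4 feature maps (1 marks a detected part): after non-overlapping 2x2 max pooling, two maps whose parts are arranged quite differently become identical, because pooling keeps only whether a feature fired somewhere in each window, not where. This only illustrates the information loss; it is not an implementation of capsules.

```python
def max_pool_2x2(img):
    """Non-overlapping 2x2 max pooling over a list-of-lists image with even dimensions."""
    return [
        [max(img[r][c], img[r][c + 1], img[r + 1][c], img[r + 1][c + 1])
         for c in range(0, len(img[0]), 2)]
        for r in range(0, len(img), 2)
    ]

# Two hypothetical feature maps with the same parts in different relative positions.
a = [[1, 0, 0, 1],
     [0, 0, 0, 0],
     [0, 0, 0, 0],
     [0, 1, 1, 0]]
b = [[0, 0, 0, 0],
     [0, 1, 1, 0],
     [1, 0, 0, 1],
     [0, 0, 0, 0]]
print(max_pool_2x2(a), max_pool_2x2(b))  # both [[1, 1], [1, 1]]
```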

The discussion about this paper on r/MachineLearning is quite insightful and worth reading: http://www.reddit.com/r/MachineLearning/comments/2onzmd/deep...

I'm starting to like this way of looking for false positives and false negatives more and more.

It would be interesting to introduce some aspects known from the human brain and see whether the misclassified items "move" in some conceptually understandable direction.

* Introduce time. Humans are not just image classifiers; humans are able to recognize objects in visual streams of images. Such streams can be seen as latent variables that introduce correlations over time as well as space. What constitutes spatial noise might very well be influenced in our brains by the temporal correlations we see as well.

* Introduce saccades. A computer is only able to see a picture from one viewpoint. Our eyes undergo saccades and microsaccades. That's an unfair advantage for us, being able to see a picture multiple times from different directions!

* Introduce the body. We can move around an object. This again introduces correlations that 1.) are available to us, and 2.) might define priors even when we are not able to move around the picture. In other words, we can (unconsciously) rotate things in our head.

You might be interested in this paper, if you haven't seen it already: http://arxiv.org/abs/1406.6247

Another journal paper covering the same thing.


And the article I got that references it.


One might say that Picasso's Bull is a human equivalent of this: he "evolved" a sequence of images and ended up with something that has very few features of a bull, but nevertheless gets recognized by humans as such.

Then again, unlike the neural networks in the paper, humans would be capable of classifying abstract images into a separate category if asked.

I know literally nothing about this science, so that paper had me concerned about the following question:

Given a visual face recognition door lock or similar system. If I want to break such a door lock, can I install that system at home, train it with secretly taken pictures of an authorized person, and evolve some kind of key picture with my home system until I can show it to the target door lock and fool it into giving me access?

OK this is a very simplified way to put the question, but is that something this paper would imply to be possible (in a more sophisticated way)?

If I'm understanding you correctly, you want to train a second system with the first system's authorized user, then use the 'key' to open the first system.

As someone who actively researches biometrics, I can't say that this is a good method, for a few reasons.

1) Systems often train templates which look very different than the original input, especially if more than one image is involved in training. These templates aren't necessarily going to be recognizable to the first system (even if they can be represented as a 2D image).

2) Many enterprise systems (such as from Honeywell or whoever) include liveness tests and spoofing measures. Though anecdotally they are not very good, they check for basic measures such as if the pupil expands and contracts from a burst of light.

3) Most biometrics that involve access to some place (verification) usually include a 3rd party monitoring said access.

If you were to do this for, say, someone's home, then depending on the system you may gain access with a high-definition photo, as many consumer systems are set to a higher false accept rate (FAR) to prevent user aggravation. However, if they set it to be very strict (giving a higher false reject rate), then the best way would probably be to attack their sensor directly. That is, the system often doesn't care about its surroundings; it's trained for one task (open if authorized user).
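The FAR/FRR trade-off mentioned above comes down to where the acceptance threshold on a match score is set. The sketch below uses hypothetical match scores and a hypothetical helper `far_frr`; real systems estimate these rates on large evaluation sets. A lenient threshold lets in more impostors (higher FAR), a strict one rejects more genuine users (higher FRR).

```python
def far_frr(genuine, impostor, threshold):
    """FAR: fraction of impostor scores accepted; FRR: fraction of genuine scores rejected."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr

# Hypothetical match scores in [0, 1]; higher means "looks more like the enrolled user".
genuine = [0.62, 0.71, 0.78, 0.85, 0.90, 0.93]
impostor = [0.10, 0.25, 0.40, 0.55, 0.66, 0.74]
for t in (0.5, 0.7, 0.8):
    print(t, far_frr(genuine, impostor, t))
```

With these numbers, a threshold of 0.5 gives FAR 0.5 and FRR 0.0, while 0.8 gives FAR 0.0 and FRR 0.5.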

You may be interested in this talk: https://www.youtube.com/watch?v=tleeC-KlsKA#t=282

The speaker runs you through a hypothetical case study, a pet-door company, and looks at the pitfalls of applying machine learning to it.

I believe the paper is actually focusing on something else: Create images that humans will not be able to classify as a digit, but that the net will gladly give a prediction for. To translate to faces: There may be clouds that look random to our human brain, but are detected as faces by nets.

It seems this is an adversarial attack, where you need access to the guts of the net (weights, layers). I compare this with a hashing algorithm and brute-forcing the input till you find a collision with a target. Nearly impossible in real-life situations.

You may be able to sign a check using a scribble that the cashier can not recognize as a digit, but that the machine will recognize as a digit. Not much practical gain from an attack there.

You could probably arrange that person to be on a photograph (say ask him/her to hold a sign "I support our war heroes!" or something else everyone would agree to hold if you said you're making a photo album for a non-profit) and work from there.

But in general, this class of problems is why a biometric lock is only useful if accompanied by a guard with a gun stationed next to it.

Thanks all for the answers. I find neural networks and AI such an interesting topic, wish I had more time to go beyond science journalism to learn about it...

Yes, exactly.

However, you will need the database or training data fed into that lock system. Without the data, you won't be able to test it at home.

I don't have too much of a problem with this actually, because a lot of the "nonsense" images actually bear strong resemblance to the objects. The gorilla images clearly look like a gorilla, the windsor tie images clearly show a collar and a tie. The image coloring is way off of course, but the gradients seem about right.

If we could find out the selection criteria behind each layer of the neural network for the human visual cortex we could possibly build something more accurate.

Although I doubt the visual cortex is a simple feed-forward network like the one used in the paper. It's likely to have a nonlinear structure that's significantly more complex.

So, deep neural networks are like artists, able to see a structure in chaos? Like when Michelangelo looked at a large stone and immediately saw David there, so do DNNs recognize lions in white noise? We should applaud the introduction of fantasy and imagination into science ;-)

What this tells me is that there probably exist deeply weird images that would be recognized as something by one person or a very few people, but would be just an unrecognizable mash of colors and lines to everyone else.

I wish they explained why evolutionary algorithms were used. They seem to suggest gradient ascent also works - I wonder what the key criteria are for constructing good adversarial images?
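One plausible reason (an assumption, not something the thread establishes) is that evolutionary search only needs the model's output scores, not its gradients. The sketch below is a much simpler relative of the paper's evolutionary approach: a hill climber that mutates one "pixel" at a time and keeps the mutation only if the model's confidence rises. The "model" is a hypothetical logistic score with random stand-in weights. Starting from a flat grey image, it drives confidence close to 1, while the resulting "image" is just pixels pushed toward 0 or 1 according to the sign of each weight.

```python
import math
import random

def confidence(w, x):
    """Hypothetical target model: probability it assigns to one class (logistic on w.x)."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def hill_climb(w, steps, rng):
    """Mutate one pixel at a time; keep a mutation only if confidence rises.
    Uses only the model's output, never its gradient."""
    x = [0.5] * len(w)
    best = confidence(w, x)
    for _ in range(steps):
        i = rng.randrange(len(x))
        old = x[i]
        x[i] = rng.random()
        score = confidence(w, x)
        if score > best:
            best = score
        else:
            x[i] = old  # revert the mutation
    return x, best

rng = random.Random(0)
w = [rng.uniform(-1.0, 1.0) for _ in range(50)]  # stand-in "network" weights
x, best = hill_climb(w, 2000, rng)
print(f"start {confidence(w, [0.5] * 50):.3f}, evolved {best:.4f}")
```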

This raises an interesting question: is it possible to hack the human brain? Will a specific set of stimuli make the brain react in a certain way?

A field in cognitive science that asks these questions is "gestalt psychology". http://en.wikipedia.org/wiki/Gestalt_psychology

If the shoe were on the other foot, I can imagine a race of supercomputer AIs administering a similar test to humans and saying: look at the puny human vision system. It is easily fooled by simple optical illusions that wouldn't fool even a 2-year-old AI. Clearly, there are questions about the generality of the human vision system and perhaps it is not fit for purpose...

Yes, of course, what constitutes sensible image recognition is just a matter of opinion, and opinions of some algorithm are as valid as yours. In fact, my pseudo-random number generator seeded with image binary accurately recognizes 100% of images I give it (based on my newly developed definition of image recognition).

I have got no idea of the point you are trying to make. You seem to be criticizing something I said but not sure what. You do realize that I was trying to make a somewhat ironic / somewhat humorous comment that you have to be careful that what you test for is relevant and that one isn't focusing on a narrow weakness which may not actually be that relevant for the general case.

You're making a comment in response to a specific research paper. Therefore, I interpret the comment within the context of that paper. So you're implying that the paper is "focusing on a narrow weakness which may not actually be that relevant for the general case". I disagree.

Any 2D image is an optical illusion, so it makes no sense to criticize human image recognition for being 'fooled' by illusions. The real criterion for whether image recognition works well is altogether different.

So are human brains.
