One pixel attack for deceiving deep neural networks (arxiv.org)
260 points by astdb 7 months ago | 129 comments

When discussing adversarial attacks on neural nets, everyone seems to just kind of take for granted that the human visual system is immune to this kind of fudgery. I'd be very interested (and a little afraid) to know if this is in fact the case - we know of a bunch of optical illusions that are perceived differently to what they "ought to be", so it's not too much of a stretch to figure that we could come up with a technique to generate more extreme examples.

It’s still conjecture, but the chances are good that photosensitive epilepsy[0], as triggered for example by the famous Pokemon episode[1], is conceptually similar to the unstable responses that adversarial attacks produce in computational neural networks. (And yes, while there is some debate about the total number of seizures triggered by the broadcasting of that episode, there is no debate as to whether the medical condition exists and is a real risk factor for a surprisingly high fraction of the population, particularly males and children[2].)




This is a bit of a stretch.

Beyond the fact that unexpected inputs produce unwanted outputs in both cases, I'm not sure I see the connection.

In the DNN case, the network should be able to end up in both states. However, the boundary is fairly sharp, and small changes in the input not only push it over the boundary, but give it high confidence that the new state is the correct one.

The brain, however, shouldn't ever end up in an epileptic state; there are feedback mechanisms that decorrelate neural activity and keep excitation in a certain range. These mechanisms are weakened or defective in epilepsy, which allows very potent stimuli to drive neural activity and kick off positive feedback loops.

The vast majority of the adversarial work on DNNs is about using small input perturbations to push the system from one stable output state to a different stable output state, largely because the work is done on images rather than video, so repeatedly re-classifying the same image isn’t very interesting. That doesn’t mean stable-state-to-stable-state is the only form of adversarial input that can be found; it just means that’s all that has been looked for so far.

The odds that there are adversarial perturbations that reliably push the system into an unstable state between two or more output states are probably quite high. It just hasn't hit the threshold at which it becomes interesting for postdocs to look for yet.


This is a categorical no. Epilepsy is a documented phenomenon. Sonically induced excretion is not, despite having received sustained military attention at some point in time, which makes a false negative extremely implausible.

Egestion. Not excretion. ;-)

Elimination, I think.

Having read some very introductory material about the human visual system, I'd be willing to guess that a partial reason is that human vision works, at the input level, quite differently from "observe NxM colored pixels in a grid". The act of constructing what we'd think of as a single "view" of our surroundings (which one intuitively might equate with a still pixel image) involves lots of low-level processing.

(E.g. everyone has probably heard about the blind spot at the point where the optic nerve 'connects' to the retina. But I was surprised to learn that most of the stuff in our FOV is seen very poorly, and only the tiny area called the macula is responsible for high-acuity vision. And the macula is capable of covering only a small spot in our FOV at one time: what we'd think of as a single view is really constructed in a post-processing step as our eyes move and scan our surroundings. My amateur-level explanation might be a bit off, but the point is, it stands to reason that this kind of "dynamic" processing system is more robust (especially to small pixel-level "attacks" as in the linked paper) than simply constructing mappings of still pixel grids to outputs, as CNNs currently do.)

What we know about the brain points to lots and lots of top-down influence on visual processing. Your learned preconceptions can go down the stack and alter/override raw sensory data that gets fed from your eyes up the stack, forming loops that are arguably absent in ANN solutions.

Not only that, but much of what you ‘see’ is reconstructed from memory, not active input from the eyes.

The mechanisms of vision are really fascinating. Ever notice that when you first glance at a clock, that first tick of the second hand/digits seems to take too long?

I think people get fooled and make small mistakes all the time, but we easily ignore them because they're corrected by other senses. Like when you thought your phone screen turned itself on in your peripheral vision, but you'd just left your credit card or something on top of the screen.

You can cover someone's eyes and obstruct their visual system entirely, and they'll be fooled for a second until they realize it's probably just their friend, because they can feel your hands. Did you fool your friend's visual system? Or does it need to be fooled forever before you can say you fooled the visual system?

So the visual system on its own is not enough as we have many other senses that you'd need to fool in order to fool you entirely.

I think optical illusions are a great way to fool the visual system alone, but they don't fool you (all your senses). There are magic tricks for that sort of thing, but even then we know it's just a trick.

As raverbashing pointed out, our visual system is more than just a 2D grid of points with a certain color, so obviously the latter is easier to fool.

When I saw the 32x32 image of the frog with the pixels I wasn't even sure what it was for a few seconds. But I had context, like I know these images are probably things that are easily classified, so it's gotta be some common animal, airplane, car, house, etc. I also knew that I was supposed to ignore the pixels.

If you present the frog image with pixels to people on the street and tell them to classify the image, I'm sure many of them will get it wrong if they are given less than 2 seconds to look at it. More so when they're focused on something completely irrelevant.

> take for granted that the human visual system is immune to this kind of fudgery

Clearly it isn't, but given that animals use camouflage, the visual system is the end product of millions of years of an evolutionary arms race against confounding inputs.

What are you trying to say? What insight do these assertions give us on the human visual system?

It's simpler than that

Human vision has natural blur (some people's more than others'), natural temporal integration of inputs, automatic centering on subjects, limited resolution, and gain adjustment

Hence one nudged pixel makes absolutely no difference

All the things you've said, bar temporal integration, are also present in CNNs in some way. Training data is "augmented" with different levels of luminosity to cancel the effect that has on classification, pooling layers give some translation invariance (and avg pooling is a type of blur), and resolution is also very limited in most models (and gets smaller as you go deeper). And it still fails!
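The "avg pooling is a type of blur" point is also why pooling alone doesn't neutralize one-pixel changes: the perturbation is attenuated, not erased. A minimal numpy sketch (the 8x8 "image" and window size are arbitrary choices for illustration):

```python
import numpy as np

# An 8x8 image with a single one-pixel perturbation.
img = np.zeros((8, 8))
img[3, 3] = 1.0

# 2x2 average pooling: reshape into 4x4 windows of 2x2 and take means.
pooled = img.reshape(4, 2, 4, 2).mean(axis=(1, 3))

# The perturbation survives pooling, just attenuated by the window size:
print(pooled[1, 1])   # 0.25
```

So the downstream layers still see the change, only scaled down by the pooling window, which is exactly what a gradient-based attack can compensate for.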

Ah yes, but I mean there are two types of temporal integration

One is the short-timed "persistence of vision" one

The other is observing a scene multiple times from slightly different angles and positions

If some weird arrangement causes an illusion, changing the position slightly usually fixes that

Also, human data is noisy, I guess augmentation strategies might want to consider that as well

So here's an idea to test that. Take an image and find one of these adversary-generating pixels. Now take the corresponding pixel in each of the other images and modify it in exactly the same way that generated the first adversary. I would not be surprised if it fails to be an adversary generator across all images; I would expect each of those images to have a different gradient.
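That experiment can be sketched on a toy stand-in: everything below is hypothetical (a fixed linear scorer over flattened 8x8 "images" instead of the paper's CNNs on CIFAR-10), but the procedure is the same: craft a one-pixel change that flips one image, then apply the identical change to the others and count flips.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear "classifier" over flattened 8x8 images.
w = rng.normal(size=64)
images = rng.normal(size=(20, 64))

def predicts_positive(x):
    return float(x @ w) > 0.0

# Find a single-"pixel" change that flips the first image's label:
# overshoot the coordinate with the largest weight just past the boundary.
x0 = images[0]
i = int(np.argmax(np.abs(w)))
delta = np.zeros(64)
delta[i] = -(x0 @ w) / w[i] * 1.001    # crosses the decision boundary

assert predicts_positive(x0) != predicts_positive(x0 + delta)

# Apply the *same* one-pixel change to every other image and count flips.
flips = sum(
    predicts_positive(x) != predicts_positive(x + delta)
    for x in images[1:]
)
print(f"label flipped on {flips} of {len(images) - 1} other images")
```

Since each image sits at a different distance from the boundary, the shared perturbation only flips the subset that happened to be close, which is the "different gradient" intuition in miniature.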

I've often thought that DNN vision systems would be much more robust if they worked on video streams rather than single-image snapshots, for just this reason. It would let them implement something like the "liveness tests" the new iPhone apparently applies to the facial recognition unlock stuff.

It's definitely true that segmentation is a lot easier to do for video than images, in part because you have access to more information (e.g., motion cues from an object moving independently from the camera).

So 2 pixels or 4. Lower resolution is no protection, it just means the pixels are larger.

I think the difference is that people know when the image messes with their perception. i.e. we see the optical illusion, yes, but we also know it's not real, or incorrect in some way.

A neural net does not. That's the missing element - neural nets "need to know how to know when they don't know".

I don't think so. Many of my friends thought I was trolling them when I told them "the actual color" of "the dress" [1].

When we do know that there is an optical illusion, it is because of the context we have that the machines don't, like, we can read the title of the article ("These 11 optical illusions will blow your mind").

[1]: https://www.google.de/search?q=the+dress

Our eyes are much, much better at handling color balance "in person" than on a screen. Normally, we have the entire surrounding environment to compare colors to and make a guess.

This is what I'm saying: we have more context. There is theoretically nothing keeping machines from doing the same kinds of analyses when given the same information.

Actually, in this sense NNs are smarter than humans - an NN can guide you to its weak points through backprop. That is indeed how adversarial examples are found. Conversely, people are notoriously bad at finding their own weaknesses (as demonstrated by Dunning-Kruger etc.).
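The "guide you to its weak points" idea is easiest to see on a linear model, where the input gradient is known in closed form. A hedged sketch (a made-up linear "network", not a real DNN; for deep nets you'd get the gradient from autodiff instead):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical linear "network": score(x) = w @ x, label = sign(score).
# The gradient of the score w.r.t. the input is just w, so the
# gradient-guided (FGSM-style) step can be written directly.
w = rng.normal(size=100)
x = rng.normal(size=100)
label = np.sign(w @ x)

# Smallest uniform per-pixel step that crosses the decision boundary:
eps = 1.1 * abs(w @ x) / np.sum(np.abs(w))
x_adv = x - eps * np.sign(w) * label

print(np.sign(w @ x), np.sign(w @ x_adv))   # opposite signs
```

Because every coordinate moves in the direction the gradient indicates, a tiny per-pixel budget is enough; random search would need vastly larger changes to achieve the same flip.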

Dunning-Kruger doesn't mean the popular misconception you are thinking of

Our visual system has all kinds of "attack vectors" of this sort. Heck, video only works as a medium because it exploits one of them.

For a much more subtle and in some ways (imo) disturbing one, look into the blindness constantly induced by our visual perception during saccades. A malicious attack on that would be... well, I don't really have a word for it. Disturbing doesn't really fit. Suggestions welcome!

Peter Watts' novels have a couple of examples of this. For instance, (SPOILERS!) the scramblers in Blindsight can render themselves nigh-invisible by moving only during human observers' saccades. (I'm not convinced that this would actually work but it made for a cool plot device.)

The human visual system is susceptible to misperception, sure. This has been known for a while. I also would agree that sometimes single pixels could probably confuse people substantially.

However, if you look at the images in the article, I think it's telling about how much more sophisticated human perception is. The authors even have a note in the first image along the lines of "look carefully, because you might have trouble locating the aberrant pixel."

I think these types of attacks speak volumes about the fragility of DL optimizations: I think there's more overfitting going on than people acknowledge or realize. My sense from reading and working with some ML things is that this extends to natural language data as well.

DL systems are often highly optimized to a particular test set, and might do well on cross-validation to other exemplars of that test set, but that's not the same as generalizing across different types of test sets.

Maybe I'm wrong, though--I think DL NN models have been fantastic, but there's a certain amount of hype in ML in general.

My two cents: I'd think that it probably would be, if you could freeze its physical state, repeatedly apply it to single static images, measure its response, and do gradient descent on that.

But in reality the brain is stateful and noisy and works on a stream of images. Even if now and then, by sheer accident, for a short time, your brain might be misled, it surely wouldn't go like "well, I was pretty sure that was an X, but just now Y is scoring higher than X, so I'll just flip my opinion on that in an instant".

It’s worth noting that 1 pixel on a 32x32 pixel image is a much higher proportion than it may appear at first too. That makes the analogy to optical illusions much more reasonable.

I'm not sure that was a real optical illusion rather than a massive amount of miscommunication. I.e. some people talking about the colours of the actual pixels in the image, and some people talking about the likely colour of the dress given the camera's response and the lighting conditions. They're different things, of course, but everyone just said "the colour of the dress is...", resulting in pointless arguments about different things.

I don't think most people who argued about that even realized those two are different things - and indeed the inability to make that distinction was one of the causes of the controversy.

It's an easy to reproduce optical illusion, using computers instead of real world lighting. Xkcd did a nice illustration.

I don't think it is immune. Like with neural nets, human sensory processing depends on previous input data to generate generalized models of sensory input, prioritizing and clustering data so that we can understand how things are similar. In psychology, this phenomenon goes by different names depending on the circumstances; sensory priming and sensory habituation are two fairly well documented examples of sense/memory integration.

Obviously, humans aren't going to have this amount of error from small, small changes like these (except maybe in very autistic people who have dramatic reactions to changes in their environment, but I may also be wrong; I'm not autistic, and I'm not a psychologist), but our innate perceptual biases do play heavily into how stimuli are processed and reacted to.

People do generate optical illusions all the time; the white/gold vs. blue/black dress is a modern example.

Have you read snow crash?


> This first example of the Berryman Logical Image Technique (hence the usual acronym BLIT) evolved from AI work at the Cambridge IV supercomputer facility, now discontinued. V.Berryman and C.M.Turner [3] hypothesized that pattern-recognition programs of sufficient complexity might be vulnerable to "Gödelian shock input" in the form of data incompatible with internal representation. Berryman went further and suggested that the existence of such a potential input was a logical necessity ...

TL;DR https://en.wikipedia.org/wiki/BLIT_%28short_story%29


Lots of people make the equally simplistic argument the other way - that because the human visual system can sometimes be fooled, then this is not an issue.

As other people have mentioned, human vision is quite robust. But there are other ways to hack the human psyche... For instance, the abuse of selective attention:


The underlying assumption is that the human visual system can be regarded as continuous – small changes in input cause small changes in output.

But we know neural nets are not continuous: a tiny change in input could cause a huge change in output as suggested here.

There's an FAQ about these things: http://ansible.uk/writing/c-b-faq.html

But what does it take to fool the human visual system? I'm guessing it's generally more than a few pixels.

If by knowing the structure of a NN a "trick" image can be specially crafted to fool it, what is the equivalent for a human brain?

If we get to the point of having a human connectome to analyze-- or the kind of access to neural topology that a neural lace would provide-- could an optimizer generate an image of static that human would mistake for the president of the United States?

It seems outwardly implausible that such an image could exist, but perhaps that is only because we've never seen one (or if we had, would we know?), and a "blind" search of images would never find it, as the space of images is galactically huge. With a "map" of the brain it might suddenly become plausible.

And if so, that world sounds absolutely terrifying to me.

We have optical illusions, which can make you answer a lot of questions wrong like

Is this a face?

Are these lines straight?

Is this stationary?

What colour is this?

These even work across vast numbers of people.

With full knowledge of a person's brain and the connections, we would surely at least be able to enormously improve on this.

What other things might be possible? Could we make people move in a certain way, do or say certain things? There's no fundamental reason we'd be limited to affecting the visual processing sections of the brain.

And now I need to read snow crash again.

They also tend to implicitly make viewers aware that their sensory apparatus is doing something strange, because people aren't structured to only answer those single questions.

Not all of them. Some illusions are structured like relatively "normal" images, so unless you explicitly draw attention to the issue, people will just accept what they see and move on. See e.g. tower on a chessboard, or blue/pink dress thing.

I see the tower on a chessboard as more of an illustration than an illusion, the point being that we don't really observe absolute colour because it's not terribly useful for making sense of space (which is what sight is for).

The dress is sort of a variation on the same theme, except it divides people by how they extrapolate the rest of the scene.

Plenty of illusions don't seem strange.

Ex: https://en.m.wikipedia.org/wiki/Ebbinghaus_illusion

> What other things might be possible? Could we make people move in a certain way, do or say certain things?

At least to some extent it's possible, with hypnotism.

Does anyone have conclusive evidence that it works? My perception is that it's between lies and pseudoscience.

Hypnosis is essentially placebo. I can elaborate on this if you're interested

I am, please do.

>> If we get to the point of having a human connectome to analyze-- or the kind of access to neural topology that a neural lace would provide-- could an optimizer generate an image of static that human would mistake for the president of the United States?

Probably not, because our brain doesn't seem to work like Artificial Neural Networks do. Most notably, we learn to identify novel objects after only seeing a single example of them while ANNs may require many thousands of examples. We don't seem to learn to identify individual pixels, either (though what exactly our brain does when we learn to identify objects from their images is anyone's guess).

Digital images are also not a very good analogy for what human eyes see: our vision doesn't have "pixels" and we don't even need images to be particularly clear to identify them with good accuracy (we can still tell what things are up to a point, even in the dark, when it rains, when our visual field is occluded etc).

Generally, you can't expect the human brain to work like an ANN. Like others have said before [1], the "neural network" analogy is not a very good one. It often serves only to create confusion about the capabilities of ANNs and the human brain.


[1] https://spectrum.ieee.org/automaton/robotics/artificial-inte...

  Yann LeCun: My least favorite description is, “It works just like the brain.” 
  I don’t like people saying this because, while Deep Learning gets an inspiration 
  from biology, it’s very, very far from what the brain actually does. And describing 
  it like the brain gives a bit of the aura of magic to it, which is dangerous. It 
  leads to hype; people claim things that are not true. AI has gone through a number 
  of AI winters because people claimed things they couldn’t deliver.

I think the differences between ANNs and BNNs are exaggerated. They probably work on similar principles, even if there are some differences.

But none of that is particularly relevant. Even if they are completely different, so what? The same procedure could still work. Take a biological brain and backprop through it to find exactly what inputs change the outputs by some small degree and tweak it bit by bit until you change the output. You can apply this to any function, it's general.

Adversarial examples are exceedingly rare in natural data. They require tweaking exactly the right pixels in exactly the right direction. It's something stupid like a one in a billion billion chance of such a malicious example occurring by chance if you just randomly add noise to images. It requires a very precise optimization procedure.

So if adversarial examples did exist for human vision, we probably wouldn't know it yet. They don't occur in nature. So there's no reason for the brain to have evolved defenses against them. (Though camouflage is an interesting natural analogy, it's not quite the same.)
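The "one in a billion billion" rarity claim can be illustrated on the same kind of toy linear stand-in used elsewhere in the thread (all hypothetical; real networks and real noise statistics would differ): a crafted perturbation with a given per-pixel budget flips the label reliably, while random noise with the same budget essentially never does.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy linear "classifier" with a known margin; made-up stand-in for a
# trained network, just to illustrate the odds.
n = 1000
w = rng.normal(size=n)
x = rng.normal(size=n)
x -= (w @ x - 25.0) * w / (w @ w)            # place x at a fixed margin of 25

label = np.sign(w @ x)
eps = 1.1 * abs(w @ x) / np.sum(np.abs(w))   # per-pixel budget

# A crafted perturbation of size eps flips the label every time ...
x_adv = x - eps * np.sign(w) * label
assert np.sign(w @ x_adv) != label

# ... while random noise with the same per-pixel budget almost never does.
flips = sum(
    np.sign(w @ (x + rng.uniform(-eps, eps, size=n))) != label
    for _ in range(1000)
)
print(f"random-noise flips: {flips} / 1000")
```

The crafted direction concentrates the entire budget against the boundary, while random noise cancels itself out across dimensions; in high dimensions that gap becomes astronomical, which is why nature never stumbles on these examples.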

>> Take a biological brain and backprop through it to find exactly what inputs change the outputs by some small degree and tweak it bit by bit until you change the output.

How exactly do you "backprop through" a (biological) brain?

Also, I don't see why you'd ever want to do that "tweak it bit by bit" thing to a brain. Human brains seem to catch on to ideas pretty quickly. They don't need to go back and forth on their synapses a million times until they learn to react to a stimulus. Whatever the brain does is light years ahead of backprop, which is, all things considered, a pretty poor algorithm. So why would you ever want to do that "backprop on a brain" thing, if you could do- you know, what brains do normally?

If you have a perfect simulation of it, you can just run it step by step and create a computation graph and go backwards through it.

The algorithm used to do the optimization doesn't really matter. Use a GA or hillclimbing if you want.

EDIT since you added more to your comment:

>why you'd ever want to do that "tweak it bit by bit" thing to a brain. Human brains seem to catch on to ideas pretty quickly.

So what? That's the process for creating an adversarial image. Does it matter how many steps it takes to create it?

>They don't need to go back and forth on their synapses a million times until they learn to react to a stimulus.

That's exactly how learning works in humans. Try to learn to juggle with just one or two tries. It takes thousands. And that's after you've spent years in your body learning how to coordinate your muscles and locate objects with your eyes and how physics works, etc.

>Whatever the brain does is light years ahead of backprop, which is, all things considered, a pretty poor algorithm.

I really, really doubt that. There are a number of theories about how the brain might implement a variation of backpropagation for learning. Hinton has one.

Backpropagation is not a poor algorithm, it's probably close to optimal. No one has been able to come up with something better besides just little heuristic tweaks. It's very difficult to see how you could do so. It's so simple and elegant and general.

Ah, so all you need is a perfect simulation of a brain?

Ideally. I'm of course talking about whether it's hypothetically possible to do this. Once we understand the brain better, it may be possible to create a reasonably accurate computer model and do this for real.

Yes. Convolutional networks are probably not how the brain does it. I guess it instead takes a brute-force approach, where more parameters need to be trained.

Further, humans also use non-visual information when interpreting sights.

Does your brain confuse this squiggly line for a dog? http://www.pablopicasso.net/dog/

It doesn't look anything like a dog!

That's a good question; I can look at that and say it's not a dog, but it is a deliberate representation of a dog. It doesn't confuse me into thinking it is a dog. Do the NNs differentiate between an actual object and a representation of an object? Given that their input is (I understand) just a 2D image, I could imagine the question itself makes no sense in the context of a NN. To an NN, a picture of a dog is a dog.

In the same vein: is this a pipe?


I think all the other fail-safes human brains have would be pretty good protection against OP's scenario. When I'm not seeing clearly, I automatically take a second look, try a different angle, realize that something isn't quite right, and evaluate the whole situation accordingly. All things that today's neural networks don't and can't do.

Individual NNs themselves, true, but a comprehensive machine vision package such as one in an autonomous vehicle definitely must - and does - have that sort of situational awareness and sensor fusion from multiple input sources.

Exactly, and those systems aren't vulnerable to such simple attacks, at least as far as we know. The human visual system, and everything connected to it, is even more complex and sophisticated, so I think it's fair to say that it would be even harder to trick it like that. On the other hand, maybe this complexity and sheer amount of processing makes us susceptible to another kind of trick (like advanced optical illusions)

Nope, you could easily train a CNN to distinguish photos of real dogs from drawings of dogs.

How about a real dog and a photo of a real dog?

Having been taught the difference between a photo of a dog and a drawing of a dog, would it then be able to differentiate between a photo of any object and a drawing of that object, or do we need to teach it the difference again for every different object there is?

If I teach it to identify a simple two colour single-line drawing of a dog, like that Picasso picture, will it then be able to handle surrealist drawings of dogs, and impressionist, and cubist, and a picture of a sculpture, and watercolours and charcoal and all the other varieties of form and style, or do I need to teach it separately for everything? Don't forget slide puzzles! I can tell this is a dog - https://lh3.googleusercontent.com/oAtmNcl25MPQOZ5Occ_fr7_BKr...

These of course are hypothetical questions; I suspect the answer is that there is going to be an awful lot of teaching, with a few pleasant surprises when it gets one style of artwork from having seen enough of other styles.

Except the first one; to the NN, a real dog is a picture of a dog - with no concept of real object behind the picture, the NN's universe is pictures and it will only ever be a simple machine for identifying things in a very very narrow universe.

Nothing I'm saying here is news to anyone, of course, but sometimes it seems like these NNs are portrayed as general identifiers, when they're actually very narrow.

Even actual dogs wouldn't recognize that image as a picture of a dog. NNs have approached at least animal levels of intelligence which is amazing.

This optical illusion [1] is the closest thing I've come across. It relies on the fact that there are apparently separate parts of visual processing in the brain for fine details and for more coarse features. The illusion consists of a picture of the fine details of Albert Einstein's face, so when you are close enough to see the details, you see Albert. But there is also a blurry image of Marilyn Monroe, which isn't really possible to make out until you view the image from farther away, or squint your eyes, so that the finer details are lost.

[1] http://www.123opticalillusions.com/pages/albert-einstein-mar...

There are different processing stages, actually not unlike the layers in a DNN. Edge detection starts already in the retina.

The human brain is incredibly more complex than a "simple" (artificial) neural network, so there might very well be no equivalent for a real brain... but I think optical illusions would be a good approximation, even if less severe: tricking the brain in elaborate ways to see something that is different from what it would usually see.

I'd look at the superset containing both optical illusions and the the various funny brain exploits lkie rerarnagnig ltetres in wrods, or using more than one "the" in a sentence without reader noticing it. I don't know what's the the name of that superset, but I don't think we should limit study of our visual processing stack to object recognition - the brain does so much more!

The brain has some inherent biases and prejudices - these arise from just how we grow up and perceive objects. So optical illusions are a deviation from whatever visuals we are used to, and you can say similar events occur with stuff like tasting an unknown object, or finding a different smell.

As an avid reader of Neal Stephenson, I was surprised not to find the phrase "Magic Eye" in this thread yet.

There are many known ways to create optic illusions that trick humans this quickly and thoroughly. Efficiency is the real question if you are concerned about deception and human brains.

Actually, come to think of it, every image is created to trick the brain's image processing. When viewing real scenery the eyes get hit by photons of all sorts of colors, but since there are only three different color receptors (with overlap), we can trick the brain into thinking an image containing only red, green, and blue is the same. But, for example, some birds that can see ultraviolet will probably not recognize an image of a flower, since it lacks that spectrum.

Holding it upside down would already be sufficient.

I'd expect redundancies in analysis in the brain's NN make this impossible.

The underlying problem seems to be that deep neural network classifiers tend to place their classification boundary surfaces very close to data points in at least one dimension in a high-dimensional space. That makes them brittle - perturb the data very slightly in the wrong direction and they move through a boundary into some other classification.

I don't know enough about the subject to know why training does that, or what can be done about it.

> deep neural network classifiers tend to place their classification boundary surfaces very close to data points in at least one dimension in a high-dimensional space

Doesn't this imply the Jacobian of the network is ill-conditioned near the adversarials? If so, it seems like we could test this by imposing regularization on ratio of min and max singular values of the neural network's Jacobian, and examine what effect, if any, on adversarial examples.
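Measuring that conditioning is straightforward for a small model. A hedged sketch of the quantity such a regularizer would penalize, on a made-up two-layer tanh network (numpy only; a real experiment would use autodiff on the actual classifier):

```python
import numpy as np

rng = np.random.default_rng(3)

# Tiny two-layer tanh network; sizes are arbitrary for illustration.
W1 = rng.normal(size=(16, 8)) * 0.5
W2 = rng.normal(size=(4, 16)) * 0.5

def jacobian(x):
    """Exact input-output Jacobian of f(x) = W2 @ tanh(W1 @ x)."""
    h = np.tanh(W1 @ x)
    return W2 @ np.diag(1.0 - h**2) @ W1

x = rng.normal(size=8)
s = np.linalg.svd(jacobian(x), compute_uv=False)
cond = s.max() / s.min()

# A conditioning penalty like this could be added to the training loss;
# whether it actually suppresses adversarial examples is the open question.
penalty = np.log(cond)
print(f"condition number at x: {cond:.2f}")
```

A large ratio of singular values means some input direction moves the output far more than others, which is exactly the kind of direction an adversarial search exploits.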

What's concerning is that the network used dropout, which I thought was aimed at making networks less brittle (i.e. reducing overfitting).

Go watch Yarin Gal's talk on dropout in neural networks. He shows pretty convincingly that the belief that dropout reduces network overfitting by introducing noise is wrong.

Wait, that can’t be wrong, because that is literally what DO does. It is a convex hull regularizer around the network activations using noise. That is also why dropout does not solve susceptibility to adversarial examples: it merely extends the regions that the NN generalizes to outward; but that is limited, because high-dimensional spaces are counter-intuitively large and the noise required to cover a decent fraction of the “unmapped” space would completely prevent learning. AFAIK, Yarin Gal merely provides a Bayesian interpretation of the noise.

IIRC, his "Bayesian interpretation of the noise" actually shows that dropout performs approximate integration over model parameters. As he says, dropout doesn't work because of the noise but despite the noise.


That seems like a strange/unnecessary way to put it because DO is noise.

I thought the point was that dropout effectively learns an ensemble of all... erm... subtopologies (if that's the right word) of your network?

You can also just call them subgraphs; not all of them, but exponentially many.

It would be interesting to see what happens when these adversarial images are used for data augmentation during training

Figure 8. Without a label, I wouldn't have been able to independently tell anyone that this was a picture of a dog.

For what it's worth, images used in this paper are 32x32 pixels.

I can't even see a dog when I know there should be one...

Looks more like one of these dump trucks to me http://3.bp.blogspot.com/__EzFEHn2YBI/SbEdsBdbL7I/AAAAAAAAAR... (why are these always yellow?)

Or half a torso of a dog... with a dump truck in the background..

> (why are these always yellow?)

The yellow is the manufacturer's (Caterpillar) trademark color--and it probably helps that it's very high visibility.

That would be my guess, but I do see lots of other types of normally-yellow vehicles in other colors. This specific kind, though, I don't recall ever seeing in another color.

Even here [0] most are yellow despite having a different manufacturer. Maybe the difference is enough to differentiate them though.

[0] https://en.wikipedia.org/wiki/Haul_truck#Ultra_class

So, I might be wrong, but my understanding of these attacks is that they require you to know what the model of the classifier is.

If my understanding is correct, I guess my question is: how general are these attacks? Can we ever say "oh yah, don't try to classify this penguin with your off-the-shelf model, it'll come up 'iguana' 9 times out of 10"?

See https://arxiv.org/abs/1610.08401 for model agnostic examples of adversarial attacks.

This is a great paper... the perturbation map is very reminiscent of psychedelic imagery.

Psychedelics come up a lot with NN-created images, but it's interesting that under the influence of a perturbation the classifier suddenly starts presenting illogical assumptions.

Also similar to how people recollect their own classifiers unfolding or showing bias when considered under the influence of psychedelics

Perhaps there is some deeper analogue with a substance influenced brain's neuronal activity

This is great stuff! I've noticed that the inter-model universality is OK (a little over 50% for one of the universal perturbations), but that's still pretty good!

This is much more interesting, thanks!

You don't necessarily need to know the whole model, but you need accurate enough prediction score outputs for each category. I think if you have that, that's enough to numerically approximate the gradient of the output scores and perform gradient descent.
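A minimal sketch of that idea, with a stand-in scoring function in place of a real API (all names here are hypothetical):

```python
import numpy as np

def estimate_grad(score, x, eps=1e-3):
    """Two-sided finite-difference estimate of d(score)/d(x),
    using only black-box queries to `score`."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e.flat[i] = eps
        g.flat[i] = (score(x + e) - score(x - e)) / (2 * eps)
    return g

# Stand-in "API": score of the (wrong) target class for a 4-pixel image.
w = np.array([0.5, -1.0, 2.0, 0.3])
target_score = lambda x: float(w @ x)

x = np.zeros(4)
for _ in range(10):  # gradient *ascent* on the target-class score
    x += 0.1 * estimate_grad(target_score, x)
print(target_score(x))  # the score rises with every step
```

The price is query count: each gradient estimate costs 2 queries per pixel, which is exactly why rate-limiting the API (mentioned elsewhere in this thread) makes this style of attack expensive.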

Someone just published a new attack at ICLR that does not even need confidence scores: it works solely on the final decision output of the network, and performs as well as the best known attacks (which so far required gradients): https://openreview.net/forum?id=SyZI0GWCZ&noteId=SyZI0GWCZ

The algorithm mentioned in the paper doesn't need any information about the internal structure of the image classifier.

Is getting to know the model of the classifier so difficult? I mean, for an API, is it not possible to deduce the internal structure of the system based on input-output combinations? Just asking.

If by internal structure gp means the number or structure of the layers, then no, it’s not feasible to deduce this.

If they write regular blog posts or publish the architecture in a journal however, and the api gives you probabilities, you could maybe recreate it with a process similar to distillation?

There are different kinds of attacks. This one, and related ones that minutely change the input, require you to know some details about the model. There are other kinds of attacks, where you don't need to know what the specific model is. For example, http://rocknrollnerd.github.io/ml/2015/05/27/leopard-sofa.ht...

The fundamental problem is that there are only so many training examples and layers that you can fit in a GPU's memory. The current state-of-the-art neural networks work well for very specific tasks (e.g. handwriting recognition), but don't generalize.

Neural networks do generalise well, and their vulnerability to crafted attacks is unrelated to generalisation anyway.

Additionally the size of the network isn’t the same thing as GPU memory and you don’t fit training examples in memory anyway (?!).

Interesting: can this be prevented simply by randomly blurring the images first?

The NN should still be able to recognise something partially blurred if trained that way.

I don't want to totally brush off the security implications, but I think focusing on ways to mitigate this exact "attack" is almost missing the point.

You could trivially thwart this attack by rate-limiting--it needs many passes through the network to evolve an image--or by caching the classifier's output and returning it for all similar (e.g., by Hamming distance) images.
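A toy sketch of the caching idea (the signature scheme here is invented purely for illustration):

```python
import numpy as np

def signature(img):
    """Crude binary signature: each pixel vs. the image median."""
    return (img > np.median(img)).astype(np.uint8).ravel()

class CachedClassifier:
    """Return a cached label for any query within `max_dist`
    Hamming distance of a previously seen signature."""
    def __init__(self, classify, max_dist=3):
        self.classify, self.max_dist, self.cache = classify, max_dist, []

    def __call__(self, img):
        sig = signature(img)
        for old_sig, label in self.cache:
            if np.count_nonzero(sig != old_sig) <= self.max_dist:
                return label  # near-duplicate: reuse the cached answer
        label = self.classify(img)
        self.cache.append((sig, label))
        return label

clf = CachedClassifier(lambda img: "dog")
img = np.arange(16.0).reshape(4, 4)
tweaked = img.copy()
tweaked[0, 0] += 100  # one-pixel perturbation
print(clf(img), clf(tweaked))  # both are served the same label
```

The attacker's evolved image never reaches the real classifier, because every near-duplicate hits the cache; the attack loop then gets no signal.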

Instead, I think this work is interesting because it shows the limits of the network's generalization abilities.

Or perhaps add random pixel noise to the image before training/recognition, to prevent "smoothness" in an area from being treated as a recognized property.

Alternatively, eliminate the noise by low-pass filtering in the frequency domain. Even a small LED light will bleed into more than one pixel in a real photo.

From a workflow point of view, this is a datasets security / integrity problem, isn't it? Public and open source datasets should come with some sort of sanity check then. A best practice protocol for pre-processing private datasets / unvetted sources should also be made public and disseminated.

Related article: "BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain" (https://arxiv.org/abs/1708.06733)

Related: Slight Street Sign Modifications Can Fool Machine Learning Algorithms https://news.ycombinator.com/item?id=14935120

One way to beat back these types of attack is to have the software apply a type of image filtering that averages out adjacent pixels in an indeterminate (random) way before feeding the image into the NN for inference.

Not really. The single-pixel attacks are impressive because they just modify individual pixels but the concept of adversarial image generation is a broad and increasingly well studied area within neural networks. There are infinitely many ways to structure adversarial images not just single pixel defects.

Wouldn’t it be sensible to median-filter the input images to remove these types of single-pixel outliers?
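For illustration, a hand-rolled 3x3 median filter does exactly that to a single-pixel outlier (a real pipeline would use a library routine such as scipy.ndimage.median_filter):

```python
import numpy as np

def median_filter3(img):
    """3x3 median filter (edges handled by edge-padding)."""
    padded = np.pad(img, 1, mode="edge")
    out = np.empty_like(img)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = np.median(padded[i:i+3, j:j+3])
    return out

img = np.full((8, 8), 10.0)
img[3, 3] = 255.0  # single-pixel outlier
filtered = median_filter3(img)
print(img[3, 3], filtered[3, 3])  # 255.0 10.0 -- the outlier is gone
```

Every 3x3 window contains at most one outlier among nine values, so the median ignores it completely; a multi-pixel perturbation spread over a window would survive, which is why this only blunts the single-pixel variant.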

If you're on a phone, here's a responsive HTML version of the paper: https://www.arxiv-vanity.com/papers/1710.08864/

Beware that figure numbers in the HTML version are incorrect. :(

s/If you're on a phone,//

Reading PDFs on screen sucks.

Woah, thank you!

This makes me feel like neural networks are nothing but glorified hash functions. Well, if you think about it, here we are just optimizing for hash collisions of similar things.

The whole point of classification algorithms, including NNs, is to map similar inputs to similar outputs, or indeed the same output, and different images to different outputs. Hash functions usually attempt to erase all information of "similarity" between inputs. However, the metric that determines what "similar" means in a NN is not necessarily what we expect it to be.

But of course NNs are definitely just functions, in the strict mathematical sense. You could replace one with a large lookup table. The interesting part is the training: how to come up with the function in the first place.
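As a toy illustration of the lookup-table point: over a finite input domain you can literally tabulate the network once and discard it (the "network" here is a made-up one-layer function):

```python
import numpy as np
from itertools import product

# A tiny fixed "network" over 3-bit binary inputs.
W = np.array([[1.0, -2.0, 0.5]])
net = lambda x: int(np.tanh(W @ x) > 0)  # binary classifier

# Tabulate it once: the dict now *is* the function on this domain.
table = {bits: net(np.array(bits)) for bits in product((0, 1), repeat=3)}

assert all(table[b] == net(np.array(b)) for b in table)
print(table[(1, 0, 1)])
```

For real image inputs the table would be astronomically large, of course, which is exactly why the training procedure, not the function itself, is the interesting part.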

Maybe Marvin Minsky was right about neural nets (he’d surely be shaking his head right now, maybe dropping an “I told you so” or two...)

It is kinda obvious that you should make multiple passes at recognition with jitter and dithering.

The field of Music Theory is algorithms & patterns for music. Computer music provides the opportunity for rapid iteration. If that direction were followed intensely, music theory would become a sub-topic within Psychology.

Thank you, an academic article without paywall

Does this publication bring any value, given that it was already well known that such a thing is possible? Or had no one measured how many pictures can be "transformed" by changing only one pixel?
