
How Adversarial Attacks Work - lainon
http://blog.ycombinator.com/how-adversarial-attacks-work/
======
ChuckMcM
This weakness is one that I think will plague self-driving cars: sign
recognition will be key, and without some ability to ensure that they cannot
be dangerously fooled, it will be hard to get them certified. The canonical
example is to make a no-left-turn sign be recognized as a no-right-turn sign
and have the car go the wrong way down a one-way street.

Clearly there is a marketing opportunity for t-shirts that make you be
recognized as other things. Who doesn't want to show up in an image search
for toasters on Google Images? :-)

But my current best guess is that this issue will be addressed with
classifier diversity and voting systems. While that just moves the problem to
a harder-and-harder-to-synthesize data set (one that is not only adversarial
to one classifier but gives the same wrong answer on several), I believe it
will get us to the point where we can trust that the level of work needed to
defeat them is sufficiently high to make it a non-threat.
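
A minimal sketch of what I mean by voting, assuming `models` is any set of
independently trained classifiers with a `predict` method (all names here are
placeholders):

    def vote(models, x, quorum=None):
        """Return the majority prediction, or None if the quorum isn't met."""
        quorum = quorum or len(models)  # default: require unanimity
        preds = [m.predict(x) for m in models]
        top = max(set(preds), key=preds.count)
        # Disagreement becomes "no reading", so the system can fall back to a
        # safe behavior instead of trusting a possibly-fooled classifier.
        return top if preds.count(top) >= quorum else None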

~~~
azernik
Or, simply with better machine learning.

> The border between “truth” and “false” is almost linear. The first cool
> thing we can derive from it is that when you follow the gradient, once you
> find the area where the predicted class changes, you can be fairly confident
> that the attack is successful. On the other hand, it tells us that _the
> structure of the decision function is far simpler than most researchers
> thought it to be_.

Humans can be fooled by optical illusions as well, but those illusions are
much more limited and much more noticeable than most of these. My (very non-
specialist) interpretation of the italicized clause is that a) vulnerability
to these attacks is a continuum, not a binary, and b) the current ease of
these attacks reflects the crudity of our current ML techniques.
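
Concretely, the gradient-following attack the article describes looks roughly
like this FGSM-style sketch in PyTorch (the untrained toy network and random
"image" here are stand-ins for a real trained model):

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
    model.eval()

    x = torch.rand(1, 3, 32, 32, requires_grad=True)  # the original input
    label = torch.tensor([0])                         # its current class

    loss = nn.functional.cross_entropy(model(x), label)
    loss.backward()

    # Step *up* the loss gradient; epsilon bounds the per-pixel change so
    # the perturbation stays small (imperceptible, on a real image).
    epsilon = 0.01
    x_adv = (x + epsilon * x.grad.sign()).clamp(0, 1).detach()

    print(model(x).argmax().item(), model(x_adv).argmax().item())

With a trained network and a suitable epsilon, the second prediction
routinely differs from the first.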

~~~
ben_w
Have you seen the Monroe-Einstein illusion? There is a continuous transition
from one image to the other, depending on the angular size. This feels
relevant to your point.

[https://static.independent.co.uk/s3fs-
public/styles/article_...](https://static.independent.co.uk/s3fs-
public/styles/article_small/public/thumbnails/image/2015/04/03/14/einstein.jpg)

~~~
tambourine_man
I wouldn't call that an illusion, but more a consequence of a high-pass/low-
pass filter at different distances with limited resolution.

What I find much more disturbing is that even though I have measured and
_know_ that those lines are parallel or that those shades of gray are the
same, I can't "unsee" the illusion.

------
sandGorgon
Can such methods be used to "fingerprint" proprietary datasets by tainting
them? For example, I want to make sure that my dataset is not stolen and used
by someone else (Waymo?). So I taint it using an adversarial method and
create a "canary test set" that will uniquely identify whether my dataset has
been used in some training.

~~~
alexcnwy
I believe Google Maps has fake streets as a canary for their maps being
scraped...

~~~
cscheid
yup, in cartography these are known as trap streets

------
scrooched_moose
Is there any reason to think this would work at all in the real world? All of
these "attacks" require complete control of the image being fed to the
classifier.

In the ATM example you don't directly load an image of the check to the
computer inside the machine. You design a check in photoshop, add the noise,
print it out, feed it to the machine, which takes a picture of the check.
Mobile bank apps still require you to take a picture of the check so you don't
have enough control there either.

Similarly in the road sign example: the lighting, the angle between car and
sign, dirt on the sign, etc. all mean the car sees a much different image
than the one you designed.

I'd think all of these steps mean the classifier gets a dramatically
different image than you intend and the attack fails. There's maybe a
vanishingly small probability it works when the stars align, but that could
be easily mitigated by taking multiple consecutive images and looking for odd
results.

~~~
anishathalye
Goodfellow et al. demonstrate that adversarial misclassification works in the
physical world:
[https://arxiv.org/abs/1607.02533](https://arxiv.org/abs/1607.02533) \-- a
printed image is consistently misclassified (demo here:
[https://www.youtube.com/watch?v=zQ_uMenoBCk](https://www.youtube.com/watch?v=zQ_uMenoBCk))

Recent work by me and some friends demonstrates that physical-world
adversarial examples can actually be made quite robust, and you can synthesize
3D adversarial objects as well, and make them consistently classify as a
desired target class: [http://www.labsix.org/physical-objects-that-fool-
neural-nets...](http://www.labsix.org/physical-objects-that-fool-neural-nets/)

~~~
scrooched_moose
Thanks for the links and great work! I hadn't seen any research on it making
the jump to the real world yet.

------
blauditore
It seems like all the adversarial examples I've seen so far are enabled by
networks overfitting to small-scale features. By overfitting I mean that they
give more importance to details than humans do; humans recognize the image
much more as a whole.

Maybe the problem could be mitigated by penalizing "direct" responses to
small-scale features while training, which is not a trivial thing to do
though. One approach I could think of is training with multiple altered
versions of the image, e.g. various amounts of blur, noise and mean/median
filters applied. Or the other way around: to be more confident about a
result, scale down the image to a fraction of its size, run detection on
that, and compare results.

Are such techniques in use, or being researched? I'm not much in the loop on
these topics.

~~~
rahimnathwani
"training with multiple altered versions of the image, e.g. various amounts of
blur, noise and mean/median filters applied"

"Are such techniques in use"

Yes, it's called 'data augmentation': [https://medium.com/towards-data-
science/deep-learning-3-more...](https://medium.com/towards-data-science/deep-
learning-3-more-on-cnns-handling-overfitting-2bd5d99abe5d)
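
A rough sketch of what that can look like with torchvision (the specific
transforms and parameters here are just illustrative choices):

    import torch
    from torchvision import transforms

    augment = transforms.Compose([
        transforms.RandomApply([transforms.GaussianBlur(kernel_size=5)], p=0.5),
        transforms.ToTensor(),
        # add a little Gaussian pixel noise after conversion to a tensor
        transforms.Lambda(lambda t: (t + 0.05 * torch.randn_like(t)).clamp(0, 1)),
    ])

Each epoch then sees a slightly different version of every training image,
which penalizes over-reliance on fragile small-scale detail. Note, though,
that plain augmentation mostly helps with overfitting; it is generally not
enough by itself to stop adversarial examples.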

------
joe_the_user
"Recent studies by Google Brain have shown that any machine learning
classifier can be tricked to give incorrect predictions"

\-- That has to be an overly broad statement (edit: it might be true if you
said "neural nets" or something more specific instead). I would assume they
mean any standard deep learning system, and maybe any system that is more or
less "generalized regression", but that couldn't be "any machine learning
system"; I mean, one could imagine a deterministic model that couldn't be
"tricked".

Plus a link would be good (there isn't one around the quote; I don't know if
it's elsewhere in the article).

~~~
alexbeloi
>I mean one could imagine a deterministic model that couldn't be "tricked".

The models being referenced are deterministic; most models for
classification/regression are deterministic.

Adversarial attacks are showing that the models are chaotic (in the
dynamical-systems sense): they're very sensitive to their inputs.

edit: Not all models have this issue, but it's been shown that all the
typical image-based convolutional neural networks are susceptible to it. My
guess is that it's a more general problem of high-dimensional inputs.

~~~
xapata
We've known for decades that NNs are prone to overfit. This is just another
example of it. With great variance comes great ... uh ... susceptibility to
unusual future inputs?

------
xstartup
Well, this is getting more attention because it's important for "ad network
moderation". If Google/FB fail at moderation using these methods, they'll
have to HIRE lots of humans to do it for them, which often means contracting
it out to outsourcers like Accenture. That will put downward pressure on
their billion-dollar revenue. Humans are expensive, but vastly more effective
at ad moderation.

------
trott
Very recently, it was found that changing a single, carefully placed pixel
is enough to confuse NN classifiers.
[https://arxiv.org/abs/1710.08864](https://arxiv.org/abs/1710.08864)

~~~
dontreact
For 32 by 32 images

~~~
shpx
1/(32*32) = 1/1024 ≈ 0.1% of the pixels

------
acallahan
Obscurity seems like useful security here. IIUC it shouldn't be possible to,
e.g., trick self-driving cars with noisy signs unless you have a copy of the
classifier to train against. Thinking about ATMs, you could train against one
as a black box, repeatedly inserting different patterns of noise? But that
seems infeasible if you need a lot of iterations.

It also suggests that people concerned about adversarial attacks shouldn't use
off-the-shelf pretrained classifiers, where attacks can be trained offline in
advance. Similar to hashing algorithms and rainbow tables, maybe a practice of
"salting" an off-the-shelf classifier could be effective in dodging attacks.

~~~
3pt14159
True. Also, just having unconnected systems that use different types of
features / heuristics should be enough to at least pull the car over when they
wildly disagree over what to do.

------
neptvn
> Apart from the fact that nobody wants to risk having false positives,
> there’s a simple argument as old as machine learning itself: whatever a
> human can do, a machine can be taught to do. Humans have no problem
> correctly interpreting adversarial examples, so there must be a way to do it
> automatically.

So by all means, the authors (and a vast majority of researchers) seem to be
confident that ML/DL is the road to AGI, and hence can "solve" human
intelligence (given that it is computational)? How long are we going to drag
out the adage that mimicking a human (the Turing test) equals reaching human
levels of intelligence?

~~~
moxious
It is not true that whatever a human can do, a machine can be taught to do.
The human must have insight into HOW they do it in order to teach it, or must
otherwise come up with some new algorithm. There is a large class of things
humans do that they don't understand the mechanics behind, and for which
there also aren't algorithms.

I'm not talking empathy or philosophy. How about just folding laundry. Not
just one type, not in a controlled environment, but folding any laundry
anywhere.

~~~
IanCal
> It is not true that whatever a human can do, a machine can be taught to do.
> The human must have insight into HOW they do it in order to teach it, or
> otherwise come up with some new algorithm.

Why does this mean a machine _cannot_ be taught to do it?

> There are a large class of things humans do that they don't understand the
> mechanics behind,

Sure.

> and for which there also aren't algorithms.

Well, we manage to encode it in our brains.

------
aidenn0
ELI5: Why are adversarial attacks not preventable by adding unpredictable
noise to untrusted inputs?

~~~
Houshalter
The attacks are resistant to noise, or at least can be made to be so. If
every single input is tweaked in exactly the right direction, noise won't
undo that. Most inputs will still be pointing in the adversarial direction.
The noise will move some inputs back toward their original position, but
others will be pushed even further into adversarial territory.
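
You can see this with a toy linear classifier: an adversarial perturbation
aligned with the weights shifts the score by roughly eps times the sum of
their magnitudes, while random noise of the same size mostly cancels out (a
minimal numpy sketch; the dimension and scale are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    d = 1000
    w = rng.standard_normal(d)                # classifier weights
    eps = 0.1

    adv = eps * np.sign(w)                    # every coordinate nudged the "right" way
    noise = eps * rng.choice([-1.0, 1.0], d)  # same magnitude, random directions

    print(w @ adv)            # ~ eps * sum(|w|): a large, reliable shift
    print(w @ noise)          # ~ 0 on average
    print(w @ (adv + noise))  # still large: the noise doesn't undo the attack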

~~~
aidenn0
Thanks, that makes a lot of sense.

------
d--b
Research is needed in that area, but surely adding some noise/blurring to
samples during training and during recognition should help reduce the
feasibility of the attack. Isn't that the case?

Adversarial attacks seem to point to overtraining in some sense.

~~~
shmel
Unfortunately, it is not that simple. There are multiple ways to generate
adversarial examples and simple defenses help at most against the simplest
attacks. Every now and again a new preprint "one simple trick defeating the
latest adversarial defense technique" appears on arxiv.

I believe they are linked to the nature of DL (and ML in general) models. We
try to capture a very tiny manifold of natural images within the space of all
possible images. We have found a technique that does this well (CNNs). But by
the very definition of its training, we train the model to output certain
values at a finite number of points. At the same time, the CNN's output away
from those points is mainly determined by the model's smoothness. We can
expect that in a neighborhood of a given image the output will be roughly the
same (this is what allows it to generalize). Far from any point of the
training set, the model can say anything. Adversarial examples basically hint
that this smoothness works well only along the natural-image manifold; once
we step outside it, things are much more chaotic. Or, equivalently, the
neighborhood where the CNN gives roughly the same output is a very thin
"slice" that closely follows the natural-image manifold. Why is that?
Probably there are some non-trivial topological reasons. But that is exactly
what nobody understands right now.

------
amelius
Am I the only one who finds the term "Adversarial Attack" a bit funny? See
[1].

[1]
[https://en.wikipedia.org/wiki/Pleonasm](https://en.wikipedia.org/wiki/Pleonasm)

------
barrkel
ISTM that many of the classifiers get a lot of their signal from detecting
textures, and that the adversarial noise works by superimposing a different
texture on the image.

------
xstartup
There is a huge industry of folks finding ways to bypass Facebook's/Google's
ad moderation. They pay you if you can craft ads to their liking and get them
approved.

------
naveen99
Don't you need access to the classifier internals to train the adversarial
network? Nobody is going to publish the network weights for a check-reading
machine...

~~~
Winterflow3r
No, this paper by Papernot et al. shows how to do black-box attacks without
knowledge of model internals:
[https://arxiv.org/abs/1602.02697](https://arxiv.org/abs/1602.02697)
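
A rough sketch of that substitute-model strategy (everything here, from the
synthetic data to the model choices, is purely illustrative):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.neural_network import MLPClassifier

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    black_box = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                              random_state=0).fit(X[:1000], y[:1000])

    # The attacker sees only the labels the black box assigns to queries.
    X_q = X[1000:]
    substitute = LogisticRegression(max_iter=1000).fit(
        X_q, black_box.predict(X_q))

    # FGSM against the substitute: for logistic regression the loss gradient
    # w.r.t. the input is proportional to the weight vector.
    x = X_q[0]
    step = 1.0 if substitute.predict([x])[0] == 0 else -1.0
    x_adv = x + 0.5 * step * np.sign(substitute.coef_[0])

    # Crafted without touching the black box's internals, yet it often
    # transfers:
    print(black_box.predict([x])[0], black_box.predict([x_adv])[0])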

~~~
naveen99
Thanks. But they still need to be able to query the black box a brute-force
number of times on shady inputs and get detailed outputs during their
gradient descent. That's not going to be allowed on a check reader or
anything sensitive.

~~~
haraldurt
Not necessarily. Adversarial examples have been shown to, for instance, be
transferable across different networks with different hyperparameters (e.g.,
number of layers) trained on disjoint subsets of a training set [0, section
4.2]. There are more references in the paper linked by the OP.

[0] [https://arxiv.org/abs/1312.6199](https://arxiv.org/abs/1312.6199)

~~~
naveen99
Thanks. I wonder if adversarial training helps prevent overfitting too. Could
you use adversarial training to beat alphago ?

~~~
yorwba
You could not, because AlphaGo is not a classifier (so it isn't well-defined
what an adversarial example is) and the input space is discrete (Go board
state) and you can't do ε-small perturbations (two different states differ by
at least one stone).

------
samat
Do these attacks still work if you don't control the input image bit by bit,
but rather feed it in via a digital camera? Using any digital camera for
input in the real world, there will be some blurriness and distortion. Is it
possible to construct noise that 1) will survive camera input, 2) will be
undetectable to human eyes, and 3) will fool an ML algorithm?

I do believe you are not able to control the input image bit by bit in many
real-world scenarios.

~~~
taejo
As has been posted elsewhere in this thread, you can 3D print objects that are
consistently misclassified when viewed through a camera from any angle:
[http://www.labsix.org/physical-objects-that-fool-neural-
nets...](http://www.labsix.org/physical-objects-that-fool-neural-nets/)

------
eroccatlun
Does the adversarial attack require access to the model making the
predictions?

It seems like the attack relies on `doping` the input with features present
in a different target, or on masking features of the existing target.

Could you so specifically attack a target without knowledge of its features?

------
dontreact
Was there a follow-up to this paper? "NO Need to Worry about Adversarial
Examples in Object Detection in Autonomous Vehicles"
[https://arxiv.org/abs/1707.03501](https://arxiv.org/abs/1707.03501)

------
bartkappenburg
I tried the example from the blog post (Stallone <-> Reeves) in Rekognition
(AWS service) and it came out correct:

[https://imgur.com/a/C0s05](https://imgur.com/a/C0s05)

------
dontreact
I don’t buy any of the attacks listed here or see how the examples being
imperceptible is actually a factor.

If you have the ability to modify the check, why not make it actually look
like it's for 1,000,000 dollars (i.e., even to a human)?

If you are going to go out and replace speed limit signs to fool self-driving
cars, it's probably equally dangerous whether or not the change is obvious:
if it's way out of bounds a car won't obey it, and if it isn't, it would also
fool humans.

Anyone have some more realistic attacks that are specifically possible due
to the imperceptible nature of the changes to the input? Most of the harm
I've heard of in examples simply comes from the fact that if you are
controlling the input to an ML system, you can get it to do whatever you want
even without using any of these techniques: just actually change the class of
the input.

~~~
maltalex
> If you have the ability to modify the check, why not make it actually look
> like it's for 1,000,000 dollars (i.e., even to a human)?

Humans are harder to fool. But some bank apps allow you to deposit a check
by photographing it. Such apps would be fairly easy to attack.

If you limit yourself to making $100 checks become $1,000 checks rather than
$1,000,000 checks, you might even get away with it.

~~~
c22
A few years back a friend of mine was depositing a $300.00 check at Chase.
When the teller asked him how she could help, he said "Oh, just depositing
this three thousand dollars." She punched in the deposit for $3000.00 and
gave him a receipt. It was corrected within 24 hours, but the receipt and the
printout of his bank statement were a great conversation piece. I doubt they
would have let him just walk away with the money, never mind do it multiple
times.

------
pdeburen
While it's true that these attacks work well on state-of-the-art models,
there are defence strategies, such as including adversarial examples during
training. Advanced defence strategies such as
[https://arxiv.org/abs/1705.07204](https://arxiv.org/abs/1705.07204) are
robust to a wide array of attacks and achieve very competitive error rates.

I'm not saying it's not a problem, but there are successful defence
strategies already in place for many attacks.

~~~
LolWolf
> Although our models are more vulnerable to white-box FGSM samples compared
> to the v3adv model, ensemble adversarial training significantly increases
> robustness to black-box attacks that transfer FGSM samples crafted on the
> holdout Inception v4.

So, I just train my new adversary on the "new" model that was trained on the
previous adversarial examples. And now we're back to square one.

I suspect the problem of adversarial attacks is a problem of high-dimensional
spaces, not of training on particular samples.

~~~
pdeburen
I'm not sure how the quote supports your argument. Adversarial examples
generalize well across many different classifiers.

Shallow NNs can be fooled just as well; it seems to be more of a problem with
linear models in general. Apparently Geoff Hinton's Capsule Networks are more
robust due to being "less linear" (Ian Goodfellow mentioned this in a recent
talk; I don't have the references now to back it up).

~~~
LolWolf
I'm not sure it's about how shallow the network is; even logistic regression
(i.e., a 1-layer NN) can be fooled by the same techniques. That being said,
maybe it does have something to do with linearity (I suspect not), or maybe
it's just generally harder to deal with nonlinear functions.

------
erikb
I see a SaaS opportunity here: a service that takes your photos, adds noise,
and only then uploads them to social networks.

------
lazugod
Adversarial attacks sound useful. How can we make it illegal to prevent them?

~~~
matt4077
I may not completely understand the question, but I'm pretty sure
"Intentionally fooling an algorithm for nefarious purposes is illegal" is both
the answer, and the status quo.

------
breakingcups
> How is this possible? No machine learning algorithm is perfect and they make
> mistakes — albeit very rarely.

Right.

------
drudru11
ATMs using neural nets - lol - nope. That is a stretch.

~~~
genericpseudo
Effective counterexample: the USPS.

[http://www.cedar.buffalo.edu/~srihari/talks/Telcordia.pdf](http://www.cedar.buffalo.edu/~srihari/talks/Telcordia.pdf)

------
conorcleary
The _only_ solution is coming up with a globally accepted, open-source base
set of instructions for any and all AI or AI-capable code.

------
yters
Ah, this is why DeepMind won't release AlphaGo's source code or models.

If all ML algorithms can be fooled so trivially, this shows the human mind is
not an ML algorithm.

~~~
munificent
I think your logic is:

1\. All ML algorithms can be fooled trivially.

2\. The human mind cannot be fooled trivially.

3\. Therefore, the human mind is not an ML algorithm.

But claim number 2 is clearly wrong. Human minds are trivially fooled. Here's
one:

[http://www.jimonlight.com/wp-
content/uploads/2012/02/Paralle...](http://www.jimonlight.com/wp-
content/uploads/2012/02/Parallel-Lines.gif)

This is exactly what an optical illusion is.

~~~
YeGoblynQueenne
The problem with optical illusions like that is that they are, in their vast
majority, made of abstract shapes. Most of them play with our perception of
distance and depth, and the majority again work in only two dimensions.

It's really hard to imagine an optical illusion that makes you mistake
objects in the physical world for something else: say, a panda for a lawn
mower, or a car for a pigeon, or something like that.

Note that I don't agree that this tells us anything about whether the human
brain (or mind) is "like" a machine learning algorithm. To me this question
has about as much meaning as asking if the brain is "like" quicksort.

The physical substrates are so clearly different that the only comparison
you can make is on the level of capabilities (say, both are Turing-
equivalent, etc.), not that of actual structures. Like, where are the 1s and
0s in the brain?

~~~
munificent
> It's really hard to imagine an optical illusion that makes you mistake
> objects in the physical world for something else- say, panda for a lawn
> mower or a car for a pigeon, or something like that.

Here's a physical object that makes you mistake an insect for a plant:

[https://en.wikipedia.org/wiki/Phasmatodea](https://en.wikipedia.org/wiki/Phasmatodea)

~~~
romaniv
Problem is, mimicry involves copying essential properties of the target
object: color, texture, shape, movement dynamics and so on. It relies on true
ambiguity. Adversarial examples against neural networks (the interesting
ones, anyway) involve a combination of insignificant, seemingly random
perturbations that only work in their totality. That's a very important
difference.

~~~
munificent
> essential properties of the target object

They are only "essential" properties according to the limitations of your
particular perceptual system. To a bee that can see UV light, a stick bug may
look entirely different from an actual stick. An animal that hunts by scent
would find them clearly distinct.

There is no such thing as "true" ambiguity unless the two objects are actually
the same thing in all respects. If they are distinct but appear the same, it's
because they are overlapping in some respects but not all.

~~~
romaniv
_> They are only "essential" properties according to the limitations of your
particular perceptual system. To a bee that can see UV light, a stick bug may
look entirely different from an actual stick. An animal that hunts by scent
would find them clearly distinct._

This is a very confused statement. First, since we are talking about image
recognition, we are, by definition, talking about vision in the spectrum that
can be captured by a digital camera and encoded in a typical image format.
Second, there definitely is such a thing as an essential property. It is a
matter of correlation with reality, as well as internal consistency. For
example, plants are green and have leaves because of the way they use
sunlight. So perturbing the color of all the leaves in a picture is
fundamentally different from perturbing the luminosity of some random
pixels.

------
matt4077
> Lately, safety concerns about AI were revolving around ethics — today we are
> going to talk about more pressuring[sic] and real issues.

Nice how this casually demeans people's worries. Of course, the
(demonstrated) idea that an ML algorithm would pick up on, and amplify,
discrimination (among other things) isn't "real" to these guys.

Risk-scoring of loan applications actually happens to be one of the few uses
of ML in the non-tech sector, and it's incredibly likely that some of them
are already denying people's applications because they happen to be named "La
David" and not "Emil". But maybe the authors just don't consider "ethics" to
ever be a "real" problem?

But "change a single pixel and the ATM gives you $1,000,000" is, apparently, a
"real" and "pressing" problem.

------
mLuby
Am I the only one who thinks "adversarial attack" is both a redundant and
unhelpful name?

\- Redundant: Anyone who attacks you is by definition your adversary.

\- Unhelpful: According to the article, "designing an input in a specific way
to get the wrong result from the model is called an adversarial attack." That
sounds much closer to spoofing attack ("a situation in which one person or
program successfully masquerades as another by falsifying data" -Wikipedia).
For example, a turtle masquerades as a gun by spoofing the machine learning
system through changes to irrelevant visual details.

~~~
tlb
Adversarial means something specific in the ML community.
[https://en.wikipedia.org/wiki/Adversarial_machine_learning](https://en.wikipedia.org/wiki/Adversarial_machine_learning)

