
Robust Adversarial Examples - eroo
https://blog.openai.com/robust-adversarial-inputs/
======
hyperion2010
My own view of this, having spent some time in visual neuroscience, is that if
you really want vision that is robust to these kinds of issues then you have
to build a geometric representation of the world first, and then learn/map
categories from that. Trying to jump from a matrix to a label without an
intervening topological/geometric model of the world in between (having 2
eyes and/or the ability to move can help with this) is asking for trouble,
because we think we are recapitulating biology when in fact we are doing
nothing of the sort (as these adversarial examples reveal beautifully).

~~~
clickok
We tried that; the reason deep nets are popular is that they outperform
geometric (or other problem-specific) models. This might be because they
implicitly develop such representations somewhere along the way, or because
such a representation is not really necessary for visual classification.

Additionally, introducing ancillary modules is not without cost-- you might
gain robustness to some kinds of adversarial inputs at the expense of becoming
vulnerable to others. There are plenty of ways to fool biological visual
systems: cf. magic-eye posters, optical illusions, or the various exploits
described in Lettvin and Pitts' paper "What the Frog's Eye Tells the Frog's
Brain".

~~~
mjn
> outperform

I think that remains to be seen, at least in the general case, since we
haven't yet agreed on a measure of performance. The debate around adversarial
examples can be interpreted as arguing over the proper measure of performance.
So far the debate has been somewhat implicit, since afaik nobody has
formalized a measure of robustness to adversarial examples; it has
progressed more by case studies (which is fine, since research into NN
robustness is still at quite an early stage, and case studies help illustrate
the issues). I think it can fairly be said that neural nets perform well _on the
ImageNet benchmark_ and similar measures of performance. But whether those are
good measures of performance, or whether some kind of metric that weights
robustness more heavily should be used (and what methods would perform well on
it), is the subject of current research, including work like this.

------
arnioxux
There are plenty of adversarial examples for humans too:
[http://i.imgur.com/mOTHgnf.jpg](http://i.imgur.com/mOTHgnf.jpg)

~~~
csomar
Is it, though? The human correctly interpreted the image. The problem is that
the image was, well, not "real". Humans have limits in figuring out what is
real and what is not, based on experience.

------
tachyonbeam
IMO, what these adversarial examples give us is a way to boost training data.
We should augment training datasets with adversarial examples, or use
adversarial training methods; the resulting networks should only become more
robust.
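
A minimal sketch of what that adversarial-training loop could look like
(PyTorch; eps, the loss mix, and the model/loader names are placeholder
choices, not anything from the article):

```python
# FGSM-style adversarial training sketch: craft a perturbed copy of each batch
# and train on both the clean and perturbed versions.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, eps=0.03):
    """Craft adversarial examples with the fast gradient sign method."""
    x_adv = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_adv), y).backward()
    # Step in the direction that increases the loss, stay in the valid pixel range.
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

def train_epoch(model, loader, optimizer, eps=0.03):
    model.train()
    for x, y in loader:
        x_adv = fgsm_perturb(model, x, y, eps)
        optimizer.zero_grad()
        # Train on clean and adversarial versions of every batch.
        loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
```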

As for self-driving cars, this is a good argument for having multiple sensing
modalities in addition to vision, such as radar/lidar/sonar, multiple
cameras, and infrared in addition to visible light.

~~~
andbberger
But at what point do you have to wonder if we're using the wrong basis? And
how do you know that augmenting the data with tiny adversarial perturbations
won't just leave the network vulnerable in a different direction?

It's pretty obvious how to build translational symmetry into a net that's
still expressive and easy to train (convolution). But you have to spoon-feed
CNNs rotational and other symmetries by augmenting the training data. What you
really want is a model that has all of the symmetries of your data built in.
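
For concreteness, the "spoon feeding" usually looks like a standard
augmentation pipeline; the dataset, angles, and crop size below are just
illustrative choices:

```python
# Rotational/reflection symmetry has to be injected through the data;
# translation is the one symmetry the conv architecture gives you for free.
import torchvision
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(degrees=30),  # rotations: only via augmentation
    transforms.RandomHorizontalFlip(),      # reflections: likewise
    transforms.RandomCrop(32, padding=4),   # shifts: convolution already handles these
    transforms.ToTensor(),
])

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=augment)
```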

My sense is that the community at large regards DL as a magic black box,
which it really is not. A complete basis of functions + finite data = a
guarantee of wonky interpolation between samples. What you really need to do
is restrict the class of expressible functions to those you need - build your
prior into the model.

~~~
azag0
This is a huge topic in applying ML in physics and chemistry, where we already
have a lot of detailed prior knowledge about the systems we want to describe,
and it would be silly not to build it into the ML models.

~~~
yakult
What's the current state of the art in this direction? Is there a way to
encode equations explicitly prior to training?

~~~
azag0
People now try to use ML anywhere and everywhere, so it's a bit of a wild west.
Three examples: [1] uses a standard neural net to represent a many-body wave
function, with all the machinery of quantum mechanics on top of that, and
reinforcement learning to find the true ground state. [2] uses a handcrafted
neural net, which by construction already takes advantage of a lot of prior
knowledge, to directly predict molecular energies. [3] uses a simple kernel
ridge regression coupled with a sophisticated handcrafted scheme to
automatically construct a good basis (set of features) for a given input, to
predict molecular energies.

In all these cases, the ML itself is not the target problem, but only a tool,
and most effort goes into figuring out where exactly to use ML as a part of a
larger problem, and how to encode prior knowledge, either via feature
construction or neural net handcrafting.
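
As a rough illustration of the pattern in [3] (not the actual descriptor from
the paper), the pipeline is essentially a hand-built feature map followed by
an off-the-shelf kernel ridge regression:

```python
# Toy version: invented pairwise-distance descriptor + scikit-learn KernelRidge.
# The "physics" here is a stand-in; the real work in [3] is in the feature scheme.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

def descriptor(positions, charges):
    """Sorted charge-weighted inverse pairwise distances (a toy Coulomb-style feature)."""
    dists = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    i, j = np.triu_indices(len(positions), k=1)
    return np.sort(charges[i] * charges[j] / dists[i, j])[::-1]

# Made-up data: 100 random 4-atom "molecules" with fake reference energies.
X = np.stack([descriptor(np.random.rand(4, 3), np.ones(4)) for _ in range(100)])
y = np.random.rand(100)

model = KernelRidge(kernel="rbf", alpha=1e-3, gamma=0.5).fit(X, y)
energy_prediction = model.predict(X[:1])
```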

[1] [http://sci-hub.io/10.1126/science.aag2302](http://sci-hub.io/10.1126/science.aag2302)

[2] [http://sci-hub.io/10.1038/ncomms13890](http://sci-hub.io/10.1038/ncomms13890)

[3]
[https://arxiv.org/pdf/1707.04146.pdf](https://arxiv.org/pdf/1707.04146.pdf)

------
bsder
I can paint a road to a tunnel on a mountainside and fool some number of
people. Meep. Meep.

The problem isn't that there are adversarial inputs. The problem is that the
adversarial inputs aren't _also_ adversarial (or detectable) to the human
visual system.

------
std_throwaway
Does this effect carry over to classifiers which were trained with different
training data?

~~~
clickok
I am unsure what you mean-- do you mean with different training sets but the
same testing set?

It's an interesting question; maybe (some of) these adversarial
vulnerabilities are due to a handful of bad training examples. You could
formulate it as a search problem to see if there are particular images (or small
groups of images) that are responsible for the adversarial vulnerabilities.
This might then indicate that some of these perturbations are really just
taking advantage of the fact that neural nets tend to "memorize" some of the
data, so we're not really exploiting some deep structural feature so much as
just feeding the net an echo of an input that it has learned to automatically
classify as, say, a computer/desk[0].
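
Concretely, the brute-force version of that search might look something like
this (`train` and `fools` are hypothetical helpers; the cost is exactly the
many-retrainings problem mentioned below):

```python
# Hold out random groups of training examples, retrain, and see whether a given
# adversarial example still works; groups whose removal "fixes" it are suspects.
import random

def find_suspect_groups(train_data, adv_image, adv_label, group_size=100, trials=20):
    suspects = []
    for _ in range(trials):
        held_out = set(random.sample(range(len(train_data)), group_size))
        subset = [ex for i, ex in enumerate(train_data) if i not in held_out]
        model = train(subset)                        # hypothetical: train a net from scratch
        if not fools(model, adv_image, adv_label):   # hypothetical: does it still misclassify?
            suspects.append(held_out)
    return suspects
```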

It would be a good project, but I don't have enough GPUs on hand to train
scores of deep nets from scratch.

Assuming one were to bite the bullet, it might also be worth trying different
data augmentation strategies. Most of the time, we try to eke out additional
performance/robustness by using the same sets of transformations (translation,
rotation, cropping, rescaling, etc.), but if the net is vulnerable to
adversarial examples because of something in the training set, then you might
just be making sure that adversarial vulnerability is present everywhere in
the image and at multiple scales.

On a related note, there's an interesting paper about _universal_ adversarial
perturbations, i.e. those that can be added to _any_ image and thereby induce
a misclassification with high probability[1]. This effect holds even across
different models, so the same perturbation can cause a misclassification in
different architectures.

\------

0. Neural nets learn by some combination of abstraction and memorization. If,
for some reason, many members of a particular class are hard to generalize,
then it's possible that they instead learn to identify some particular aspects
of those classes (that are not usually present in other images) and have a
disproportionate response when those features are present. If such features
are not obvious to human visual inspection, then we get misclassifications
without insight into _why_ they were misclassified.

1.
[https://arxiv.org/pdf/1610.08401.pdf](https://arxiv.org/pdf/1610.08401.pdf)

~~~
std_throwaway
> I am unsure what you mean-- do you mean with different training sets but the
> same testing set?

Yes, assuming we have 10000 different training images. Divide these into 5
sets of 2000 each and train 5 networks with them. Assuming that 2000 images
are plenty for this application, we will have 5 well-trained networks with
similar performance on a test set.

BUT

They will work slightly differently internally, and those "inverse gradient
search" methods (or whatever they are called) might only be able to manipulate
an image for one network at a time with "specifically chosen additive noise"
while the other 4 are unimpressed.

That's assuming that the manipulation can't be targeted at all 5 classifiers
at the same time.
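
That transferability question is cheap to check once the 5 networks exist;
something along these lines, assuming ordinary PyTorch classifiers and an
adversarial image crafted against the first one:

```python
# Given an adversarial image crafted against models[0], count how many of the
# remaining networks it also fools.
import torch

def transfer_rate(models, adv_image, true_label):
    fooled = 0
    for m in models[1:]:
        m.eval()
        with torch.no_grad():
            pred = m(adv_image).argmax(dim=1)
        fooled += int((pred != true_label).all())  # fooled if it no longer predicts the truth
    return fooled / (len(models) - 1)
```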

------
pvillano
I don't know how you guys think this is an adversarial example. I see a
picture of a desktop computer.

------
sharemywin
To me it's an image of a picture regardless of the contents of the picture.

~~~
mannykannot
A classifier that implements that logic is not going to be useful for
anything.

~~~
sharemywin
It's sure as hell not a cat. cat's aren't flat and don't have white borders.
Well I guess it could be road kill.

Once you figure out it's a picture(or photo) of something then figure out what
it's a picture of.

------
therajiv
It's not clear to me how malicious actors can manipulate this observation to
confuse self-driving cars. That said, I don't think this discredits the point
of the article; it's important to note how easily deep learning models can be
fooled if you understand the math behind them. I just think the example of
tricking self-driving cars is difficult to relate with / understand.

~~~
skishore
Why do you say that? The first demo they provide shows that the adversarial
image, when printed and then manipulated, still fools the algorithm. That
means the example is robust not only to various affine transformations but
also to the per-pixel noise that results from printing something and then
viewing it again through a camera.

Suppose you were to place an example like that on a stop sign that fooled a
car into thinking that it was a tree. The car might blow through an
intersection at speed as a result.

The training strategy they used provides a template for doing even more exotic
manipulations. For example, you could train an adversarial example that looked
like one thing when viewed from far away but something quite different up
close. Placing an image like that by a road could result in an acute,
unexpected change in the car's behavior (e.g. veering sharply to avoid a
"person" that suddenly appeared).

~~~
nullc
Though I generally agree with your point, the tree vs stopsign example may not
be the best because it would arguably work equally well on humans.

~~~
SmallDeadGuy
Only if the adversarial image printed doesn't look like the stop sign, though
the example in this article shows that it's entirely possible to make an image
that just looks like a distorted/badly-printed kitten to a human but
completely different to a computer. A similar image for a stop sign might just
look like wear in the paint or weird reflections or something but still look
like a stop sign to a human.

~~~
randyrand
yes but wont we still notice that self driving cars aren't stopping at the
stop sign? and we'd investigate

