
Adversarial Examples Are Not Bugs, They Are Features - selimonder
https://arxiv.org/abs/1905.02175
======
pakl
Deep convolutional networks, by design, are unable to integrate contextual and
ambient information present in an image (or in preceding images) to inform how
they interpret the local features they use. So it's no surprise they struggle
with unconstrained images, where ambient context varies.

It's intriguing how much focus there is on adversarial examples. You don't
need adversarial examples to make a deep network fail - in a sense that's
overkill. Just point the poor deep network at a sequence of images from the
real world -- images from a self-driving car, security camera, or webcam.
You'll see it make spontaneous errors, no matter how much training data you
gave it.

The field will advance when/if practitioners recognize that classifying pixel
patterns in isolation isn't sufficient for robust visual perception, and adopt
alternative neural network designs that can interpret what they perceive in
light of (no pun intended) context and physical expectations.

It worked for our prototype.[0]

[0] [https://arxiv.org/abs/1607.06854](https://arxiv.org/abs/1607.06854)

~~~
MAXPOOL
Learning multilayer convolutional representations of statistical features is
roughly equivalent to taking the first few layers of the visual cortex and
stacking them. Building higher and higher stacks is not going to solve vision.

We are essentially building a frog with better and better visual perception in
the hope that it could become a taxi driver. It will become a totally amazing
super-frog with super-vision, but it's still just a frog, with frog-like visual
perception and limits. Using the equivalent of a pre-attentive feature
recognition stage for complex object recognition can fake human-like object
recognition when we force it, but it's the wrong approach. We get these
catastrophic failures because we hit the limits.

Features seem to exist independently from one another in the early processing
stages of human perception. They are not associated with a specific object
either. Human perception does not gradually turn features into objects the way
we do in deep learning. Properly distinguishing feature integration from
detection, and how to do it, is an open question.

~~~
jacobush
And people will marvel at the totally super-froggy things these super-frogs can
do, and understand even less why the super-frogs aren't taxiing already. :-)

~~~
AstralStorm
They actually are, but self-driving cars use that subsystem as only one
component of the whole, and most of the whole is not a super-frog.

------
Chirono
There's a good summary of the paper by the authors here, for people who don't
want to digest the pdf:
[http://gradientscience.org/adv/](http://gradientscience.org/adv/)

~~~
andybak
I find academic papers fairly indigestible both because of their language and
verbosity and because PDF is a fairly horrible format for reading on screen.

So thank you.

~~~
p1esk
They get easier to digest after you’ve read a few dozen. Or written a couple.

------
Macuyiko
Very interesting paper, with some surprising insights (I need to read it a
couple more times for sure).

The conclusion states:

> Overall, attaining models that are robust and interpretable will require
> explicitly encoding human priors into the training process.

I feel that is true, though another part of the solution IMO lies in coming up
with classifiers that can do more than output a probability alone. I agree
that classifiers being sensitive to well-crafted adversarial attacks is
something that can't be avoided (and perhaps even shouldn't be avoided at the
train-data level), but the problem lies mainly at the output end. As a user,
the model gives me no insight into how "sure" it feels about its prediction,
or whether the inputs deviate from the training set (especially in the useful
non-robust feature set). This is especially a problem given that we stick
softmax on almost all neural networks, which has a tendency to over-estimate
the probability of the rank-1 prediction, and that confuses humans. Most
adversarial attacks show [car: 99%, ship: 0.01%, ...] for the original image
and [ship: 99%, car: 0.01%, ...] for the perturbed image.
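
To make that concrete, here is a minimal numpy sketch (mine, not from the
paper; the numbers are only illustrative) of how quickly softmax saturates
toward certainty:

```python
import numpy as np

def softmax(logits):
    z = logits - np.max(logits)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# A logit gap of just a few units is already reported as near-certainty:
print(softmax(np.array([5.0, 0.0, -1.0])))  # ~[0.991, 0.007, 0.002]
```

An adversarial perturbation only needs to flip which logit is largest; softmax
then happily reports the new winner at 99% too.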

Using interpretability and explanatory tools to inspect models is a good
start, though I'd like to see more attention being given to:

- Feedback on whether, and to what extent, a given instance deviates from the
training set

- Bayesian constructs w.r.t. uncertainty being incorporated, instead of only
probabilities. Work that tries to do this already exists [1,2], with very nice
results, though it is not really "mainstream" (see the sketch below)
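
As a rough illustration of the direction in [1], a minimal MC-dropout sketch
in PyTorch (my own sketch, assuming `model` is any classifier containing
dropout layers):

```python
import torch

def mc_dropout_predict(model, x, n_samples=30):
    """Approximate a Bayesian predictive distribution by sampling the
    network with dropout left active at inference time."""
    model.train()  # keeps dropout stochastic (a careful implementation
                   # would re-enable only the Dropout modules, not batch norm)
    with torch.no_grad():
        probs = torch.stack([
            torch.softmax(model(x), dim=-1) for _ in range(n_samples)
        ])
    # Mean = the prediction; std = the network disagreeing with itself,
    # i.e. a crude uncertainty signal that softmax alone never gives you.
    return probs.mean(dim=0), probs.std(dim=0)
```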

[1]:
[https://alexgkendall.com/computer_vision/bayesian_deep_learn...](https://alexgkendall.com/computer_vision/bayesian_deep_learning_for_safe_ai/)

[2]: [https://eng.uber.com/neural-networks-uncertainty-estimation/](https://eng.uber.com/neural-networks-uncertainty-estimation/)

~~~
AstralStorm
DBNNs are actually mainstream; the issue is that they have the same failure
modes while also being slow to train.

We just do not know what the high-level structure of a mind looks like; the
best we have is some sort of data-compression entropy model. That's obviously
not enough.

An adversarial training model is probably closer (e.g. A3C), but it's not
detailed enough either.

Value and policy loss are extremely blunt tools for evaluating an actor or
critic, for example.
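
For concreteness, the two losses I mean, in a generic actor-critic sketch (my
own variable names, not any particular codebase):

```python
import torch

def actor_critic_loss(log_prob, value, ret, entropy, beta=0.01):
    """One scalar for the policy, one for the critic - that is the
    entirety of the learning signal, hence 'blunt tools'."""
    advantage = (ret - value).detach()    # how much better than the critic predicted
    policy_loss = -log_prob * advantage   # reinforce actions that beat the baseline
    value_loss = (ret - value).pow(2)     # regress the critic toward the observed return
    return policy_loss + 0.5 * value_loss - beta * entropy
```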

~~~
Macuyiko
Totally agree. Except I didn't know DBNNs are mainstream. That is, in research
they're obviously well known, though I've personally not yet encountered
industry settings (companies other than the tech unicorns, that is) that
utilize them or even think about these problems. They often end up using the
latest well-known architecture (like YOLO) in TensorFlow. That said, we mostly
work with retailers and finance/insurance (non-US).

Would be interested to know if your experience differs and in which
industries.

------
AstralStorm
One thing I don't agree with is the notion that robustness is human-specified,
when they clearly measure the robustness of a given feature by how much
perturbation it withstands before the classification changes.

Robustness is a statistical systems notion: the amount, or degrees of freedom,
of state perturbation required to change the output, also taking into account
the magnitude of the change. It is related to, but not the same as,
system-theoretic stability. There's nothing human about that definition.
Robust features need not be human-derived.

The desired trade-off between robustness and absolute accuracy, precision, or
bias is human-specified, but generally the trade-off between these variables
is not huge.

~~~
omnicognate
Their definition of robustness relies on a pre-specified choice of which set
of perturbations to be "robust" to. They use the letter delta for this set in
the paper, IIRC. This is where the "human" bit comes in: to define robustness
in the abstract you would have to establish which perturbations "should"
change the categorisation and which ones "shouldn't", which they avoid
attempting to do.
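
For reference, the relevant definitions as I remember them from the paper
(with Δ(x) the pre-specified perturbation set):

```latex
% f is a rho-useful feature if it correlates with the label on average:
\mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\, y \cdot f(x) \,\big] \;\ge\; \rho

% f is gamma-robustly useful if it stays correlated under the worst-case
% perturbation drawn from the allowed set \Delta(x):
\mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[\, \inf_{\delta \in \Delta(x)} y \cdot f(x+\delta) \,\Big] \;\ge\; \gamma
```

Nothing in the abstract definition says what Δ(x) should contain; choosing it
(e.g. a small ℓ2 ball) is the human part.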

------
logane
One of the first authors here - happy to answer any questions!

------
nannananannana
Mirage

