
The Neural Net Tank Urban Legend - JoshTriplett
https://www.gwern.net/Tanks
======
teraflop
I first encountered this idea in a sci-fi story (I want to say it was one of
Peter Watts' "Rifters" novels, but I can't find it now). The idea was that
someone trained a neural network to look at live video feeds of passengers
moving through a subway station, and control the station's ventilation system.
Unfortunately, the movements of individual people were fairly random, whereas
the large-scale traffic patterns were extremely regular and periodic. So
instead of basing its output on the actual crowd patterns, the neural net
decided it was more accurate to look at the hands of an analog clock that
happened to be visible through one of its cameras.

All well and good, until the clock stopped working during rush hour, and
people started asphyxiating.

~~~
vikiomega9
It's Starfish:
[http://www.rifters.com/real/STARFISH.htm#bulrushes](http://www.rifters.com/real/STARFISH.htm#bulrushes)

------
kainolophobia
>I suggest that dataset bias is real but exaggerated by the tank story, giving
a misleading indication of risks from deep learning

I don't see how this story gives a "misleading" view of deep learning. From my
(admittedly limited) experience with self-driving RC cars, this type of
mistake is quite easy for a neural net to make while being quite difficult to
detect. In our case, after utilizing a visual back-prop method, we realized
our car was using the lights above to direct itself rather than the lanes on
the road.
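
A minimal sketch of that kind of check, using plain input-gradient saliency as a stand-in for the visual back-prop method (the PyTorch model and single-image input shape are assumptions):

```python
import torch

def saliency_map(model, image):
    """Input-gradient saliency for one image of shape (1, C, H, W):
    roughly, how much each pixel influences the top predicted class.
    Useful for spotting a net steering by the ceiling lights instead
    of the lane markings."""
    image = image.clone().detach().requires_grad_(True)
    scores = model(image)                      # (1, num_classes)
    scores[0, scores.argmax()].backward()      # gradient of top-class score
    return image.grad.abs().max(dim=1).values  # (1, H, W) importance map
```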

Now, you can refute this and say "well, clearly your data wasn't extensive
enough" or "your behavioral model is too simple for a complicated task like
driving"; however, as these tools become easier to use, more and more
organizations will put them into practice without as much care as the
researchers behind most of the current production efforts.

~~~
ml_thoughts
Another, more modern and well-documented example of this appears in a 2015
write-up of the "Right Whale" competition on Kaggle:
[http://felixlaumon.github.io/2015/01/08/kaggle-right-whale.html](http://felixlaumon.github.io/2015/01/08/kaggle-right-whale.html)

Contrary to this author's claims, despite using data augmentation and a fancy
modern CNN, a neural network trained to identify whales hit a local optimum
where it looked at patterns in waves on the water to identify the whale
instead of distinctive markings on the whale's body.

I don't buy the "this isn't a problem in real world applications" argument
being made in this article.

~~~
vilhelm_s
He says that his first attempt at whale recognition looked at waves instead of
whales, but

> This naive approach yielded a validation score of just ~5.8 (logloss, lower
> the better) which was barely better than a random guess.

which is different from the tank story. For the tanks, the neural network
appeared to perform well, but was actually not looking at the tanks. Here, it
never performed well, and when he debugged why, he found that it was not
looking at the whales.

------
YeGoblynQueenne
The whole "Could it happen?" section is a bit strange. On the one hand, it
focuses on CNNs when it's clear we're talking about a binary classifier (the
article itself points that out). If Fredkin was really the originator of the
story, then discussing CNNs is an anachronism (they were 30 years away at the
time).

More importantly, it's obvious that "it" could definitely happen and in fact
happens a lot, "it" being overfitting to examples. Machine learning
classifiers suffer from this a lot; it's the whole bias/variance tradeoff
issue. Neural nets are not only not immune to overfitting, they are even
particularly vulnerable to it (especially the ones with millions of
parameters). We've probably all read the adversarial examples papers, a clear
case of overfitting to irrelevant details ("noise").

The story (apocryphal or not) seems like a cautionary tale against
overfitting, or a not-so-innocent attempt to poke fun at machine learning
researchers. One way or another, overfitting is no joke and it's definitely no
urban legend.

~~~
aaron695
> One way or another, overfitting is no joke and it's definitely no urban
> legend.

Can you give an example of where overfitting happened and was successfully
corrected for?

~~~
red75prime
That's why testing and validation sets exist. Overfitting is prevented by
ending the learning process when the error on the validation set starts
growing. It is a standard procedure, so it's unlikely there's a recent case of
overfitting slipping into production.

If an irrelevant feature is present in all samples of a class, then it is not
the fault of the NN for using it as a class feature; it's bad data.
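
A minimal sketch of that procedure, assuming scikit-learn's built-in validation split (the toy dataset is only for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Hold out 10% as a validation set and stop training once the
# validation score has not improved for 10 consecutive epochs.
clf = MLPClassifier(hidden_layer_sizes=(64,), early_stopping=True,
                    validation_fraction=0.1, n_iter_no_change=10,
                    random_state=0, max_iter=500)
clf.fit(X, y)
print("stopped after", clf.n_iter_, "epochs")
```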

~~~
aaron695
So you would agree overfitting is not a real issue and talking about it is a
distraction from NNs?

My question is: are people using overfitting as an excuse for what is actually
a badly made NN?

If you are smart enough to create a NN that can tell if it's sunny or not,
then tanks would also be possible. But if your NN just sucks, then blaming
overfitting is a convenient out.

~~~
YeGoblynQueenne
>> My question is: are people using overfitting as an excuse for what is
actually a badly made NN?

Overfitting is a major issue in machine learning, and it's an inherent
characteristic of learning from examples, not the result of a mistake or of
poor practice. There are special techniques developed explicitly to reduce
overfitting: early stopping (what red75prime describes above),
regularisation, bagging (in decision trees), etc. A lot of work also goes into
ensuring measures of learning performance don't mistake overfitting for
successful learning (e.g. k-fold cross-validation).
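
A minimal sketch of how k-fold cross-validation exposes overfitting that the training score alone would hide (scikit-learn on a toy dataset, purely illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

# An unpruned decision tree (high variance) reproduces its training
# data almost perfectly...
tree = DecisionTreeClassifier(random_state=0)
print("training accuracy:", tree.fit(X, y).score(X, y))            # ~1.0

# ...but 5-fold cross-validation reveals the generalisation gap.
print("5-fold CV accuracy:", cross_val_score(tree, X, y, cv=5).mean())
```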

I'm sorry that I don't have time to track down a good source for a discussion
of the bias-variance tradeoff and overfitting. You can start at the wikipedia
page
[[https://en.wikipedia.org/wiki/Bias%E2%80%93variance_tradeoff](https://en.wikipedia.org/wiki/Bias%E2%80%93variance_tradeoff)]
and follow the links. In short: a model that learns to reproduce its example
data with very high fidelity risks generalising poorly, whereas a model that
generalises well may have high training error. Linear classifiers in
particular are high-bias, whereas nonlinear learners, like multi-layered
neural networks or decision trees, are high-variance.

The problem is real, it's a big bugbear and you won't find any specialist who
dismisses it, or who considers it "not a real issue".

------
andreasvc
For a better, actual example of this problem, see the leopard sofa:
[http://rocknrollnerd.github.io/ml/2015/05/27/leopard-sofa.html](http://rocknrollnerd.github.io/ml/2015/05/27/leopard-sofa.html)

~~~
tomelders
That comment section took an immediate and unexpected turn for the worse.

~~~
KVFinn
>That comment section took an immediate and unexpected turn for the worse.

What the heck is going on there?

~~~
eric_h
Terry Davis - he occasionally chimes in here with similarly themed posts (but
only if you have show dead enabled).

He's schizophrenic, is famous for TempleOS and infamous for the contents of
his posts on the internet.

------
RodgerTheGreat
I think the author's conclusion- that this scenario is unrealistic and would
never happen given today's understanding of machine learning techniques- is
_extremely optimistic_. NNs are demonstrably[1] not robust image classifiers.

In my opinion, it's far more dangerous to downplay the limitations of this
technology and embolden snake-oil purveyors than it is to demand an
inconvenient degree of rigor and caution in reporting results.

[1] [https://arxiv.org/abs/1707.07397#](https://arxiv.org/abs/1707.07397#)

~~~
gwern
> I think the author's conclusion- that this scenario is unrealistic and would
> never happen given today's understanding of machine learning techniques- is
> extremely optimistic. NNs are demonstrably[1] not robust image classifiers.

I am well-aware of adversarial examples, and they are not the same thing as
dataset bias, and I am very troubled by them. If you look at the section on
whether we should tell the tank story as a cautionary story, I already say:

> I also fear that telling the tank story tends to promote complacency and
> underestimation of the state of the art by implying that NNs and AI in
> general are toy systems which are far from practicality and cannot work in
> the real world (particularly the story variants which date the tank story
> recently), or that such systems will fail in easily diagnosed and visible
> ways, ways which can be diagnosed by a human just comparing the photos or
> applying some political reasoning to the outputs, when what we actually see
> with deep learning are failure modes like "adversarial examples" which are
> quite as inscrutable as the neural nets themselves (or AlphaGo's one
> misjudged move resulting in its only loss to Lee Sedol).

To expand a little: dataset bias at least has the tendency to expose itself as
soon as you try to apply it. You waste your time, but that's generally the
worst part. I'm more worried about stuff like adversarial examples, which will
work great in the field right up until a hacker comes by with a custom
adversarial example (eg the adversarial car sign work showing you can trick
simple CNNs into misclassifying speed limits and stop signs using adversarial
examples pasted onto walls or signs or streets). This is not dataset bias; you
can collect images of every single stop sign in the world and that will not
stop adversarial examples.
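
As a sketch of how cheap such an attack can be in the white-box case, here is the classic fast gradient sign method in PyTorch (the model and the [0, 1] pixel range are assumptions; the sign-pasting attacks use more elaborate, physically printable perturbations):

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.03):
    """Fast Gradient Sign Method: nudge every pixel of x by +/- eps in
    the direction that increases the classifier's loss, yielding an
    adversarial example that often flips the predicted class."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0, 1).detach()   # keep pixels in the valid range
```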

> embolden snake-oil purveyors than it is to demand an inconvenient degree of
> rigor and caution in reporting results.

I think it's ironic to say that doing the very simplest level of fact-checking
like 'did this story ever actually happen' is an 'inconvenient degree of rigor
and caution' and 'emboldens snake-oil purveyors'.

------
Rotten194
I get the author's feelings about why we shouldn't tell this story, but I
still disagree. It's a pithy, funny example of GIGO in machine learning.
People _could_ read conclusions about the abilities of neural networks from
the story, but they're wrong to do so -- it's a PEBKAC error, not a technology
one. "Truthy" cautionary tales are a near-universal feature of human cultures
-- why shouldn't machine learning have some?

~~~
robotresearcher
It's a plausible story. Fine to present as a parable, but we should stop
presenting it as true unless we can find a reliable source for it.

Until today, I believed it was true. It was told to me as an undergrad, by a
professor who believed it himself.

~~~
rrmm
I first heard it in an AI class as well. When I relate the story I usually
give it as an anecdote of apocryphal origin. But I do love the story, because
it brings into focus several issues: what do you actually want your classifier
to learn? what does your training set actually teach? how do you know you've
learned the right thing?

Many times it seems like people go into these things hoping that the machine
learning part will figure things out for them and relieve them of the burden
of thinking hard about the problem. It doesn't. It only moves your problem
over a bit and increases the difficulty.

In fact this problem pops up even in pedagogy where the lessons people are
taught actually train them to do the wrong thing (for example pilots
responding to aircraft attitude upsets).

The parable's lesson is a simplistic one, basically: "stop and think about
what you're doing". But like other simple lessons about crying wolf or a
stitch in time, it bears repeating.

~~~
robotresearcher
> It only moves your problem over a bit and increases the difficulty.

Well, that's not quite true. In robot sensing several things have recently
moved from the nigh-on-impossible column to the holy-shit-that-actually-works-
pretty-well column, thanks to ML.

But I agree with the rest of it.

------
nerdponx
There are three separate lessons to be learned from this parable, and I think
people are conflating them:

1\. Training on a biased data set leads to biased predictions. This is
undoubtedly true.

2\. Data sets can be biased in unexpected and unforeseen ways, and therefore
predictions can also be biased in unexpected and unforeseen ways. The examples
at the end of this article don't quite touch on that point. But examples of
this abound in social science. E.g.:
[https://blog.conceptnet.io/2017/07/13/how-to-make-a-racist-ai-without-really-trying/](https://blog.conceptnet.io/2017/07/13/how-to-make-a-racist-ai-without-really-trying/)

3\. Deep and convolutional neural networks are susceptible to this phenomenon.
This is the point that the article is debating.

------
zodPod
This kind of stuff is fairly normal actually. I don't really agree that it
couldn't happen. Neural Networks train for the wrong features all of the time.
It's part of what happens when you're training unsupervised. You load in a lot
of data and then you find the bias. Sure, as the article purports, if done
perfectly it wouldn't happen. But that's like saying "If you build a bridge
perfectly it won't fall down." before building the first bridge. This was
supposedly an old tale so I'm not sure why the author would assume the people
working on the original theoretical NN were data experts who knew the correct
ways to train NNs.

~~~
6nf
Do you have any documented examples?

------
akavel
Umm, but then a story linked from the article as an "alternative example"
(thus presumably "better" than the tank story), and one from HN by the way,
seems to have a nearly identical gist, at least to me as a layman:
[https://news.ycombinator.com/item?id=6269114](https://news.ycombinator.com/item?id=6269114)
\- only not about neural nets, but about genetic/evolutionary algorithms. Or
is it somehow drastically different and I just don't understand that?

~~~
gwern
The difference is that api (Adam Ierymenko, 17k karma, 10-year-old account)
there says he did it himself - he is not retelling 'a friend of a grad student
of a professor told me about some NNs'... I am willing to believe that when an
HN user says something happened to him, it actually happened.

And there is a big difference between something that happened and something
that did not happen.

------
Nomentatus
For those who say we shouldn't pay too much attention to urban legends about
neural network failures, here's a real-life example of neural networks
translating "inorganic cat litter" as "in organic cat litter" and thereby
creating a real-life half-billion-dollar dirty bomb that genuinely exploded.

[https://jonathanturley.org/2014/11/21/kitty-litter-dirty-bomb-new-mexico-nuclear-disposal-plant-causes-500-million-of-damage-by-using-wrong-kind-of-kitty-litter/](https://jonathanturley.org/2014/11/21/kitty-litter-dirty-bomb-new-mexico-nuclear-disposal-plant-causes-500-million-of-damage-by-using-wrong-kind-of-kitty-litter/)

This is a hasty link; IIRC the error happened when someone read instructions
aloud to someone else who was taking notes.

Yes, those badly behaving neural networks were human, and therefore far more
sophisticated than any we can build yet. Which makes the problem worse and
more real, not better or less real.

------
tyingq
The big takeaway for me is that even if untrue, similar situations are true.
Like this one from the article: "Gender-From-Iris or Gender-From-Mascara?"
[https://arxiv.org/pdf/1702.01304.pdf](https://arxiv.org/pdf/1702.01304.pdf)

------
1024core
All this speculation is silly. Just generate your own data set (since the
story is from the early 90s, if not earlier, the number of training examples
would have been quite small compared to today's data sets) and see if today's
networks make the same mistake.

~~~
gwern
Tanks are already in ImageNet:
[http://image-net.org/synset?wnid=n04389033](http://image-net.org/synset?wnid=n04389033)

------
aqsalose
I have heard this story multiple times. My impression is that often the person
who tells the story treats the fact that the mistrained model was NN-based as
a minor detail (or the kind of juicy but ultimately insignificant detail that
makes the story more fun to tell; and if the original story was about a NN,
nobody is going to change it to an SVM or something else).

From this viewpoint, I found the section where the author argues at length
that this could not possibly happen with current state-of-the-art visual-task
CNNs (especially because people apply preprocessing steps such as whitening
and augmentation to get rid of exactly this kind of bias), let's say, weird.
The parable is not about CNNs; it is about the importance of paying attention
to what features your model will extract from the training dataset and whether
your model is learning the right things.

------
thanatropism
Scare-quotes dataset bias (in science we say "selection bias") is the bread
and butter of any field that doesn't get an opportunity to fine-tune sample
design issues.

There are even hierarchical models with an equation giving the probability
that an item will be observed at all, conditioned on known features.

Those who don't know their statistical models are bound to reinvent
statistical theory.

~~~
rrmm
reinvent it poorly as well?

------
haeffin
> a common preprocessing step in computer vision (and NNs in general) is to
> whiten the image by standardizing or transforming pixels to a normal
> distribution; this would tend to wipe global brightness levels, promoting
> invariance to illumination

Is there anybody still doing this?

~~~
gwern
Seems to still be pretty common:
[https://scholar.google.com/scholar?hl=en&as_sdt=0%2C21&as_yl...](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C21&as_ylo=2016&q=whiten+OR+normalize+image+site%3Aarxiv.org)
Am I wrong?

~~~
haeffin
"normalizing" is a bad search term, it can mean a lot of things. And whitening
images is pretty much dead. What is done is subtracting mean colors from each
pixel, but those are means over the whole database, not per-image, so that
keeps brightness shifts intact.
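
A minimal numpy sketch of the distinction (toy random batch; in practice the per-channel means come from the training set, e.g. the usual ImageNet means):

```python
import numpy as np

images = np.random.rand(100, 32, 32, 3)   # toy batch: N x H x W x C

# Per-image standardisation ("whitening" each image): wipes out overall
# brightness differences between images.
per_image = (images - images.mean(axis=(1, 2, 3), keepdims=True)) \
            / images.std(axis=(1, 2, 3), keepdims=True)

# Dataset-wide per-channel mean subtraction: the same constant is removed
# from every image, so relative brightness differences are preserved.
channel_means = images.mean(axis=(0, 1, 2))   # one mean per color channel
centered = images - channel_means
```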

~~~
gwern
In searches you should err on the side of broadness. If you cut it down to
just 'whitening', as you can see from the snippets as well, there are plenty
of hits. You may not like whitening, but it does still seem to be common.

~~~
haeffin
Using your search, at least for me, none of the snippets on the first page use
normalization in the sense that you are using it in this context. So including
that term just got you a lot of noise. And the only reference to whitening on
that search page is not using it in an input pipeline; it is using ZCA to
detect images that have been modified to be adversarial.

If you want some better data than unreliable searches, go download pretrained
models for popular architectures and popular frameworks and look at the input
pipelines for them. You'll find that whitening is absolutely not common for
image classification/detection today (yes, there are still some cases where it
is used, but typically on smaller datasets where you can't get that invariance
from data, which is the way you prefer it to be - if one class actually is
more likely to be present in dark images, you don't want to kill that
information).

------
gok
I always heard the version that went the other way around. After it was shown
that single layer perceptrons were unable to deal with data sets that weren't
linearly separable, there was an effort to figure out how the single layer
tank classifier was working.

~~~
gwern
That's an interesting variant - none of the versions I've seen so far link it
to Minsky's perceptron book. Any chance you recall where you saw that one?

~~~
gok
This was from a prof giving an undergrad ML lecture at Cornell about 10-12
years ago. Wikipedia's coverage of _Perceptrons_ suggests my lecturer had also
only heard the mistaken version of Minsky's XOR example, so this could have
been entirely wrong :)

~~~
gwern
Oh. How boring. Wonder if I should include that as an example... It's a valid
example of how urban legends evolve, after all.

------
dontreact
It's funny, I heard this legend a bunch, but stopped hearing it after the 2012
AlexNet paper.

~~~
eltoozero
For reference:

 _AlexNet[1]_ is the name of a convolutional neural network, originally
written with CUDA to run with GPU support, which competed in the ImageNet
Large Scale Visual Recognition Challenge in 2012. The network achieved a top-5
error of 15.3%, more than 10.8 percentage points ahead of the runner up.
AlexNet was designed by the SuperVision group, consisting of Alex Krizhevsky,
Geoffrey Hinton, and Ilya Sutskever.

AlexNet Paper(PDF)[0]

[0]:
[http://vision.stanford.edu/teaching/cs231b_spring1415/slides...](http://vision.stanford.edu/teaching/cs231b_spring1415/slides/alexnet_tugce_kyunghee.pdf)
[1]:
[https://en.wikipedia.org/wiki/AlexNet](https://en.wikipedia.org/wiki/AlexNet)

------
digi_owl
Makes me think of the story of soviet soldiers training dogs to carry
explosives under tanks during WW2. Only they used their own tanks to train the
dogs. So when deployed to the battlefield, well...

------
InclinedPlane
I've always liked this story better, it's in a somewhat similar vein:
[https://www.damninteresting.com/on-the-origin-of-circuits/](https://www.damninteresting.com/on-the-origin-of-circuits/)

------
11thEarlOfMar
How does one validate that a trained NN has learned a correct method?

------
xchaotic
Why is it called an urban legend? It seems to accurately depict how NNs work.

