
Deep learning: a critical appraisal - fanf2
https://arxiv.org/abs/1801.00631
======
trextrex
Many concerns in this paper, especially about deep learning being data-hungry,
having limited capacity for transfer, and struggling to integrate prior
knowledge, are addressed by recent papers on meta-learning and
learning-to-learn [1, 2, 3, and many others], in both the supervised and
reinforcement learning contexts.
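
For anyone curious what that looks like concretely, here is a minimal sketch of
the learning-to-learn idea: learn an initialization that adapts quickly to new
tasks. It uses a first-order (Reptile-style) update rather than the exact
algorithms in [1-3], and the sine-wave task family, network, and
hyperparameters are all illustrative assumptions:

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def sample_task():
    """Toy task family: regress y = a * sin(x + b) with random amplitude/phase."""
    a = torch.rand(1) * 4.0 + 0.1
    b = torch.rand(1) * 3.1416
    xs = torch.rand(20, 1) * 10.0 - 5.0
    return xs, a * torch.sin(xs + b)

# Shared initialization that meta-training adjusts so it adapts fast to new tasks.
net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
meta_lr, inner_lr, inner_steps = 0.1, 0.01, 5

for meta_iter in range(1000):
    xs, ys = sample_task()
    fast = copy.deepcopy(net)                      # task-specific copy
    opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
    for _ in range(inner_steps):                   # inner loop: adapt to this task
        loss = F.mse_loss(fast(xs), ys)
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():                          # outer loop: nudge the shared
        for p, q in zip(net.parameters(), fast.parameters()):
            p += meta_lr * (q - p)                 # init toward the adapted weights
```

After meta-training, a handful of gradient steps on a brand-new sine task is
typically enough to fit it, which is the "less data-hungry, better transfer"
behaviour at issue here.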

In the case of meta-reinforcement learning, there has been recent work [5]
suggesting that this mechanism is very similar to how learning works in the
brain.

In fact, my group recently published a paper [4] on learning-to-learn in
spiking networks (which makes the setting more biologically realistic), showing
that the network learns priors over families of tasks in supervised learning
and automatically learns useful exploration strategies when doing
meta-reinforcement learning.

While I don't claim this is the right path to AGI, it's a promising new
direction in deep learning research that this paper seems to ignore.

[1]
[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.5.323](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.5.323)

[2] [https://arxiv.org/abs/1611.05763](https://arxiv.org/abs/1611.05763)

[3] [https://arxiv.org/abs/1703.03400](https://arxiv.org/abs/1703.03400)

[4] [https://arxiv.org/abs/1803.09574](https://arxiv.org/abs/1803.09574)

[5]
[https://www.nature.com/articles/s41593-018-0147-8](https://www.nature.com/articles/s41593-018-0147-8)
(Preprint:
[https://www.biorxiv.org/content/early/2018/04/06/295964](https://www.biorxiv.org/content/early/2018/04/06/295964))

~~~
apsec112
While I haven't read those particular papers yet, one common pattern in the ML
literature seems to be:

1) Identify a commonly seen problem with deep learning architectures, whether
that's the need for large data volumes, the lack of transfer learning, etc.

2) Invent a solution to the problem.

3) Test that the solution works on toy examples, like MNIST, simple block
worlds, simulated data, etc.

4) Hint that the technique, now proven to work, will naturally be extended to
real data sets very soon, so we should consider the problem basically solved
now. Hooray!

5) Return to step #1. If anyone applies the technique to real data sets, they
find, of course, that it doesn't generalize well and works only on toy
examples.

This is simply another form of what happened in the 60s and 70s, when many
expected that SHRDLU and ELIZA would rapidly be extended to human-like,
general-purpose intelligences, with just a bit of tweaking and a bit more
computing power. Of course, that never happened. We still don't have that
today, and when we do, I'm sure the architecture will look very very different
from 1970s AI (or modern chatbots, for that matter, which are mostly built the
same way as ELIZA).

I don't mean to be too cynical. Like I said, I haven't read those particular
papers yet, so I can't fairly pass judgement on them. I'm just saying that
historically, saying problem X "has been addressed" by Y doesn't always mean
very much. See also, e.g., the classic paper "Artificial Intelligence Meets
Natural Stupidity":
[https://dl.acm.org/citation.cfm?id=1045340](https://dl.acm.org/citation.cfm?id=1045340).

EDIT: To be clear, I'm not saying that people shouldn't explore new
architectures, test new ideas, or write up papers about them, even if they
haven't been proven to work yet. That's part of what research is. The problem
comes when there's an expectation that an idea about how to solve the problem
means that the problem is close to being solved. Most ideas don't work very
well and have to be abandoned later. For example, here is the Neural Turing
Machine paper from a few years back:

[https://arxiv.org/pdf/1410.5401.pdf](https://arxiv.org/pdf/1410.5401.pdf)

It's a cool idea. I'm glad someone tried it out. But the paper was widely
advertised in the mainstream press as being successful, even though it was not
tested on "hard" data sets, and (to the best of my knowledge) it still hasn't
been, several years later. That creates unrealistic expectations.

~~~
trextrex
While I don't completely disagree with you, how would you propose researchers
go about the problem?

If anything, machine learning is applied to real world problems these days
more than it ever was.

For better or worse, AGI is a hard problem that's going to take a long time to
solve. And we're not going to solve it without exploring what works and what
doesn't.

~~~
skywhopper
I think the mere fact that the OP feels the need to state that (paraphrasing)
"possibly additional techniques besides deep learning will be likely necessary
to reach AGI" reveals just how deeply the hype has infected the research
community. This overblown self-delusion infects reporting on self-driving
cars, automatic translation, facial recognition, content generation, and any
number of other tasks that have reached the sort-of-works-but-not-really point
with deep learning methods. But however rapid recent progress has been, these
things won't be "solved" anytime soon, and we keep falling into the trap of
believing the hype based on toy results. It'll be better for the researchers,
investors, and society to be a little more skeptical of the claim that
"computers can solve everything, we're 80% of the way there, just give us more
time and money, and don't try to solve the problems any other way while you
wait!"

~~~
trextrex
Agreed. The hype surrounding machine learning is quite disproportionate to
what's actually going on. But it's always been that way with machine learning
-- maybe because it captures the public's imagination like few other fields
do.

And there are definitely researchers, top ones no less, who play along with
the hype. Very likely to secure more funding, and more attention for
themselves and the field. Which has turned out to be quite an effective
strategy, if you think about it.

The other upside of this hype is that it ends up attracting a lot of really
smart people to work on this field, because of the money involved. So each
hype cycle leads to greater progress. The crash afterwards might slow things
down a bit, particularly in the private sector. But the quantum of government
funding available changes much more slowly, and could well last until the next
hype cycle starts.

~~~
YeGoblynQueenne
>> The other upside of this hype is that it ends up attracting a lot of really
smart people to work on this field, because of the money involved.

The hype certainly attracts people who are "smart" in the sense that they know
how to profit from it, but that doesn't mean they can actually do useful
research. The result is, like the other poster says, a huge number of papers
that claim to have solved really hard problems, which of course remain far
from solved; in other words, so much useless noise.

It's what you can expect when you see everyone and their little sister jumping
on a bandwagon when the money starts pouring in. Greed is great for making
money, but not so much for making progress.

~~~
therein
> The result is, like the other poster says, a huge number of papers that
> claim to have solved really hard problems, which of course remain far from
> solved; in other words, so much useless noise.

Could the answer be holding these papers to a stricter standard during peer
review?

~~~
YeGoblynQueenne
Ah. To give a more controversial answer to your comment: you are asking, very
reasonably, "isn't the solution to a deficit of scientific rigour to increase
scientific rigour?"

Unfortunately, while machine learning is a very active research field that has
contributed much technology, certainly to industry but also occasionally to the
sciences, it has been a long time since anyone has successfully accused it of
science. There is not so much a deficit of scientific rigour as a complete and
utter disregard for it.

Machine learning isn't science. It's a bunch of grown-up scientists banging
their toy blocks together and gloating for having made the tallest tower.

(there, I said it)

------
mlthoughts2018
A better title for the article might have been,

“Why current deep learning is not general enough for AGI or some other
families of structured problems.”

Viewed this way, it’s not a criticism of pragmatically using deep learning or
experimenting with deep learning for some narrow tasks.

Rather, it expresses what aspects of current deep learning make it unsuitable
for general transfer learning, hierarchical or causal inference, or many
Bayesian techniques requiring greater use of priors.

Other comments have pointed out that there are plausible rebuttals in deep
reinforcement learning and metalearning.

But the bigger thing to me is to be clear that the article is not a criticism
of _deep learning engineering_ — applying deep learning to satisfy an explicit
requirement, where the success criteria possibly have nothing at all to do with
general intelligence or with whether a certain approach can span some
sufficiently large class of general models.

However, even if you constrain your view to just so-called “pragmatic” deep
learning — deep learning for concrete tasks — there are still a lot of
unanswered questions about why things work, and whether an approach is learning
semantic aspects of some true underlying structure (a latent-variable space
that captures the true data-generating process) or merely overfitting to
particular populations of observation-space statistics.

This paper gives an example of exactly this issue for CNN models for image
processing [0]. I’d argue that _this_ is more of the kind of criticism
relevant for task-oriented practitioners, whereas the OP link is some
criticism more relevant for AGI or philosophy of statistics research at large.

[0]: [https://arxiv.org/pdf/1711.11561.pdf](https://arxiv.org/pdf/1711.11561.pdf)

~~~
red75prime
Adversarially robust classifiers have interpretable gradients and feature
representations [0]. The problem seems to be that standard networks capture all
the statistics there are, including surface statistical regularities and noise.
It can be mitigated, though.

[0]: [https://arxiv.org/abs/1805.12152](https://arxiv.org/abs/1805.12152)
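
For concreteness, here is a rough sketch of the kind of adversarial training
studied in [0] (train on worst-case perturbed inputs so the model cannot lean
on fragile surface statistics). The PGD step count, epsilon, and the
model/optimizer are illustrative, not the paper's exact settings:

```python
import torch
import torch.nn.functional as F

def pgd_perturb(model, x, y, eps=8 / 255, step=2 / 255, iters=7):
    """L-infinity PGD attack: find a small perturbation that maximizes the loss."""
    x_adv = x.clone().detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = (x_adv + step * grad.sign()).detach()
        x_adv = x + (x_adv - x).clamp(-eps, eps)   # project back into the eps-ball
        x_adv = x_adv.clamp(0.0, 1.0).detach()     # keep pixels valid
    return x_adv

def adversarial_training_step(model, optimizer, x, y):
    """One robust-training step: minimize the loss on adversarially perturbed inputs."""
    x_adv = pgd_perturb(model, x, y)               # inner maximization
    loss = F.cross_entropy(model(x_adv), y)        # outer minimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```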

~~~
mlthoughts2018
Thanks for the link!

Note that it’s not just adversarial robustness that is a problem, as the
paper’s result merely with altering surface statistics with a Fourier domain
filter applied to the training data already creates problems for interpreting
the network’s internal representation as having any type of semantic
representation of the underlying structure relevant for the task, without
needing to involve the part about adversarial robustness at all.
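
To make that Fourier-domain manipulation concrete, here is a rough sketch of
the sort of surface-statistics alteration described in [0]; the radial low-pass
mask and cutoff are my illustrative choices, not necessarily the paper's exact
filter:

```python
import torch

def radial_low_pass(img: torch.Tensor, cutoff: float = 0.5) -> torch.Tensor:
    """img: (C, H, W) floats in [0, 1]; keep only spatial frequencies whose
    radius is below `cutoff` (expressed as a fraction of the Nyquist frequency)."""
    _, H, W = img.shape
    fy = torch.fft.fftfreq(H).view(H, 1)       # cycles per sample, in [-0.5, 0.5)
    fx = torch.fft.fftfreq(W).view(1, W)
    mask = ((fy ** 2 + fx ** 2).sqrt() <= cutoff * 0.5).float()
    spec = torch.fft.fft2(img) * mask          # zero out the high-frequency content
    return torch.fft.ifft2(spec).real.clamp(0.0, 1.0)
```

Training on images filtered this way and testing on unfiltered ones (or vice
versa) is the kind of surface-statistics shift that a genuinely semantic
representation should survive, but superficial feature statistics do not.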

------
goatlover
Why is the goal AGI? I get it from a climb Mt Everest POV, but not for society
as a whole. We already have 7.x billion general intelligences. Is the goal to
replace humanity? Because being human, I’m not on board with that.

~~~
21
> Because being human, I’m not on board with that.

I'm pretty sure that other mammals on this planet are not on board with humans
being derived from them, and now humans replacing them and destroying their
habitat. Quite unfortunate for them. A future AI smarter than us may well feel
the same way: that it's unfortunate, but for the greater good.

~~~
zitterbewegung
And why is smarter better, or even for the greater good? Shouldn't we as
rational animals try to go for an AI that is not only better for the greater
good but also for our own humble morality? Why is it only your way?

~~~
21
> And why is smarter better

Some people say this might be a sort of law of physics: since the start of the
universe, complexity seems to increase.

Our destruction might be required even if no harm is meant.

The example given is: "We don't hate ants, but when we need to build a highway
we pave over their ant-hills without a second thought".

~~~
goatlover
However, ants are not in danger of going extinct, and don’t seem particularly
hindered by humanity as a whole. If your analogy holds, that could be good news
for us. But it could also mean that intelligence is a special case, not the
general direction of evolution or the universe. Would you bet against ants
outliving us and our machines?

~~~
vertexFarm
If you made that bet you would have absolutely no evidence for making it. The
only data set we have is still playing out.

And once again, if we create AGI and artificial life that is truly more
intelligent than us and can exist without a traditional biome, then our danger
of going extinct may be irrelevant.

For instance, when cyanobacteria first appeared on Earth, their metabolites
were so toxic that they turned the atmosphere poisonous and caused one of the
greatest mass extinctions in the history of our planet. What they created was
molecular oxygen. The survivors of this extinction were able to use a
fundamentally changed biome to harness much more energy in their biologies,
leading to more sophisticated and diverse life in the long run. Nature is not
benign, malignant, fragile, or judgmental. Nature is persistent.

Creating AGI may very well be an event like that. A mass extinction that
nonetheless increases the survivability and diversity of life. It sucks for
us, but who are we to dictate the destiny of evolution and the nature of life?
Where would we be if the cyanobacteria could decide not to start producing
oxygen?

~~~
goatlover
> If you made that bet you would have absolutely no evidence for making it.

I don't have evidence in terms of AGI, but there is evidence that ants have
survived a myriad of changes and disasters in the past (Google tells me they
evolved 92 mya). So AGI and/or humans would have to do something radically
different to the biome for organisms like ants to go extinct. Something not
seen since the rise of multi-cellular life. It seems much more likely that
human civilization bites the dust first.

> Where would we be if the cyanobacteria could decide not to start producing
> oxygen?

But they didn't have a choice. We do. Why would I care about the possibility
for some more sophisticated life form in the far future if it means my species
goes extinct?

------
crazybit
Why not praise DL and strive for practical applications, both present and
future, instead of making the goal, and the measure of success, some nebulous
"artificial general intelligence"?

~~~
red75prime
AGI can be seen as a system having practical success in all tasks humans can
perform. It doesn't change much whether we care about AGI or not, as long as
the generality of narrow AIs keeps increasing.

------
hestefisk
I find the term AGI quite misleading. It promotes the idea that the wonders of
the human mind, knowledge, feelings, sensations, etc. can be quantified and
turned into something algorithmic. I think a lot of AI researchers could do
with a basic introduction to philosophy of science and the different forms of
knowledge — episteme, phronesis, techne — and perhaps also the structure of
scientific revolutions (Thomas Kuhn, notably). The whole paradigm of AI
presupposed that generalised intelligence can exist without biology, feelings,
a body / senses. This is one of the reasons that AI reached the so-called “AI
winter” in the ’70s, when researchers boiled language and human knowledge down
to algorithmic manipulation of symbols.

EDIT: John Searle’s Chinese Room argument is a good (and fun) place to start:
[https://en.wikipedia.org/wiki/Chinese_room](https://en.wikipedia.org/wiki/Chinese_room)

~~~
ricraz
As an AI researcher who did a degree in philosophy, I think most of the things
you mentioned are pretty irrelevant. How to build an AI with "feelings,
sensations, etc." is indeed a mystery, but we don't need to aim for that; we
just need to aim for intelligence, which can be defined without reference to
consciousness or qualia. Similarly, if we can invent a working, fully automated
Chinese room that passes the Turing Test, then whether or not it fits Searle's
definition of "understanding" is a moot point (especially since his
understanding of "understanding" is pretty weird).

More generally, although we should be heavily inspired by human intelligence
when designing machine intelligence, it's a mistake to use the way humans
think to _define_ intelligence. Kuhn's account of scientific revolutions, for
example, is primarily descriptive, not prescriptive. We can certainly imagine
possible setups where science doesn't proceed like that, which may well be
superior. Science isn't defined by revolutions, but by experimentally
searching for the truth. In the same way, knowledge isn't defined by having a
body, but by having beliefs which correspond with the state of the world.

------
afpx
Really naive question:

How many deep learning models would need to be trained so that a model (maybe
a decision tree) for generating deep learning models could be trained?

~~~
gwern
The simplest approach (training NNs from scratch, using validation score as
the loss, and a simple DRL agent with REINFORCE as the designer) requires
thousands of models to be trained and exorbitant GPU resources, as in the Zoph
paper. You can, however, get human-level designs with a few dozen or a few
hundred samples at a tiny fraction of the computation cost if you are somewhat
smarter about it -
for example, reusing the parameters of a trained NN when you start training a
new slightly-different NN (eg if you have a CNN with 20 layers and you want to
try the exact same settings but with 21 layers, almost all of that CNN is
going to be _very_ similar to the end result of the 20 layers, so you can
speed things up by copying over the first 20 layers, randomly initializing the
21st layer, and then training it for a short time). One nice efficient form of
neural architecture search is SMASH:
[https://www.reddit.com/r/reinforcementlearning/comments/6uio...](https://www.reddit.com/r/reinforcementlearning/comments/6uioms/smash_oneshot_model_architecture_search_through/)
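
As a concrete illustration of that warm-starting trick, here is a simplified
fully-connected version (not the exact scheme from any particular architecture
search paper; the near-identity initialization and layer sizes are illustrative
assumptions):

```python
import copy
import torch.nn as nn

def deepen(trained: nn.Sequential, hidden: int) -> nn.Sequential:
    """Return a net one hidden layer deeper that reuses the trained weights, so
    the new candidate only needs a short fine-tune rather than a full retrain."""
    old = list(copy.deepcopy(trained).children())  # keep the already-trained layers
    new_layer = nn.Linear(hidden, hidden)
    nn.init.eye_(new_layer.weight)    # near-identity init: the deeper net starts
    nn.init.zeros_(new_layer.bias)    # out computing (almost) the same function
    return nn.Sequential(*old[:-1], new_layer, nn.ReLU(), old[-1])

# e.g. base = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
# train `base`, then briefly fine-tune deepen(base, 64) instead of starting over.
```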

More links on the general topic of 'DL optimizing/designing DL':
[https://www.reddit.com/r/reinforcementlearning/search?q=flai...](https://www.reddit.com/r/reinforcementlearning/search?q=flair%3ADL+flair%3AMetaRL&restrict_sr=on)
(It's pretty critical to current DL approaches to few-shot/one-shot learning
as well: you can think of it as 'designing' a new NN specialized to the few
samples of the new class of data.)

------
bra-ket
learning by trial and error will never generalize well

~~~
std_throwaway
What is the alternative?

Expert systems with a myriad of rules that fail the moment they get an input
that was not considered?

~~~
someguy1234567
Ahem, evolution.

