
The deepest problem with deep learning - dsr12
https://medium.com/@GaryMarcus/the-deepest-problem-with-deep-learning-91c5991f5695
======
shafte
I seem to have missed the Twitter spat that precipitated this essay, but I
don't quite buy the larger argument he's making. We should judge approaches to
AI based on their results, not on their conformance to a (vague, incorrect,
untested) model of human cognition.

Symbolic AI fell out of favor primarily because it was not delivering results
in impactful problem areas. Deep learning is currently popular because we are
nowhere near the limit of what results it can produce.

Can this change? Of course! The history of deep learning itself proves as
much. But if you want to genuinely influence the direction of the field, you
have to lead by example and produce novel/interesting research results, not by
kvetching in The New Yorker that your favorite approach is not getting enough
attention.

~~~
dreamcompiler
Symbolic AI fell out of favor because it was overhyped. It was delivering
quite impressive results--just not the promised results. Neural nets fell out
of favor in the 90s for exactly the same reason.

Both failures ultimately were caused by not enough computing power. Even
though Deep Learning and Convolutional NNs look like major advances today,
they never could have been practical before about 2005: There just wasn't
enough computing power.

If modern computing power were thrown at symbolic AI the same way it's been
thrown at NNs, it's _highly_ likely symbolic AI would experience similarly
impressive gains.

~~~
hodgesrm
> If modern computing power were thrown at symbolic AI the same way it's been
> thrown at NNs, it's highly likely symbolic AI would experience similarly
> impressive gains.

What's the basis for this conjecture? Is there a mathematical model for
symbolic manipulation that would benefit from parallel execution/GPUs the way
ML applications do?

~~~
YeGoblynQueenne
Symbolic AI needs computing power to counteract combinatorial explosion.

[https://en.wikipedia.org/wiki/Combinatorial_explosion](https://en.wikipedia.org/wiki/Combinatorial_explosion)

The vulnerability of early logic-based AI to combinatorial explosion was the
main argument against funding AI research put forward in the Lighthill Report,
the document that shut down AI research in the UK in the 1970s and also
contributed to the AI winter on the other side of the Atlantic.

Obviously, today we have more powerful computers, so combinatorial explosion
is less of an issue; or at any rate, it's possible to go a bit further and do
a bit more than was possible in the '70s.
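
To make the scale concrete, here is a back-of-the-envelope sketch in Python
(illustrative numbers only): exhaustive search over a tree with branching
factor b and depth d touches on the order of b^d nodes.

    # Toy illustration of combinatorial explosion; 35 is the branching
    # factor often quoted for chess.
    for b, d in [(2, 10), (10, 10), (35, 10)]:
        print(f"b={b:>2}, d={d}: ~{b**d:,} nodes")
    # b= 2, d=10: ~1,024 nodes
    # b=10, d=10: ~10,000,000,000 nodes
    # b=35, d=10: ~2,758,547,353,515,625 nodes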

One area of symbolic AI that actually does benefit from parallel architectures
(though not GPUs) is logic programming with Prolog. Prolog's execution model
is basically a depth-first search, which lends itself naturally to
parallelisation (one branch per search). Even more so given that data in
Prolog is immutable (no mutable state, no concurrency headaches).
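
A minimal Python sketch of that OR-parallel idea (a hypothetical toy, not a
real Prolog engine): each top-level alternative of the depth-first search
runs on its own processor, which is safe because the fact base is immutable.

    # Toy OR-parallelism: explore top-level search branches one per
    # processor; the immutable fact "database" means no locking is needed.
    from concurrent.futures import ProcessPoolExecutor

    PARENTS = (("tom", "bob"), ("bob", "ann"), ("bob", "pat"))  # parent(X, Y)

    def descendants(person):
        """Depth-first search for everyone reachable via parent/2."""
        found = []
        for parent, child in PARENTS:             # try each matching clause
            if parent == person:
                found.append(child)
                found.extend(descendants(child))  # descend into this branch
        return found

    if __name__ == "__main__":
        goals = ["tom", "bob"]
        with ProcessPoolExecutor() as pool:       # one branch per processor
            for goal, result in zip(goals, pool.map(descendants, goals)):
                print(goal, "->", result)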

But, in general, anything people did 20 or 30 years ago with computers can be
done better today. Not just symbolic AI or neural networks. I mean, even
office work like printing a document is faster today and that doesn't even
depend on GPUs and parallel processors.

~~~
YeGoblynQueenne
>> (one branch per search)

Oops, sorry. Meant "one branch per processor".

------
_cs2017_
I'm very disappointed with the quality of discussion between Marcus and LeCun.

First Marcus responds to Bengio's interview with an arrogant and pretentious
"I told you so" tweet, instead of simply saying that he agreed with Bengio.

Then LeCun tweets a snarky and disrespectful response, instead of simply
saying that Bengio's comments are different from Marcus's earlier critique.

As a result of these tweets, I lost a huge amount of respect for both
scientists.

~~~
evrydayhustling
While I agree that nobody looks great here, LeCun's response deserves to be
read in context. Marcus' comment (and the linked post) isn't primarily about
Bengio; it's about how Marcus should be recognized as deep learning's earliest
and most important public detractor.

So, LeCun's response isn't about Marcus != Bengio; it's about the fact that
Marcus' critique fundamentally hasn't deserved a response or recognition this
whole time, because there's no constructive way to engage. That's a totally
fair point to make, though a guy in LeCun's position probably should have
said, "Gary, none of us have ever thought Deep Learning was (already) an
answer for everything. The primary work of this community is to make it
better."

~~~
woodandsteel
To repeat my question above: Marcus is saying that to get AGI, or at least get
a lot closer to it, we need to combine ML with symbolic manipulation. Do you
agree or disagree?

~~~
evrydayhustling
Not sure what you mean by above, but happy to answer. First, I think you're
referring most proximately to Marcus' paragraph including:

> And it’s where we should all be looking: gradient descent plus symbols, not
> gradient descent alone.

That's the most specifically defined proposal that I see in the article. It is
a reasonable mission statement to fire up your lab or community about. It's
also not something outside the ongoing deep learning discourse, and it's a
direction I personally am excited about. There has been great work recently by
DeepMind [1], for example, about using gradient descent to theorize about
symbolic relationships.

However, the statement above is also nowhere near operationalized enough to
say "agree / disagree" in any scientific sense. A specific model demonstrating
advantages of the marrying symbolic and gradient-based reasoning (c.f. the one
above) would open itself to productive discussion of its successes and
failures. People have asked for this (including in the tweetstorm), but most
weirdly Marcus seems to be responding that operationalizing a model or even a
success criterion is itself a waste of time! Quoting:

> I actually think benchmarks are to some degree the wrong way forward, and
> have said that for two decades. People need to take a step back, and reflect
> on where things stand, rather than rushing into the next bakeoff.

I couldn't disagree with this statement more, and it brings me to my biggest
personal axe to grind about the "is this how we get to AGI" query: AGI itself
hasn't been defined in a way that is compatible with empirical discourse.
Machine Learning as a term in many ways exists to abandon "AI"'s association
with AGI, because the community realized that focusing on success at specific,
objectively measurable tasks would move it forward more effectively.

When people focused on AGI say that ML in its current form won't get there, my
answer is: "It's almost tautological that we're going to need new methods to
get to something called AGI, but we can't even discuss progress on it until
you give a measurable definition".

I don't blame folks (including Marcus) for wanting to continue discussing AGI
as an abstraction - maybe they will be the ones to find a good operational
definition! But it's weird and unscientific to say people shouldn't continue
work on other directions in the meantime, or to say that a specific technique
is essential before either the technique or the end goal has been effectively
defined.

[1] [https://deepmind.com/blog/neural-approach-relational-reasoni...](https://deepmind.com/blog/neural-approach-relational-reasoning/)

~~~
woodandsteel
Thank you for your response.

------
twtw
I'm relatively unfamiliar with people who are actually arguing that deep
learning is all that is needed for AGI - Andrew Ng can say what he wants, but
do many of the researchers and engineers working with deep learning systems
actually think this way?

I tend to think of deep learning as adding additional sensory capabilities to
machines. In this sense, is a network misclassifying a school bus as a
snowplow significantly different from a GPS sensor generating garbage position
data due to multipath? All existing sensors are noisy and can get confused,
and yet no one says that GPS isn't useful because it can be so completely
wrong in urban canyons. Of course, no one says GPS is going to become sentient
- this brings us back to my first question above.

~~~
evrydayhustling
> do many of the researchers and engineers working with deep learning systems
> actually think this way?

No, this has been a straw-man argument, made repeatedly in popular press for
public recognition, for years. Even the NYT article to which Marcus responded
in 2012 [1] does not make the claims to which it responds! Here is the
sentence in Marcus' New Yorker post where it transitions to attacking a claim
nobody made (hint - look at where the quote ends):

> While the Times reports that “advances in an artificial intelligence
> technology that can recognize patterns offer the possibility of machines
> that perform human activities like seeing, listening and thinking,” deep
> learning takes us, at best, only a small step toward the creation of truly
> intelligent machines.

As a rule, the academic deep learning community avoids discussions and claims
about what constitutes a "truly intelligent machine", instead focusing on
results in specific applications which can be objectively measured.

I'm sure you can find both academics and public press that claim deep learning
is a panacea. But it's disingenuous punching up to continually imply that
leaders in the field are slow to acknowledge its limits.

[1] [https://www.nytimes.com/2012/11/24/science/scientists-see-ad...](https://www.nytimes.com/2012/11/24/science/scientists-see-advances-in-deep-learning-a-part-of-artificial-intelligence.html)

~~~
AndrewKemendo
Very eloquently said.

A trend in the DL/ML research community is the strong and vocal distancing
from discussing or approaching the problem of General Intelligence, or
anything that might look like sci-fi AI. You can see this in the aggressive
railing against silly robots like Sophia and others that have gotten press.

I think this is warranted and understandable, as the field is so scared of
another AI winter.

On the other hand, it leaves the field without a real progress vector,
especially given that, broadly speaking, ML research isn't scientific as such.
In other words, the vast majority of "research" is not about hypothesis
testing and using computing to test questions about intelligence; rather, it's
about demonstrating increments of improvement within very narrow computing
tasks.

I make that point not to diminish the work or results, as they've obviously
been astounding, but to point out that, unlike basic or social science
research, ML is not attempting to surface fundamental truths (theories, laws,
etc.) about some natural or emergent phenomenon; rather, it is continuously
benchmarking on narrow tasks - often without prior reference. Now that ILSVRC
is gone, I'm wondering what kind of benchmarking is going to emerge as the new
vector for the field.

------
mark_l_watson
Great read. I have 30 years combined experience in symbolic AI and neural
networks and my current day job is managing a deep learning team. I could not
agree more with: “I think it is far more likely that the two — deep learning
and symbol-manipulation — will co-exist, with deep learning handling many
aspects of perceptual classification, but symbol-manipulation playing a vital
role in reasoning about abstract knowledge.”

My personal-time projects mostly combine symbolic AI and deep learning, but I
am still trying to find a non-Python solution. I have tried Haskell TensorFlow
bindings, Armed Bear Common Lisp with DL4J, and exporting trained Keras models
to a Racket environment - all plausible hacking environments, but none feels
‘just right.’ If you are working on the same ideas please get in touch with
me.

~~~
huahaiy
Here is an attempt in Clojure
[https://m.youtube.com/watch?v=phA4bMjKvCY](https://m.youtube.com/watch?v=phA4bMjKvCY)

~~~
mark_l_watson
Thanks I am watching this right now.

EDIT: thank you so much, great talk! Even though the Juji system does not
directly link TensorFlow models into Clojure, the architecture of using Kafka
to manage low level black box model calls (they treat trained models as
functions as I do in my Racket hacks) is a good idea. I also like the
rule/template language they designed. Great work.

------
YeGoblynQueenne
It's interesting to note here that the early years of machine learning (like
the first couple of decades of it) were almost exclusively dedicated to
learning symbolic rules, primarily in propositional logic. So if you pick up
older machine learning papers, from around the '70s and '80s, you can see that
the "models" are almost always propositional logic rules [1].

To cut a very long story short, machine learning took off in part as an
attempt to automate knowledge acquisition for expert systems, the dominant AI
paradigm at the time. Somehow, for reasons that are not entirely clear to me,
the goal of learning the rules for a rule-based system was abandoned and since
then research has focused almost exclusively on just learning.

As a result of this, most machine learning work today does not consider what
one can do with a trained model. And yet, just having, say, an object
classifier, is not very useful on its own. For example, a robot car must be
able to take decisions based on the objects its machine vision algorithms
identify. Although there are learning techniques that consider this particular
problem, i.e. navigation, there is little work on general inference and
reasoning over the output of trained models.

And this is not a symbolic vs statistical thing. I work in symbolic machine
learning, and we are pretty rubbish at that, too. It's like we have lots of
little pieces of a puzzle all lying around and nobody is really trying to put
them together. Instead, we each carve out our little (or bigger) niche and pretend
nothing exists outside of it.

Why? It's a mystery to me. Gary Marcus is doing good by trying to shake up
the field a bit. We need to move on from our successes just as we move on from
our failures. And he's damn right about the how, too: symbolic AI is due for a
comeback.

____________________

[1] For instance, you can see that in the foundational theoretical text for
modern machine learning, Leslie Valiant's _A Theory of the Learnable_ where
the learning framework is described in terms of the propositional calculus,
with a concept represented as a predicate that recognises vectors of boolean
variables as members of itself or not.
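
As a toy instance of that framework, here is the classic elimination
algorithm for learning a monotone conjunction from positive examples (a
sketch in Python; the data is made up):

    # Learning a boolean conjunction: start with a conjunction over all
    # variables, then drop any variable that is 0 in a positive example.
    positives = [(1, 1, 0, 1), (1, 1, 1, 1), (1, 1, 0, 0)]  # made-up data
    n = 4
    hypothesis = set(range(n))          # indices still in the conjunction
    for x in positives:
        hypothesis -= {i for i in range(n) if x[i] == 0}
    print(hypothesis)                   # {0, 1}: the concept is x0 AND x1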

And of course, probably the most famous representative of propositional logic
learning is the class of algorithms we probably all know as decision trees;
they learn disjunctions of conjunctions - propositional logic rules.
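
To make that concrete, here is a minimal scikit-learn sketch (the iris data
and shallow depth are arbitrary, illustrative choices) showing a trained
decision tree read back as if-then rules:

    # A fitted decision tree printed as propositional if-then rules.
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    iris = load_iris()
    tree = DecisionTreeClassifier(max_depth=2, random_state=0)
    tree.fit(iris.data, iris.target)

    # Each root-to-leaf path is a conjunction of feature tests; the paths
    # that predict one class together form a disjunction - a rule.
    print(export_text(tree, feature_names=list(iris.feature_names)))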

~~~
eli_gottlieb
> Why? It's a mystery to me. Gary Marcus is doing good by trying to shake up
> the field a bit. We need to move on from our successes just as we move on
> from our failures. And he's damn right about the how, too: symbolic AI is
> due for a comeback.

How would we define "symbolic", though? I'm about to give a presentation on
some related matters next week, and I've been trying to think about how to
avoid just saying, as Marcus is perceived to do, "lol psychology critiques AI
so AI needs to use symbols."

For example, there are loads and loads of _symbolic AI_ techniques which don't
demonstrate basic features of "symbolic" human cognition such as causal
inference and productivity/compositionality, on top of symbolic techniques
having essentially no way to address the Frame Problem.

------
sytelus
This is your usual Gary Marcus, if you have been frequenting his social media
outlets. As far as I can tell, no one in the field is disputing his core
thesis that deep learning is not “AI”, and that there is a fundamental issue:
deep learning doesn’t do causal reasoning and is rather just a fancy way to do
function approximation. There have been plenty of papers on this, including
from Bengio, and there is now an entire new field of adversarial deep
learning, which is an academic way of making fun of how miserably it fails.

The root cause of said Twitter firestorm was Marcus’s accusation that the
world’s talent and money are moving en masse into deep learning research,
which is likely to turn out to be a dead end for achieving “real” AI. Even
worse, many uninformed decision makers think AI is already a done deal due to
all the media hype.

The counterpoint to his accusations was that deep learning is what works now,
and pretty well, for many real-world problems, so we shouldn’t be putting it
down. It is likely that deep learning might become an integral part of some
broader Artificial General Intelligence framework in the future, so there is
no harm in continuing to improve it. This would be especially important when
most of the accusers don’t actually have a realistic alternative with much to
claim.

~~~
YeGoblynQueenne
Well, the stifling of funding to symbolic AI is a real danger. The field just
keeps shrinking. And it's not just funding. Many people entering machine
learning today would simply not know a predicate if it leaped up and bit them
in the existential quantifiers. They also have no idea that the field they are
now joining en masse actually started out with learning in the propositional
calculus, nor do they understand why it then moved to statistics.

In short, there is a real risk that research funding will go to support
research that doesn't know its elbow from its knee, done by people who don't
understand what they are doing (because they don't understand what was done in
the past). That is not a rosy situation.

I get the counterpoint - I'm myself a bit annoyed by Marcus' insistence that
deep learning is not true AI. Very few people really expect to see "true AI"
in our lifetimes. A classifier that works well is probably the best we can do,
and it's very reasonable that there is such excitement about the fact that we
can do it. But if we spend too much effort developing classifiers, well, we'll
have great classifiers. And nothing else.

And then- what?

------
buboard
if this is the tweet referred to:

[https://twitter.com/tdietterich/status/948811917593780225](https://twitter.com/tdietterich/status/948811917593780225)

[https://cdn-images-1.medium.com/max/800/1*W5WhToR_WP4I74Lo4u...](https://cdn-images-1.medium.com/max/800/1*W5WhToR_WP4I74Lo4uJ-1Q.png)

LeCun didn't say Marcus is not allowed to criticize. He said that Marcus'
contributions are zero but his criticisms abound.

Also, some of the arguments are odd: it is not possible to show the limits of
deep learning because people don't know how to prove what those limits are. If
there were a "limits of the universal approximation theorem" paper^, then
people would use it to derive the limits of their DL systems. OTOH, he
proposes a number of possible implementations for symbols for which the
evidence is not there, and there is no theoretical reason that necessitates
their existence. Neither argument is really falsifiable.

^ Actually, there is at least one; however, it seems to require knowledge
about the smoothness of the approximated function.
[https://www.sciencedirect.com/science/article/pii/S0888613X0...](https://www.sciencedirect.com/science/article/pii/S0888613X03000215)

------
visarga
I think the missing part - symbolic reasoning - can be handled by graph neural
nets and attention mechanisms. Graph neural nets take as input a set of
objects and their relations and predict information about the objects, their
relations and the whole graph. They learn invariance in relation space, while
being permutation invariant in the object space. This kind of invariance is
what we need to improve on. We already have some promising attempts, such as
the GCN. That, plus creating environments for artificial agents: a child AI
needs a place for playing and experimentation, not a mere static dataset.
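
For a sense of the mechanics, here is a minimal numpy sketch of one Kipf &
Welling-style GCN layer (toy sizes, illustrative only):

    # One graph-convolution step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W).
    import numpy as np

    def gcn_layer(A, H, W):
        A_hat = A + np.eye(A.shape[0])        # adjacency with self-loops
        d = A_hat.sum(axis=1)                 # node degrees
        D_inv_sqrt = np.diag(d ** -0.5)       # symmetric normalisation
        return np.maximum(0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

    # Toy graph: 3 nodes, 2 input features, 4 output features. Relabeling
    # the nodes (rows of A and H) just permutes the output rows the same
    # way - the invariance over object order described above.
    A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
    H = np.random.randn(3, 2)
    W = np.random.randn(2, 4)
    print(gcn_layer(A, H, W).shape)           # (3, 4)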

------
khiner
My main takeaway: “How can deep learning be good if it’s not symbolic
reasoning, you know, like scientists have proven brains work? We think in
symbols, symbols have inherent meaning, why isn’t deep learning learning
symbols?! I’ve been saying exactly this for 20 years now! You know, all those
deep learning hot shots think they’re so cool, but have they anticipated
symbols like I have? Symbols won’t just allow us to play Atari games from raw
pixel inputs, they will give us general AI. Somebody please write a paper
showing how a symbol based algorithm of some kind is generally intelligent,
and also that any connectionist-leaning mechanism is orthogonal to complex
adaptive agent behaviors!”

------
woodandsteel
Here's my view on this controversy (from someone who is not involved in the
field).

Historically there seem to be three views. One is the classic symbolic
manipulation view, the second is what seems to be the ML view, and then there
is the hybrid view that Marcus advocates.

The first two views seem to share the belief that their single approach alone
would be sufficient for AGI. Marcus thus differs from both of them in that he
thinks neither is sufficient.

Assuming Marcus is taking the right approach, the question then becomes
whether the hybrid approach could reach AGI. In my opinion it could not,
though it probably would be able to out-think human beings in some important
ways.

------
Invictus0
I'm intrigued by the rotated schoolbus images. I wonder if the failure of the
computer is due to the method of presentation. The schoolbus images
fundamentally contain two different things: an object and a background. Humans
know this intuitively: we understand the rules of 3D reality and extrapolate
them unconsciously to all the objects therein. A computer is presented with just
one thing: an array of pixels. Can we really expect a neural network to
understand depth, scale, motion, and rotation when all it knows is two
dimensions?

~~~
mr_toad
> Can we really expect a neural network to understand depth, scale, motion,
> and rotation when all it knows is two dimensions?

Frozen images at that. I’d expect better results from systems trained on
videos of moving objects.

------
strin
It depends on how "deep learning" is defined. If it means neural networks,
the answer is likely no. But a broader definition of feature learning or
"end-to-end" learning makes sense. It's likely that for AGI what we need is a
good, shared representation across multiple tasks, and a continual learning
paradigm that keeps building up new skills.

------
bra-ket
>"And it’s where we should all be looking: gradient descent plus symbols"

how about no gradient descent at all

~~~
lkrubner
The problem is the exponential explosion of possibilities. Fairly amateur
Python code can work through a problem in 10 minutes which, without gradient
descent, would literally take a thousand years.
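
A toy Python sketch of that scaling gap (illustrative numbers; a simple
quadratic stands in for a real objective):

    # Gradient descent vs exhaustive grid search on f(x) = ||x - target||^2.
    import numpy as np

    d, k = 100, 10                   # dimensions; grid points per axis
    target = np.random.randn(d)
    x = np.zeros(d)
    for _ in range(100):             # 100 cheap vector updates
        x -= 0.1 * 2 * (x - target)  # step along the negative gradient
    print("GD error:", np.linalg.norm(x - target))  # effectively zero

    # A grid with k points per axis needs k**d evaluations - hopeless.
    print("grid evaluations:", f"{k**d:.0e}")       # 1e+100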

~~~
bra-ket
our brain is not doing brute force search

~~~
eli_gottlieb
Right, it's doing stochastic search through a Bayesian posterior distribution.
The trouble is that established inference techniques can't really handle very
general, "human-level" model classes in real time, computationally.

~~~
throwaway2048
We have no idea if that is true or not.

------
paraditedc
As someone with only a basic idea of deep learning, I want to ask this
question:

If deep learning is mimicking how our brains work, wouldn't it theoretically
be as powerful as our brains, in terms of the architecture and complexity of
the model?

~~~
mikehollinger
There’s no such thing as a free lunch. In the case of deep learning - we’ve
figured out as an industry how to apply a particular model of linear algebra
to solve really interesting problems. It does way better than you’d guess in
particular spaces. Hotdog or not hotdog is a perfect example of this. I
actually (for fun) pulled several thousand pictures of hotdogs from the MS
COCO dataset and trained a model to do that. It did well - but what I’d
actually created wasn’t a hotdog detector - I’d created a bun detector. I could
prove this by giving it an image of a sandwich and the model would confidently
score it as a hotdog.
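
A minimal sketch of how a classifier like that is typically built with
transfer learning (hypothetical directory layout and hyperparameters, not the
exact pipeline described above):

    # Binary hotdog/not-hotdog classifier on top of pretrained features.
    # Assumes a hypothetical layout: data/hotdog/*.jpg, data/not_hotdog/*.jpg
    import tensorflow as tf

    train_ds = tf.keras.utils.image_dataset_from_directory(
        "data", label_mode="binary", image_size=(224, 224), batch_size=32)

    base = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False, weights="imagenet")
    base.trainable = False                    # reuse pretrained features

    model = tf.keras.Sequential([
        tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1.0),  # to [-1, 1]
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # hotdog or not
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    model.fit(train_ds, epochs=3)

The bun-detector failure mode above is exactly what to probe for afterwards,
e.g. by scoring images of sandwiches and bunless sausages.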

We aren’t mimicking how our brains work in any broad sense. At best it’s a
very narrow definition.

In this case - the larger the matrices - the more you can do. There’ll
probably be a Moore’s law of AI at some point.

~~~
Baeocystin
Sounds like a real example of the apocryphal tank/shadow detector.

[https://www.gwern.net/Tanks](https://www.gwern.net/Tanks)

~~~
gwern
If you showed a human who'd never heard of hotdogs before thousands of photos
of hotdogs, and every single one of them had buns, why do you think they would
assume that a naked wiener would be the true definition of hotdog, rather than
'bun with meat/filling'?

~~~
Baeocystin
>and every single one of them had buns

Why would you assume that? I did a cursory GIS and Bing search, and somewhere
between 1-5% of the returned images were bunless wieners.

~~~
gwern
He didn't use GIS/Bing, or anything derived from them like WebVision. He used
MS COCO. And if it had really returned 5% naked wieners, the results he
described would be a lot less likely.

~~~
Baeocystin
[http://cocodataset.org/#explore](http://cocodataset.org/#explore)

Literally the first image returned for hot dogs is a bunch of bunless ones on
a grill. There are plenty more.

~~~
gwern
Then maybe he didn't train it as well as he thinks he did.

For comparison, I tried out Clarifai's image classification API:
[https://clarifai.com/demo](https://clarifai.com/demo) It has no problem
classifying some bunless ones on a grill ([https://www.hot-dog.org/sites/default/files/2016-09/sausage%...](https://www.hot-dog.org/sites/default/files/2016-09/sausage%20on%20grill.jpg))
as being sausage/pork/beef/hotdog, with nary a 'sandwich' to be seen, while
conversely, a Big Mac ([https://d1nqx6es26drid.cloudfront.net/app/uploads/2015/04/04...](https://d1nqx6es26drid.cloudfront.net/app/uploads/2015/04/04043402/product-big-mac.png))
gets sesame/burger/lettuce/sandwich/bun/bread tags with no
sausage/bratwurst/hotdog to be seen. Clarifai's NNs seem to do fine on 'hot
dog or not hot dog'...

