
What's Wrong with Deep Learning? - jessehu
https://docs.google.com/file/d/0BxKBnD5y2M8NVHRiVXBnOVpiYUk/edit
======
discardorama
If someone were to ask me what's wrong with DL (not that anyone would,
since I'm an unknown), I'd say the lack of theory. Most DL results look very
hacky to me. Someone says Max Pooling works; then someone else comes along and
says it's not necessary. Someone says sigmoid or tanh are the best activation
functions; someone else says ReLUs are better. And so on. Why? Why is one
better than the other?

I'm no biologist, but I don't think our brains are going around doing a
grid search for the best hyperparameters. Most DL results today are the result
of throwing thousands of Titans at the problem and then sitting back for a week
while the beast coughs up a solution.

Tangential nitpick: one (very minor) nit I have with Prof. LeCun's
presentations is that I don't see him give more credit to Hinton and
Schmidhuber. Hinton is mentioned a couple of times (3), but Schmidhuber is
totally ignored; for example, when he mentions LSTM, it's cited as [Hochreiter
1997], even though it was a joint publication with Schmidhuber. It should be
cited as [Hochreiter et al. 1997], as he does in the very next line.

~~~
Houshalter
Max pooling tests whether a feature occurs anywhere in a certain area, rather
than being sensitive to its exact location.

ReLUs fit combinations of piecewise-linear functions, whereas sigmoids are
more nonlinear and can be harder to optimize. Sigmoids were originally
continuous approximations of binary threshold functions.

All these things can approximate each other. Neurons can approximate the max
function, and ReLUs can approximate sigmoids. So there really isn't much to
fret over.

It's like asking for a theory of which programming language is better. In
practice they will have different advantages in different domains, but they
are all Turing complete.
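
To make that concrete, here's a minimal NumPy sketch (my own illustration, not
from the talk): max pooling reports whether a feature fired anywhere in a
window, so a small shift doesn't change the output, and a pair of shifted
ReLUs traces out a piecewise-linear approximation of a sigmoid.

    import numpy as np

    def max_pool_1d(x, window):
        # Non-overlapping 1-D max pooling.
        return x[:len(x) // window * window].reshape(-1, window).max(axis=1)

    def relu(x):
        return np.maximum(0.0, x)

    # The same feature fires at slightly different positions in a and b:
    a = np.array([0.0, 0.9, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0])
    b = np.array([0.0, 0.0, 0.9, 0.0, 0.0, 0.1, 0.0, 0.0])
    print(max_pool_1d(a, 4))  # [0.9 0.1]
    print(max_pool_1d(b, 4))  # [0.9 0.1] -- identical despite the shift

    # Two shifted ReLUs make a piecewise-linear "hard sigmoid":
    xs = np.linspace(-4.0, 4.0, 9)
    hard_sigmoid = 0.25 * (relu(xs + 2.0) - relu(xs - 2.0))  # 0 below -2, 1 above +2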

~~~
joe_the_user
_It's like asking for a theory of which programming language is better._

There's nothing wrong with ignoring programming-language theory and just
deciding on one, seat-of-the-pants style. But that's because programming as
it exists now is a static "art form" with only marginal progress expected.

However, if deep learning currently works unexplainably well and one aims to
explain scientifically why it works so well, one would want an explanation
that guides one's approach to extending the process.

I've done a bit of applied math, where knowing which kind of function to pull
out of one's toolbox for which situation was the really-smart-people's
purview, a fairly well-guarded piece of folk knowledge, actually. I'm used to
the "little bit of this, little bit of that" kind of explanation for which
functions to use when and why. If one weighs them long enough, I assume one
can intuitively figure out what to do.

But if we're aiming to advance fundamentally beyond the state of the art, we
would want to _quantify_ these advantages and disadvantages, to automate one
more layer. So we really should have a "real" theory here.

~~~
nightski
Do you think no one is trying? Should researchers just ignore all results
until the underlying theory is found? What if we don't find it for another 50
years? I find it incredibly hard to be critical in this situation.

------
jhartmann
It is always impressive to me how Professors LeCun, Bengio, and Hinton stuck
to their guns and worked on these problems while others were not so
interested. I'm very glad Hinton's group at DNN Research was able to blow away
the competition in the ImageNet challenge. Now many more people are working on
these ideas, and some really amazing things have been accomplished in a very
short time. I can't wait to see where we are in five years, and I love that
LeCun points out the areas where we should focus and what we are not yet
good at.

~~~
fizixer
Do you have any information about what the hot topics were just before this
new resurgence of NNs (around the late '90s and early 2000s)?

I know symbolic AI was big in the '60s and '80s, but I'm not sure about the
more recent past.

~~~
oergiR
Probabilistic models. Recent research often focuses on Bayesian models.

Probabilistic models have never really gone away. This presentation by LeCun
actually suggests embedding neural networks inside of various types of
probabilistic models: factor graphs and conditional random fields. This is,
for example, how speech recognition works: the output of a neural network is
fed into a probabilistic model (a hidden Markov model).
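
To make the hybrid recipe concrete, here's a rough sketch (my own, not from
the slides): per-frame posteriors from an acoustic network are divided by the
state priors to give scaled likelihoods, which are then decoded against the
HMM's transitions with Viterbi.

    import numpy as np

    def viterbi(log_obs, log_trans, log_init):
        # log_obs: (T, S) frame log-likelihoods; returns the best state path.
        T, S = log_obs.shape
        delta = log_init + log_obs[0]
        back = np.zeros((T, S), dtype=int)
        for t in range(1, T):
            scores = delta[:, None] + log_trans  # scores[i, j]: state i -> j
            back[t] = scores.argmax(axis=0)
            delta = scores.max(axis=0) + log_obs[t]
        path = [int(delta.argmax())]
        for t in range(T - 1, 0, -1):
            path.append(int(back[t][path[-1]]))
        return path[::-1]

    # Softmax outputs of some acoustic network, one row per frame:
    posteriors = np.array([[0.8, 0.2], [0.6, 0.4], [0.3, 0.7], [0.1, 0.9]])
    priors = posteriors.mean(axis=0)               # crude state priors
    log_obs = np.log(posteriors) - np.log(priors)  # "scaled likelihoods"
    log_trans = np.log([[0.9, 0.1], [0.1, 0.9]])   # sticky HMM transitions
    log_init = np.log([0.5, 0.5])
    print(viterbi(log_obs, log_trans, log_init))   # [0, 0, 1, 1]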

~~~
jhartmann
Actually, state-of-the-art speech recognition has switched over to having a
Recursive Neural Network run directly over the audio input. Take a look at the
paper at [http://arxiv.org/abs/1412.5567](http://arxiv.org/abs/1412.5567) and
[http://usa.baidu.com/deep-speech-lessons-from-deep-learning/](http://usa.baidu.com/deep-speech-lessons-from-deep-learning/)

However, combining learned features with other systems is a very powerful
approach, and putting SVMs on top of the learned features of a neural network
is, I would say, common. I personally am more interested in approaches
like Deep Fried Convnets
([http://arxiv.org/abs/1412.7149](http://arxiv.org/abs/1412.7149)) that
combine kernel methods into the neural networks themselves.
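
For anyone who hasn't seen the SVM-on-learned-features recipe, here is a
hedged sketch with scikit-learn; `extract_features` is a hypothetical stand-in
for a forward pass through your trained net that keeps the penultimate-layer
activations.

    import numpy as np
    from sklearn.svm import SVC

    np.random.seed(0)

    def extract_features(images, net=None):
        # Hypothetical stand-in: in practice, run the trained net and keep
        # the penultimate-layer activations instead of the softmax output.
        return np.array([img.ravel() for img in images])

    train_imgs = [np.random.rand(8, 8) for _ in range(20)]
    train_labels = np.random.randint(0, 2, size=20)
    test_imgs = [np.random.rand(8, 8) for _ in range(5)]

    svm = SVC(kernel="rbf")  # kernel method on top of the learned features
    svm.fit(extract_features(train_imgs), train_labels)
    print(svm.predict(extract_features(test_imgs)))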

~~~
agibsonccc
Not to nitpick, but I just want people to realize there are actually recursive
nets that rely on a parser to build the tree structure (this is the recursive
net that relies on backpropagation through structure). Then there is the
recurrent net (LSTMs, multimodal), which relies on backpropagation through
time.

Talking to some of the users of recursive nets, they will be renaming them to
tree RNNs, which should help clear up the confusion a bit.
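
A tiny sketch of the distinction (my own, with one shared combiner for
brevity): the recurrent net folds over a flat sequence, so gradients flow by
backpropagation through time, while the recursive net folds over a
parser-supplied tree, so gradients flow by backpropagation through structure.

    import numpy as np

    rng = np.random.default_rng(0)
    W, U = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))

    def combine(a, b):
        return np.tanh(W @ a + U @ b)

    def recurrent(xs):
        # Sequence model: h_t = f(h_{t-1}, x_t); trained with BPTT.
        h = np.zeros(4)
        for x in xs:
            h = combine(h, x)
        return h

    def recursive(tree):
        # Tree model: node = f(left, right); trained with BPTS.
        if isinstance(tree, tuple):
            left, right = tree
            return combine(recursive(left), recursive(right))
        return tree  # leaf = word vector

    words = [rng.normal(size=4) for _ in range(3)]
    print(recurrent(words))                             # left-to-right chain
    print(recursive(((words[0], words[1]), words[2])))  # parser-built tree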

------
Animats
Is that document available in some standard format? The player that's playing
it from Google Docs is buggy, and about 20% of the slides display an error
message.

"10:01:47.662 Cross-Origin Request Blocked: The Same Origin Policy disallows
reading the remote resource at
[https://drive.google.com/viewerng/img?id=ACFrOgBySwSrGvI-
XLL...](https://drive.google.com/viewerng/img?id=ACFrOgBySwSrGvI-
XLLt8yzMipqv7ffA6jN_fFpTRl88JGL9KzU39f4S0oroqq45kVAZ7staj8oYE6yPEsqdHD6S8r09_m-
ZRpYjNTUwNlILDrA2-H463PBLY9LZSb8=&w=2000&page=46). (Reason: CORS header
'Access-Control-Allow-Origin' missing).1 <unknown> "

~~~
ipsin
Look at the nav bar at the top of the page. The source file is (purportedly) a
PDF, and there's a download button.

------
sgt101
The final 20 or so slides, about building general AI using deep learning,
strike me as really interesting. Seymour Papert said that if you can fit
concepts into your cognitive architecture then they are learned. I think that
this part of the presentation speaks to a need to demonstrate this as
"learning" proper. It's strange, because I believe that this needs to happen,
but the idea that you would do it with an "all NN all the way down"
architecture, rather than breaking out into a symbolic layer à la SOAR, just
seems odd.

------
codewithcheese
Wow, what an incredible amount of knowledge in those slides. Is there a video
of the keynote?

~~~
sjtrny
CVPR usually makes all talks available when the conference has concluded.
Watch this page
[http://www.pamitc.org/cvpr15/](http://www.pamitc.org/cvpr15/).

------
chestervonwinch
I would like to see or hear more regarding the theory slides - in particular
on the objective being a piecewise polynomial, and the distribution of weights
using random matrix theory. Anyone know where I could find more?

~~~
selimthegrim
[http://arxiv.org/abs/1412.0233](http://arxiv.org/abs/1412.0233) (and the
references by Gérard Ben Arous that he cites within)

------
kragen
For some reason, most of these slides say, "¡Vaya! Hubo un problema para
cargar la página." ("Oops! There was a problem loading the page.") Is there a
better URL, maybe with the PDF itself?
([https://doc-04-4c-docs.googleusercontent.com/docs/securesc/h...](https://doc-04-4c-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/0ptf7tt7v8qkg9j40pi37h2cvhm14jn8/1434326400000/06392201561352539427/*/0BxKBnD5y2M8NVHRiVXBnOVpiYUk?e=download)
doesn't look like it's going to work reliably for other people.)

~~~
abecedarius
I downloaded the pdf; will mail you.

------
Animats
See slides 134-135. It's amazing that this works: the system gets induction
without any understanding at all, yet it still arrives at the right answer.

Intelligence may be dumber than we thought it was.

------
rlucente
I have attempted to put together the math stack for deep learning at

[http://rlucente.blogspot.com/2014/08/deep-learning-mathematical-stack.html](http://rlucente.blogspot.com/2014/08/deep-learning-mathematical-stack.html)

------
aswanson
Is there a volume summarizing the research papers, or any recommended book(s),
on DNNs?

