
Scientists See Promise in Deep-Learning Programs - mtgx
http://nytimes.com/2012/11/24/science/scientists-see-advances-in-deep-learning-a-part-of-artificial-intelligence.html
======
bravura
Since I see some misunderstanding about deep learning, let me explain the
fundamental idea: It's about reusing intermediate work.

The intuition: suppose I told you to write a complicated computer program, and
that you could use routines and subroutines, but not sub-subroutines or deeper
levels of abstraction. In this restricted case you _could_ still write any
computer program, but you would have to do a lot of code-copying. With
arbitrary levels of abstraction, you can reuse code much more elegantly, and
your program is more compact.

Here is a more formal description: a complicated non-linear function can be
described much like a circuit. If you restrict the depth of the circuit, you
can in principle still represent any function, but you need a really wide
(exponentially wide) circuit, and all those extra parameters invite
overfitting (Occam's razor). With a deep circuit, by comparison, you can
represent arbitrary functions _compactly_.
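
A standard example of this depth/size trade-off (my addition; it is the
canonical one from the circuit-complexity literature) is parity:

    \mathrm{PARITY}(x_1,\dots,x_n) = x_1 \oplus x_2 \oplus \cdots \oplus x_n

Any depth-2 formula (a DNF) for parity needs 2^(n-1) terms, because flipping
any single bit flips the output, so no term can cover more than one satisfying
assignment. A deep circuit arranged as a balanced tree of 2-input XOR gates
needs only n-1 gates and depth about log2(n). Hastad's results make this kind
of separation precise for constant-depth circuits.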

Standard SVMs and random forests can be shown, mathematically, to have a
limited number of layers (circuit depth).

It turns out that expressing deep models using neural networks is quite
convenient.

I gave an introduction to deep learning in 2009 that describes these
intuitions: <http://vimeo.com/7977427>

~~~
tgflynn
_If you restrict the depth of the circuit, you can in principle represent any
function, but you need a really wide (exponentially wide) circuit._

Are you sure it's exponential?

If you look at binary functions (i.e. boolean circuits), any such function can
be represented by a single-layer formula whose size is linear in the number of
gates of the original circuit (I think it's 3 or 4 variables per gate) by
converting to conjunctive normal form.

Of course it's not obvious that a similar scaling exists for non-binary
functions but I'd be a bit surprised if increasing depth led to an exponential
gain in representational efficiency.

~~~
bravura
I am not _sure_ in the sense of: If I were dropped on a desert island, I could
derive a water-tight proof of this result from scratch.

I am _confident_ , though, based upon my reading of secondary sources written
by people that I trust.

From one of Bengio's works
(<http://www.iro.umontreal.ca/~bengioy/papers/ftml.pdf>): "More interestingly,
there are functions computable with a polynomial-size logic gates circuit of
depth k that require exponential size when restricted to depth k − 1 (Hastad,
1986)."

~~~
tgflynn
I think my argument was mistaken. The CNF form I was thinking of involves
adding new auxiliary variables, so it doesn't actually let you compute the
function in one step.
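
(For the record, the transformation I had in mind is, I believe, the Tseitin
encoding: each gate gets a fresh auxiliary variable. For a gate z = x AND y the
clauses are

    z \leftrightarrow (x \wedge y) \;\equiv\; (\neg z \vee x) \wedge (\neg z \vee y) \wedge (\neg x \vee \neg y \vee z)

so the resulting CNF is equisatisfiable with the circuit, but it doesn't
evaluate the function in a single layer; the auxiliary variables still have to
be solved for.)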

------
claudiusd
I'm not an expert on this, but I think this article overstates the
relationship between "deep learning methods" and "neural networks". Neural
nets have been around forever and, in the feed-forward case, are actually
fairly basic statistical classifiers.

Deep learning, on the other hand, is about using layers of classifiers to
progressively recognize higher-order concepts. In computer vision, for
example, the first layer of classifiers may recognize things like edges,
blocks of color, and other simple features, while later layers may recognize
concepts like "arm", "desk", or "cat" built up from those lower-order ones.
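
Schematically, something like this (placeholder names, not a real library):

    # Each stage consumes the previous stage's representation and emits a
    # higher-level one; only the first stage ever sees raw pixels.
    def recognize(image, layers):
        representation = image
        for layer in layers:  # e.g. [detect_edges, detect_parts, detect_objects]
            representation = layer(representation)
        return representation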

There's a book I read a while ago that was super-interesting and digs into how
one researcher leveraged knowledge about how the human brain works to develop
one of these deep learning methods: "On Intelligence" by Jeff Hawkins
([http://www.amazon.com/On-Intelligence-Jeff-Hawkins/dp/B000GQ...](http://www.amazon.com/On-Intelligence-Jeff-Hawkins/dp/B000GQLCVE/))

~~~
wookietrader
No.

All currently used deep learning algorithms are special cases of neural
networks. The reason this is called "deep" learning is that before 2006, no
one knew how to efficiently train neural nets with more than 1 or 2 hidden
layers (or couldn't, for lack of computing power). Thanks to a breakthrough by
Dr. Hinton, that has changed.

But all the models used are neural nets. It's just that a vast number of new
algorithms for training them have been developed in the last few years, and
people have come up with new ideas on how to use them.

But it is all neural nets. And that's the whole beauty of it.

~~~
sjg007
What was the breakthrough?

~~~
wookietrader
The breakthrough was the insight that while you cannot train a deep neural net
all at once with backprop, you can train one layer after the other greedily
with an unsupervised objective and later fine-tune the whole thing with
standard backprop.

Years later, Swiss researchers (Dan Ciresan et al.) found that you can train
deep neural nets with plain backprop after all, but you need lots of training
time and lots of data. You can only achieve this by making use of GPUs;
otherwise it would take months.
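
Roughly, the greedy recipe looks like this in a numpy sketch (my own
simplification, not Hinton's code; it uses one step of contrastive divergence
per layer and keeps the mean-field probabilities everywhere, where a real
implementation would sample binary hidden states; data is a 2-D array of
values in [0, 1]):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def pretrain_rbm(data, n_hidden, epochs=10, lr=0.1, rng=np.random):
        n_visible = data.shape[1]
        W = 0.01 * rng.randn(n_visible, n_hidden)
        b_vis = np.zeros(n_visible)
        b_hid = np.zeros(n_hidden)
        for _ in range(epochs):
            # positive phase: hidden probabilities given the data
            h_pos = sigmoid(data @ W + b_hid)
            # negative phase: one reconstruction step (CD-1)
            v_neg = sigmoid(h_pos @ W.T + b_vis)
            h_neg = sigmoid(v_neg @ W + b_hid)
            W += lr * (data.T @ h_pos - v_neg.T @ h_neg) / len(data)
            b_vis += lr * (data - v_neg).mean(axis=0)
            b_hid += lr * (h_pos - h_neg).mean(axis=0)
        return W, b_hid

    def greedy_pretrain(data, layer_sizes):
        weights, activations = [], data
        for n_hidden in layer_sizes:
            W, b = pretrain_rbm(activations, n_hidden)
            weights.append((W, b))
            activations = sigmoid(activations @ W + b)  # input to next layer
        return weights  # afterwards, fine-tune the whole stack with backprop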

~~~
iskander
You can't train fully connected deep models with backprop, or at least not
easily or well. An alternative solution to this problem is spatial weight
pooling (Yann LeCun's convolutional networks), which plays well with SGD.
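
The weight-sharing idea in a toy numpy sketch (illustrative only; real
convnets stack multiple filter banks, pooling, and non-linearities):

    import numpy as np

    def conv2d_valid(image, kernel):
        # Slide one small filter over every position of the image; the same
        # weights are reused everywhere, so the layer has far fewer free
        # parameters than a fully connected one. (Strictly this is a
        # cross-correlation, which is what convolutional nets compute.)
        kh, kw = kernel.shape
        h, w = image.shape
        out = np.zeros((h - kh + 1, w - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out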

~~~
wookietrader
Yes you can.

Check out the publications by Ciresan on MNIST, have a look at Hinton's
dropout paper or at the Kaggle competition that used deep nets. Or try it
yourself and spend a decent amount of time on hyper-parameter tuning. :)

~~~
iskander
Which of Ciresan's projects are you referring to? Everything I've seen by him
uses convolutional layers of some sort.

------
theschwa
Geoffrey Hinton, mentioned in the article, has his class on neural networks
available on Coursera <https://www.coursera.org/course/neuralnets>

~~~
djacobs
Hinton was one of the people who invented backpropagation, which has let
neural nets be as powerful as they are today. Somehow, despite his brilliance
and intimate familiarity with backpropagation, his explanation of it is
stunningly clear and simple. I'm thoroughly enjoying this course and recommend
it to anyone who wants to build their own neural networks.

~~~
T-A
"If you can't explain something simply, you don't know enough about it. You do
not really understand something unless you can explain it to your
grandmother." - Some German dude :)

~~~
hcrisp
C.S. Lewis? "Any fool can write learned language: the vernacular is the real
test. If you can't turn your faith into it, then either you don't understand
it or you don't believe it."

------
dave_sullivan
For those looking to learn about these techniques, I'd highly recommend the
deep learning Theano tutorials.

Hinton has a class on Coursera--I think it would be very confusing for
beginners, but it has really great material.

Also, I run the "SF Neural Network Aficionados" meetup in San Francisco and
will be giving a workshop in January about building your own DBN in Python, so
feel free to check that out if you're in SF (although space was an issue last
time).

~~~
stewie2
How is "deep learning" different from "neural network"?

~~~
Yoshua
The idea of having multiple levels of representation (deep learning) goes
beyond neural networks. A good example is the recent work (award-winning at
NIPS 2012) on sum-product networks, which are graphical models whose partition
function is tractable by construction.

Several important things have been added since 2006 (when deep learning was
deemed to begin) to the previous wave of neural networks research, in
particular: powerful unsupervised learning algorithms (which allow very
successful semi-supervised and transfer learning - 2 competitions won in
2011), often incorporating advanced probabilistic models with latent
variables; a better understanding (although much more remains to be done) of
the optimization difficulty of training gradient-based systems through many
composed non-linearities; and other improvements to regularize better (such as
the recent dropouts) and to rationally and efficiently select hyper-parameters
(random sampling and Bayesian optimization).

It is also true that sheer improvements in computing power and amounts of
training data are in part responsible for the impressively good results
recently obtained in speech recognition (see the recent New York Times
article, Nov. 24, by J. Markoff) and object recognition (see the NIPS 2012
paper by Krizhevsky et al.).
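
On the hyper-parameter point, the random-sampling approach is almost trivial
to implement; a generic sketch (train_and_score is a placeholder for whatever
model and validation metric you use, and the parameter ranges are just
examples):

    import random

    def random_search(train_and_score, n_trials=50, seed=0):
        rng = random.Random(seed)
        best_score, best_params = float("-inf"), None
        for _ in range(n_trials):
            # Draw each hyper-parameter independently from a sensible prior.
            params = {
                "learning_rate": 10 ** rng.uniform(-4, -1),   # log-uniform
                "n_hidden": rng.choice([256, 512, 1024]),
                "dropout": rng.uniform(0.0, 0.5),
            }
            score = train_and_score(**params)                 # validation score
            if score > best_score:
                best_score, best_params = score, params
        return best_score, best_params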

------
gdahl
I was involved in the speech recognition work mentioned in the article and I
led the team that won the Merck contest if anyone has any questions about
those things. I also spend some time answering any machine learning question I
feel qualified to answer at metaoptimize.com/qa

~~~
lbenes
Congratulations on winning the Merck contest! That was an impressive
demonstration.

About 12 years ago, I switched from a Bio major to CS. I hoped to major in AI,
but after taking 2 upper level classes, one focusing on symbolic AI and the
other focusing on Bayesian networks, I was completely turned off.

Our brains are massively parallel, redundant systems that have practically
nothing in common with modern von Neumann CPUs. It seemed the only logical
approach to AI was to study neurons, then try to discover the basic functional
units they form in simple biological life forms like insects or worms, and
keep reverse-engineering the brains of higher and higher life forms until we
reach human-level AI.

Whenever I tried to relate my AI course material to what was actually going on
in a brain, my profs met my questions with disdain and disinterest. I learned
more about neurons in my high school AP Bio class than in either of my AI
classes. In their defense, we've come a long way since then, with new tools
like MRIs and neural probes.

The answers are all locked up in our heads. It took nature millions of years
of natural selection to engineer our brains. If we want to crack this puzzle
in our lifetimes, we need to copy nature, not reinvent it from scratch. Purely
mathematical theories like Bayesian statistics that have no basis in
biological systems might work in specific cases, but they are not going to
give us strong AI.

Are these new deep learning algorithms for neural networks rooted in
biological research? Do we have the necessary tools yet to start
reverse-engineering the basic functional units of the brain?

~~~
fuelfive
We think so (<http://vicarious.com/>), but we are obviously biased.

------
taliesinb
I played around for a while with writing an RBM learner in Go (RBMs are a
particular instance of deep learning which Hinton specializes in).

More an experiment than anything else, but for anyone who is interested:
<https://github.com/taliesinb/gorbm>. I don't claim there aren't bugs, and
there is no documentation.

The consensus I've picked up from AI-specializing friends is that there are a
lot of subtle gotchas and tricks (which Hinton and friends know about but
don't necessarily advertise) without which RBMs are a non-starter for many
problems. Which I suppose is pretty much standard for esoteric machine
learning.

------
jhartmann
Deep belief networks are extremely powerful; we are finally getting to the
point where we don't need to do tons of feature engineering to make useful,
complex classifiers. It used to be that you had to spend a ton of time on data
analysis and feature extraction to get useful and robust classifiers, and of
course the usefulness of those networks was limited by how well you did the
feature extraction. Now you can train networks on much more minimally
processed data and get great results out of them.

------
mbq
Since the fall of AI, there have been two groups of people in this field: one
trying to produce reproducible, robust results with well-defined algorithms,
and a second importing random ideas from the first group onto some
questionably defined ANN model and getting all the hype because of the
"neural" buzzword. "Deep learning" is actually called boosting and has been
around for years.

~~~
robrenaud
Unsupervised pre-training is fundamentally different from boosting.

Boosting is a clever way of modelling a conditional distribution. The insight
behind the success of pre-training is that, for many perceptual tasks, having
a good model of the input (rather than the input->output mapping) is key.

I have no delusion that the algorithms that work for training deep networks
are anything like what the brain actually does, but I don't care. There are
many tasks where deep neural nets are state of the art.
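
In symbols (my shorthand, e.g. boosting with a logistic link):

    \text{boosting: model } p(y \mid x) \text{ directly, via } F(x) = \sum_m \alpha_m h_m(x)
    \text{pre-training: model } p(x) \text{ first, then reuse its representation when fitting } p(y \mid x)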

~~~
osdf
Not to argue with you, robrenaud, but Hinton himself writes in the 2006 paper
'A Fast Learning Algorithm for Deep Belief Nets':

 _The greedy algorithm bears some resemblance to boosting in its repeated use
of the same “weak” learner, but instead of reweighting each data vector to
ensure that the next step learns something new, it re-represents it._

I guess that most people, however, would not think of this interpretation of
greedily pretraining deep networks :). (I wonder if mbq had this in mind.)

In the same article, your point about good models of the input is mentioned
too (I'll only copy & paste a small part of the paragraph):

 _Unsupervised methods, however, can use very large unlabeled data sets, and
each case may be very high-dimensional, thus providing many bits of constraint
on a generative model._

The 2006 paper is really an amazing read in my opinion.

------
mturmon
Pegged to the NIPS conference next week: <http://nips.cc/Conferences/2012/>

------
Create
_The students were also working with a relatively small set of data;_

ANNs are overfitted more often than not.

------
radarsat1
Are there any good C++ or Python SciPy libraries for building and training
deep learning networks?

~~~
jhartmann
There is a C++/CUDA library with a Python frontend, from one of the people who
work with Hinton, that I am starting to play with. It is written by Alex
Krizhevsky and has lots of tools for training feed-forward networks with lots
of different connection topologies and neuron types. If I am not mistaken,
this was the library used in the recent Kaggle drug competition referenced in
the article. There is some good starting-point documentation there as well; as
long as you know enough about the mechanics of artificial neural networks, it
has some really interesting stuff in there.

Here is the link: <http://code.google.com/p/cuda-convnet/>

------
teeja
Is there a good place to plug in to get an overview of what has been and is
going on in this area, without having to dive in all the way? An overview of
the concepts, not the nuts and bolts, not the heavy lifting.

~~~
pilooch
The one overview I've found the most useful is
<http://www.youtube.com/watch?v=ZmNOAtZIgIk> (Bay Area Vision Meeting:
Unsupervised Feature Learning and Deep Learning, by Andrew Ng, April 2011).

------
mikecane
Can someone contrast what's in that article with what Jeff Hawkins' Numenta is
attempting?

