

A Brief Overview of Deep Learning - ozansener
http://yyue.blogspot.com/2015/01/a-brief-overview-of-deep-learning.html

======
amelius
I found this to be insightful:

> ... human neurons are slow yet humans can perform lots of complicated tasks
> in a fraction of a second. More specifically, it is well-known that a human
> neuron fires no more than 100 times per second. This means that, if a human
> can solve a problem in 0.1 seconds, then our neurons have enough time to
> fire only 10 times --- definitely not much more than that. It therefore
> follows that a large neural network with 10 layers can do anything a human
> can in 0.1 seconds.

~~~
svantana
This seems wrong. My understanding is that the ~200 spikes/second limit
derives from the cell needing to "reload" (its refractory period) before
firing again, rather than from some built-in latency. Relaying a spike can be
very quick indeed. A better conclusion would be that we don't have time for
many recursions in that short a time. Also, I can't think of any particularly
hard problem that humans can solve in 0.1 seconds (see e.g.
[http://en.wikipedia.org/wiki/Hick%27s_law](http://en.wikipedia.org/wiki/Hick%27s_law))

~~~
bnegreve
> Also, I can't think of any particularly hard problem that humans can solve
> in 0.1 seconds

Recognizing someone is a hard problem that most humans can solve efficiently
(except me, maybe).

------
anjc
Good article, but I still don't understand why they're suddenly popular again.
So processing is faster, but is there some specific development in processing
that has improved this domain in particular? Developments in "big data"
processing? Concepts like MapReduce? What's with the resurgence :S

~~~
discardorama
There are myriad reasons why they're popular again.

1. Hardware has caught up, and is cheap. When Backprop was invented back in
the 80s, you couldn't train networks with more than a couple of thousand
nodes, tops. Today, with GPUs, you can train networks with billions of
parameters.

2. More data is available. Back in those days, you had a few dozen (maybe a
few hundred) examples in your training set. Today, people play with sets
larger than 1 TB.

3. Dramatic successes. For a while, the ImageNet competition was seeing slow
and steady progress. Then DL came along, and there was a 20% jump in
performance (I'm too lazy to look up the exact numbers...). If you've ever
competed in such competitions, you know progress is painfully slow (see, for
example, the Netflix competition). So a jump of that magnitude in performance
in one step is mind-blowing. On top of that, every year since then, the
performance has increased significantly.

These are just 3 that come to mind.

------
dharma1
I've been playing with Caffe for recognising images. It's kind of mind-blowing
how well it works. Yet the networks I tested could "only" recognise photos,
not drawings or anything abstract.

A human could easily attribute meaning to a drawing, even if the drawing was
very abstract or she had never seen a similar drawing before. Deep networks,
by contrast, seem to rely on visual similarity, at the pixel level, to things
they have seen in the past. The networks I tried could tell something was a
cartoon, but not what the cartoon depicted, even if it was something simple
like a face.

The deep networks I tried also really struggled with recognising different
textures - close-ups of sand, water, etc., things that a human would
instantly recognise. They could classify an image as a texture, but not say
what kind of texture.

~~~
Houshalter
NNs have been able to represent fairly abstract art, e.g.
[http://i.imgur.com/HU66Vo7.png?1](http://i.imgur.com/HU66Vo7.png?1)

They can also generate abstract images when the images are optimized to be
recognized by the NN:
[http://i.imgur.com/Mixk96V.png?1](http://i.imgur.com/Mixk96V.png?1)
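
That second trick is usually gradient ascent on the pixels themselves. A
minimal sketch (the model choice and the class index are my own illustrative
assumptions, not from the post):

    import torch
    import torchvision.models as models

    # Freeze a pretrained classifier; we optimize the input, not the weights.
    model = models.squeezenet1_1(weights="DEFAULT").eval()
    for p in model.parameters():
        p.requires_grad_(False)

    # Start from a blank image and do gradient ascent on one class logit
    # (class 130, "flamingo" in ImageNet; the choice is arbitrary here).
    img = torch.zeros(1, 3, 224, 224, requires_grad=True)
    opt = torch.optim.Adam([img], lr=0.05)

    for step in range(200):
        opt.zero_grad()
        loss = -model(img)[0, 130]   # negate the score so a step ascends it
        loss.backward()
        opt.step()

The resulting image tends to look abstract to a human while being classified
with high confidence, which is the effect in the second link.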

I think it's likely that cartoons contain a lot of meaning and symbols that
are specific to human culture. Take a stick figure as the simplest case: it's
not obvious that a circle and some sticks should be a person. The same goes
for a lot of other cartoon features that look nothing like reality.

------
joyofdata
> therefore follows that a large neural network with 10 layers can do anything
> a human can in 0.1 seconds.

Very funny ... as if ANNs were sufficiently comparable to actual neural
activity. I also think it is naive to assess how "powerful" the brain is by
what goes on in a single neuron - it is surely the parallel interaction that
creates human intelligence.

> And if human neurons turn out to be noisy (for example), which m...

It is pretty naive to treat noise as something that can only be a handicap -
a lot of algorithms are as powerful as they are precisely because they exploit
noise and stochasticity.

> What is learning? Learning is the problem of finding a setting of the neural
> network’s weights that achieves the best possible results on our training
> data.

Wrong - this is memorizing ... learning is the process that leads to a low
_out-of-sample_ error.
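
To make the distinction concrete, here is a toy sketch (mine, not the
article's): a model with enough capacity to memorize the training set gets a
near-zero in-sample error while the out-of-sample error stays large.

    import numpy as np

    rng = np.random.default_rng(0)
    x_train = rng.uniform(-1, 1, 20)
    y_train = np.sin(3 * x_train) + rng.normal(0, 0.2, 20)   # noisy target

    # A degree-15 polynomial has almost as many parameters as there are
    # training points, so it can nearly memorize them.
    coeffs = np.polyfit(x_train, y_train, 15)

    x_test = rng.uniform(-1, 1, 1000)
    y_test = np.sin(3 * x_test) + rng.normal(0, 0.2, 1000)

    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(train_mse, test_mse)  # tiny in-sample error, much larger out-of-sample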

~~~
lars
This post is written by Ilya Sutskever, who has co-authored some of the
biggest breakthroughs in machine learning of the last five years. Which do you
think is more likely: a) that he has a naive understanding of machine learning
and neuroscience, or b) that this was written informally, without guarding
against every possible way it could be misinterpreted? Please be a little
charitable when interpreting other people's writings.

~~~
joyofdata
Just curious - can you give an example of a big breakthrough he co-authored?

Nonetheless, some of his remarks are very specific, and I don't see how an
informal style excuses them.

~~~
lars
He was second author on the AlexNet paper [1], in which Alex Krizhevsky,
Sutskever and Hinton blew everyone else out of the water in the ImageNet
competition [2]. Their error rate was about 10 percentage points lower than
everyone else's; relatively speaking, they made about 40% fewer errors than
anyone else. This is possibly the biggest result in computer vision of the
last five years. So it seems a little silly to educate him on the basics of
machine learning :)

[1] [http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf](http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf)

[2] [http://www.image-net.org/challenges/LSVRC/2012/results.html](http://www.image-net.org/challenges/LSVRC/2012/results.html)

~~~
joyofdata
Well, thanks for the info - but then I shift my critique: I find it
unnecessary to distort ML and biological concepts just to simplify the
subject, when an accurate depiction wouldn't be much more difficult. In
particular, failing to differentiate properly between memorization and
generalization/learning is odd, because this is one of the most prominent
mistakes - the goal is specifically not to minimize the in-sample error! That
would lead to very bad results most of the time.

~~~
p1esk
Actually, Ilya explains his statement regarding minimizing training errors in
his comment exchange with Bengio:

"Although I didn't define it in the article, generalization (to me) means that
the gap between the training and the test error is small. So for example, a
very bad model that has similar training and test errors does not overfit, and
hence generalizes, according to the way I use these concepts. It follows that
generalization is easy to achieve whenever the capacity of the model (as
measured by the number of parameters or its VC-dimension) is limited --- we
merely need to use more training cases than the model has parameters / VC
dimension. Thus, the difficult part is to get a low training error."
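
Under that definition, a quick numerical illustration (my sketch, not Ilya's):
with far more training cases than parameters, the train/test gap is small even
when both errors are mediocre.

    import numpy as np

    rng = np.random.default_rng(1)
    x_train = rng.uniform(-1, 1, 1000)
    y_train = np.sin(3 * x_train) + rng.normal(0, 0.2, 1000)
    x_test = rng.uniform(-1, 1, 1000)
    y_test = np.sin(3 * x_test) + rng.normal(0, 0.2, 1000)

    # A straight line: 2 parameters vs. 1000 training cases. In the sense
    # above it "generalizes" (tiny train/test gap) while fitting poorly.
    coeffs = np.polyfit(x_train, y_train, 1)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(train_mse, test_mse)  # nearly equal; the hard part is making both small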

------
sadkingbilly
"so I implemented a small neural network and trained it to sort 10 6-bit
numbers, which was easy to do to my surprise"

Does anyone know what the inputs and outputs of a neural network that sorts
numbers would look like?

~~~
fchollet
Input: a 60-dimensional vector that is the concatenation of 10 6-dimensional
binary vectors encoding the binary representation of the input numbers.

Output: the same, sorted.

At least that's one dead simple way to formulate the problem; multiple other
solutions would work as well, and some would probably work better.
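
A sketch of that encoding (illustrative only; the helper names are mine):

    import numpy as np

    def encode(nums):
        # 10 numbers in [0, 63] -> 60-dim vector: 6 bits each, MSB first
        bits = [(n >> i) & 1 for n in nums for i in range(5, -1, -1)]
        return np.array(bits, dtype=np.float32)

    def decode(vec):
        # 60-dim vector -> 10 numbers, thresholding each unit at 0.5
        rows = (vec.reshape(10, 6) > 0.5).astype(int)
        return [int("".join(map(str, row)), 2) for row in rows]

    nums = [37, 5, 60, 12, 0, 63, 7, 22, 41, 19]
    x = encode(nums)            # network input
    y = encode(sorted(nums))    # training target: same encoding, sorted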

~~~
primaryobjects
I started playing with this. I take each digit and normalize it into (0, 1]
(the example below maps digit d to (d+1)/10). Then I use each normalized digit
as an input:

Example: sorting 987654 and 123456

    Input:           1, .9, .8, .7, .6, .5, .2, .3, .4, .5, .6, .7
    Expected output: .2, .3, .4, .5, .6, .7, 1, .9, .8, .7, .6, .5

You can then encode/decode the inputs and outputs accordingly: if (value <= 1)
digit = 9; if (value <= 0.9) digit = 8; ... if (value <= 0.2) digit = 1; if
(value <= 0.1) digit = 0; etc. Since the ifs cascade, the last matching
threshold wins.

I'm able to get 100% accuracy on a limited training set with 2 hidden layers
of 10 nodes, but only 33% accuracy on the test set (I likely need a lot more
data to train with).
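
In runnable form (a quick sketch; the (d+1)/10 mapping is inferred from the
example above):

    def encode_digit(d):
        # digit 0..9 -> value in (0, 1]
        return (d + 1) / 10.0

    def decode_value(value):
        # Scan thresholds from the top down; the last true condition wins,
        # mirroring the cascading ifs above.
        digit = 9
        for d in range(8, -1, -1):
            if value <= (d + 1) / 10.0 + 1e-9:
                digit = d
        return digit

    # Round-trip check over all digits:
    assert [decode_value(encode_digit(d)) for d in range(10)] == list(range(10))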

~~~
primaryobjects
Update: I was able to train the network to sort pairs of 3-digit numbers,
using a neural network with 2 hidden layers of 25 nodes. The training/test
accuracy after 10 minutes is 78%/74%. Not bad.
[https://github.com/primaryobjects/nnsorting](https://github.com/primaryobjects/nnsorting)

------
elliptic
Has anyone had experience training deep nets on domains where the examples are
large heterogeneous collections (as opposed to speech, text, or images), like,
say, transactional or click-stream data?

------
fchollet
This post, while very interesting, attempts to draw a completely unwarranted
parallel between deep nets and the human brain, as if layers of artificial
neurons running on a GPU and the cortical layers of your brain were two
interchangeable things.

So far, there has been no evidence that the brain works anything like an
artificial neural network. Maybe it does, and there are several theories in
that direction, but at the moment we have no solid reason to think so.

~~~
lscharen
The point of drawing comparisons to the human brain is that we know both how
quickly humans can perform visual recognition tasks and the speed of signal
propagation between neurons. Combining these two facts implies that the human
brain is able to solve these tasks without feedback, i.e. with no loops. Thus,
a DNN should be able to perform similar tasks if it can be trained (which it
can).

Recurrent neural nets add feedback and are a whole different kettle of fish.
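
To make the distinction explicit, a toy sketch (mine; the sizes and weights
are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=8)                  # a fixed input "stimulus"
    W_in = rng.normal(size=(16, 8))
    W_out = rng.normal(size=(4, 16))

    # Feedforward: a fixed number of sequential stages, in the spirit of
    # the "only ~10 firings in 0.1 s" depth argument above.
    h = np.tanh(W_in @ x)
    y_ff = W_out @ h

    # Recurrent: feedback re-applies the same weights over time, so the
    # effective depth grows with the number of time steps.
    W_rec = rng.normal(size=(16, 16))
    h = np.zeros(16)
    for t in range(10):
        h = np.tanh(W_in @ x + W_rec @ h)
    y_rnn = W_out @ h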

~~~
LoSboccacc
The brain also appears to have dedicated network structures that are not
trained but constrained, i.e. programmed to perform one fixed transformation -
say, the equivalent of feeding both an image and its edge-enhanced version to
the same DNN.
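
Something like this, roughly (a toy sketch of the idea; the Sobel filter
stands in for whatever fixed transformation might be hard-wired):

    import numpy as np
    from scipy import ndimage

    def with_edge_channel(img):
        # img: 2-D grayscale array. Returns the raw image stacked with a
        # fixed (untrained) edge-enhanced version as a second channel.
        edges = np.hypot(ndimage.sobel(img, axis=0),
                         ndimage.sobel(img, axis=1))
        return np.stack([img, edges])   # shape (2, H, W), fed in together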

The current approach of feeding raw bitmaps to a DNN falls short of that, and
is very sensitive to the training data [1].

I remember an old paper, which I cannot find now, about how to normalize
images for NN processing in face recognition. The software extracted the face,
centered it on a square, and projected that square onto a circle around the
center, to make face orientation irrelevant (hard to explain without images).

Anyway, it is unfair to expect a DNN to perform visual recognition tasks from
raw two-dimensional image points.

[1] [http://www.i-programmer.info/news/105-artificial-intelligence/8064-the-deep-flaw-in-all-neural-networks.html](http://www.i-programmer.info/news/105-artificial-intelligence/8064-the-deep-flaw-in-all-neural-networks.html)

------
rdtsc
Whatever happened to shallow learning (or, you know, the regular learning
everyone did before deep learning)?

Anyone still doing that?

Is this like Big Data? As soon as someone mentioned Big Data, anyone in the
world who touched data all of a sudden did Big Data.

So is this something coming out of Google and Facebook and such, while
everyone else in academia happily builds SVMs and 2-layer neural networks? Or
did some new discovery happen and turn the whole ML and AI field on its head?

> Crucially, the number of units required to solve these problems is far from
> exponential --- on the contrary, the number of units required is often so
> “small” that it is even possible, using current hardware,

The number of units is not what's important. There are "only" what, 10B
(100B?) neurons in the brain? But isn't the trick in the connections? There
are orders of magnitude more connections (hundreds of trillions). Not
exponential, but even quadratic at those numbers is still quite large.
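
Rough numbers (a back-of-the-envelope sketch; the counts are commonly cited
estimates, not figures from the article):

    neurons = 8.6e10           # ~86 billion neurons, a common estimate
    synapses = 1.0e14          # ~100 trillion connections
    print(synapses / neurons)  # ~1,200 connections per neuron on average
    print(neurons ** 2)        # ~7.4e21: full quadratic connectivity would
                               # be ~100 million times what actually exists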

~~~
nl
_Whatever happened to shallow learning (or you know the regular learning)
everyone did before deep learning._

Deep learning happened, and it pretty much always beats other approaches.
That sounds unbelievable, so here's a quote from Pete Warden:

 _I know I’m a broken record on deep learning, but almost everywhere it’s
being applied it’s doing better than techniques that people have been
developing for decades_ [1]

There's a great paper from a group of researchers who set out to prove that
their technique, which they had many years of experience with (SVMs?), was
just as good as deep learning (I can't remember their field). They ended up
proving the opposite, and switched their whole lab over to deep learning. I
can't find the paper (!!), so I'll refer you to [2] instead.

[1] [http://petewarden.com/2015/01/01/five-short-links-76/](http://petewarden.com/2015/01/01/five-short-links-76/)

[2] [http://petewarden.com/2014/06/10/why-is-everyone-so-excited-about-deep-learning/](http://petewarden.com/2014/06/10/why-is-everyone-so-excited-about-deep-learning/)

