
Resurgence of Neural Networks - marmalade
http://tjake.github.com/blog/2013/02/18/resurgence-in-artificial-intelligence/
======
moron4hire
Really interesting stuff.

I had once attempted to build a genetic algorithm for manipulating the synapse
weights, specifically because of the problems of traditional back-propagation
falling into local minima (unfortunately, some serious shit at work made it
fall by the wayside). This RBM approach sounds better than back-propagation,
but it also sounds like it would be prone to runaway feedback.
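
To give the flavor of what I had in mind, a toy sketch (the fitness function,
crossover scheme, and rates here are all made up for illustration):

    import random

    def fitness(weights, data):
        # negative mean squared error of a single linear neuron on
        # (inputs, target) pairs; higher is better for selection
        err = sum((sum(w * x for w, x in zip(weights, xs)) - t) ** 2
                  for xs, t in data)
        return -err

    def evolve(data, n_weights, pop_size=50, generations=200):
        pop = [[random.uniform(-1, 1) for _ in range(n_weights)]
               for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=lambda w: fitness(w, data), reverse=True)
            parents = pop[:pop_size // 2]          # keep the fittest half
            children = []
            while len(parents) + len(children) < pop_size:
                a, b = random.sample(parents, 2)
                cut = random.randrange(1, n_weights)
                child = a[:cut] + b[cut:]          # one-point crossover
                if random.random() < 0.1:          # occasional mutation
                    child[random.randrange(n_weights)] += random.gauss(0, 0.2)
                children.append(child)
            pop = parents + children
        return pop[0]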

One of the performance problems with neural networks is that the number of
cores on a typical machine is far smaller than the number of input and
intermediate nodes in the network. The output nodes are less of a concern as
you're trying to distill a lot of data down to a little data, but there is no
reason to treat them differently. There are (very few) examples of NNs on
GPUs, so that helps, but I've recently been curious to try a different, more
hardware-driven approach, just because one could.

Texas Instruments has a cheap microcontroller that you'ns are probably
familiar with called the MSP430. It's pretty easy to use, the tool chain is
free and fairly easy to set up (especially for a bunch of professional
software devs like us, right? _right?_ Well, there's an Arduino-like tool now,
too, if not), it costs around 10 cents in bulk for the simplest version,
requires very few external parts to run (something like a power source, 1 cap,
and two resistors), and it has a couple of serial communication protocols
built in. I'm quite fond of the chip; I've used it to build a number of
digital synthesizers and toys.

For about $50 and quite a bit of soldering time, you could build a grid of 100
of these, each running at 16 MHz, and I bet with a clever design you could
make them self-programmable, i.e. propagate the program image over the grid.
Load up a simple neural network program, maybe even have each chip simulate
more than one node, and interface it with a PC to pump data in one end and
draw it out the other. It might not be more useful than the GPGPU approach,
but having physical hardware to play with, and visualizing node activity
through other hardware, would be a lot of fun.
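
A host-side sketch of the topology (pure Python; the Chip class and framing
are invented for illustration, and the serial protocol and timing are ignored
entirely):

    import math, random

    class Chip:
        """Stand-in for one MCU simulating a single neuron."""
        def __init__(self, n_inputs):
            self.weights = [random.uniform(-1, 1) for _ in range(n_inputs)]

        def step(self, inputs):
            # what each chip would do on receiving a frame from the
            # previous row of the grid
            total = sum(w * x for w, x in zip(self.weights, inputs))
            return 1.0 / (1.0 + math.exp(-total))  # sigmoid activation

    def run_grid(rows, data_in):
        # pump data in one end, draw it out the other
        signal = data_in
        for row in rows:
            signal = [chip.step(signal) for chip in row]
        return signal

    grid = [[Chip(4) for _ in range(4)] for _ in range(3)]  # 3 rows of 4
    print(run_grid(grid, [0.2, 0.9, 0.1, 0.5]))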

~~~
stiff
_I had once attempted to build a genetic algorithm for manipulating the
synapse weights, specifically because of the problems of traditional back-
propagation falling into local minima (unfortunately, some serious shit at
work made it fall by the wayside)._

There are many ways to avoid this, for example have a look at:

<http://en.wikipedia.org/wiki/Rprop>
<http://en.wikipedia.org/wiki/Conjugate_gradient_method>
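
The core of Rprop fits in a few lines; here's a sketch of the per-weight rule
(the iRprop- variant, with the standard default parameters):

    def rprop_step(w, grad, prev_grad, step,
                   eta_plus=1.2, eta_minus=0.5,
                   step_max=50.0, step_min=1e-6):
        # Only the *sign* of the gradient is used; the per-weight step
        # size grows while the sign is stable and shrinks when it flips,
        # which sidesteps the learning-rate tuning of plain backprop.
        if grad * prev_grad > 0:        # same direction: accelerate
            step = min(step * eta_plus, step_max)
        elif grad * prev_grad < 0:      # sign flip, we overshot: back off
            step = max(step * eta_minus, step_min)
            grad = 0.0                  # and skip this update (iRprop-)
        if grad > 0:
            w -= step
        elif grad < 0:
            w += step
        return w, grad, step            # grad becomes prev_grad next time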

In traditional neural networks (I am not talking about the "deep" stuff),
though, optimization is hardly ever the problem; most often under- or over-
fitting is what produces poor performance.

 _One of the performance problems with neural networks is that the number of
cores on a typical machine is far smaller than the number of input and
intermediate nodes in the network._

This sounds odd. Certainly doing neural networks in hardware is interesting,
but it sounds a bit like an imagined problem: when multiplying 20 numbers, one
does not complain that the number of cores is less than 20. And most often it
is the training of the network that is resource-intensive, not the actual
running of it.

~~~
mierle
Both RProp and Conjugate Gradient are descent methods -- they find a path to a
local minimum from the starting configuration. They do not help with finding
the global minimum.

Using simulated annealing to guide random initializations can help find a
better minimum, but getting the global minimum with simulated annealing takes
an inordinate amount of time.

~~~
stiff
(Poor) local minima can sometimes be reached in back-propagation due to a bad
setting of the learning rate, and RProp does help with that. You can also
repeat the learning process using one of those algorithms many times with
different initial random weights and use the result of the best attempt (see
the sketch below). I think this most often suffices for getting decent results
in practical applications of NNs. You are right that I got some of this wrong;
I basically wanted to say that backprop is no longer state of the art in
training, and that optimization is the relatively easy part of learning.
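
In code the restart strategy is trivial. Here `train_fn` is a placeholder for
whatever training routine you use, assumed to initialize weights from the
given RNG, train, and return (validation error, weights):

    import random

    def train_with_restarts(train_fn, n_restarts=10, seed=0):
        # run the same training procedure from several random
        # initializations and keep the best result
        rng = random.Random(seed)
        best_err, best_weights = float("inf"), None
        for _ in range(n_restarts):
            err, weights = train_fn(random.Random(rng.random()))
            if err < best_err:
                best_err, best_weights = err, weights
        return best_err, best_weights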

------
bsenftner
I've been working for several years as the "applications developer" for a
neural net lab. The lab has spent 11 years developing and refining a neural
net pipeline - a series of neural nets which, given one or more photos of a
person's face, performs forensically accurate 3D reconstructions of the
person's face and head. The system is used by government and police agencies
the world over when trying to determine what a "person of interest" looks like
given random photos of their subject. I've additionally exposed an
"entertainment" version of the technology, which can be seen at
www.3d-avatar-store.com. There one can create a 3D avatar, get a Maya-rigged
version for professional-quality animation, and license my WebAPI to embed
avatar creation into your own software. And the best part is that the avatars
look just like the person in the source photo.

~~~
chanced
That's some pretty impressive stuff. I would suggest changing your site name,
though; it seems a bit spammy (I almost didn't go for that reason).

------
jph00
I'm the President and Chief Scientist of Kaggle, which ran the drug discovery
project mentioned in the article. As it happens, I did my Strata talk on
Tuesday about just this topic. I will be repeating the talk in webcast form
(for free) in a few weeks: <http://oreillynet.com/pub/e/2538>. I'll be
focusing more on the data science implications rather than implementation
details.

------
tansey
Nice write up. I gave a presentation on DBNs for my Neural Networks class in
Fall 2011. If you'd like references to the relevant papers and some more
details on the algorithms and applications, here are the slides:
[https://docs.google.com/presentation/d/18vJ2mOmb-
Cbqsk0aNoUM...](https://docs.google.com/presentation/d/18vJ2mOmb-
Cbqsk0aNoUMNKCjEaiWyryFAVBxM3nMFfY/edit?usp=sharing)

~~~
SatvikBeri
This was a really fun read - thank you! If you don't mind answering, I've got
a few questions:

1\. "Vanishing gradients after 2-3 layers" - does this mean that the partial
derivatives tend to be smaller in the higher layers, and therefore the network
finds local minima that aren't very useful?

2\. Step 3 (p 18) mentions that the outputs are not continuous variables but
binary. What's the reasoning behind that?

~~~
tansey
1\. Basically. It means that the network has a hard time pulling itself in any
direction since the weights in the deeper layers are never really adjusted by
very much.

2\. It's been a while since I read the paper, but I believe that the
justification has to do with the proof of convergence of Gibbs sampling. I
haven't tried using continuous values, so I can't give an intuition for what
happens in those cases.
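
On 1, a toy illustration of why: each sigmoid unit multiplies the
backpropagated gradient by w * sigma'(x), and sigma' never exceeds 0.25, so
with typical small weights the signal decays roughly geometrically with depth:

    import math, random

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    random.seed(0)
    grad, x = 1.0, 0.5
    for layer in range(6):
        w = random.uniform(-1, 1)       # a typical small initial weight
        y = sigmoid(w * x)
        grad *= w * y * (1 - y)         # chain rule through one unit
        x = y
        print("layer %d: |gradient| = %.6f" % (layer + 1, abs(grad)))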

------
nicholasjarnold
If you're really interested in understanding more about the "hierarchy of
filters" quote, and much more related to that theory of how our brains
operate, I strongly suggest the book On Intelligence by Jeff Hawkins. Super
interesting stuff!

~~~
krenoten
Here's a page that gives a high-level overview of the technology that he has
helped to develop: <https://www.numenta.com/technology.html>

On Intelligence has dramatically changed the way I think about thinking. It's
an awesome book.

~~~
nicholasjarnold
Thanks for this - I forgot to mention Numenta. Your comment prompted me to
search my old archives for the setup file for "Vitamin D Video", a motion and
object detection program that was a very early example of Numenta technology
being successfully implemented.

Now, it looks like those Vitamin D people have their own company:
<http://www.vitamindinc.com/>

Even the really early versions of Vitamin D were impressive. Is anybody using
it for anything interesting now?

------
return0
First, it's Geoffrey, not Gregory Hinton.

Here's a very good tech talk from him about RBMs:
<http://www.youtube.com/watch?v=AyzOUbkUf3M>

That said, both approaches only loosely mirror the function of the brain:
neurons are not simple threshold devices, and neither backpropagation nor the
RBM training algorithms have a biophysical equivalent.

~~~
tjake
Oh sorry. I fixed it. Sorry Geoffrey!

~~~
freyr
Second, it's Geoffrey, not Gregory Hinton.

------
dave_sullivan
Oh, backprop isn't so bad...

After all, a deep belief network starts with an RBM for unsupervised pre-
training, but the fine-tuning stage that follows just treats the network as a
standard MLP and uses backprop.

Also, you can use autoencoders instead of RBMs, which I think are getting
better results these days? And there are better regularization techniques for
backprop now -- weight decay, momentum, L1/L2 regularization, dropout,
probably more that I'm leaving out.
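
Two of those fit in one update rule. A sketch for a single weight (the
hyperparameter values here are arbitrary):

    def sgd_update(w, grad, velocity, lr=0.1, momentum=0.9,
                   weight_decay=1e-4):
        # weight decay pulls every weight toward zero (the gradient of an
        # L2 penalty is weight_decay * w); momentum smooths the trajectory
        # by accumulating a running average of past gradients
        grad = grad + weight_decay * w
        velocity = momentum * velocity - lr * grad
        return w + velocity, velocity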

The pre-training (RBM or autoencoder) helps you not get stuck in local minima,
but there's also interesting research suggesting you're not getting stuck in
local minima so much as in low-slope, high-curvature corridors that gradient
descent is blind to. So people are looking into second-order methods that take
curvature into account, letting you take big steps through these canyons and
smaller steps when things are a bit steeper. Or something like that :-)

All that being said, anyone care to weigh in on the pros/cons of RBMs vs
something like a contractive autoencoder? No such thing as a free lunch, so
what are the key selling points of RBMs at this point? I keep seeing them pop
up, but afaik, they don't provide a particular advantage over autoencoder
variants.

Great article though, I'm really glad to see more and more people getting
interested in neural networks, they've come a long way and people are just
starting to wake up to that.

~~~
jghrng
_All that being said, anyone care to weigh in on the pros/cons of RBMs vs
something like a contractive autoencoder?_

For some problems, it may be nice to have a generative model as offered by
RBMs (although Rifai et al. published a sampling method for contractive auto-
encoders recently: <http://icml.cc/2012/papers/910.pdf>). I feel like with
RBMs, you can design models which incorporate prior knowledge more "easily"
(you may end up with pretty complex models...), e.g. the conditional RBM, the
mean-covariance RBM or the spike & slab RBM. Additionally, there's the deep
Boltzmann machine, which consists of multiple layers that are jointly trained
in an RBM-like fashion.

Auto-encoders are straightforward to understand and implement. With
contractive terms or denoising, they are powerful feature extractors as well.

But as you already noted, if you "just" want to have a good classifier, I
think it pretty much boils down to personal preference since you're going to
spend some effort on making these techniques work well on your problem anyway.
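
For the record, the denoising variant really is only a few lines on top of a
plain autoencoder. A tied-weight sketch in NumPy (layer size, learning rate,
and corruption level are arbitrary choices):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_denoising_ae(X, n_hidden=64, lr=0.1, epochs=50,
                           corruption=0.3, seed=0):
        # X: data in [0, 1], shape (n, d); returns encoder parameters
        rng = np.random.default_rng(seed)
        n, d = X.shape
        W = rng.normal(0.0, 0.01, (d, n_hidden))
        b_h, b_v = np.zeros(n_hidden), np.zeros(d)
        for _ in range(epochs):
            # zero out a random fraction of each input, but train the
            # net to reconstruct the *clean* input
            X_noisy = X * (rng.random(X.shape) > corruption)
            H = sigmoid(X_noisy @ W + b_h)    # encode
            R = sigmoid(H @ W.T + b_v)        # decode with tied weights
            err = R - X                       # cross-entropy output gradient
            dH = (err @ W) * H * (1.0 - H)    # backprop into hidden layer
            W -= lr * (X_noisy.T @ dH + err.T @ H) / n
            b_h -= lr * dH.mean(axis=0)
            b_v -= lr * err.mean(axis=0)
        return W, b_h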

------
smalieslami
In fact we're only scratching the surface when it comes to the generative
capabilities of deep models. See e.g. our recent work on using Deep Boltzmann
Machines to learn how to draw object silhouettes:
<http://arkitus.com/ShapeBM/>

------
boothead
As mentioned in this thread by nicholasjarnold, Jeff Hawkins' work on HTM
(detailed in his excellent book "On Intelligence") seems superficially similar
to this. Has anyone had experience with both approaches? HTM seems to have
much more structure in the network, but I know next to nothing about AI and
would love to hear from those who know a bit more.

~~~
rm999
I have some experience with both, but can't give a great comparison. The tldr
is I've always had a better impression of Hinton than Hawkins, and have
studied/followed Hinton's approaches much more carefully.

In late 2006/early 2007 I was working a lot with standard two-layer feed-
forward neural networks (first for my research and then for my job). Hinton
had a great paper on practical deep networks at NIPS 2006 (a big AI/machine
learning conference), which sparked my interest in more complex neural
networks. I had read Hawkins' book a few years earlier, and my impression of
it was somewhat negative; I thought it was a really interesting book, but it
was too fluffy and high-level to be convincing. He hit a lot of points about
hierarchies in intelligence that were intriguing but not new or drastically
insightful. After NIPS I downloaded some of Numenta's code (Numenta is
Hawkins' company) and it was pretty slow on toy problems, so I didn't spend
too much time with it - this isn't a judgement of their code, I just didn't
have the time to dig deeply into it. My impression at the time, which may be
unfair, is that Numenta's approach was ad hoc while Hinton's was principled. I
was negatively biased by Hawkins' book and my professors' opinions of him vs
Hinton.

------
leot
I remember running into Hinton one afternoon back in 2005 while on St. George.
He was walking home, and especially cheerful from having just figured out how
to do learning efficiently on deep belief nets. It's amazing to see the
influence this work has had.

------
sherjilozair
MNIST is not a good dataset to show any artificial intelligence on. The
dataset is so simple that a good programmer could probably write a classifier
for it in 100 lines of Python, with no machine learning involved.

Neural network techniques that work so well on small, easy, trivial datasets
like MNIST do not generalize to more serious datasets, and that's where the
"and this is where the magic happens" component is needed.
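
Something like this nearest-mean template matcher is presumably the kind of
hand-rolled baseline meant here (whether it counts as "no machine learning" is
debatable; assumes the images and labels are NumPy arrays):

    import numpy as np

    def make_classifier(train_images, train_labels):
        # average the training images per digit, then label a new image
        # with the digit whose mean image is closest in raw pixel distance
        templates = {d: train_images[train_labels == d].mean(axis=0)
                     for d in range(10)}
        def classify(image):
            return min(templates,
                       key=lambda d: np.sum((image - templates[d]) ** 2))
        return classify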

~~~
spin
Deep neural networks are now being deployed in both Google's and Microsoft's
speech recognition tools. Microsoft claims a 30% increase in accuracy from
deep neural networks (developed in cooperation with Hinton's group at U
Toronto)...

------
theschwa
That Coursera class has been showing a start date of "Oct 1st 2012" for a
while. Does anyone know when the next class might be?

~~~
SatvikBeri
Haven't heard anything about the next class starting, but I believe you can
still sign up and download all the videos.

~~~
sk2code
I believe you can. Your work won't be evaluated, but at least you have the
material to get hands-on.

------
scottmp10
It is great to see more interest in neural networks, but the types of neural
networks the author describes are missing some key aspects of what the brain
is doing. I work with Jeff Hawkins at Numenta, and while our product is based
on a type of neural network, it is quite different from the class of NN
described in this post. For background:

A recent blog post by Jeff: [https://grok.numenta.com/blog/not-your-fathers-
neural-networ...](https://grok.numenta.com/blog/not-your-fathers-neural-
network.html)

And more detailed information on the technology (I would recommend the CLA
white paper): <https://grok.numenta.com/technology.html>

------
Rnnguy
Sitting in a class right now reading this while Hinton is teaching neural
nets.

~~~
textminer
I can remember times as a student, learning a lot, but losing sight of how
amazing it was that I had dedicated time solely to study. As someone working
full-time now, who only gets to learn fascinating new math in his precious
free time, I implore you to fully embrace the awesome opportunity in front of
you. Enjoy the lecture, and milk whatever you can from this man's teaching.

~~~
tripzilch
Similarly, I am glad I did not have a laptop/smartphone with wireless Internet
[in the classroom] in my college days. I fear I'd have learned so much less.
In fact, there's a pretty strong correlation between my study results going
downhill and the moment we got cable Internet at my student house... :-/

(On the positive side, the Internet made me loads of international friends,
and there are truckloads of things I could not have learned without it.)

~~~
textminer
You just made me realize something. I attributed my college grades slipping a
bit in my senior year to burnout, but it was also Fall '08, and I had just
gotten an iPhone 3G, my first smartphone...

------
m12k
I looked at Restricted Boltzmann Machines for a while when searching for a
topic for my master's thesis. One very interesting use is to train an RBM on
animations and then use it generatively to create new animations - Hinton and
one of his students, Graham Taylor, wrote a paper about it
([http://www.cs.utoronto.ca/~hinton/csc2515/readings/nipsmocap...](http://www.cs.utoronto.ca/~hinton/csc2515/readings/nipsmocap.pdf)
(PDF)). Imagine if it were expanded so animators could train an RBM on a body
of animation from a character, then simply specify "go from here to here" and
the RBM would create an interstitial animation. Afaik a lot of animation work
is just boilerplate like "line the character up so we can fire the sit-down
animation".

------
maaku
Great post, and thank you for the link to Hinton's Coursera page - I didn't
know about that. I also hope to learn a thing or two from your GitHub code.
But it was so depressing to read this:

> Now, when I say Artificial Intelligence I’m really only referring to Neural
> Networks. There are many other kinds of A.I. out there (e.g. Expert Systems,
> Classifiers and the like) but none of those store information like our brain
> does (between connections across billions of neurons).

This is a middle-brow dismissal of almost the entire field of A.I. because it
does not meet an unnecessarily narrow restriction. (Which, by the way, neural
nets don't meet either: real neurons are analog-in, digital-out, stochastic
processes whose behavior is influenced by neural chemistry, physical
interconnectivity, and timing, among other things not accurately modeled at
all by any neural net. A neural net is a closer model of the mechanisms of the
brain, but far from equivalent, and as a CogSci student you should know that.)

A.I. is the science of building artifacts exhibiting intelligent behavior,
intelligence being loosely defined as what human minds do. But in theory and
in practice, _what_ human minds do is not the same thing as _how_ they do it.

The human mind does appear to be a pattern matching engine, with components
that might indeed be well described as a hidden Markov model or restricted
Boltzmann machine. It may be that our brains are nothing more than an
amalgamation of some 300 million or so interconnected hidden Markov models.
That's Ray Kurzweil's view in _How to Create a Mind_, at any rate.

However, it is a logical fallacy to infer that neural nets are the only or
even the best mechanism for implementing all aspects of human-level
intelligence. It's merely the first thing evolution was able to come up with
through trial and error.

Take the classical opposite of neural nets, for example: symbolic logic. If
given a suitable base of facts to work from and appropriate goals, a theorem
prover on your cell phone could derive all mathematics known up to the early
20th century (and perhaps beyond), without the possibility of making a single
mistake. And it could do it on a fraction of the energy you spend splitting a
bill and calculating the tip. A theorem prover alone does not solve the
problem of initial
learning of ontologies or reasoning about uncertainty in a partially
observable and even sometimes inconsistent world. But analyzing memories and
perception for new knowledge is a large part of what human minds do
(consciously, at least), and if you have a better tool, why not use it?

Now I myself am enamored of Hinton-style RBM nets. This sort of unsupervised
deep learning is probably a cornerstone in creating a general process for
extracting any information from any environment, a central task of artificial
general intelligence. However, compared with specialized alternatives, neural
nets are hideously inefficient for many things. Doesn't it make sense, then,
to use an amalgam of specialized techniques when applicable, and fall back on
neural nets for unstructured learning and other non-specialized tasks? Indeed,
this integrative approach is taken by OpenCog, although they plan to use
DeSTIN deep learning instead of Hinton-esque RBMs, in part because the output
of DeSTIN is supposedly more easily stored in their knowledge base and parsed
by their symbolic logic engine.

~~~
visarga
By the way, is there a simple command line tool implementing RBM? Something
easy to pick up and use.

~~~
maaku
Not one that I am personally aware of, but this is not my area of focus (yet).
If Google doesn't work, I would try emailing Hinton directly and see if he
knows of any open source / freely available solutions.

------
mhluongo
Check out the rest of Geoffrey Hinton's work as well-
<http://scholr.ly/person/3595934/geoffrey-e-hinton>

------
spin
You can play with a Python version of this same algorithm (contrastive
divergence for RBMs) here: <https://github.com/Wizcorp/Eruditio>

(I wrote it... :-)
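
If you just want the gist before reading the code, the core one-step
contrastive divergence (CD-1) update is only a few lines. A hand-rolled NumPy
sketch (not lifted from that repo):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def cd1_epoch(V, W, b_h, b_v, lr=0.1, rng=None):
        # V: binary data, shape (n, d); W: weights, shape (d, n_hidden)
        rng = rng or np.random.default_rng(0)
        n = V.shape[0]
        h_prob = sigmoid(V @ W + b_h)                  # positive phase
        h_sample = (rng.random(h_prob.shape) < h_prob) * 1.0
        v_recon = sigmoid(h_sample @ W.T + b_v)        # reconstruct visibles
        h_recon = sigmoid(v_recon @ W + b_h)           # negative phase
        # nudge weights toward the data statistics and away from the
        # reconstruction statistics
        W += lr * (V.T @ h_prob - v_recon.T @ h_recon) / n
        b_h += lr * (h_prob - h_recon).mean(axis=0)
        b_v += lr * (V - v_recon).mean(axis=0)
        return W, b_h, b_v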

------
frooxie
Does anyone have a link to a web page (or a book) that would be useful if you
want to learn to program a Deep Belief Network?

~~~
countersixte
For a (slightly technical) in-depth guide to training RBMs:
<http://www.cs.toronto.edu/~hinton/absps/guideTR.pdf>

It discusses how to choose a learning algorithm, select hyperparameters, set
the number of hidden units, etc.

