

Training a deep belief network on a GPU with 100 million free parameters - henning
http://ai.stanford.edu/~ang/papers/icml09-LargeScaleUnsupervisedDeepLearningGPU.pdf

======
bravura
Just to summarize:

"Deep learning" is the new big trend in Machine Learning. It promises general,
powerful, and fast machine learning, moving us one step closer to AI.

An algorithm is deep if the input is passed through several non-linearities
before being output. Most modern learning algorithms (including decision
trees, SVMs, and naive Bayes) are "shallow".
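
In code, the distinction is just how many learned non-linear transformations
get composed. A toy numpy sketch (the random weights are stand-ins for
learned parameters):

    import numpy as np

    rng = np.random.RandomState(0)
    W1, W2, W3 = (rng.randn(8, 8) for _ in range(3))  # stand-ins for learned weights

    def shallow(x):
        return np.tanh(W1 @ x)       # one non-linearity between input and output

    def deep(x):
        h1 = np.tanh(W1 @ x)         # first non-linearity
        h2 = np.tanh(W2 @ h1)        # second
        return np.tanh(W3 @ h2)      # third: several non-linearities deep

    print(deep(rng.randn(8)))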

For intuition, imagine if I told you that your main routine could call
subroutines, and your subroutines could call subsubroutines, but you couldn't
have any more abstraction than that. You can't have subsubsubroutines in your
"shallow" program. You could still compute whatever you wanted in a "shallow"
program, but it would involve a lot of duplicated code and would not be as
compact as it could be. Similarly, a shallow machine learning architecture
would involve a lot of duplication of effort to express things that a deep
architecture could more compactly. The point being, a deep architecture can
more gracefully reuse previous computations.

Deep learning is motivated by intuition, theoretical arguments from circuit
theory, empirical results, and current knowledge of neuroscience. Here is a
video where I give a high-level talk on deep learning, and describe this
intuition:
<http://bilconference.com/videos/deep-learning-artificial-intelligence-joseph-turian/>

Here is more detailed information about deep learning:
<http://deeplearning.net/tutorial/>

Deep learning techniques can typically be expressed as dense matrix
operations, which makes them well suited to GPU acceleration, and that makes
deep learning even more powerful.
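
For instance, applying one layer to a whole minibatch is a single dense
matrix multiply plus an elementwise non-linearity (a numpy sketch; the sizes
are made up):

    import numpy as np

    X = np.random.rand(256, 1024)           # minibatch: 256 examples, 1024 features
    W = 0.01 * np.random.randn(1024, 4096)  # layer weights
    b = np.zeros(4096)

    # One layer of the network is one big dense matmul plus an
    # elementwise non-linearity: exactly what GPUs are built for.
    H = np.tanh(X @ W + b)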

~~~
lliiffee
Since you seem to know this stuff, I hope you don't mind if I ask you a
marginally related question... It seems like deep learning is based on
greedily building representations in an unsupervised setting. (Not sure if
this is a defining characteristic of deep learning or just a common usage.)
The postulate seems to be that greedily learning in a _supervised_ setting is
inferior. Why should this be so? Is there any justification (theoretical or
experimental)? I can only find vague asides on this point in a few papers.
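
To make the question concrete, the unsupervised scheme I mean looks roughly
like this (a toy numpy sketch with tied-weight autoencoders; all sizes and
hyperparameters are made up):

    import numpy as np

    rng = np.random.RandomState(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_autoencoder(X, n_hidden, lr=0.1, epochs=50):
        """One tied-weight autoencoder layer, trained by plain gradient
        descent on squared reconstruction error."""
        n_visible = X.shape[1]
        W = 0.01 * rng.randn(n_visible, n_hidden)
        b_h = np.zeros(n_hidden)
        b_v = np.zeros(n_visible)
        for _ in range(epochs):
            H = sigmoid(X @ W + b_h)        # encode
            R = sigmoid(H @ W.T + b_v)      # decode with tied weights
            dR = (R - X) * R * (1 - R)      # gradient at the decoder output
            dH = (dR @ W) * H * (1 - H)     # backprop one step to the encoder
            W -= lr * (X.T @ dH + dR.T @ H) / len(X)
            b_h -= lr * dH.mean(axis=0)
            b_v -= lr * dR.mean(axis=0)
        return W, b_h

    X = rng.rand(100, 20)                   # toy unlabeled data
    reps, layers = X, []
    for n_hidden in (15, 10, 5):            # greedy: one layer at a time
        W, b = train_autoencoder(reps, n_hidden)
        layers.append((W, b))
        reps = sigmoid(reps @ W + b)        # hidden codes feed the next layer

The supervised analogue would train each new layer greedily against the
labels instead of against a reconstruction of its own input.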

~~~
Groxx
It's not by any means required to be unsupervised, they just tend to be better
suited for it, from what I remember. You can train a deep neural network just
like a shallow one, but do you _know_ you're training it in the best way? The
unsupervised one might key in on something you're not aware of.

They are also more capable of dealing with tons of variables (in this case,
~100 million). If you had 100 million yes/no values that needed to be
checked, could you really answer all of them accurately? In some cases, sure,
but not many. And what's the correct number of variables for modeling a
language, and what do they relate to? An unsupervised network will try to fit
them automagically, rather than taking your word for it.

Unsupervised learning also has a lot of uses for modeling intelligence,
because _we're_ essentially unsupervised for some of the "hardest" tasks to
learn, especially prior to learning to speak. All there really is for
feedback is pain, and that doesn't necessarily lead directly to language.

But bravura should be able to answer better / more accurately; hopefully
they'll chime in too. I'd like to know if there's more to it than what I know,
which is pretty minimal.

------
alex656
If you're planning to implement some of the ideas from this paper, check out
the open source CUDAmat library for Python:

<http://code.google.com/p/cudamat/>
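
If memory serves, the basic pattern is something like this (a minimal
sketch, close to cudamat's own example):

    import numpy as np
    import cudamat as cm

    cm.cublas_init()                  # set up the GPU

    # create two random matrices and copy them to the GPU
    a = cm.CUDAMatrix(np.random.rand(32, 256))
    b = cm.CUDAMatrix(np.random.rand(256, 32))

    c = cm.dot(a, b)                  # dense multiply, runs on the GPU
    print(c.asarray())                # copy the result back to the host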

~~~
bravura
Better yet, try out Theano, which allows you to write numpy-like math
equations that get automatically compiled and optimized and then target the
GPU:

<http://deeplearning.net/software/theano/>
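
For example, a single sigmoid layer reads like ordinary numpy math, and the
same script can run on the GPU via THEANO_FLAGS (a minimal sketch; the sizes
are arbitrary):

    import numpy as np
    import theano
    import theano.tensor as T

    x = T.matrix('x')                           # symbolic minibatch
    W = theano.shared(
        np.random.randn(784, 500).astype(theano.config.floatX))
    b = theano.shared(np.zeros(500, dtype=theano.config.floatX))

    y = T.nnet.sigmoid(T.dot(x, W) + b)         # numpy-like expression
    layer = theano.function([x], y)             # compiled, GPU-ready function

    out = layer(np.random.rand(10, 784).astype(theano.config.floatX))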

~~~
cudamatUser1
Theano is really cool. It is a lot more ambitious than CUDAmat, but CUDAmat
might be easier to get started with.

------
mark_l_watson
Really nice paper.

In the 1980s at SAIC, we spent a small fortune designing custom (Harvard
memory architecture) hardware to run back-prop neural networks relatively
fast, but "fast" back then was about 23 million floating-point operations per
second. My Droid phone could probably do more than that now :-)

------
dnsworks
It would be nice, when you link to a PDF, to say so in the title...

~~~
apu
All PDF links on Hacker News automatically get the [scribd] link attached, so
that's your PDF indicator.

~~~
dnsworks
Whoever put that together, well, I hope I never have to maintain their code,
because it's clear they're the type who documents things only for themselves.

