
Deep Learning From The Bottom Up - zercool
http://metacademy.org/roadmaps/rgrosse/deep_learning
======
PieSquared
In addition to issues raised by other commenters, one of the problems with
deep learning (deep nets in general) is that they can be very hard to train.
If you're interested in some techniques people have been using, I highly
suggest you read up on optimization methods such as conjugate gradient and
Hessian-free optimization. I did this recently [0] and have a brief write-up,
but honestly the original Martens paper may be more understandable [1].

[0] [http://andrew.gibiansky.com/blog/machine-learning/hessian-free-optimization/](http://andrew.gibiansky.com/blog/machine-learning/hessian-free-optimization/)

[1] [http://machinelearning.wustl.edu/mlpapers/paper_files/icml2010_Martens10.pdf](http://machinelearning.wustl.edu/mlpapers/paper_files/icml2010_Martens10.pdf)
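
If it helps to see the mechanics: the key point is that you never form the Hessian explicitly, because conjugate gradient only ever needs Hessian-vector products. Here is a rough numpy sketch on a toy quadratic (not a real net, and using a cheap finite-difference Hessian-vector product rather than the exact R-operator product Martens computes):

```python
import numpy as np

def hessian_vector_product(grad_fn, x, v, eps=1e-6):
    """Approximate H @ v by finite differences of the gradient."""
    return (grad_fn(x + eps * v) - grad_fn(x - eps * v)) / (2 * eps)

def conjugate_gradient(hvp, b, iters=50, tol=1e-10):
    """Solve H x = b using only Hessian-vector products."""
    x = np.zeros_like(b)
    r = b - hvp(x)          # residual
    p = r.copy()
    rs_old = r @ r
    for _ in range(iters):
        Hp = hvp(p)
        alpha = rs_old / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Toy example: f(x) = 0.5 x^T A x - b^T x, so the Hessian is A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad_fn = lambda x: A @ x - b
x0 = np.zeros(2)
step = conjugate_gradient(lambda v: hessian_vector_product(grad_fn, x0, v),
                          -grad_fn(x0))
print(step, np.linalg.solve(A, b))  # the two should agree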

------
cjrd
Hi, I'm one of the creators of Metacademy. I hope you find it useful. Feel
free to follow our new Twitter account if you'd like low volume updates:

[https://twitter.com/meta_learning](https://twitter.com/meta_learning)

Also, you can register an account to receive an occasional email update.

PS) We're completely free and open source:
[https://github.com/metacademy/metacademy-application](https://github.com/metacademy/metacademy-application)

------
nrmn
For anyone actually interested in implementing DNNs, I wrote up a quick blog
post (essentially a brain dump) of general guidelines to adhere to when
training DNNs. The information comes primarily from Geoffrey Hinton's video
lectures as well as various papers.

[http://343hz.com/general-guidelines-for-deep-neural-networks/](http://343hz.com/general-guidelines-for-deep-neural-networks/)

------
jarvic
I just skimmed the post as I don't have time to fully read it right now, but
I'll point out a couple of problems that you can run into with neural nets and
associated approaches.

One issue that can be a back breaker depending on your application is that, to
produce a generalizable model, nets tend to need much more training data than
the alternatives. There are ways to work around this, though.

The bigger problem to me is interpretability. Deep learning often gives
feature sets that are very good for whatever task you are working on, but they
are in some sense artificial, and it is difficult to relate changes in the
features to changes in the input data. I work with a lot of biological and
medical data, and this is an issue because for some applications it is
important not just to get accurate classification results, but to be able to
understand what your features mean in the context of the original problem. I
saw some interesting work in a computer vision paper earlier this year on
visualizing how changes in the inputs and outputs of a neural net are related;
I'll try to dig that up later if anyone is interested.

I'm not sure how coherent that was as I was trying to get this typed out in a
hurry.

~~~
tinkerdol
Sure, please post the link to the paper, it sounds interesting.

------
agibsonccc
To address some of the comments here: neural nets, despite being harder to
train, can be debugged visually.

A few tips for those of you who use neural nets:

Debug the weights with histograms. Track the gradient and make sure its
magnitude is not too large and that it is normally distributed.
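
A rough sketch of what that check can look like (nothing framework-specific, just numpy/matplotlib on the raw arrays; the example values are fake):

```python
import numpy as np
import matplotlib.pyplot as plt

def inspect_layer(weights, grads, name="layer1"):
    """Histogram the weights and sanity-check the gradient magnitude."""
    plt.hist(weights.ravel(), bins=50)
    plt.title(f"{name} weight histogram")
    plt.show()

    grad_norm = np.linalg.norm(grads)
    weight_norm = np.linalg.norm(weights)
    # Rule of thumb: the update should be a small fraction of the weights.
    print(f"{name}: |grad| / |W| = {grad_norm / weight_norm:.2e}")
    print(f"{name}: grad mean = {grads.mean():.2e}, std = {grads.std():.2e}")

# Fake arrays just to show the call; pass your real weights and gradients.
inspect_layer(np.random.randn(784, 256) * 0.01, np.random.randn(784, 256) * 1e-4)
```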

Keep track of your gradient changes when using either gradient descent or
conjugate gradient.

Plot your filters, visualize what each neuron is learning.
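
For a fully connected first layer trained on images, plotting the filters just means reshaping each hidden unit's incoming weights back into an image. A quick sketch, assuming 28x28 inputs (random weights stand in for your trained ones):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_filters(W, img_shape=(28, 28), n_show=16):
    """Each column of W is one hidden unit's incoming weights; view it as an image."""
    fig, axes = plt.subplots(4, 4, figsize=(6, 6))
    for i, ax in enumerate(axes.ravel()[:n_show]):
        ax.imshow(W[:, i].reshape(img_shape), cmap="gray")
        ax.axis("off")
    plt.show()

plot_filters(np.random.randn(784, 100))  # replace with your trained first-layer weights
```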

Watch the rate of change of your cost function. If it seems to be changing too
fast and then stops improving early, lower your learning rate.

Plot your activations: if they start out grey, you're fine. If they start out
all black, you need to retune some of your parameters.

Lastly, understand the algorithm you're using. Convolutional nets are
different from recursive neural tensor networks, which are different from
denoising autoencoders, which are different from RBMs/DBNs.

Pay attention to your cost function: reconstruction cross-entropy and negative
log likelihood are used differently, for different objectives.

If you are trying to do feature learning, you are using RBMs or denoising
autoencoders, and you will use reconstruction cross-entropy. This is what you
use for feature detectors. You may end up using negative log likelihood if you
are dealing with continuous data.
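
Concretely (just a numpy sketch, not any particular library's cost functions): for binary/[0,1] inputs the reconstruction cost is a cross-entropy, while for continuous inputs it reduces to squared error, i.e. a Gaussian negative log likelihood up to constants:

```python
import numpy as np

def reconstruction_cross_entropy(x, x_hat, eps=1e-8):
    """For binary / [0,1] inputs x and sigmoid reconstructions x_hat."""
    return -np.mean(np.sum(x * np.log(x_hat + eps)
                           + (1 - x) * np.log(1 - x_hat + eps), axis=1))

def gaussian_nll(x, x_hat):
    """For continuous inputs: squared error, i.e. Gaussian NLL up to constants."""
    return 0.5 * np.mean(np.sum((x - x_hat) ** 2, axis=1))
```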

For RBMs, pay attention to the different kinds of units [1]. Hinton recommends
Gaussian visible units with rectified linear hidden units for continuous data,
and binary-binary otherwise.

For denoising autoencoders, watch your corruption level. A higher corruption
level helps the model generalize better, especially when you have less data.
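
The corruption level is just the fraction of each input you randomly zero out (masking noise) before asking the net to reconstruct the clean version; roughly:

```python
import numpy as np

def corrupt(x, corruption_level=0.3, rng=np.random):
    """Zero out a random fraction of each input (masking noise)."""
    mask = rng.binomial(n=1, p=1.0 - corruption_level, size=x.shape)
    return x * mask

# The denoising autoencoder is then trained to reconstruct x from corrupt(x).
```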

For time series or sequential data, you can use a recurrent net, a moving
window with DBNs, or a recursive neural tensor network.

Other knobs:

If your deep learning framework doesn't have AdaGrad, find one that does.
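
The update itself is small: AdaGrad keeps a running sum of squared gradients per parameter and shrinks the step for parameters that have already moved a lot. A minimal sketch:

```python
import numpy as np

def adagrad_update(w, grad, cache, lr=0.01, eps=1e-8):
    """One AdaGrad step: per-parameter learning rates from accumulated squared gradients."""
    cache += grad ** 2
    w -= lr * grad / (np.sqrt(cache) + eps)
    return w, cache

# cache starts as np.zeros_like(w) and is carried across updates.
```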

Dropout: crucial. Dropout is used in combination with mini-batch learning to
handle learning different "poses" of images as well as to generalize feature
learning. It can be used in combination with sampling with replacement to
minimize sampling error.
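
The mechanism is simple: randomly zero hidden activations during training and rescale so the expected activation stays the same. A sketch using the inverted-dropout convention:

```python
import numpy as np

def dropout(activations, p_drop=0.5, train=True, rng=np.random):
    """Randomly drop units during training; no-op (no rescaling needed) at test time."""
    if not train:
        return activations
    mask = rng.binomial(n=1, p=1.0 - p_drop, size=activations.shape)
    return activations * mask / (1.0 - p_drop)  # inverted dropout: scale at train time
```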

Regularization: L2 is typically used. Hinton once said you want a neural net
that always overfits but is regularized (YouTube video... don't remember the
link right now).
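
In practice L2 is just a weight-decay term added to the cost and its gradient; a sketch:

```python
import numpy as np

def l2_penalty(W, lam=1e-4):
    """Penalty added to the cost; its gradient is lam * W (weight decay)."""
    return 0.5 * lam * np.sum(W ** 2)

# In the update: grad_total = grad_data + lam * W
```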

Would love to answer questions! Source: I work on/teach this stuff. Still
working my way up there, but it seems to be going well so far.[2][3]

Lastly, tweak one knob at a time. Neural nets have a lot going on. You don't
want a situation where you A/B tested 10 different parameters at once and you
don't know which one worked or why.

[1]:
[http://www.cs.toronto.edu/~hinton/absps/guideTR.pdf](http://www.cs.toronto.edu/~hinton/absps/guideTR.pdf)

[2]: [http://deeplearning4j.org/](http://deeplearning4j.org/)

[3]: [http://zipfianacademy.com/](http://zipfianacademy.com/)

[4]: [http://arxiv.org/abs/1206.5533](http://arxiv.org/abs/1206.5533)
[http://deeplearning4j.org/debug.html](http://deeplearning4j.org/debug.html)
[http://yosinski.com/media/papers/Yosinski2012VisuallyDebuggi...](http://yosinski.com/media/papers/Yosinski2012VisuallyDebuggingRestrictedBoltzmannMachine.pdf)

~~~
cjrd
Nice to see you HN, Adam =)

We just opened up the roadmap for contributions (click "view source" with a
logged-in account). Feel free to add any of these notes where you think
they'd fit in nicely -- don't worry about messing anything up, we have version
control for a reason. Also, please email me if you run into any
problems/confusion.

~~~
agibsonccc
Will do! Like we discussed before, great initiative!

------
tshadwell
I'm pretty familiar with neural networks, and from skimming that article, it
appears to describe something that is a neural network. Is "deep learning" new
terminology for "neural network", or does it describe a subset of ways of
using them?

~~~
makeset
Deep learning models are neural networks, but their recent popularization is
due to a new method of building them incrementally: adding generative hidden
layers trained as autoencoders to extract representative features, up until
the final discriminative layer. The resulting model is still a neural network,
which can be fine-tuned by gradient methods, whereas conventionally, training
the same model from scratch with a random initialization would not have worked.
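
A heavily simplified sketch of that recipe (tiny tied-weight autoencoders trained with plain SGD, purely illustrative): greedily train each layer as an autoencoder on the codes of the layer below, stack the learned weights, and use them to initialize the deep net before supervised fine-tuning.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, lr=0.1, epochs=20, rng=np.random):
    """Train a one-layer tied-weight autoencoder with plain SGD on squared error."""
    n_visible = X.shape[1]
    W = rng.randn(n_visible, n_hidden) * 0.01
    b = np.zeros(n_hidden)
    c = np.zeros(n_visible)
    for _ in range(epochs):
        h = sigmoid(X @ W + b)          # encode
        X_hat = sigmoid(h @ W.T + c)    # decode with tied weights
        # Backprop of 0.5 * ||X - X_hat||^2 through the sigmoids
        d_out = (X_hat - X) * X_hat * (1 - X_hat)
        d_hid = (d_out @ W) * h * (1 - h)
        W -= lr * (X.T @ d_hid + d_out.T @ h) / len(X)
        b -= lr * d_hid.mean(axis=0)
        c -= lr * d_out.mean(axis=0)
    return W, b

def greedy_pretrain(X, layer_sizes):
    """Stack autoencoders: each layer is trained on the codes of the one below."""
    weights, h = [], X
    for n_hidden in layer_sizes:
        W, b = train_autoencoder(h, n_hidden)
        weights.append((W, b))
        h = sigmoid(h @ W + b)
    return weights  # use these to initialize the deep net, then fine-tune with backprop

pretrained = greedy_pretrain(np.random.rand(100, 64), layer_sizes=[32, 16])
```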

~~~
tshadwell
Ah, I see. Thanks.

------
sytelus
THANK YOU for this link. Metacademy is amazing! I always wanted a tool like
this that shows me the graph of concepts I need to learn before I can learn
X. I wish we had this kind of learning-plan graph for other fields as well.

------
elliptic
Anyone know of good papers relating to deep learning that are not about image
classification or speech/text recognition?

~~~
ninjin
"Recursive Deep Models for Semantic Compositionality Over a Sentiment
Treebank" by Socher et al. (2013) is a personal favourite of mine.

[http://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf](http://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf)

"Linguistic Regularities in Continuous Space Word Representations" by Mikolov
et al. (2013) is also a nice read.

[http://research.microsoft.com/pubs/189726/rvecs.pdf](http://research.microsoft.com/pubs/189726/rvecs.pdf)

