
Hacker's guide to Neural Networks (2012) - catherinezng
http://karpathy.github.io/neuralnets/
======
frenchie4111
I've read so many of these, none of them include the information I need.

If someone wrote a "Hacker's guide to Tuning Hyperparameters" or "Hacker's guide
to building models for production" I would read/share the shit out of those.

~~~
albertzeyer
The problem is that this really is different for each task. It's really hard to
state any generic rules of thumb. Everyone has their own default parameters
and intuitions, but I would say most of them are heavily biased toward the tasks
they have worked with. For example, I work with deep bidirectional LSTMs on
acoustic modeling for speech recognition, and I use Adam with a starting
learning rate of 0.0005, Newbob learning rate scheduling, pretraining, dropout
of 0.1 in addition to L2 regularization, and so on. I tried to collect these
results here: [https://www-i6.informatik.rwth-aachen.de/publications/downlo...](https://www-i6.informatik.rwth-aachen.de/publications/download/1030/Zeyer-ICASSP-2017.pdf)
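
If it helps, here's roughly what that kind of setup looks like in PyTorch (an untested sketch; the model size and the weight_decay value are placeholders, not the exact numbers from the paper):

    import torch
    import torch.nn as nn

    # Toy stand-in for the acoustic model: a deep bidirectional LSTM stack
    # with dropout of 0.1 between layers.
    model = nn.LSTM(input_size=40, hidden_size=512, num_layers=4,
                    bidirectional=True, dropout=0.1, batch_first=True)

    # Adam with a starting learning rate of 0.0005; L2 regularization via
    # weight_decay (the 1e-5 here is a placeholder).
    optimizer = torch.optim.Adam(model.parameters(), lr=5e-4, weight_decay=1e-5)

    # Newbob-style scheduling: cut the learning rate in half whenever the
    # dev-set metric stops improving.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, factor=0.5, patience=1)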

~~~
nomel
I know nothing about any of this, but has there been any work on using
neural networks to guide and vary the parameter values during training?

~~~
chriskanan
There has been some work related to what you are describing:

 _Learning to learn by gradient descent by gradient descent_
[https://arxiv.org/abs/1606.04474](https://arxiv.org/abs/1606.04474)

Related (but less so), there are also some papers about learning neural
network architectures:

 _Designing Neural Network Architectures using Reinforcement Learning_
[https://arxiv.org/abs/1611.02167](https://arxiv.org/abs/1611.02167)

 _Neural Architecture Search with Reinforcement Learning_
[https://arxiv.org/abs/1611.01578](https://arxiv.org/abs/1611.01578)

------
NegatioN
This has been submitted quite a few times in the past:
[https://hn.algolia.com/?query=karpathy.github.io%2Fneuralnet...](https://hn.algolia.com/?query=karpathy.github.io%2Fneuralnets&sort=byPopularity&prefix&page=0&dateRange=all&type=story)

~~~
sillysaurus3
Aw. And I was hoping Chapters 3 and 4 would be finished sometime soonish. They're
the only parts of the guide that you can learn from by example.

~~~
desku
I don't think it'll ever be finished.

~~~
nsthorat
Yep, Andrej leads Tesla Autopilot now. Doubt he'll be following up here.

~~~
karpathy
yep :(

~~~
ashwinp92
I guess ancestor commenters need only wait for a weekend now :)

------
postit
A solid grounding in probability theory and multivariate calculus is the first
thing you should spend your time on if you want to understand NNs, ML, and most
of AI.

These hacker guides only scratch the surface of the subject, which in part
contributes to the aura of black magic that haunts the field. I'm not saying
they're a bad thing, but they should be complementary material, not the main
path.

------
stared
When it comes to backpropagation, the PyTorch introduction contains some valuable
parts:
[http://pytorch.org/tutorials/beginner/deep_learning_60min_bl...](http://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html)
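
For example, the basic backprop mechanics fit in a few lines (a minimal sketch using PyTorch's autograd):

    import torch

    # A tiny "circuit": f(x, y) = x * y + x, with gradient tracking enabled.
    x = torch.tensor(2.0, requires_grad=True)
    y = torch.tensor(-3.0, requires_grad=True)

    f = x * y + x
    f.backward()      # backpropagation fills in df/dx and df/dy

    print(x.grad)     # df/dx = y + 1 = -2
    print(y.grad)     # df/dy = x = 2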

------
debacle
Static neural networks on Rosetta Code for basic things like Hello World, etc,
would do a lot to aid in people's understanding of neural networks. It would
be interesting to visualize different trained solutions.
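
Even a hand-wired XOR net would be instructive, e.g. (a rough sketch in Python; the weights are just one of many possible solutions):

    import numpy as np

    def step(x):
        return (x > 0).astype(float)

    def xor_net(a, b):
        """A 'static' 2-2-1 network with fixed weights that computes XOR."""
        x = np.array([a, b], dtype=float)
        # Hidden layer: one unit fires for OR(a, b), the other for AND(a, b).
        h = step(np.array([[1.0, 1.0], [1.0, 1.0]]) @ x + np.array([-0.5, -1.5]))
        # Output: OR and not AND, i.e. XOR.
        return step(np.array([1.0, -2.0]) @ h - 0.5)

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, int(xor_net(a, b)))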

------
nategri
Knew this wasn't for me when he had to introduce what a derivative was with a
weird metaphor. I like this approach to teaching things (it's Feynman-y) but
half the time I end up hung up on trying to understand a particular author's
hand-waving for a concept I already grok.

------
adamkochanowicz
Thank you for posting this! I hadn't seen it and have been looking for a
simple guide like this one.

------
finchisko
Thanks for sharing, apparently I missed the past submissions.

------
amelius
Hmm, I've just scanned through this, but it seems this gets the concept of
stochastic gradient descent (SGD) completely wrong.

The nice part of SGD is that you can backpropagate even through functions that
are not differentiable.

This is totally missed here.

~~~
kleebeesh
How do you compute the gradient of a non-differentiable function? I'm not an
expert, but that contradicts everything I've learned about gradient descent.

~~~
Houshalter
Pseudo-gradients are a thing. You can pretend it's a continuous function and
get gradients that push it in the right direction.
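
The straight-through estimator is the classic trick: use the hard, non-differentiable function in the forward pass and pretend it was the identity in the backward pass. A rough PyTorch sketch:

    import torch

    class SignSTE(torch.autograd.Function):
        """sign() going forward, but the gradient passes straight through."""

        @staticmethod
        def forward(ctx, x):
            return torch.sign(x)

        @staticmethod
        def backward(ctx, grad_output):
            # Pretend sign() was the identity: pass the gradient through unchanged.
            return grad_output

    x = torch.randn(5, requires_grad=True)
    loss = SignSTE.apply(x).sum()
    loss.backward()
    print(x.grad)  # all ones, as if sign() had been the identity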

------
GoldDust
As someone who is quite new to this field and also a software developer, I
really look forward to seeing this progress. I write and look at code all day,
so for me this is much easier to read than the dry math!

------
du_bing
Wonderful guide, thanks for sharing!

