
Hacker's Guide to Neural Networks - bernatfp
http://karpathy.github.io/neuralnets/
======
karpathy
Thanks for the upvotes! I'm a little conflicted about this being linked around
too much, because it is still very much a work in progress. I work on this guide
on the side because I think there's a lot of interest in these models out there,
and not enough explanations.

On a related note, some of you might also be interested in a Stanford CS class
I'll be co-teaching with my adviser next quarter. It will be focused primarily
on Convolutional Networks (but a lot of the learning machinery is generic):

CS231n: Convolutional Neural Networks for Visual Recognition
[http://vision.stanford.edu/teaching/cs231n/](http://vision.stanford.edu/teaching/cs231n/)

I hope to make a lot of the materials/code freely available so everyone can
follow along, and I will continue my work on this guide in parallel whenever I
can squeeze in time. (And I'd be happy to hear any feedback on the
content/style/presentation)

~~~
wodenokoto
Thank you for publicizing an early draft. It is really good and very helpful.

I think your explanations are great, but you seem to imply that the gradient of
3 for f(x,y) = xy with x = -2 and y = 3, derived w.r.t. x, is somehow the "best"
gradient (you call it the correct or actual gradient), but I don't understand
why this is better than any other positive number.

Wouldn't we optimize faster if we tugged with a force of 6 instead of 3?

    
    
      x = -2, y = 3
      dx = 3, dy = -2
      step = 0.0001
    
      f(x, y) = -6
    
      f(x + dx*step, y + dy*step) = -5.998...
    
      // Now if we double the gradient
      f(x + 2*dx*step, y + 2*dy*step) = -5.997...

~~~
Derander
I'm not the OP but:

The analytical gradient is actually a general tool used in multivariable
calculus. It's vector-valued: if you take the gradient of a function of three
variables, f(x, y, z), you get a 3-vector back. Vectors are
defined by two characteristics: a direction and a magnitude. The gradient's
direction is the direction of greatest increase and the magnitude of the
gradient is the instantaneous rate of change.

The gradient is being put to work here in order to optimize a function using a
process called gradient ascent. Intuitively it makes sense that in order to
optimize your function you'd want to "move" in the direction of greatest
increase at every step. However, the size of the step that you take is tricky.
As you point out, in this case, we can increase the objective function value
more if we double the step size. However, you're not actually doubling /the
gradient/.

If you look at the expression that you wrote, you should see:

    
    
    x + 2 * dx * step, y + 2 * dy * step.
    

What you've done is double the step multiplier, not the gradient itself (<dx,
dy>). This means that the optimization process jumps further at each step.
However, the step size that OP chose is somewhat arbitrary to begin with, so
it's not clear why any particular choice would be better or worse. The reason
the step size matters is that if your step size is too large, your optimization
process might jump right over an optimum or do something similarly bad -- there
are pathological functions like the Rosenbrock function [1] which are
notoriously hard to optimize with gradient methods. In practice, you'll often
choose your step size more intelligently based on a few different tests, or
vary it as the optimization process progresses.

In this particular instance, the surface that you're trying to optimize is
pretty simple, so basically any small step value will do the trick. It may take
a different number of steps to make the same amount of progress, but most
reasonable step sizes will keep pushing f upward.
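
To make the process concrete, here is a minimal gradient-ascent loop for
f(x, y) = xy in JavaScript (my own toy sketch, not code from the guide; the
fixed stepSize is an arbitrary choice):

    function f(x, y) { return x * y; }
    
    var x = -2, y = 3;    // starting point, f(x, y) = -6
    var stepSize = 0.001; // arbitrary fixed choice; tuning it is the hard part
    
    for (var i = 0; i < 100; i++) {
      // analytic gradient of f(x, y) = x * y is <y, x>
      var dx = y;
      var dy = x;
      x += stepSize * dx; // move in the direction of greatest increase
      y += stepSize * dy;
    }
    console.log(x, y, f(x, y)); // f has crept up from -6
    // Note: x * y is unbounded above, so this only illustrates the
    // update rule, not convergence to a maximum.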

[1]
[http://en.wikipedia.org/wiki/Rosenbrock_function](http://en.wikipedia.org/wiki/Rosenbrock_function)

------
antimora
I wanted to share this resource as well.

I started this free online book, "Neural Networks and Deep Learning"
([http://neuralnetworksanddeeplearning.com/](http://neuralnetworksanddeeplearning.com/)).
I think it has pretty good explanations and illustrations.

~~~
raphar
I've enjoyed the published chapters so far. When will you finish the missing
chapters?

------
jaza
Great guide - the only material I've ever read on the subject that hasn't
completely made my head hurt and left my brain unable to grok it. I'm not a
maths or ML guy, just a regular programmer. I've dabbled in NNs before, but
only to the extent of using some libraries as a "black box" to pass
parameters / training data to. No doubt there are many more people like me
(too many!).

My understanding after reading this guide is that a neural network is
essentially just a formula for guessing the ideal parameter(s) of another
formula, and for successively refining those parameters as more training data
is passed in. I already knew this "in theory" before, but now I think most of
the magic and mystery has been brushed off. Thanks karpathy!
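
To check my own understanding, here's a toy sketch of that idea (my own code,
not from the guide): guess the parameter w of the formula y = w * x, then
successively refine it as training data is passed in:

    // Fit w in y = w * x to training pairs by repeated small refinements.
    var data = [[1, 2], [2, 4], [3, 6]]; // [x, y] pairs generated by y = 2 * x
    var w = 0;                           // initial guess for the parameter
    var stepSize = 0.01;
    
    for (var iter = 0; iter < 200; iter++) {
      for (var i = 0; i < data.length; i++) {
        var x = data[i][0], y = data[i][1];
        var error = w * x - y;       // how wrong the current guess is
        w -= stepSize * error * x;   // nudge w downhill on 0.5 * error^2
      }
    }
    console.log(w); // approaches 2, the "ideal" parameter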

~~~
junto
I second this. I've always struggled with maths at this (relatively simple)
level.

However, seeing the code and then the related equation is teaching me
mathematics that I couldn't grok in school 20 years ago.

This is (for me) a much better way to explain it!

Thank you for sharing.

------
zackmorris
This is the first explanation of neural nets that has really clicked for me.
That's because it's written in the same explanatory tone that I would use in
person while trying to convey the concepts. I wish more people would write
like this.

I think of these things like lectures in college. Assuming that a) you have
all the context up to that point, b) you grok every step, and c) it doesn't
run beyond the length of your attention span - call it an hour - then you can
learn astonishing things in a matter of a few sessions. But if you miss a
step, and don't raise your hand because you think your question is stupid,
then you might as well pack up and go home.

I find that even though the time element of articles is removed, I still trip
over certain sections because the author was too lazy to anticipate my
thinking process, which forces me to search other sites for the answers, and
before you know it the impact of the writing has been completely diluted.

------
xanderjanz
[https://class.coursera.org/ml-007](https://class.coursera.org/ml-007)

Andrew Ng's Stanford course has been a godsend in laying out the mathematics
of machine learning. It would be a good next step for anybody who was
intrigued by this article.

------
bribri
Absolutely amazing. I wish I could be taught more math in terms of
programming. I hope more people make "Hacker's Guide to _"-style guides.

~~~
wodenokoto
There's a series of books called "Think ..." that aims for exactly this.

[http://www.greenteapress.com/thinkbayes/](http://www.greenteapress.com/thinkbayes/)
[http://www.greenteapress.com/index.html](http://www.greenteapress.com/index.html)

------
hoprocker
This is great. Always appreciate different approaches to NN and ML. I'm amazed
that a Stanford PhD candidate has the time to put this together (I won't tell
your advisor :-), but still, thank you!

------
ilyaeck
I saw the JS demos a few months ago and was blown away. It seems like the
community is really thirsty for NN tools and materials, and the way you are
going about this (interactive JS) is right on the money. Why not engage the
community to keep building the site up, then? You may want to look into open-
sourcing the entire site + the JS library; it may really pick up steam. Worst
case, it breaks apart (you can always reclaim it), but the exercise could be a
very interesting one.

------
sireat
This was a very good read.

My naive question: are the gates always so trivial, or are they usually
black boxes with an unknown function underneath (instead of a clearly defined
*, +, etc.)?

In other words, do we usually/always know what stands behind f(x,y)?

Otherwise, if f(x,y) = x*y, then obviously you could hardcode a heuristic that
just cranks x and y up to their max and gets max f(x,y).

That is, the question is why we should tweak the input slightly instead of
going all out.
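
For example, with two chained gates in the style of the guide (my sketch; the
circuit (x + y) * z with these values is the guide's own example), "crank
everything up" stops being obviously right:

    var forwardAddGate = function(x, y) { return x + y; };
    var forwardMultiplyGate = function(x, y) { return x * y; };
    
    var x = -2, y = 5, z = -4;
    var q = forwardAddGate(x, y);      // q = 3
    var f = forwardMultiplyGate(q, z); // f = -12
    
    // Cranking x and y up makes q bigger, but since z is negative that makes
    // f *smaller* -- whether a tweak helps depends on the rest of the
    // circuit, which is exactly what the gradient measures locally.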

------
breflabb
Inspiring! I have always been interested in AI and neural nets, but I have
been discouraged by the math; many educators who try to teach these topics
assume that their readers have the same math skills as themselves, and omit
the boring parts. It's actually the first time I've enjoyed reading about AI
and learning calculus at the same time! :) Keep up the great work!

------
infinitone
I was wondering the other day, since I have a CV project that could use neural
networks to solve some problems: how big a training dataset does one need? Is
there any analysis of the relationship between the size of the training data
and accuracy?

~~~
adam-_-
We've built a model to analyse the value of a CV. We had a dataset of 50,000
CVs.

Feel free to drop me a line if you'd like to discuss any further. I'd be
interested to hear what you're working on.

------
jdminhbg
I worked through the first few chapters on a flight yesterday and found this
to be exactly what I need -- my ability to read code and sort of intuit
algorithms vastly exceeds my ability to read and understand math. Thanks for
putting this together.

------
jimmaswell
"it is important to note that in the left-hand side of the equation above, the
horizontal line does not indicate division."

It pretty much is, though.
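
At least, it starts life as one: the notation comes from the limit of a
genuine quotient,

    \frac{\partial f(x, y)}{\partial x} = \lim_{h \to 0} \frac{f(x + h, y) - f(x, y)}{h}

so the horizontal line is a leftover of actual division; it just stops being a
literal division once the limit is taken.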

------
JoelHobson
I haven't finished it yet, but I find this much easier to understand than any
other article on neural networks that I've read.

------
ajtulloch
This is a great piece of work - thanks @karpathy.

------
thewarrior
This seems less like a biological neuron and more like a multi-level table
lookup.

------
midgetjones
This was a fantastic read, thanks so much!

Can anyone define 'perturbations' for me?

~~~
nekopa
I think it means small changes, but there is also this:
[http://en.m.wikipedia.org/wiki/Perturbation_theory](http://en.m.wikipedia.org/wiki/Perturbation_theory).
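
In the guide's context it's the tiny nudge used to probe a circuit
numerically, something like (my sketch of the guide's numerical-gradient
idea):

    function f(x, y) { return x * y; }
    
    var x = -2, y = 3;
    var h = 0.0001;                          // the perturbation
    var gradX = (f(x + h, y) - f(x, y)) / h; // ~3: how f responds to nudging x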

------
Elzair
Thank you karpathy. This is a great introduction.

------
sathya_vj
Thanks a lot for this!

------
therobot24
HN loves Neural Nets & Deep Learning - it's all I ever see in my RSS feed
(with regard to ML methods).

~~~
nojvek
For a guy who is not very good at Maths, this was a great explanation. Thanks.

