
Hacker's guide to Neural Networks (2012) - headalgorithm
http://karpathy.github.io/neuralnets/
======
theCricketer
Since we're on the topic of tutorials to understand neural nets and modern
deep learning, I will throw in Michael Nielsen's excellently written free
online "book" on neural nets. It's really a set of six long posts that takes
you from zero to understanding all of the fundamentals, with almost no
prerequisite math needed.

Using clear, easy-to-understand language, Michael explains neural nets, the
backpropagation algorithm, the challenges in training these models, some
commonly used modern building blocks, and more:

[http://neuralnetworksanddeeplearning.com/](http://neuralnetworksanddeeplearning.com/)

This book opened my eyes to the power of textbooks written in such a clear,
easy-to-understand style. I bet it took repeated revisions, feedback from
others, and hours of work, but such writing is a huge value add to the world.
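
For a taste of those fundamentals: the core move in both the book and the
guide is reasoning about gradients. A minimal sketch (my own toy example, not
taken from the book) that checks the gradient of the circuit f(x, y) = x * y
numerically:

```python
# Numerical gradient check on a toy "circuit" f(x, y) = x * y.
def f(x, y):
    return x * y

def numerical_gradient(x, y, h=1e-5):
    # Central differences approximate df/dx and df/dy.
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return dfdx, dfdy

dfdx, dfdy = numerical_gradient(-2.0, 3.0)
# Analytic gradients are df/dx = y = 3 and df/dy = x = -2,
# which the numerical estimates should match closely.
```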

~~~
f00_
+1, I made this repo a while ago as a learning exercise, basically a
copy-paste of chapter 1:
[https://github.com/f00-/numpy-mnist-nn](https://github.com/f00-/numpy-mnist-nn)

------
jeraguilon
Great post that I often go back to. A curious fact about Karpathy is that he
has a long history of teaching (relative to his age). About 9 years ago, I
learned how to speed-solve Rubik's cubes in ~12 seconds through his YouTube
channel [0]. It's interesting to see that his simple teaching style transfers
quite well to topics more technical than twisty puzzles.

[0]
[https://www.youtube.com/user/badmephisto](https://www.youtube.com/user/badmephisto)

~~~
Rainymood
WHAT?!

Badmephisto == Andrej Karpathy?!

I would never have made the connection ... badmephisto also got me into
speedcubing; my PB is ~14 seconds. Crazy ...

------
yorwba
(2012)

Previous submissions:

[https://news.ycombinator.com/item?id=14769525](https://news.ycombinator.com/item?id=14769525)

[https://news.ycombinator.com/item?id=9249924](https://news.ycombinator.com/item?id=9249924)

[https://news.ycombinator.com/item?id=8553307](https://news.ycombinator.com/item?id=8553307)

------
freediver
From a student to the director of AI at one of the most innovative companies
on Earth in, what, 4 years? Must be one of the greatest untold stories.

~~~
jaimex2
The story will be told soon enough, once his work has changed the world; it's
very much in its first chapters, I would imagine.

------
sdan
Are there any updates to this article? This guide has been well known among ML
practitioners for a while now.

~~~
otaviogood
I think this eventually turned into Andrej Karpathy's Stanford class, CS231n.
The class notes are here:
[http://cs231n.github.io/](http://cs231n.github.io/), and the class is on
YouTube. If you like this hacker's guide, I think you'll definitely like the
class and the notes. Edit: a lot of the compute-graph and backprop material
from the hacker's guide is covered in this specific class, starting at about
this timestamp:
[https://www.youtube.com/watch?v=i94OvYb6noo&t=207s](https://www.youtube.com/watch?v=i94OvYb6noo&t=207s)
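
For a flavor of that compute-graph material, here is a rough Python version
of the kind of tiny example the class works through (the specific numbers are
illustrative): a forward and backward pass through f = (x + y) * z.

```python
# Forward/backward pass through the tiny graph f = (x + y) * z.
x, y, z = -2.0, 5.0, -4.0

# Forward pass, gate by gate.
q = x + y          # add gate: q = 3
f = q * z          # mul gate: f = -12

# Backward pass (chain rule, gate by gate).
dfdq = z           # mul gate: df/dq = z
dfdz = q           # mul gate: df/dz = q
dfdx = dfdq * 1.0  # add gate routes the gradient through unchanged
dfdy = dfdq * 1.0
```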

------
peteretep
I've made a lot of progress in my mental models recently by implementing the
perceptron in Excel
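
The update rule involved is small enough to fit in a few spreadsheet cells; a
rough Python equivalent of the same cell-by-cell arithmetic (the AND function
is chosen here purely for illustration):

```python
# Perceptron learning rule on the AND function -- the same
# per-cell updates one could lay out row by row in a spreadsheet.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = [0.0, 0.0]
b = 0.0
lr = 0.1

for _ in range(20):                     # a few epochs suffice here
    for (x1, x2), target in data:
        out = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
        err = target - out              # -1, 0, or +1
        w[0] += lr * err * x1
        w[1] += lr * err * x2
        b += lr * err

preds = [1 if w[0] * x1 + w[1] * x2 + b > 0 else 0 for (x1, x2), _ in data]
# preds now reproduces the AND truth table.
```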

~~~
gnulinux
I've made similar progress by implementing some ML algorithms in pure RISC-V
assembly. Makes you think.

------
asnyc
There is at least one newer version (from August 2017):
[https://www.youtube.com/watch?v=vT1JzLTH4G4&list=PLC1qU-LWwr...](https://www.youtube.com/watch?v=vT1JzLTH4G4&list=PLC1qU-LWwrF64f4QKQT-Vg5Wr4qEE1Zxk)

------
amelius
There's a lot of high-school math there, but the trouble is that the real
workings of neural networks (the speed of convergence, and why/whether they
work on samples outside the training/validation set) are left a mystery, if
you ask me.

~~~
orbifold
It is relatively clear why it works beyond the training and validation sets:
what is being approximated is a smooth function, which in the case of a
classification task is a function from the space of things to be classified
(images of a certain size) to the n-simplex, where n is the number of classes.
The preimage theorem then tells you that over a regular point of this smooth
map lies a codimension-n submanifold of the space of things to be classified.
That, in turn, can be interpreted as the submanifold of all things that look
like the class being assigned, especially close to the corners of the
n-simplex (being a regular point is an open condition).

In short: because the map is constructed to be smooth, it will make sense
beyond whatever the training/validation data was. Note that this does not
guarantee that it has learned something reasonable about the dataset, only
that it will have found some way to smoothly separate it into different
components.
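
Concretely, the "map to the n-simplex" is what a softmax output layer gives
you. A small Python sketch (logits made up for illustration) showing that
softmax outputs are nonnegative and sum to one, i.e. they land on the simplex:

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

p = softmax([2.0, 1.0, 0.1])
# Each output is a point on the n-simplex: all entries are
# nonnegative and they sum to 1, ordered like the input logits.
```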

~~~
ohfunkyeah
Some days I think I'm smart, then I read stuff like this. I wish there were an
ELI5 plugin for Chrome that I could use to tag any paragraph, and someone
would come along and break it down for me. I'd do it for your comment!

~~~
yorwba
Maybe the ELI5 plugin could be reasonably approximated by adding links to all
phrases with corresponding Wikipedia articles, like:

[https://en.wikipedia.org/wiki/Validation_set](https://en.wikipedia.org/wiki/Validation_set)

[https://en.wikipedia.org/wiki/Smooth_function](https://en.wikipedia.org/wiki/Smooth_function)

[https://en.wikipedia.org/wiki/N-simplex](https://en.wikipedia.org/wiki/N-simplex)

[https://en.wikipedia.org/wiki/Preimage_theorem](https://en.wikipedia.org/wiki/Preimage_theorem)

[https://en.wikipedia.org/wiki/Codimension](https://en.wikipedia.org/wiki/Codimension)

[https://en.wikipedia.org/wiki/Submanifold](https://en.wikipedia.org/wiki/Submanifold)

I guess it depends on whether you're willing to read a textbook's worth of
Wikipedia articles. (There is no shortcut. Actual 5-year-olds also require a
few years until they can understand advanced mathematics.)

------
lettergram
I actually wrote a similar-style guide just last week (I had some time off and
had been meaning to do it):

[https://austingwalters.com/neural-networks-to-production-fro...](https://austingwalters.com/neural-networks-to-production-from-an-engineer/)

It uses up-to-date Keras and Python, and doesn't go as deep into the network
connections themselves.

I do regular trainings and teach seminars on neural networks, and I find that
most tutorials online go too in-depth on constructing a network from scratch
(as this one does); they lose people.

The biggest issues today are actually data formatting, ingestion, and then
hyperparameter tuning. You really only need to grasp the basics to get started
in 2019.
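
By "hyperparameter tuning" I mean the kind of sweep below; a deliberately tiny
Python sketch (the data, model, and candidate learning rates are all made up)
that picks a learning rate by validation loss:

```python
# Toy hyperparameter sweep: choose a learning rate for 1-D linear
# regression by comparing validation loss across candidates.
train = [(x, 2.0 * x) for x in range(1, 6)]     # true slope is 2
val = [(6.0, 12.0), (7.0, 14.0)]

def fit(lr, steps=100):
    # Plain gradient descent on mean squared error, single weight w.
    w = 0.0
    for _ in range(steps):
        g = sum(2 * (w * x - y) * x for x, y in train) / len(train)
        w -= lr * g
    return w

def val_loss(w):
    return sum((w * x - y) ** 2 for x, y in val) / len(val)

best_lr = min([0.001, 0.01, 0.05], key=lambda lr: val_loss(fit(lr)))
# The winning rate recovers the true slope almost exactly.
```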

