Hacker's guide to Neural Networks (2012) (karpathy.github.io)
383 points by headalgorithm 6 months ago | 24 comments

Since we're on the topic of tutorials to understand neural nets and modern deep learning, I will throw in Michael Nielsen's excellently written free online "book" on neural nets. It's really a set of 6 long posts that gets you from 0 to understanding all of the fundamentals with almost no prerequisite math needed.

Using clear and easy to understand language, Michael explains neural nets, the backprop algorithm, challenges in training these models, some commonly used modern building blocks and more:


This book opened my eyes to the power of textbooks written in such a clear, easy-to-understand style. I bet it took repeated revisions, feedback from others, and many hours of work, but writing like that is a huge value add to the world.

+1, I made this repo a while ago as a learning exercise, basically a c+p of chapter 1: https://github.com/f00-/numpy-mnist-nn

Great post that I often go back to. A curious fact about Karpathy is that he actually has a long history of teaching (relative to his age). About 9 years ago, I learned how to speed-solve Rubik's cubes in ~12 seconds through his YouTube channel [0]. It's interesting to see that his simple teaching style transfers quite well to topics more technical than twisty puzzles.

[0] https://www.youtube.com/user/badmephisto


Badmephisto == Andrej Karpathy?!

I would've never made the connection ... badmephisto also got me into speedcubing, my pb is ~14 sec, crazy ...

Thanks for that.

Not as educational but funny to see him getting owned in WoW.


From a student to the director of AI at one of the most innovative companies on Earth in, what, 4 years? Must be one of the greatest untold stories.

The story will be told soon enough once his work has changed the world; it's very much in its first chapters, I would imagine.

Any new updates to this article? This guide has been well known among ML practitioners for a while now.

I think this eventually turned into Andrej Karpathy's class at Stanford, CS231n. The class notes are here: http://cs231n.github.io/ The class is on youtube. If you like this hacker's guide, I think you'll definitely like the class and the notes. edit: A lot of the compute graph and backprop type stuff that is in the hacker's guide is covered in this specific class, starting about at this time: https://www.youtube.com/watch?v=i94OvYb6noo&t=207s
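The compute-graph backprop idea covered in the guide and in CS231n can be sketched in a few lines: each node stores a value, a gradient, and a closure that pushes gradients to its inputs via the chain rule. This is a minimal illustrative sketch, not the guide's actual code; the `Node` class and its method names are made up for this example.

```python
# Minimal scalar compute graph with reverse-mode autodiff (a sketch).
class Node:
    def __init__(self, value, parents=(), backward=lambda: None):
        self.value = value
        self.grad = 0.0
        self.parents = parents
        self._backward = backward

    def __add__(self, other):
        out = Node(self.value + other.value, (self, other))
        def backward():
            self.grad += out.grad           # d(a+b)/da = 1
            other.grad += out.grad          # d(a+b)/db = 1
        out._backward = backward
        return out

    def __mul__(self, other):
        out = Node(self.value * other.value, (self, other))
        def backward():
            self.grad += other.value * out.grad   # d(a*b)/da = b
            other.grad += self.value * out.grad   # d(a*b)/db = a
        out._backward = backward
        return out

    def backprop(self):
        # Topologically sort the graph, then apply the chain rule backwards.
        order, seen = [], set()
        def visit(n):
            if n not in seen:
                seen.add(n)
                for p in n.parents:
                    visit(p)
                order.append(n)
        visit(self)
        self.grad = 1.0
        for n in reversed(order):
            n._backward()

# f = (a + b) * a = a^2 + a*b, so df/da = 2a + b and df/db = a.
a, b = Node(3.0), Node(2.0)
f = (a + b) * a
f.backprop()
print(f.value, a.grad, b.grad)  # 15.0 8.0 3.0
```

Note how `a.grad` accumulates contributions from both paths through the graph, which is exactly the subtlety the lectures spend time on.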

I've made a lot of progress in my mental models recently by implementing the perceptron in Excel
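For anyone who'd rather not fire up Excel, here is a sketch of the same computation in numpy: the classic perceptron update rule on a toy AND dataset. The dataset, learning rate, and epoch count are arbitrary choices for illustration.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])  # logical AND (linearly separable)

w = np.zeros(2)
b = 0.0
lr = 0.1

for epoch in range(50):
    for xi, target in zip(X, y):
        pred = 1 if xi @ w + b > 0 else 0
        # Update only on mistakes: w <- w + lr * (target - pred) * x
        w += lr * (target - pred) * xi
        b += lr * (target - pred)

preds = [1 if xi @ w + b > 0 else 0 for xi in X]
print(preds)  # [0, 0, 0, 1]
```

Because AND is linearly separable, the perceptron convergence theorem guarantees this loop stops making mistakes after finitely many updates.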

I've made a similar progress by implementing some ML algorithms in pure RISC-V assembly. Makes you think.

There is a (at least one more) newer version: (from 2017 August) https://www.youtube.com/watch?v=vT1JzLTH4G4&list=PLC1qU-LWwr...

There's a lot of high-school math there, but the trouble is that the real workings of neural networks (the speed of convergence, and why/whether they work on samples outside the training/validation set) are left a mystery, if you ask me.

It is relatively clear why it works beyond the training and validation set: What is being approximated is a smooth function, which in the case of a classification task is a function from the space of things to be classified (images of a certain size) to the n-simplex, where n is the number of classes. Then the preimage theorem tells you that over a regular point of this smooth map lies a codimension n submanifold in the space of things to be classified. That in turn can be interpreted as the submanifold of all things that look like the class you are assigning, especially close to the corners of the n-simplex (being a regular point is an open condition). In short: Because the map is constructed to be smooth it will make sense beyond whatever the training / validation data was. Note that this does not guarantee that it has learned something reasonable about the dataset, just that it will have found some way to smoothly separate it into different components.
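The "smooth map to the n-simplex" framing above can be checked numerically: a small randomly initialized network with tanh and softmax is a C-infinity map, its outputs are nonnegative and sum to 1 (a point on the simplex), and nearby inputs land on nearby points. The architecture and sizes below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)) * 0.5, np.zeros(8)
W2, b2 = rng.normal(size=(3, 8)) * 0.5, np.zeros(3)  # n = 3 classes

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def f(x):
    # tanh and softmax are both smooth, so f is a smooth map R^4 -> 2-simplex
    return softmax(W2 @ np.tanh(W1 @ x + b1) + b2)

x = rng.normal(size=4)
p = f(x)
print(p.sum())  # ~1.0: the output lies on the 2-simplex

# Smoothness in practice: a tiny input perturbation moves the output only a little.
eps = 1e-4
delta = np.linalg.norm(f(x + eps) - p)
print(delta)  # small, on the order of eps
```

Of course, as the parent says, this shows the map extrapolates smoothly, not that it extrapolates *reasonably*.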

The question is why kernel methods (with appropriate smoothness-inducing kernels) and, relatedly, Gaussian processes don't do as well. I am being deliberately presumptuous; it is not a proven result that RKHS-based methods are dominated by DNNs. These two approaches, RKHS and DNNs, aren't that fundamentally different, and we have known that since well before the recent explosion of interest in DNNs. That infinite-width DNNs converge to Gaussian processes is an old result.

Of course there are differences: once you choose a kernel, the feature map is set in stone, whereas DNNs can search over the space of feature maps. It is likely that the function a DNN converged to was already in the Hilbert space associated with the kernel, but that does not mean one would converge to it with a finite amount of training data.

Once one goes to infinite dimensions all separable Hilbert spaces are the same (up to isomorphisms) but with finite data it matters what basis / kernel one chooses.
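To make the "fixed feature map" point concrete, here is a sketch of RBF kernel ridge regression in plain numpy: once the kernel (here, the bandwidth `gamma`) is chosen, all learning reduces to solving one linear system in that fixed feature space. The target function, bandwidth, and regularization strength are arbitrary choices for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # k(a, b) = exp(-gamma * ||a - b||^2)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0])  # target function, noiseless for simplicity

lam = 1e-3
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)  # (K + lam*I) alpha = y

X_test = np.array([[0.5]])
pred = (rbf_kernel(X_test, X) @ alpha)[0]
print(pred, np.sin(0.5))  # close to each other
```

Contrast with a DNN: there the "kernel" (the learned representation) changes during training, which is the flexibility the parent comments are discussing.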

Some days I think I'm smart, then I read stuff like this. I wish there were an ELI5 plugin for Chrome that I could use to tag any paragraph, and someone would come along and break it down for me. I'd do it for your comment!

Maybe the ELI5 plugin could be reasonably approximated by adding links to all phrases with corresponding Wikipedia articles, like:







I guess it depends on whether you're willing to read a textbook's worth of Wikipedia articles. (There is no shortcut. Actual 5-year-olds also require a few years until they can understand advanced mathematics.)

You might be confusing "smart" with "knowledgeable". There's no way to be smart about concepts you haven't yet learned. And the ELI5 idea is cool - just need to find a business model :)

Yes, but these things should be in the guide, or at least given a lower-dimensional framing to make them understandable.

General popular opinion seems to be that these are (to greater or lesser extent) a mystery for everyone. Can you suggest any intermediate reading on things like generalisation? I've looked online but only found either "here's how to recognize numbers in the MNIST dataset using numpy" or "First we take <long string of squiggles> which trivially implies <longer string of squiggles>..."

That is because hyperparameters are indeed a mystery, and fine-tuning them is more a heuristics-based art than a science.

The Approximation Power of Neural Networks https://towardsdatascience.com/the-approximation-power-of-ne...

I actually wrote a guide in a similar style just last week (had some time off, had been meaning to do it):


It uses up-to-date Keras and Python and doesn't go as deep into the network connections themselves.

I do regular trainings and teach seminars on neural networks, and I find most tutorials online go too deep into constructing a network from scratch (such as this one) - they lose people.

Today the biggest issues are actually data formatting and ingestion, then hyperparameter tuning. You really only need to grasp the basics to get started in 2019.

