
Show HN: minigrad – a minimal, educational autograd library (~100 loc in Python) - jellyksong
https://github.com/kennysong/minigrad
======
paperwork
I love this. The code is simple and documented. However, whenever I’ve tried
to understand autograd, I get stuck at dual numbers.

As a programmer, I understand building up a computation graph where each node
is some sort of an elementary function which knows how to take its own
gradient. So a constant/scalar node has derivative/gradient of zero, x^n has
derivative of nx^(n-1), and so on. These gradients are passed from the end to
the beginning according to the chain rule.

However, autograd is not supposed to be the symbolic differentiation we
learned in high school.

This project doesn’t seem to have anything to do with duals...confused!

~~~
jellyksong
Author here:

There are two ways to implement autograd, reverse-mode and forward-mode.
Reverse-mode is what minigrad uses, and what most ML libraries these days use
by default, since it computes gradients of all inputs (wrt one output) in a
single pass. It's exactly what you describe in the 2nd paragraph.
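
To make that concrete, here's a rough sketch of that mechanism on scalars
(an illustration only, not minigrad's actual API): each node records its
parents and the local derivative wrt each one, and backward() pushes
gradients from the output back to the inputs via the chain rule.

    # Minimal reverse-mode sketch (not minigrad's real code)
    class Var:
        def __init__(self, value, parents=()):
            self.value = value
            self.parents = parents  # list of (parent_node, local_gradient)
            self.grad = 0.0

        def __add__(self, other):
            # d(a+b)/da = 1, d(a+b)/db = 1
            return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

        def __mul__(self, other):
            # d(a*b)/da = b, d(a*b)/db = a
            return Var(self.value * other.value,
                       [(self, other.value), (other, self.value)])

        def backward(self, upstream=1.0):
            # accumulate the upstream gradient, then push it to the parents
            self.grad += upstream
            for parent, local_grad in self.parents:
                parent.backward(upstream * local_grad)

    x, y = Var(3.0), Var(4.0)
    z = x * y + x
    z.backward()           # one backward pass from the single output
    print(x.grad, y.grad)  # 5.0 3.0

(A real implementation would walk the graph in topological order so each node
is visited once, but the accumulation is the same idea.)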

Forward-mode autograd is the technique that can use dual numbers. It computes
all gradients of one input (wrt all outputs) in a single pass. Dual numbers are
a pretty neat mathematical trick, but I'm not aware of anyone who actually
uses them to compute gradients.
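
For comparison, here's a rough sketch of the dual-number trick (again just an
illustration, not anyone's real library code): each value carries its
derivative alongside it, and the arithmetic rules apply the chain rule as the
computation runs forward.

    # Minimal forward-mode sketch with dual numbers (not from minigrad)
    class Dual:
        def __init__(self, value, deriv=0.0):
            self.value = value  # a  in  a + b*eps
            self.deriv = deriv  # b, the derivative carried with the value

        def __add__(self, other):
            return Dual(self.value + other.value, self.deriv + other.deriv)

        def __mul__(self, other):
            # product rule: (uv)' = u'v + uv'
            return Dual(self.value * other.value,
                        self.deriv * other.value + self.value * other.deriv)

    def f(x, y):
        return x * y + x

    # seed x's derivative with 1.0 to get df/dx at (3, 4) in one forward pass
    out = f(Dual(3.0, 1.0), Dual(4.0, 0.0))
    print(out.value, out.deriv)  # 15.0 5.0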

The most approachable explanation of dual numbers I've seen is in Aurelien
Geron's book Hands-On Machine Learning (Appendix D). There are articles online
online, but I found them more technical.

Thanks for checking out the project!

~~~
paperwork
Thanks for the reply. I do have that book, but haven’t made my way to the
appendix yet :)

Your explanation clears things up quite a lot.

Thanks and congrats on a useful and cool project!

