
Differentiation for Hackers - adamnemecek
https://github.com/MikeInnes/diff-zoo
======
tobmlt
This is really a nice thing to put out there. I want to add that AD (automatic
differentiation) can be a great jumping off point to other things. (It sure
was for me in my studies, anyway.) Here is the path I followed, though I am
sure there are many paths; I am too dumb and tired to dream up another right
now. Perhaps that is why I am so eager to put this all out here in a "gushy"
way -- my apologies:

First I implemented simple forward-mode AD in Python (with NumPy), handling
gradients and Hessians so you could apply AD to find extrema of constrained
variational problems (to design constrained B-spline geometry, for example, or
to construct equations of motion). Then I extended it to handle interval
analysis, which introduces a tiny bit of topology (fixed-point theorems and
such -- provably finding all solutions to nonlinear systems over some domain).

Then pick up declarative programming (unification) and revisit overloading to
build up a mathematical expression tree for anything you have implemented. In
detail: overload the math so that, for an object which is a mathematical
combination of two other objects, the new object stores some representation of
the two original objects along with the operation used to combine them.
Remember to track your automatically generated connectives! (All expressions
are ternary; n-ary expressions get broken out into ternary ones.)
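
A rough sketch of that overloading step, in pure Python with made-up names
(not the implementation described above):

    # Each combined object remembers its two operands and the connective used.
    class Node:
        def __init__(self, op, args=(), value=None):
            self.op = op        # 'var', 'const', '+', '*', ...
            self.args = args    # the two original objects (empty for leaves)
            self.value = value  # name or constant for leaf nodes

        def __add__(self, other):
            return Node('+', (self, _lift(other)))

        def __mul__(self, other):
            return Node('*', (self, _lift(other)))

        def __repr__(self):
            if self.op in ('var', 'const'):
                return str(self.value)
            return f"({self.args[0]} {self.op} {self.args[1]})"

    def _lift(x):
        # Wrap raw numbers so everything in the tree is a Node.
        return x if isinstance(x, Node) else Node('const', value=x)

    x = Node('var', value='x')
    expr = x * x + 3.0
    print(expr)   # ((x * x) + 3.0)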

Then, starting at the final "node" of this mathematical tree, walk back down
the tree to build whatever you want. (Reverse-mode AD can be done this way,
and is a good thing to try.) I used it to automatically compile math down into
logical rules for constraint programming. Much fun!
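
Here is a minimal sketch of that reverse walk, again with illustrative names:
each node records its parents and the local derivatives of the operation that
produced it, and the backward pass starts at the final node and accumulates
adjoints in reverse topological order. (Not the constraint-compiling tool
described above.)

    class Var:
        def __init__(self, val, parents=()):
            self.val = val
            self.parents = parents   # tuples of (parent node, local derivative)
            self.grad = 0.0

        def __add__(self, other):
            return Var(self.val + other.val, ((self, 1.0), (other, 1.0)))

        def __mul__(self, other):
            return Var(self.val * other.val,
                       ((self, other.val), (other, self.val)))

    def backward(out):
        # Order the graph topologically, then walk it in reverse from the output.
        order, seen = [], set()
        def visit(n):
            if id(n) not in seen:
                seen.add(id(n))
                for parent, _ in n.parents:
                    visit(parent)
                order.append(n)
        visit(out)
        out.grad = 1.0
        for n in reversed(order):
            for parent, local in n.parents:
                parent.grad += local * n.grad

    x, y = Var(2.0), Var(3.0)
    z = x * y + x           # the final "node" of the tree
    backward(z)
    print(x.grad, y.grad)   # 4.0 2.0  (dz/dx = y + 1, dz/dy = x)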

Now re-do it all in C++, but use overloading to build a math lib to support
everything, or use Eigen or some other expression-template lib. (I'm not done
with this.)

Ah, expression-template libraries are another instance of "reverse mode
style" expression tree computation, a.k.a. the "interpreter pattern". Good fun
here, but more involved, I think.

------
mlevental
here's a very readable survey paper

[http://jmlr.org/papers/volume18/17-468/17-468.pdf](http://jmlr.org/papers/volume18/17-468/17-468.pdf)

------
Bostonian
What do automatic differentiation tools do for functions with integer
arguments?

~~~
osipov
Great question. Both TensorFlow and PyTorch pretty much do nothing:
[https://github.com/tensorflow/tensorflow/issues/20524](https://github.com/tensorflow/tensorflow/issues/20524)
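
For a concrete sense of the PyTorch side (behaviour as I understand it in
recent versions; treat the specifics as an assumption, not a spec):

    # What PyTorch does with integer inputs (illustrative check).
    import torch

    try:
        torch.tensor([1, 2, 3], requires_grad=True)   # integer dtype
    except RuntimeError as err:
        print(err)   # only floating point / complex tensors can require gradients

    # Integer arguments that merely select or label (like a `dim`) simply get
    # no gradient; differentiation happens only w.r.t. floating-point tensors.
    x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
    y = x.sum(dim=0)      # `dim` is an integer argument, never differentiated
    y.backward()
    print(x.grad)         # tensor([1., 1., 1.])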

~~~
mcabbott
That seems strange, if I'm reading it correctly.

In the author's packages, integers which label dimensions etc. have gradient
`nothing`, but arrays which happen to contain integers do not signal anything:

    julia> Zygote.gradient((x,d)->abs(sum(sin,fft(x,d))), [2 2; 0 0], 2)
    (Complex{Float64}[-0.346356+0.0im 1.65364+0.0im; -2.0+0.0im 0.0+0.0im], nothing)

------
adamnemecek
Autodiff is insane. Kids waste how much time in school doing analytical
calculus like troglodytes.

~~~
gugagore
It's good to question what we teach. Like I never learned any methods for
computing a square root, to arbitrary precision, whereas my parents did.

But autodiff doesn't replace symbolic differentiation. Consider the derivative
of cuberoot(x^3). Autodiff struggles at x=0 because x^3 is flat there (slope
0) and cuberoot is vertical (slope inf), so the chain rule multiplies 0 by inf
and you get slope NaN -- even though the composition simplifies to x, whose
slope is 1.
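
A quick arithmetic check of where that NaN comes from, with plain Python
floats and the chain rule written out by hand:

    # Chain rule at x = 0 for cuberoot(x**3), written out by hand.
    import math

    x = 0.0
    d_inner = 3 * x**2        # slope of x**3 at 0: flat, i.e. 0.0
    d_outer = math.inf        # slope of cuberoot at 0: vertical, i.e. +inf
    print(d_outer * d_inner)  # nan -- even though d/dx cuberoot(x**3) = 1 at 0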

If you wanted to try to fix that problem within autodiff, well, first off it's
likely not worth it, because similar problems are inherent to autodiff; and in
any case you'd need an analytic understanding of what's going on.

~~~
adamnemecek
I'm out right now but this doesn't sound right. How does autodiff struggle
with it?

~~~
tobmlt
The derivative of a fractional power (fraction less than 1) involves a
negative fractional power, and 0 raised to a negative power involves dividing
by zero. This gives you some kind of NaN in, e.g., vanilla languages such as
Python. The "issue with automatic differentiation" is really an issue with
dividing by zero. Graph x ** (1./3.): its slope is vertical at zero.

~~~
gugagore
I don't think this is right. An autodiff package will likely report +inf for
the derivative of cuberoot at 0.

The problem is really that there are a lot of ways the function can have
infinite slope. The sqrt function also has infinite slope at 0.

Dividing by zero isn't where the NaN comes from.

~~~
tobmlt
My mistake then. I did not intend to try and be so precise. I was thinking
loosely. I have a very simple autodiff tool hand rolled (in Python) here to
play with. I raise 0 to a negative exponent in the gradient of the cube root
at 0. I do not get a NAN. Strictly speaking I get this:

"ZeroDivisionError: 0.0 cannot be raised to a negative power"

Is that satisfying? I do not claim anything more specific here. Sorry to be
imprecise.

