
Automasymbolic Differentiation - tel
http://jtobin.ca/blog/2014/07/06/automasymbolic-differentiation/
======
yummyfajitas
Am I the only one who thinks this is basically a computational implementation
of nonstandard analysis?

[edit: There was an excellent reply here which is now deleted, which said
essentially that it's not exactly nonstandard analysis. Nonstandard analysis
requires you to extend R to a field containing infinitesimals.

In contrast, all that is necessary here is extending R to an algebra with a
nilpotent element ε (with ε^2 = 0).

[http://en.wikipedia.org/wiki/Dual_number](http://en.wikipedia.org/wiki/Dual_number)
]
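
A minimal sketch of that algebra (plain Python, not code from the linked article): carry pairs a + b·ε with ε^2 = 0 through ordinary arithmetic, and the ε-coefficient of f(a + ε) comes out as f'(a).

```python
class Dual:
    """a + b*eps with eps**2 == 0; the eps part of f(a + eps) is f'(a)."""

    def __init__(self, real, eps=0.0):
        self.real, self.eps = real, eps

    def _lift(self, x):
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = self._lift(other)
        return Dual(self.real + other.real, self.eps + other.eps)

    __radd__ = __add__

    def __mul__(self, other):
        # (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps, since eps**2 = 0
        other = self._lift(other)
        return Dual(self.real * other.real,
                    self.real * other.eps + self.eps * other.real)

    __rmul__ = __mul__

def derivative(f, x):
    return f(Dual(x, 1.0)).eps

f = lambda x: x * x + 3 * x
print(derivative(f, 2.0))  # 7.0 == f'(2)
```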

------
idunning
There is a lot of support and interest in automatic differentiation [1] in the
Julia [2] community, partly because the language design makes it relatively
easy to do. In fact, there is a whole "organization" dedicated to AD packages,
JuliaDiff [3]. In particular there are packages for dual numbers and their
generalizations, as well as reverse-mode AD packages. ReverseDiffSparse.jl,
for example, uses some clever tricks, including graph coloring, to compute
sparse Hessian matrices efficiently.

You can use AD for more than just playing around, too, especially for
optimization (JuliaOpt [4]): Optim.jl will use it to calculate exact
derivatives if you don't provide them, and JuMP.jl will use it to calculate
the sparse Jacobian and Hessian matrices for a nonlinearly constrained
optimization problem (which can then be passed to a solver such as Ipopt.jl).

[1]: [http://en.wikipedia.org/wiki/Automatic_differentiation](http://en.wikipedia.org/wiki/Automatic_differentiation)

[2]: [http://julialang.org/](http://julialang.org/)

[3]: [http://juliadiff.org/](http://juliadiff.org/)

[4]: [http://juliaopt.org/](http://juliaopt.org/)

~~~
mlubin
To follow up on this, a lot of the overhead from reverse-mode AD can be
avoided by flattening out the expression graphs and compiling specialized
functions (at runtime). A number of the JuliaDiff packages support this
approach. This is not a new idea, but Julia's hooks to LLVM make this
surprisingly easy to do.
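
A rough sketch of that idea (plain Python rather than Julia/LLVM, with made-up names and a '+'/'*'-only tape, not any JuliaDiff API): trace the expression graph once via operator overloading, emit straight-line source for the forward and reverse sweeps, and compile it, so repeated gradient evaluations skip the graph walking entirely.

```python
import itertools

TAPE = []  # records (output, op, lhs, rhs) as the function is traced

class Var:
    _ids = itertools.count()

    def __init__(self):
        self.name = "t%d" % next(Var._ids)

    def __add__(self, other):
        out = Var()
        TAPE.append((out, "+", self, other))
        return out

    def __mul__(self, other):
        out = Var()
        TAPE.append((out, "*", self, other))
        return out

def compile_gradient(f, nargs):
    """Trace f on placeholder Vars (Var operands only in this sketch),
    then generate and compile a specialized gradient function."""
    del TAPE[:]
    xs = [Var() for _ in range(nargs)]
    y = f(*xs)
    lines = ["def grad(%s):" % ", ".join(v.name for v in xs)]
    # Forward sweep: straight-line code recomputing every intermediate.
    for out, op, a, b in TAPE:
        lines.append("    %s = %s %s %s" % (out.name, a.name, op, b.name))
    # Reverse sweep, done symbolically at "compile time": accumulate an
    # adjoint expression for every node, seeded with dy/dy = 1.
    adj = {y.name: "1.0"}
    for out, op, a, b in reversed(TAPE):
        d = adj.get(out.name)
        if d is None:
            continue
        if op == "+":
            contribs = [(a, d), (b, d)]
        else:  # product rule for '*'
            contribs = [(a, "(%s) * %s" % (d, b.name)),
                        (b, "(%s) * %s" % (d, a.name))]
        for v, expr in contribs:
            if v.name in adj:
                adj[v.name] = "%s + %s" % (adj[v.name], expr)
            else:
                adj[v.name] = expr
    lines.append("    return (%s)" %
                 ", ".join(adj.get(v.name, "0.0") for v in xs))
    namespace = {}
    exec(compile("\n".join(lines), "<generated>", "exec"), namespace)
    return namespace["grad"]

# f(x, y) = x*y + x*x has gradient (y + 2x, x).
g = compile_gradient(lambda x, y: x * y + x * x, 2)
print(g(3.0, 4.0))  # (10.0, 3.0)
```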

------
mjw
Automatic differentiation is very nifty and deserves to be more widely known.
Many machine learning practitioners will already know reverse-mode AD under a
different name, though: backpropagation.
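
To make the correspondence concrete, here is a tiny tape-based reverse-mode sketch (hypothetical Python, not any particular library): recording local partial derivatives during the forward pass and sweeping the tape backwards is exactly the backpropagation pass used to train neural networks.

```python
import math

class Tape:
    def __init__(self):
        self.ops = []  # (output_node, [(input_node, local_partial), ...])

class Scalar:
    def __init__(self, value, tape):
        self.value, self.tape, self.grad = value, tape, 0.0

    def _record(self, value, partials):
        out = Scalar(value, self.tape)
        self.tape.ops.append((out, partials))
        return out

    def __add__(self, other):
        return self._record(self.value + other.value,
                            [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return self._record(self.value * other.value,
                            [(self, other.value), (other, self.value)])

    def tanh(self):
        t = math.tanh(self.value)
        return self._record(t, [(self, 1.0 - t * t)])

def backprop(output):
    # Reverse sweep over the tape: push the output's adjoint back through
    # each recorded operation's local partial derivatives (the chain rule).
    output.grad = 1.0
    for out, partials in reversed(output.tape.ops):
        for node, partial in partials:
            node.grad += out.grad * partial

tape = Tape()
w, x, b = (Scalar(v, tape) for v in (0.5, 2.0, -1.0))
y = (w * x + b).tanh()  # a one-neuron "network"
backprop(y)
print(w.grad, b.grad)   # dy/dw = 2.0, dy/db = 1.0 at these inputs
```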

The author might be interested in the Theano library, which is used to compute
gradients for neural models in pylearn2. It supports performing some algebraic
simplifications on the compute graph before running AD. I'm not sure if it
supports exactly what's being proposed here, but it does seem to enable some
interesting hybrid symbolic / automatic approaches.

