
Automatic Differentiation: The most underused tool in the machine learning toolbox? - gaika
http://justindomke.wordpress.com/2009/02/17/automatic-differentiation-the-most-criminally-underused-tool-in-the-potential-machine-learning-toolbox/
======
mattj
Easiest answer: If you're using neural nets (his example), you could just
write the backprop algorithm yourself. Chances are performance matters, so you
can hand-tune your code to generate the best assembly.
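
For context, here's a minimal sketch of what "just writing the backprop
algorithm" can look like for a toy one-hidden-layer net. The tanh activation,
squared-error loss, and shapes are my own assumptions for illustration, not
anything from the article:

    import numpy as np

    # Hand-derived backprop for a toy net: tanh hidden layer, linear output,
    # squared-error loss. W1 is (n_hid, n_in), W2 is (n_out, n_hid).
    def forward_backward(x, y, W1, W2):
        # forward pass
        h = np.tanh(W1 @ x)                  # hidden activations
        y_hat = W2 @ h                       # linear output
        loss = 0.5 * np.sum((y_hat - y) ** 2)

        # backward pass (chain rule, worked out by hand)
        d_yhat = y_hat - y                   # dL/dy_hat
        dW2 = np.outer(d_yhat, h)            # dL/dW2
        d_h = W2.T @ d_yhat                  # dL/dh
        d_pre = d_h * (1.0 - h ** 2)         # back through tanh
        dW1 = np.outer(d_pre, x)             # dL/dW1
        return loss, dW1, dW2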

Most machine learning work involves huge data sets. You divide your time
between cleaning up / massaging your data until it's usable, coming up with
models, deriving properties of the models, implementing inference for those
models, and, most importantly, tuning your code so you can actually get
meaningful results on huge datasets.

Doing the differentiation is, by far, the easiest part of all of that.

Also, in many cases, your model won't have a tractable form (like, say,
requiring you to sum over all permutations in your data set at each step of
your training). You have to come up with ways of approximating these results,
often using sampling techniques.

Being able to find the derivative of a function that takes O(n!) time to
calculate exactly isn't much use - with gradient optimization methods, you'll
often have to calculate the value more often than the gradient.
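
To make that concrete, here's a rough sketch of gradient descent with a
backtracking line search (the function names f and grad_f are hypothetical);
note how many calls go to f() versus grad_f() in each iteration:

    import numpy as np

    # Sketch: each iteration uses one gradient evaluation but potentially
    # many function evaluations while the step size shrinks.
    def minimize(f, grad_f, x, steps=50):
        for _ in range(steps):
            g = grad_f(x)                    # one gradient evaluation
            fx = f(x)                        # value at the current point
            t = 1.0
            # Armijo backtracking: halve the step until f decreases enough
            while f(x - t * g) > fx - 0.5 * t * g.dot(g):
                t *= 0.5
            x = x - t * g
        return x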

Basically, when finding a derivative is feasible, it's more useful and not
much more work to derive it yourself.

~~~
timr
That may be true of neural net research, but I think there are still tons of
places where automatic derivative calculation can be a big win. In protein
simulation/design, for example, people spend a _lot_ of time and effort coming
up with derivatives for functions that do things like calculating the change
in potential/kinetic energy of a protein side-chain atom, given a perturbation
in one of the backbone angles. It's not always trivial to come up with
efficient methods for derivatives in these problems.

The one real limitation here seems to be that you have to _know_ that your
function is differentiable (over the domain of interest) to use autodiff
software. That can be difficult to determine. However, some of these packages
claim to be able to detect non-differentiability, so even that point may be
moot, if they can do it reliably and in advance.

~~~
lliiffee
To my knowledge, most autodiff tools will tend to silently ignore
discontinuities, as long as you don't evaluate the derivative at a
non-differentiable point. E.g. if you have

    y = abs(x)

you will get back the derivative

    g = sign(x).

This works as long as you don't try x=0. Similar things would happen for
floors, rounding, if statements, etc. In general, as long as each local
operation is differentiable, the whole program will be. That isn't too hard
to check.
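
To illustrate, here's a minimal forward-mode sketch in Python (the Dual class
is hypothetical, not any particular package's API) showing abs() quietly
handing back sign(x) as the derivative away from x=0:

    # A dual number carries a value and a derivative; overloading __abs__
    # makes Python's built-in abs() propagate both.
    class Dual:
        def __init__(self, val, dot):
            self.val, self.dot = val, dot
        def __abs__(self):
            s = 1.0 if self.val >= 0 else -1.0   # arbitrary choice at val == 0
            return Dual(abs(self.val), s * self.dot)

    print(abs(Dual( 3.0, 1.0)).dot)  # 1.0  == sign(3)
    print(abs(Dual(-2.0, 1.0)).dot)  # -1.0 == sign(-2)
    print(abs(Dual( 0.0, 1.0)).dot)  # 1.0, though abs isn't differentiable at 0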

------
yummyfajitas
Ok, reading that article left me with one important question: wtf is automatic
differentiation?

Luckily wikipedia exists.

<http://en.wikipedia.org/wiki/Automatic_differentiation>

~~~
mark_h
There's a very accessible paper by sigfpe on both the topic in general, and an
implementation using operator-overloading in C++:
<http://homepage.mac.com/sigfpe/paper.pdf> (Automatic Differentiation, C++
Templates and Photogrammetry)

I see now that it's linked from the wikipedia page, but I still think it's
worth pointing out. That was my introduction to it anyway.

------
tectonic
Very cool, I didn't know about this.

Python library for this: <http://www.seanet.com/~bradbell/pycppad/index.xml>

