
Calculus on Computational Graphs: Backpropagation - inetsee
http://colah.github.io/posts/2015-08-Backprop/index.html
======
xtacy
It's also known as "automatic differentiation" -- it's quite different from
numerical/symbolic differentiation.

More information here:

- [https://justindomke.wordpress.com/2009/02/17/automatic-diffe...](https://justindomke.wordpress.com/2009/02/17/automatic-differentiation-the-most-criminally-underused-tool-in-the-potential-machine-learning-toolbox/)

- [https://wiki.haskell.org/Automatic_Differentiation](https://wiki.haskell.org/Automatic_Differentiation)

The key idea is to extend the common operators (+, -, *, /, and the standard
mathematical functions), which usually operate on _real numbers_, to tuples of
real numbers (x, dx) (the quantity and its derivative with respect to some
variable) in such a way that the operations preserve the rules of
differentiation.

For instance (with abuse of notation):

    - (x1, dx1) + (x2, dx2) = (x1 + x2, dx1 + dx2)
    - (x1, dx1) * (x2, dx2) = (x1 * x2, x1 * dx2 + x2 * dx1)
    - sin((x, dx)) = (sin(x), cos(x) * dx)

Note that the right element of the tuple can be computed precisely from
quantities readily available from the inputs to the operator.
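A minimal sketch of that idea in Python (the Dual class and dsin helper are
made up here for illustration, not taken from any particular library):

    from math import sin, cos

    class Dual:
        """A value paired with its derivative w.r.t. one chosen input variable."""
        def __init__(self, x, dx=0.0):
            self.x, self.dx = x, dx

        def __add__(self, other):
            return Dual(self.x + other.x, self.dx + other.dx)

        def __mul__(self, other):
            # product rule: d(uv) = u*dv + v*du
            return Dual(self.x * other.x, self.x * other.dx + other.x * self.dx)

    def dsin(a):
        # chain rule: d(sin u) = cos(u) * du
        return Dual(sin(a.x), cos(a.x) * a.dx)

    # d/dx [ x*sin(x) + x ] at x = 2, seeding the input with dx = 1
    x = Dual(2.0, 1.0)
    y = x * dsin(x) + x
    print(y.x, y.dx)   # derivative is sin(2) + 2*cos(2) + 1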

It also extends to gradients of scalar functions of many variables: instead of
a single dx, each value carries a vector of partial derivatives, one per
variable (the common case in machine learning).
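The same sketch works for the many-variable case by carrying a list of
partials, one slot per input (again just illustrative code, not from any
library):

    class DualVec:
        """A value plus a vector of partial derivatives, one per input variable."""
        def __init__(self, x, grad):
            self.x, self.grad = x, grad

        def __add__(self, other):
            return DualVec(self.x + other.x,
                           [da + db for da, db in zip(self.grad, other.grad)])

        def __mul__(self, other):
            return DualVec(self.x * other.x,
                           [self.x * db + other.x * da
                            for da, db in zip(self.grad, other.grad)])

    # f(a, b) = a*b + a; each input is seeded with a one-hot partials vector
    a = DualVec(3.0, [1.0, 0.0])
    b = DualVec(5.0, [0.0, 1.0])
    f = a * b + a
    print(f.x, f.grad)   # 18.0 [6.0, 3.0]  (df/da = b + 1, df/db = a)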

It's beautifully implemented in Google's Ceres optimisation package:

[https://ceres-solver.googlesource.com/ceres-solver/+/1.8.0/i...](https://ceres-solver.googlesource.com/ceres-solver/+/1.8.0/include/ceres/jet.h)

~~~
conistonwater
I don't think you're describing quite the same thing. You're talking about
forward-mode differentiation, whereas backpropagation corresponds to what's
usually called adjoint-mode differentiation (I think it's called reverse-mode
in the post). The difference is in computational efficiency when the number of
parameters is large.
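A rough sketch of the reverse-mode idea for comparison (not the article's
code): each node records its parents and the local derivatives of its
operation, and a single backward sweep from the output accumulates the
derivative with respect to every input, which is why it wins when parameters
vastly outnumber outputs.

    class Node:
        def __init__(self, value, parents=()):
            self.value = value
            self.parents = parents   # list of (parent_node, local_derivative)
            self.grad = 0.0

        def __add__(self, other):
            return Node(self.value + other.value, [(self, 1.0), (other, 1.0)])

        def __mul__(self, other):
            return Node(self.value * other.value,
                        [(self, other.value), (other, self.value)])

        def backward(self, seed=1.0):
            # chain rule: push d(output)/d(self) back to each parent.
            # (A real implementation would traverse the graph in reverse
            # topological order rather than recursing per path.)
            self.grad += seed
            for parent, local in self.parents:
                parent.backward(seed * local)

    # f(a, b) = (a + b) * b; one backward pass gives both partials
    a, b = Node(2.0), Node(3.0)
    f = (a + b) * b
    f.backward()
    print(a.grad, b.grad)   # df/da = 3.0, df/db = 8.0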

------
versteegen
Anyone reading this who hasn't already should do themselves a favour and read
the other articles on colah's blog. Beautifully presented demonstrations of ML
algorithms, a number of them running live in your browser:
[http://colah.github.io/](http://colah.github.io/)

------
jmount
My demonstration Scala automatic differentiation library: [http://www.win-vector.com/blog/2010/06/automatic-differentia...](http://www.win-vector.com/blog/2010/06/automatic-differentiation-with-scala/)

------
outlace
This is beautiful. I've never seen a more concise yet powerfully clear
explanation of backpropagation. The explanation is fundamental in that it
relies on the fewest possible assumptions.

------
plg
Love these tutorials. Not sure that the LaTeX font is the right choice for a
web page though.

------
misiti3780
This is easily the best explanation of back-propagation I have found on the
web - nice work

