From automatic differentiation to message passing [video] 64 points by adamnemecek 8 days ago | 10 comments

 > there is a follow up paper to the DL via Hessian-free optimization paper by James Martens that develops a variant of AD which calculates a special curvature quantity which is useful for efficient second order optimization

 Deep Learning via Hessian-free Optimization: http://www.cs.toronto.edu/~jmartens/docs/Deep_HessianFree.pd...

 Optimizing Neural Networks with Kronecker-factored Approximate Curvature: http://arxiv.org/abs/1503.05671

 James Martens' list of publications, with links to sample code for the above two papers, slides/condensed conference versions, etc.: http://www.cs.toronto.edu/~jmartens/research.html

 Pretty neat stuff.
 Haha, thanks for finding the links! I was on a bus on my phone, so I didn't have the patience...
 > For example, would it be useful to have an AD-like pass that calculated trusted regions for gradient updates?

 To answer my own question, yes: http://papers.nips.cc/paper/7112-scalable-trust-region-metho...
 > Automatic differentiation is an elegant technique for converting a computable function expressed as a program into a derivative-computing program with similar time complexity. It does not execute the original program as a black-box, nor does it expand the program into a mathematical formula, both of which would be counter-productive. By generalizing this technique, you can produce efficient algorithms for constraint satisfaction, optimization, and Bayesian inference on models specified as programs. This approach can be broadly described as compiling into a message-passing program.
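 The "similar time complexity" claim in the quote is easiest to see in reverse-mode AD: each primitive records itself on a tape as the program runs, and one backward sweep over the tape yields the full gradient at a small constant-factor overhead. Here is a minimal sketch of that idea (my own illustration, not code from the talk):

```python
# Minimal reverse-mode AD sketch: each primitive records its local
# derivatives; a single reverse sweep accumulates the gradient, so
# the gradient costs a constant factor more than the original run.
import math

class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # list of (parent Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

def sin(x):
    return Var(math.sin(x.value), [(x, math.cos(x.value))])

def backward(out):
    # Topologically order the computation graph, then propagate
    # adjoints from the output back to the inputs.
    order, seen = [], set()
    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)
    visit(out)
    out.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad

x = Var(2.0)
y = x * x + sin(x)   # f(x) = x^2 + sin(x), an ordinary program
backward(y)
print(x.grad)        # f'(2) = 2*2 + cos(2)
```

 Note that the derivative program traverses the same graph the original evaluation built, which is exactly why it is not a black-box evaluation and not a formula expansion either.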
 > nor does it expand the program into a mathematical formula

 Meh, this is not entirely accurate. Sure, it doesn't expand the program into analytic functions whose derivatives are easy to compute, but it still handles the program symbolically. So, in a way, it still transforms the program into a known set of mathematical primitives, from which it can then construct a program that computes the derivative at compile time.
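 The point about AD working over a fixed set of known primitives is perhaps clearest in forward mode with dual numbers: only the primitives (+, *, exp, ...) carry derivative rules, and the program is run unchanged on top of them, never expanded into one big formula. A toy sketch of that (illustrative only):

```python
# Forward-mode AD via dual numbers: each primitive propagates a
# (value, derivative) pair; the user's program runs as-is on Duals.
import math

class Dual:
    def __init__(self, val, eps=0.0):
        self.val, self.eps = val, eps

    def __add__(self, other):
        return Dual(self.val + other.val, self.eps + other.eps)

    def __mul__(self, other):
        return Dual(self.val * other.val,
                    self.val * other.eps + self.eps * other.val)

def exp(x):
    v = math.exp(x.val)
    return Dual(v, v * x.eps)

def f(x):                  # an ordinary program, not a formula
    return x * x + exp(x)

d = f(Dual(1.0, 1.0))      # seed dx/dx = 1
print(d.eps)               # f'(1) = 2 + e
```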
 It seems that way to me too, but without more details about the implementation I'm not sure.
 At the end of the presentation the presenter mentions that this is structurally identical to loopy belief propagation... Isn't that a big issue, since they inherit many of its tractability issues with regard to training and inference? Modern DL models are far too interconnected for inference to be tractable in general, so the best we can hope for is that we can make simplifying assumptions that make loopy belief propagation feasible.

 As a side note, when modern compilers optimize abstract syntax trees, I'm pretty sure they do operations that are similar to the message-passing algorithm described. And they work great, albeit for specialized purposes.
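 For context on the tractability point: the same sum-product message updates that are exact on trees become "loopy" BP when run on a cyclic graph, where convergence and accuracy are no longer guaranteed. A tiny sketch of the tree case on a binary 3-variable chain (my own toy example, not from the talk):

```python
# Sum-product message passing on a chain x1 - x2 - x3 with binary
# variables. On a tree, one forward sweep of messages gives exact
# marginals; loopy BP iterates the same update on graphs with cycles.
psi = [[1.0, 0.5],
       [0.5, 1.0]]          # shared pairwise potential on both edges
prior = [0.7, 0.3]          # unary evidence on x1

def send(message, pot):
    # Message to the next variable: sum out the current variable.
    return [sum(message[i] * pot[i][j] for i in range(2))
            for j in range(2)]

m12 = send(prior, psi)      # message x1 -> x2
m23 = send(m12, psi)        # message x2 -> x3

z = sum(m23)
marg_x3 = [m / z for m in m23]   # exact marginal of x3 on this tree
print(marg_x3)
```

 With cycles, each such message would have to be re-sent until (hopefully) reaching a fixed point, which is where the tractability concerns above come in.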
 It seems to me that the message-passing aspect is kind of an implementation detail.

 In any case, compare and contrast with "Compiling to categories": http://conal.net/papers/compiling-to-categories/
 Another approach to AD: https://github.com/keithalewis/epsilon
