
Differentiating SSA-Form Programs in Julia (2018) [pdf] - idiliv
https://arxiv.org/abs/1810.07951
======
cs702
As the author explains, this approach to automatic differentiation (AD) via
transformation of source code "supports control flow, higher-order functions
and nested derivatives. The differentiated code can be further fed into a
traditional compiler such as LLVM, which results in an extremely efficient
derivative program. Further, it opens up the opportunity for robust
traditional compiler techniques to be extended to machine learning, enabling
kernel fusion or compilation for accelerators with no artificial limitations
on the kinds of models that researchers can express. This combination has not
previously been possible in a high-level, general-purpose programming
language."

The author's package, Zygote, makes _all_ Julia code differentiable, so _any
program_ can be optimized as an ML/AI model to learn a set of parameters given
some training objective.[a]
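
As a hedged illustration of "learn a set of parameters given some training objective": a toy gradient-descent loop fitting a line with Zygote (the `fit` function and its hyperparameters are made up for the example, not from the paper):

    using Zygote

    # Fit w, b so that w*x + b ≈ y by gradient descent on a mean-squared-error loss.
    function fit(xs, ys; lr = 0.05, steps = 1000)
        loss(w, b) = sum(abs2, w .* xs .+ b .- ys) / length(xs)
        w, b = 0.0, 0.0
        for _ in 1:steps
            gw, gb = gradient(loss, w, b)
            w -= lr * gw
            b -= lr * gb
        end
        return w, b
    end

    xs = collect(1.0:5.0)
    fit(xs, 2 .* xs .+ 1)   # ≈ (2.0, 1.0)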

:-)

[a] That said, you will not magically be able to overcome the limits of
mathematics, in case you're wondering. See darawk's and b_tterc_p's comments
below.

~~~
darawk
This seems almost tautologically false to me. It seems to imply that if you
implement SHA-256 in Julia, you can differentiate it and solve it backwards
via SGD. Is there something I'm missing here? There must be _some_
limitations.

~~~
nextos
Well, the limitations are the usual limitations of mathematical analysis.
Even if your function is differentiable, a sufficiently complicated loss
landscape means you will get stuck in local minima with high probability.

People are referring to limitations regarding expressiveness.
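
A toy illustration of the optimization point (the objective below is invented
for the example): the function is smooth and differentiable everywhere, yet
plain gradient descent only finds the global minimum if it starts close
enough to it.

    using Zygote

    # Smooth but wiggly: many local minima, global minimum at x = 0.
    f(x) = sin(5x)^2 + 0.1x^2

    function descend(x; lr = 0.01, steps = 500)
        for _ in 1:steps
            x -= lr * gradient(f, x)[1]
        end
        return x
    end

    descend(0.1)   # ends up near the global minimum at 0
    descend(2.0)   # typically settles in a nearby local minimum instead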

------
KenoFischer
Hi folks, this is part of a larger effort in the Julia community to do a
ground-up rethink of the infrastructure for machine learning. You can find an
overview of the entire effort in our recent blog post [1]. Happy to answer
questions.

[1] [https://julialang.org/blog/2018/12/ml-language-compiler](https://julialang.org/blog/2018/12/ml-language-compiler)

~~~
chrispeel
As I recall, after the publication of [1] on Julia+TPUs there was some talk
about doing a comparison of some ML models on the CPU, GPU, and TPU. Is this
something you're working on?

[1] [https://arxiv.org/pdf/1810.09868.pdf](https://arxiv.org/pdf/1810.09868.pdf)

~~~
KenoFischer
We have these numbers now and the results are quite encouraging (basically on
par with tuned TensorFlow for the same model, while retaining significant
flexibility). At the moment, we're working with Google on stability and
waiting for a new public release of the TPU software stack to enable
multicore support for non-TF frontends.

------
infogulch
Is this related to the earlier discussion of The Simple Essence of Automatic
Differentiation [1]?

[1]:
[https://news.ycombinator.com/item?id=18306860](https://news.ycombinator.com/item?id=18306860)

