
Differentiable Programming Manifesto - kristianp
https://github.com/apple/swift/blob/master/docs/DifferentiableProgramming.md
======
kragen
This looks like it's going to be a really big deal unless they totally fuck it
up. Autodiff greatly simplifies all kinds of continuous mathematical
optimization algorithms, and using mathematical optimization means you only
have to write code that can recognize things that look similar to a solution
instead of code that can find the solution — in a sense, mathematical
optimization derives your code from tests.

Autodiff is useful for other kinds of things too, not just mathematical
optimization, as they mention in the manifesto. This weekend I wrote a real-
time SDF raymarcher in a page of Lua
([https://gitlab.com/kragen/bubbleos/blob/master/yeso/sdf.png](https://gitlab.com/kragen/bubbleos/blob/master/yeso/sdf.png)
[https://gitlab.com/kragen/bubbleos/blob/master/yeso/sdf.lua](https://gitlab.com/kragen/bubbleos/blob/master/yeso/sdf.lua)),
but the surface shading is pretty lame because without sampling the SDF
multiple times it doesn't yet have any way to calculate surface normals. Guess
what a surface normal is? It's the gradient of the SDF, and reverse-mode
autodiff can sample the SDF and its gradient in only double the time of just
sampling the SDF.
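For instance, under the manifesto's proposed API the normal could fall out of
a single gradient call, as in the sketch below; Vec3 is a toy type of mine,
and I'm assuming the synthesized Differentiable conformance behaves the way
the manifesto describes.

    struct Vec3: Differentiable {
        var x, y, z: Float  // TangentVector is synthesized with the same shape
    }

    @differentiable
    func sphereSDF(_ p: Vec3) -> Float {
        // Signed distance to a unit sphere at the origin.
        (p.x * p.x + p.y * p.y + p.z * p.z).squareRoot() - 1
    }

    // One reverse pass yields all three partials at roughly twice the cost
    // of a single SDF sample; normalize the result to get the normal.
    let n = gradient(at: Vec3(x: 0, y: 0, z: 1), in: sphereSDF)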

Pretty much anything you do in continuous domains can get simpler and/or
better-performing with autodiff. This is why I've written so much about it in
Dercuano.

It's worth pointing out that you don't need to redesign your language to get
first-class differentiable programming. Source-to-source transformation has
been used to do autodiff in FORTRAN for at least ten years, including on
pretty massive applications. (This is mentioned in the manifesto.)
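Schematically, such a tool reads the text of your function and emits a
companion that computes derivatives alongside values. Here's a hand-written
Swift rendition of the kind of code these tools generate (the FORTRAN tools
emit FORTRAN, of course):

    import Foundation

    // Original source:  func f(_ x: Double) -> Double { x * sin(x) }
    // Generated companion, applying the product and chain rules statement
    // by statement:
    func fAndDerivative(_ x: Double) -> (value: Double, derivative: Double) {
        let s = sin(x)
        let c = cos(x)              // d(sin x)/dx
        return (x * s, s + x * c)   // (x * sin x)' = sin x + x * cos x
    }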

~~~
seanmcdirmid
Conal Elliott (the guy who co-invented FRP) did a nice project on shading
computational surfaces by extracting derivatives from code more than a decade
ago: [http://conal.net/Vertigo/](http://conal.net/Vertigo/)

He is now very active in the differentiable programming community, but he got
started on the problem in the graphics field.

~~~
kragen
The manifesto links one of his papers, "The simple essence of automatic
differentiation", which I've started but haven't finished.

~~~
seanmcdirmid
He has been working on that for a long time, though much of his earlier work
is symbolic differentiation. I'm not very familiar with the more recent AD
stuff.

------
dual_basis
I like differentiable programming a lot, but can anyone explain the motivation
for Swift as their language of choice for this? I assume it was simply because
they wanted to get Chris Lattner, and Swift was his baby. Julia seems like a
much better target for this; I wish they had just dedicated some money or
resources to that. I don't have a problem with Swift's language design per se
(although Julia's dynamic nature really lends itself more naturally to the
sort of problems you might be trying to solve using differentiable
programming, in my opinion), but its cross-platform support is terrible. Why
would you choose Mac's boutique language as the foundation of your new
paradigm?

~~~
gwd
> I like differentiable programming a lot, but can anyone explain the
> motivation for Swift as their language of choice for this?

Back in January, Jeremy Howard (of fast.ai fame) wrote up a blog post talking
about why he was exploring using Swift for this purpose:

[https://www.fast.ai/2019/01/10/swift-numerics/](https://www.fast.ai/2019/01/10/swift-numerics/)

The post is quite in-depth; but just to give a brief take on other potential
languages:

8<---

Here’s my personal view of some languages that I’ve used and enjoyed, but all
of which have limitations I’ve found frustrating at times:

_Python_: Slow at runtime, poor support for parallel processing (but very
easy to use)

_C, C++_: hard to use (and C++ is slow at compile time), but fast and (for
C++) expressive

_Javascript_: Unsafe (unless you use Typescript); somewhat slow (but easy to
use and flexible)

_Julia_: Poor support for general purpose programming, but fast and
expressive for numeric programming. (Edit: this may be a bit unfair to Julia;
it’s come a long way since I’ve last looked at it!)

_Java_: verbose (but getting better, particularly if you use Kotlin), less
flexible (due to JVM issues), somewhat slow (but overall a language that has
many useful application areas)

_C# and F#_: perhaps the fewest compromises of any major programming
language, but still requires installation of a runtime, limited flexibility
due to garbage collection, and difficulties making code really fast (except on
Windows, where you can interface via C++/CLI)

~~~
C1sc0cat
Not sure why Julia is ruled out as not being a general-purpose language here -
I got the impression that Julia was for numerical, i.e. technical, programming

~~~
DagAgren
Because it would be a challenge to implement, for instance, an actual user-
facing application in Julia.

~~~
ddragon
Mostly because the end-user application community is not as strong in Julia
(though you can create web services/sites and GUIs right now), in the same way
the scientific/numerical computing/machine learning/HPC community isn't as
strong in Swift. Perhaps his point was that, since his area is ML, he could
contribute more by trying to create a community around that in Swift than by
bringing Julia's other domains up to par with those languages by adding
another competing library.

And in terms of the language itself, Julia is very much a general purpose
language. Exceptionally general purpose, even: it's basically a Lisp below the
surface, so it can not only support any domain, it can be comparatively easily
extended to support new ones better. Differentiable programming is one such
example: the compiler was not designed for it, but you can just import the
functionality as a library. The focus is still desktop to HPC, though, as
opposed to mobile/IoT to desktop (and Apple-focused) like Swift, which makes
having both languages support differentiable programming an overall positive
over having only one of them.

------
targonca
So now everyone's pet project gets a language extension in Swift? Apple
extended the language with @_functionBuilder for SwiftUI. Now Google wants to
extend it with this differentiable stuff for TensorFlow.

Why won't they implement proper metaprogramming support instead of
proliferating the core language?

~~~
classified
> Why won't they implement proper metaprogramming support instead of
> proliferating the core language?

Proper metaprogramming support is hard. Polluting the language with special
case hocus pocus is relatively easy.

~~~
marmaduke
> Polluting the language with special case hocus pocus is relatively easy

Or if it's well done, is it still hocus pocus pollution?

~~~
classified
Then it's pollution with compiler magic. But only if it's _really_ well done.

------
jpz
Does anyone else dislike the usage of the word "manifesto" in computing? It
always seems to me to indicate a faith-based rather than evidence-based
commitment to some train of thought.

Seems strange to see manifestos discussed in the ML space, which is nominally
a measure-oriented world. Mind you, when has statistics not been polemical?

~~~
new2628
My experience is that evidence-based commitments are overrated and are usually
a post-hoc rationalization for faith- (or gut-feeling)-based ones.

------
fuklief
Meanwhile, Google researchers (Plotkin and Abadi) are going to present "A
Simple Differentiable Programming Language" at POPL20[1] in January.

I can't find a preprint though :'(

[1]: [https://popl20.sigplan.org/track/POPL-2020-Research-Papers#event-overview](https://popl20.sigplan.org/track/POPL-2020-Research-Papers#event-overview)

~~~
ml_thoughts
Went up this morning on arXiv:
[https://arxiv.org/abs/1911.04523](https://arxiv.org/abs/1911.04523)

------
nuclx
Striving for concise syntax for mathematical operations is a noble intention,
but it's hardly a good idea to achieve that by shoehorning changes into an
existing language.

A language framework designed for extensibility has to be the basis for
implementing concepts such as differentiable or probabilistic programming,
and it's not Haskell, i.e. we're not there yet.

~~~
byt143
Julia's source-to-source autodiff is implemented as a custom compiler pass in
a third-party package. Similar approaches are being taken by probabilistic
programming packages.

------
bcheung
Pardon my ignorance but I have a few questions...

1) This seems like it would only work for varying the values passed to
variables in a function. Can you differentiate a function other than by just
changing some values - like using a genetic algorithm? It seems like this is
just finding the right values to pass to a function, not finding a different
function itself.

2) Is this limited to only continuous / ordered values? How can you
differentiate logic, branching, and which other functions to call? It seems
like doing so would require mapping a function to some kind of n-dimensional
Euclidean space / manifold.

3) Why does this need to be an extension to a language? Can't ordinary
interfaces in the OOP sense, or monads in the FP sense, be used to wrap
functions and give them this functionality for free?

~~~
ddragon
1) It's not restricted to the variables being passed. For example, Zygote [1]
has an example in the README that differentiates through IO: gradient(x ->
fs[readline()](x), 1). And it's not using numerical differentiation (varying
inputs to check the output), but building a formula (approximate, not a
closed form like symbolic differentiation would give) for how changes to the
input affect the output. Genetic algorithms are a black-box metaheuristic, so
people would favor gradient descent given that the code is differentiable,
but how the model is optimized is technically open to any method.

2) Yes, it can differentiate all control flow. What happens is that each
forward/backward pass will have a different graph based on which branches it
follows (effectively a subderivative). Each one of these graphs is
independently differentiable and does not contain the control flow itself
(see the first sketch below).

3) It doesn't need a language extension (Julia doesn't have one, but that's
because you can access Julia IR using the language itself; not many languages
support that level of metaprogramming). The OOP strategy (like PyTorch)
overloads methods so that every time you call an overloaded method it builds
the graph; the second sketch below shows it in miniature. It depends on
customized types and methods, so it doesn't support code that isn't written
against those graph-holding custom types; it supports native control flow
only implicitly (by changing which graph gets built); it reduces the possible
optimizations (you only have one subderivative, not the complete graph, to
optimize); and it ends up creating a DSL with its own error messages and
quirks that is less natural than just using the host language. I can't say
how it would look using monads, though.
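To make (2) concrete, here's a toy sketch in the Swift manifesto's proposed
notation (the function itself is my own example): each call takes one branch,
and the derivative you get back is that branch's derivative.

    @differentiable
    func leakyAbs(_ x: Float) -> Float {
        // A kinked function: the two branches have different slopes.
        if x >= 0 { return x } else { return -0.5 * x }
    }

    gradient(at: 3, in: leakyAbs)   //  1.0: differentiated the x >= 0 branch
    gradient(at: -3, in: leakyAbs)  // -0.5: differentiated the other branch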
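And for (3), here's the operator-overloading strategy in miniature, as
forward-mode dual numbers in Swift. Tracing libraries like PyTorch record a
graph rather than a single derivative, but the mechanism (custom types plus
overloaded arithmetic) and the limitation are the same.

    struct Dual {
        var value: Float
        var derivative: Float
        static func + (a: Dual, b: Dual) -> Dual {
            Dual(value: a.value + b.value,
                 derivative: a.derivative + b.derivative)
        }
        static func * (a: Dual, b: Dual) -> Dual {
            // The product rule rides along with every multiplication.
            Dual(value: a.value * b.value,
                 derivative: a.derivative * b.value + a.value * b.derivative)
        }
    }

    let x = Dual(value: 3, derivative: 1)  // seed dx/dx = 1
    let y = x * x + x                      // y.value == 12, y.derivative == 7

    // The catch: only code written against Dual participates. Plain Float
    // code and types you can't overload are invisible to it, which is why
    // these systems end up feeling like a DSL.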

[1] [https://github.com/FluxML/Zygote.jl](https://github.com/FluxML/Zygote.jl)

------
mratsim
So the software is improving by leaps and bounds, but where is the hardware?

OpenCL --> deprecated by Apple

CUDA --> no Nvidia GPUs on recent Macs

ROCm/HIP --> does it work on Mac, or are you strong-armed into using Apple Metal?

Metal --> I couldn't find an AXPY example

------
Nelkins
I was recently looking at a library in F# that does something similar:
[https://diffsharp.github.io/DiffSharp/](https://diffsharp.github.io/DiffSharp/)

Also related: [https://github.com/fsprojects/fsharp-ai-tools](https://github.com/fsprojects/fsharp-ai-tools)

------
keithalewis
There is a simpler way to do this:
[https://github.com/keithalewis/epsilon/blob/master/README.md](https://github.com/keithalewis/epsilon/blob/master/README.md)

