
First-Class Automatic Differentiation in Swift - bmc7505
https://gist.github.com/rxwei/30ba75ce092ab3b0dce4bde1fc2c9f1d
======
simonbyrne
This is really cool, and I'm excited to see how this goes. When people first
hear about automatic differentiation, the usual reaction is "that's cool,
why isn't it everywhere?", and the answer has been "it's really hard to use in
practice".

Until recently, AD systems only worked on restricted subsets of languages, or
required extremely cumbersome template usage throughout the code. ML
frameworks like TensorFlow pushed AD into the mainstream: they are essentially
AD-compatible runtimes, but they are cumbersome to use (you have to use their
functions instead of the language-provided ones) and they miss potential
optimisations.

So it's become increasingly clear that the next step is to embed AD directly
in a language. I work on Julia, so I hope Zygote.jl
([https://github.com/FluxML/Zygote.jl](https://github.com/FluxML/Zygote.jl))
gets there first, but it's always good to see others moving in the same
direction.

------
blt
I desperately want to write my ML code in a statically typed language. I hope
one of these Swift-based projects takes off. But I am worried that they will
not treat Linux as a first-class platform.

~~~
amelius
> I desperately want to write my ML code in a statically typed language.

Why? ML programs are typically sufficiently compact to not require a type
system.

~~~
geezerjay
Type errors can happen in small programs too. In ML applications, which are
computationally expensive and take a long time to run, an avoidable type
error can result in a considerable waste of time.
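As a toy illustration (the function and its signature are hypothetical, just to make the point): with static types, the mismatch below is rejected at compile time, before an expensive job ever starts, whereas a dynamically typed script would only hit it at runtime, possibly hours in.

```swift
// Hypothetical training entry point; the signature is what matters here.
func train(epochs: Int, learningRate: Double) {
    // ... hours of compute ...
}

// train(epochs: 10.5, learningRate: 3)
// ^ compile-time error: cannot convert '10.5' to expected argument type 'Int'

train(epochs: 10, learningRate: 3e-4)  // checked before the job runs
```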

------
paultopia
I have to confess to some confusion about how all these Tensorflow for Swift
changes work. Is this being built into a tensorflow-ey fork of Swift? Is it on
the roadmap for being incorporated into the core language? How does one tell
from these Github manifestos?

~~~
afthonos
Swift for TensorFlow adds language-level support for TensorFlow to Swift. It
does start in the fork, but the ultimate goal is for all changes to be merged
upstream. Some aspects, like @dynamicMemberLookup [0], which allows dot syntax
for members resolved at runtime, are already in the language. Others, like
@dynamicCallable [1], are accepted proposals.

[0] https://github.com/apple/swift-evolution/blob/master/proposals/0195-dynamic-member-lookup.md

[1] https://github.com/apple/swift-evolution/blob/master/proposals/0216-dynamic-callable.md
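For readers unfamiliar with the two attributes, here is a minimal sketch assuming only what SE-0195 and SE-0216 define (the types here are made up for illustration):

```swift
// SE-0195: dot syntax resolved at runtime through a subscript.
@dynamicMemberLookup
struct Record {
    let fields: [String: String]
    subscript(dynamicMember member: String) -> String {
        fields[member, default: "missing"]
    }
}

// SE-0216: call syntax `p(x)` sugars into dynamicallyCall(withArguments:).
@dynamicCallable
struct Polynomial {
    let coefficients: [Double]  // [a0, a1, a2, ...] for a0 + a1*x + a2*x^2 + ...
    func dynamicallyCall(withArguments args: [Double]) -> Double {
        let x = args.first ?? 0
        return coefficients.reversed().reduce(0) { $0 * x + $1 }  // Horner's rule
    }
}

let r = Record(fields: ["layer": "dense"])
print(r.layer)  // "dense", even though `layer` is not a declared property

let p = Polynomial(coefficients: [1, 2, 3])  // 1 + 2x + 3x^2
print(p(2.0))   // 17.0
```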

~~~
saagarjha
> It does start in the fork, but the ultimate goal is for all changes to be
> merged upstream. Some aspects, like @dynamicMemberLookup [0], which allows
> dot syntax for members resolved at runtime, are already in the language.

Personally, I feel like it would be unwise to merge all of TensorFlow's
requirements into Swift. Many features, such as the ones you mentioned, are
actually useful outside the context of machine learning, so it does make
sense to make them generally available. But compiler support for something as
specific as @differentiable seems like an anti-goal for Swift.

~~~
bmc7505
The idea is that in the not-too-distant future, differentiable programming
will become much more mainstream. Cf.
[https://medium.com/@karpathy/software-2-0-a64152b37c35](https://medium.com/@karpathy/software-2-0-a64152b37c35)

~~~
wool_gather
> Neural networks [...] represent the beginning of a fundamental shift in how
> we write software.

That's an... interesting statement, considering that neural networks are nearly
as old as software itself. The original McCulloch-Pitts paper was published in
1943. Active use of them has certainly been ongoing for longer than that blog
author has been alive.

~~~
tree_of_item
And yet it's true, since they haven't been used in the way they are now. There
was no DeepMind in 1943. Computer science stuff usually takes a few decades to
mature.

------
adamnemecek
Dual numbers are magic. They are somewhat obscure but very powerful, e.g. for
reasoning about dynamically evolving systems. They are like complex numbers,
but instead of i^2 = -1 you have an epsilon such that epsilon^2 = 0. Shit gets
real from then on.

Functions evaluated on dual numbers give you the value of the function as well
as the derivative at that point. Very cool.
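For those who haven't seen it, here is a minimal sketch of forward-mode AD via dual numbers in Swift (the type and names are illustrative, not from the gist):

```swift
// A dual number a + a'ε with ε² = 0: `value` carries f(x), `derivative` carries f'(x).
struct Dual {
    var value: Double
    var derivative: Double

    static func + (a: Dual, b: Dual) -> Dual {
        Dual(value: a.value + b.value, derivative: a.derivative + b.derivative)
    }
    static func * (a: Dual, b: Dual) -> Dual {
        // The product rule falls out of (a + a'ε)(b + b'ε), since the ε² term vanishes.
        Dual(value: a.value * b.value,
             derivative: a.derivative * b.value + a.value * b.derivative)
    }
}

// f(x) = x² + x at x = 3; seed the derivative with 1 = dx/dx.
let x = Dual(value: 3, derivative: 1)
let y = x * x + x
print(y.value, y.derivative)  // 12.0 7.0, and indeed f'(3) = 2·3 + 1 = 7
```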

~~~
nestorD
And if you want the second derivative... You have hyper-dual numbers!
[http://adl.stanford.edu/hyperdual/](http://adl.stanford.edu/hyperdual/)
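For reference, this is the standard hyperdual construction (stated from the general literature, not that specific site): two nilpotent parts with ε₁² = ε₂² = 0 but ε₁ε₂ ≠ 0, so a single evaluation also carries the second derivative:

```latex
f(x + \varepsilon_1 + \varepsilon_2)
  = f(x) + f'(x)\,\varepsilon_1 + f'(x)\,\varepsilon_2
    + f''(x)\,\varepsilon_1\varepsilon_2
```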

~~~
eigenspace
That’s not normally necessary. One can often just use tagged dual numbers.

I guess one might consider tagged duals hyperduals.

~~~
adamnemecek
What’s a tagged number? Google isn’t helpful.

------
pacala
To whom it may concern: it would be nice to consider autobatching [0] of
DyNet fame when designing AD systems. It makes a world of difference for NLP
applications.

[0]
[https://arxiv.org/pdf/1705.07860.pdf](https://arxiv.org/pdf/1705.07860.pdf)

------
dep_b
I come from business programming in Swift (the usual CRUD, API-based stuff),
and I completely get lost reading this once the formulas kick in. I have the
same problem when reading more theoretical articles about algorithms. Usually
when I see the resulting code I do get it, but I feel I miss out on a lot of
the theory.

Can anybody help me grok the more theoretical stuff in this article?

~~~
paraschopra
This may help
[http://colah.github.io/posts/2015-08-Backprop/](http://colah.github.io/posts/2015-08-Backprop/)

------
georgewsinger
Still wish there were an easy way to render LaTeX in GitHub documents (it
looks like the author of this paper had to paste the formulas in as images).

