
Swift: Google’s Bet on Differentiable Programming - BerislavLopac
https://tryolabs.com/blog/2020/04/02/swift-googles-bet-on-differentiable-programming/
======
gonzus
In all honesty, this sounds to me like a whole lot of BS and hype. The name is
pretentious, the quotes are ridiculous (_"Deep Learning est mort. Vive
Differentiable Programming."_; _"there will be a need for the creation of a
whole new set of new tools, such as a new Git, new IDEs, and of course new
programming languages"_). Maybe I am just ignorant and fail to grasp the
importance of such capabilities in a machine learning context, but I have to
say, the grandstanding is a bit grating.

Who knows, perhaps this _will_ become the greatest thing since Lisp... What do
I know.

~~~
hinkley
A particularly salty developer I knew ages and ages ago once said
(apocryphally, apparently) that there was an old Inuit proverb, “everyone
likes the smell of their own farts.”

Google is still taking themselves very seriously while everyone else is
starting to get bored.

The problem with being 25 is that you have about 8 years ahead of you before
you figure out how full of shit everyone is in their twenties, and maybe
another 8 before you figure out that _everyone_ is full of shit and stop
worrying quite so much about it.

~~~
dreamcompiler
Sturgeon's law applies both spatially and temporally.

~~~
lamontcg
This doesn't seem to be crap, but it does seem to be hype.

You can use CasADi to do automatic differentiation in C++, Python and MATLAB
today:

[https://web.casadi.org/](https://web.casadi.org/)

Tight integration with the language may be beneficial in making it simpler to
write, but it's not like you can't do this already in other languages. Baking
it into the language might be useful to make it more popular. Nobody should be
doing the chain rule by hand in the 21st century.

------
flipgimble
Reading through the top level comments, they are all a form of surface level
aversion to the unfamiliar. It really highlights that many industry trends are
based on hot takes and not any sort of deep analysis. This explains why
JavaScript and C++ will remain entrenched favorites despite their technical
flaws.

For those who actually spent time with Swift, and realize its value and
potential, consider yourself lucky that large portions of the industry have an
ill-informed aversion to it. That creates opportunity that can be taken
advantage of in the next 5 years. Developers who invest in Swift early can
become market leaders, or run circles around teams struggling with the
slowness of Python and the over-complexity of C++.

Top three comments paraphrased:

> 1) "huh? Foundation?"

but you have no qualms with `if __name__ == "__main__"` ?

> 2) "The name is pretentious"

is that an example of the well-substantiated and deep technical analysis HN is
famous for?

> 3) Swift is "a bit verbose and heavy handed" so Google FAILED by not making
> yet another language.

~~~
sparrc
You're completely misinterpreting me (I'm the author of (1)).

C++ and JavaScript are languages of professional software engineers, a space
in which there are many, many more languages with various pros and cons.

Python has been the de facto standard in scientific/data/academic programming
for decades. The only other language you could say rivals it would be MATLAB,
which is even more simplistic.

My point is that simplicity and clarity matter to people who don't care that
much about programming and are completely unfocused on it; they are just using
it to get data for unrelated research.

'if __name__ == "__main__"' is not in the example code, nor is it a required
part of a Python program, so I'm not really sure what your point is here.

~~~
seccess
> Python has been the de facto standard in scientific/data/academic programming
> for decades

In my experience (Genomics) this is simply not true. Python has caught on over
the last 5 or so years, but prior to that Perl was the de facto language for
genetic analysis. It's still quite heavily used. Perl is not a paragon of
simplicity and clarity.

~~~
Darvon
I was in academic compsci/AI from 2001-2017 and it was entirely C++ and Python
in my department, except for one old-school professor who used Delphi.

~~~
seccess
Haha, there is always one :)

I feel like trying out various languages/frameworks would affect compsci labs
a lot less than other fields, since the students probably have some
foundational knowledge of languages and have already learned a few before
getting there. Might be easier for them to pick up new ones.

------
yahyaheee
Google really messed up here; they had an unprecedented opportunity to create
a new language for numeric computation and rally the scientific community
around that. They hired Chris Lattner and he basically steamrolled that dream
by forking Swift.

I don’t see people running over here to write numerical libraries like you see
in Julia; that’s largely because of the crowd around Swift. The language is
also a bit verbose and heavy handed for what data scientists would prefer.
Lattner was too close to Swift to understand this. The blame really falls on
Google project management.

~~~
okareaman
I don't understand why Swift for Windows has not been updated for 2 years. I
guess no one, especially Apple, cares about Swift becoming a general purpose
language. For that reason alone, I'm skipping over it, although I hear
interesting things about it.

[https://swiftforwindows.github.io/](https://swiftforwindows.github.io/)

~~~
rvz
A quick search on the Swift forums brings up an announcement that the Swift
team is going to support the Windows port, which is already mostly finished
and will be available in Swift 5.3 [0] and above [1].

[0] [https://swift.org/blog/5-3-release-process](https://swift.org/blog/5-3-release-process)

[1] [https://forums.swift.org/t/on-the-road-to-swift-6/32862](https://forums.swift.org/t/on-the-road-to-swift-6/32862)

~~~
okareaman
"Saleem Abdulrasool is the release manager for the Windows platform
(@compnerd), is a prolific contributor to the Swift project and the primary
instigator behind the port of Swift to Windows." Saleem's GitHub,
[https://github.com/compnerd](https://github.com/compnerd), lists the swift-win32
repo, which is "a thin wrapper over the Win32 APIs for graphics on Windows."
So it's one person wrapping Win32. Not too promising yet, but it's early and
there's room for Windows programmers to get involved.

~~~
rvz
Incorrect. That GitHub repository isn't the Swift port for Windows.

This is the _actual_ port which has the CI and Installer for Swift on Windows:
[0]

[0] [https://github.com/compnerd/swift-build/releases](https://github.com/compnerd/swift-build/releases)

~~~
okareaman
I took that text from "5-3-release-process". I'm not talking about Swift
compiling on Windows, I'm talking about the GUI situation, but I'll install it
and hopefully be pleasantly surprised with a full-featured GUI SDK. But don't
get me wrong, a supported compiler and std lib for Windows from Apple is a
fantastic start.

------
sparrc
IMO the first three lines of the program basically explain why academics and
data programmers are never going to use Swift:

Python:

    
    
      import time
      for it in range(15):
         start = time.time()
    

Swift:

    
    
      import Foundation
      for it in 0..<15 {
         let start = CFAbsoluteTimeGetCurrent()
    

This is why people like Python:

\- import time: clearly we are importing a 'time' library and then we clearly
see where we use it two lines later

\- range(15): clearly this is referring to a range of numbers up to 15

\- start = time.time(): doesn't need any explanation

This is why academics and non-software engineers will never use Swift:

\- import Foundation: huh? Foundation?

\- for it in 0..<15 {: okay, not bad, I'm guessing '..<' creates a range of
numbers?

\- let start = CFAbsoluteTimeGetCurrent(): okay, I guess we need to prepend
variables with 'let'? TimeGetCurrent makes sense but wtf is CFAbsolute? Also
where does this function even come from? (probably Foundation? but how to know
that without a specially-configured IDE?)

EDIT: Yes everyone, I understand the difference between exclusive and
inclusive ranges. The point is that some people (maybe most data programmers?)
don't care. The index variable you assign it to will index into an array of
length 15 the way you would expect. Also in this example the actual value of
'it' doesn't even matter, the only purpose of range(15) is to do something 15
times.
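
For readers who haven't used Swift, here are the two range operators side by
side (plain Swift, nothing S4TF-specific):

    
    
        for i in 0..<15 { print(i) }   // half-open: 0 through 14, like Python's range(15)
        for i in 0...15 { print(i) }   // closed: 0 through 15
    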

~~~
naasking
> \- range(15): clearly this is referring to a range of numbers up to 15

Is it? Does that range start at 0 or 1 or some other value? Does it include 15
or exclude it?

> \- start = time.time(): doesn't need any explanation

Doesn't it? Is that UTC or local time? Or maybe it's CPU ticks? Or maybe the
time since the start of the program?

You've basically just demonstrated the assumptions you're used to, not any
kind of objective evaluation of the code's understandability.

~~~
ericlewis
I also agree that the Swift version is clearer, but only because it seems very
"this is what is happening, go look up what you don't know".

disclaimer: I use swift, but also I have used python.

~~~
earthboundkid
Languages that dump crap into the namespace on import are doing the wrong
thing. Python has from x import * and every style guide says to never use it.
Swift has a lot of other nice features, but the import thing is really a
bungle. It is worse for everyone, beginners and experienced users alike. It is
even bad for IDE users because you can't type imported-thing.<tab> to get the
autocomplete for just the import. You're stuck with the whole universe of
stuff jamming up your autocomplete.

~~~
freck
`import func Foundation.CFAbsoluteTimeGetCurrent` imports just that function.

------
fluffything
Can't you do this in Rust by writing a #[derivative] proc macro without having
to modify the language?

    
    
        #[derivative]
        fn cube(x: f32) -> f32 { x * x * x }
        // expands to 
        // impl Derivative for cube {
        //    type Args = (f32,);
        //    type Return = f32;
        //    fn derivative(x: Self::Args) -> Self::Return {
        //        3 * x * x 
        //    }
        // }
    
        let cubeGrad = gradient(cube);
        assert_eq!(cubeGrad(2), 12);
    

where `gradient` is just a normal Rust function:

    
    
        fn gradient<F: Derivative>(_: F) -> impl Fn(F::Args) -> F::Return {
            |x: F::Args|  Derivative::derivative(x) 
        }
    

My original attempt was accidentally more powerful, in that it allowed `cube`
to be a stateful function like a closure.

To implement this, the only thing one needs to do in the `#[derivative]` macro
is:

* parse the function into an AST

* fold the AST using symbolic differentiation

* quasi quote the new AST

Swift can probably differentiate functions in other translation units because
it keeps the bytecode of all functions around. A proc-macro based approach
wouldn't be able to achieve that, at least not in general, but a Rust plugin
could, since plugins can drive compilation to, e.g., compile whatever function
is being derived and differentiate its bytecode instead of its Rust source;
and if the function happens to be in a different translation unit, the plugin
can still access its bytecode.

~~~
jasode
_> Can't you do this in Rust by writing a #[derivative] proc macro without
having to modify the language?_

That question can be generalized to _"since <Turing Complete language> can do
<anything-we-want-via-
library/function/macro/codegen/customcode/extension/whatever>, can't we just
do that instead of modifying the core language?"_

Well yes, but the advantage of having 1st-class language syntax includes:

\+ more helpful compiler error messages because it has _intelligence of the
semantics_

\+ more helpful IDE because the UI has intelligence about the intentions of
the programmer

\+ interoperability because programmers are all using the same syntax instead
of Programmer Bob using his custom macro-dialect and Programmer Alice using her
idiosyncratic macro-dialect.

~~~
fluffything
> That question can be generalized to "since <Turing Complete language> can do
> <anything-we-want-via-
> library/function/macro/codegen/customcode/extension/whatever>, can't we just
> do that instead of modifying the core language?"

Not really, since that usually requires using weird syntax or APIs, but the
Rust proc-macro solution has the exact same syntax as the Swift one:

    
    
        @differentiable
        ...cube function
        let grad = gradient(at: cube);
    

vs

    
    
        #[derivative]
        ...cube function....
        let grad = gradient(cube);

~~~
jasode
_> , but the Rust proc-macro solution has the exact same syntax as the Swift
one_

You're focusing on _syntax_ but I was talking about the higher-level
_semantics_. When programmers write custom code (e.g. custom macros), as far
as the compiler/IDE is concerned, it's just an opaque string that it checks
for valid syntax. Those tools have no "intelligence" about the higher-level
concepts of derivatives/deltas etc.

E.g. the IntelliJ plugin for Rust will have no idea what the higher
semantics of a _differentiable_ function are. It just sees a
"gradient()" call which is just an _arbitrary string_ of code to parse and it
might as well be spelled "abcdefgzyxwvut123()".

I.e. the blog post is talking about a _symbiotic ecosystem_ of tools (e.g.
compilers, IDEs, etc) that treats differentiable programming as a 1st-class
language feature. The ultra flexibility of Rust or Lisp macros to _reproduce
the exact syntax_ doesn't really solve the _semantics_ understood by the
ecosystem of tools.

edit reply to: _> You seem to be at least suggesting that if one adds
differentiable programming as a first class language feature, IDEs would
automatically be intelligent about it,_

No, that's an uncharitable interpretation. I never claimed that IDEs will
automagically understand higher level semantics with no extra work required to
enhance the IDE's parser.

I'm only saying that your macro proposal _recreates the syntax but not
the semantics_ \-- so you're really not fully solving what the blog author is
talking about.

~~~
fluffything
> No, that's an uncharitable reading. I never claimed that IDEs will
> automagically understand higher level semantics.

So what are you saying? Because that's what I'm honestly understanding from
your post.

That somehow this feature being a Rust macro means that it cannot be as good
as if it were a first class feature. You are making this claim in general, as
if first class features are always strictly better than user-defined features,
yet this claim isn't true, and you haven't mentioned any particular aspect of
automatic differentiation for which this is the case.

Swift doesn't have proc macros, so they have to make the feature first class.
For languages with proc macros, the question one should be asking is: what
value does a first class language feature add over just using a proc macro?

------
Eridrus
I'm surprised by all the negativity in this thread. The need to write C++
components for models is pretty real when you need things to be fast and run
on CPU. God forbid you ever want to deploy that on some sort of accelerator,
necessitating a rewrite to CUDA or whatever language that accelerator speaks.

Maybe that's too niche of a subject for people to really care about, and maybe
it's going to fail now that Lattner is gone (though there still seemed to be a
bunch of activity last I checked), but the problem it is trying to solve is
real.

~~~
gigatexal
Perhaps it could be this: if my value comes from running some crazy
interesting high level Math derived machine learning model on data and I’m
highly paid the opportunity cost to invest the time and effort to both become
good at and write C++ is higher than if I can do the same thing in a few lines
of python and just throw some more compute at the problem since that’s a fixed
cost.

~~~
867-5309
parsing this single sentence would constitute passing an EFL exam in my book

~~~
gigatexal
Yes, yes, I tend to ramble.

~~~
carapace
> Perhaps it could be this: if my value comes from running some crazy
> interesting high-level Math-derived ML model on data, and I’m highly paid,
> then the opportunity cost of becoming good at C++ is higher than if I can do
> the same thing in a few lines of python and just throw some more compute at
> the problem, since that’s a fixed cost.

Add some hyphens and commas and shed a bit of cruft, and it's fine IMO. (I
hope you don't mind my speculative editing.)

~~~
gigatexal
I appreciate the feedback. Defending wordy stream-of-consciousness writing
style is not a hill I want to die on.

I did realize that if I have to throw a million cores at something vs. 1000,
perhaps it would make sense to spend the effort or buy the time from an expert
in C++ so as to save on the compute cost. But then, what if those million
cores are only needed for a day or an hour? Then Python or some other rapid
prototyping language would make a bit more sense imo.

~~~
carapace
Cheers!

------
pjmlp
Until Google actually offers Swift tooling comparable to Julia, PyTorch and
ML.NET on Windows or Android, I will keep ignoring it.

Note that the recent TensorFlow Summit had zero announcements related to S4TF,
including post-event blog posts.

From the outside it looks like there is a small group trying to push S4TF,
without regard for usability outside Google Cloud, while everyone else is
doing JavaScript, Python and C++ as usual.

~~~
lalord
Yeah, I agree that the tooling at the moment is inadequate, especially on
Linux, but both Apple and Google are working towards improving it.

Regarding the TensorFlow Dev Summit, it was supposed to be 2 days long
initially, with the S4TF talk taking place on the second day. One week before
the summit, the whole of day 2 got scrapped due to COVID-19, though. So the
intention was there at least.

~~~
pjmlp
So not enough stuff to at the very least write a blog post?!

------
m12k
"So, what is differentiable programming?

In a nutshell, differentiable programming is a programming paradigm in which
your program itself can be differentiated. This allows you to set a certain
objective you want to optimize, have your program automatically calculate the
gradient of itself with regards to this objective, and then fine-tune itself
in the direction of this gradient. This is exactly what you do when you train
a neural network."

Isn't this just declarative programming as we know it from e.g. Prolog, SQL or
other places where the programmer declares what their objective is, and it's
left up to the interpreter, compiler or scheduler to figure out the best way
to achieve that? And now that's being applied to ML (which probably makes
sense, since it involves a lot of manual tweaking). Sounds like a great use
case for a library, but hardly worthy of being called a new programming
paradigm.

~~~
ummonk
Eh, differentiable programming is a lot simpler and more specific than
declarative programming, and only marginally a "paradigm". It just means you
can take any (reasonably deterministic and parametrized) function specified in
the programming language and calculate its partial derivative wrt some
parameter.
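
Concretely, in Swift for TensorFlow that looks roughly like this (a sketch,
assuming an S4TF toolchain; the exact name of the `gradient` entry point has
shifted between versions):

    
    
        import TensorFlow  // Swift for TensorFlow toolchain
    
        @differentiable
        func cube(_ x: Float) -> Float {
            x * x * x
        }
    
        // d/dx of x^3 at x = 2 is 3 * 2^2 = 12
        let x: Float = 2.0
        let slope = gradient(at: x) { x in cube(x) }
        print(slope)  // 12.0
    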

~~~
downerending
Is the presumption that these parameters are floats, or at least numeric? It
seems like if I took the functions from some random program (e.g., GNU
_diff_), most would not be meaningfully differentiable. Or perhaps I'm missing
something?

~~~
ummonk
Yes, the parameters you're differentiating with respect to need to be floats.

Though there _might_ be potential for extending the frameworks to e.g.
differential cryptanalysis - I'm not knowledgeable enough about it to say how
much differential cryptanalysis can be done programmatically.

------
mjlee
Accidental Tech Podcast Episode 371 is essentially two and a half hours of
Chris Lattner:

[https://atp.fm/episodes/371](https://atp.fm/episodes/371)

It's a bit of an outlier; most ATP episodes are Apple-focused news with
adjacent tech interests. Episode 371 is not that.

~~~
yreg
Did ATP ever have a guest on the show before?

~~~
ld00d
IIRC, Lattner's the only interviewee, and this was his second appearance.

~~~
yreg
How could I have forgotten, they had Phil Schiller in 317.

[https://atp.fm/episodes/317](https://atp.fm/episodes/317)

------
garyclarke27
Nice article, learned a lot about Swift. I love the named parameters and
argument labels: they make calling functions with many parameters much more
reliable, and with labels the call reads like English, which is great for
immediate comprehension. The lack of named parameters in JavaScript is, for
me, a critical, glaring missing capability that should be fixed urgently.
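
For readers who haven't seen Swift's argument labels, a small made-up example
(the function and labels here are hypothetical, just for illustration):

    
    
        func move(from source: String, to destination: String, overwrite: Bool = false) {
            print("moving \(source) -> \(destination), overwrite: \(overwrite)")
        }
    
        // The call site reads almost like a sentence:
        move(from: "drafts/post.md", to: "published/post.md", overwrite: true)
    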

~~~
enlyth
Just use an object with keys as the parameter names, this pattern is actually
very common so I don't see the issue.

doStuff({ foo: 3, bar: 7 })

~~~
tomp
Indeed. What that pattern also solves, and AFAIK most static languages lack,
is argument forwarding (though Python solves this even more elegantly than
JS):

    
    
        def f(x, **kwargs):
          return g(x, foo=1, **kwargs)

~~~
zeroxfe
Isn't the JS version just:

    
    
        function f(x, ...args) {
            return g(x, ...args);
        }
    

Or did you mean forwarding object fields?

    
    
        function f(x, params) {
           return g(x, {foo: 1, ...params});
        }

~~~
garyclarke27
JS parameters are named to some extent, as in you can refer to them by name
within the function body - BUT (a critical shortcoming in my view) you can’t
assign values by name when calling functions; you can only assign values by
position.

------
cutler
One thing to bear in mind with Swift is how difficult it is to stay up to date
with the latest version. Swift versions are tied to Xcode versions and Apple
has a history of leaving older versions of OS X in the dust. I'm on OS X
Mojave, which is only one version behind Catalina, and I'm already denied
access to Swift 5.2. Talk about building a community.

~~~
ken
Even worse, macOS is tied to hardware years, so even though I've got 3 working
Macintoshes here, only 1 of them can run the current version of Swift (and
just barely). My quad-Xeon can't run the latest minor upgrade, even though the
compiler is _faster_ than the previous one.

I'm sick of this pointless and never-ending upgrade cycle. I started a new
project this week, and it's in Clojure. I find it a much more productive
language, and I'm confident I can run it on nearly any computer from the past
20 years.

(This is probably not my most productive comment ever, but I'm pretty
frustrated right now with having chosen Swift in the past.)

~~~
machello13
This is only a problem if you're using Xcode-bundled Swift, right? It seems
like you can easily run Swift on earlier versions of macOS completely
independently of Xcode if you want to.

~~~
cutler
Swift's main use - apps for iPhone, Mac OS X and Apple devices - most
certainly does tie it to Xcode.

~~~
machello13
Well, yes, but that's a completely different problem. Xcode is tied to
specific versions of macOS, but Swift isn't.

------
bla3
Speculation time! Chris Lattner designed Swift while he was at Apple. Apple
seems committed to it, and Chris seems to spend his time making it a language
that isn't Apple-only. His stint at Tesla was maybe short because Tesla didn't
want to adopt Swift. At Google, he got to start a project with it, but him
leaving suggests it didn't catch on. SiFive is small and hardware ramps are
slow, so maybe he's given up on the Swift-world-domination plan. Again, this
is all speculation.

~~~
favorited
Lattner publicly stated that he and Musk didn't work well together.

------
gaogao
Taichi, [http://taichi.graphics/](http://taichi.graphics/), is another
interesting stab in this field.

~~~
hamanin
Wow, this one looks more promising.

~~~
rcxdude
It's nice that they've actually demonstrated something in it. The
differentiable physics engine is pretty freaking cool. Also, if you look at
some of the details they talk about, it's clear that just having a
differentiable language
isn't enough: it's still very easy to have gradients which don't work properly
if you're not careful, especially in boundary conditions (the example they
give is collision detection).

------
gtycomb
OP likes Swift and compares it to other languages: Python, Go, C++, Rust and
Julia. In the enterprise sphere that I am in, I haven't seen Swift, and it's
hard to move a team in this direction at this point.

Python already drives a significant market and has captured talent in
numerical differentiable programming, particularly because of the ease with
which prototyping and tuning are done with it. Obviously, when we want to
scale, we have the conversation about some other option. Personally it's a bit
frustrating that Go support for TensorFlow or one of the competitors is not
quite satisfactory, and this is surprising; I can't explain it.

Instead of inventing any new language, why doesn't one of these -- Python, Go,
C++, Rust, Julia, or Swift -- complete the job with end-to-end differentiable
programming? 'Complete' to me means language-level seamless GPU support (or
related distributed/parallel architectures), language-level deployment ease
(the kind of thing done with Kubernetes), and integration with embedded
hardware.

------
ChrisMarshallNY
I’ve been writing Swift since the day it was announced.

I believe in the language as a system programming language (and I’ve done
systems programming in the past).

That said, like C# is with Microsoft, it is strongly associated with Apple as
a “private” language (despite being open-source).

That’s probably a big reason it’s slow to be adopted in a wider fashion, but
I’m sure there’s other issues.

It would be nice to see it be more widely adopted, but I’m OK with it being
confined to the Apple ecosystem, as that is my domain.

~~~
oaiey
There is more to that rationale than being privately owned. In comparison to
C#, Swift is not natively supported by any of the three big cloud providers
(AWS, Azure and GCP), despite being the go-to language for half of all mobile
apps (which is a huge market). C# is supported despite being Microsoft
controlled. I believe that Swift does not have the confidence of the community
due to Apple's radical technology choices.

IBM may have counter-balanced this.

What do others think is the reason for the slow backend adoption of Swift?

~~~
Bigpet
> What do other think is the reason for the slow backend adoption of Swift?

It's not different enough. All the demos I've seen of it felt like it was
mostly an "also-ran" language that modernised things enough to be a
significant improvement over Obj-C.

But I've never seen any argument for it being a big enough improvement over
C#, Kotlin or others to be worth switching ecosystems for.

~~~
andybak
I feel differently. I read the docs when it was first released and felt it was
potentially hitting the sweet spot for "static yet expressive" and "familiar
yet modern". I was mainly Python at the time and the readability and clarity
impressed me and those are areas where I have very high standards.

More radical languages all tend to look a bit "read only" to me but I am aware
that might just be unfamiliarity on my part.

C# and Kotlin both have nice features but they are more conservative Java-
likes than Swift. Swift had enough modern features and syntax to feel like it
would reward the effort of learning it.

But I don't own an iOS device so I'm still waiting for a use case!

~~~
pjmlp
.NET and the JVM also have Clojure, Scala, F# and C++ to choose from, plus
metaprogramming runtimes and AOT compilation support (although many seem to
forget about this one).

So while Swift is a welcome improvement over Objective-C, it hardly brings
anything worthwhile enough to dump either the Java or .NET ecosystems for.

~~~
DeathArrow
Add the fact that both the JVM and .NET are huge ecosystems with tons of
libraries.

------
erwincoumans
The lack of any mention of Google's JAX is odd, both in the article and in the
comments here. It was discussed yesterday:
[https://news.ycombinator.com/item?id=22812312](https://news.ycombinator.com/item?id=22812312)

~~~
antpls
Agreed, this is the most clickbait title of the week.

From my understanding, the author is someone from outside Google who doesn't
know anything about Google's bets beyond what is publicly available, and the
author is biased toward the Swift language.

JAX makes more sense, as it uses Python, but more importantly, Google works at
a lower level of the stack with TPU, MLIR and XLA. It doesn't really matter
what language is on top of that stack.

------
threw4234324
Does anyone here use Swift for deep learning work? I've yet to see any real
advantage, and even if there were one, or one were to be added, I think Julia
would almost always be a better choice here.

~~~
spinningslate
As an outsider, I was disappointed they didn't pick Julia too. It seems very
thoughtfully put together, has a good community and some high quality libs.
And it's fast.

As the article points out, Google did consider it. IIRC it came down to Julia
and Swift in the end. And, given Chris Lattner was leading the effort, there
was only really going to be one answer. There's clearly some merit to that:
they were expecting to make changes in the compiler (again, if I remember, for
e.g. optimising GPU code). If you're going to change the compiler, it's pretty
compelling to opt for the language that one of your team designed. And it's
not clear (to me) what the implications of commits into the Julia master tree
would have been.

That doesn't generate a community though. It's yet to be seen whether that
will happen. It would take the level of resource that very few firms can
afford to dedicate. Google is one of them: though its patchy record on
committing to long term endeavours means it's definitely not a slam dunk. And
Lattner leaving further detracts from that confidence.

I'll be interested to see what they come up with. Still think it's a pity they
didn't choose Julia. But it's not my project so I don't get to choose.

~~~
ksec
>I'll be interested to see what they come up with. Still think it's a pity
they didn't choose Julia. But it's not my project so I don't get to choose.

I still think it is not too late to admit that was a mistake and change course
to Julia, especially now that Chris Lattner has moved on.

------
helsinkiandrew
> A data scientist could use Swift in much the same way as they use Python

Is this really true? Many data scientists don't come from a development
background. Low-level Swift code (what's likely to be in TF) can be as obscure
as C++. Will the 'industry lag' go away without users having to dig into the
library and understand the code?

~~~
MiroF
Not all data scientists are the same. Sure, this probably isn't useful for
someone without much coding experience whose only experience is with the Keras
API, but for people who want to code custom architectures to solve their
problem (which is also very common), the static type checking enabled by
Swift, among other things, is very appealing.

------
id_ris
I'm an iOS developer deeply familiar with Swift and on the path of learning
ML. The irritation with "import Foundation" is understandable. Foundation is,
wait for it, the Foundation of Objective-C programming and development for the
iPhone. It's a library Swift makes use of and needs to inter-op with due to
legacy concerns for iOS and Mac developers.

It's not an inherent part of Swift the language, and efforts like those at
Google and the open source development of Swift can develop more modern and
suitable replacements for those libraries.

I like Python, but man, I really don't want to make large systems in it. Swift
is a great language, and imo the biggest thing holding it back is that it's
intertwined with a lot of Apple code. But it doesn't need to stay that way,
and for that reason I applaud the efforts to move it beyond just an "app
creation" language.

------
cgarciae
One thing not mentioned in the original "Why Swift for TensorFlow" document,
and a major source of conflict when the differentiable programming feature was
formally proposed by the S4TF team as a standard Swift feature: Swift has no
mechanisms for metaprogramming. This matters because automatic differentiation
can be implemented 100% using metaprogramming; instead, the S4TF team had to
build certain features internally for this, which is probably one of the
reasons it took so long to get the most basic stuff working.

In retrospect you can really say Swift was a bad choice for the project,
because the time to market was much slower than it could have been vs. e.g.
choosing Julia. The other thing they didn't take into account was the actual
market; that is, the data science ecosystem in Swift is non-existent. You have
an excellent deep learning library standing alone without a numpy, a pandas, a
scipy, an opencv, a pillow, etc., which makes doing real applications with it
nearly impossible.

That said, Swift as a language is amazing: doing parallel computation is so
easy, and not having a garbage collector makes it super efficient. It's the
kind of thing we need, but the language right now is not in the right state.
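
On the "parallel computation is so easy" point, a minimal sketch of the kind
of thing that's meant here, using Dispatch's `concurrentPerform` (the workload
is made up purely for illustration):

    
    
        import Dispatch
    
        // Toy workload: sum of squares of 0..<1_000_000, split into 8 chunks.
        let n = 1_000_000
        let chunks = 8
        let chunkSize = n / chunks
        var partials = [Double](repeating: 0, count: chunks)
    
        partials.withUnsafeMutableBufferPointer { buffer in
            // Each iteration writes only to its own slot, so there is no contention.
            DispatchQueue.concurrentPerform(iterations: chunks) { chunk in
                var sum = 0.0
                for i in (chunk * chunkSize)..<((chunk + 1) * chunkSize) {
                    sum += Double(i) * Double(i)
                }
                buffer[chunk] = sum
            }
        }
    
        print(partials.reduce(0, +))
    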

------
BiteCode_dev
There are some things that are really neat in the Swift example that they
compare to Python, like the literal range similar to Ruby's and the fact that
"+" is a function like in Lisp, although I'm really not a fan of
CFAbsoluteTimeGetCurrent().

But the Python example doesn't make me trust the rest of the article. It is
clearly a Swift example, translated verbatim to Python.

Idiomatic Python would be this:

    
    
        import time
        for it in range(15):           
            start = time.time()        
            total = sum((it,) * 3000)
            end = time.time()     
            print(end - start, total)  
    

Which is shorter, and way faster.

Now of course, Python is slower than Swift (although a numpy version would not
be, but I get that it's not relevant in general purpose machine learning). But
misrepresenting a language is not a good way to make a point.

~~~
lalord
Hi, author here, thanks for taking the time to read the article!

The objective of the demo was not to see which language could sum up a bunch
of numbers the fastest. You could keep optimizing that until you are left with
just `print(<the resulting number>)`. The objective was to have a simple
example of looping over an array a bunch of times. The only reason I ended up
summing the numbers in the array and printing them was so that LLVM wouldn't
optimize it away and be unfair towards python. I actually wrote it first in
Python tbh.

~~~
BiteCode_dev
Makes sense, thanks.

------
elevenoh
SwiftUI for iOS got me into Swift.

I find optional?s & protocol-oriented programming enable clear & concise
mental models. I like Swift's syntax better than Java/Kotlin, C++ & Rust imo.
And it's damn fast.
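
For anyone curious what that looks like in practice, a toy illustration (the
types and names here are made up):

    
    
        // Protocol-oriented: a protocol plus a default implementation in an extension.
        protocol Describable {
            var name: String { get }
        }
    
        extension Describable {
            func describe() -> String { "I am \(name)" }
        }
    
        struct Model: Describable {
            let name: String
        }
    
        // Optionals make the "might be missing" case explicit at the call site.
        let maybeModel: Model? = Model(name: "ResNet")
        print(maybeModel?.describe() ?? "no model loaded")
    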

------
awwaiid
"If you’ve ever used PyTorch, TensorFlow, or any of the other big machine
learning libraries, they all support this feature, but only if you’re using
their particular library specific operations. What’s more, working with
gradients in these Python libraries is not as lightweight, transparent, or
well integrated as it is in plain Swift."

This reminds me of concurrency and transactional memory in Clojure. You can
have all of those things at a library level, but building it into the
language... Well it kinda FORCES you to deal with them, for good and ill.

------
nnq
Why the heck would anyone try to implement differentiable programming in a
language with _no meta-programming features whatsoever?!_

It's like trying to put a car into orbit or smth...

------
sriku
If you want nice DP, try Julia with Flux and Zygote. Zygote perhaps has a bit
of a distance to go for performance, but it just feels totally natural to code
with these libraries... if you assume performance gaps will be closed (it's
good enough).

~~~
mratsim
Can Julia build static binaries now?

Because for mobile devices you don't want to ship LLVM. Plus it's easier to
package binaries than scripts which usually require Docker to deal with
dependencies in a sane way.

~~~
eigenspace
A lot of great progress is happening in PackageCompiler.jl (large binaries,
very useable) and StaticCompiler.jl (small binaries, still immature).

I'd argue that Julia is much closer to having a good static compilation story
than Swift or Nim are to having vibrant scientific ecosystems.

> Plus it's easier to package binaries than scripts which usually require
> Docker to deal with dependencies in a sane way.

Julia does have a really good story for reproducible dependency management. We
learned a lot of valuable lessons by looking at Python's dumpsterfire.

------
scottlocklin
I had to google "differentiable programming" to see if they were really
touting automatic differentiation as some futuristic language feature.
Honestly the only real response to this should be :trollface:

I've been doing software development in data science, large scale optimization
and machine learning for over 15 years... I've needed automatic
differentiation in my language .... exactly never. I mean, most of the
languages I use regularly are capable of it, and it is a neat trick; it's just
not that useful.

The best part of this article is Yann and Soumith twittering they need Lush
back (and not because of automatic differentiation). I agree; it's still my
all time favorite programming language, and I don't even fool around with Deep
Learning.
[https://twitter.com/jeremyphoward/status/1097799892167122944](https://twitter.com/jeremyphoward/status/1097799892167122944)

~~~
abra1010
If you're doing "deep learning" using frameworks like TensorFlow, PyTorch or
JAX, you're using autodiff all the time.

~~~
scottlocklin
Sure, great: it can be used with other sorts of models as well, but it's
really not that big a deal, and can be implemented in a lazy afternoon. I'm
pretty sure lack of this as a first class language feature is holding back
exactly no DL frameworks or programming languages, and gluing it onto Swift
isn't going to make DL people use it. I've never used Swift's REPL; I suppose
that might actually be something that gets people on board.

~~~
stormtroper1721
> can be implemented in a lazy afternoon

Hi, I'm working on making a tensor lib in Rust (think numpy + autodiff) to
learn about these topics. There isn't much information online about how
projects like numpy and autograd work under the hood.

Do you have any ideas/tips/resources about how it could be done?

~~~
scottlocklin
Numpy is basically LAPACK. You'd be hard pressed to replace that with
something nearing its performance. For autodiff, I dunno, how about John
Mount's explanation?

[http://www.win-vector.com/blog/2010/06/automatic-differentia...](http://www.win-vector.com/blog/2010/06/automatic-differentiation-with-scala/)
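
For what it's worth, the forward-mode flavor really is a lazy-afternoon
exercise. A minimal dual-number sketch (in Swift, since that's the topic of
the thread; the same few lines translate directly to Rust):

    
    
        // Forward-mode autodiff: carry a value and its derivative through every op.
        struct Dual {
            var value: Double
            var derivative: Double
    
            static func + (a: Dual, b: Dual) -> Dual {
                Dual(value: a.value + b.value, derivative: a.derivative + b.derivative)
            }
            static func * (a: Dual, b: Dual) -> Dual {
                // Product rule: (fg)' = f'g + fg'
                Dual(value: a.value * b.value,
                     derivative: a.derivative * b.value + a.value * b.derivative)
            }
        }
    
        func cube(_ x: Dual) -> Dual { x * x * x }
    
        // Seed the input's derivative with 1 to differentiate with respect to it.
        let r = cube(Dual(value: 2, derivative: 1))
        print(r.value, r.derivative)  // 8.0 12.0
    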

------
nurettin
> Also, Python is not great for parallelism.

I agree that cpython threads are not parallelism, but Python still comes with
built-in support for multiprocessing and I've been earning my bread using that
for the past two years, so unless you _have_ to use in-process parallelism for
some weird reason, Python and your OS scheduler of choice have you covered
there.

~~~
logicchains
> have to use in-process parallelism for some weird reason

Needing shared-memory parallelism is a weird reason now? Pretty much any
parallel algorithm that's not embarrassingly parallel is going to perform
better with threads able to share memory than with message passing between
processes.

~~~
nurettin
Exactly, pretty much any parallel algorithm is embarrassingly parallel (gather
or calculate a bunch of data, process it and merge it together), so I question
the need for in-process parallelism instead of solving it trivially.

~~~
lmeyerov
Python is not great for fine-grained data parallelism (SIMD, GPU), which is
increasingly the lion's share: non-starter for direct inline and pretty bad
for DSLs. The result is runtime heroics for embedded ~dataframe DSLs (pyspark,
rapids.ai) with high overhead.

OTOH, those heroics do happen, and have been OK so far. Accelerating
differentiable programming is basically an extra transform layer on
accelerating data parallel programming. Thankfully, our team writes zero raw
OpenCL/CUDA nowadays and instead writes fairly dense dataframe code. Similar
to how adding async/await did a lot for web programming in Python, I'm curious
what it'll take for data parallel fragments (incl. differentiable ones). If it
wasn't for language resistance to UDFs + overhead, and legacy libs around
blocking, we'd be happy.

------
john_alan
Love Swift. Really elegant language.

------
nimmer
Nim would have been a better choice. It's often as fast as C, similar to a
statically typed Python, and completely programmable.

~~~
yahyaheee
Nim is maybe the best alternative right now next to Julia, but I still think
there may be a better numeric language yet to show up.

~~~
goatlover
So what about Fortran if the modern languages in this space fail to satisfy?
It's fast, statically typed, and often used for heavy duty number crunching on
supercomputing architectures, so obviously it handles parallelism well.

------
red_admiral
There are several things that make me give this a thumbs down.

First: "All these usability problems aren’t just making it more difficult to
write code, they are unnecessarily causing the industry to lag behind
academia."

Industry lag behind academia? Seriously, have you any idea of the amount of
data that industry crunches these days, while some in academia still think
half a GB is "big data"? Or of the amount of money there is in the kind of ML
that industry does (which is the main reason why anyone cares about this field
at all)?

Also, your whole post is about innovation going on at... Google. Then Apple.
Not academia.

Secondly: "Swift is fast", point taken. Then you go off on a drool about your
favourite syntactic sugar. My own experience is that this is exactly the kind
of thing you do not need in an enterprise-grade product.

------
mkchoi212
While I’m a fan of Swift and am glad that it’s getting a lot more attention
outside of the iOS world, I’m not sure if it’s the best language to be used in
the data/ML world. Yes, it’s faster than Python, yes it’s almost as easy to
read, and yes its standard library is powerful. But when interacting with
Swift as a framework, I can’t imagine how much of a pain in the ass the strict
type system will be for researchers. I get it. Types help people developers
write less error prone code and blah blah. But the main target here are ML
researchers. They aren’t developers. They just want to write down their theory
in code and just run to train their models. They aren’t writing complex iOS
apps here.

~~~
nestorD
To me this seems attractive for those that want to put ML into production.
Having a solid type system and an easily interoperable language could lead to
a lot fewer bad surprises than using Python.

For research this does indeed seem like a bad fit.

------
fastball
Something of an aside: closures don't need to be "unnamed functions which
capture their context", right? A function that "closes over" its context but
is _named_ is still a closure, unless I'm wrong.

~~~
wool_gather
You're correct, but there's a bit of "linguistic drift" at play, I guess. The
use of the term "closure" _in the Swift ecosystem_ has come to refer almost
entirely to the anonymous kind.

(Side note, named functions in Swift are indeed also closures:

    
    
        let x = 10
        func f() {
            print(x)
        }
        f()    // Okay; prints 10 as you'd expect
    
    )

~~~
fastball
Yeah, that seems to be the case in the JS community as well – I see a lot of
people referring to the arrow function syntax as "closures", even though named
functions are closures in JS too.

------
erwinh
Heard about this a while back but then it went off the radar (from my pov at
least). What is the latest from Apple and Google on this?

~~~
jamil7
Apple has little interest in supporting Swift outside of their platforms.
Chris Lattner has left Google, so the future is uncertain. Maybe that's a
pessimistic outlook though.

~~~
pjmlp
Apparently, as linked by the article, Richard Wei as well.

------
pkulak
> Google’s team had more expertise in Swift, given that Swift’s creator Chris
> Lattner started the project

Way to bury the lead there, haha.

------
komuher
Looking forward to S4TF 1.0. Let's see how much progress they'll manage to
make in the next few months without Chris Lattner.

------
weiming
My gripe with Swift (and I've written a lot of it) for purposes requiring a
lot of number crunching is the performance issues associated with the default
use of copy-on-write types and the ease of accidentally writing O(N^2) code.
In Objective-C the behavior was more obvious, with NSMutable* types being
front and center.
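
A toy illustration of the kind of accidental quadratic behavior copy-on-write
makes easy (hypothetical code, just to show the trap):

    
    
        var weights = [Double](repeating: 0, count: 100_000)
    
        for step in 0..<100 {
            let previous = weights        // cheap: both arrays share storage...
            weights[step] = Double(step)  // ...but this one write now copies all
                                          // 100_000 elements, because `previous`
                                          // still references the old buffer.
            _ = previous
        }
    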

~~~
valuearb
Premature optimizations.

------
nmca
Does Chris Lattner still work at Google?

~~~
pjmlp
No, and Richard Wei has left as well.

~~~
geodel
And he is working on Swift at Apple now which does not look bad to me.

~~~
pjmlp
Depends on the extent to which working on Swift at Apple actually means Swift
on Android, Linux and Windows, at an ecosystem and tooling level comparable to
other programming languages.

------
metroholografix
The improved Swift code shown in the original post takes 0.5s to compile with
optimizations on my system. It runs in 4-5us per inner loop.

The following Common Lisp code:

    
    
      (defun bench-array ()
        (declare (optimize (speed 3) (debug 0) (safety 1)))
        (let ((arr (make-array 3000)))
          (declare (dynamic-extent arr)) ; stack allocate
          (loop
            :with sum fixnum = 0
            :for i fixnum :from 0 :repeat 15 :do
              (time
               (setf sum
                     (loop :for k fixnum :from 0 :repeat 3000
                           :do
                              (setf (aref arr k) i)
                           :finally (return (loop :for l fixnum :across arr
                                                  :sum l fixnum)))))
              (print sum))))
    

compiles in 0.01s and runs in 5us per inner loop on my system (SBCL).

It has array bounds checking enabled (safety 1 declaration). If I remove it
(safety 0), runtime improves to 2-3us per inner loop.

I'll take Common Lisp over Swift any day of the week :-]

------
chombier
Also, the Swift ABI [1] should make it fairly easy to distribute compiled
networks, right?

[1] [https://gankra.github.io/blah/swift-abi/](https://gankra.github.io/blah/swift-abi/)

------
lovetocode
Honest question. Why are languages like C# and Java not more heavily
considered?

------
kelvin0
Isn't Swift an Apple 'invention'? Surprised Google is going for such a 'niche'
language ...

This reminds me of the initial 'Dart' hype and promises.

------
teekert
"To put it bluntly, Python is slow. Also, Python is not great for
parallelism."

Really? Doesn't that depend on the modules, the underlying code? The problem?

~~~
echelon
The problem is the GIL prevents threads from making progress simultaneously,
even with synchronization and locking primitives available.

But even so, Python is a little slow when you start thinking about threading.
You'd be better off using Rust or Go or something rather than the half-baked
support found in scripting languages.

~~~
andybak
> the half-baked support found in scripting languages.

It's a long time since I've heard that phrase used disparagingly. Didn't we
all decide to use the term "dynamic languages" just to avoid the judgemental
overtones associated with "scripting"?

~~~
echelon
> It's a long time since I've heard that phrase used disparagingly. Didn't we
> all decide to use the term "dynamic languages" just to avoid the judgemental
> overtones associated with "scripting"?

I didn't mean to disparage. While I did intend to scrutinize Python's
multiprocessing (my experience with it was less than fun), my use of the term
"scripting language" was entirely subconscious.

But I do have an anecdote.

A decade ago I was using Python and Flask for building web apps. Now I use
Rust instead, and many of my colleagues are choosing to use Go.

I still use Python for scripting and now also use it for ML. But I wouldn't
use it for the web anymore.

I think the landscapes and use cases are shifting since there are new tools
available. Python is doing new things (pytorch, tensorflow), but less of the
things I used to use it for (Flask, Django, ...)

------
dzonga
The growth of a high-performing language for data will probably come from Nim.
It has expressiveness like Python and speed like C, though the community needs
to develop, and efforts like deploying to JS etc. should be dropped.

------
eugenekolo
Google loves entering legal battles over programming languages huh....

Just use something actually free, or create your own.

------
yters
Will differentiable programming be able to solve the halting problem?

~~~
joshmarlow
No - a differentiable program would still be running on a real computer (which
is a finite approximation of a Turing machine). So for a differentiable
program to solve the halting problem, that would imply that the underlying
machine could solve the halting problem (which it can't).

~~~
yters
So how is this more powerful than normal programming?

~~~
JoeCamel
Obviously "more powerful" doesn't imply the capability of solving the halting
problem. What exactly do you want to say? You are not impressed with
"differentiable programming"?

~~~
yters
It is misnamed. Makes it sound like some kind of Turing complete approach,
like when DeepMind invented differentiable Turing machines. This post just
describes differentiating mathematical functions, which has been around
forever and which you can find in MATLAB, Mathematica, Python libs, explained
in SICP, etc. Why doesn't Google just buy a MATLAB license and be done with
it? Why bother embedding it in a whole new programming language?

~~~
ddragon
Technically, RNNs, including LSTMs, are already Turing complete; neural Turing
machines mostly decoupled memory from the hidden layers so the memory size can
grow without a quadratic increase in the number of parameters, which helps
with the unbounded part of Turing machines. It also helped inspire many
attention-based models that came after. Also, MATLAB isn't really relevant
here, and automatic differentiation is different from symbolic
differentiation; the direct comparison is those Python libs like TensorFlow,
PyTorch and JAX.

Differentiable programming, in the end, is just a way of doing better something
you can already do (just like you could create neural networks long before
Theano/TensorFlow/Torch, but it was not as streamlined). With a differentiable
programming approach you can get something as dynamic as PyTorch, with the
performance optimizations and deployment capabilities of a TensorFlow graph,
and with an easy way to plug in any new operation and its gradient by writing
in the same host language (so no need to learn or restrict yourself to the
TensorFlow/PyTorch-defined methods/DSL).

You don't even need to change the compiler or define a new language for it.
Julia's Zygote [1] is just a 100% Julia library you can import, to which you
can add at any point any custom gradient even if the library creators never
added it, and then run either on CPU or GPU (which you can also fully extend
using just pure Julia [2]). And of course, you can also use a higher level
framework like Flux [3], which is also high level Julia code.

I think the heart of differentiable programming is just another step in the
evolution: from early (Lua) Torch-like libraries that gave you the high level
blocks to compose, to autodiff libraries that gave you easy access to low
level math operators to build the blocks, to the point where you can easily
create your own operators to create the high level blocks.

[1] [https://github.com/FluxML/Zygote.jl](https://github.com/FluxML/Zygote.jl)

[2]
[https://github.com/JuliaGPU/CUDAnative.jl](https://github.com/JuliaGPU/CUDAnative.jl)

[3] [https://github.com/FluxML/Flux.jl](https://github.com/FluxML/Flux.jl)

------
altaaf_baatli
Why not Kotlin?

