
Building a Language and Compiler for Machine Learning - ViralBShah
https://julialang.org/blog/2018/12/ml-language-compiler
======
eigenspace
Mike Innes is one of the real rockstars in the Julia community. He has a knack
for making surprisingly small and elegant packages which compose so naturally
with the base language that they feel built-in.

After a `using Flux`, Julia is suddenly a machine learning language, rather
than a language with a machine learning library. I'd argue it shouldn't be
surprising that he found his way to Julia, because Julia is one of the few
languages that let you build such packages.

His other packages such as MacroTools.jl, Lazy.jl and Zygote.jl are also well
worth checking out.

------
mark_l_watson
As an old Lisp user, I am impressed by how Flux (which I started using this
weekend, after someone on HN recommended it to me) transforms Julia, much like
building Lisp up into a new language for whatever problem you are working on.
I also appreciate how incredibly easy it was to get started: Flux ‘just
worked’ with CUDA 10 and the GPU in my laptop, and the model zoo was a great
starting point. Really quality stuff!

~~~
ChrisFoster
Hah, that someone was me, I'm glad it clicked for you! I had a similar
experience of it "just working" when trying it a few weeks ago. I particularly
enjoyed being able to build and run Flux models interactively, just like any
other Julia code. Also, having a hand-written loss function just run on the
GPU with no extra effort kind of blew my mind.
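
For reference, a minimal sketch of the GPU point (hypothetical, assuming a
working CUDA setup and a Zygote-based Flux; older versions used CuArrays
instead of CUDA.jl):

    using Flux, CUDA

    m = Chain(Dense(10, 32, relu), Dense(32, 2)) |> gpu
    x = rand(Float32, 10, 64) |> gpu
    y = rand(Float32, 2, 64)  |> gpu

    # Hand-written loss: plain broadcasting and reductions, which run as
    # GPU kernels simply because the arrays live on the GPU
    myloss(x, y) = sum(abs2.(m(x) .- y)) / size(y, 2)

    gs = gradient(() -> myloss(x, y), params(m))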

~~~
atrilumen
How "ready" is Flux, for building something like Rasa?

[https://rasa.com/](https://rasa.com/)

~~~
ChrisFoster
My feeling is that Flux is a fantastic tool for playing with innovative models
which can't be expressed in the usual frameworks, and quite possibly already
the best thing that exists for open-ended experimentation in ML.

On the other hand, for training and deploying models which are easily
expressed in other frameworks, you will find a lot more ready-made pieces of
infrastructure elsewhere.

~~~
atrilumen
Thanks!

------
stabbles
Since the blog post does not have many code samples, this non-trivial AD
example with Zygote.jl is worth sharing (it's from their readme):

    
    
        julia> using Zygote
        julia> fs = Dict("sin" => sin, "cos" => cos);
        julia> derivative(x -> fs[readline()](x), 1.0)
        cos
        -0.8414709848078965
        julia> -sin(1.0) # the 'true' derivative at 1.0
        -0.8414709848078965
    

So Zygote can apply AD to an anonymous function that looks up which function
to call in a hash table, based on user input.

[https://github.com/FluxML/Zygote.jl](https://github.com/FluxML/Zygote.jl)

~~~
UncleOxidant
That's pretty impressive.

I noticed this in the Zygote README:

"The Julia compiler does not yet support all features needed to make Zygote
fast, particularly in the presence of control flow."

Any idea of when the Julia compiler will support these features?

~~~
viksit
Could you elaborate on why this is so impressive? As I understand it, it's
because frameworks like TF or PyTorch can only AD specific operations defined
as part of their spec, rather than generic functions like this?

~~~
FridgeSeal
It's impressive and super, super cool because the AD can now differentiate any
function you can write in Julia.

You don't have to hope that your AD/ML package has that function; you can just
write it, or find it in a package, and punch it straight in. That's awesome.
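
To make that concrete, here's a small made-up example: the function below has
loops and branches and was written with no AD in mind, yet Zygote
differentiates it directly:

    using Zygote

    # Ordinary Julia code: control flow, no special ops, no AD annotations
    function wiggle(x)
        s = zero(x)
        for i in 1:5
            s += isodd(i) ? sin(i * x) : x^2 / i
        end
        return s
    end

    gradient(wiggle, 0.7)   # -> (d wiggle/dx at 0.7,)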

~~~
0-_-0
How is that different from e.g. TensorFlow? You can compose any function you
want that has differentiable components, and (re)define the derivatives of
elementary or composed functions.

~~~
Setepenre
No, TensorFlow does not let you do that. TensorFlow differentiates a TF graph;
Flux differentiates Julia's IR.

TensorFlow has differentiable op(eration)s that you can compose, and that's
it. If you want to implement something differentiable, you need to build it
from TensorFlow ops or add your own ops to TensorFlow [1], together with their
gradients.

With Flux, you can take code written by a random person on the internet who
never thought about using their stuff in ML, and Flux will be able to
differentiate it anyway.

[1]:
[https://www.tensorflow.org/guide/extend/op](https://www.tensorflow.org/guide/extend/op)
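
A tiny, hypothetical illustration of that last point: the helper below is the
kind of utility someone might publish with no thought of ML, and Zygote
differentiates the plain Julia code it is made of:

    using Zygote, Statistics

    # Root-mean-square deviation between two vectors -- nothing ML-specific
    rmsd(a, b) = sqrt(mean((a .- b) .^ 2))

    gradient(rmsd, [1.0, 2.0, 3.0], [1.5, 1.5, 1.5])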

~~~
0-_-0
You mean like AutoGraph in TensorFlow?

[https://github.com/tensorflow/tensorflow/tree/master/tensorf...](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/python/autograph)

~~~
FridgeSeal
AutoGraph still covers only a small subset of the language and its features.

They've also got to manually implement the Python -> AutoGraph translation for
a whole variety of language features (so any language features that get added
or changed will break AutoGraph until it's updated).

Flux gets this essentially for free, for the entire Julia language, without
the need to manually build that language -> TensorFlow translation layer, with
the added bonus of Julia's non-trivial performance advantages.

------
ChrisRackauckas
As someone who works on merging differential equations and machine learning, I
have found this kind of work essential for what I do. Pervasive AD that allows
merging neural networks and diffeq solvers is letting us explore all kinds of
new models and new problems. Sure, it doesn't impact vanilla machine learning
all that much (though Zygote.jl does allow for a lot of optimizations that
wouldn't be possible with tracing-based AD), but it definitely opens up a new
wave of AI possibilities.
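
A minimal sketch of what that enables (hypothetical, assuming a Zygote-based
Flux; not code from the post): differentiating through a hand-rolled ODE
solver whose right-hand side is a neural network.

    using Flux   # `gradient` and `params` are re-exported from Zygote

    # Neural network standing in for the unknown dynamics: du/dt = nn(u)
    nn = Dense(2, 2, tanh)

    # Hand-written forward Euler integrator -- plain Julia, no AD hooks
    function euler(u0; dt = 0.1f0, steps = 10)
        u = u0
        for _ in 1:steps
            u = u .+ dt .* nn(u)
        end
        return u
    end

    loss(u0, target) = sum(abs2, euler(u0) .- target)

    # Gradients flow through the solver loop and the network in one go
    gs = gradient(() -> loss(Float32[1, 0], Float32[0, 1]), params(nn))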

~~~
ArtWomb
Neural ODE solver just announced as a Best Paper at the NeurIPS 2018
conference. What's interesting is that preliminary results not only yield
significantly less prediction error than RNNs; these differentiable models
also allow extrapolation well beyond the range of the observed test data.

Neural Ordinary Differential Equations

[https://arxiv.org/pdf/1806.07366.pdf](https://arxiv.org/pdf/1806.07366.pdf)

~~~
ChrisRackauckas
I am working on very different approaches with very different applications,
but yes, the neural ODE (not a solver, BTW) is a good example of this kind of
application and of the effectiveness of the approach. It's a fantastic paper,
if anyone hasn't read it.

------
jlebar
As an XLA:GPU person, I'm curious how the performance of Julia compiling
natively to CUDA compares to using XLA:GPU.

In particular, is this a promising approach, or do you see it as a dead end
compared to generating GPU code natively? If it's promising, are there things
we need to do in XLA:GPU to make it less awful for you?

(Reasons you might want to use XLA:GPU include: you don't have to reinvent all
our performance and correctness hacks for cuDNN, and maybe our kernels run
faster since we're targeting such a limited domain.)

~~~
KenoFischer
We've been meaning to run this comparison, but haven't gotten around to it
yet. I expect it to work and am hoping to see some performance benefits. It
should be fairly straightforward to get it working; the only reason we haven't
tried so far is that we only have XRT hooked up, and the TF infeed ops are not
open source, so the existing code doesn't just work. It should be
straightforward to hook up the XLA service instead, but it's a bit of
additional code to write that we haven't gotten to.

------
UncleOxidant
I'm tasked with running several ML algorithms on a new hardware accelerator.
Currently there is an LLVM toolchain for that new hardware, but no Python
support is expected for a while, which means implementing a bunch of ML code
in C or maybe C++ (not a very pleasant prospect). I'm wondering: since Julia
has an LLVM backend, would it be possible to emit LLVM IR from Julia that
could then be fed into our LLVM toolchain?

One thing that comes to mind here: does Julia use primitives for things like
matrix multiplication that might be difficult to export at the LLVM-IR level?

~~~
KenoFischer
Yes, Julia tends to be quite good at this kind of thing. Which level you want
to operate at will depend on the details of the accelerator. Happy to give
some pointers if you can give me a rough idea of the target architecture and
what software already exists. My email is in my HN profile.

------
shafte
I'd be interested in a direct comparison with similar efforts undertaken by
existing frameworks; for example TorchScript [1], which aims to produce a
language that shares a syntactic frontend with Python while getting all the
goodies that ahead-of-time compilation gives you (symbolic diff, operator
fusion, etc.).

Seems to me that the primary challenge for any "next-generation" framework or
language is getting people to actually use the thing. Sharing a front-end with
Python and a backend with PyTorch seems like a good way to bootstrap that.

[1]
[https://pytorch.org/docs/master/jit.html?highlight=torchscri...](https://pytorch.org/docs/master/jit.html?highlight=torchscript)

------
lostmsu
I wonder if this feature is in any way different from LINQ expression trees?

In C# you can say

    
    
      Expression<Func<double, double>> sin = x => Math.Sin(x);
    

And then write a function

    
    
      Expression Derivative(Expression expr) => ...
    

Which will take the above sin and compute its derivative as another
Expression, which can later be compiled using .Compile().

In C#, this was introduced to support SQL bindings.

So far, the only difference I see is that in C# there's a distinction between
expression trees and functions themselves, but in Julia there's not.

~~~
StefanKarpinski
This isn't really a blog post about a specific language feature, so your
question doesn't make too much sense to me, which may be why it hasn't gotten
any answers. In general, being able to map specific primitives to their
derivatives is not sufficient for AD. I'm sure AD is possible in C#; it's just
considerably more involved than that.

------
pjmlp
I love the work being done in Julia; competition is good, and maybe it will
make the Python community more supportive of the ongoing JIT attempts.

------
Myrmornis
> Meanwhile, the idea of ML models fundamentally being differentiable
> algorithms – often called differentiable programming – has caught on.

> We need a language to write differentiable algorithms, and Flux takes Julia
> to be this language.

Recently on HN there was some discussion of this paper by Conal Elliott on
automatic differentiation in a pure functional language (Haskell):
[https://arxiv.org/abs/1804.00746](https://arxiv.org/abs/1804.00746)

This is a rather large and vague question, but I'm curious whether people have
comments on the relative merits of Julia vs. a pure functional language for
supporting "differentiable programming" for ML?

------
StefanKarpinski
Also posted here:
[https://news.ycombinator.com/item?id=18593453](https://news.ycombinator.com/item?id=18593453).
Maybe a mod could combine the two?

------
tehsauce
This looks awesome, and makes me wonder whether anyone has done any real-time
graphics experiments with Julia? With this great AD and GPU support, I would
love to try using this on some graphics applications!

~~~
byt143
Check out Makie
[https://github.com/JuliaPlots/Makie.jl](https://github.com/JuliaPlots/Makie.jl)

And the cool thing is, Julia's wonderful generic programming facilities
mentioned in the blog post are used to make Makie generic over its backends
(GL, WebGL, Cairo, etc.). It relies on this library:
[https://github.com/JuliaPlots/AbstractPlotting.jl](https://github.com/JuliaPlots/AbstractPlotting.jl)

Here's a Cairo version of Makie:
[https://github.com/JuliaPlots/CairoMakie.jl](https://github.com/JuliaPlots/CairoMakie.jl)

------
glemmaPaul
This is gonna be very interesting when it can be combined with a distributed
network of specialized CNNs for highly specialized tasks (if that doesn't
exist already).

