
Fast.ai and why Python is not the future of ML with Jeremy Howard - tosh
https://www.wandb.com/podcast/jeremy-howard
======
fxtentacle
I feel like this misses the point.

Python is used as the language to describe things, but nobody is executing
Python on the GPU. Instead, things are transpiled into highly optimized byte
code, C++ source, or chains of optimized GPU shaders. See TensorFlow XLA and
AOT.

So when I set up a training pipeline with TensorFlow, it'll barely run any
Python while training. Instead, the TF C++ core will execute a network of TF
CUDA operators for me, all inside one Python function call. That's why
Python's global interpreter lock is pretty much a non-issue by now.

Since only 1% of my training time is even running inside Python, I don't
expect any performance benefits from replacing it. And if both are equally
fast, I prefer the familiar option, meaning Python.
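
That split can be sketched outside TF too. A minimal illustration (assuming
NumPy as a stand-in for the TF C++ core) of one Python-level call dispatching
all the heavy work to native code:

```python
import numpy as np

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)

# Pure-Python loop: every multiply and add runs in the interpreter,
# under the GIL.
slow = sum(x * y for x, y in zip(a, b))

# One Python-level call: the entire dot product executes in compiled
# C/BLAS code; the interpreter is barely involved.
fast = a @ b

assert np.isclose(slow, fast, rtol=1e-6)
```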

~~~
dgellow
You can be sure that they are both well aware of this. Jeremy is one of the
creators of fast.ai, the popular Python library, book, and online course
built on top of PyTorch.

The relevant part of their discussion is this one:

> Like Python is not the future of machine learning. It can't be. You know,
> it's so nicely hackable, but it's so frustrating to work with a language
> where you can't do anything fast enough unless you call out to some external
> code or C code, and you can't run anything in parallel unless you put in a
> whole other process. Like I find working with Python, there's just so much
> overhead in my brain to try to get it to work fast enough. It's obviously
> fine for a lot of things, but not really in the deep learning world or not
> really in the machine learning world. So, like, I really hope that Julia is
> really successful because there's a language with a nicely designed type
> system and a nicely designed dispatch system and most importantly, it's
> Julia all the way down so you can get in and write your GPU kernel in
> Julia, or all the basic stuff is implemented in Julia all the way down until
> you hit the LLVM.

Here the point is that when using other languages (such as Julia) you
actually can use the same language for your GPU code and for the glue that
keeps your scripts together.

~~~
p1esk
I feel like you (and they) are still missing the point. 99% of users don't
write GPU kernels. They use simple function calls and that's all they see.
99% of users don't care whether they call CUDA code from Python or Julia code
from Julia, as long as both are equally fast and the API is clean (e.g.
PyTorch).

~~~
dragandj
This is because 99% of users know only Python, and C++ is too much for them.
If the advanced stuff were available in the language they use, a larger
portion of users would write more advanced stuff.

I prefer Clojure for this. Even simpler than Python, fast, and enables me to
match or even surpass the speed expected from mainstream tools.

See this for example: [https://dragan.rocks/articles/20/Deep-Diamond-Deep-
Learning-...](https://dragan.rocks/articles/20/Deep-Diamond-Deep-Learning-in-
Clojure-is-Fast-and-Simpler-than-Keras)

~~~
p1esk
Did you write GPU kernels in Clojure? Or did you make function calls to cuDNN
like PyTorch et al. do?

~~~
dragandj
Both. See
[https://github.com/uncomplicate/bayadera](https://github.com/uncomplicate/bayadera)
too.

Whatever the language, you _have_ to make function calls to cuDNN, because
cuDNN is optimized by Nvidia, which has put serious resources into it. Even
if you wrote everything in C++/CUDA, you probably couldn't match it for the
standard stuff.

The custom things that I write, I prefer to write in CUDA kernels + Clojure
management code.

Clojure kernels are technically possible in the same way that Julia (or
whatever) kernels are possible, but you have to match low-level hardware
features anyway, so CUDA C/C++ just makes more sense for kernels, which are a
tiny part of the code anyway.

~~~
p1esk
So if you have to use CUDA C/C++, then why not do it from Python?

~~~
dragandj
No. You don't use CUDA C/C++ the way it is used from Python. You use CUDA
C/C++ only in kernels. That code is compiled dynamically, in process, from
Clojure function calls, and 95% of the related code is management code, which
is written in Clojure. There is no separate C++ build. You never touch any
C++ tools such as CMake. You work dynamically from the Clojure REPL.

~~~
p1esk
As a PyTorch user, I never touch C/CUDA code. I never touch CMake. I can work
dynamically from the Python REPL. So, again, why should I care how those
C/CUDA calls are made if I don't see them?

~~~
pjmlp
Because Clojure comes with a platform that supports AOT/JIT compilation out of
the box, has several state of the art GC algorithms and in the near future
even value types, while offering the same dynamism as Python, with all the
capabilities of a Lisp in code manipulation.

------
ovi256
I get that a language like Julia is better for library writers, but in my
opinion Python is better for library users, and that has more weight. It will
drive faster adoption, making the library popular. Meanwhile, the easily-
written better-language library gathers cobwebs.

~~~
m_mueller
I think Julia will gain some adoption replacing Fortran-, C-, and C++-based
libraries that can then also be used within Python. If you use any numerical
packages, the above is already what you’re using under the hood.

I also think that not enough people are aware of the performance limitations
of NumPy code and how much more straightforward it can be to drop into a lower
level language once you cross a certain optimization barrier. This is where
Julia can increasingly serve an important role.

~~~
TheRealKing
This will not happen. There is so much hype around Julia and false claims
such as Julia outperforming Fortran, etc. One can always design an isolated
benchmark in which Julia outperforms any other language. But what people care
about is performance in practice. And false claims will only hurt Julia in
the long run, in the same way Volkswagen was badly hurt by its US diesel
emissions fraud.

~~~
m_mueller
It's a compiled language with a close-to-baremetal type system and implemented
as an LLVM frontend. I don't see what would make it slower than e.g. Fortran
except for a couple more years of LLVM backend optimizations required (which I
still think will happen given LLVM's adoption), but probably I'm missing
something here.

------
dklend122
Jeremy has amended his comments regarding the viability of Julia Computing
based on new information he received:
[https://twitter.com/jeremyphoward/status/1302678869158182912...](https://twitter.com/jeremyphoward/status/1302678869158182912?s=19)

The financials are more secure than stated in the video. (I don't work there,
but I'm a fan)

------
ummonk
I love Julia, but what I don't get in this particular instance is to what
extent it actually allows you to write code that you can't with Python. Sure,
Cassette allows you to do more powerful automatic differentiation, and various
GPU arrays packages make writing code that runs on GPUs very natural, but it
doesn't seem like that big an improvement over using a framework like TF /
Pytorch / Numba in Python. Sure, those libraries aren't written fully in
Python, but neither are Julia gpu libraries - you obviously have to convert to
CUDA or similar code at some point to run on the GPU.

Julia brings great improvements in ability to write simple idiomatic serial
code that runs at near-C speeds on the CPU (whereas idiomatic Python code is
at least 10x slower with an optimizing compiler, and 1000x slower with the
standard Python interpreter). But for highly parallel code dependent
on element-wise or broadcasting array operations, I just don't see the issue
with Python.

~~~
cdsousa
""" [...] it is possible to perform kernel-like operations without actually
writing your own GPU kernels: a = CUDA.zeros(1024) b = CUDA.ones(1024) a.^2 .+
sin.(b) """ [[https://juliagpu.gitlab.io/CUDA.jl/usage/overview/#The-
CuArr...](https://juliagpu.gitlab.io/CUDA.jl/usage/overview/#The-CuArray-
type)]

As far as I know, that example code creates an ad-hoc kernel that performs the
computation in a single pass.

I honestly don't know if that is possible in other frameworks.
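
For comparison, the closest idiomatic NumPy version (a sketch; NumPy here is
just the familiar CPU analog, not a GPU framework) computes the same
expression but materializes a temporary array per operation instead of fusing
everything into a single pass:

```python
import numpy as np

a = np.zeros(1024)
b = np.ones(1024)

# Three separate passes over memory: a**2 allocates one temporary,
# np.sin(b) another, and '+' a third. An ad-hoc fused kernel (as in
# CUDA.jl broadcasting, or cupy.fuse) would do all of it in one pass.
result = a**2 + np.sin(b)

assert result.shape == (1024,)
assert np.allclose(result, np.sin(1.0))
```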

~~~
ummonk
Yeah, that's definitely available in Python frameworks. (I was wrong to
mention Numba - I think CuPy would be a better example:
[https://docs.cupy.dev/en/stable/overview.html](https://docs.cupy.dev/en/stable/overview.html))

~~~
cdsousa
That's right, I found it:
[https://docs.cupy.dev/en/stable/tutorial/kernel.html#kernel-...](https://docs.cupy.dev/en/stable/tutorial/kernel.html#kernel-
fusion)

------
ForHackernews
Julia is wonderful and I'm really hoping it catches on:
[https://julialang.org/](https://julialang.org/)

Unfortunately, it lacks a big-name sponsor like Go or Rust.

~~~
pjmlp
It has enough of them,

[https://juliacomputing.com/case-studies/](https://juliacomputing.com/case-
studies/)

------
miguendes
Like I said in another comment, to me Julia has the potential to be the
modern Fortran.

Python's versatility is miles ahead. You can build web APIs, web apps,
automation scripts, data scraping, and so on.

Python is not the de facto language of ML nowadays because of its own
performance. All the underlying performance-sensitive code is written in
C/C++/Fortran.

Also, Cython, which has great integration with Python, is growing a lot and
has been used in some cool projects such as aiohttp, fastapi, and spaCy. So
it's one more alternative for writing fast code without giving up on Python
entirely.

~~~
dunefox
> Python's versatility is miles ahead. You can build web APIs, web apps,
> automation scripts, data scraping, and so on.

What? And you somehow cannot do that in Julia?

> So one more alternative to write fast code without giving up on python
> entirely.

I don't have to give up on Python, I can just use all libraries from Julia
with PyCall. I'm just glad I don't have to use Python the language anymore
than absolutely necessary.

~~~
miguendes
> What? And you somehow cannot do that in Julia?

You definitely can. My point is that Python libs are more mature and there
are more options to choose from than in Julia.

~~~
dklend122
Julia libraries are plenty mature enough to be useful and Julia's 10x
advantage in numerical computing far outweighs the difference in many cases.

------
learningwebdev
As somebody who is just getting into learning Python for web development and
ML, am I better off switching to Julia right now? How steep is the learning
curve compared to Python, and does it have a framework similar to Django for
building web apps?

~~~
miguendes
IMHO, no, stick with Python. Of course you can fiddle with Julia, but Python
is used on many fronts and is much more versatile.

If Python were a blocker for the adoption and development of ML, it wouldn't
be the default language of the most popular ML libs.

The reason is that the high-performance code is written in C++/C/Fortran.
Python is just used to glue everything together and provide a nice, rich
interface. That's what really matters.

To me Julia is closer to Fortran than to Python. And it doesn't have many
uses outside numeric programming.

Scraping the web, building REST APIs, performing data analysis, and writing
automation scripts are much easier in Python than in Julia or Swift.

Edit: typos

~~~
dunefox
> much more versatile.

> To me Julia is closer to Fortran than to Python. And it doesn't have many
> uses outside numeric programming.

Again, you're making unsubstantiated claims.

> Python is just used to glue everything together and provide a nice, rich
> interface. That's what really matters.

So now I have to learn not only Python but also Fortran/C++/C, because the
underlying library that I might want to adapt is written in one of these
languages. In Julia, the DL library, for example, is written in Julia. What
you are claiming is a pro is actually a con.

> Scraping the web, building REST APIs, performing data analysis, and writing
> automation scripts are much easier in Python than in Julia or Swift.

That might be true for Swift, but certainly not for Julia.

~~~
nl
 _So, now I have to not only learn Python but also Fortran/C++/C because the
underlying library that I might want to adapt is written in one of these
languages._

Basically the only time you'll want to do this from Python is if there is a
specific Fortran or C++ library you want to use. You'd have to do the same in
Swift or Julia in that case.

~~~
dunefox
Only if the library is not written in Julia, yes. The point here is that Julia
is efficient enough that deep learning libraries can be written in pure Julia,
not C/C++/Fortran.

~~~
nl
And yet... no widely used DL library is in Julia, and they are _all_ in
Python.

It's kind of a silly point to try to score: "it's possible to write an
efficient deep learning library in Julia (although no one has done it yet),
and yes, you can do the same with NumPy in Python, and XLA in Python will
outperform it, but _someone else wrote some C/C++ there to make that
possible!_"

You are much more likely to want to write CUDA kernels (in C!) than you are to
write C framework code to interface with Python for machine learning.

The person is looking to "get into ML". I've been working as a professional ML
developer for 6 years, and I've never written any C or C++ for it.

~~~
BadInformatics
> although no one has done it yet

This may be generally true (though the benchmarks I've seen show Knet.jl and
sometimes Flux.jl on par with TF/PyTorch with a single machine + single GPU),
but there are definitely domains where it is categorically not. The most
prominent one is neural *DEs, where the SciML [1] ecosystem has SOTA
performance. You can really see Python/C++-based frameworks struggle here
because they have slow "glue code" and don't (one could argue can't
effectively) optimize for latency. That's not a problem for most CV models and
transformers, but really stunts research into more dynamic approaches.

[1] [https://sciml.ai/](https://sciml.ai/)

~~~
nl
This sounds more like it's a new field.

I started looking for benchmarks (because it sounds like the kind of thing JAX
would do well) and the very first link I clicked included:

 _Wraps for common C/Fortran methods like Sundials and Hairer's radau_

which is exactly what was claimed wasn't needed.
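
For what it's worth, that wrapping pattern is the same one SciPy itself uses.
A small sketch (assuming SciPy is installed) calling the Fortran-backed LSODA
solver from ODEPACK through Python glue:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Python only describes the problem; method="LSODA" dispatches the
# actual integration to wrapped Fortran (ODEPACK) code.
sol = solve_ivp(lambda t, y: -y, t_span=(0.0, 1.0), y0=[1.0],
                method="LSODA", rtol=1e-8, atol=1e-10)

# y' = -y with y(0) = 1 has the exact solution exp(-t).
assert np.isclose(sol.y[0, -1], np.exp(-1.0), rtol=1e-5)
```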

~~~
ChrisRackauckas
>I started looking for benchmarks (because it sounds like the kind of thing
JAX would do well)

Currently JAX has an in-progress stiff ODE solver that's about 200x slower
than SciPy:

[https://github.com/google/jax/issues/3686#issuecomment-65709...](https://github.com/google/jax/issues/3686#issuecomment-657090547)

and SciPy (with JIT) is about 50x-100x slower than the pure Julia methods

[https://benchmarks.sciml.ai/html/MultiLanguage/wrapper_packa...](https://benchmarks.sciml.ai/html/MultiLanguage/wrapper_packages.html)

so JAX has more than a little bit of a way to go.

------
pjmlp
Agreed. Something like a C++20 JIT-based workflow removes the dual-language
barrier, and then there is Julia, and other JIT-based languages with GPGPU
backends.

------
zmmmmm
It's sort of one of those provocative statements that just misrepresent
things to create controversy.

I also believe Python is not the future of ML, but that's because the future
is more about the space maturing and becoming commoditised and "boring",
which probably means, being completely honest, things that people have no
interest in happening, like it migrating over to the JVM, etc.

But who wants to talk about that?

------
teleforce
I'm surprised nobody has mentioned the D language for AI-type applications.

It's a very fast compiled language and compiles faster than most of its
competitors.

It supports interactive programming with rdmd and can work as a kernel inside
a Jupyter notebook through Jupyter's wire protocol [1].

It has a friendly Python vibe due to its GC-based ecosystem by default (but
you can choose to forgo the GC where appropriate).

With the introduction of dpp, its FFI capability to C/C++ is second to none
if the need arises to interface with existing libraries in those languages,
and it can also interface seamlessly with Python and R libraries! [2]

It also has a growing library for big data analysis [3], and one of the main
users of D is WekaIO, one of the prominent big data companies, with the claim
of the world's fastest file system for data storage [4].

If you insist on a multiple-dispatch style like Julia's, you can emulate it
in D as well [5].

If you want a glimpse of what D can offer for the data science domain, please
check the "D is for Data Science" article [6].

Finally, if you want to see D in action against Julia and Chapel for the
kernel matrix calculations common in AI-type applications, please check this
post [7].

[1] [https://github.com/symmetryinvestments/jupyter-
wire](https://github.com/symmetryinvestments/jupyter-wire)

[2] [https://dlang.org/blog/2020/01/27/d-for-data-science-
calling...](https://dlang.org/blog/2020/01/27/d-for-data-science-calling-r-
from-d/)

[3]
[http://docs.algorithm.dlang.io/latest/mir_ndslice.html](http://docs.algorithm.dlang.io/latest/mir_ndslice.html)

[4] [https://dlang.org/blog/2018/12/04/interview-liran-zvibel-
of-...](https://dlang.org/blog/2018/12/04/interview-liran-zvibel-of-wekaio/)

[5]
[https://en.wikipedia.org/wiki/Multiple_dispatch#D](https://en.wikipedia.org/wiki/Multiple_dispatch#D)

[6] [https://tech.nextroll.com/blog/data/2014/11/17/d-is-for-
data...](https://tech.nextroll.com/blog/data/2014/11/17/d-is-for-data-
science.html)

[7] [https://dlang.org/blog/2020/06/03/a-look-at-chapel-d-and-
jul...](https://dlang.org/blog/2020/06/03/a-look-at-chapel-d-and-julia-using-
kernel-matrix-calculations/)

~~~
non-entity
D is weird to me. I've only looked briefly at it, but I liked what I saw. I'm
assuming it had some early issues or completely failed to market itself for
too long, because it seems to see very little use compared to similar
languages.

~~~
teleforce
Yes, your impression is correct, but D is still a young language compared to
Python; it has only been around for about 18 years.

For comparison, at the age of 20 (around 2010), Python was still playing
second fiddle to Perl, and at the same time Ruby was fast becoming popular
due to RoR.

The good news is that D has now passed the growing pains of the transition
from D1 to D2 (e.g. the Tango library issue), very similar to Python's
transition from 2 to 3 that happened fairly recently, or perhaps is still
happening now.

------
est
tl;dr: Python is slow and can't do parallelism. Julia is better because it's
fast and Julia all the way down.

------
cheriot
tl;dr:

Jeremy Howard says python will always be calling out to something else that's
faster and more parallel. Writing fast python has too much cognitive overhead.
He hopes Julia catches on because it's Julia all the way down.

~~~
linkdd
I think that's one of the strengths of Python:

- implement your work-intensive logic with the adapted tools

- glue everything together with Python
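
A tiny sketch of that split, assuming a POSIX system where ctypes can locate
the C math library (the library name differs per platform):

```python
import ctypes
import ctypes.util

# The work-intensive logic lives in compiled C (here, libm's sqrt);
# Python is only the glue that declares the signature and makes the call.
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

assert abs(libm.sqrt(2.0) - 2.0 ** 0.5) < 1e-12
```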

~~~
lmm
I used to think that until I found Scala. It's easier than writing Python - I
can write pretty much the same thing as I would in Python, but I have a more
complete IDE and I can ask it what the types of things are. I'd never want to
use a Java/C++-style language for exploratory programming, but it turns out
not all reasonable-performance languages are like that.

~~~
gilbertmpanga12
I think you're overselling here. Scala is bloated with lots of paradigms and
lots of features; it's pretty hard to know what to use. Python in most ML
frameworks is just a frontend; the beefy stuff gets processed by C++ or C.

~~~
lmm
> Scala is bloated with lots of paradigms and lots of features. It's pretty
> hard to know what to use.

It's a flexible language, but I wouldn't say bloated - there actually aren't
that many language-level features, but the features that are there are very
general. Some parts are simpler, e.g. in Python I struggle to remember which
method you have to define to overload the * operator, whereas in Scala you
just define a method called * . But I'd agree that there are a bunch of
overcomplicated frameworks that can be pretty confusing - it's not always an
easy language to get started with. Still, I stand by the statement that you
can write Python-like code in it, at least once you know what you're doing.
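
(For the record, the Python method in question is `__mul__`, plus `__rmul__`
for the reversed-operand case; a minimal sketch with a hypothetical `Vec`
class:)

```python
class Vec:
    """Tiny illustrative 2-D vector supporting scalar multiplication."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    # v * k: Python looks up Vec.__mul__
    def __mul__(self, k):
        return Vec(self.x * k, self.y * k)

    # k * v: int's __mul__ can't handle Vec, so Python falls back
    # to Vec.__rmul__
    __rmul__ = __mul__

v = Vec(1, 2) * 3
assert (v.x, v.y) == (3, 6)
w = 3 * Vec(1, 2)
assert (w.x, w.y) == (3, 6)
```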

> Python in most ML frameworks is just a frontend, other beefy stuff gets
> processed by c++ or c

Right, and that introduces a bunch of overhead and possibilities for weird
errors - e.g. when you hit a bug in a library you pretty much have to learn
how to debug C. In Python the benefits are worth it, but there's undeniably an
overhead - if you could use the same language top-to-bottom but still have all
the nice things Python gives you, that would be a much nicer way to work.

