Python is used as the language to describe things, but nobody is executing Python on the GPU. Instead, everything is transpiled into highly optimized bytecode, C++ source, or chains of optimized GPU shaders. See TensorFlow XLA and AOT.
So when I set up a training pipeline with TensorFlow, it barely runs any Python while training. Instead, the TF C++ core executes a network of TF CUDA operators for me, all inside one Python function call. That's why Python's global interpreter lock is pretty much a non-issue by now.
Since only 1% of my training time is even spent inside Python, I don't expect any performance benefit from replacing it. And if both are equally fast, I prefer the familiar option, meaning Python.
The relevant part of their discussion is this one:
> Like Python is not the future of machine learning. It can't be. You know, it's so nicely hackable, but it's so frustrating to work with a language where you can't do anything fast enough unless you call out to some external code or C code, and you can't run anything in parallel unless you put in a whole other process. Like I find working with Python, there's just so much overhead in my brain to try to get it to work fast enough. It's obviously fine for a lot of things, but not really in the deep learning world or not really in the machine learning world. So, like, I really hope that Julia is really successful because there's a language with a nicely designed type system and a nicely designed dispatch system and most importantly, it's Julia all the way down so you can get in and write your GPU kernel in Julia, or all the basic stuff is implemented in Julia all the way down until you hit the LLVM.
Here the point is that when using other languages (such as Julia) you actually can use the same language for your GPU code and for the glue that keeps your scripts together.
I prefer Clojure for this. It's even simpler than Python, it's fast, and it enables me to match or even surpass the speed expected from mainstream tools.
See this for example: https://dragan.rocks/articles/20/Deep-Diamond-Deep-Learning-...
Whatever the language, you have to make function calls to cuDNN, because cuDNN is optimized by Nvidia, which has put serious resources into it. Even if you wrote everything in C++/CUDA, you probably couldn't match it for the standard stuff.
The custom things that I write, I prefer to write in CUDA kernels + Clojure management code.
Clojure kernels are technically possible in the same way that Julia or whatever kernels are possible, but the thing is that you have to match low-level hardware features anyway, so CUDA C/C++ just makes more sense for kernels, which are a tiny part of the code anyway.
It will take a lot for another language ecosystem to catch up to something as broad and deep as Python's. But these issues, combined with the horrible experience that is dependency management in pure Python vs. compiled code, as well as the need for real-time or close-to-real-time systems, make it seem very plausible that something like Julia or another LLVM language will become the default over Python for _these specific_ use cases, though not in general. Tooling on top of Julia can likely be made far better and more easily than living in Python, which is another aspect.
Again, 99% of the users won't be doing that no matter how good the tracing abilities are. They will post a question on SO/GitHub. Regardless of the language, modern DL frameworks are just too complex for regular users.
I agree that users don't care. I use Spark a lot and don't care if I write Scala, Python, or even C# now.
Have a listen: https://www.wandb.com/podcast/jeremy-howard
(I think I have paraphrased the podcast correctly :) )
Now that I agree on. I suspect we will see something like GPT-5 reliably converting plain English into code sooner than Julia displacing Python as king of ML.
False. Not in Julia: https://juliagpu.gitlab.io/KernelAbstractions.jl/
And PaddedMatrices.jl is another example:
So there isn't just one example but multiple at this point. RecursiveFactorization's `lu` tends to perform better than MKL's for small matrices, and this is what we use in the differential equation solvers when there are <100 ODEs.
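For reference, usage looks just like ordinary LinearAlgebra code (a minimal sketch; the exact sizes where it beats MKL vary):

    using RecursiveFactorization, LinearAlgebra

    A = rand(50, 50)                    # small dense matrix
    F = RecursiveFactorization.lu(A)    # pure-Julia LU, drop-in for LinearAlgebra.lu
    b = rand(50)
    x = F \ b                           # reuse the factorization to solve Ax = b
    @assert A * x ≈ b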
That's all on CPUs though. KernelAbstractions.jl is the GPU version and that's still being constructed but moving strong.
Here's an example: not long ago I needed a fast GPU implementation of numpy.unpackbits. I can do it in Pytorch, but it's slow and it wastes a ton of memory. So, my question is: can I write it in Julia as efficiently as I can do it in CUDA? If yes, will it be easier to do so?
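(For concreteness, one naive broadcast-based attempt in Julia might look like the sketch below; the function name and strategy are my own illustration, not the implementation benchmarked in the replies.)

    using CUDA

    # Hypothetical sketch: unpack each byte into 8 bits, MSB first,
    # mimicking numpy.unpackbits' default bit order.
    function unpackbits(a::CuVector{UInt8})
        shifts = CuArray(UInt8.(7:-1:0))              # bit positions, one per output row
        bits = (reshape(a, 1, :) .>> shifts) .& 0x01  # fuses into a single GPU kernel
        return vec(bits)                              # flatten 8×N column-major: 8 bits per byte
    end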
and you get ~60% of the best known CUDA implementation on that test:
Not perfect but not awful.
What I showed was that essentially no-knowledge pure Julia code gets to ~60% of the optimized CUDA kernel that libraries tend to wrap. Take that as you will for whether it means a user needs to write a "fast kernel": it's a data point. If you define that as fast enough, then Julia does it; if you require 100% of the fastest CUDA kernel ever found, :shrug:, it doesn't quite get there yet, though a lot of people are working on getting it there.
If you can express the problem in generalized index notation, Tullio.jl is an even higher-level abstraction that uses KernelAbstractions.
I don't know about the numpy function you mentioned, but here's a KA matmul. It definitely doesn't look like it has any CUDA-specific things, just a regular loop with some restrictions:
    @kernel function matmul_kernel!(a, b, c)
        i, j = @index(Global, NTuple)
        # creating a temporary sum variable for matrix multiplication
        tmp_sum = zero(eltype(c))
        for k = 1:size(a, 2)
            tmp_sum += a[i, k] * b[k, j]
        end
        c[i, j] = tmp_sum
    end
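Launching it looks roughly like this (a sketch following the KernelAbstractions docs of the time; the workgroup size of 16 is arbitrary):

    using KernelAbstractions

    a = rand(Float32, 64, 32); b = rand(Float32, 32, 48)
    c = zeros(Float32, 64, 48)
    kernel! = matmul_kernel!(CPU(), 16)          # instantiate for the CPU backend
    event = kernel!(a, b, c, ndrange=size(c))    # one work-item per element of c
    wait(event)                                  # kernels launch asynchronously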
Wish I had time to write the code and compare to PyTorch for you though.
    using Tullio, OffsetArrays

    # A convolution with cyclic indices
    mat = zeros(10,10,1); mat[2,2] = 101; mat[10,10] = 1;
    @tullio kern[i,j] := 1/(1+i^2+j^2)  (i in -3:3, j in -3:3)

    @tullio out[x,y,c] := begin
        xi = mod(x+i, axes(mat,1))  # xi = ... means that it won't be summed,
        yj = mod(y+j, axes(mat,2))
        @inbounds trunc(Int, mat[xi, yj, c] * kern[i,j])  # and disables automatic @inbounds,
    end (x in 1:10, y in 1:10)  # and prevents range of x from being inferred.
It's mostly just math!
Fortran, Java, .NET, Julia, D, Rust, and anything else able to either generate PTX or integrate with the CUDA LLVM backend can be used.
So to the end user who doesn't want to write CUDA, I don't see why using a Julia package for gpu acceleration is any better than using PyTorch in Python.
PyTorch, on the other hand, HAS to glue together C and C++ CUDA code.
Julia has four layers of GPU abstractions, each successively easier to use (but with a bit less control), all in pure Julia, reusing the compiler and normal Julia abstractions, and integrated with AD etc.:
CUDA.jl, GPUArrays.jl, KernelAbstractions.jl, Tullio.jl.
At the highest levels you just write normal loops or index notation, and the code and backward pass are generated. It's also generic across backends: SIMD and multithreaded CPU, and soon AMD ROCm, Intel accelerators, and distributed.
This all composes with the deep learning libs etc.
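As a tiny taste of the highest level (a sketch assuming CUDA.jl; no kernel code is written by hand):

    using CUDA

    x = CUDA.rand(1024)    # the array lives on the GPU
    y = 2 .* x .+ 1        # plain broadcast compiles to a single GPU kernel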
The main questionable implicit assumption being made is that Python can't be "the future" (which is severely underspecified without a time horizon) because of what is bad about it today. It's even literally there: "Python isn't the future... It can't be" <because something in Python is bad>.
That's highly questionable, because it leaves out
- that people appear to have reasons to adopt Python, likely because of something that is good about it,
(this complements the observation that not that many people might be impacted by the disadvantages).
- that Python and the ML tools in Python might not be static and actually evolve in "the future".
It seems to me a curious lack of reflection, substituted for by being particularly assertive about it.
To my mind Python not being the future is more likely to happen if Python becomes less likable or the attempts to improve the shortcomings (which are there, HPy as a medium-term way to get out of the "C extension vs. parallelism/speedup/GIL" corner, mypyc/numba/TorchScript/JAX pushing the "what can be efficiently expressed in Python" boundary) fail.
Julia, Swift, Dex-Lang, HaskTorch all get things right that Python doesn't. But do they get all the things right that Python does get right?
Disclaimer: I'm biased because I like Python and have some involvement in Python ML tools.
"Like Python is not the future of machine learning. It can't be. You know, it's so nicely hackable, but it's so frustrating to work with a language where you can't do anything fast enough unless you call out to some external code or C code, and you can't run anything in parallel unless you put in a whole other process."
So exactly the point made here.
I have always wondered why. I see so many projects that miss those. Projects from big companies and teams. Are they intentionally adding hurdles to using the code?
I also think that not enough people are aware of the performance limitations of NumPy code, and of how much more straightforward it can be to drop into a lower-level language once you cross a certain optimization barrier. This is where Julia can increasingly serve an important role.
Poorly written code in any language will be slow, so don't start with that.
Python is better for library users in the sense that Python has more libraries, but I remember not many years ago when nobody was doing anything serious in Python for statistics because the libraries were so poor and so few compared to R.
Julia has some great libraries being built that leverage some of the properties of Julia that Python can’t easily match.
Will Julia “beat” Python? Maybe in the sense that Python has “beaten” R. Note that R continues to do just fine, thank you. So too will Python do fine. But you may be surprised to find some killer libraries or frameworks that are firmly in Julia’s domain in the future.
The financials are more secure than stated in the video. (I don't work there, but I'm a fan)
Julia brings great improvements in the ability to write simple idiomatic serial code that runs at near-C speed on the CPU (whereas idiomatic Python code is at least 10x slower if you use an optimal compiler, and 1000x slower if you use the standard Python interpreter). But for highly parallel code dependent on element-wise or broadcasting array operations, I just don't see the issue with Python.
As far as I know, that example code creates an ad-hoc kernel that performs the computation in a single pass.
I honestly don't know if that is possible in other frameworks.
For example, the simple array expression v + c*w (where v and w are vectors and c is a scalar) can be calculated using fused multiply-add instructions. But this is not possible when called from an interpreter, which has to make separate calls for each array operation. For the same reason, this same calculation may saturate the memory bus. This is not an issue with Julia or other compiled languages. I'm not sure how Numba and/or PyPy optimize it.
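In Julia, the dot-broadcast syntax makes the fusion explicit (a minimal sketch):

    # the dotted ops fuse into one loop: each element is read once, and
    # v[i] + c*w[i] can compile down to a single fused multiply-add
    axpy_like(v, c, w) = v .+ c .* w
    # an interpreter evaluating v + c*w instead allocates a temporary
    # array for c*w and traverses memory twice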
Wrong. Julia can call out to CUDA kernels, but it also does codegen for custom ops and types. Technically, with enough engineering time, someone could rewrite all those C CUDA kernels in Julia, and Julia would just generate low-level GPU machine code on the fly.
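A rough sketch of that codegen path with CUDA.jl (kernel name and launch sizes are my own):

    using CUDA

    # a hand-written GPU kernel in pure Julia; CUDA.jl compiles it to PTX
    function saxpy_kernel!(y, a, x)
        i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
        if i <= length(y)
            @inbounds y[i] += a * x[i]
        end
        return nothing
    end

    x = CUDA.rand(1024); y = CUDA.rand(1024)
    @cuda threads=256 blocks=4 saxpy_kernel!(y, 2.0f0, x)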
Unfortunately, it lacks a big-name sponsor like Go or Rust.
Python's versatility is miles ahead. You can build web APIs, web apps, automation scripts, data scraping, and so on.
Python is not the de facto language of ML nowadays because of its performance. All the underlying performance-sensitive code is written in C/C++/Fortran.
Also, Cython, which has great integration with Python, is growing a lot and has been used in some cool projects such as aiohttp, FastAPI, and spaCy. So that's one more alternative for writing fast code without giving up on Python entirely.
For instance, in Python you can happily add a string to the list [1,2,3], but not in Julia, because the list has been created as a vector of integers. However, if you create the list as Any[1,2,3], then you can append a value of any type.
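A minimal illustration at the Julia REPL:

    v = [1, 2, 3]       # inferred as Vector{Int64}
    push!(v, "hi")      # MethodError: a String can't convert to Int64
    w = Any[1, 2, 3]    # explicitly a Vector{Any}
    push!(w, "hi")      # works: Any[1, 2, 3, "hi"]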
What? And you somehow cannot do that in Julia?
> So one more alternative to write fast code without giving up on python entirely.
I don't have to give up on Python; I can just use all the Python libraries from Julia with PyCall. I'm just glad I don't have to use Python the language any more than absolutely necessary.
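For example (a minimal sketch; any installed Python package works the same way):

    using PyCall
    np = pyimport("numpy")        # import a Python module from Julia
    x = np.linspace(0, 2π, 10)    # call it as if it were Julia code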
You definitely can. My point is that Python libs are more mature and there are more options to choose from than in Julia.
The Python community had better embrace JIT tooling, now that projects like TensorFlow and Torch are going polyglot.
If Python were a blocker for the adoption and development of ML, it wouldn't be the default language of the most popular ML libs.
The reason is that the performance-critical code is written in C++/C/Fortran. Python is just used to glue everything together and provide a nice and rich interface. That's what really matters.
To me, Julia is closer to Fortran than to Python. And it doesn't have many uses outside numeric programming.
Scraping the web, building REST APIs, performing data analysis, and writing automation scripts are much easier in Python than in Julia or Swift.
I am going to watch Julia adoption, and use it more, when and if it becomes more popular.
> To me, Julia is closer to Fortran than to Python. And it doesn't have many uses outside numeric programming.
Again, you're making unsubstantiated claims.
> Python is just used to glue everything together and provide a nice and rich interface. That's what really matters.
So, now I have to not only learn Python but also Fortran/C++/C, because the underlying library that I might want to adapt is written in one of these languages. In Julia, the DL library, for example, is written in Julia. What you are claiming as a pro is actually a con.
> Scraping the web, building REST APIs, performing data analysis, and writing automation scripts are much easier in Python than in Julia or Swift.
That might be true for Swift, but certainly not for Julia.
Basically the only time you'll want to do this from Python is if there is specific Fortran or C++ library you want to use. You'd have to do the same in Swift or Julia in this case.
It's kind of a silly point to try to score: "it's possible to write an efficient deep learning library in Julia (although no one has done it yet), and yes, you can do the same with NumPy in Python, and XLA in Python will outperform it, but someone else wrote some C/C++ there to make that possible!"
You are much more likely to want to write CUDA kernels (in C!) than you are to write C framework code to interface with Python for machine learning.
The person is looking to "get into ML". I've been working as a professional ML developer for 6 years, and I've never written any C or C++ for it.
This may be generally true (though the benchmarks I've seen show Knet.jl and sometimes Flux.jl on par with TF/PyTorch on a single machine + single GPU), but there are definitely domains where it is categorically not. The most prominent one is neural *DEs, where the SciML ecosystem has SOTA performance. You can really see Python/C++-based frameworks struggle here because they have slow "glue code" and don't (one could argue can't effectively) optimize for latency. That's not a problem for most CV models and transformers, but it really stunts research into more dynamic approaches.
I started looking for benchmarks (because it sounds like the kind of thing JAX would do well) and the very first link I clicked included:
> Wraps for common C/Fortran methods like Sundials and Hairer's radau
which is exactly what was claimed wasn't needed.
Currently, JAX has an in-progress stiff ODE solver that's about 200x slower than SciPy, and SciPy (with JIT) is about 50x-100x slower than the pure Julia methods, so JAX has more than a little way to go.
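"Pure Julia methods" here means stiff solvers implemented entirely in Julia, along these lines (a minimal sketch using OrdinaryDiffEq.jl on the classic Robertson stiff problem):

    using OrdinaryDiffEq

    # Robertson's stiff chemical kinetics system
    function rober!(du, u, p, t)
        y1, y2, y3 = u
        du[1] = -0.04y1 + 1e4*y2*y3
        du[2] =  0.04y1 - 1e4*y2*y3 - 3e7*y2^2
        du[3] =  3e7*y2^2
    end

    prob = ODEProblem(rober!, [1.0, 0.0, 0.0], (0.0, 1e5))
    sol = solve(prob, Rosenbrock23())   # a stiff solver written in Julia, no Fortran wrapped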
Those are exactly the things wrapping Fortran and C won't give you.
You don't need to learn C/C++ or Fortran. I've been working with Python and ML for about 4 years and haven't touched any C/C++ or Fortran to get work done. I agree that it'd be better if we could do everything with just one language, but the truth is that Julia is not that language. It's great for HPC but not expressive enough for generic things like the web.
See https://genieframework.com/ and Interact.jl
Once it can compile to WebAssembly (work in progress), it'll be the obvious choice.
Julia might be a good thing to explore once you're up and running.
The most important thing for newcomers is to pick one language and stick with it long enough to know it well (I would advise a general-purpose programming language such as Python). It will take time before you bump into its limits, and when that happens you can start to look around at how things are being done in other languages.
The latter two are tools to send ML models and training pipelines around, so kind of like Google Docs for AI. Colab is on the internet, Jupyter on your PC.
That way, you can easily exchange your experiments with others and/or ask for help online.
I also believe Python is not the future of ML, but that's because the future is more about the space maturing and becoming commoditised and "boring", which probably means, being completely honest, things that people have no interest in happening, like it migrating over to the JVM, etc.
But who wants to talk about that?
It's a very fast compiled language and compiles faster than most of its competitors.
It supports interactive programming with rdmd and can work as a kernel inside Jupyter notebooks through Jupyter's wire protocol.
It has a friendly Python vibe to it because it defaults to a GC-based ecosystem (but you can choose to let go of the GC where appropriate).
With the introduction of DPP, its FFI capability to C/C++ is second to none if the need arises to interface with existing libraries in those languages, and it can also interface seamlessly with Python and R libraries!
It also has a growing library for big data analysis, and one of the main users of D is WekaIO, one of the prominent big data companies, with the claim of the world's fastest file system for data storage.
If you insist on a multiple dispatch style like Julia's, you can emulate it in D as well.
If you want to see a glimpse of what D can offer the data science domain, please check the "D is for Data Science" article.
Finally, if you want to see D in action against Julia and Chapel for kernel matrix calculations, which are common in AI-type applications, please check this post.
For comparison, at the age of 20 years (around 2010) Python was still playing second fiddle to Perl, and at the same time Ruby was fast becoming popular due to RoR.
The good news is that D has already passed the growing pains of the transition from D1 to D2 (e.g. the Tango library issue), very similar to Python's transition from 2 to 3 that happened fairly recently, or perhaps is still happening now.
Jeremy Howard says python will always be calling out to something else that's faster and more parallel. Writing fast python has too much cognitive overhead. He hopes Julia catches on because it's Julia all the way down.
- implement your compute-intensive logic with the appropriate tools
- glue everything together with Python
It's a flexible language, but I wouldn't say bloated: there actually aren't that many language-level features, and the ones that exist are very general. Some parts are simpler, e.g. in Python I struggle to remember which method you have to define to overload the * operator, whereas in Scala you just define a method called *. But I'd agree that there are a bunch of overcomplicated frameworks that can be pretty confusing; it's not always an easy language to get started with. But I stand by the statement that you can write Python-like code in it, at least once you know what you're doing.
> Python in most ML frameworks is just a frontend, other beefy stuff gets processed by C++ or C
Right, and that introduces a bunch of overhead and possibilities for weird errors - e.g. when you hit a bug in a library you pretty much have to learn how to debug C. In Python the benefits are worth it, but there's undeniably an overhead - if you could use the same language top-to-bottom but still have all the nice things Python gives you, that would be a much nicer way to work.
Macros are less of an issue with modern IDEs, and that is hardly any different from any other language with macro support, including Scala.
It's a pain in the ass, and there is no need to pretend that it's great just because Python sucks (performance wise).
Python seems to have won.
One of the strengths of Python is its capacity to integrate with anything. We never said it was the only language with that strength.
With Python comes an ecosystem, another language another ecosystem. Choose the tool that fits your need.