
Weld: Accelerating numpy, scikit and pandas as much as 100x with Rust and LLVM - unbalancedparen
https://notamonadtutorial.com/weld-accelerating-numpy-scikit-and-pandas-as-much-as-100x-with-rust-and-llvm-12ec1c630a1
======
spenrose
"the first implementation was in Scala, which was chosen because of its
algebraic data types and powerful pattern matching. This made writing the
optimizer, which is the core part of the compiler, very easy. Our original
optimizer was based on the design of Catalyst, which is Spark SQL’s extensible
optimizer. We moved away from Scala because it was too difficult to embed a
JVM-based language into other runtimes and languages."

There is an important "contemporary history of computing" article to write
about the evolution of the Spark project from "let's build a distributed
filesystem for MapReduce in Java because we read those early Google papers" to
"SQL is the right model for working with data so DataFrames" to "meet data
scientists where they are: Python (and R)" to "make machine learning easy" and
now to "LLVM, but for crunching big numeric arrays".

~~~
vmchale
Not that you should replace Rust with Haskell, but Haskell would've been a
better choice than Scala.

It has its own runtime, but it's not difficult to call Haskell code from C or
ATS or whatever.

~~~
choeger
Not difficult to call from C? How does that work, exactly? Wouldn't you need
to properly setup the whole runtime (incl. GC) first?

~~~
marmaduke
Yes, but I think the parent meant it's 'just' an #include and hs_init() away.

------
the_duke
See also this interesting talk on Weld at RustConf 2019:
[https://www.youtube.com/watch?v=AZsgdCEQjFo&t=1430s](https://www.youtube.com/watch?v=AZsgdCEQjFo&t=1430s)

------
westurner
There's also RustPython, a Rust implementation of CPython 3.5+:
[https://news.ycombinator.com/item?id=20686580](https://news.ycombinator.com/item?id=20686580)

> _[https://github.com/RustPython/RustPython](https://github.com/RustPython/RustPython)_

~~~
sieabahlpark
Why didn't they call it Rython

~~~
microcolonel
Or RytOn ;-)

~~~
nostrademons
Or IronPython...oh, wait. ;-)

~~~
nine_k
Fe2Py3

~~~
0-_-0
FeOPy3

------
tmostak
Also worth checking out OmniSci (formerly MapD), which features an LLVM query
compiler to gain large speedups executing SQL on both CPU and GPU:
[https://github.com/omnisci/omniscidb](https://github.com/omnisci/omniscidb) .
And here's a link to a blog post giving a high level overview of the
advantages of JIT compilation of queries over an interpreter:
[https://devblogs.nvidia.com/mapd-massive-throughput-database-queries-llvm-gpus/](https://devblogs.nvidia.com/mapd-massive-throughput-database-queries-llvm-gpus/).
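
An illustrative sketch (mine, not OmniSci's actual code) of why compiling a query beats interpreting it: the interpreter below re-dispatches on the expression tree for every row, while the "compiled" version turns the tree into straight-line code once and reuses it for every row.

```python
# Toy query expression: x * 2 + y, as a nested tuple tree.
def interpret(expr, row):
    """Walk the tree for every evaluation -- per-row dispatch overhead."""
    op = expr[0]
    if op == "col":
        return row[expr[1]]
    if op == "const":
        return expr[1]
    if op == "+":
        return interpret(expr[1], row) + interpret(expr[2], row)
    if op == "*":
        return interpret(expr[1], row) * interpret(expr[2], row)
    raise ValueError(op)

def compile_query(expr):
    """Generate straight-line code once; no tree-walking at evaluation time."""
    def gen(e):
        if e[0] == "col":
            return f"row[{e[1]!r}]"
        if e[0] == "const":
            return repr(e[1])
        return f"({gen(e[1])} {e[0]} {gen(e[2])})"
    return eval(f"lambda row: {gen(expr)}")

expr = ("+", ("*", ("col", "x"), ("const", 2)), ("col", "y"))
row = {"x": 3, "y": 4}
fast = compile_query(expr)
print(interpret(expr, row), fast(row))  # 10 10
```

A real JIT-compiling database lowers to LLVM IR rather than Python source, but the win is the same: dispatch cost is paid once per query instead of once per row.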

~~~
nautilus12
That tweetmap is impressive

------
adrien-treuille
This post combines pretty much every technology I'm obsessed with right now:
Python, Rust, Pandas, Numpy, and LLVM. Yess!!!

~~~
whoevercares
Just a word of caution: always obsess over product and customer needs first :)
In ML/data science, tech-first normally won't end up well.

~~~
d33
It's important to enjoy your work, which is - among other things - about
having the right tools. Also, some of us actually get to have some influence
over what language we write our projects in.

~~~
ekianjo
I think you misunderstand the parent. The obsession with always focusing on
tools is, I think, what they described. At the end of the day, what matters is
what you produce, not what tools you used. Nobody cares what you used, apart
from engineers.

~~~
naniwaduni
_You_ care.

~~~
toss1
Are you coding as a hobby or a profession?

If you are a professional, you will use the most effective tool for the job -
to get results. What tool will produce the best results - schedule, budget,
quality, maintainability, scalability, portability, etc.?

Other than outliers that will crush your productivity, or multiply it, your
feelings are pretty irrelevant.

Similarly, when you get into a racecar, your feelings about your preferred
driving style are irrelevant - if you can change the setup to accommodate your
style without slowing it down, great - but if not, your job is to adapt to the
situation and reliably get the best possible result.

Either way: if you have fun and produce a crap result, you will not be
congratulated (or re-hired), and if you have little fun and produce a great
result, you'll get both.

If it's a hobby, do whatever you want.

Obviously, in terms of professional development, you want to use more forward
looking tools, but what is the best measure of that - your feelings or
results?

~~~
turk73
I sort of disagree with your main assertion. I do big data for a living and
what I have seen is that our architecture is dictated to us from above for
reasons of "fashion" not really for any reasons of practicality.

I'm actually looking for a different job for that reason.

We are required to use Java on K8s, Kafka & Cassandra for every single
solution, big or small, because it is fashionable, not because it gets the job
done well or for any other reason. I can even demonstrate how a couple of
Python scripts and Pandas could do all the same work with far less overhead
and achieve the same results. Crickets. Python is not sexy where I am, it is
the language of peasants, apparently. Not sure what to make of it all, but
that is my reality right now.

Also, I don't think you know anything at all about driving race cars. The
driver has a tremendous amount of input into the car's setup because it's his
life on the line out on the track. "Adapting to the situation" gets finishes,
not wins.

~~~
toss1
Looks like we agree more than disagree. It seems like whoever is deciding on
the tools is failing to do so based on the job/project, instead opting for
'fashion' or whatever.

Good reason to seek a new situation, since you have neither appropriate tools
selected for you nor input to select better ones.

Racecars? Yeah, I've only won some SCCA super-regional championships. Yes, the
driver does have a very large input into the setup, BUT it is within the
constraint that the combination of the setup change and the improved driver
feel must make the car/driver combination faster. And yes, sometimes a change
that makes the car technically a bit slower but gives the driver more
confidence will result in faster net lap times -- and those are OK. But
whatever the setup is, at the end of the test sessions, whether the car feels
great or feels like crap, it's the driver's job to get the most out of it.

And I've had many situations both in the racecar and in international alpine
ski racing where something felt weird/odd/unfamiliar/scary, but was fast as
heck, so it was my job to adapt, rather than go back into my comfort zone.

Better to keep pushing outside your comfort zone, use tools/setups that get
better results, and change your 'feel' to appreciate the better setup.

------
stereosteve
This project has similar goals to the MLIR project:

[https://github.com/tensorflow/mlir](https://github.com/tensorflow/mlir)

[https://www.youtube.com/watch?v=qzljG6DKgic](https://www.youtube.com/watch?v=qzljG6DKgic)

Exciting times for the future of parallel computing!

------
mlthoughts2018
Very bizarre there is no discussion of numba here, which has been around and
used widely for many years, achieves greater speedups than this, and also emits
an LLVM IR that is likely a much better starting point for developing a
“universal” scientific computing IR than doing yet another thing that further
complicates it with fairly needless involvement of Rust.

[https://numba.pydata.org/](https://numba.pydata.org/)
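
As a hedged sketch (mine, not from the comment) of the kind of kernel Numba accelerates: decorating a plain-Python loop with `@njit` compiles it to machine code via LLVM. The try/except fallback is only so the sketch runs where Numba isn't installed.

```python
import numpy as np

try:
    from numba import njit  # compiles the decorated function via LLVM
except ImportError:
    def njit(f):  # interpreted fallback so the sketch still runs
        return f

@njit
def pairwise_sum(a, b):
    # Explicit loop: slow in CPython, fast once Numba compiles it.
    out = np.empty_like(a)
    for i in range(a.shape[0]):
        out[i] = a[i] + b[i]
    return out

x = np.arange(5, dtype=np.float64)
print(pairwise_sum(x, x))  # [0. 2. 4. 6. 8.]
```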

~~~
sppalkia
I'm one of the developers of Weld -- Numba is indeed very cool and is a great
way to compile numerical Python code. Weld performs some additional
optimizations specific to data science that Numba doesn't really target right
now (e.g., fusing parallel loops across independently written functions,
parallelizing hash table operations, etc.). We're also working on adding the
ability to call Python functions from within Weld, which will allow a data
science program expressed in Weld to call out to other optimized functions
(e.g., ones compiled by Numba). We additionally have a system called split
annotations under development that can schedule chains of such optimized
functions in a more efficient way without an IR, by keeping datasets processed
by successive function calls in the CPU caches (check it out here:
[https://github.com/weld-project/split-annotations](https://github.com/weld-project/split-annotations)).

Overall, we think that accelerating the kinds of data science apps Weld
and Numba target will not only involve tricks such as compilation that make
user-defined code faster, but also systems that can just schedule and call
code that people have _already_ hand-optimized in a more efficient and
transparent way (e.g., by pipelining data).
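
A hedged sketch (mine, not Weld's code) of the loop fusion described above: the eager NumPy version materializes two temporary arrays, while the fused version makes a single pass with no temporaries -- the transformation Weld's IR applies across independently written functions.

```python
import numpy as np

def eager(a, b, c):
    t = a * b              # temporary array written to memory
    return (t + c).sum()   # second temporary, then a reduction

def fused(a, b, c):
    total = 0.0
    for i in range(len(a)):  # one pass; intermediates stay in registers
        total += a[i] * b[i] + c[i]
    return total

a, b, c = (np.arange(4, dtype=np.float64) for _ in range(3))
print(eager(a, b, c), fused(a, b, c))  # 20.0 20.0
```

In practice the fused loop would itself be compiled (the Python loop here is only for clarity); the point is the access pattern, not the host language.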

~~~
infinite8s
Although to be fair, there is no reason why numba couldn't gain those
capabilities, it just hasn't been a focus of the project. It should be
possible to build a lightweight modular staging system in python/numba similar
to Scala's ([https://scala-lms.github.io/](https://scala-lms.github.io/)) or
Lua's ([http://terralang.org/](http://terralang.org/)).

------
axegon_
I have said multiple times that Rust has an incredible potential in the data
analysis world. And Weld is a great example.

~~~
the_duke
Weld is a compiler/JIT/runtime though, something Rust is very well suited for,
and which is very different code from data analysis/ML.

I think Julia is a more interesting language for this space, with its built-in
matrix support, easier prototyping, a REPL, etc...

~~~
sppalkia
Rust is great, but this is an important comment! We used it to implement
Weld's compiler and runtime, but we don't expect data scientists who use
languages such as Python, Julia, or R to switch over to it; the idea is that
these data scientists continue using APIs in these languages, and under the
hood, Weld will perform optimizations and compilation to decrease execution
time (and these "under the hood" components are the ones that we wrote in
Rust).

~~~
fluffything
Would Weld be able to do a better job if these scientists were using a Rust
library instead?

A lot of people would like to use Rust for data-analysis / machine learning,
but there are not really any good batteries-included frameworks for getting
started with this.

------
riboflavin
Sounds a lot like Gandiva (part of Apache Arrow) as well.
[https://www.dremio.com/announcing-gandiva-initiative-for-apache-arrow/](https://www.dremio.com/announcing-gandiva-initiative-for-apache-arrow/). Cool!

------
ris
So... this requires cooperation from the underlying libraries (numpy,
pandas...) - what is the likelihood of said libraries adopting this upstream
vs Weld having to maintain their own shadow implementations for the
foreseeable future?

Numpy et al of course already have N python acceleration frameworks hammering
at their doorsteps to integrate more closely...

~~~
sgillen
How much cooperation is needed though? It seems to me that all that numpy
pandas etc. need to do is maintain a stable API, which they already do AFAIK.

------
xiphias2
I saw a performance comparison with XLA, and it's interesting that Weld is
faster, because XLA is supposed to optimize the code using the known tensor
sizes during compile time.

Weld and XLA seem to have similar optimization steps though.

~~~
sppalkia
XLA and Weld do have similar optimizations -- at their core, one of the main
things they do is removing inefficiencies like unnecessary scans over data,
common subexpressions, etc. _across_ many operators. The speedup in the
benchmark you're referring to actually involved some NumPy code too for pre-
processing, and the reason Weld outperformed XLA is because Weld could perform
those kinds of optimizations _across_ TensorFlow operators and NumPy functions
(whereas XLA only optimizes the TensorFlow part of the application).

I also want to mention that this benchmark is from a while back (around 2017 I
believe), so it's possible that improvements in both XLA and Weld make the
numbers look different today :)
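
A toy sketch (illustrative only, not Weld's implementation) of the cross-operator optimization described above: hash-consing expression nodes so a subexpression shared by two operators is built -- and would be computed -- only once.

```python
cache = {}

def node(op, *args):
    """Return a canonical node; equal subtrees share one object (CSE)."""
    key = (op, args)
    if key not in cache:
        cache[key] = key
    return cache[key]

a, b = node("input", "a"), node("input", "b")
# (a + b) appears in both operators below, but only one node exists for it.
op1 = node("mul", node("add", a, b), node("const", 2))
op2 = node("neg", node("add", a, b))
print(op1[1][0] is op2[1][0])  # True
```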

~~~
davmre
For what it's worth, Jax (github.com/google/jax) now lets you use XLA to
compile Numpy code. It'd be cool to see how that would stack up in a modern
comparison.

------
dlphn___xyz
what's the benefit of Rust over Julia or C for computation?

~~~
shpongled
Not to sound like a member of the Rust evangelism strike force, but after
using Rust for a couple years, I don't have any desire to go back to C - sum
types alone are worth the switch to me, not to mention iterators, concurrency
story, etc.

~~~
shaklee3
C++ has sum types as of several years ago with std::variant.

~~~
codr7
The way Python has macros, sure :)

You would have to keep everything in variants, or wrap/unwrap manually all
over the place to get similar functionality.

And C has tagged unions.

~~~
shaklee3
Can you elaborate more? What do other languages have over std::variant/visit?

~~~
fluffything
That would be like explaining C++ Concepts to an assembly programmer from the
60s that had never used a "function" as a way of abstracting code.

If you really want to know, spend one afternoon learning any programming
language with built-in support for that (Rust, OCaml, Haskell, ...). ADTs are
one of the first things one learns.

In Rust, the features you'd need to learn are enums, patterns, and pattern
matching.

But be warned that using C++ variant and std::visit will feel like you are
being forced to only write C instead of C++ for the rest of your life, knowing
that life could be much better. Once you learn this, there is no way to un-
learn it.

~~~
shaklee3
I know how these things work in Rust, and I'm still failing to see your point.
It's not at all as complicated as what you are saying given that you can find
many blog articles that explain it succinctly in a couple paragraphs.

It's not at all helpful to say that there's something so complicated on these
other languages that you can't possibly get the idea across without using
them.

~~~
fluffything
Try doing any of this with variant and visit

    
    
    enum A { Foo { x: i32, y: f32 }, Bar(B), Baz([u32; 4]), Moo(i32), Mooz(i32) }

    #[derive(Clone, Copy, Default)]
    struct B { z: f32, w: (f64, f32) }

    let b = B { z: 42.0, ..Default::default() }; // a B with z == 42 and default w
    let B { w: (first, _), .. } = b; // destructure to get the B.w.0 field
    let a = A::Foo { x: 42, y: 0.0 };
    if let A::Bar(B { z, .. }) = a {
        // if a is an A::Bar(b), get the b.z field
    }
    if let A::Baz([0, 1, 2, 3]) = a {
        // if a is an A::Baz containing exactly the array [0, 1, 2, 3]
    }
    match a {
        A::Moo(v @ 1..=2) => {
            // if a is an A::Moo whose value is in [1, 2], bind it to v
        }
        A::Moo(x) | A::Mooz(x) => {
            // either A::Moo or A::Mooz, binding the inner value to x
        }
        // ERROR: the compiler rejects this match because I forgot some patterns
    }

    fn foo(B { z, .. }: B) -> f32 {
        // destructure the first argument to get its z field
        z
    }
    foo(b);
    

What in Rust is a one-liner, usable anywhere (let bindings, constructors,
match, if-let, while-let, function arguments, ...), is a pain to write in C++
using `std::visit` and `std::variant`. The error messages of
`std::visit` + `std::variant` are quite bad as well. And well, then there are
also other fundamental problems with variant like `variant<int>` having two
variants, `variant<int, int>` having 3 variants, but you can't reach the
second `int`, etc.

You can translate all the code above to C++ to use std::visit + std::variant
instead. I personally find that C++ is unusable for programming like this, and
almost never use std::variant in C++ as a consequence, while I use ADTs in
Rust all the time.

------
Myrmornis
> The motivation behind Weld is to provide bare-metal performance for
> applications that rely on existing high-level APIs such as NumPy and Pandas.

With regard to Pandas this makes me pause slightly, since, while pandas
contains lots of high quality and high performance implementations, the API of
pandas in some places doesn’t feel well-designed (the most obvious example is
indexing of data frames via square brackets and the various properties like
iloc).
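
A short sketch of the indexing ambiguity mentioned above: square brackets change meaning depending on the argument, while `.loc`/`.iloc` are explicit about label-based vs. position-based access.

```python
import pandas as pd

# Index labels deliberately differ from positions to expose the ambiguity.
df = pd.DataFrame({"a": [10, 20, 30]}, index=[2, 1, 0])

print(df["a"].tolist())  # column selection -> [10, 20, 30]
print(df.loc[0, "a"])    # label-based: the row labeled 0 -> 30
print(df.iloc[0]["a"])   # position-based: the first row -> 10
```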

------
syrusakbary
This is awesome. The quality of the work behind it is incredible.

I think there might be something interesting for this strategy also in the
WebAssembly space :)

------
objektif
Can anyone pls tell me if there are any other tools out there to increase
performance of pandas?

~~~
alcidesfonseca
Modin is an alternative pandas implementation for distributed processing using
Ray or Dask:

[https://github.com/modin-project/modin](https://github.com/modin-project/modin)

------
roadbeats
Interesting. It looks like Rust and Swift will be competitors in this field.

------
xtat
This would have made so much of my work so much faster

------
RocketSyntax
you had me at keras

------
janered
Interesting that initial implementation was in Scala but then they switched to
Rust because of minimal runtime, language embeddability, functional paradigms,
community and high quality packages. The hype bandwagon is so real in here. So
basically one could say the same for several other well established languages,
e.g. Haskell. Also, what saddens me is that everyone forgets about D, which has
the same benefits and a syntax that does not make you scratch your eyes out,
especially when it comes to FP. Also, D has not actually "skipped the leg day"
;)

