
Fixing Python Performance with Rust - ngoldbaum
https://blog.sentry.io/2016/10/19/fixing-python-performance-with-rust.html
======
lqdc13
I'm using Nim's Pymod
[https://github.com/jboy/nim-pymod](https://github.com/jboy/nim-pymod) for
this exact purpose and I think it's much better suited.

The reason is that it automatically generates C API code instead of going
through FFI. FFI calls are slower
([https://gist.github.com/brentp/7e173302952b210aeaf3](https://gist.github.com/brentp/7e173302952b210aeaf3)),
so there is less overhead. You obviously care about overhead here.

Nim's pymod is already a python module and you can send strings and numpy
arrays to Nim for fast processing.

I wish you could send bytes from python3, but that's not implemented yet.

~~~
ngoldbaum
Do you have experience with cython? If so, how does Nim compare for writing
fast C extensions?

~~~
the_mitsuhiko
My personal experience with cython (and why I'm less than lukewarm about it)
is that it's about as much fun as C to write and only really helps with the
annoyance of dealing with PyObject. That's something that was not even
required for the sourcemap lib.

Debugging and writing cython that is fast is really not a very pleasant
experience and the tooling is not great, let alone any real ecosystem. There
is not even a good way to deal with dependencies at compile time.

------
jlarocco
A neat case study, but "Embedding Rust in Python" is a poor way to phrase it.

If I'm reading it correctly they're just using CFFI to load and call a shared
library - it's really not embedding anything. The fact that the library was
written in Rust is interesting, but as far as Python is concerned it could
easily have been written in any language that can create a shared library.

~~~
the_mitsuhiko
> If I'm reading it correctly they're just using CFFI to load and call a
> shared library - it's really not embedding anything. The fact that the
> library was written in Rust is interesting, but as far as Python is
> concerned it could easily have been written in any language that can create
> a shared library.

Of which there are not that many that would make a shared library that can be
safely loaded into a Python process. Traditionally this was limited to C and
C++. As far as embedding goes: Rust, like most things, needs minimal runtime
support, and that is embedded in the dylib.

~~~
jlarocco
FWIW, most languages that can generate native code can generate shared
libraries, and will usually have instructions for how to call the libraries
from other languages.

Here's how to do it in Haskell, for example:
[https://downloads.haskell.org/~ghc/7.6.3/docs/html/users_gui...](https://downloads.haskell.org/~ghc/7.6.3/docs/html/users_guide/using-shared-libs.html)

That said, if optimization gets to the point where re-writing in a different
language is a good idea, most people just jump to C or C++, or in this case
Rust, because they're the fastest.

------
dekhna
Not a fan of the title: you aren't fixing Python performance with Rust, you
are avoiding Python performance with it.

~~~
vishalzone2002
+1

------
bmh100
Small question: since this is improving performance on a given machine, isn't
this actually an example of vertical scaling, as opposed to horizontal?

~~~
mattrobenolt
I guess this is just phrased a bit oddly. It's just a factor in helping us
scale by requiring us to use our hardware more efficiently. Not sure I'd say
it's horizontally or vertically. We've just made each unit of work cheaper to
help things along.

~~~
Rapzid
It's vertical almost by definition. I'll say it.

------
denfromufa
So why not Cython like PayPal?

~~~
0xCMP
Yea they mentioned alternatives, but cython wasn't mentioned at all. I guess
Rust makes sure there aren't any memory issues, but it's far from the only
easy option.

Cython is used everywhere and can be compiled on the machine it's about to be
used on. It also follows best practices for talking to python via FFI.

~~~
ngoldbaum
Not sure what you mean about cython and the FFI. As far as I know, cython is
tightly coupled to the CPython C API.

I agree that cython is a good option if you only care about CPython.

~~~
thesmallestcat
That's just not true, Cython works with PyPy
[https://cython.readthedocs.io/en/latest/src/userguide/pypy.h...](https://cython.readthedocs.io/en/latest/src/userguide/pypy.html),
and it's not like you're going to target Jython with CFFI.

~~~
ngoldbaum
Cython does work with pypy, but via the cpyext emulation of the CPython C API.
See this answer from one of the pypy devs when I asked about this about a year
ago:
[https://news.ycombinator.com/item?id=10195892](https://news.ycombinator.com/item?id=10195892)

Maybe cpyext has gotten faster since then, but I think that's the state of
things still.

------
jondot
I did the same thing with Go and Ruby:
[http://blog.paracode.com/2015/08/28/ruby-and-go-sitting-in-a...](http://blog.paracode.com/2015/08/28/ruby-and-go-sitting-in-a-tree/)

IMHO the end result is more maintainable, readable, and accessible from an FFI
point of view. Regarding performance: Go has a GC, but I'm wondering whether
that would affect things dramatically at all.

Here is the Ruby side FFI code:
[https://github.com/jondot/scatter/blob/master/lib/scatter.rb](https://github.com/jondot/scatter/blob/master/lib/scatter.rb)

And here's the "native" part:
[https://github.com/jondot/scatter/tree/master/ext](https://github.com/jondot/scatter/tree/master/ext)

Every now and then I look at Rust and how it can integrate with higher
level languages. The last time, I really wanted OpenCV to work well with Rust.
I think that's a big selling point. So far, to me, it's not perfect yet but it
may get there.

From a pragmatic point of view, I imagine Sentry getting more bang for the buck
with Go, as there would be fewer wheels to invent from an ecosystem POV, and
from a maintenance POV it would be closer to Python. But that wouldn't advance
any of the Rust ecosystem at all, and we do need that as a collective.

~~~
the_mitsuhiko
How do you deal with the lifetime of memory passing from Python to Rust and
back in the presence of two garbage collectors?

~~~
jondot
If I understand the question correctly, you are saying:

1. There's a bunch of objects that need to pass down the ffi boundaries py->go
2. compute
3. There's a bunch of objects that need to pass up the ffi boundaries go->py
4. Python now continues as usual with a bunch of processed objects

In that case, yes, this would be a problem. The way I'd resolve it is by
planning the ffi boundaries accordingly. I'd make python do as little as
possible, and pass just declarative "instructions" to go: in this case, where
the file is and where the sourcemap file is (and perhaps where to dump output
to, if that's needed). Go would then do as much of the work as possible, so
that only a minimal number of objects is passed back, if any.
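
A coarse-grained boundary of this kind can be sketched as follows. This is only an illustration under stated assumptions: it is written in Rust rather than Go to match the article, the name `demo_process` and the status codes are hypothetical, and the line count stands in for the real sourcemap work. On the 2024 edition the attribute is spelled `#[unsafe(no_mangle)]`.

```rust
use std::ffi::CStr;
use std::fs;
use std::os::raw::{c_char, c_int};

/// Hypothetical entry point: the caller passes only two paths and gets back
/// a status code, so no objects cross the FFI boundary in either direction.
#[no_mangle]
pub unsafe extern "C" fn demo_process(
    input_path: *const c_char,
    output_path: *const c_char,
) -> c_int {
    if input_path.is_null() || output_path.is_null() {
        return -1; // bad arguments
    }
    // Borrow the caller's NUL-terminated strings; they stay owned by the caller.
    let input = unsafe { CStr::from_ptr(input_path) };
    let output = unsafe { CStr::from_ptr(output_path) };
    let (input, output) = match (input.to_str(), output.to_str()) {
        (Ok(i), Ok(o)) => (i, o),
        _ => return -1, // paths were not valid UTF-8
    };

    let text = match fs::read_to_string(input) {
        Ok(t) => t,
        Err(_) => return -2, // could not read the input file
    };

    // Placeholder for the real work: here we only count lines.
    let summary = format!("{}\n", text.lines().count());

    match fs::write(output, summary) {
        Ok(()) => 0,
        Err(_) => -3, // could not write the output file
    }
}
```

The trade-off is the one discussed in this subthread: the boundary is simple, but the caller can no longer hold on to intermediate objects.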

It may _feel_ like a hack but ultimately it's the same approach as if you were
to make a "sourcemap server" and have python code communicate with it over RPC.

If this is not the problem then I'd love an example of what you meant.

~~~
the_mitsuhiko
> 1. There's a bunch of objects that need to pass down the ffi boundaries py->go
> 2. compute
> 3. There's a bunch of objects that need to pass up the ffi boundaries go->py
> 4. Python now continues as usual with a bunch of processed objects

You can look at the library in question. An object gets created in Rust but
the ownership of that object is held in Python. When the Python GC runs we
clean up the Rust object.
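
The Rust side of such an ownership scheme might look roughly like the sketch below. The type `Index` and the `demo_index_*` functions are hypothetical and are not libsourcemap's actual API; the assumption is that a Python wrapper object stores the returned pointer and calls the free function from its finalizer. On the 2024 edition the attribute is spelled `#[unsafe(no_mangle)]`.

```rust
/// Hypothetical stand-in for a parsed source map living on the Rust heap.
pub struct Index {
    tokens: Vec<u32>,
}

/// Create the object in Rust and hand the caller an opaque pointer.
/// Ownership moves to the caller, who must eventually call `demo_index_free`.
#[no_mangle]
pub extern "C" fn demo_index_new(len: u32) -> *mut Index {
    let index = Index {
        tokens: (0..len).collect(),
    };
    Box::into_raw(Box::new(index))
}

/// Borrow the object through the pointer without taking ownership.
#[no_mangle]
pub unsafe extern "C" fn demo_index_len(ptr: *const Index) -> u32 {
    if ptr.is_null() {
        return 0;
    }
    unsafe { (*ptr).tokens.len() as u32 }
}

/// Take ownership back and drop the object. The assumed Python wrapper
/// would call this exactly once, when the wrapper itself is collected.
#[no_mangle]
pub unsafe extern "C" fn demo_index_free(ptr: *mut Index) {
    if !ptr.is_null() {
        drop(unsafe { Box::from_raw(ptr) });
    }
}
```

Only one side ever frees the memory, which is why no second garbage collector has to be coordinated with Python's.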

> It may _feel_ like a hack but ultimately it's the same approach if you were
> to make a "sourcemap server" making python code communicate with it over
> RPC.

Sure, but that significantly complicates the problem. To the point in fact
where I question if the Go solution makes any sense at all because it takes
away the advantage you have where you can just drop an extension module in
without much work. Once you need to restructure your system to be message
based you might as well go in and run a separate process and use a unix pipe
to communicate. We used to do that for a few things like our debug symbol
symbolication and the downsides are just too big.

~~~
jondot
I understand. So now if I may backpedal a bit, why go through the trouble of
having python own the objects? Why not let Rust (or Go) deal with the entire
bulk of the job at hand?

~~~
the_mitsuhiko
Because that would require huge changes to our codebase. We pass those
objects around in various places already.

------
giancarlostoro
Shouldn't it say "improving Python performance"? Is this a bugfix of Python
that can't be 'fixed' in the initial software? Maybe I'm just reading it
oddly.

Sidenote: I wonder how improving Python performance with D fares considering
it links up to C pretty nicely.

~~~
kbaker
I don't think you can write a DLL/dylib in D to work with Python, since D has
a GC (actually, it looks like maybe use of the GC can be worked around with
some careful programming in D. Still, the D runtime itself may conflict.) From
the article:

> In that case, your requirements to the language are pretty harsh: it must
> not have an invasive runtime, must not have a GC, and must support the C
> ABI. Right now, the only languages I think that fit this are C, C++, and
> Rust.

~~~
Volt
I'm sceptical. I highly doubt GC matters here at all. The runtime restrictions
might be the author's own requirements.

~~~
the_mitsuhiko
I am not intelligent enough to contemplate the ramifications of two
independent GCs having control over their own memory and making that work
together well. I'm sure there are ways but it's definitely not easy and not
something I would just try on a whim.

------
dikaiosune
Super cool!

Is there a reason the Rust-exported functions aren't marked with `extern "C"`?

~~~
wyldfire
Are you referring to the declarations in libsourcemap.h or the definitions in
cabi.rs?

Regarding the declarations: this would only be necessary if processed in a C++
context, to change the name mangling/linkage features of the declarations. For
portability sometimes authors hide these behind "ifdef __cplusplus" barriers,
but it's not really critical here.

Regarding the definitions: "Exposing a C ABI in Rust" from the article
describes this in detail. For the most part, "#[no_mangle]" has the same
effect that "extern "C"" has on linkage/mangling in C++.

~~~
dikaiosune
I'm referring to the definitions of the Rust functions.

As far as I know, `#[no_mangle]` disables name-mangling but doesn't change the
ABI of a function. That's what `extern "C"` is for in Rust -- to declare a
function with the C ABI. You can have a Rust ABI function with an unmangled
name (what it looks like is done in the post) and you can have a C ABI function
in Rust with a mangled name (by using `extern "C"` but not `#[no_mangle]` --
for example for C callbacks).

Based on my limited understanding of C++, `extern "C"` in C++ is equivalent to
using _both_ `#[no_mangle]` and `pub extern "C"` in Rust. I would guess that
much of the time failing to specify a C ABI would work out fine unless you try
to accept or pass non-FFI types (enums, references, etc.) but I'm not sure.

It's confusing as hell. There was a thread about the mixed up semantics
somewhat recently on the internals forum:
[https://internals.rust-lang.org/t/no-no-mangle/3973](https://internals.rust-lang.org/t/no-no-mangle/3973)
(edit: and also
[https://internals.rust-lang.org/t/precise-semantics-of-no-ma...](https://internals.rust-lang.org/t/precise-semantics-of-no-mangle/4098)).

If you look at the LLVM IR generated from
[https://is.gd/Hfup3X](https://is.gd/Hfup3X) you can see that the extern "C"
fn differs in that it's given a `nounwind` attribute, among other things.
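
The three combinations described in this comment can be put side by side in a minimal sketch. All function names here are hypothetical and chosen for illustration; none are taken from the library in the post. On the 2024 edition the attribute is spelled `#[unsafe(no_mangle)]`.

```rust
/// Unmangled symbol name, but still the (unspecified) Rust ABI.
/// A C caller can find this symbol, yet the calling convention is not
/// guaranteed to match what C expects.
#[no_mangle]
pub fn demo_add_rust_abi(a: i32, b: i32) -> i32 {
    a + b
}

/// C ABI with a mangled name. Not findable by name from C, but safe to
/// hand out as a function pointer, e.g. as a callback.
pub extern "C" fn demo_add_callback(a: i32, b: i32) -> i32 {
    a + b
}

/// Both attributes together: the form a C caller (or cffi) expects for an
/// exported function.
#[no_mangle]
pub extern "C" fn demo_add_exported(a: i32, b: i32) -> i32 {
    a + b
}

/// A function pointer type that carries the C ABI in its signature.
pub type BinaryOp = extern "C" fn(i32, i32) -> i32;

/// Calls a C ABI function through a pointer, as C code would do with a callback.
pub fn demo_apply(op: BinaryOp, a: i32, b: i32) -> i32 {
    op(a, b)
}
```

Called from Rust, all three behave the same; the difference only matters once another language calls in through the symbol table or a function pointer.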

~~~
wyldfire
Ok, good call.

Thanks for the tip, btw. I think this means I have a bug in my code. ;)

~~~
dikaiosune
I'm glad it helped! Sorry if it came on strong.

------
jgalt212
How about this:

Why not just deserialize all the source maps ahead of time and just
store/retrieve them as msgpack objects?

Per this python serialization speed comparison, msgpack is ~ 10X faster than
json. So you get the same speed up, but no Rust.

[https://gist.github.com/cactus/4073643](https://gist.github.com/cactus/4073643)

------
jgalt212
Stupid Question:

Why not just deserialize all the source maps ahead of time and just
store/retrieve them via cPickle? Wouldn't that get you almost the same results
without having to learn and support a second language (Rust, in this case)?

[Edit]

cPickle is slower than JSON, but browsing the interwebs it seems that marshal
can be 2X faster than JSON and 4X faster than cPickle.

------
forgottenpass
Things wrong with the title:

- It is not fixing python's performance.

- The performance improvement has very little to do with the choice of Rust.

~~~
kbenson
I think that's just a combination of your initial interpretation of the title
and being a little too pedantic.

They are fixing a case of Python performance being a problem _in the context
of their needs_ , and the way they solved it was with Rust (and there's no
implication that it _had_ to be Rust in the title).

It's not _wrong_ , it's just vague.

------
ksec
I remember Skylight.io did something similar with Ruby.

Edit:
[http://blog.skylight.io/introducing-helix/](http://blog.skylight.io/introducing-helix/)

~~~
masklinn
That's a bit different, the point of helix is to easily build native modules
_for Ruby_ in Rust. The case here was using a regular FFI (cffi) to call Rust
code as if it were C, without using Python's C API or anything.

------
tempodox
A good write-up and a great case for Rust.

------
silur
"oh my god, using native code is faster than interpreting, such exciting and
revolutionary news"

------
obviouslee
To summarize: instead of improving Python's maps to consume less memory
they've embedded an entirely different language, Rust, into Python to solve a
particular problem.

Doesn't make Python look good.

~~~
pcwalton
Python is hard to beat for speed of development. Rust is hard to beat for
performance and memory usage. So why _not_ combine the two?

~~~
obviouslee
Complexity.

~~~
the_mitsuhiko
Out of all the options we had, this was probably the least complex one given
our set of tools and experience.

