
Speed up your Python using Rust - jD91mZM2
https://developers.redhat.com/blog/2017/11/16/speed-python-using-rust/
======
emj
About as fast as numpy. More tools to create fast code are always great, but
the tooling for Rust/C in Python needs to be easier; I just can't be bothered
most of the time.

This in numpy gets a better relative boost on my machine, YMMV.

    
    
        import numpy as np
    
        def count_double_chars_np(val):
            # Compare each byte with its successor; the sum of the
            # matches is the doubled-character count (byte-wise).
            ng = np.frombuffer(val.encode(), dtype=np.byte)
            return np.sum(ng[:-1] == ng[1:])
    
        def test_np(benchmark):
            # `benchmark` is the pytest-benchmark fixture; `val` is the
            # benchmark input string defined elsewhere in the suite.
            benchmark(count_double_chars_np, val)
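
For context, the pure-Python version being benchmarked presumably looks
something like this (a sketch; the article's exact function and the contents
of `val` may differ):

```python
# Pure-Python baseline: count positions where a character equals
# the one immediately after it.
def count_double_chars(val):
    return sum(1 for a, b in zip(val, val[1:]) if a == b)

val = "aabbccdd" * 1000
print(count_double_chars(val))  # 4000
```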

~~~
rochacbruno
Hi, can you send a Pull Request including your numpy implementation?
[https://github.com/rochacbruno/rust-python-example](https://github.com/rochacbruno/rust-python-example) I would like to
add it there just for the record, and then I will update the article.

~~~
fnl
Thank you for the very nice, educative article, Bruno!

If performance comparison of counting character pairs really were the issue
here, then in addition to the already suggested numpy approach, an
implementation I'd dare wager to be just as competitive is re2 [1], a drop-in
replacement for the standard re package.
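
For illustration, the whole pair count can be written as one regex with a
lookahead (shown here with the standard re module; since re2 is a drop-in
replacement the same pattern should work there, though I haven't verified
that):

```python
import re

# A lookahead makes matches overlap, so a run like "aaa" counts as
# two pairs, matching the element-wise comparison approach.
PAIR = re.compile(r'(.)(?=\1)')

def count_double_chars_re(val):
    return len(PAIR.findall(val))

print(count_double_chars_re("aabbaaa"))  # 4
```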

But I want to point out that I think all this benchmarking of trivial
character counting distracts from the core idea here: you'd use a low-level
implementation in Rust (or C/C++/Cython, for that matter) when such "nifty
tricks" are not available, after all. So again thanks for the article, and do
consider whether you really want these performance quibbles to reduce the
article to an only marginally relevant performance "showdown".

[https://pypi.python.org/pypi/re2/](https://pypi.python.org/pypi/re2/)

------
js2
See also “Fixing Python Performance With Rust” previously discussed here:

[https://news.ycombinator.com/item?id=12748020](https://news.ycombinator.com/item?id=12748020)

And “Evolving Our Rust With Milksnake”:

[https://news.ycombinator.com/item?id=15697570](https://news.ycombinator.com/item?id=15697570)

Both from Armin Ronacher at Sentry. About Milksnake:

 _Milksnake helps you compile and ship shared libraries that do not link
against libpython either directly or indirectly. This means it generates a
very specific type of Python wheel. Since the extension modules do not link
against libpython they are completely Python version or implementation
independent. The same wheel works for Python 2.7, 3.6 or PyPy. As such if you
use milksnake you only need to build one wheel per platform and CPU
architecture._

~~~
rochacbruno
Yeah, Milksnake is mentioned in the article :)

~~~
js2
Doh, completely missed that reading the article on my phone.

------
Dowwie
I never got around to talking about it, but as part of my "month of Rust", I
ported permission-based authorization logic from Python to Rust and then ran
performance benchmarks of the Rust implementation and a pypy-compiled version.
The pypy-compiled python ran slightly faster.

I've been told not to expect similar results in other implementations. These
findings cannot be used to draw any conclusions about pypy.

My rust project:
[https://github.com/YosaiProject/yosai_libauthz](https://github.com/YosaiProject/yosai_libauthz)

~~~
k__
I don't understand all the implications, but I often hear from JIT language
people that their language could be as fast as an AOT language if used
correctly.

What are the use cases that lend themselves to being faster in Rust than in
JS/Ruby/Python?

~~~
brightball
Fwiw, Ruby does get much more performant with JRuby but nobody cares that much
because the benefits are mostly lost with Rails.

------
Matumio
For comparison, I just implemented the same as a C SWIG extension [1]. It's about
10% faster, but it's cheating by comparing bytes instead of utf-8 encoded
characters. The more interesting part to me is the comparison of the amount of
boilerplate code required.

[https://github.com/martinxyz/rust-python-example/commit/f8e3...](https://github.com/martinxyz/rust-python-example/commit/f8e36ab5f9c)

~~~
radarsat1
One thing though that gets _very_ complicated about using SWIG is ownership
semantics. With anything more complicated than passing scalar values, it is
very easy to introduce a memory leak or double-free if you don't get the flags
right. I wonder if Rust types naturally allow a much better inference of
ownership semantics across the language boundary?

~~~
Matumio
If you try to wrap any non-trivial type using SWIG typemaps you will quickly
go insane. However to speed up an inner loop you can often get away with a few
PyObject* arguments/returns. SWIG will pass those through and you can use the
Python/C API directly, e.g. to return a numpy array. Allow SWIG to handle only
simple types. The Python/C API is relatively sane, but you'll have to learn
the reference counting conventions.

~~~
radarsat1
Agreed! On my current project (C++) I found that things get extremely
complicated with shared_ptrs and directors. I even ended up contributing some
solutions to SWIG.

It all appears to be due to a lack of semantics in the C header. SWIG depends
on specifying this stuff in the interface file, but I've often wondered if it
wouldn't be better to enhance the C-side, either by standard parameter name
conventions or by some Doxygen-like standard comments to indicate ownership
and other stuff.

SWIG has this nice potential to generate wrappers for (m)any language(s), but
in practice as you said it's often just easier to use the Python API directly
instead of trying to make it too general. Shame.

~~~
luthaf
I have a C++ project that is available in multiple languages (Python, Rust,
Fortran, Julia and JS), and I decided not to use SWIG partly because of this
issue. Instead, I manually maintain a clean C API with everything that I need,
and manually wrap this API using whatever is available in other languages. It
is a bit more work (and thus incentivizes me not to break the API ^^), but it
allows me to mix and match various ownership semantics throughout the API.

------
vog
I like the article, but the following advice confused me, especially since
this comes from RedHat i.e. Linux people:

 _> Having Rust installed (recommended way is
[https://www.rustup.rs/](https://www.rustup.rs/))._

This essentially recommends unconditionally using the "curl | sh" anti-
pattern.

Shouldn't they instead recommend e.g. "apt-get install rustc" for Debian
users?

Since this doesn't make use of very recent Rust features, the Rust 1.14 in
Debian stable should be fine, shouldn't it? Same for Fedora, etc.

~~~
somenewacc
I think distros package rustup too. Here on Arch it is pacman -S rustup
(instead of the curl | sh thing) then proceed normally.

Installing an outdated toolchain makes little sense because after this example
people interested in Rust may want to do other things, and will encounter an
artificial roadblock when they (or, worse, a dependency) need a newer Rust.

I think the Rust packaged in the distros is meant to be a build-dependency of
software written in Rust (for example, ripgrep), not for Rust developers.

~~~
cbcoutinho
This is true for the Rust compiler included in the openSUSE repository: it's
only there to build packages (of which the newly released Firefox is one, and
has been since 54).

------
b0rsuk
> Rust is a language that, because it has no runtime, can be used to integrate
> with any runtime; you can write a native extension in Rust that is called by
> a node.js program, or by a Python program, or by a program in Ruby, Lua etc.
> and, conversely, you can script a program in Rust using these languages. —
> “Elias Gabriel Amaral da Silva”

Can someone explain why "having a runtime" is problematic for writing
extensions and calling them from Python? From what I gather Go does have a
runtime, so implicitly it should be suboptimal for calling from Python. Yet
since 2015 (Go 1.5) it can be called directly from Python. I'm a Python
programmer looking to expand my tool belt, and I'm wondering about the
relative pros and cons of Rust and Go. I have only written small toy programs
in C and other compiled languages.

Is Go better suited to completely rewriting software rather than to writing
extensions? Why?

I would appreciate a benchmark with a Go extension, too.

~~~
jerf
"Can someone explain why "having a runtime" is problematic for writing
extensions and calling them from Python?"

Perhaps instead of saying "having a runtime" it would be better to examine the
situation in terms of what the code assumes. Python assumes that it has the
Python GC running on its code, that everything is a PyObject of one sort or
another, that it has a Global Interpreter Lock that if taken will prevent
anything from modifying anything it thinks it owns, and so on. Go assumes that
it has the Go GC running (despite both "having GC", there are enough
differences that it must be specified as a difference), that its objects are laid out in
certain manners such that most field references are compiled down to static
offsets rather than dynamic lookups, that it can run its core event loop and
dispatch out work to its internal goroutines without asking anyone else, etc.

You could go on for quite a while; I don't intend those as complete lists. I
just want to convey the flavor of conceptualizing the runtime in terms of
assumptions that the code running in that runtime can make.

Once you look at it this way, it should be more clear why trying to jam two
runtimes into one OS process gets to be tricky. I use the word "jam" quite
carefully, because it always feels that way to me. The more differences
between the assumptions of the two runtimes, the more translation the code is
going to need. For instance, Python to anything else is going to involve
unwrapping the data from the internal PyObject wrappers, and wrapping anything
coming back from somewhere else back into PyObjects. Threading models have to
be matched up. Memory layout has to be harmonized. Memory generally has to be
kept strictly separated, because the two runtimes both expect to be able to
manage memory, so you can't hand memory allocated by one of them to the other,
which further implies that you're almost certainly copying everything across
the boundary. Etc. etc.
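
That copying is easy to see from Python itself with ctypes (a sketch; the
`c_double` array built here stands in for what any FFI layer has to do with a
Python list):

```python
import ctypes

# A Python list of floats is an array of pointers to boxed PyFloat
# objects; a C function expecting double* needs a contiguous copy.
pylist = [1.0, 2.0, 3.0]
c_array = (ctypes.c_double * len(pylist))(*pylist)  # copies each element

# The copy and the original are separate memory, managed by separate
# runtimes: mutating one does not affect the other.
c_array[0] = 99.0
print(pylist[0])  # still 1.0
```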

I'd also separate out the way there can be differences in the _affordances_ of
the languages. For instance, Python doesn't have what Rust or Go would call
"arrays". Rust and Go are fine with getting arrays of pointers, but the
languages afford the use of memory-contiguous arrays without pointers, so
especially if you're integrating with a third-party library, you have no
choice but for some layer somewhere along the way to convert Python lists into
the correct sort of array. The runtimes technically don't force this, but the
structure of the libraries and code afforded by the other languages do. By
contrast, if you were integrating with lisp, you might find many points where
you need to turn things into singly-linked lists, again, not because Lisp
can't handle arrays, but because you're likely to encounter pre-existing Lisp
code that expects Lisp cons lists.

As another example, despite the fact Go and C generally see eye-to-eye on how
to lay out structs, the C support from Go is still extremely expensive due to
the need to convert from how Go sees the concurrency world to how C sees the
world. C, contrary to popular belief, actually does have a runtime, and that
runtime tends to assume it has very deep control of the OS process it is
running in. Go has to do a lot of work to isolate the running C code in an
environment it is comfortable with, where it won't be pre-empted by the green
thread code (on account of the fact that it _can't_ be; C doesn't support
that). There's also some tricksy code you may need to write to harmonize C's
memory-management-via-malloc model with Go's "lifetimes determined via the GC"
model. (If you listen carefully, you can hear the Go runtime go "klunk" every
time it runs cgo code.)

Rust has a runtime too, but unlike a lot of languages, it has the ability to
shut it off. You lose some services and capabilities, but on the upside, you
significantly reduce the number of assumptions the Rust code is making, making
it easier to integrate with other runtimes. (I say reduce because technically,
it still doesn't make it to zero if you are precise enough in your thinking,
but I'd expect that of all the current "cool" languages, Rust with the runtime
off probably makes fewer assumptions than anything else.) That said, I'm not
sure if this code is working in that mode. I see the rust code doesn't
directly turn off the runtime, but I don't know what that "#[macro_use] extern
crate cpython;" line fully expands to. It's possible that the full Rust
runtime is still in play, which looks enough like C anyhow (by explicit design
of the Rust team) that Python's existing C integration can just be reused.
Either way, Rust is still making many fewer assumptions than Go's relatively
heavyweight (in terms of assumptions more so than resources) runtime.

~~~
TuringTest
_> C, contrary to popular belief, actually does have a runtime_

I've been left wondering what you meant by this. Are you referring to the
stack and heap management? Or OS processes and threads?

If not, could you please explain what you mean by the C runtime, and how Rust
differs from it when its runtime is shut off?

~~~
jerf
"could you please explain what you mean by C runtime,"

There are two components to the C runtime: what is specified by the C
standard, and what is specified by POSIX and the operating systems. I am not
sufficiently familiar with the C world to tell you exactly which thing is
defined in which part. Fortunately, for this discussion of how integrating C
code into another runtime goes, it doesn't really matter.

The C runtime includes the assumption that there is a malloc-compatible memory
allocator available (note it's swappable), plus the process of linking programs
when they start up and the whole surrounding machinery of "symbols" they can
obtain. It has
certain assumptions about what state needs to be saved when a function is
called; for instance, it won't save the flags on the processor controlling
IEEE FPU conformity. Function calls have a "stack" and there's a "heap", and
the language itself distinguishes between them. C itself, IIRC, has no
specification for threads whatsoever, but the OSes seem to have converged on a
fairly similar model that could be fairly called part of the runtime now.

It's hard to "see" the C runtime because it has won so thoroughly that it just
looks like "how computation is done", or is so deeply integrated into the
operating system that it forces parts of the model on everything that runs on
that OS. You kind of have to piece together what C does by looking at what it
does that other languages do differently. Yes, most programs at some point
will do some linking and symbol resolution, but once the interpreter has
started up, dynamic languages have no concept of a static symbol table.
Loading another Python module doesn't even remotely resemble loading a C
library, either at startup or dynamically later. The _language_ Go doesn't
have a stack or a heap. The _implementation_ does for practical reasons, but
the _language_ does not. Most other languages now will save the same things on
the call stack as C, but that's not a requirement of computation; you could
save a lot more of the processor's state, but it'll trash your function
performance to do it. A "stack" and "heap" model is not necessary; Haskell for
instance does not have a clear "stack" at all. (It does stack-like things,
certainly, but it turns out getting what most people call "a stacktrace" from
the runtime is actually fairly hard. I believe still not possible on GHC.)
There are alternate methods for threading, including models that still use the
C-style threads under the hood but include mandatory code to be run at startup
and shutdown to be "part" of the runtime.

C is not as thin as it looks; it's just that history has made it appear to be
the baseline. And as I know my internets, let me say that nothing in this post
is criticism. Something has to be the baseline. While I think the C baseline
is getting long in the tooth, it won for a reason, and I don't know that we
could have gotten much better from the 1970s. (The other competition usually
cited was either a performance non-starter (the Lisp of the time), or had it
survived for 40+ years, we'd be able to write a very similar post about how it
is getting long in the tooth too in 2017 (Pascal, for instance).)

~~~
steveklabnik
A great answer!

> I am not sufficiently familiar with the C world to tell you exactly which
> thing is defined in which part.

I've got some bits of knowledge here. I could be wrong, as it's not my
expertise...

> Function calls have a "stack" and there's a "heap", and the language itself
> distinguishes between them.

I don't believe this is true, or at least not literally, but the details are
interesting!
[http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf)
is what I usually go by when talking about C11. Malloc is defined in 7.22.3.4,
and says:

> The malloc function allocates space for an object whose size is specified by
> size and whose value is indeterminate.

In 7.22.3, the overview for all the memory functions, it says stuff like

> The lifetime of an allocated object extends from the allocation until the
> deallocation.

which restricts how you can implement it, of course, but it doesn't use the
words "heap" and "stack" at all; "stack" is never mentioned in the document.
6.2.4 talks about storage durations, this is usually what we think about when
we talk about "stack" and "heap" and such. "stack allocated" is more properly
termed "automatic storage duration" and "heap allocation" is "allocated
storage duration."

This is a side effect of the fact that C itself is defined in terms of a
virtual machine! They call it the "abstract machine".

Anyway, all of this is in service of your point about history and such. Many
people just assume all of this is how it has to be, rather than something that
came to be thanks to history. It's all very interesting!

> C itself, IIRC, has no specification for threads whatsoever

C11 added this, actually, but before that, you're 100% right.

~~~
jerf
Thank you for the elaboration. Now that you remind me, I remember about C11,
which also adds "the C memory model" as part of the runtime, IIRC. Other
languages have different memory models. Usually simpler, though it's hard to
hold that against C11 since it was in the unenviable position of trying to
codify decades of implicit and divergent practice in one of the trickiest
places in software engineering.

------
onnnon
If anyone is looking for something like this for Ruby, check out Helix:

[https://usehelix.com](https://usehelix.com)

~~~
jD91mZM2
Awesome, thanks! It feels like a lot of people dislike Ruby, but I'm glad some
people still like it. I think it's a good Python alternative, and has syntax
that reminds you of Rust.

------
pmoriarty
What's the advantage of doing this over using cython or pypy?

~~~
fulafel
Cython is unsafe.

~~~
fermigier
It's much easier to learn Cython once you know Python. That's the biggest
selling point.

~~~
the_mitsuhiko
But the tooling is terrible in comparison. As a Python developer, I find Rust
significantly easier than Cython. There is so much more Rust ecosystem to take
advantage of.

~~~
tdbgamer
To be fair to Cython, they have access to the entire C++ stdlib, so that's a
fairly good amount of tooling. The main things it lacks are good
documentation and memory safety.

~~~
the_mitsuhiko
And C++ has absolutely no package distribution system at this point.

------
rochacbruno
Now we got Numba, Cython and Numpy results for comparison
[https://github.com/rochacbruno/rust-python-example#new-results](https://github.com/rochacbruno/rust-python-example#new-results)

------
j_s
[https://news.ycombinator.com/item?id=14588333](https://news.ycombinator.com/item?id=14588333)
(beautifulsoup/lxml upgrade)

> _Python: interactive glue language between high performance C libraries._

Appreciate this walkthrough for Rust!

~~~
rochacbruno
How does it compare with
[https://github.com/servo/html5ever](https://github.com/servo/html5ever)?
(Could someone with free time run some benchmarks?)

~~~
j_s
That's a great question, and fits nicely in the context of the current
discussion.

I think the primary claim to fame for this C-based
[https://github.com/kovidgoyal/html5-parser](https://github.com/kovidgoyal/html5-parser)
is serving as a drop-in performance boost for lxml (at the API level; it
parses invalid HTML differently/more consistently).

I too would be interested in a performance comparison to help decide which
project makes more sense for new projects. The existing Python layer in
html5-parser might give it a leg up if the language of choice is Python - is
there a similar project for the Rust-based html5ever?

------
pbreit
Shouldn't Python et al. have more "native" ways to achieve these sorts of
performance improvements?

~~~
rochacbruno
Maybe because of the existence of `Numpy` and `Cython` and `PyPy` + all the
other `FFI` possibilities, it is not on the Python roadmap.
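
One of those FFI possibilities ships with CPython itself: `ctypes` can call
into an existing shared library without writing an extension module at all (a
minimal sketch; assumes a Unix-like system where the C math library can be
located):

```python
import ctypes
import ctypes.util

# Load the C math library and declare cos()'s signature so ctypes
# marshals the double correctly.
libm = ctypes.CDLL(ctypes.util.find_library("m"))
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0))  # 1.0
```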

