
data processing performance with python, go, rust, and c - nathants
https://nathants.com/posts/data-processing-performance-with-python-go-rust-and-c
======
mattbillenstein
I've had tasks where we processed a lot of gzipped jsonl - I usually wrote
tooling in Python as it was easy, then I had the option to use pypy.
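
The core loop was basically this shape (a minimal sketch - the filename and
fields are made up):

    import gzip
    import json

    def records(path):
        # stream the gzipped jsonl file line by line rather than
        # decompressing the whole thing into memory
        with gzip.open(path, "rt") as f:
            for line in f:
                yield json.loads(line)

    # "events.jsonl.gz" and the "status" field are hypothetical
    ok = sum(1 for r in records("events.jsonl.gz") if r.get("status") == "ok")
    print(ok)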

Pypy pretty consistently gave me a 2x speedup, as you're seeing here, without
any other work, and a few experiments I did in golang would give me another
2x - I had expected more.

So, I just kept doing it in Python and using pypy where it mattered - I just
enjoyed writing Python more than writing golang - ymmv.

~~~
nathants
i agree, pypy is really good.

most interesting to me is how it behaves very differently than cpython when
trying to avoid allocations and optimize. these kinds of changes were 2x
faster with pypy, but way slower in cpython. similar wins with go, rust, and
c.

things like:
[https://github.com/nathants/bsv/blob/master/experiments/cut/...](https://github.com/nathants/bsv/blob/master/experiments/cut/cut.py)
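
the general shape is something like this (just an illustration of the style,
not the code in that link): preallocate one buffer and work on it in place,
which pypy's jit rewards but cpython often doesn't.

    import sys

    def count_commas(stream=sys.stdin.buffer):
        # reuse one preallocated buffer across reads instead of letting
        # read() allocate a fresh bytes object every iteration
        buf = bytearray(1 << 16)
        total = 0
        while True:
            n = stream.readinto(buf)
            if not n:
                break
            # search the filled region in place, no slicing or copying
            total += buf.count(b",", 0, n)
        return total

    if __name__ == "__main__":
        print(count_commas())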

~~~
mattbillenstein
Yeah, it's a lot of work to try and avoid copies and allocations.

cpython I think comes from a very different era where cpu speeds were much
closer to memory speeds and you had a lot less latency between the two in
terms of cpu cycles. Like reference counting would probably not be the design
you'd start with today if you were building a new runtime.

That being said, it's kinda amazing to me that cpython is only 4x slower in
some of these benchmarks - and you can actually get within 2x with a different
runtime that has the same guarantees and a JIT. It makes me think you have to
have really good reasons not to choose Python for a lot of tasks, given
developer productivity and the maturity of the runtime and libraries.

~~~
nathants
or choose both and compose with pipes! good for prototyping, maybe escalate to
a single executable when the design stabilizes.
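
e.g. each stage just reads stdin and writes stdout, so a python prototype of
one stage can be swapped for a go/rust/c binary later without touching the
rest of the pipeline (sketch, the "status" field is made up):

    import json
    import sys

    # one pipeline stage: keep only records whose "status" field is "ok"
    # usage: gunzip -c data.jsonl.gz | python filter_ok.py | wc -l
    for line in sys.stdin:
        if json.loads(line).get("status") == "ok":
            sys.stdout.write(line)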

~~~
mattbillenstein
Leads into serialization/deserialization and allocator overhead when using
pipes.

I usually went the route of parallelizing using the shell (xargs -P) or using
multiprocessing. I typically had tens to hundreds of thousands of objects in
S3 to process.
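
The multiprocessing version was roughly this shape (a sketch - process_key is
a placeholder for the real download-and-parse work, and the key list is made
up):

    import multiprocessing

    def process_key(key):
        # placeholder for the real per-object work: download from S3,
        # gunzip, parse, aggregate
        return key, len(key)

    if __name__ == "__main__":
        # hypothetical object keys - in practice these came from listing
        # the bucket; the shell version is roughly
        # xargs -P 8 -n 1 python process_one.py < keys.txt
        keys = ["part-%05d.jsonl.gz" % i for i in range(1000)]
        with multiprocessing.Pool() as pool:
            results = pool.map(process_key, keys)
        print(len(results), "objects processed")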

But one neat trick that is pipe-like is using subprocess to do the gunzip step
-- subprocess.Popen('gunzip -c ' + filename, ...) -- this would offload work
from the python process and give me a decent wall clock speedup.
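
Roughly like this (sketch, filename made up) - gunzip runs in its own process
so decompression happens on another core while Python does the parsing:

    import json
    import subprocess

    def records(filename):
        # a separate gunzip process handles decompression; we stream its
        # stdout through a pipe and only parse in the Python process
        p = subprocess.Popen(["gunzip", "-c", filename],
                             stdout=subprocess.PIPE)
        for line in p.stdout:
            yield json.loads(line)
        p.wait()

    for rec in records("events.jsonl.gz"):  # hypothetical input file
        pass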

