Damn! Is the rule of thumb really a 10x performance hit between Python/C++? I don’t doubt you’re correct, I’m just thinking of all the unnecessary cycles I put my poor CPU through.
Outside cases where Python is used as a thin wrapper around some C library (simple networking code, numpy, etc) 10x is frankly quite conservative. Depending on the problem space and how aggressively you optimize, it's easily multiple orders of magnitude.
FFI into lean C isn't some perf panacea either, beyond the overhead you're also depriving yourself of interprocedural optimization and other Good Things from the native space.
Of course it depends on what you are doing, but 10x is close to a best case. I recently rewrote a C++ tool in Python, and even though all the data parsing and computation was done by Python libraries that wrap high-performance C libraries, the program was still 6 or 7 times slower than the C++ version. Had I written the Python version in pure Python (no numpy, no third-party C libraries) it would no doubt have been 1000x slower.
It depends on what you're doing. If you load some data, process it with some Numpy routines (where the speed-critical parts are implemented in C) and save a result, you can probably be almost as fast as C++... however if you write your algorithm fully in Python, you might end up much worse than 10x slower. See for example: https://shvbsle.in/computers-are-fast-but-you-dont-know-it-p... (there they measure ~4x speedup from good Python to unoptimized C++, and ~1000x from slow Python to optimized C++...)
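A minimal sketch of that split, using the stdlib's C-implemented built-in sum() as a stand-in for a numpy reduction (so it runs without third-party packages). The sizes and repeat counts here are arbitrary choices for illustration:

```python
import timeit

data = list(range(100_000))

def py_sum(xs):
    # Pure-Python loop: every iteration goes through the interpreter.
    total = 0
    for x in xs:
        total += x
    return total

# Built-in sum() runs its loop in C, much like a numpy routine would.
t_py = timeit.timeit(lambda: py_sum(data), number=20)
t_c = timeit.timeit(lambda: sum(data), number=20)

assert py_sum(data) == sum(data)
print(f"pure-Python loop: {t_py:.3f}s, C-implemented sum: {t_c:.3f}s")
```

On a typical CPython build the C-implemented loop wins by a wide margin, even though both compute the same thing.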
Last time I checked (which was a few years ago), the performance gain of porting a non-trivial calculation-heavy piece of code from Python to OCaml was actually 25x. I believe that performance of Python has improved quite a lot since then (as has OCaml's), but I doubt it's sufficient to erase this difference.
And OCaml (which offers productivity comparable to Python's) is noticeably slower than Rust or C++.
It really depends on what you're doing, but I don't think it is generally accurate.
What slows Python down is generally the "everything is an object" attitude of the interpreter. E.g. when you call a function, the interpreter first has to handle the thing you're calling as a full-blown object (and set up a frame object for the call) before anything runs.
In C++, due to zero-cost abstractions, this usually boils down to a CALL instruction preceded by a handful of PUSH instructions in assembly, depending on the number of parameters (and the calling convention). That is of course a lot faster than running through the machinery of creating and dispatching on Python objects.
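You can see the interpreter's side of this with the stdlib dis module: even a trivial call compiles to bytecode that the interpreter must dispatch instruction by instruction, with a generic CALL-family opcode doing the object-level call protocol (the exact opcode names vary between CPython versions):

```python
import dis

def call_it(f, a, b):
    # A trivial forwarding call, to inspect what CPython executes for it.
    return f(a, b)

# List the bytecode opcodes CPython interprets for this function.
ops = [ins.opname for ins in dis.Bytecode(call_it)]
print(ops)

# Some CALL-family opcode (CALL, CALL_FUNCTION, ...) is always present.
assert any(op.startswith("CALL") for op in ops)
```

Each of those opcodes is itself dispatched by the interpreter loop, which is where the per-call overhead relative to a bare native CALL instruction comes from.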
> What slows Python down is generally the "everything is an object" attitude of the interpreter
Nah, it’s the interpreter itself. Without JIT compilation there is a hard performance ceiling it cannot surpass even in theory (as opposed to implementations like PyPy or GraalPy).
I don't think this is true: Other Python runtimes and compilers (e.g. Nuitka) won't magically speed up your code to the level of C++.
Python is primarily slowed down by the fact that each attribute and method access turns into multiple dictionary lookups and indirect calls, since it's dictionaries and magic methods all the way down.
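A small illustration of the "dictionaries all the way down" point: instance attributes live in a per-object dict, methods live in the class dict, and every dotted access goes through the `__getattribute__` protocol:

```python
class Point:
    def __init__(self, x):
        self.x = x

p = Point(3)

# Instance attributes live in a per-object dictionary...
assert p.__dict__ == {"x": 3}

# ...and every attribute access funnels through __getattribute__,
# which searches the instance dict, then the class (MRO) dicts.
assert p.x == type(p).__getattribute__(p, "x") == 3

# Methods are found the same way, in the class's dictionary:
assert "__init__" in Point.__dict__
```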
Which can be inlined/speculated away easily. It won’t be as fast as well-optimized C++ (mostly due to memory layout), but there is no reason why it couldn’t get arbitrarily close to that.
How so? Python is dynamically typed after all, and even type annotations are merely bolted on – they don't tell you anything about the "actual" type of an object, they merely restrict your view of that object (i.e. what operations you can perform on the variable without the type checker complaining). For instance, if you add additional properties to an object of type A via monkey-patching, you can still pass it around as an object of type A.
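The monkey-patching point in concrete form (class `A` and the `extra` attribute here are made-up names for illustration):

```python
class A:
    pass

def takes_an_a(obj: A) -> str:
    # The annotation restricts what a type checker lets you write here,
    # but says nothing about what the object actually carries at runtime.
    return getattr(obj, "extra", "no extra")

a = A()
a.extra = "monkey-patched"  # add an attribute the class never declared

assert isinstance(a, A)               # still "an A" at runtime
assert takes_an_a(a) == "monkey-patched"
```

The annotation never stopped the extra attribute from existing; it only constrains what static analysis will accept.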
A function (or stretch of code) is executed, say, a thousand times; the runtime collects statistics showing that object ‘a’ was always an integer, so it may be worthwhile to compile this code block to native code with a guard on whether ‘a’ really is an integer (that check is very cheap). The speedup comes from not interpreting: the common case is made natively fast, and in the slow branch the complex case (“the + operator has been redefined”, for example) can be handled simply by falling back to the interpreter. Python is not more dynamic than JavaScript (hell, Python is even strongly typed), which hovers around the impressive 2x-of-native performance mark.
Also, if you are interested: “shapes” (hidden classes) are the primitives that both JavaScript and Python JIT compilers specialize on, rather than regular types.
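The guard-plus-fallback idea can be sketched in pure Python (a real JIT emits the fast path as machine code, of course; `make_specialized_add` is a made-up name for illustration):

```python
import operator

def make_specialized_add(generic_add):
    """Speculate that both operands are ints; guard, else deoptimize."""
    def specialized(a, b):
        # Cheap guard: exact type checks (a subclass might redefine +).
        if type(a) is int and type(b) is int:
            return a + b              # speculated fast path
        return generic_add(a, b)      # fall back to the generic path
    return specialized

add = make_specialized_add(operator.add)

assert add(2, 3) == 5                 # guard passes, fast path
assert add("py", "thon") == "python"  # guard fails, generic path
```

The guard is a pointer comparison in practice, which is why speculation pays off so well when the type profile is stable.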
> it's a VM reading and parsing your code as a string at runtime.
Commonly it creates .pyc files, so it doesn't actually re-parse your code as a string every time. It does check the source file's timestamp to make sure the .pyc file is up to date.
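You can reproduce the cache mechanism by hand with the stdlib: py_compile writes the same PEP 3147 __pycache__ file that the import system would create on first import (the module name `mymod` here is just an example):

```python
import importlib.util
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "mymod.py")
    with open(src, "w") as f:
        f.write("ANSWER = 42\n")

    # Roughly what the interpreter does on first import:
    pyc = py_compile.compile(src)
    print(pyc)  # something like .../__pycache__/mymod.cpython-3XX.pyc

    # importlib derives the cache path the same way:
    assert pyc == importlib.util.cache_from_source(src)
    assert os.path.exists(pyc)
```

On later imports, CPython validates the cached file against the source (by timestamp, or by hash for hash-based .pyc files per PEP 552) before skipping the parse/compile step.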
On Debian (and I'd guess most distributions) the .pyc files get created when you install the package, because generally they go in /usr and that's only writable by root.
It does include the full parser in the runtime, but I'd expect most code to not be re-parsed entirely at every start.
Importing is really slow anyway. People writing command-line tools have to defer imports to avoid huge startup times from loading libraries that are perhaps needed only by some functions that might not even be used in that particular run.
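The deferral pattern looks like this (json stands in for a genuinely heavy dependency, and `render_report` is a made-up name for illustration):

```python
# Module-level import would be paid at startup even if the feature
# is never used in this run, e.g.:
#   import some_heavy_plotting_library   # hypothetical heavy dependency

def render_report(data):
    # Deferred import: the cost is paid only when this code path runs,
    # and repeat calls hit sys.modules, so the import is done once.
    import json
    return json.dumps(data)

result = render_report({"rows": 3})
assert result == '{"rows": 3}'
print(result)
```

This keeps `tool --help` fast while leaving the heavy path untouched for runs that actually need it.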