Damn! Is the rule of thumb really a 10x performance hit between Python/C++? I don’t doubt you’re correct, I’m just thinking of all the unnecessary cycles I put my poor CPU through.
Outside cases where Python is used as a thin wrapper around some C library (simple networking code, numpy, etc) 10x is frankly quite conservative. Depending on the problem space and how aggressively you optimize, it's easily multiple orders of magnitude.
FFI into lean C isn't some perf panacea either, beyond the overhead you're also depriving yourself of interprocedural optimization and other Good Things from the native space.
Of course it depends on what you are doing, but 10x is close to a best case. I recently rewrote a C++ tool in Python, and even though all the data parsing and computation was done by Python libraries that wrap high-performance C libraries, the program was still 6 or 7 times slower than the C++ version. Had I written the Python version in pure Python (no numpy, no third-party C libraries) it would no doubt have been 1000x slower.
It depends on what you're doing. If you load some data, process it with some Numpy routines (where the speed-critical parts are implemented in C) and save a result, you can probably be almost as fast as C++... however if you write your algorithm fully in Python, you might end up much worse than 10x slower. See for example: https://shvbsle.in/computers-are-fast-but-you-dont-know-it-p... (there they measure ~4x speedup from good Python to unoptimized C++, and ~1000x from slow Python to optimized C++...)
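A minimal sketch of that split, using the stdlib's C-implemented built-in sum() as a stand-in for a numpy reduction (so it runs without third-party packages). The sizes and repeat counts here are arbitrary choices for illustration:

```python
import timeit

data = list(range(100_000))

def py_sum(xs):
    # Pure-Python loop: every iteration goes through the interpreter.
    total = 0
    for x in xs:
        total += x
    return total

# Built-in sum() runs its loop in C, much like a numpy routine would.
t_py = timeit.timeit(lambda: py_sum(data), number=20)
t_c = timeit.timeit(lambda: sum(data), number=20)

assert py_sum(data) == sum(data)
print(f"pure-Python loop: {t_py:.3f}s, C-implemented sum: {t_c:.3f}s")
```

On a typical CPython build the C-implemented loop wins by a wide margin, even though both compute the same thing.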
Last time I checked (which was a few years ago), the performance gain of porting a non-trivial calculation-heavy piece of code from Python to OCaml was actually 25x. I believe that performance of Python has improved quite a lot since then (as has OCaml's), but I doubt it's sufficient to erase this difference.
And OCaml (which offers productivity comparable to Python's) is noticeably slower than Rust or C++.
It really depends on what you're doing, but I don't think it is generally accurate.
What slows Python down is generally the "everything is an object" attitude of the interpreter. E.g. when you call a function, the interpreter first has to handle the thing you're calling as a full-blown object (and set up a frame object for the call) before anything runs.
In C++, due to zero-cost abstractions, this usually boils down to a CALL instruction preceded by a handful of PUSH instructions in assembly, depending on the number of parameters (and the calling convention). That is of course a lot faster than running through the machinery of creating and dispatching on Python objects.
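You can see the interpreter's side of this with the stdlib dis module: even a trivial call compiles to bytecode that the interpreter must dispatch instruction by instruction, with a generic CALL-family opcode doing the object-level call protocol (the exact opcode names vary between CPython versions):

```python
import dis

def call_it(f, a, b):
    # A trivial forwarding call, to inspect what CPython executes for it.
    return f(a, b)

# List the bytecode opcodes CPython interprets for this function.
ops = [ins.opname for ins in dis.Bytecode(call_it)]
print(ops)

# Some CALL-family opcode (CALL, CALL_FUNCTION, ...) is always present.
assert any(op.startswith("CALL") for op in ops)
```

Each of those opcodes is itself dispatched by the interpreter loop, which is where the per-call overhead relative to a bare native CALL instruction comes from.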
> What slows Python down is generally the "everything is an object" attitude of the interpreter
Nah, it’s the interpreter itself. Without JIT compilation there is a hard performance ceiling it cannot surpass even in theory (as opposed to implementations like PyPy or GraalPy).
I don't think this is true: Other Python runtimes and compilers (e.g. Nuitka) won't magically speed up your code to the level of C++.
Python is primarily slowed down by the fact that each attribute and method access turns into multiple dictionary lookups and indirect calls, since it's dictionaries and magic methods all the way down.
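A small illustration of the "dictionaries all the way down" point: instance attributes live in a per-object dict, methods live in the class dict, and every dotted access goes through the `__getattribute__` protocol:

```python
class Point:
    def __init__(self, x):
        self.x = x

p = Point(3)

# Instance attributes live in a per-object dictionary...
assert p.__dict__ == {"x": 3}

# ...and every attribute access funnels through __getattribute__,
# which searches the instance dict, then the class (MRO) dicts.
assert p.x == type(p).__getattribute__(p, "x") == 3

# Methods are found the same way, in the class's dictionary:
assert "__init__" in Point.__dict__
```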
Which can be inlined/speculated away easily. It won’t be as fast as well-optimized C++ (mostly due to memory layout), but there is no reason why it couldn’t get arbitrarily close to that.
How so? Python is dynamically typed after all, and even type annotations are merely bolted on – they don't tell you anything about the "actual" type of an object, they merely restrict your view of that object (i.e. what operations you can perform on the variable without the type checker complaining). For instance, if you add additional properties to an object of type A via monkey-patching, you can still pass it around as an object of type A.
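The monkey-patching point in concrete form (class `A` and the `extra` attribute here are made-up names for illustration):

```python
class A:
    pass

def takes_an_a(obj: A) -> str:
    # The annotation restricts what a type checker lets you write here,
    # but says nothing about what the object actually carries at runtime.
    return getattr(obj, "extra", "no extra")

a = A()
a.extra = "monkey-patched"  # add an attribute the class never declared

assert isinstance(a, A)               # still "an A" at runtime
assert takes_an_a(a) == "monkey-patched"
```

The annotation never stopped the extra attribute from existing; it only constrains what static analysis will accept.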
A function (or stretch of code) is executed, say, a thousand times; the runtime collects statistics showing that object ‘a’ was always an integer, so it may be worthwhile to compile this code block to native code with a guard on whether ‘a’ really is an integer (that check is very cheap). The speedup comes from not interpreting: the common case is made natively fast, and in the slow branch the complex case (“the + operator has been redefined”, for example) can be handled simply by falling back to the interpreter. Python is not more dynamic than JavaScript (hell, Python is even strongly typed), which hovers around the impressive 2x-of-native performance mark.
Also, if you are interested: “shapes” (hidden classes) are the primitives that both JavaScript and Python JIT compilers specialize on, rather than regular types.
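The guard-plus-fallback idea can be sketched in pure Python (a real JIT emits the fast path as machine code, of course; `make_specialized_add` is a made-up name for illustration):

```python
import operator

def make_specialized_add(generic_add):
    """Speculate that both operands are ints; guard, else deoptimize."""
    def specialized(a, b):
        # Cheap guard: exact type checks (a subclass might redefine +).
        if type(a) is int and type(b) is int:
            return a + b              # speculated fast path
        return generic_add(a, b)      # fall back to the generic path
    return specialized

add = make_specialized_add(operator.add)

assert add(2, 3) == 5                 # guard passes, fast path
assert add("py", "thon") == "python"  # guard fails, generic path
```

The guard is a pointer comparison in practice, which is why speculation pays off so well when the type profile is stable.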
> it's a VM reading and parsing your code as a string at runtime.
Commonly it creates .pyc files, so it doesn't actually re-parse your code as a string every time. It does check the source file's timestamp to make sure the .pyc file is up to date.
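You can reproduce the cache mechanism by hand with the stdlib: py_compile writes the same PEP 3147 __pycache__ file that the import system would create on first import (the module name `mymod` here is just an example):

```python
import importlib.util
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "mymod.py")
    with open(src, "w") as f:
        f.write("ANSWER = 42\n")

    # Roughly what the interpreter does on first import:
    pyc = py_compile.compile(src)
    print(pyc)  # something like .../__pycache__/mymod.cpython-3XX.pyc

    # importlib derives the cache path the same way:
    assert pyc == importlib.util.cache_from_source(src)
    assert os.path.exists(pyc)
```

On later imports, CPython validates the cached file against the source (by timestamp, or by hash for hash-based .pyc files per PEP 552) before skipping the parse/compile step.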
On Debian (and I'd guess most distributions) the .pyc files get created when you install the package, because generally they go in /usr and that's only writable by root.
It does include the full parser in the runtime, but I'd expect most code to not be re-parsed entirely at every start.
Importing is really slow anyway. People writing command-line tools have to defer imports to avoid huge startup times from loading libraries that are perhaps needed only by some functions that might not even be used in that particular run.
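The deferral pattern looks like this (json stands in for a genuinely heavy dependency, and `render_report` is a made-up name for illustration):

```python
# Module-level import would be paid at startup even if the feature
# is never used in this run, e.g.:
#   import some_heavy_plotting_library   # hypothetical heavy dependency

def render_report(data):
    # Deferred import: the cost is paid only when this code path runs,
    # and repeat calls hit sys.modules, so the import is done once.
    import json
    return json.dumps(data)

result = render_report({"rows": 3})
assert result == '{"rows": 3}'
print(result)
```

This keeps `tool --help` fast while leaving the heavy path untouched for runs that actually need it.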