I've never been able to find a good use case for PyPy. Why would you ever want to run CPU bound tasks in python? I'm sure there are arguments about the huge ecosystem, not having to rewrite code in another language, etc. But most of the widely used stuff in python already has underlying compiled code for the heavy lifting. Also, given the large amount of parallelism in modern hardware architecture and python's lackluster concurrency support, I just don't see a reason to use it.
I used Luigi [1] to automate data processing at a previous job. It's a simple job queue with a UI. You request jobs from it, and then run them for minutes or hours, so it shouldn't normally be a bottleneck and it makes sense to use a language that's quick and easy to write.
It's written in Python and works fine to process thousands of jobs per day. Once you start having tens of thousands of jobs in the queue, it gets slow enough that it can back things up. This compounds the problem, eventually resulting in the whole thing crashing.
By switching the interpreter to PyPy, I was able to keep the data pipeline running at that scale without having to rewrite anything.
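For anyone who hasn't used it, a Luigi job is just a Python class, which is part of why it's so quick to write. A minimal sketch of the shape of those tasks (the task name and paths here are made up):

    import datetime
    import luigi

    class AggregateDay(luigi.Task):
        date = luigi.DateParameter()

        def output(self):
            # Luigi decides whether a job already ran by checking
            # whether this target exists on disk.
            return luigi.LocalTarget(f"out/{self.date}.csv")

        def run(self):
            # The minutes-or-hours of real processing would go here.
            with self.output().open("w") as f:
                f.write("result\n")

    if __name__ == "__main__":
        luigi.build([AggregateDay(date=datetime.date(2020, 1, 1))],
                    local_scheduler=True)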
> Why would you ever want to run CPU bound tasks in python? ... But most of the widely used stuff in python already has underlying compiled code for the heavy lifting.
Haven't you just answered your own question? We know that people want to run CPU bound tasks in Python so much that they went to the effort of writing native modules because they couldn't do it in Python.
> python's lackluster concurrency support
This is a common misconception - Python actually has fully concurrent threads already.
Not fully. The GIL is always held while executing Python bytecode, because the interpreter internals aren’t threadsafe. Actual parallelism only happens in native code that explicitly releases the GIL.
When a Python thread is holding on to the GIL (running Python bytecode), how many other Python threads can concurrently run in the same process?
The answer is zero.
Sure, the interpreter releases the GIL every n bytecode ops, and C extensions can release the GIL before doing anything IO-bound and reacquire it (i.e. wait for it) afterwards, but that isn’t full concurrency in my book.
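This is easy to check for yourself. A quick sketch; on CPython the two-thread version takes roughly as long as the single-threaded one, because only one thread can be executing bytecode at a time:

    import threading
    import time

    def spin(n):
        # Pure-Python busy loop; holds the GIL for the whole computation.
        while n:
            n -= 1

    N = 50_000_000

    start = time.perf_counter()
    spin(N)
    print("one thread:  %.2fs" % (time.perf_counter() - start))

    start = time.perf_counter()
    threads = [threading.Thread(target=spin, args=(N // 2,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Roughly the same wall time: the two threads just take turns on the GIL.
    print("two threads: %.2fs" % (time.perf_counter() - start))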
I think you're describing parallelism. The thread is about concurrency. I think you'll find this definition of concurrency matches industry-standard definitions like Padua's.
This is a commonly held misconception. Concurrency actually implies that computations can be reordered without changing the final outcome, and does not imply parallel execution. This is related to parallelism in that concurrent computations can be run in parallel for a speedup.
> When a Python thread is holding on to the GIL (running Python bytecode), how many other Python threads can concurrently
You've been tricked by jargon. It's a common misconception. In this context, "concurrency" has a specific meaning that is different from the everyday one you're using in this sentence.
For a good introduction to what the two words mean in a software engineering context, check out this written version of Rob Pike's talk, "Concurrency is not Parallelism."
> Concurrency is composition of independently executing things. . . Parallelism is simultaneous execution of multiple things.
When Python's threading model was implemented, parallelism just wasn't much of a concern. CPUs had a single core and could therefore only be working on one thing at a time. (In a macro sense; pipelining and superscalar architectures were still a thing, but not super relevant here.) Multithreading was not a way to do multiple things at once; it was a way to ensure that some long-running calculation would not cause the program to lock up by, e.g., preventing it from responding to event queues in a timely manner. This was done not by running things in parallel, but by switching back and forth among them very quickly.
Python's GIL was designed for this kind of situation. It's there to ensure that nothing bad happens if one of those context switches happens in the middle of a sensitive operation. Which is, strictly speaking, a concurrency concern and not a parallelism concern.
(It's also possible to have parallel work that is not concurrent, in which case locks are not necessary. But just because it's common for parallelism and concurrency to co-occur does not mean that they are the same thing.)
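To make the distinction concrete in Python terms, here's a sketch of concurrency with zero parallelism: two independently executing tasks composed on a single OS thread, taking turns at the await points:

    import asyncio

    async def worker(name, delay):
        # An independently executing thing; each await is a point
        # where it yields control to whatever else is ready to run.
        for step in range(3):
            await asyncio.sleep(delay)
            print(name, "step", step)

    async def main():
        # Composition of two such things, interleaved on one thread:
        await asyncio.gather(worker("a", 0.10), worker("b", 0.15))

    asyncio.run(main())

The output interleaves "a" and "b" even though nothing ever runs simultaneously.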
I guess I incorrectly understood “fully concurrent threads” to mean threads that actually run in parallel, like Java threads. The redundant word “fully” threw me off; apologies.
A well-tuned web application layer is CPU bound at scale: if your database is well designed, it isn't the source of latency. And a language's lack of concurrency support doesn't matter when the interpreter's overhead dwarfs the cost of a context switch, which is absolutely true of CPython.
There are many sites and services where a rewrite in a new language is just not viable, and I still recommend Python-everywhere to startups doing anything remotely associated with data. So PyPy is a tide that could lift many boats.
I switched a high traffic Flask web app to PyPy a couple of years ago and we saw substantially faster response times across the board, and much higher task throughput from our background worker machines, many of which were pegged 24/7.
We had so much less baseline load afterwards that we were able to scale down a bunch of instances. The transition only took a few hours of effort fixing one or two incompatible dependencies, so it paid for itself in savings quickly, especially vs an approach of trying to rewrite the slowest bits in a faster language.
> Why would you ever want to run CPU bound tasks in python?
0. Because you don't have time to deal with the mess that C++ has become, and all the please-repeat-yourself-a-million-times busywork that comes with it (cmake files, header files; they give us a goddamn spaceship operator but not basic necessities like string split/join methods)
1. There are many use cases where faster execution is merely nice to have, e.g. when results can be cached for a long time, or when it's a one-off data analysis script, but human time is far more expensive. If it costs $1000 more in engineer hours to write C++ instead of Python for a script that's only going to be run 10 times, that isn't a worthwhile tradeoff. Hell, you could buy a new GPU for that money.
2. Because the same exact file can be deployed on arm32, arm64, and x86
3. Because CPU-intensive stuff is largely already optimized by numpy, numba, tensorflow, pytorch, etc. (see the sketch below)
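To put item 3 in concrete terms, a rough sketch; the pure-Python loop pays interpreter overhead per element, while the numpy call spends nearly all its time in compiled code, so the interpreter's speed barely matters:

    import time
    import numpy as np

    data = np.random.rand(10_000_000)

    start = time.perf_counter()
    total = sum(data)       # pure-Python loop: one bytecode dispatch per element
    print("python sum: %.2fs" % (time.perf_counter() - start))

    start = time.perf_counter()
    total = data.sum()      # a single call into compiled code
    print("numpy sum:  %.4fs" % (time.perf_counter() - start))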
I can either spend a lot of time rewriting my already tested and working code if my application scales to the point of hitting a CPU bottleneck OR I can just try using PyPy.
Python makes it quick to write code, test it and get it out the door.
Other languages don't have that same cycle, and Python has a LARGE number of freely available packages that help you launch even quicker, since you don't have to write everything yourself.
So at that point it comes down to: what are you fastest in?
Python, given its ecosystem of packages, is quick to write functioning code in. Its dynamic nature allows for quick prototyping and refactoring without requiring massive pre-planning and cognitive overhead.
I've done it for one-off data munging scripts - processing archives of one format of data to another, etc. Python is easy to write and for these tasks, you can get PyPy to execute at twice the speed for basically no cost.
Strictly speaking, "CPU bound" is not an adjective that can be used to describe tasks. It's one that describes a particular program solving a particular task under a particular configuration. I've done no analysis on this myself, but I would be more than willing to believe that a CPU-bound job might be only a CPython-to-PyPy's worth of speedup away from instead being memory- or disk-bound.
It can be hard to tell the difference, too, since being stalled out while waiting on the memory controller shows up as CPU activity in htop.
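You can see that effect even from Python, with an illustrative (and very machine-dependent) numpy sketch; both halves peg a core in htop, but the second one spends much of its time stalled on memory:

    import time
    import numpy as np

    data = np.random.rand(20_000_000)
    idx = np.random.permutation(len(data))   # a random visiting order

    start = time.perf_counter()
    data.sum()                               # sequential scan, prefetch-friendly
    print("sequential: %.3fs" % (time.perf_counter() - start))

    start = time.perf_counter()
    data[idx].sum()                          # random gather: waits on memory,
                                             # yet still shows up as CPU time
    print("shuffled:   %.3fs" % (time.perf_counter() - start))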
The Intel VTune profiler [0] bolts onto Linux's perf subsystem and offers a very nice way of assessing whether the CPU is stalled on memory (or cache) or is spending its time computing. I guess it's a nice GUI on top of (nowadays) standard Linux tracing interfaces, but I really didn't dig deep enough.
If you are after deep profiling, you should definitely give it a try. My recollection is totally positive.
As others mentioned, it's generally an easy choice to write one-off data processing & analysis stuff in. I can get all the parallelism I need with the multiprocessing library.
For me this always just consists of reading a bunch of files in and then doing some basic aggregation on them. Years ago I benchmarked Python vs. PyPy and found no real benefit to it.
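For the curious, those scripts are roughly this shape (directory layout and CSV format are made up):

    from collections import Counter
    from multiprocessing import Pool
    from pathlib import Path

    def aggregate_one(path):
        # Hypothetical per-file work: tally records by their first CSV field.
        counts = Counter()
        with open(path) as f:
            for line in f:
                counts[line.split(",", 1)[0]] += 1
        return counts

    if __name__ == "__main__":
        files = list(Path("data").glob("*.csv"))
        with Pool() as pool:   # one worker process per core by default
            totals = sum(pool.map(aggregate_one, files), Counter())
        print(totals.most_common(10))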
Here's a link to that benchmark if you'd like to read it:
Code can end up CPU bound that was never meant to be. For example, having to use a database driver or some other low-level library written in Python can make your web app pretty CPU-bound. For me, the main issue with PyPy is compatibility.