Note that Sam's no-GIL changes are not only about the removal of the GIL (although that was the main goal). There are a number of other unrelated improvements to make it faster. And as far as I know, most of those unrelated improvements have gotten into CPython now.
I'm worried about it. My understanding is that the GIL removal was still slower than with the GIL, but that he added useful unrelated speedups to compensate. If the speedups get into 3.11 but nogil doesn't, will they accept a significantly slower 3.12?
It seems like the core team doesn't share my sense that nogil is the single most important thing that Python needs to do in a world of many-core processors. If nogil is now blocked because it's deemed an unacceptable performance hit vs 3.11, I'll be very, very disappointed.
Most python code in the world is currently single threaded. It makes sense to me to start with the low hanging fruit that affects the majority of existing code and then move on to harder problems.
Sure, but Guido put a pre-condition on GIL removal years ago that it must not cause single-threaded code to suffer a performance penalty.
So Sam came up with a way to ensure that (on balance, if not in every example), we could have same-or-better performance and no GIL. Awesome!
But now we're getting the performance boosts without the GIL removal, and so in the next release, the GIL removal will cause a performance regression unless we can somehow find more performance boosts. It feels like this could just happen forever.
There are many more improvements to be made in single-threaded performance, including things like a JIT. Future improvements are complicated; they won't be a free lunch like this first round has generally been. My feeling is that things like a JIT and nogil will start out as optional and work their way into the ecosystem over time, where the performance implications are less severe for the general use case.
Right. But as you implicitly note, it's a work-around ("escape hatch"), not a feature.
e.g. Unix had processes from day 1, but threads were added (with significant effort) because threads are a better abstraction for addressing a lot of problems. Especially so when your CPU has lots of cores.
The work to make multiprocessing better is certainly useful, and valuable, but it's still a work-around.
It might make sense to have runtime configuration that enables no-GIL mode, or just enable it automatically when importing threading, so that non-threaded workloads don't suffer.
The faster-cpython project isn't done after 3.11. They're working on many other improvements for 3.12, and we'll see other things, possibly a JIT or at least an API to facilitate accelerators, in 3.13+.
I'm not worried about a lack of single thread performance increases for some time.
Yeah, while it's in some regards nice having multiple small deployments of our Python app that can be scaled up and down independently, it's sometimes such a hurdle to do something "extra" in a smaller app. Have a simple webserver and want to do some things out of band? In the JVM I would use a concurrent queue, add tasks to it, and spin up some threads consuming it. In Python you either spin up a new process and deal with the communication problems, become a master of greenlet and Python internals, or most likely default to Celery, which needs its own deployment.
It sometimes boggles my mind how Python is considered the quick and easy way for startups, while at the same time doing trivial things becomes such a hurdle.
> Have a simple webserver and want to do some things out of band? In the JVM I would use a concurrent queue, add tasks to it, and spin up some threads consuming it. In Python you either spin up a new process and deal with the communication problems, become a master of greenlet and Python internals, or most likely default to Celery, which needs its own deployment.
This is probably risky in production anyway, because the load balancer (typically) has no insight into these background tasks and will happily kill a process/pod/etc. that is running one. You should probably dispatch the workload to an external task runner (e.g., Lambda or a Kubernetes Job or similar) unless you really don't care if the background task gets killed mid-flight. (I've had a few dev teams ignore these warnings and then blame infrastructure when their background jobs occasionally got killed mid-flight.)
Yes, that is true. My case here was about pushing some metrics somewhere, where losing some wasn't an issue, but I didn't want to block the main thread doing it.
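For that kind of lossy, I/O-bound side work, the stdlib queue-plus-worker-thread pattern from the JVM example does translate to Python. A minimal sketch, assuming a hypothetical push_metric helper and metrics endpoint (neither is a real API):

    # Fire-and-forget metrics worker; push_metric and METRICS_URL are
    # placeholders for illustration only.
    import queue
    import threading
    import urllib.request

    METRICS_URL = "http://metrics.internal/ingest"  # hypothetical endpoint
    metrics_q: "queue.Queue[dict]" = queue.Queue(maxsize=10_000)

    def push_metric(metric: dict) -> None:
        """Enqueue without blocking the caller; drop if the queue is full."""
        try:
            metrics_q.put_nowait(metric)
        except queue.Full:
            pass  # losing some metrics is acceptable here

    def _worker() -> None:
        while True:
            metric = metrics_q.get()
            try:
                req = urllib.request.Request(METRICS_URL, data=repr(metric).encode())
                urllib.request.urlopen(req, timeout=2)
            except OSError:
                pass  # best effort: swallow network errors
            finally:
                metrics_q.task_done()

    threading.Thread(target=_worker, daemon=True).start()

The catch is that this only stays cheap while the background work is I/O-bound; for CPU-bound work the worker thread competes with the main thread for the GIL, which is what the rest of this discussion is about.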
But a single Java app on some EC2 instance could do what you'd need 10 "apps" and possibly a complicated k8s deployment to handle with Python. Some of that is just because the raw performance of Python is far worse, but most of it is because of cases like this, where simple things can't be shared. So there's a lot of unnecessary complexity compared to "old and verbose" Java.
Another example is Prometheus metrics. The current app I'm working on doesn't have a webserver, which makes it really awkward in Python to add Prometheus. Since adding an endpoint to my app creates a new process and needs to be deployed almost as a sidecar, there is no smooth way to actually get the metrics from my main app to the endpoint that can be scraped, since they don't share the same process.
It's a huge hurdle in any scenario where you have a large, static dataset from which you want to derive computation-heavy results. Multiprocessing generally requires making copies of that data, which can be impossible. The workarounds (e.g. mmap) take an order of magnitude more engineering effort.
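For a concrete sense of what the workaround looks like, here is a sketch using the stdlib multiprocessing.shared_memory module (Python 3.8+) and NumPy to expose one flat array to workers without copying it; anything more structured than this takes far more effort, which is the complaint above:

    # Share one large read-only array across worker processes without copying.
    import numpy as np
    from multiprocessing import Pool, shared_memory

    SHAPE, DTYPE = (10_000_000,), np.float64

    def worker(args):
        shm_name, start, stop = args
        shm = shared_memory.SharedMemory(name=shm_name)   # attach, no copy
        try:
            data = np.ndarray(SHAPE, dtype=DTYPE, buffer=shm.buf)
            return float(data[start:stop].sum())          # compute on a slice
        finally:
            shm.close()

    if __name__ == "__main__":
        shm = shared_memory.SharedMemory(create=True, size=10_000_000 * 8)
        data = np.ndarray(SHAPE, dtype=DTYPE, buffer=shm.buf)
        data[:] = np.random.random(SHAPE)                 # build the dataset once

        chunks = [(shm.name, i, i + 2_500_000) for i in range(0, 10_000_000, 2_500_000)]
        with Pool(4) as pool:
            print(sum(pool.map(worker, chunks)))

        shm.close()
        shm.unlink()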
I agree 100% (I come from a "use all the cores with threads and shared memory to get linear speedups on many workloads" world). But I think at this point the Python leadership has committed to a pattern of "every time GIL removal looks plausible, find some more single-core speedups to stave off the change". Put another way: I've given up on the hope that Python will ever multicore the way I think it should multicore.
I'd be tickled if Python would singlecore the way I think it should singlecore. Unfortunately, it remains considerably slower even than JavaScript (never mind Go, Java, etc). There are a lot of bad choices that can't easily be undone without breaking compatibility for a handful of packages, and the Python leadership seems unable or unwilling to guide the community through those changes.
I had a similar problem last month where I needed to compute some results on multiple datasets (with a little bit of concurrency per dataset). In the end I just ported the code to Ada, which makes it very easy to create task hierarchies. You can define variables of task types on the 'stack', and the program only leaves the enclosing block, letting the variables go out of scope, after every task has finished. I can now easily put all cores to maximum use. The list comprehensions from the Python version became a little more verbose, though, because I had to use an older revision of Ada (2012) instead of the new 2022 revision.
I'm particularly happy to see the django_template test showing so much improvement. The implementation of Django templates has been a hot topic for speed improvements for a long time (and why many people jump to Jinja). It's a good example of how "complex" Python will be improved by these optimisations.
Two interesting ones to compare are pickle_pure_python and json_loads (both doing (de)serialisation activities): the former shows a large improvement, the latter much less so. Again, it shows how the improvements are coming to complex pure-Python code: json.loads is mostly C, whereas the pickle_pure_python test runs against the pure-Python pickle implementation.
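A rough, machine-dependent way to see why those two benchmarks behave so differently is to time C-backed json.loads against the pure-Python unpickler that CPython keeps alongside the C-accelerated one (pickle._Unpickler is a private but long-standing name; this is only an illustration, not the pyperformance harness):

    # The absolute numbers don't matter; the pure-Python path is the one that
    # benefits most from the 3.11 interpreter speedups.
    import io
    import json
    import pickle
    import timeit

    data = {"users": [{"id": i, "name": f"user{i}", "tags": ["a", "b"]} for i in range(1000)]}
    json_blob = json.dumps(data)
    pickle_blob = pickle.dumps(data)

    def pure_python_unpickle():
        return pickle._Unpickler(io.BytesIO(pickle_blob)).load()

    print("json.loads (C)       :", timeit.timeit(lambda: json.loads(json_blob), number=200))
    print("pure-Python unpickle :", timeit.timeit(pure_python_unpickle, number=200))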
It would be interesting to see a benchmark based on the Django test suite; that would probably be quite indicative of the improvements we can expect in real-world Python web servers.
How this happened with PHP was interesting to watch from the side. The competition that came from Facebook's Hack/HHVM seemed to be the catalyst, and helped set expectations for some benchmarks.
I remember Perl 6 (now Raku) used to advertise itself as "the first 100 year programming language". I wonder if actually that could be Python. It seems like the update process has a lot of "taste", in that it adds features that are really useful and yet the syntax still looks clean and intuitive.
"the first 100 year programming language" is definitely JavaScript. (If thinking in terms of Perl/Python like scripting and versatility, otherwise clearly C)
Depends on what you mean by "100-year language". If it's in the pg sense (i.e. a language that's on the main branch of the evolutionary path of programming languages) I think the answer is clearly the ML family.
If you mean "what language is most likely to be around in 2100 AD", realistically we'll have C for as long as we'll have computers.
I was thinking Cobol would probably be first, since it's at least 63 years old already, but apparently Lisp is technically a year older, and Fortran a year older than that. I'm not sure Fortran will last as long as Cobol/Lisp, though: it's losing out to Python on simplicity, and is likely to bleed a lot to C/C++/Rust for science libraries and things that require faster glue code.
C 1972, C++ 1985, Erlang 1986, Perl 1987, Python 1991, JavaScript 1995.
C is half-way to 100: an amazing fact in the rapidly-changing world of technology. Of all these, Perl seems to have suffered the biggest loss of popularity so far, while Erlang has the least popularity to lose.
I'd be somewhat surprised if any of these are still in common use in 2072, but OTOH, I'd also be surprised if any of them were completely unused, with the possible exception of Perl.
Thinking about which languages might still be prevalent at the end of this century, Lisp and Prolog (in their respective niches) would be my candidates. Maybe also some form of C/C++, and Bash. Everything else I wouldn’t be so sure about. Languages whose typical software projects have a high turnover (like JavaScript) or are based on a virtual machine (JVM, CLR) are less likely to persist.
Well, those are at most op-codes. The language is significantly more complex and everything is extremely context sensitive. Plus, it requires one hell of an interpreter to get it to work.
Then again, if we're comparing it to Perl 6/Raku...
Unfortunately, there are no signs that CPython is moving towards integrating a JIT after all these years and despite its massive popularity. There's therefore no clear path towards substantially improved CPython performance, and it remains among the slowest mainstream languages, really the slowest among its top group of 4-5 most popular languages.
There's a lot of performance to be gained in CPython by improving the runtime. Also, one could argue that the quickening pass[1] they've added in 3.11 is a proto-JIT. Baby steps, I guess.
In the past, I've seen quite a few attempts at bolting simple template JITs onto the CPython interpreter loop[2], with lacklustre results. They'll eventually need a JIT, once all the easy wins in runtime perf have been exhausted.
OTOH, I'm glad that runtime performance is finally getting the attention it deserves from the CPython core devs. This wasn't the case just a few years ago.
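If you want to see the quickening/specializing pass mentioned above in action, CPython 3.11's dis module can display the adaptive instructions once a function has warmed up (a rough sketch; the add function is just an example):

    # Run a function enough times for the specializing adaptive interpreter to
    # kick in, then disassemble with adaptive=True: generic opcodes like
    # BINARY_OP should be replaced by specialized forms such as BINARY_OP_ADD_INT.
    import dis

    def add(a, b):
        return a + b

    for _ in range(10_000):  # warm-up so the instructions get specialized
        add(1, 2)

    dis.dis(add, adaptive=True)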
JIT has been mentioned in the faster-cpython work, though. But you're right to be skeptical: they are putting it off too ("it's not 3.11 and maybe not for 3.12"), and it's not something that can be added easily between two releases.
I don't think it's that much of a problem. Python fills a niche where performance is not a huge issue. When you compare languages, you look for a balance between how much development costs, what's your time to market, and how much money you'll spend in hosting. Time to market and cost of development usually are more important than hosting costs unless you know you'll be deploying to hundreds or thousands of machines. Most apps never grow that large.
I think that's actually "actual parallelism", not concurrency. Concurrency is multiple processes making forward progress using a single processor and some form of blocking/switching, while parallelism takes advantage of multiple cores. Personally, I consider concurrency a degenerate form of parallelism limited to multiplexing processes on a single core.
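In Python terms, a hedged sketch of the distinction (busy is just an illustrative CPU-bound function; timings are machine-dependent): threads give concurrency but, under the GIL, no parallelism for pure-Python CPU work, while processes give actual parallelism.

    import time
    from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

    def busy(n: int) -> int:
        # Pure-Python CPU-bound work: holds the GIL the whole time.
        total = 0
        for i in range(n):
            total += i * i
        return total

    def timed(executor_cls) -> float:
        start = time.perf_counter()
        with executor_cls(max_workers=4) as pool:
            list(pool.map(busy, [2_000_000] * 4))
        return time.perf_counter() - start

    if __name__ == "__main__":
        print(f"threads  : {timed(ThreadPoolExecutor):.2f}s")   # roughly serial under the GIL
        print(f"processes: {timed(ProcessPoolExecutor):.2f}s")  # spreads across cores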
A lot of the credit for these improvements goes to that GIL-removal effort. I very, very much hope we get it, or a well-supported branch with those changes.
I believe some of these improvements were part of the same work to eventually remove the GIL (global interpreter lock), albeit not tied to it. However, the final decision on removing the GIL hasn't been made.
Ah yes, this is what I thought was the current state. I was curious if a GIL removal decision had been made that I had missed, but it seems like it's still a soft "maybe":
Edit: And to be fair, you can do "true" concurrency in Python; you just have to eject to multiprocessing (with the tradeoffs that implies), so I didn't want to assume the comment was about GIL removal.
There was an experimental branch where someone reimplemented Python 3.9 sans the GIL and managed to maintain performance parity with the GIL'ed version. It's likely this performance uplift comes from the various optimisations that PR implemented.
Radical, no; that would require adding a JIT, which is planned for 3.12 AFAIK. But there are two performance improvements after the release of the first beta already: https://speed.python.org/
I've been following which PRs the faster-cpython people merge, and it's been nothing on the beta branch, I think. For example, this PR is not getting merged, it seems: https://github.com/python/cpython/pull/93379
Install dependencies with apt (there are a few popular pages on Google for this). Download the tarball. Configure. The final step is ‘make altinstall’, and then you can use it standalone. If you already use virtual environments, you would then recreate your venv and the rest of your experience would remain the same. I'd share actual syntax but I'm on mobile right now.
Alternatively a tool like pyenv makes this much easier.
It means to run the “./configure” script first, which is normally present in the unpacked tar file. Also, on Debian, to compile programs, you usually first have to install the “build-essential” package.
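Putting those steps together, a rough sketch for Debian/Ubuntu (the dependency list and the 3.11.0 tarball URL are examples; adjust them to your distro and to the release you actually want):

    # Build and install an alternate CPython alongside the system one.
    sudo apt install build-essential zlib1g-dev libssl-dev libffi-dev \
        libbz2-dev libreadline-dev libsqlite3-dev liblzma-dev
    wget https://www.python.org/ftp/python/3.11.0/Python-3.11.0.tgz   # example URL
    tar xzf Python-3.11.0.tgz
    cd Python-3.11.0
    ./configure --enable-optimizations
    make -j"$(nproc)"
    sudo make altinstall                 # installs python3.11 without touching python3
    python3.11 -m venv ~/venvs/myapp     # then recreate your venv against it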
https://mail.python.org/archives/list/python-dev@python.org/...