Some context from one of the replies to that tweet:
> This is a graph of CPU utilization for the web services that power PyPI. Today we upgraded from 3.10 to 3.11 and saw a significant and correlated drop in CPU usage -- nearly half.
>
> The CPython team has been putting a lot of effort into improving performance recently and it shows!
If you're talking about the older implementation approach CPython used to take, then I must say it was pretty efficient for what it did (all while keeping the code clean). It took a 2-3 day deep dive for a moderately prepared programmer to understand the ins and outs of the interpreter.
The optimisations coming to Python are a departure from that "clean code first" approach. It's performance coming at the price of complexity.
And there are yet more speed-ups coming in Python 3.12! It'll support the Linux perf profiler, so Python call stacks and function names show up in perf output, and it'll have even further improved error messages.
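From what has been documented so far, it's exposed via a -X perf interpreter option (or a PYTHONPERFSUPPORT environment variable); roughly, and treating the exact flags as subject to change:

perf record -F 9999 -g -o perf.data python -X perf my_script.py
perf report -g -i perf.data

where my_script.py is just a placeholder for whatever you want to profile.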
If you're interested in profiling, you should check out Scalene. It's leaps ahead of every other profiling tool I've used in Python and honestly might be the best profiler I've used in any language. It gives you per-line results for memory, CPU, and GPU, tells you C time vs. Python time (numpy calls, etc.), and it's faster than most other Python profilers.
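If you want to try it, the basic usage is just pip install scalene followed by scalene your_program.py (or python -m scalene your_program.py), where your_program.py is whatever script you'd normally run with python.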
I am writing a Python AI book right now and I am using version 3.11. Since I am a patient person, I prefer conda to avoid dependency issues, so: conda create --name py3.11 python=3.11 -c conda-forge
Being a Lisp developer (for 40 years), I am finally developing some love for Python after years of having to use it at work for deep learning.
For me the better performance/lower resource improvements in 3.11 (looking forward to 3.12) is a big issue because I compare to Common Lisp performance, which can be very good.
Off topic, but my new found love for Python comes from realizing that many programming tasks can be done so much faster in Python.
The difference in speed is still huge, for example
sbcl
(time (loop for i from 1 below (expt 10 8) by 2
            when (zerop (rem i 3)) count t))
this takes 0.131 seconds
versus python
import time

def gato():
    start = time.time()
    s = sum(1 if i % 3 == 0 else 0 for i in range(1, 10**8, 2))
    end = time.time()
    print(s)
    print(end - start)

gato()
takes 2.26 seconds, that is, roughly 20 times as long as the sbcl version (I am using Python 3.10.6)
You are right: since the post says that the performance of 3.11 can be 2x that of 3.10, the implication is that Python 3.11 is still about ten times slower than sbcl on loop-related code.
Anyway, a 2x improvement is a great feat for Python 3.11; congratulations to the team.
It's a very small thing, but booleans in Python can be added like integers, with True being 1 and False being 0, so your comprehension could have been:
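s = sum(i % 3 == 0 for i in range(1, 10**8, 2))

i.e. summing the booleans directly instead of mapping them to 1/0 first.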
As someone who has known Python since version 1.6, that is what has kind of always irritated me.
Sure the language isn't Lisp, but it comes close enough, only to be let down by lack of performance focus, and PyPy never got the spotlight it deserves.
So I kept using Python only for sysadmin stuff when targeting UNIX like workflows.
Finally, it is looking like I may eventually be able to reach for Python in tasks where performance matters.
> Off topic, but my new found love for Python comes from realizing that many programming tasks can be done so much faster in Python.
There's a "freedom of expression" that python has that other languages such as Java/Go lacks. C++ can have similar expressive power by overloading operators, then this was of course eschewed by Java and Go. Go and Java provide a framework for writing code their way. Python doesn't insist you use OOP methodologies, if you don't want (e.g.)
I built a data conversion system in python using the ">>" operator to simplify translation of data.
For large data structures this gets tedious with if/elif/else statements and null checking for each small piece of data, particularly if you're translating one large data structure into another.
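A stripped-down sketch of the general idea (not the real system; the Converter class and the field paths here are just made up for illustration):

class Converter:
    def __init__(self, mapping):
        # mapping: {target_field: path_into_source}, e.g. {"name": ("user", "full_name")}
        self.mapping = mapping

    def __rrshift__(self, record):
        # invoked for `record >> converter` when record is a plain dict
        out = {}
        for target, path in self.mapping.items():
            value = record
            for key in path:
                value = value.get(key) if isinstance(value, dict) else None
                if value is None:
                    break
            out[target] = value
        return out

to_contact = Converter({"name": ("user", "full_name"),
                        "email": ("user", "contact", "email")})
print({"user": {"full_name": "Ada", "contact": {"email": "ada@example.com"}}} >> to_contact)
# {'name': 'Ada', 'email': 'ada@example.com'}

The point is that `record >> to_contact` handles the missing-field checks in one place instead of scattering if/elif/else blocks through the translation code.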
I think you got downvoted for the operator overloading, but as long as it's used with intention you're fine. Python itself abuses bitwise or for types, everyone's darling SQLAlchemy overloads all the comparison operators to make filter expressions, and testing libs also overload comparisons to make matchers.
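For concreteness, and from memory (so double-check the details; this assumes SQLAlchemy 1.4+ is installed):

from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)

expr = User.name == "Ada"   # overloaded "==": builds a SQL expression object, not a bool
print(type(expr))           # something like sqlalchemy.sql.elements.BinaryExpression

MaybeStr = str | None       # and the stdlib's "abuse" of "|" on types (PEP 604, 3.10+)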
As an experienced Python dev... Python's "convenience" and "speed of iteration" completely fall apart when you have more than half a dozen people of varying development experience working on the same codebase. You spend so much time digging yourself out of abuses of internal APIs.
Microservices are the answer. You limit the Python code to completely separate 3k-line programs that talk over a message queue. If a junior screws up a service, just bin it and rewrite it.
Ruby is definitely great and I would have totally agreed a couple years ago, but Elixir has blown my mind. It feels so much like LISP sometimes I wonder, especially with the full macro system where you can operate directly on the AST :-D
Given the recent news on Nx and Livebook, Elixir may be a first-class language for ML very soon.
On a related note, Elixir feels a bit like a functional love child of ruby and python.
I wonder if there is something similar to the PowerLoom expert system in Python. I hope you get some inspiration from AIMA (the Norvig book, which has Python code). Also, using ontologies with triples in Python is likely to be slow for inference.
Unrelated, but I've been wanting to learn Lisp myself. I started with Clojure since I have experience with the Java ecosystem. What resources do you recommend for learning Lisp?
I'm loving the Faster CPython project. Just for reference, I have a project originally written in (very optimized) Python that has a Rust module for the demanding path. The Rust version was approximately 150% faster under Python 3.10. Under Python 3.11 the gap shrank to 100%. This is incredible, as I would prefer to keep it all in Python.
Have you tried running it in PyPy? Or using @njit from numba? I've found both to be faster than CPython. Of course there's also Cython if you're brave, but that tends to be a lot more work.
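For anyone who hasn't used numba, the suggestion is roughly this (after pip install numba; the function is just a stand-in for your hot loop):

from numba import njit

@njit
def count_multiples_of_3(limit):
    # same kind of loop as the sbcl example upthread
    count = 0
    for i in range(1, limit, 2):
        if i % 3 == 0:
            count += 1
    return count

print(count_multiples_of_3(10**8))  # first call pays the JIT compile cost; later calls run as machine code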
I just use pip-tools and vanilla pip. Neither poetry nor the rightfully maligned pipenv did anything to resolve this. Plain pip is extremely fast and has all the features I need, and pip-tools' dependency resolver is fast enough that I have had no problems.
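For anyone unfamiliar, the whole pip-tools workflow is basically two commands (keeping top-level deps in requirements.in is the usual convention, not a requirement):

pip-compile requirements.in     # resolves and writes a fully pinned requirements.txt
pip-sync requirements.txt       # makes the active venv match the lock file exactly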
poetry can add some headaches (specifically, occasionally breaks after a version update on the wrong version of python on MacOS), but IMO is worth it for lock files and some of the other features it brings.
pip-tools unfortunately doesn't generate cross-platform lock files, so your locked requirements.txt may be different if you generate it on macos than if you generate it on Linux.
Edited to add: both Poetry and PDM do generate cross-platform locks.
Just generate a pip-tools lock file that includes ipython on both macos and Linux and you'll see differences. Only on macos does ipython depend on "appnope".
You won't run into it most of the time, but if you've got OS specific wheels you can have dependencies that require different requirements files for different OS.
Do you use virtualenv or something to manage different environments? That's the achilles heel of vanilla pip for me, having everything pollute global installs is not so cute.
"venv" (the modern version of virtualenv) is included in python3 out of the box¹ now, so creating a virtualenv is as simple as `python3 -m venv venv`. (I usually do use one to avoid polluting the global one / I let the system package manager manage the global install.)
¹but note that Debian & Debian derivatives shove several standard library modules into individual packages, so you might need to apt-get install more than just python3.
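(On Debian/Ubuntu the usual missing piece for venv specifically is the python3-venv package: sudo apt-get install python3-venv.)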
Absolutely. You should almost never be pip-installing global things (of course there are circumstances, but you should know exactly why you are choosing to do it, vs an env).
Vanilla pip works well with the venv module. I still manually manage the majority of my virtual environments with the virtualenvwrapper library, although `source path/to/myvenv/bin/activate` also works.
Dependency resolution, even with caching, takes way too long for larger projects, and it requires a lot of setup and configuration when managing multiple inter-related projects that have many dependencies.
I really hope not. I used it for a while, but the “wontfix” attitude (e.g. PEP440 related issues: https://github.com/python-poetry/poetry/issues/6013, or the issues with pytorch cpu packages) made me go back to regular pip and use “pip freeze” for locking package versions in place.
I understand your frustration, but the Poetry team's response makes sense. The situation you got yourself into is quite strange. System-package Python environments are notoriously nasty (frankly they were a mistake, but they emerged before venv, so it's hard to fault them).
Non-PEP440 compliant libraries are also a pain in the ass.
Put both those facts together and you are playing with fire. In all likelihood you will eventually footgun yourself and end up with a python env superfund.
Just use virtual envs for everything.
I say this coming from over a decade of python experience and lots of time spent installing python in exotic environments.
Oh, I agree in principle, but as a user it means that something that worked up until poetry<1.2.0 broke. I understand that it's not their fault if Python distributions and packages don't comply with PEP440, and it's up to maintainers to fix it. Unfortunately, similar issues affect quite popular packages such as PyTorch, where the computation backend/version (cpu, cuda) is added as a local specifier ("1.12.1+cu116"), leading to all kinds of dependency resolver issues in poetry.
The PEP440 issue you linked to is a positive for Poetry, not a negative. You shouldn't be installing packages into your distro-managed Python environment, that's just asking for problems.
This issue is not restricted to installing packages into your distro-managed Python environment. I had Poetry-managed environments that, after upgrading to poetry>=1.2.0 stopped working (e.g., I couldn't run "poetry update" anymore) because of these issues.
I don't treat Python packaging tools as my religion. If one doesn't fulfill its purpose, I use another solution. Whether that's a positive or negative for the tool isn't my concern.
Packaging has been steadily improving, and I would dare to say that it is currently in very good shape. The key problem is the old "blind men and an elephant" one: the term "packaging" means different things to different people, and the details depend on specific circumstances. And Python has some quite specific details that are not present in many other languages, which adds to the complexity.
Without going into all the available solutions for all possible cases, here's a list of use cases that Python packaging needs to cover:
- dependency management and packaging:
  - abstract for libraries, concrete for applications [0]
  - pure Python dependencies
  - dependencies with binaries included
  - packaging format: sdist, wheel, conda
  - library repositories: public (PyPI), private, mirrors, etc.
- installing the Python executable
- multiple Python executables
- management of (multiple) virtual environments
Multiple standards have been adopted to deal with many of those cases, but many existing projects still haven't been updated and still need to be supported. Although we currently don't have a single tool that will cover every single possible use case, things are slowly but surely moving in the right direction.
We converted all our projects to Poetry over a year ago and don't have any major complaints. It's also nice moving all tool configs into a single pyproject.toml file.
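For anyone who hasn't seen it, the consolidation looks roughly like this (illustrative contents only, not our real file; the tool sections are whatever tools you actually use):

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

[tool.poetry]
name = "myapp"
version = "0.1.0"
description = ""
authors = ["Someone <someone@example.com>"]

[tool.poetry.dependencies]
python = "^3.11"
requests = "^2.28"

[tool.black]
line-length = 100

[tool.mypy]
strict = true

[tool.pytest.ini_options]
testpaths = ["tests"]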
for quick light development using pure python I pip install directly into my user dir of my system python.
for real work I use conda because it manages multiple versions of python well. Then I mainly use pip to install stuff, occasionally falling back to conda or conda-forge.
This has been the most reliable for me. I often have to trash my whole environment and start over after installing a few binary packages like tensorflow.
> for quick light development using pure python I pip install directly into my user dir of my system python.
I'd advise against this habit, even for "light" development work. I've been bitten too many times by some library installing into the wrong env or picking up the wrong path of a library.
Conda's generally a good move if you don't want to manually manage virtualenvs.
In one project I have with a few hundred tests, after updating to Python 3.11 I noticed my test suite got about 8-10% faster. That's using pytest in a Flask app where most tests are doing "real'ish" things like visiting one of the site's endpoints with the Flask test client and accessing Postgres. Not bad at all for a one-line change.
This workload seems I/O-bound, so Python 3.11 shouldn't add too many speedups here; it mostly brings CPU-bound improvements. Surprised you are getting speed-ups - any idea which parts?
I haven't done any low-level profiling. I ran my test suite 5 times with 3.10 and then did the same after updating to 3.11. It was an extremely informal test, but 3.10 consistently took about 25 seconds whereas 3.11 was finishing in 23.
It is I/O bound but it's all localized. Those URL endpoints aren't making external API calls. For tests, Postgres is set to use Session.begin_nested with SQLAlchemy which takes advantage of Postgres' ability to use SQL SAVEPOINTs[0]. The gory details are above my paygrade (I found it while Googling), but the end result is it makes Postgres able to be really really fast when accessing your DB in tests. I don't think it really writes to disk but it makes your app think it did and you get the "true" outcome of running the SQL (ids get created, all of your SQL works as expected, etc.). It's not a mock.
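For anyone curious, the pattern usually looks something like this pytest fixture (a rough sketch, not our actual code; the engine fixture and names here are assumptions):

import pytest
from sqlalchemy.orm import sessionmaker

@pytest.fixture
def db_session(engine):
    connection = engine.connect()
    outer = connection.begin()        # outer transaction, rolled back after the test
    session = sessionmaker(bind=connection)()
    session.begin_nested()            # SAVEPOINT: the app's "commits" land here

    yield session

    session.close()
    outer.rollback()                  # throws away everything the test "committed"
    connection.close()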
I have done some benchmarking of Python 3.11 recently, and pure Python code was even beating a Python-wrapped C++ implementation of the same data structure. While it's not an entirely apples-to-apples comparison (the reason to use C++ was to enable running without the global interpreter lock), it is still very impressive.
In the early Android days, Dalvik did quite a poor job generating native code, so many math functions went through JNI into native code.
Eventually Dalvik got replaced with ART, the Android team started to care about JIT and AOT compilation, and all those native methods got deprecated as JIT/AOT-compiled pure Java implementations became faster than the cost of jumping through the FFI infrastructure.
Python wrappers built with binding tools like SWIG, Cython, or Boost.Python are bound to be slower for things like data structures. They bring with them a lot of overhead between the Python interpreter and the call into the data structure.
That can only accelerate a very limited subset of Python code. For numerical calculations it looks promising, but for a codebase like PyPI's it's probably no use at all.
Right - I'd say Taichi was not designed to accelerate an entire Python codebase. In fact, since Taichi needs to run on GPUs, supporting a very wide range of Python's dynamic features would go in the opposite direction of its original goal (faster compute-intensive code such as numerical simulations).
Back to the main thread, I'd say the acceleration in Python 3.11 will benefit Taichi users too - there are still parts of Taichi that run in Python (such as constructing the AST of computation kernels), and those will run faster with 3.11.
It introduces a completely new value stack which is only accessible from Taichi, and it takes a lot of hacking to pass Python objects back and forth between Taichi and Python, so for anything you write you will end up rewriting pretty much everything. It is not just adding a simple decorator to speed things up if your program is any more complex than summing numbers.
Wouldn't it also break most Python code that uses threading? Currently, everything is implicitly synchronized thanks to the GIL, so in Java terms it's as if every variable is declared "volatile".
If the GIL is removed and variables are no longer volatile (i.e., changes are no longer made visible immediately to other threads), that seems like it would break a lot of code. On the other hand, keeping every variable volatile seems like it would be terrible for performance.
Maybe I'm missing some critical difference between Python and the JVM here?
The GILectomy project was able to maintain thread safety on refcounting while (mostly) preserving performance by basically using some clever flags in the refcount to signal specific lifetimes/ownership (simplification for brevity). So you have good performance when the thread owns a reference while preserving safety for shared objects.
The smaller problem is that this adds a small amount of overhead even to single-threaded performance. The Gilectomy project improved performance elsewhere, so net performance is close to the same.
The bigger problem would be integrating this strategy with all the libraries that rely on existing GIL behavior.
This comment demonstrates a tremendous amount of naivete about the CPython runtime. After PyObject itself, the GIL mutex is probably the next most important data structure in the entire codebase. It's not "someone not being bothered to write thread-safe software." It's not something you can hide behind a flag. It's central to the entire CPython data model and to any library that relies on releasing the GIL.
The closest anyone has come to removing the GIL is the Gilectomy project by Larry Hastings, and it's unlikely to ever be upstreamed unless it could be somehow made to work with libraries that rely on assumptions about GIL mechanics (eg numpy).
I guess it would have made more sense for me to say "Python C extension developers".
I don't find multithreading in languages without particular support easy at all, but I have become better at it. It is possible and sometimes necessary. It seems like the prevailing attitude in the Python ecosystem is weird, a kind of sour grapes thing, i.e. "Python doesn't have good multithreading support, but multithreading is ugly and error-prone anyway and the alternatives are almost as good or better".
Usually when I've needed more parallelization I've used more processes, and for slow methods there is threading available (this doesn't overcome the GIL, but it lets those methods operate independently). It seems like the biggest reasons to focus on removing the GIL are single-process applications or memory-constrained machines (where you don't want tons of processes consuming it all). Are you in one of those situations, or is there another scenario that is impacted by the GIL?
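Rough illustration of those two escape hatches, for anyone following along (toy functions, obviously not real workloads):

from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import urllib.request

def cpu_heavy(n):
    # CPU-bound: separate processes sidestep the GIL entirely
    return sum(i * i for i in range(n))

def io_heavy(url):
    # I/O-bound: the GIL is released while waiting, so threads work fine
    return len(urllib.request.urlopen(url).read())

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        print(list(pool.map(cpu_heavy, [10**6] * 4)))
    with ThreadPoolExecutor() as pool:
        print(list(pool.map(io_heavy, ["https://example.com"] * 2)))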
Guido discussed this recently on Lex Fridman's podcast. tl;dw: he is open to the idea, he thinks it will be painful, and he isn't convinced the demand is high enough yet.
I believe the plan is still that the improvements in 3.11 are to be traded off against the overhead of eventual GIL removal. So, don't enjoy them too much.
Well, it might be accurate for the industry in general (e.g. going to higher level languages), but I don't think it's particularly the case with Python.
It's not like Python devs build abstraction towers beyond what the language already has and encourages (unlike, say, Java, which has a well-deserved reputation for this kind of towering abstraction, or, say, web development).
If anything, a lot of Python effort is on data science and scientific Python, where wrapped C/C++/Fortran libs for maximum speed and minimum memory are the order of the day.
This is great news - we're deploying some new AI models, and with this coupled with Ruby 3.2 (YJIT) I believe we can, for the first time in a few years, deliver a new feature supported by new servers without increasing costs (we'll see, but I have hope).
Initial tests on a medium-size codebase show about a 15% improvement, both in raw speed and in some platform benchmarks, which is great. Kudos to the team working on this.
I’m still unsure I understand the source of excitement. As far as I know Python even in its latest version is still at least an order of magnitude slower than native compiled code. It would seem to me that a native Python compiler would benefit server side Python applications much more than these incremental speed improvements of the interpreter. Is there any work being done by the Python community in that direction? For CPU bound workloads Python needs 10-100x improvement in order to be competitive.
Note that I'm not dumping on Python. I use it daily, but I think it's likely a poor choice for apps that primarily tax the CPU.
There have been a bunch of attempts to produce compiled code for Python, but each has fallen short in different ways. However, there is a serious bit of effort going into speeding up CPython now. I don't think it'll result in an optimising compiler that produces a binary executable as output, or even an interpreter with a JIT compiler, but I think there are still gains to be made without committing to either of those routes.
Lots of people have already written programs in Python. Now their programs will run a little faster without any work. It’s like the barista saying, “coffee’s on me today.”
I don't think that's ever been a goal of Python, and it's not what people use (or should use) Python for.
Making Python faster is great because it improves every existing use case for Python, not because it now makes Python a good choice for situations where it wasn't previously a good choice.
I develop scripts, and a few minor but important-to-my-business scientific apps in Python due to library support. I'm quite excited. We already wrangle as much performance as we can out of our approaches, so anything that speeds up actual execution is awesome.
If someone is using a component that important that far outside its EOL it isn't a performance issue. It's either something extremely legacy, massively underbudgeted, or maybe non-internet-facing and non-critical. It isn't the kind of project someone is fine tuning for high performance.
Guido stepping down from BDFL may indeed be what allowed the recent speed improvements to progress; but certainly not in the way you're implying. If there is a causal effect here it seems more likely that it's because it has allowed him to concentrate more on the speed improvement work than he otherwise would have with BDFL issues taking up his time.
From what I've seen Guido has been quite actively involved in the speed up work, somewhat in the actual development but even more so in getting it merged (a problem many previous speed up attempts have failed at).
This is baseless conjecture and completely without merit. Guido was intimately involved with this speed-up work. I've disagreed with Guido over things but the things you're saying are just weird. I recommend this interview to get a slightly better glimpse of the guy: https://www.youtube.com/watch?v=-DVyjdw4t9I
He is working for Microsoft, which is funding this with the idea "make this better". He is bouncing around the idea of going GIL-free, but how do you do that and not make another 2->3 mess? There is a set of interviews on YouTube right now with him talking about exactly what they are doing.
And I can't find the sources. But I recall at least a few articles about Python performance improvements where the conclusion was along the lines of: "Python core devs prefer to keep the Python runtime implementation simple instead of focusing on performance". It's stuff I've read in the past 15 years or so...
And it wouldn't be unheard of; compare Tanenbaum/Minix vs. Linus/Linux.
What's disturbing is the numbing effect it seems to have. Bad behavior is increasingly on display for all to see -> We're all slowly realizing it's always been this bad, it's just more visible now -> We're kind of just becoming OK with all the noise.
Cf. blatantly wrong statements divorced from reality from major figures on social media, etc.
That said, I'm not sure whether OP is really in that weight class. It's just a "did direction change with a leadership change?" question that can be answered in a straight-forward manner by closer observers or those with first-hand experience. The danger is that others take a possibly wrong narrative forward by taking the question and ignoring the answer, in a "Have you stopped beating your wife?" sort of way.
Yes, I clearly remember him working with Mark on this one, because he liked the idea of generating more efficient bytecode rather than more complex bytecode. He was certainly not a hindrance in this case.