For decades, a lot of C library code carried man-page warnings that it was unsafe to use in async, re-entrant, or recursive contexts. We learned to cope, and re-entrant-safe versions were deployed incrementally without too much API instability. Maybe time has healed the wounds and erased my memory of the pain of discovering you'd tripped over one of them.
String parsing which tokenised in place (strtok). DNS calls which used static buffers (gethostbyname). Things which exploited VAX-specific stack behaviour.
I remember scouring those C runtime docs for every non-reentrant function. It might be what got me in the habit of checking the docs when using an API I know moderately well, just in case there's some important detail I missed before, or something has changed.
Around that time, doing cross-platform C++, I got an early look at Java, with concurrency built in from the start, along with GC and various other nice features that were easier to use than C++, and I "knew" it was going to be huge. (But who knew that the MIS people would take over Java, when it seemed clearly targeted at non-MIS programmers, and now MIS people are stuck with the C++ syntax and verbosity, after coming from 4GLs, etc.)
Then mainstream programmers picked up Python, which, IIRC, originally was an embeddable extension language, which was why it was simple. And for which the GIL made more sense.
Python has many similar restrictions even in the language itself; they just tend to be exceptionally poorly documented.
For example, you can't safely use print in a Python signal handler, because the entire I/O system is not reentrant [1]. Yet the documentation shows doing exactly this in a "how to do it properly" example [2], while explaining that it's horribly unsafe to throw from a signal handler, which is the default for SIGINT. (That's also the reason the threading module doesn't expose PyThreadState_SetAsyncExc.) Despite appearances to the contrary, a Python signal handler doesn't actually run in signal-handler context: the signal module registers a C handler which simply sets a flag, and the VM checks that flag between instructions. Even so, you should probably only do in a Python signal handler what you would do in C, which is very little: set a flag or write to a pipe. Don't go around calling library functions.
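To make that concrete, here's a minimal POSIX-only sketch of the "do very little" approach: the handler only sets a flag and writes one byte to a pipe (the classic C self-pipe trick), and all real work happens in the main flow of the program, outside signal context.

```python
import os
import signal

# Self-pipe: the handler's only side effects are a flag set and one os.write,
# both of which are safe; no print, no logging, no allocation-heavy calls.
read_fd, write_fd = os.pipe()
stop_requested = False

def handler(signum, frame):
    global stop_requested
    stop_requested = True       # set a flag...
    os.write(write_fd, b"x")    # ...and poke the pipe so a select()-style loop wakes up

signal.signal(signal.SIGUSR1, handler)

# Simulate an incoming signal; a real program would receive it externally.
os.kill(os.getpid(), signal.SIGUSR1)

# The main program (not the handler) reacts when it sees the byte.
data = os.read(read_fd, 1)
print(stop_requested, data)
```

Everything beyond the flag and the write, including the print at the end, runs in ordinary interpreter context where reentrancy is not a concern.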
That's fine. But, communication is a skill and like all skills, can be improved. For my part, this is the first time I can remember seeing MIS used in the wild (i.e., not in university). Given the tech industry's propensity for overloading initialisms and redefining terms, I have to go by context to work out the meaning. In this case, the context was insufficient.
I suppose I could just dismiss the comment, but curiosity got the better of me. I'm glad someone asked. I also had to search for what "4GL" means. I'm assuming this top Wikipedia hit [1] is what the OP meant.
I don't know if it's reasonable to expect people to know these terms. But, the concise text is not saving anyone beyond the OP any appreciable amount of time, so what's its value?
Speaking of which, I've never fully understood the meaning of OP. It seems there are a few semantic ranges there, and people sometimes use OP to stand for slightly different things.
I agree that Rust handles concurrency about as well as Java (in addition to its memory safety). But could you provide docs on Erlang or Haskell concurrent data structures that were better than Java’s, say, 10-20 years ago? Java’s support for concurrent data structures was pretty unprecedented in its heyday.
Java still doesn't provide persistent data structures out-of-the-box. IMHO these are more important than lock-free mutable data structures.
When one really does need shared mutable state, Haskell supports transactional updates to mutable state (STM), with optimistic locking and rollbacks. It's awesome, but unfortunately just not practically possible in any language with rampant untracked side-effects.
Compared to those other languages, Java has no mechanism to enforce the use of concurrent data structures; it's entirely up to the programmer to use them.
Erlang processes are first-class objects. Data is immutable. There's special syntax for sending and receiving messages between processes.
I would say that in some ways this is simply incomparable to Java, but I think this concurrency model is probably the easiest and simplest to work with of any.
There are obviously drawbacks like with anything but for a lot of situations it's an excellent choice.
> But Java was still better at it than C or C++ at the time.
That's not even a contest. Java was released in 1995. At that time C did not have any kind of language-level support for concurrency; all solutions were platform-dependent and not part of the language.
This is not true. JSR-133 was one of the first and most prominent implementations of a memory model among general-purpose languages. You don't seem to know enough about the platform and its history to make definite statements like that, imo.
JSR-133 came quite a few years after Java's release. But even after these changes, many problems remained. For example, standard library classes were often not thread-safe out of the box; even something seemingly innocuous like a date formatter class would malfunction in a threaded context. The rules around thread safety were also very complex, especially with regard to constructing objects. IMHO, it did not compare well with other languages of the same era built with concurrency in mind, for example Erlang. I was once fortunate enough to meet Joe Armstrong and we briefly discussed Java; suffice it to say he was not impressed by it.
Yes, van Rossum was an implementer of ABC, which was a teaching language. Python drew on that experience, but also from Modula-2 and Modula-3, and the needs for writing an extension language for the Amoeba operating system.
[Python is] an extensible interpreted programming language that
combines remarkable power with very clear syntax.
This is version 0.9 (the first beta release), patchlevel 1.
Python can be used instead of shell, Awk or Perl scripts, to write
prototypes of real applications, or as an extension language of large
systems, you name it.
> The letter “B” was chosen because it is the first letter of the word “beginner” and because the project was meant to become a language for teaching programming to absolute beginners.
That Java had concurrency built in from the start is a blessing mostly, but also a bit of a curse. Most of the Java ecosystem is still in the mindset that threads are cheap and firing up a couple more cannot hurt. So we end up with apps that run thousands of threads and this disease is hard to contain.
Threads are cheap compared to the process-per-request model that came before. And they're easy to code for: you don't have to worry about blocking code or awaiting your futures in the correct way or things like that.
They're not optimal if you want to squeeze lots of concurrent requests into some memory-limited box. But on the other hand even dumb threading offers more throughput than python or node.
Like, not every toy app is going to be the next FAANG; there are plenty of workloads where it hardly matters, while 30 years later it is still a mess in C and C++.
And between C++ and Rust coroutines, still not sure which one I like less.
"And between C++ and Rust coroutines, still not sure which one I like less."
I, myself, am not a Go guy, but I feel it has to be mentioned here. Go's approach might not be as universal as C++'s or Rust's but I think for a large number of use cases it makes sense.
Many who praise async/await in C# kind of forget that it took about 10 years to spread across all the layers of the language and runtime. Since it was done via IL rewriting, it caused several issues with F# async tasks, and due to the age of the ecosystem plenty of code isn't async/await friendly and needs to be wrapped in Task.Run() or similar.
Aside from being too late (already having two models and not wanting to add a third) they also mention that the Go/Java approach adds a performance penalty when calling native APIs.
I thought so too until I got to interact with databases and Big Data tools written in Java. God, what a mess: so much upkeep, more dependency problems than I remember from C++, and probably some orders of magnitude more resources consumed than they should.
Doing code review for C++ code delivered by the most well-known offshoring companies, versus what they deliver in Java, will help you get another point of view.
Thankfully LLMs are becoming a viable, cost-effective, option for stuffing your favorite offshored codebase with even more unmaintainable spaghetti. What a time to be alive.
I wish, but I'm skeptical. It's not that we don't already have the means to be reasonable with threads. The issue in my opinion is more a mixture of path dependency and a mindset that changes too slowly.
That's the exact attitude that led to decades of pain with the 2-to-3 transition and libraries, though. There was zero plan for how to help libraries migrate from Python 2 to 3, or more importantly for how one library could support _both_ Python 2 and 3 from one codebase while its users took their time switching interpreters. The attitude of "just turn off the GIL if you don't need it, just use libraries that support turning it off" means library authors will be asked to provide versions of their library that do and don't support non-GIL mode. That's a big burden to dump on library maintainers.
I’m no fan of Java, but in comparison with Python, Java’s focus on extreme backwards compatibility and its ability to actually execute on this promise year after year stand in stark contrast to how Python has handled the same challenges. I have low confidence, despite their claims this won’t be Python 4, that this will actually be executed well. Looking forward to having Homebrew deliver python@3.25_GIL and python@3.25_no_GIL with each flipping package I install.
Yep, and you'll have pip3-gil and pip3-nogil binaries because each permutation of python has separate and incompatible site-packages folders and libraries. It could get really ugly.
Or it could motivate total abandonment of system-level Python installations in favor of per-app virtualenvs or whatever the new hotness is, and we'll finally achieve world peace.
I can see system-level Python installs being abandoned. They seem to be getting progressively harder to use over time. I don't see the replacement being virtualenvs, though; it'll be a different language ecosystem, whichever one looks like it has remembered that "easy to do simple things" is a feature.
While Python has never been my primary coding language, I've used it extensively for building scripts and tools, but I've pretty much given up on it. The language is so elegant, but the installation story (with 2-to-3 compatibility issues being just a small part of it) just became such a turnoff. It's been super frustrating having to search the Internet every time I need to install something Python-related, only to find all sorts of conflicting instructions, bleh. Virtualenv, venv, a different pip in Ubuntu, and so on. Since there are a lot of references to Java in these discussions: there, "sadly", there are two ways to do it, Maven and Gradle. That's still one too many, but at least not a new flavour du jour every year, as with Python.
And I'm sure someone will drive by, now, and tell us we have to use xyz, "obviously". But if that's not "obviously" what I find in an Internet search, then the community has apparently not agreed on it consistently in a sustained way.
Having different build tools, that’s one thing. I can live with a somewhat wonky development environment to work on my code, as long as I can set it up once and it works after that. I would prefer elegance, where just one tool is used and that’s it, but I don’t know of a language with a large user base that has that.
My point lies more in the install experience for, say, a command-line tool written in said language. For Java I need one Java install on my system, the most recent one. I install it and then all the old code just works; all my Java command-line tools just work. I never have to watch Homebrew install some stupid Java@11 when I have Java@13 installed, the way it installs python@3.9 when I know I already have python@3.10. All of this is BECAUSE people basically threw up their hands and distribute the core language, at their version, with their code and their chosen dependencies “installed” into it… so we’ve all got 15 bespoke versions of the same language kicking around on our hard drives. That is really stupid.
The thing that really struck me after years of Python is how it lets you put dependencies directly in a comment on top of a script and it will download and run with them automatically, without poisoning any system settings. It's so simple!
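If this refers to Python's inline script metadata (PEP 723), the header is just a specially formatted comment; a PEP 723-aware runner such as uv or pipx fetches the listed dependencies into an isolated environment before executing the script. A minimal fragment (the `requests` dependency and the URL are just illustrative):

```python
# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "requests",
# ]
# ///
import requests  # resolved automatically by a PEP 723-aware tool, not from the system site-packages

print(requests.get("https://example.com").status_code)
```

Run with, e.g., `uv run script.py`; nothing is installed into the system interpreter. (This is a config fragment: executing it requires an external runner and network access.)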
> it lets you put dependencies directly in a comment on top of a script and it will download and run with them automatically, without poisoning any system settings
The old-style nix-shell command in Nix can do this[1] for every language and package Nixpkgs supports, although it’s not that often used because it ties your shebangs to Nix. (An equivalent feature for the new CLI is a work in progress[2].)
I use Debian's system Python 3.10 install for most of my stuff and it works really well for me. Some things I install via pip but the key libraries (e.g. PyTorch) from source.
Is the situation in rust, where the answer is apparently to vendor the world, much better? Don't many of the big rust libraries still depend on nightly, too?
The major difference with rust is that the core language has strong backwards compatibility guarantees, and the package manager supports installing multiple versions of transitive dependencies so packages can be updated incrementally.
Lol the correct statement is it will get really ugly. Pretty much no question it will be a bigger mess than 2 to 3 transition. Here's hoping my subfield will move to a different language in the meanwhile because I don't want to deal with this shit again.
As I understand it, as a library author you either do absolutely nothing and your library will be marked as requiring GIL by default. Nothing to do, you keep on working with that good old GIL and nothing changes.
Or you make the extra effort of being thread safe and you can declare your library as not requiring the GIL.
Now if a user script mixes your GIL free lib with an older lib that has not been updated, well, too bad for them. Even with your hard work, the code will still operate like before, everything gets the GIL treatment.
Normal python devs will need to track down which pesky dependency of their script is causing the GIL slowdown. Kinda sucks but at least nothing breaks.
It's a sensible, opt-in, and safe way forward. Hard to argue against it, really..
It's in the "Py_mod_gil Slot" section [0] in the PEP:
> In --disable-gil builds, when loading an extension, CPython will check for a new PEP 489-style Py_mod_gil slot. If the slot is set to Py_mod_gil_not_used, then extension loading proceeds as normal. If the slot is not set, the interpreter pauses all threads and enables the GIL before continuing. Additionally, the interpreter will issue a visible warning naming the extension, that the GIL was enabled (and why) and the steps the user can take to override it.
Why then is there even a need for a separate nogil build? If it is like this says, wouldn't it be easier to just make the standard build switch between gil and nogil automatically (or honor the user's choice)?
The fact that the SC thinks two builds are needed suggests there is more complexity involved than this section of the PEP leads readers to believe.
You have to recompile your C extensions for the nogil build. Consider an application where one thread is calling xs.append from a C extension and another is calling xs.pop and xs[-1] on the same list object xs. In the nogil build these operations need to use a fine-grained lock on xs, and in the gil build these are thread-safe due to each thread holding the GIL when it does these operations.
On top of that, some of the list and dictionary operations are available to extensions as C macros to avoid the overhead of a C function call.
However, it looks like the nogil build will be able to run in "GIL mode" for maximum compatibility, including switching to GIL mode partway through execution, but I'm expecting this to be slower than running the gil build in GIL mode.
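The interpreter-level guarantee being described is visible from pure Python: concurrent `list.append` calls never lose elements, whether the safety comes from the GIL (append is a single atomic operation under it) or, in the free-threaded build, from the per-object lock. A small sketch:

```python
import threading

xs = []
N, THREADS = 10_000, 4

def producer():
    for i in range(N):
        # A single C-level operation: atomic under the GIL,
        # guarded by a fine-grained per-list lock in the nogil build.
        xs.append(i)

threads = [threading.Thread(target=producer) for _ in range(THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(xs))  # 40000: no appends are lost in either build
```

The nogil build has to achieve this same guarantee with per-object locking, which is exactly why C extensions poking at lists directly (or via the macro forms) need recompilation and new code paths.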
> However, it looks like the nogil build will be able to run in "GIL mode" for maximum compatibility, including switching to GIL mode partway through execution
Yes, that's what they were asking about. Why have two versions for a mode switch. Everything you explained before that is irrelevant, I'm afraid.
> but I'm expecting this to be slower than running the gil build in GIL mode.
That could be the answer to their question, but that's not definitive enough.
1. The "stable ABI" is broken on nogil. The selling point of the stable ABI was "add this C preprocessor flag and your extension will work on all future CPython versions, after paying a speed penalty". This is useful for, e.g., closed-source extensions. If the nogil build were the only build available, these extensions would require recompilation.
2. There are a lot of users that want fast Python and a lot of effort was put in to optimise it. I think the core team wouldn't want to release and expect everybody to use a version which regresses in performance, for a feature (nogil) which most won't be able to use due to using C extensions which haven't had nogil-supporting code changes.
I'm not sure the stable ABI is really going to stick around. Python 3.11 deprecated numerous functions in it, mostly around interpreter configuration (the new PyConfig and PyPreConfig APIs are by definition not ABI stable), and the last couple of releases have shown that deprecations in Python do mean removal later. The question then is whether they'll drop these APIs in a 3.x release (which would break the entire premise of the stable ABI) or during a bump to 4.x. It really ought to be the latter, and I suspect Python 4.x is going to remove the entire notion of a stable ABI.
In either case it seems obvious now that there is likely no point bothering with the stable ABI any more.
> The "stable ABI" is broken on nogil, the selling point of the stable ABI was "add this C preprocessor flag and your extension will work on all future CPython versions, after paying a speed penalty". This is useful for eg closed source extensions. If the nogil build was the only build available, these extensions would require recompilation.
It looks for a flag, if it doesn't see it then it turns the GIL on.
Where is the need for recompilation?
> I think the core team wouldn't want to release and expect everybody to use a version which regresses in performance
But you're just guessing that it's slower in GIL mode, aren't you?
The stable ABI lets extensions access the reference count of any object directly. I don't know why. Normally the functions Py_IncRef and Py_DecRef should be sufficient. Objects no longer have a single number as their reference count.
Edit: In Python 3.2-3.9, the stable ABI included Py_INCREF, the C macro.
The stable ABI lets extensions create new Python types. These can override tp_alloc field to use custom memory allocators when instances of the type are instantiated. The custom memory allocator needs to initialise the reference count to 1.
> But you're just guessing that it's slower in GIL mode, aren't you?
There are new implementations with per-object locks of list.append, dict.__setitem__ etc. These are incredibly common operations in Python code. These inherently will be more complicated than the previous implementation, meaning more instructions and slower. So there needs to be a branch at the start of list.append of whether to go to the old gil implementation or the new nogil one. Adding a branch so frequently will inherently make the runtime slower.
Now with a lot of work these can be optimised and the speed penalty reduced, but CPython goes on an annual release cycle. If the nogil code is held as a fork of the CPython repository without being merged in, until it reaches a performance goal, that brings other issues.
> Objects no longer have a single number as their reference count.
Does that stay true once the GIL turns back on?
> These can override tp_alloc field to use custom memory allocators when instances of the type are instantiated. The custom memory allocator needs to initialise the reference count to 1.
I'm not following why this affects ABI compatibility, sorry.
> So there needs to be a branch at the start of list.append of whether to go to the old gil implementation or the new nogil one. Adding a branch so frequently will inherently make the runtime slower.
I don't know about that. CPUs handle always-taken and never-taken branches very well.
In the current nogil implementation, AFAICS, it seems the GIL can't be turned back on so there is no answer yet.
Theoretically, you could have a one-off operation which fixes all objects when the GIL is turned on. However, there's no way to get all objects in Python. gc.get_objects() only returns tracked objects, and there is no way to list untracked objects.
No, I'm saying that the problem you worry about doesn't exist.
The problem -- as you point out -- with 2 -> 3 was that supporting both versions was very difficult. Because Python 2 couldn't run Python 3 code (and vice versa).
And thus libraries existed in awkward states for years.
But GIL can run no-GIL code. Supporting both is no harder than supporting one of those options (the no-GIL one).
Except they're not at all analogous situations.
In the Python 2 -> 3 transition, if a new version of a library came out Python 3 only, you couldn't use both it and your old Python 2 code at the same time.
Here, you can keep using the GIL until every one of your dependencies has migrated, even as new versions of those dependencies come out with nogil support, until one magical day they all have it and you can switch. And since you've been able to keep up with each library instead of being arbitrarily cut off at its last Python 2/GIL version, it's not a huge breaking change!
> If it was like you say there would be no need for the --nogil cli option.
Testing the difference between running it with and without `nogil` without having to install 2 different interpreters.
Testing libraries during transitions.
Simply giving users a choice.
Convenience.
Almost no one uses pre-1.11 Go (GOPATH instead of modules) any more for project organisation, and the transition is trivially easy. And yet, the toolchain still reacts to `GO111MODULE=off`.
It's good if there's a benefit for everyone, like Python 3 fixing the terrible Unicode story. It's not clear no-GIL will even be a net performance improvement for most people: you are effectively moving synchronization from the core runtime to each and every library and program at the edge. Writing safe code at that level doesn't come for free; your basic program will be slower (and likely buggier) if every call into a library is now doing its own little bespoke GIL instead of relying on Python's global one like now.
I’m not gonna argue that point; but it seems massively disingenuous to downvote someone who complains “but now I have to rewrite my library because some people might use it in non-GIL mode”.
That’s not whining; it’s just an observation that the committee making these decisions gives zero ducks about the impact this will have for anyone other than the handful of vested parties involved in making the decisions.
Pypi has what, 500k projects on it? Many abandoned.
Who exactly is going to update those?
Or do packages get an automatic “doesn’t work with no-GIL” unless the author explicitly opts to enable it?
Or do we live in a future where any package, with any dependency may or may not have undefined behaviour in no-GIL mode?
Like, sure… it’s a good change for many people… once all the hard work is done by the community.
> Pypi has what, 500k projects on it? Many abandoned.
Most of them don't contain C extensions.
> Who exactly is going to update those?
Its developers, of course. Many are presumably watching PEP 703; others will require lobbying by their users. But there is no rush to do so because...
> Or do packages get an automatic “doesn’t work with no-GIL” unless the author explicitly opts to enable it?
> Or do we live in a future where any package, with any dependency may or may not have undefined behaviour in no-GIL mode?
... the interpreter will indeed fall back to enable the GIL if an extension is loaded that doesn't declare support of the no-GIL mode. Additionally, some extensions are actually safe to use without GIL as long as their Python APIs prevent concurrent access to them and thus act like the GIL themselves. In these cases, the interpreter can be forced to run without the GIL.
Isolating the C extension to another process is another possibility to reduce the impact on applications where all others can already run without the GIL. Actually, multiprocessing is already a common approach in the Python ecosystem for concurrent and parallel processing.
First of all, these changes are not being introduced because of a committee. They are being introduced because a way to get true thread-based parallelism in Python has been one of THE top priority demands of a huge part of the Python developer community for ages.
> “but now I have to rewrite my library because some people might use it in non-GIL mode”.
Yes, if library maintainers want their library to remain relevant, they will need to accommodate what the language's developer community uses. This is true for all languages. If they don't want to, that's okay; the community will come up with new libraries.
> the committee making these decisions gives zero ducks about the impact this will have
If they were giving zero ducks, they wouldn't make it backwards compatible, nor would there be a command line option to control the behavior.
>Pypi has what, 500k projects on it? Many abandoned.
>
>Who exactly is going to update those?
Languages that base decisions on the update behavior, or lack thereof, of library maintainers, effectively freeze themselves.
And why exactly is the update behavior of abandoned packages a problem? They are abandoned anyway.
> Like, sure… it’s a good change for many people… once all the hard work is done by the community.
The people who want to get rid of the GIL are part of the Python development community. Many of them are library developers themselves.
> They are being introduced because a way to get true thread-based parallelism in Python has been one of THE top priority demands of a huge part of the Python
Where is this demand exactly? We hear a lot of complaining but very often this is due to a lack of awareness of available (& often better) alternatives to threading.
There is a very small number of use cases that will benefit from free threading.
Further discussions going back years can be found with a brief search. This discussion is almost as old as Python3.
> due to a lack of awareness of available (& often better) alternatives to threading.
Such as?
There are exactly 2: asyncio, which is useless for CPU/GPU bound workloads, and multiprocessing with all the pain of relying on expensive spawns, expensive and limited IPC and the joy of having to orchestrate across process boundaries.
Guess what the most common advice is for dealing with CPU bound parallelisation problems in Python? "Use another language". Guess what all the languages recommended (C, C++, Rust, Go, Java) have in common? They have thread-based parallelism.
> There is a very small number of use cases that will benefit from free threading.
Basically any workload that is CPU bound, which in this day and age of giant data aggregation and running huge ML models at scale is more important than ever before, is a use case for this.
> Right, because the whole point of Python is as a glue language for native libraries.
I have had this precise need before. I have a multi-threaded native library. The multi-threading is essential for reasonable performance in the intended use-case for the library. But my users can't write C++ to call it; they'd like Python. They'd like to extend certain specific operations that my native library does during processing. I give them Python bindings. The perf gained by running the library on multiple cores is completely negated whenever multiple C++ threads of my native library need to run Python code.
> Pypi has what, 500k projects on it? Many abandoned.
> Whom exactly is going to update those?
> once all the hard work is done by the community.
If I want to use Python with parallel threads in the app I'm building, I don't need to wait for every last one of those 500k packages. I can wait for only the packages I'm using, or risk it and force Python to run in nogil mode anyway. It's my choice.
> Or do packages get an automatic “doesn’t work with no-GIL” unless the author explicitly opts to enable it?
As other comments say in this thread: yes.
> Does that remind you of anything?
> Mmm.
2-to-3 burned you. We get it. You might be getting flashbacks to that nightmare. That is understandable. But you need to look beyond a surface level similarity — "That was a transition. This is a transition. They're identical!" — and at the actual transition itself. The problems of 2-to-3 aren't present here. The user is not forced to choose between two incompatible options. Library authors aren't forced to migrate their code forward. Users and library authors do not need to collectively choose one version over another. The newer version remains compatible with older code. The sky is not falling.
At the end of the day, this is the very point the SC and core devs are ignoring in their decision.
They do recognize the impact on the ecosystem will be huge, they recognize there will be at least 5 years of parallel gil/nogil versions (*), and they say they don't want a 2-to-3 story all over again.
Yet they have decided against their own best advice (to avoid such a situation).
I find it utterly confusing.
(*) Not just one parallel version; there will be 2 for every release following the introduction. In reality organisations will have to maintain 4-6 different baselines of Python releases along with a matching (and likely differing) set of libraries. I don't mean to fearmonger, I just happen to maintain such environments and I know the effort that goes into this first hand.
They were put in an untenable situation, IMO. If they said no, there would be howls of protest, and a possible schism in the community, and the next SC elections could be an ugly competition between pro- and anti-GIL advocates. A "yes, but..." approach was about the only option.
> I’m not gonna argue that point; but it seems massively disingenuous to down vote someone who complains “but now I have to rewrite my library because some people might use it in non-GIL mode”.
1. But they don't "have to".
2. Even if they did, why would downvoting be "disingenuous"?
Yep I fully agree. It's going to ultimately mean 99% of people end up running in old GIL mode with deterministic behavior. Companies will get burned and have to have policies that absolutely under no circumstances will the GIL be disabled in their codebase. A very small handful of highly skilled and funded teams, probably at big companies only, will have the time and tenacity to make their code AND all their dependencies work in a multithreaded environment without the GIL.
I'm not convinced it was the right choice either, but I'm not keen on Python, and not sold on Rust. Maybe Zig? I don't know. I personally like TypeScript, but even I will admit I'm not sure it's the best choice for a large, server-centric company.
> every call into a library is now doing its own little bespoke GIL instead of relying on python's global one like now.
Is this some kind of joke? Do we live in clown world now? You do realize that a lock is a trivial primitive in multithreading?
The concept of a "little bespoke GIL" is ridiculous. A lock is a lock. Sure, an extension could just put a global lock on every extension call, and then you do end up with an extension-wide lock, but every other extension remains unaffected. There is no intelligence or genius behind putting a global lock in the interpreter that somehow gets ruined by putting the lock in the extension. In fact, the GIL is the dumbest decision you could make, and everything from that point on can only get better, not worse.
Where will you get no-GIL libraries, especially in the early days? Just yell at the maintainers of core libs like flask and requests until they use their volunteer and spare time to implement incredibly complex and tricky locking semantics all over their codebases, AND test it all with both GIL and non-GIL interpreters at scale to suss out race conditions? That just happens for free and overnight because a lot of people are plus-one mashing on GitHub issues, I guess?
If I understand correctly how the GIL works, the change will only affect libraries with native components. Everything that's pure python will continue to work unchanged.
Even pure python code could have race conditions with the GIL disabled. Stuff like accessing and modifying a dictionary item in python code is assumed and currently guaranteed to be atomic because of the GIL. Remove the GIL and decades of assumptions break.
The GIL doesn't make compound Python operations atomic; it only ensures that one thread executes bytecode at a time. Therefore, the race condition issues you describe already exist even with the GIL. The GIL is only there to ensure that the interpreter doesn't corrupt Python objects and its internal data structures, corruption which in the best case leads to a crash.
I think that's a misconception? At least the way how I understood this issue.
Things like `map[k] = v` would be atomic both before and after nogil, and things like `map[k] += 1` are not atomic even with GIL, the read and store can be split from each other.
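To make the distinction concrete: `map[k] += 1` compiles to a separate load and store, so the GIL's per-bytecode atomicity doesn't cover the whole read-modify-write, and an explicit lock is needed. A minimal sketch (whether the unlocked version actually loses updates in a given run depends on interpreter version and switch interval, so the second check only asserts it never exceeds the target):

```python
import threading

def unlocked_worker(d, n):
    # d["k"] += 1 is a LOAD, an add, then a STORE; the GIL makes each
    # bytecode atomic, not the whole read-modify-write sequence.
    for _ in range(n):
        d["k"] += 1

def locked_worker(d, n, lock):
    for _ in range(n):
        with lock:  # the lock makes the read-modify-write atomic
            d["k"] += 1

def run(worker, *extra):
    d = {"k": 0}
    threads = [threading.Thread(target=worker, args=(d, 50_000, *extra))
               for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return d["k"]

lock = threading.Lock()
assert run(locked_worker, lock) == 200_000  # always correct
assert run(unlocked_worker) <= 200_000      # may lose updates on some builds
```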
Come on, think about it harder. Do you really think this would even have a chance of becoming a PEP if python code was affected? If you thought even a little bit about it, then you would have come to the very simple conclusion "no" but here we are.
Lexical scope, first class functions, native coroutines, compiler available at runtime. Trivially extensible. It's semantically really close to lisp. Yep, lua's one of my favourites. The front end syntax isn't what I'd like but whatever, I can still see the AST through it with a little effort.
I am not using luajit for various reasons but the problem with the regular Lua interpreter is that the lua ffi backport from luajit is not included by default and the third party ones are broken on ARM.
I didn't dive deep on this, but I assume that GIL mode can still run anything, including no-GIL code (it is one of their promises, at least). So, unlike 2->3, there is forwards-compatibility.
It also seems like the latter isn't meant as a replacement (for the moment), but rather as an option.
Presumably C extensions will have a different API name/ABI to prevent accidentally calling into GIL code when in non-GIL mode and vice versa, so that's going to complicate the compatibility story.
It seems to have clearly been a case of "I remember 2 to 3, that was really bad. I imagine everyone else is wondering this - is this going to be another 2 to 3?"
Long term the plan is to 100% remove the GIL from python.
From the article:
> Long-term, we want no-GIL to be the default, and to remove any vestiges of the GIL (without unnecessarily breaking backward compatibility). We don’t want to wait too long with this, because having two common build modes may be a heavy burden on the community (as, for example, it can double test resources and debugging scenarios), but we can’t rush it either. We think it may take as much as five years to get to this stage.
Under base assumptions it also says:
> We want to be very careful with backward compatibility. We do not want another Python 3 situation, so any changes in third-party code needed to accommodate no-GIL builds should just work in with-GIL builds (although backward compatibility with older Python versions will still need to be addressed).
Coming from a place of total ignorance, it would be nice if you could do this more incrementally, like have it be in no-gil mode by default, but then have a context manager you can use for gil sections, and have the interpreter bomb out if you try to enter gil-required code while still in no-gil mode.
The issue is when someone's unattended-upgrades bumps up the version and causes something to come crashing down.
The people who need to use nogil should know that they need to, and will now have the ability to enable it
We're now up to 128 core CPUs, and even cheap CPUs have 6 cores on them. Restricting things to a single core's performance gets more and more limiting as time passes.
I remember these warnings as well from when I started programming C seriously in the 90s. When I first encountered them I was convinced these issues would be resolved in a matter of weeks, maybe a couple of months at worst. Oh, so little did I know.
I have a hard time seeing the equivalence here. In C the interaction with non-thread-safe functions is much more direct. Most people are also more cautious when writing in C.
In Python you have whole C modules with global state. Load 10 of them, add the interpreter complexity and soon enough no one knows what is going on any more.
As it is, most developers (including core devs!) don't even bother to check for memory leaks. I don't think they'll run tsan, and if they do, it will be on a small test suite that only covers 10% of the code.
Given the software development practices in Python and especially in the AI space, I'm very pessimistic about this feature.
While I'm happy to see optional GIL approved and happening,
I also suspect that the GIL has saved us from debugging reentrant and/or dangerously concurrent code for years, and I salute the GIL for forcing us to build Arrow for IPC in Python, in particular.
Someday, URI attributes in CPython docstrings might specify which functions are constant-time or non-re-entrant.
I wonder why so many library developers even chose to build native libraries on the shaky and poorly-architected foundation that Python is.
Even writing a JVM-native JNI library would have allowed avoiding a lot of that pain (and the library would have been usable from Clojure, Kotlin, Scala, JRuby[1], Jython[2], Java, etc.) without any painful threading issues.
[1] which I’m aware of having been used in production by companies in the past
[2] which I’m aware has been quite a bit under maintained for the last several years
Because with the GIL it was dead simple to glue a C library into Python, and also was the canonical way to address hot inner loops in Python: rewrite into C. Nothing fancy but a little trial and error with module loading and you get 30% speed ups without being a great C programmer.
I don’t think it’s false to say that the ease of moving hot spots into C is part of the reason Python has been so successful for thirty years.
I hope this works, but I am very sceptical that code which worked because a global locking solution was handed to the enthusiast trying out C can be ported without running into concurrency bugs.
My understanding is that the thing about the binding system that drove adoption more than anything was how easy it was to generate bindings for an existing C library. This drove stdlib expansion and allowed rapid repurposing of huge amounts of long-standing and popular C libraries, and it also added instant credibility for those that already trusted those libraries. This helped avoid the chicken-and-egg problem of not enough devs for a serious stdlib, but not a serious enough stdlib to draw devs (which is where most languages, even good ones, die).
I distinctly remember that during the time period where Python grew from "minor" to "dominant" (roughly 1995-2005), doing Python dev was often a process of answering "are there bindings for that?" And usually _there were_, because of that tooling.
C extensions are a different use case than accessing a C library using `ctypes`. The former allows intimate interactions with Python objects and therefore requires the GIL. The latter should only cause trouble if the C library executes callbacks on another thread. In that case, a Python thread is created for every invocation, which could cause race conditions with Python code. However, this problem also exists in GIL mode. In practice, foreign libraries usually have Pythonic wrappers that should prevent some of these issues.
btw, many C extensions release the GIL already; e.g., one Python thread can do regex work, another some number crunching, yet another wait on IO, all in parallel.
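For instance, the stdlib's own `hashlib` documents that it releases the GIL when hashing buffers larger than 2047 bytes, so plain `threading` code gets real parallelism for the C-level work. A sketch (only correctness is checked; the actual speedup depends on core count):

```python
import hashlib
import threading

def digest(data, out, i):
    # For buffers larger than 2047 bytes, hashlib's C code releases
    # the GIL around the hash update, so these threads can overlap.
    out[i] = hashlib.sha256(data).hexdigest()

chunks = [bytes([i]) * 1_000_000 for i in range(4)]
results = [None] * 4
threads = [threading.Thread(target=digest, args=(c, results, i))
           for i, c in enumerate(chunks)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Same answers as hashing sequentially.
assert results == [hashlib.sha256(c).hexdigest() for c in chunks]
```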
- The history is backwards: it isn’t that devs wanted to make a native library and then chose Python, it’s that they chose Python and then they needed a native library.
- Python works very well for scripts and small programs, and decently well for medium-size programs. This makes libraries with concrete purposes very useful and productive. If your library, say, helps devs do some basic calculation with time series, the ability to be used in quick scripts is a big plus.
- The C API is fairly good, and libraries such as pybind11 make it even better. You don’t need a lot of code or boilerplate for an extension.
Actually, Java virtual threads have a similar problem with JNI. Your virtual thread gets pinned to its carrier thread if you call native code. This isn't as bad as the python GIL, but it still limits the usefulness of virtual threads if they can't migrate to other carrier threads.
Exciting. Much of Python is written as C shared libraries that knew they had a global lock to rely on.
Some of those do sufficiently simple things that they can run without any locking and all will be fine.
Others will still need locking, but are now under pressure to run without the gil. Some of those are going to do DIY locking within their own bounds.
Maybe what python has really been missing all these years is loads of ad hoc mutex calls scattered across the ecosystem. Data races and deadlocks introduced in the name of performance are not how I expected python to go out.
edit: expanding on this pessimism a bit.
Making C libraries written assuming a global lock thread safe is the sort of thing I'd expect concurrency experts to advise against and then make mistakes while implementing it. My working theory is that most people who wrote C extensions for python are not concurrency experts and are great programmers who won't back down from a challenge.
The data-race/hang/segfault consequences of this combination look totally inevitable to me. Python application developers are not going to love the new experience and I'm thankful my products are not built on top of the python ecosystem.
I think you're right. Making it an explicit opt-out, as is planned for the first stage, should be fine. Expecting to make it opt-in in 5 years seems too optimistic to me. It relies on all the library developers to fix their libraries (also the Python ones). That's tough work, and importantly, if done well, it will even go unappreciated: nobody will notice it.
Many libraries have never had a multi-processing use case, others are so big that bugs are bound to happen, many of them subtle, so one guaranteed outcome will be unreproducible complaints and devs throwing in the towel. Opt-in will make people unhappy.
Surely if it goes well, people will see their existing python codebases become more performant with no development required aside from updating some dependencies? It's not that nobody will notice.
It seems like it could be a great outcome for developers.
So, removing the GIL will slow down Python, at least initially. Speed gains from multi-threading won't come if you don't change your code, unless you happen to use a library that becomes multi-threaded (in which case you have to start worrying about callbacks).
> It seems like it could be a great outcome for developers.
For the ones that use certain libraries, not for those that have to build them.
Most C libraries that are often called from python also have bindings for other languages and so at this point should be threadsafe, right? Also even GIL python supports threads so all these libraries will at least be reentrant safe already.
For libraries that are reentrant but not thread safe, it should be sufficient to just add a global lock wrapping every call, which is pretty close to what the GIL was doing anyway.
It seems to me that in many (most?) cases it should be relatively straightforward to make existing libraries work without the GIL (at the cost of parallelism). I guess the main issue will be for libraries that call back into the python runtime from the C side.
Naive question: who needs no-GIL when we have the asyncio and multiprocessing packages?
I've never had a problem with the GIL in python; I always found a workaround just by spinning up a ThreadPool or ProcessPool, and used async libraries when needed.
Is there any use case for no-GIL which is not solved by multiprocessing?
I thought single-threaded execution without overhead for concurrency primitives was the best way to high-performance computing (as demonstrated by the LMAX Disruptor).
It's only about performance. asyncio is still inherently single-threaded, and hence also single core. multiprocessing is multi-core and hence better for performance, but each process is relatively heavy and there's additional overhead to shared memory. GIL multi-threading is both single-core and difficult to use correctly.
No-GIL multi-threading is multi-core, though difficult to use. I don't know the Python implementation but shared memory should be faster than using multiprocessing.
That said, when designing a system from scratch, I completely agree with you that for almost all Python use cases, threads should never be touched and asyncio/multiprocessing is the way to go instead. Most Python programs that need fast multi-threading should not have been written in Python in the first place. Still, we're here now and people did write CPU-intensive code in Python for one reason or another, so no-GIL is practical.
In these threads, I also always see a lot of people who simply aren't aware of asyncio/multiprocessing. I assume these are also a significant share of people asking for no-GIL, though probably not the ones pushing the change in the committee.
I would argue that if you have large concurrency and shared complex state, you're better off using kafka and redis/memcached as the shared state and designing a proper fan-out.
That design scales much better for systems that will eventually outgrow one big machine. No-GIL python will be of no use when you need to deploy your app across 100s of machines.
I understand people want to take advantage of all cores etc, but at large scale you will eventually need to split computation across machines and resort to an in-memory cache/queue anyway, so better to architect your system that way from day 0.
Stores like that, while scaling well, are orders of magnitudes slower than CPU memory. The kind of application I was thinking of is more compute-intensive, eg. image processing or fancy algorithms.
PostgreSQL shows how far you can get with a single big box and using multiple cores and shared memory. It's incredibly powerful and the vast majority of applications never have data big enough to warrant "100s of machines".
There are a lot of performance sensitive codebases where something like this would destroy performance, it works well for shared nothing parallelism, but the moment you have shared state it kinda falls over.
But many systems will never need to run at that kind of scale, but could still benefit from better threading performance, so it's good to have an "intermediate" choice, if that's what you consider it.
sometimes you need all of that data in-process. when you move that state into redis, you still need to perform i/o to access it. when speed matters, this is troublesome.
Do you include the man hours in those calculations? Because they pollute a lot between the car, the food, the electricity the computer and internet for devs consumes, etc.
That's fine for regular Python, but this doesn't convince me for multithreaded Python. Most Python modules which are optimized for performance (numpy, pytorch, pandas, and all others built on top of them) are already multithreaded and drop the GIL so you can parallelize your workload with the threading module.
If someone really needs several threads of pure python being interpreted, something is amiss imo.
This. Python had a unique stance, it was different for a good reason.
The seemingly endless attempts to become more likeable to yet another subgroup of needs are not a good development. It started with static typing, continued with async, and now we have free threading as the final straw.
Especially PyCharm with its endless indexing. Half the time its key features are not available because it has decided to go on another indexing spree. Alas, that's for another thread ;)
A free-threaded Python will be harder to make faster for single-threaded cases. So this could be a win for those who want to write multi-threaded code at the expense of everyone else.
> asyncio is still inherently single-threaded, and hence also single core.
IIRC some of the proposals around removing the GIL in the past have actually suggested that the asyncio paradigm could become multithreaded for parallelism.
> is there any use case of No-GIL which is not solved by multiprocessing ?
Tons:
- A web server that responds using shared state to multiple clients at the same time.
- multiprocessing uses pickle to send and receive data, which is a massive overhead in terms of performance. Let’s say you want to do some parallel computations on a data structure, and said structure is 1GB in memory. Multiprocessing won’t be able to deal with it with good performance.
- Another consequence of using pickle is that you can't share all types of objects. To make matters worse, errors due to unpicklable objects are really hard to debug in any non-trivial data structure. This means that sharing certain objects (especially those created by native libraries) can be impossible.
- Any processing where state needs to be shared during process execution is really hard to do with the multiprocessing module. For example, the Prometheus exporter for Flask, which only outputs some basic stats for response times/paths/etc, needs a weird hack with a temporary directory if you want to collect stats for all processes.
I could go on, honestly. But the GIL is a massive problem when trying to do parallelism in Python.
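The unpicklable-object point is easy to demonstrate: multiprocessing ships everything across the process boundary through pickle, and pickle serializes functions by qualified name, so a closure or lambda can't be sent at all. A minimal illustration (not tied to any particular library):

```python
import pickle

def make_scaler(factor):
    # A closure: in-process it's a perfectly normal callable...
    return lambda x: x * factor

scale = make_scaler(3)
assert scale(5) == 15

# ...but multiprocessing would ship it to a worker via pickle, and
# pickle serializes functions by qualified name, so this closure
# cannot cross the process boundary at all.
try:
    pickle.dumps(scale)
    picklable = True
except Exception:
    picklable = False
assert not picklable
```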
> Manuel Kroiss, software engineer at DeepMind on the reinforcement learning team, describes how the bottlenecks posed by the GIL lead to rewriting Python codebases in C++, making the code less accessible:
> "We frequently battle issues with the Python GIL at DeepMind. In many of our applications, we would like to run on the order of 50-100 threads per process. However, we often see that even with fewer than 10 threads the GIL becomes the bottleneck. To work around this problem, we sometimes use subprocesses, but in many cases the inter-process communication becomes too big of an overhead. To deal with the GIL, we usually end up translating large parts of our Python codebase into C++. This is undesirable because it makes the code less accessible to researchers."
For average usage like web apps, no-GIL can be solved by multiprocessing. But for AI workloads at huge scale like Google and DeepMind, the GIL really does limit their usage of Python (hence the need to translate to C++). This is also why Meta are willing to commit three engineer-years to making this happen: https://news.ycombinator.com/item?id=36643670
I never really understood Meta/Facebook's practice of relying on scripting languages. Ok, replacing PHP might not have been an option given the accelerated growth of Facebook, but Python was only used for tooling originally, as I understand it. If they needed threading and performance so badly, why didn't they go for a compiled, statically-typed language?
Sunk cost / laziness - I remember when Facebook wrote their own JIT VM to run PHP on top of (HHVM?) to speed up all that PHP code.
Probably was easier to have one crack team of software developers write something new which could interpret all of the existing codebase, than it was to lead a widespread conversion of all of that code into faster languages.
i.e. not everyone's a senior dev. There are reams more junior devs coming from bootcamps and such, computer science grads, etc, who can grok "scripting" languages like python and JS and Java much more easily than they can pick up C++SIGSEGV or Rust algebraic data type smart pointer closure trait object macros.
Think how much of the world is boring "business logic" and it makes more sense - focus efforts making the [on the surface] simple, widespread, generalist, scripting languages, faster - we've seen it with Python, we've seen it with JS (node), we've seen it with Java.
Given how big the slow languages are, it makes lots of sense to save their CPU cycles compared to trying to hire from a much smaller pool of "competent at lower level programming" devs.
I honestly don't understand why people complain about this. One of the best parts of software development is that your tools keep getting better. Your value as a developer keeps growing because the code you have written in the past gets automatic improvements.
Java isn't a scripting language. I don't get your point about junior devs either as FAANG companies such as Facebook can pick and choose from the highest caliber developers.
I conceptually think of Java in the same family of languages as the "scripting" languages, as it still (in its default distributions) is a garbage-collected language running on top of a virtual machine instruction set, and allows you to do stuff with dynamic typing / duck-typing and reflection at runtime that less experienced developers (me, as a CS undergrad) can make use of. Compared to stricter typed compiled languages that less experienced developers (me, as a CS postgrad) had an inevitable learning curve with. It used to be slower than it is now; over time it has had improvements to the language which have introduced progressive increases in performance but also introduced backwards incompatibilities.
RE "scripting" vs "compiled" - I'm probably using the semantics wrong :p
To me, "scripting" is more like... Rexx, or Lua, or Bash. Stuff that's turing complete but more restricted in how you can express things in the code, or sandboxed (designed to be embedded). Python may have started off designed to be embedded as a scripting language, but these days it's a very very general purpose _predominantly interpreted_ language, considering where it's used and the libraries it has. It's not just used inside of Blender or OpenResty for example.
I'd argue the same about Perl and PHP. If people (psychopaths) are content with using PHP-Gtk to make _desktop_ apps, does it count as scripting language the same way "Lua embedded as a way to make Source Engine entities interactive" is a scripting language? :p
> FAANG companies such as Facebook can pick and choose from the highest caliber developers.
Sure - they have lots of money. They're also very, very big companies, with offices all over the world. Look at the sheer amount of people they hired which they backtracked on later "oops, we hired too many of you, haha, sorry! layoff time!". It doesn't take 10+ years of experience to come fresh from a coding bootcamp, complete the Google code test and become a Noogler in the "Wear OS performance metrics" team writing boilerplate AsyncTasks that call UrlRequests all day every day. Plus on the positive side they encourage people to join through undergrad / new grad schemes. Like, isn't there a whole thing about people _starting_ their tech careers in FAANG corps?
So Python is being fundamentally changed for everyone because of the needs of a niche subset of Python programmers (AI researchers), because that niche subset refuses to learn a language more suited to their task?
Ugh, don't - we're convincing the legions of data scientists to move _away_ from the specialist languages (Matlab, R), because at least in Python, the code they publish with their papers is more repeatable/reproducible/reusable, and is a free language [that doesn't require a license server and/or paid plugins], and then we can plug their Torch model / numpy based computer vision algorithm into a Celery worker or a Flask endpoint :-)
> Naive question: who needs no-GIL when we have the asyncio and multiprocessing packages?
1. Because asyncio is completely useless when the problem is CPU bound, as the event loop still runs only on a single core. As the name implies, it is really only helpful when problems are IO bound.
2. Because sharing data between multiple processes is a giant PITA. Controlling data AND orchestrating processes is an even bigger pain.
3. Processes are expensive, and due to the aforementioned pain of sharing data, greenlets are not really a viable solution.
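On point 1: a CPU-bound function monopolizes the event loop's single thread; the standard escape hatch is `run_in_executor`, which keeps the loop responsive but, under the GIL, still won't run pure-Python work on multiple cores, which is exactly the gap no-GIL targets. A sketch:

```python
import asyncio

def cpu_bound(n):
    # Pure-Python number crunching: blocks whichever thread runs it.
    return sum(i * i for i in range(n))

async def main():
    loop = asyncio.get_running_loop()
    # run_in_executor(None, ...) uses the default thread pool: the event
    # loop stays responsive, but under the GIL these four tasks still
    # share one core. With processes (or no-GIL) they could run in parallel.
    return await asyncio.gather(
        *(loop.run_in_executor(None, cpu_bound, 100_000) for _ in range(4))
    )

results = asyncio.run(main())
```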
This probably isn't going to be that groundbreaking for your average web application. But for several of the niches where Python has a large footprint (AI, Data Science), being able to spin up a pile of cpu/gpu-bound threads and let them rip is a huge boon.
How likely is it that the corresponding code doesn't already release the GIL? Pure Python is 100x slower than native code, so the number crunching itself happens in C extensions, where the GIL can be released.
> therefore the number crunching itself happens in C extensions
The number crunching does, but distributing the workload, receiving results, storing and retrieving data, etc. doesn't. And these are huge losses in performance that could be avoided if we could parallelise them.
But it would incur the overhead of concurrency control: mutexes, locks, semaphores.
I don't believe python will ever have atomic operations, and even if it did, they would still incur significant overhead for concurrency control.
Sharing state between threads is such a narrow niche use case; this pattern is practically solved by memcached/redis for larger-scale python-based systems.
Relying on Redis for data sharing between concurrent processes seems like a massive overhead to me. You've got network overhead as well as a single threaded data store.
I am thinking about multithreading every day to try to make it easier to use. I journal about it in my ideas journals.
Even if they get "free multithreading" with no-GIL, their system will eventually outgrow one beefy machine and need to be deployed across a fleet of 10/100/1000 machines.
At which point you lose the benefit of no-GIL, because you now have to introduce redis and kafka into the system.
> even if they get “free multithreading” with no-GIL, their system eventually will overgrow one beefy machine
Why?
Yeah, if you are building a system that is, say, serving web requests, and have an internet scale potential market, success might mean that.
Not every system works that way. A simulation system with defined parameters doesn’t grow in scale if it becomes more popular, you just have more people running isolated instances that don’t depend on each other. Plenty of other applications scale that way rather than the “SaaS that serves ever more clients” way.
I think this argument presumes that everything is the sort of problem that maps well to redis and kafka. Scientific computing doesn't. And while things like numpy might lower contention on the GIL a bit, it's not a cure-all.
Finely-grained locks are useful. Even when you end up scaling between machines, it can be useful to have many threads in one memory space to maximize what you get out of one machine.
We're moving up to hundreds of cores; Python often being stuck only being able to use a couple while tightly coupling state has been unfortunate.
Why?
On AWS you can rent a 24 TB, 500 core machine. Almost all problems are smaller than that so don’t need to scale to more than one machine.
Building applications that run on multiple machines is at least one order of magnitude more complex and thus slower (in development velocity), so needlessly building an application to work distributedly is just bad engineering.
Yes well if you are distributing to N machines, you probably want to use all M cores on each of those machines. You'll still get a performance advantage from multi-threading.
You might think that you can simply spin up M processes per machine instead, but now you have N*M servers instead of N servers that are M times faster. In many cases this means significantly higher overheads: slower startups, a lot more RAM usage, more network IO etc.
Outside of a few embarrassingly parallel problems, two-level parallelism is usually the highest-performance approach.
I kept making this point as well as the other arguments above (and others did too) in the Core Dev discussion group. Unfortunately to no avail. To be sure, I am not a core dev.
Well I don't agree that just because one needs >1 servers, no-gil is suddenly useless.
Still lots of complexity and awkwardness that can be avoided if you can do threading instead of processes. Like Prometheus scraping from a non-webserver python app is a pita, as you need a new process and lots of communication, vs just plug and play as in other languages.
Or just the insane resource usage. Had a java app serving multiple orders of magnitude more customers running on a few containers. Our current python app needs multiple deployments with different entry points, and about 15x the amount of containers.
It is not fair to compare CPython (which is on purpose not optimized, only a reference implementation of interpretable scripting language without any focus on performance) to OpenJDK, an arguably state-of-the-art compiled bytecode VM with JIT and AOT compilers available, with decades and many millions of dollars poured into runtime/JIT/GC/etc. research and optimization.
"on purpose not optimized, only a reference implementation of interpretable scripting language without any focus on performance"
That policy is over.
As the last years have shown, no alternative implementation can get off the ground due to C extensions and compatibility concerns, and CPython is now relied on for many large applications. It no longer makes sense to prioritise a simple implementation over performance.
Do you think Meta (Instagram) are pushing GIL removal and Cinder for no reason? They clearly have that scale and still benefit from faster single machine performance
Most systems don't grow forever, and can stay on one machine.
And "one beefy machine" has a very high limit, so by the time you actually outgrow it you usually have tons of resources available to help rewrite things.
The problem with only relying on asyncio and multiprocessing is that they implement concurrency and parallelization only at per-process granularity.
Threads let you use the same unified abstraction for parallelization and concurrency. They also make it easier to share state with parallelization (no need to go out of your way to do it) at the cost of requiring you to think about and implement thread safety when you do so.
Also, with no-GIL + threads the computational costs of creating and maintaining a parallel execution is much less vs multiprocessing. And data sharing and synchronization are less expensive.
What LMAX is doing is really just an overhyped way to speed up producer-consumer models. It might apply to your use case but it’s not the only reason you’d use parallelism or concurrency. I don’t even understand why they are claiming it to be an innovation when it’s just using a LockFreeQueue implementation within a pre-allocated arena? You also can’t synchronize with their implementation, which sometimes you really need to do. Not a silver bullet
multithreading with shared state introduces several limitations:
1. random jumps in memory and branch misses
2. L1/L2 cache flush
3. context switch cost
4. concurrency locks cost
my understanding is that LMAX eliminated these costs:
1. pre-allocated arena ensures cache locality of operations
2. we don't jump from one region of memory to another; the algorithm more resembles a linear scan of the working set, mostly within L1/L2 cache
3. no context switches, no cache flushes
4. no concurrency control costs
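For illustration, here is a minimal single-producer/single-consumer ring over a preallocated buffer, sketched in Python with invented names. A real disruptor adds cache-line padding, memory barriers, and batching, none of which are modeled here; in CPython the GIL stands in for the fences.

```python
import threading

class SPSCRing:
    """Toy single-producer/single-consumer ring over a preallocated buffer."""
    def __init__(self, capacity):
        self.buf = [None] * capacity    # preallocated "arena", never resized
        self.cap = capacity
        self.head = 0                   # advanced only by the consumer
        self.tail = 0                   # advanced only by the producer

    def push(self, item):
        while self.tail - self.head == self.cap:
            pass                        # spin: ring is full
        self.buf[self.tail % self.cap] = item
        self.tail += 1                  # publish after the slot is written

    def pop(self):
        while self.head == self.tail:
            pass                        # spin: ring is empty
        item = self.buf[self.head % self.cap]
        self.head += 1
        return item

# Usage: one producer thread, the main thread consumes in order.
ring = SPSCRing(8)
producer = threading.Thread(target=lambda: [ring.push(i) for i in range(100)])
producer.start()
received = [ring.pop() for _ in range(100)]
producer.join()
print(received[:5], received[-1])   # [0, 1, 2, 3, 4] 99
```

Because each index is written by exactly one side, no lock is needed; that constraint is also exactly why the pattern doesn't generalize to many synchronized consumers.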
Yes, but LMAX is a constrained model. With a producer:consumer dichotomy you don’t have to consider synchronization among consumers.
Let’s say you did try to implement that in LMAX. It’s common for consumers/“workers”/what have you to require synchronization among themselves, for example if they are operating on a shared k:v store of strings (an in-memory db, let’s say). You can’t do atomic reads and writes on the thing, so you need a locking mechanism; under LMAX you’d have to introduce another layer of producers to control reads and writes, and then another layer of consumers afterwards to handle the rest of your “consumer flow”, or wait in the original consumer thread for the producer to complete. That starts getting very wasteful and is certainly no better than typical locks and context switching.
Again, this is not even a new thing. Lock free queues and “local atomic concurrent pub-sub” have existed for a long time - we have an implementation where I work. It’s not a perfect model even for where that concurrency pattern is wholly sufficient for what you’re doing either - the performance boost from the cache and context switching improvements have to be greater than the slack (in cost or throughput) introduced from producers or consumers sitting idle waiting for upstream data.
Also, context switches/cache invalidation/concurrency overhead can be avoided or at least greatly reduced by smart userspace scheduling a la fibers. With hand tuning it can potentially be completely eliminated (you can control which concurrency units to collocate on a thread and resume concurrency units/threads immediately after their waiting locks free) which is basically the same idea as LMAX. The problem of course, like with LMAX, is that doesn’t generalize.
PEP-703 contains a whore Motivation section. Long enough to require a summary:
> Python’s global interpreter lock makes it difficult to use modern multi-core CPUs efficiently for many scientific and numeric computing applications. Heinrich Kuttler, Manuel Kroiss, and Paweł Jurgielewicz found that multi-threaded implementations in Python did not scale well for their tasks and that using multiple processes was not a suitable alternative.
> The scaling bottlenecks are not solely in core numeric tasks. Both Zachary DeVito and Paweł Jurgielewicz described challenges with coordination and communication in Python.
> Olivier Grisel, Ralf Gommers, and Zachary DeVito described how current workarounds for the GIL are “complex to maintain” and cause “lower developer productivity.” The GIL makes it more difficult to develop and maintain scientific and numeric computing libraries as well, leading to library designs that are more difficult to use.
Yes, in the first line. Only spotted it now, totally my bad. It's not a nice word to call women, and what makes it worse is that it doesn't outright destroy the meaning of the sentence. I'm sure PEP-703's authors are not that desperate about enacting this change.
..without needing to provide a protocol that covers each possible scenario the client might wish to execute in the process?
I believe the answer is "you don't", but passing functions in messages is a highly convenient way to structure code in a way that local decisions can stay local, instead of being interspersed around the codebase.
> I thought Single threaded execution without overhead for concurrency primitives is the best way to high performance computing
You can have shared-memory parallelism with near-zero synchronization overhead. Rust's rayon is an example. Take a big vector, chunk it into a few blocks, distribute the blocks across threads, let them work on it and then merge the results. Since the chunks are independent you don't need to lock accesses. The only cost you're paying is sending tasks across work queues. But that's still much cheaper than spawning a new process.
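The chunk-and-merge pattern can be sketched in Python with concurrent.futures (function names are mine). Note that under today's GIL the threads won't actually run the arithmetic in parallel, which is exactly the limitation being discussed; the structure is what carries over.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(data, workers=4):
    """Chunk a list, sum each chunk on its own thread, merge the results.

    The chunks are disjoint, so no locking is needed anywhere. A no-GIL
    build (or rayon-style libraries in other languages) is what would let
    the per-chunk sums actually run on separate cores.
    """
    n = max(1, len(data) // workers)
    chunks = [data[i:i + n] for i in range(0, len(data), n)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum, chunks))

print(parallel_sum(list(range(1000))))  # 499500
```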
Agreed. Feeding the GPUs with multiple forked memory-hogging processes is no fun and leads to annoying hacks. And, yes, as per your other post, there could have been other solutions to this problem, some of which might have been better.
Yes, but that's a very particular use case that could have been well served with a per-thread GIL and arena-based memory for explicitly shared objects.
But I have the same question as you if we add yet another concurrency model: a SubinterpreterThreadPool, which becomes possible with the per-interpreter GIL in Python 3.12 and later.
That's another new model that is already confirmed to be coming: interpreters (isolated from each other) in the same process, each running with its own GIL.
Multiprocessing has a lot of issues, among them handling processes that never complete, subprocesses that crash and don’t return, a subprocess that needs to spawn further subprocesses, etc.
Multithreading is more efficient but more difficult to work with.
You share the same address space across threads, so you can communicate any amount of data between threads almost instantly under a lock. The same cannot be said for network traffic, OS pipes, or multiprocessing.
Multiprocessing uses pickle to serialize your data and deserialize it in the other python interpreter.
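Concretely, every object crossing the process boundary takes a pickle round trip, which a few lines make visible (illustrative payload only):

```python
import pickle

# Every object sent to a worker process takes this round trip: serialize in
# the parent, copy the bytes across a pipe, deserialize in the child.
payload = {"rows": list(range(5)), "label": "batch-1"}
wire = pickle.dumps(payload)          # what actually goes down the pipe
clone = pickle.loads(wire)            # what the other interpreter sees

assert clone == payload and clone is not payload
print(len(wire), "bytes on the wire")
```

The cost is paid per message in both directions, which is why fine-grained work items often lose to the serialization overhead.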
If you start a Python thread, you're still effectively single-threaded for CPU-bound work due to the GIL.
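A caveat worth noting: that holds for CPU-bound bytecode, but blocking calls release the GIL, so I/O-bound threads still overlap. A quick sketch (timings approximate):

```python
import threading
import time

def fake_io():
    time.sleep(0.2)   # blocking calls like this release the GIL

start = time.perf_counter()
threads = [threading.Thread(target=fake_io) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# The four 0.2 s waits overlap instead of serializing to 0.8 s.
print(f"{elapsed:.2f}s")
```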
Not sure why this is downvoted; I never had many issues with the GIL either.
Multiprocessing does the parallel computation pretty well as long as the granularity is not too small. When smaller chunks are needed most of the time that's something better done from an extension.
When you create a new process you can't share things like network connections. Also, IPC tends to be very slow. It is abstracted away nicely in python, but it's still very slow, making some parallelism opportunities impossible to exploit.
For creating stateless, mostly IO bound, servers, it's great. Try to squeeze in performance and it all starts to fall apart.
This can (and I think will) cause issues for C extensions because many are written without multi-threading in mind. Here is a small example which is unsafe if lst can be accessed from another thread: https://news.ycombinator.com/item?id=36649769 Note that the code may cause a context switch even today if the C code calls back into Python bytecode (via a __del__ method) and the bytecode is long enough (100 instructions, I think). However, that is extremely unlikely, and much C extension code is not written with such situations in mind.
People using C extensions may also rely on them executing atomically. For example, you could have a thread pool that posts and receives from a numpy array. Would work fine today but break without the GIL.
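As a sketch of the kind of reliance meant here, lots of code leans on single operations like list.append being atomic under the GIL and skips the lock entirely (illustrative, not taken from the linked example):

```python
import threading

items = []

def producer(n):
    for i in range(n):
        # Under the GIL a single list.append is atomic, so no lock is used
        # here. Code written in this style is exactly what a C extension
        # mutating the list concurrently, or looser atomicity guarantees,
        # could silently break.
        items.append(i)

threads = [threading.Thread(target=producer, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(items))   # 40000: no appends lost
```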
Yep there are a ton of issues like that to be found, and unfortunately they will manifest as difficult to find and debug race conditions. This is why the proposal and work is to make non-GIL mode entirely optional and not the default.
It just means for the brave few that flip it on and use it, be prepared to spend a huge amount of time finding and fixing subtle race conditions in decades of old python library code. The early adopters are going to be in for a lot of pain, or more likely they'll restrict their use of non-GIL processes to very specialized and dedicated processes that have as few dependencies as possible.
Does Python have a lot of secondary dependencies? I could see someone pulling in two dependencies, not realizing they both use the same unsafe library, and ending up with them stepping all over each other.
It does, so much so that things like virtualenv were introduced so every program can have its own set of dependencies that won't clash with other libs on your system. Something like Flask or FastAPI alone pulls in a lot of secondary dependencies.
I don't think this is true. There are fairly strong voices on both sides inside the community, at this time it's pretty uncertain.
To quote Guido:
>Let’s not blow it this time. If we’re going forward with nogil (and I’m not saying we are, but I can’t exclude it), let’s make sure there is a way to be able to import extensions requiring the GIL in a nogil interpreter without any additional shenanigans
The Steering Council said their intention is to remove the GIL-build in future:
> Long-term, we want no-GIL to be the default, and to remove any vestiges of the GIL (without unnecessarily breaking backward compatibility). We don’t want to wait too long with this, because having two common build modes may be a heavy burden on the community (as, for example, it can double test resources and debugging scenarios), but we can’t rush it either. We think it may take as much as five years to get to this stage.
That's fine. We go from suspecting there are issues to knowing exactly where the issues are. The rest is just chipping away at that list and making the issues go away. Either by adding some kind of mutex around the code or by replacing the native code with something less likely to have issues.
The argument against this seems mainly that it's a lot of work; not that it's impossible work. It probably is a lot of work but if there are enough people doing the work, we should get some results.
> Note that the code may cause a context switch even today if the C code calls back into Python bytecode (via a __del__ method) and the bytecode is long enough (100 instructions, I think). However, that is extremely unlikely, and much C extension code is not written with such situations in mind.
As someone who works professionally on a parallel / async runtime that supports thousands of continuously running servers, "extremely unlikely" means that actually it breaks all the time, but it's also impossible to debug.
I think the CPython core devs are very keenly aware of these issues. Otherwise they'd have announced a plan to suddenly rip out the GIL altogether, rather than a phased approach that allows people to opt-in to the no-GIL mode.
Remember the transition of text to Unicode? 32 to 64-bit? Intel to ARM? Y2K?
No-GIL is a much smaller shift. It can follow the same transition path without radically breaking things. And if some things do break, there would be a well-defined way to handle those cases.
We all somehow survived those. Glad to see forward motion on this. It will open up a lot more terrain that has been marked off as untenable.
One of the things about early Swift that they got right was building breaking changes into the promise. Everyone knew where they stood and adjusted just fine. Sometimes I wish Python would take the same path.
I think that's a bit different. 32 to 64 - you could test whether it works. Same for arm. Same for y2k. Sure, maybe the testing wouldn't cover the failing case, but the testing you did would be deterministic. But here? Test all you want and the answer is: it's either correct or you haven't triggered the right race yet.
Yes, it is different because nobody would be forced to run the interpreter without the GIL. Applications where race conditions are unacceptable can keep using the GIL build. Even the `--disable-gil` build can be forced to keep using the GIL.
I know they say specifically that they don't want a repeat of the Python3 transition scenario, but the approach they're taking now still veers eerily close to that path, at least it looks that way to me.
A lot will depend on the Python community and the distribution channels. I could see the community struggling to adopt it in a timely fashion, or distributions jumping the gun (Ubuntu, Fedora, Anaconda). Maybe it's too early to make hard decisions, but how much control does the SC really have to avoid such a scenario?
They say they want no-gil to be the only build mode 5 years from when it becomes available. That is both too long and too short.
That is too long for Python to have 2 modes. Half a decade is more than enough time for 2 modes to become the status quo. For one thing, think of all the outdated Stackoverflow threads that will be hanging around after that 5 years. I am not optimistic that 5 years won't turn into 10 years of uncertainty and breakage.
But 5 years might be too short for everyone to dredge up all that C code, update it, test it, and call it mature.
> 5 years might be too short for everyone to dredge up all that C code, update it, test it, and call it mature.
Developers are lazy. Most people will just do nothing until, 5 years from now, some blog post will go "oh, next month we switch noGil as default, good luck!". At that point everyone will scramble, rush out buggy releases, and spend 5 years finding all the problems.
The no-GIL build will still have the ability to switch to a GIL "run mode"; it will just be slower than the GIL run mode on the GIL build. Hopefully not much slower.
Yes, it will resemble the 2to3 scenario. Corporations that pledge support will mechanically convert some projects (pestering the actual developers, or threatening forks?), and bugs will be ironed out by the actual, unpaid developers over years.
But apparently Python needs some "success" and this makes a good bullet point. Correctness does not really matter in the Python world.
They kind of burned a breaking major version transition for no good reason with 2-to-3, now they are prefacing a major change with "it won't be like 2-to-3". It sounds like they may be maintaining two operating modes in CPython 3 instead of going forward with another major transition, just because of that history.
> They kind of burned a breaking major version transition for no good reason with 2-to-3
The unicode/text changes alone were a pretty good reason. Division producing floats are also a nice change IMO. I don’t want to discount the challenges with the transition but saying there was no good reason isn’t right to me.
There were a lot of bad reasons as well. The removal of the u string prefix in versions 3.0-3.2 was unnecessary and made the transition much more difficult. It kind of gave Python 3 a bad reputation.
I’m glad they’re very conscious about how easily this could turn into a Python 4 debacle.
They’ll have to be intensely careful not to accidentally affect yes-GIL behaviour. All kinds of weird cases are possible if any sort of emulated GIL isn’t exactly like with a GIL.
>I didn't dive deep on this, but I assume that GIL mode can still run anything, including no-GIL code (it is one of their promises, at least). So, unlike 2->3, there is forwards-compatibility.
>It also seems like the latter isn't meant as a replacement (for the moment), but rather as an option.
As long as GIL mode remains compatible with both old and new code, I see very little danger in having a no-GIL mode (besides hogging CPython maintainers' time).
I under-appreciated the forward compatibility, so this is partly a framing critique. If you're going to frame it as "how this won't be like before", it's good to really highlight the how.
Agreed--there needs to be discussion and thought about how this impacts library maintainers. How do they tell users their library supports or doesn't support non-GIL mode? Will pypi have new metadata to specify and enforce that projects' dependencies support non-GIL mode, or is it just a chaotic free-for-all where users have to figure that out themselves? How will a library author have one codebase that supports both GIL and non-GIL mode--will they effectively fork the code and maintain two codebases (yuck!) or will there be support for detecting GIL mode? How does this work for C extensions too? There's a ton of work to make this smooth for libraries, and I really hope it is being thought through better than the Python 2 to 3 story for library authors (which was no story, and chaos).
Quite a large set of users are on the default Python installation. There is a large number of companies and non-trivial codebases that run on the default Python that comes with the oldest LTS version of a Linux distro.
Currently quite a lot of companies use Python 3.6 only because it comes standard with Ubuntu 18.04, which happens to be the oldest supported LTS version, and companies have a habit of migrating from an out-of-support LTS version to the currently-supported oldest LTS.
This can repeat 2->3, because there will be users stuck with older versions of Python, and library maintainers will have to maintain both versions, with GIL and without GIL (just like PHP extension developers did with thread-safe builds).
But shouldn't old GIL python versions be fine running code that is no-GIL compatible (assuming it is otherwise compatible)? Having a thread per process doesn't mean you cannot run code which is fine with having n threads per process. So if you maintain code which is otherwise compatible with e.g. python 3.6, after making it no-GIL compatible it should still be compatible to 3.6.
I am sure the intent is good. I am not so sure it is possible to avoid. They already say it could take 5+ years of having gil + nogil exist in parallel.
For any tool builder that means their cost has just doubled for the next five years, at least. Why? Because people will want to use tools in either mode, no matter if it is deemed productive or experimental.
essentially the adoption of No-GIL Python will depend on:
1. In which version No-GIL will become default option in CPython
2. When that CPython version will come standard in LTS Linux distro
3. When all earlier LTS distros will go out of support
4. When companies switch from outdated to target LTS version of distro
Currently quite a lot of companies use Python 3.6 only because it comes standard with Ubuntu 18.04, which happens to be the oldest supported LTS version, and companies have a habit of migrating from an out-of-support LTS version to the currently-supported oldest LTS.
A customer on 18.04 just told me to wait another year and migrate their servers to 24.04 so they can stay there until 2029, or will that be 2030? They are on the standard 5 years LTS support, not the extended 10 years one.
1. There are some improvements worth breaking reverse-compatibility for, and removing the GIL is such an improvement. Whether the changes in Python 3 were worth making breaking changes for is debatable: certainly I don't see "print" being a function as particularly valuable. But the flipside is that the 2-to-3 transition was overblown by a vocal minority. I've transitioned more than 5 codebases from 2 to 3, and in most cases, there were few problems. Most problems were with codebases where previous developers had pulled in libraries for everything, resulting in an amalgamation of abandoned libraries, but these codebases run into problems even without the core language breaking compatibility. The answer isn't to flame your language into never breaking compatibility, it's to not import all of pip and expect that to be a sustainable strategy.
The situation we have now is that the steering committee has received so much heat from the vocal minority that they're terrified to make breaking changes. But removing the GIL should be a breaking change. It's too fundamental to how Python works to not be. So they're trying to remove the GIL and make it not a breaking change, which is a bad idea, because it is ultimately going to be a breaking change. It would be much better to admit this is a breaking change and start working on the transition plan, than to try the impossible task of making it not breaking because you're too terrified of your users to admit the truth.
We've already seen this in Python 3.11 which broke code in my codebase. The changes to fix the breakage weren't hard, but I would have liked better communication that this might happen. But I also understand why this was hidden in a deprecation warning in a minor release rather than publicized, because the Python team is probably tired of being flamed for making breaking changes.
2. The more fundamental problem here is that a lot of other features of Python were built around the GIL. Most obviously, the async paradigm makes sense largely because of the GIL. Sans GIL, it looks in retrospect like a send/recv actor model a la Erlang would have been a much better way forward. It's not really possible to reverse this, and this might be pushing Python toward a less cohesive set of features that don't really make sense together. This makes it feel like it's too little, too late.
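For what it's worth, the send/recv actor style can be approximated in today's Python with one thread plus a queue as the mailbox. This is a toy sketch with invented names, not a real actor runtime:

```python
import queue
import threading

class Actor:
    """Minimal Erlang-flavored actor: one thread, one mailbox, no shared state.

    Other code interacts only via send(); the actor owns its state outright,
    so no locks are needed on it, with or without a GIL.
    """
    def __init__(self):
        self.mailbox = queue.Queue()
        self.state = 0
        threading.Thread(target=self._run, daemon=True).start()

    def send(self, msg):
        self.mailbox.put(msg)

    def _run(self):
        while True:
            msg, reply_to = self.mailbox.get()
            if msg == "stop":
                break
            self.state += msg           # private state, touched by one thread
            if reply_to is not None:
                reply_to.put(self.state)

# Usage: messages in, replies out; the counter's state is never shared.
replies = queue.Queue()
a = Actor()
for n in (1, 2, 3):
    a.send((n, replies))
results = [replies.get() for _ in range(3)]
print(results)   # [1, 3, 6]
a.send(("stop", None))
```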
Thank you so much Python core developers and steering council. Python is one of my favourite languages along with Java and C.
I greatly welcome true multithreading in Python.
I use both multiprocessing and multithreading in Python for different projects. See [0] for my multiprocessing example and [1] for Python threads on IO-heavy tasks. But it would be far more efficient to use true threads.
Threads can communicate any amount of data in a single, almost instant operation, since only a reference changes hands. Using the local loopback interface, multiprocessing, or pipes, this is not possible.
I am working on what I call a three-tier multithreading architecture.
With PEP703 you would compile Python either for multi or single-threading mode. The mode affects the ABI and therefore which C extensions are available. Eventually all C extensions would have an available port to the new ABI.
The chosen solution is similar to how PHP used TSRMLS_ macros in the Zend engine - if threadsafety (ZTS) was #defined, all functions took an extra thread context parameter, breaking ABI.
The main benefit of the threadsafe builds was reentrancy support for multithreaded web servers (e.g. IIS / some apache MPMs). They were only slightly slower for single-threaded code.
The new PHP 8 fibers are only coroutines on a single thread, but PHP has had fork/join since 2001 (!), which works pretty well with Linux CoW.
There has been a pthreads extension since about 2012. However keeping a 1:1 pthreads API prevents some optimization possibilities [1], so the new hotness is php-parallel [2], which will transparently copy closed-over variables to a subinterpreter.
All C extensions remain available when running without the GIL. A distribution challenge is that every extension must be built twice, once for each of the two Python builds. However, if developers don't want to make an extension compatible with running GIL-free, few source changes are required: loading such an extension simply forces the whole interpreter to use the GIL, which is slower than the GIL-only build.
I do most of my performance coding in Numba which comes with a nogil mode. Still, I have been looking forward to this. The fewer layers we can have in our libraries, the better!
Why would you even want a no-GIL Python? Java and C showed how much more effort it takes to maintain slower thread safe code for no real benefit. Parallelize at the fork level or at the isolated numeric library level.
Exactly. I think a lot of the negativity about GIL comes from a misunderstanding about forking processes. If python is being used as a scripting language, and spawning other tools, you're already getting free multi-core.
A similar misunderstanding exists about SQLite and concurrency.. but that's a topic for another time.
Forking has a ton of its own downsides; it's not a free lunch either. From poor ergonomics to communication overhead, it works well for some things and very poorly for others.
Show me someone who actually knows how threads work and what writing threading code entails who assumes that.
I am perfectly aware that threads are not free.
Just as I am perfectly aware that a context switch between threads is less expensive than switching a new process onto the core, and that IPC requires kernel involvement.
Except that the threads share the exact same virtual address space, and processes do not, which makes the thread context switch faster.
And that is to say nothing of setup and teardown, which for a process involves marking the entire address space copy-on-write, but for a thread merely setting up its own stack.
> Except that the threads share the exact same virtual address space, and processes do not, which makes the thread context switch faster
That's what I said. But it's really not much. I'm afraid we will need numbers now to continue the conversation. If I measured would you be open to changing your opinion? Or are you committed to this topic, so that it would have no bearing?
Well, I don't know how you'd test that, but you should really consider testing the other half of the post you're responding to which you ignored, because that's much easier to test:
Spin up and tear down a million pthreads in C, and see how long that takes and how much memory it takes. Then spin up and tear down a million processes in C and see your computer grind to a halt until you kill the process that is starting the processes, if you can even get your computer to do that without power-cycling.
It's <50 lines of code for each, so I'm eagerly waiting for your response!
Notably, my confidence here comes from the fact that I don't generally get into performance arguments without having actually tested what I'm saying. I've written this code before; it's what I do whenever I'm checking out a new programming language or threading library. Given the complexity of modern computers, nobody can really predict how a program will behave without testing it (except maybe in assembly); there are just too many variables. So you should stop doing that.
If you decide to try the same thing in Java (the other language mentioned), probably drop the number of threads/processes down to 100,000, since Java's lightweight threads aren't quite as efficient. 100,000 processes will probably still be enough to crash your computer.
I'm sure you can find some language/library which implements threads particularly inefficiently, so let's stick to pthreads/C and avoid that straw man.
EDIT: Here ya go, I had ChatGPT write this one for ya:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>

void* threadFunction(void* arg) {
    // Sleep for 10 seconds
    sleep(10);
    pthread_exit(NULL);
}

int main() {
    int numThreads = 1000000;
    // Heap-allocate the handle array: a million pthread_t as a stack
    // array would blow a default 8 MB stack limit before any thread starts.
    pthread_t* threads = malloc(numThreads * sizeof(pthread_t));

    // Create threads
    for (int i = 0; i < numThreads; i++) {
        int result = pthread_create(&threads[i], NULL, threadFunction, NULL);
        if (result != 0) {
            printf("Failed to create thread %d\n", i);
            return 1;
        }
    }

    // Join threads
    for (int i = 0; i < numThreads; i++) {
        int result = pthread_join(threads[i], NULL);
        if (result != 0) {
            printf("Failed to join thread %d\n", i);
            return 1;
        }
    }

    free(threads);
    return 0;
}
And...
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main() {
    int numProcesses = 1000000;
    pid_t childPID;

    // Create processes
    for (int i = 0; i < numProcesses; i++) {
        childPID = fork();
        if (childPID < 0) {
            printf("Failed to create process %d\n", i);
            return 1;
        } else if (childPID == 0) {
            // Child process
            sleep(10);
            return 0;
        }
    }

    // Wait for all child processes to finish
    int status;
    pid_t pid;
    while ((pid = wait(&status)) > 0);
    return 0;
}
It looks like the latter just crashes the program without taking down my whole machine now, which is an improvement over the last time I tried this with processes.
I just tried your tests on Debian. For some reason the threads one was failing at about 3000 (probably my config), so I bumped it down for both. Here are the results:
$ gcc threads.c
$ time ./a.out
real 0m10.097s
user 0m0.035s
sys 0m0.239s
$ gcc process.c
$ time ./a.out
real 0m10.168s
user 0m0.579s
sys 0m0.347s
Were you running on something besides linux, or not natively? Is this something that degrades with the large numbers?
Also, spawning is not context switching. That's the overhead that matters. But according to your own test, spawning in reasonable numbers will be about the same.
> Were you running on something besides linux, or not natively?
Running on MacOS, but I've run this in Linux.
> Is this something that degrades with the large numbers?
The concern here is memory--once you push into pagefile your processes will become extremely slow.
> Also, spawning is not context switching.
Thank you obviousman.
> That's the overhead that matters.
Why do you think you know every use case? You don't. There are tons of use cases where having to be concerned about creating and destroying threads places a large burden on the developer.
> But according to your own test, spawning in reasonable numbers will be about the same.
You didn't run my test.
Running 3000 processes is a few orders of magnitude less than running 1000000, and you don't get to determine what "reasonable" is for every application that exists.
It's your own test; I don't see why you need to respond in this manner.
> Running 3000 processes is a few orders of magnitude less than running 1000000
I didn't decide on the limit, my OS did. So they must think it's unreasonable.
> Running on MacOS, but I've run this in Linux.
Well your test works fine on linux. MacOS is not designed to run large multi-process server loads. Linux has specifically optimized forking and context switching for processes.
> Why do you think you know every use case?
Why do you think you can't handle most cases by using processes? The goal isn't for a tool to handle every use case, it's to handle a specified set of use cases well.
Python itself doesn't work for every use case.
Looks like we are safe to ignore overhead of launching processes for programs with fewer than 3000 threads.
> Thank you obviousman.
I wanted to compare context switch, you decided to measure something else. I am pleasantly surprised anyway.
> I didn't decide on the limit, my OS did. So they must think it's unreasonable.
> Well your test works fine on linux. MacOS is not designed to run large multi-process server loads. Linux has specifically optimized forking and context switching for processes.
Oh JFC, stop wildly speculating and pretending it's the truth. The test was a million threads/processes, and you ran 3000. By your own description you didn't run the test, period. I just ran it on Debian on a VPS and, you know, it did exactly what I said it was going to do, because gosh, this isn't the first time I've run this test on Linux.
And you seriously want to attribute this to Linux having optimized forking and context switching as if you have any idea what that means? Please do tell, which optimizations did they apply that somehow they've hidden from BSD and Apple?
Given your propensity to make things up when you don't know something, I'm beginning to think whatever problem you ran into with a million threads was fixable and you just didn't know how to, so you made up a new straw man test to try and win an argument. Have you measured the memory usage at 3000 yet, or are you still ignoring any part of reality that isn't convenient for your argument?
> Why do you think you can't handle most cases by using processes?
Where did I say that? Unlike you, I don't make generalizations about "most use cases" because I don't pretend to know what everyone in the world is doing.
> The goal isn't for a tool to handle every use case, it's to handle a specified set of use cases well.
What do you mean, "the goal"? You speak for every possible goal anyone using Python could possibly have now?
> I wanted to compare context switch, you decided to measure something else.
Then do it! I'd be interested to see the results, and even more interested to see how you tested it.
In any case, you can't just ignore tests you don't want to do or apparently aren't capable of. Being able to run a lot of threads can be extremely useful for networking applications, which is why I care. You don't get to decide my use case is "unreasonable" because you apparently can't compile a program that does a stripped down version of it.
That is to say, even if context switching is faster in processes than in threads, that doesn't mean there's no use to threads, because there are use cases where spinning up and tearing down is more common than context switching.
Please explain: In what sense is the overhead of starting actual OS processes, and relying on IPC "free", compared to running threads or even greenlets, and using shared process memory?
> In what sense is the overhead of starting actual OS processes, and relying on IPC "free"
With Python's current multiprocessing utilities, you get a big discount by not having to write thread-safe code or worry about synchronization, despite the GIL still being there. Very broadly speaking, it's "free" in the sense that the OS handles parallelism automatically at the process level, and provides a simple communication mechanism between those processes through standard APIs. It also reduces the potential attack surface (although this is a lesser argument).
It's also "free" in the sense that you don't need to re-write large parts of the VM, as well as all supported libraries, and teach the entire Python community how to safely write and test multi-threaded code (something I bet upwards of 75% of the people using Python today won't manage) to support this specific form of parallelism.
If your goal is to run code (whether IO bound or CPU bound) in parallel - Python has the means to do that already today, without removing the GIL.
> It's "free" in the sense that you don't need to re-write large parts of the VM
In that sense, never updating python again is "free" as well, because it would save the python devs the trouble of changing the interpreter. And yet I think we can all agree that Python benefits from the fact that we no longer use Python 3.5
> and teach the entire Python community how to safely write multi-threaded code
People who don't write threaded code don't need to worry about it. And people who write threaded code in python already need to worry about writing thread-safe code. The GIL doesn't prevent race conditions between individual python instructions.
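A quick way to see why: `counter += 1` compiles to several bytecode instructions, and the interpreter may switch threads between any two of them, GIL or no GIL. A sketch (the final count is non-deterministic, so no exact result is claimed):

```python
import dis
import sys
import threading

counter = 0

def work():
    global counter
    for _ in range(100_000):
        counter += 1  # LOAD / ADD / STORE: not one atomic step

# the increment compiles to multiple bytecode ops, between which
# another thread may be scheduled
ops = [ins.opname for ins in dis.get_instructions(work)]
assert len(ops) > 3

sys.setswitchinterval(1e-6)  # switch threads as often as possible
threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # may be less than 400000 if increments were lost
```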
> Python has the means to do that already
And as outlined above, these mechanisms are no suitable replacement for true thread-based parallelism.
> "And as outlined above, these means are no suitable replacement for true thread based parallelism."
Your only argument is that "thread-based parallelism can't be achieved without threads", but that's not relevant to the conversation whatsoever.
The fact of the matter is that Python (already today) allows you to achieve parallelism across both IO-bound and CPU-bound workloads.
For CPU-bound workloads, the number of threads you can run in parallel is bound by the number of cores you have. For IO-bound workloads, your threads are just waiting on interrupts.
What are concrete use-cases where thread-based parallelism in Python is so desperately needed right now, that can't be achieved through process-based parallelism?
I'll give you one: real-time latency/throughput-sensitive DSP. Think real-time audio processing, or real-time algorithmic trading. Python isn't used there to begin with (for an entire flurry of reasons) - GIL or no GIL.
> The fact of the matter is that Python (already today) allows you to achieve parallelism across both IO-bound and CPU-bound workloads.
I also don't need to use goroutines. I could simply spin up my golang application as a couple of processes, and use pipes and other IPC to coordinate them.
"There is another way to do X" doesn't imply that other way is better.
I never said that multiprocessing is "better" than multithreading.
Both are mechanisms to achieve parallelism with different pros/cons, and there are legit cases where threads are a necessity that can't be satisfied with processes (ref my previous comment with examples). This conversation would've been much easier to have given specific constraints and examples (which nobody seems to give).
Within the context of a general purpose VM which was never built to support multithreading (CPython), and which carries the baggage of 30+ years' worth of 3rd-party packages and libraries which were never built with multithreading in mind - I think we can both agree that the costs and risks of changing literally everything may outweigh the benefits. It's not a difficult stance to accept.
My main argument, if you distill it down to the abstract, is "use the right tool for the job" - and stop pretending that a hammer and a knife are the same thing, when they're not.
If you need hardcore, ultra-efficient and parallelized workflows - Python is just the wrong tool for the job, period. This isn't up for debate - it's a given fact. This isn't just about the GIL - it's about the type system, the frameworks, the bloat, etc. There's nothing wrong with Python being this way - Python fills a niche of its own incredibly well, something which other tools suck at, and Python is loved by millions for it.
Python is used today by certain demographics to perform certain jobs - and it excels really well at those jobs for those demographics. This whole discussion now (GIL vs. no GIL) is about bending and twisting Python to fit the use cases of a few companies like DeepMind or Facebook - who I'm certain represent a minuscule usage compared to the millions of students, schools and universities, research institutes, web shops, hobbyists, tinkerers, etc. Those people want to get shit done quickly - and mutexes, semaphores, events, threads, synchronization primitives, atomics, etc. - will do nothing but make their lives a misery, and drive them away.
Also, I keep asking you for concrete examples where the Python community absolutely needs multithreading (where multiprocessing fails) and you're not really responding which makes me feel like we're not conversing here...
I completely agree with this position. If you need optimized, parallelized code to chew through a tough workload, write that part in C and spawn it from Python. Python is a coordinating scripting language.
> I never said that multiprocessing is "better" than multithreading.
Maybe not, but you seem to be implying that multiprocessing is sufficient for most cases (or the average case?), however ill-defined that average case is.
> This conversation would've been much easier to have given specific constraints and examples (which nobody seems to give).
Not quite; there's no need for a grand example here. Processes vs. threads is just a function of how chatty the interaction between the independent agents is. We should have both mechanisms available and let the user choose.
> My main argument, if you distill it down to the abstract, is "use the right tool for the job" - and stop pretending that a hammer and a knife are the same thing, when they're not.
You don't get to decide in the abstract what the right tool is for other people and contexts we don't know anything about. The people pushing for proper threading are telling you that for them, Python + better threading is the right tool.
> If you need hardcore, ultra efficient and parallelized workflows - Python is just the wrong tool for the job period. This isn't up for debate - it's a given fact.
Maybe, but that's not the point here. Proper threading in Python doesn't make Python more efficient, but it makes the *scaling* of Python's performance more efficient. Those are two different things.
> Python fills a niche of its own incredibly well, something which others tools suck at, and Python is loved by millions for it.
> This whole discussion now (GIL vs. no GIL) is about bending and twisting Python to fit the use cases of few companies like DeepMind or Facebook
> who I'm certain represent a miniscule usage compared to the millions of students, schools and universities, research institutes, web shops, hobbyists, tinkerers, etc. Those people want to get shit done quickly - and mutexes, sempahores, events, threads, synchronization primitives, atomics, etc - will do nothing but make their lives a misery, and drive them away.
This seems to me like the core of the problem: somehow you say that there is no valid need for threading, but at the same time the people needing threading are not important enough for Python to care about... You have the magical ability to be plugged into the Python hivemind and decide which use cases are valid, what the true values of the Python community are, and what Python should and should not consider important... And after all that mental gymnastics you expect us to jump through hoops to convince you otherwise...
That's a lot of non-technical assumptions for a technical problem.
> Those people want to get shit done quickly - and mutexes, sempahores, events, threads, synchronization primitives, atomics, etc - will do nothing but make their lives a misery, and drive them away.
Not really. If anything, they will benefit from better libraries and more efficient C extensions.
> absolutely needs multithreading
Nothing is absolutely needed. Even something as core as a type system is not absolutely needed. The fact that you ask for this level of qualification for a feature that pretty much every other language has is the problem.
The problem is not that deep; implementing multithreading is not that complicated. I do agree that we need to be careful to provide a better API than raw threads to end users (which no-gil is COMPLETELY agnostic about).
> "you don't get to decide in the abstract what the right tool is for other people and context we don't know anything about"
So which one is it? Do I get to ask for concrete examples for why this work is worth all the trouble, or should we just pretend like it doesn't matter because you say so?
"I want threads, threads fast, others have threads" is not good enough.
> "somehow you say that there is no valid need for threading"
I've never said that. All I've been asking for are concrete examples that would justify removing the GIL and the impact it would have on the ecosystem. You won't provide any, despite me asking over and over again.
> "You have the magical ability to be plug into the python hivemind and decide which use case are valid"
CPython has had a GIL for the past 30 years - and things worked out just fine. There are other Python VMs that don't have a GIL - yet CPython remains the most widely used and popular VM. So how about you stop putting words in my mouth, and start actually justifying the asks beyond hand-waving my arguments away?
> "The fact that you ask for this level of qualification for a feature that pretty much every other language has is the problem."
Again, you're twisting and putting words in my mouth. There's a clear difference between building an ecosystem with multithreading in mind from day one - and suddenly introducing it out of nowhere, 30 years into it being used by millions of workloads globally. This is why I'm asking for "qualifications" (which you won't provide), and this is why I also deem your assertion that "the problem is not that deep" to be proof beyond any doubt that we should end this conversation now before things get too embarrassing for you :)
> CPython has had a GIL for the past 30 years - and things worked out just fine.
No, they did not.
That's why the discussion about the GIL is about as old as Python3 itself.
The GIL has always been a major drawback of Python, more so because the language does in fact have threading support... only it can't use threads for parallel workloads.
This drawback was tolerated because of the many advantages Python brings to the table, and because Python comes from an age when Moore's Law was still in full effect; powerful single cores were the norm not so long ago.
This isn't the case any more. Moore's Law is done. Now we increase the number of cores, and languages that wish to remain relevant need to reflect that.
> So which one is it? Do I get to ask for concrete examples for why this work is worth all the trouble, or should we just pretend like it doesn't matter because you say so?
Both ?
1 - There is no need for a grand example to justify the need for threading in a modern language, as we have 20-30 years of background on that topic. To repeat myself, it's all about the amount of communication between compute agents... The more communication you have, the more the process isolation/serialization costs become a problem.
2 - You yourself understand that there are valid use cases for threading; somehow those valid cases are not valid Python uses. That's where the disconnect is. You don't get to say by decree what is and is not a "valid" use of Python.
> "I want threads, threads fast, others have threads" is not good enough.
Neither is "I don't use threads, so you shouldn't use threads."
> CPython has had a GIL for the past 30 years - and things worked out just fine.
That's what the no-gil people are trying to tell you: no, it's not fine, and it was never fine. Efforts and conversations about replacing the GIL are at least 10-15 years old.
> yet CPython remains the most widely used and popular VM
Faulty logic: correlation doesn't imply any causal relationship.
> and start actually justifying the asks beyond hand-waving my arguments away?
Because your arguments are not really arguments; they're more like strong opinions on things that are closer to aesthetics and right/wrong usage of things. Happy to disagree on those.
> There's a clear difference between building an ecosystem with multithreading in mind from day one - and suddenly introducing it out of nowhere, 30 years into it being used by millions of workloads globally.
1 - no-gil isn't out of nowhere. Conversations about this are more than 10 years old, combined with even longer conversations in other VMs/programming languages.
2 - no-gil doesn't introduce threading into random workloads. It allows people who want threading to use threading.
> This is why I'm asking for "qualifications"
It's your prerogative to ask for qualifications. But it's also ours to judge your asks and decide if they are worth our time.
Much in the same way that I feel we don't need (in 2023) a grand example to justify why we need to add a type system, we don't need a deep conversation about threading. We all have the same information and understand the trade-offs. We just have different value systems and want different things out of Python. And that's okay...
> the problem is not that deep" to be a proof beyond any that we should end this conversation now before things get too embarrassing for you :)
I think I have a comment somewhere explaining why no-gil was never a technically challenging problem.
But the proof is simple... Sam Gross is definitely an exceptional dev, but the fact that a lone programmer came up with an acceptable solution is proof that the problem wasn't that deep.
> Within the context of a general purpose VM which was never built to support multithreading (CPython),
How do you figure that? Even python2 already supported the usage of OS threads [1].
> 30+ years' worth of 3rd-party packages and libraries which were never built with multithreading in-mind
Many of these packages also don't use other forms of concurrency, but are simply encapsulated functionality that runs in a single thread. Meaning, they will not be bothered by the change.
Besides, as I have mentioned elsewhere, library maintainers always need to keep up with the development of the underlying language as well as usage patterns of the community, or their libraries become obsolete. That is true no matter what programming language we talk about.
> If you need hardcore, ultra efficient and parallelized workflows - Python is just the wrong tool for the job period
Python is already used as an orchestration language for huge numerical workloads, be it data science or machine learning. It is simple, intuitive and has by far the largest library support of any contemporary language.
There is simply no good argument, why the language that we entrust to orchestrate this scale of computing power, shouldn't itself be as efficient as possible for a dynamically typed script-language. That this is absolutely possible, is demonstrated by languages like Julia.
The fact that Python will never be as fast as Go, Rust or C++, doesn't change that.
> Those people want to get shit done quickly - and mutexes, sempahores, events, threads, synchronization primitives, atomics, etc - will do nothing but make their lives a misery, and drive them away.
Those people will for the most not even realize that the GIL is gone. If they write...
- single threaded synchronous code
- asyncio based code
- multiprocessing code
...the change doesn't matter to them. The hobbyist's small webserver, or the medium-sized company's Flask-based webapp, will still run as before. And if they write threading code, and do so correctly, then very likely the only change they will see is that suddenly their application runs faster under high load.
The removal of the GIL neither takes away existing capabilities from Python, nor does it force everyone to write threading code.
> and you're not really responding which makes me feel like we're not conversing here...
That's because I have done so elsewhere in this thread already [2]
Supporting thread based parallelism is the norm among all mainstream PLs with the one inglorious exception of JS, which doesn't because it simply can't.
And Python already does support it, it simply is limited by a legacy design decision. That was okay in a bygone age when fast single core machines were still the norm, Python was primarily a scripting language for when bash wasn't enough, and most relevant webapps were IO bound.
Today, there simply is no excuse any more. Python is the most ubiquitous language in the world, and running scaling web applications, is the lingua franca of ML, and orchestrates huge systems. Servers have hundreds of cores, and CPU bound workloads become ever more important.
It's about time Python rids itself of that needless limitation.
It can't for the same reason python can't - every part was designed without it in mind.
> Python already does support it
The language runtime might have it in a branch, but the vast majority of C code it is based on, and the scripts themselves assume otherwise.
> fast single core machines
Once again, if you are using python as a scripting language, with C libraries, and spawning processes, you are already utilizing multiple cores without any adjustment on your part.
> CPU bound workloads become ever more important.
If that's true, then you shouldn't use Python. It's about 100x slower than C. How much performance can you squeeze out of dividing work into cooperating threads that can't be easily achieved by having multiple processes?
In the "web application" example you are using, the norm is already to have many processes to handle incoming connections.
Interesting, care to explain then why Python supported threading since Python2? [1]
> Once again, if you are using python as a scripting language
> If that's true, then you shouldn't use python.
Once again, I don't. I use it as an orchestration language calling other code, and there is no good reason why the orchestration language should have an arbitrary bottleneck.
Yes, the hot code isn't written in Python. That doesn't matter to this discussion.
> In the "web application" example you are using, the norm is already to have many processes to handle incoming connections.
Outside of the Python world, it absolutely isn't. I also have numerous Go based webservices, and they don't have to jump through IPC hoops to facilitate communication between workers and services.
> "It's about time Python rids itself of that needless limitation."
I want to correct one thing that I see plastered all over this thread.
The GIL isn't a programming language construct. It's an implementation detail. The GIL isn't a "Python limitation" in any way, because it has nothing to do with Python.
CPython, probably the most popular and widely used VM for Python, was built around a GIL.
There are other VMs out there that don't have a GIL (ie. Jython, IronPython) and can be used out of the box.
It's an implementation detail of CPython. Which is by far the most common Python interpreter. And it limits parallelisation via threads in Python, a language that otherwise natively supports threading.
    def thread_function():
        # thread may lose core here
        value = store.get_value()   # or anywhere inside get_value()
        # or here
        update(value)               # or anywhere inside update()
        # or here
        store.put_value(value)      # or anywhere inside put_value()
The GIL only makes certain internal functionality atomic. It doesn't protect the implemented logic from causing a race condition.
So unless I protect store with a lock, I can already get a race condition, GIL or no GIL.
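For completeness, the usual fix is to hold a lock across the whole read-modify-write. A runnable sketch, with a hypothetical Store class standing in for the store and update() from the example above:

```python
import threading

class Store:
    # hypothetical stand-in for the store object in the example above
    def __init__(self):
        self._value = 0
    def get_value(self):
        return self._value
    def put_value(self, value):
        self._value = value

store = Store()
lock = threading.Lock()

def thread_function():
    with lock:  # no other thread can interleave between get and put
        value = store.get_value()
        value = value + 1  # stand-in for update(value)
        store.put_value(value)

threads = [threading.Thread(target=thread_function) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(store.get_value())  # 100: no increments lost
```

This is exactly the discipline required today, with the GIL in place.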
The user you're replying to is saying that the GIL is preventing multiple threads from executing Python bytecodes at once (preventing some classes of race conditions and ensuring thread safety). They are absolutely correct.
The GIL doesn't solve or prevent all classes of race conditions (which can stem from complex interactions with databases, the filesystem, etc). Removing the GIL though, will only make things worse from that perspective - making the language more difficult to work with (for non-technical people).
This is exactly why Python should be further simplified, and not made more complex - given the population which uses it most.
IPC is a PITA, and orchestrating processes is even worse.
> or at the isolated numeric library level
Not everything I want to parallelise in Python runs in numpy. Simple example: web service backends. I have a 64-core server running a Werkzeug/Gunicorn application. The service is mostly doing CPU-bound tasks (data aggregation and analysis), so asyncio is pointless.
What happens is, it runs 60 worker processes, which puts hefty limitations on any crosstalk and data sharing, because these require either IPC or redis/sql, which are nowhere near as performant as actual shared memory would be.
Exactly my thinking, using more cores is exactly your use case and will hopefully make Python superb at large scale data processing with cross-communication.
I think, rightfully, the concern is people who will try to use this incorrectly causing major bloat to CPython.
If you want to use async, this change doesn't affect your code.
If you want to use multiprocessing, this change doesn't affect your code.
Even if you already use threading, and do it correctly, this change doesn't actually affect your code.
So who is "drawn into this"? And please don't say library developers. a) Having to update libraries to have them remain relevant, is normal procedure, in all languages. b) a lot of the people who want this to change are library devs.
You do realize that this grace period is intended precisely to make it easier for libdevs to adapt, yes? 3rd item in the list in the linked article, quote:
"We also need to bring along the rest of the Python community as we gain those insights and make sure the changes we want to make, and the changes we want them to make, are palatable."
End quote.
Yes, library developers have to keep up with developments in the underlying language as well as changes in usage patterns by the community. That is true for all programming languages.
And as I have pointed out numerous times before, this change is on the wishlist of many libdevs in the python community.
No, for C/C++ you don't have to do anything unless the compiler writers add yet another new warning to -Wall -W, which you have to disable because it is spurious.
Java is more backwards compatible, so is Common Lisp.
Someone here said that the thread on the Python "discussion" forum was shut down. That does not sound like everyone except for the proponents is supposed to be heard (or is even aware of the discussion).
Even for such stable languages, a library maintainer has to, at the very least, patch security problems as they are discovered.
> That does not sound like everyone except for the proponents is supposed to be heard (or is even aware of the discussion).
There is an official poll among the Python core devs, linked in the article, which shows overwhelming support for the change. Since they are the ones who have to work this out, that's the only discussion about this that is relevant.
Python gets an absolute kicking for being too slow relative to basically everything else and that performance characteristic is partially attributed to the interpreter lock. Misattributed in my opinion, but there we are.
Yep, I think over the next year a lot of Python devs are going to learn threading isn't magic pixie dust that makes your code fast, and in reality it starts by making your code very unstable.
I'm afraid that's exactly what will happen. Unfortunately the overarching sentiment will not be "multithreading is hard" but "Python has become really hard to work with"
We are working with a huge Go and Python codebase and Python is just a pain in terms of using all system resources. We moved many parts to C++ which are called and handled by goroutines. The outcome was a big success.
This proposal/change is a big step forward, especially for the deep learning community.
Quote: "In PyTorch, Python is commonly used to orchestrate ~8 GPUs and ~64 CPU threads, growing to 4k GPUs and 32k CPU threads for big models. While the heavy lifting is done outside of Python, the speed of GPUs makes even just the orchestration in Python not scalable. We often end up with 72 processes in place of one because of the GIL. Logging, debugging, and performance tuning are orders-of-magnitude more difficult in this regime, continuously causing lower developer productivity."
Quote: "We frequently battle issues with the Python GIL at DeepMind. In many of our applications, we would like to run on the order of 50-100 threads per process. However, we often see that even with fewer than 10 threads the GIL becomes the bottleneck. To work around this problem, we sometimes use subprocesses, but in many cases the inter-process communication becomes too big of an overhead. To deal with the GIL, we usually end up translating large parts of our Python codebase into C++. This is undesirable because it makes the code less accessible to researchers."
This requirement could have been well served with a GIL per thread and an arena-based (shared) object allocation model. Every other use case would have been unaffected.
Now we change the world for everyone and put most of library developers through a valley of desperation for 5 years+, just so that a very few narrow use cases get the benefits they want.
Good point. Did the Meta and Deepmind devs really miss this?
I try to avoid python as much as possible, because I mainly work with Go & C++ and multi-threading with those languages is just better (imho). Bringing python a step forward and making it future proof might be a good thing... Even if this means to break some things? Not sure if dismissing the GIL is the right step, but there is a big performance gap to fix. Or maybe the AI community must move to a better suited language? Having python code in production just feels so wrong. Especially if a rewrite in another language shows the performance gap.
The PEP notes subinterpreters as an alternative and says they can be considered a valid approach to achieve parallelism. However, it does not discuss why nogil was given preference. I guess that's ok because the PEP is about nogil.
I'm not sure whether the SC has considered alternative approaches but it would be surprising if not
The use cases of the ML and AI world are very important though, as they massively contribute to Python's popularity. Thanks to Python, researchers and developers don't have to use different languages and library ecosystems for developing and scaling models.
Alas, subinterpreters sound like they could be a feasible solution for many use cases as well.
What an ignorant comment. No real benefit? Really?
The website you're using likely benefits immensely from thread-safe code. The browser you're using benefits immensely from thread-safe code. Your entire user experience using the modern internet is in an ecosystem of thread-safe code which you apparently are completely oblivious to.
There are a lot of threading models besides Java and C's as well, which you're apparently also unaware of.
Why would you assume you know all the possible use cases of a multi-purpose programming language?
PYTHONGIL is an awkward tri-state. 0, 1, and unset all do different things. Wouldn’t some self-explanatory strings be better? PYTHONGIL=auto for the default, force-gil and force-nogil for the forced modes.
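A hypothetical parser makes the suggestion concrete (the string values are this comment's proposal, not anything CPython actually accepts):

```python
VALID_MODES = {"auto", "force-gil", "force-nogil"}

def gil_mode(env):
    # hypothetical: self-explanatory strings instead of the 0/1/unset tri-state,
    # with unset falling back to "auto"
    value = env.get("PYTHONGIL", "auto")
    if value not in VALID_MODES:
        raise ValueError(f"PYTHONGIL must be one of {sorted(VALID_MODES)}, got {value!r}")
    return value

print(gil_mode({}))                            # auto
print(gil_mode({"PYTHONGIL": "force-nogil"}))  # force-nogil
```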
Python could have been the one language with a sane multithreading model. Now it risks becoming a second version of Java. I fear this will make it a less attractive programming language, not least because it might lose its beginner-friendliness. For example, without the GIL a lot more care must be put into designing your programs. This can be true even when your own code is single-threaded, for example when you use a library that is multithreaded and has callbacks to your code.
Why?
Free threading as introduced by PEP 703 is well known to be error-prone, hard to get right and generally advised against, unless you know exactly what you are doing. In other words, free threading is for expert (as in very experienced) use only. And Python already has an expert mode - called Cython or Numba, to name just two.
Personally, I can see no good coming from bringing free threading to the masses. Yes, it addresses a common critique (by many) and a need (by very few), but it addresses it in a very risky way (for the vast majority of Python users).
A better alternative
IMHO the far better and still my preferred approach would have been to favor a per-thread GIL with an explicit mode to share particular objects. This would benefit everyone without the risks. It would be consistently beginner-friendly and, above all, offer a safe path to concurrent programming without impacting the whole ecosystem. Heck, we could even call it the "Pythonic Threading Model", and it would be seen as a differentiator.
The GIL does very little to protect inexperienced users. It's still really easy to run into race conditions, for example if your thread gets scheduled out during any multi-instruction operation (this is more common than you think [1]). In general, Python code still has to be thread-safe; you get the risks without the benefits.
If you don't care about CPU performance, instead of threads you should go for an event-loop approach (see asyncio in Python). As soon as you have threads (on a language-level, not implementation-level), there is some notion of implicit switching, and you run into issues. So, the language you're looking for is JavaScript, which is single-threaded and every context switch is explicit (in form of `await` or `yield`).
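A minimal illustration of that explicitness: in asyncio, a context switch can only happen at an await, so a read-modify-write with no await in between needs no lock — unlike the threaded equivalent:

```python
import asyncio

balance = 0

async def deposit(amount):
    global balance
    current = balance            # no other task can run between these two
    balance = current + amount   # lines, because there is no await here
    await asyncio.sleep(0)       # the ONLY point where control may switch

async def main():
    await asyncio.gather(*(deposit(1) for _ in range(1000)))

asyncio.run(main())
print(balance)  # 1000: the read-modify-write was never interleaved
```

The same 1000-task update done with preemptive threads would need a lock to guarantee this result.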
I don't think the blog you cite backs up your claim of "it's still really easy to run into race conditions".
The author literally says: "This was actually pretty hard to discover. The first few experiments failed, because Python is pretty smart about when it runs each thread."
But I think the main way the GIL protects inexperienced users is that its presence has the effect that Python code using the threading APIs is very uncommon, since those APIs don't currently provide parallel computation (only parallel IO). So inexperienced users are protected due to 99% of Python programs not using multithreading as a result (and so likely the one they're developing also doesn't), whereas this will presumably change when the limitations imposed by the GIL go away.
> I don't think the blog you cite backs up your claim of "it's still really easy to run into race conditions".
> The author literally says: "This was actually pretty hard to discover. The first few experiments failed, because Python is pretty smart about when it runs each thread."
I completely disagree here with your reading. Rare race conditions are much worse than races you trip over frequently.
More precise wording would've been: "It's still really easy to write code that runs into race conditions". The race conditions are rare, but code that (rarely) causes them isn't.
First, I don't see how the GIL would have much to do with free threading. The GIL should not have much observable logical impact on multithreaded _Python_ code. It should not make it more or less susceptible to race conditions. Its only practical impact should be slowness.
Second, your proposed "one GIL per thread" is pretty much the equivalent of the current state of multiprocessing, in that you fork your current interpreter state into another process with its own GIL and start from there. This has been used for decades already, nothing new there. Sharing can be done through queues or shared memory.
Yes, used for decades - and for good reason and benefit.
No, not the same thing. Sharing objects between processes is not easily achieved, for various reasons (at least in Python). It would be easier to get in multithreading with an arena-based allocation model where objects live in a shared or non-shared area of memory.
Also it's not my idea. I am just advocating it as the better model for Python to advance to. It would also not take years to implement and the risks are minimized as there is full compatibility with existing code.
Two threads sharing state through a common heap, or two processes sharing state through a shared memory is pretty much indistinguishable, at least on Linux.
The question is not multithreading or multiprocessing anymore, the difference to me is more semantic than real.
The question is then just how these threads/processes communicate.
I would argue that shared mutable state is rarely a good idea, and an equivalent message based system is often preferable.
For the few use cases that remain where you would want shared mutable states, as mentioned in my original answer, Python has shared memory support, though with non-built-in types. Improving these shared types should be the only thing you advocate for.
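A minimal sketch of the built-in shared types the comment refers to (function names are illustrative): `multiprocessing.Value` lives in shared memory, so workers mutate the same bytes instead of pickled copies sent over a queue.

```python
from multiprocessing import Process, Value

def work(total, times):
    for _ in range(times):
        with total.get_lock():   # explicit synchronization is still needed
            total.value += 1

def run(procs=4, times=10_000):
    total = Value("i", 0)        # a C int living in shared memory
    workers = [Process(target=work, args=(total, times)) for _ in range(procs)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
    return total.value

if __name__ == "__main__":
    print(run())  # procs * times, with no serialization of `total`
```

Note that only a handful of C-level types work this way; arbitrary Python objects still have to be pickled, which is the limitation being discussed.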
Right now, you often get stuck on using multiprocessing because you need to share one object that is unpickleable and can't be sent over queues.
Improving shared types would be a great addition. The problem is that the reason an object is unpicklable in the first place is often something deeper that can't be shared across processes anyway, like an open file handle.
> two processes sharing state through a shared memory
The point is that this isn't very practical, especially in Python. In practice, any data passed between processes must be serialized and deserialized over a pipe or in shared memory. With a thread per interpreter, native Python objects could potentially be shared across threads/interpreters as long as there was adequate synchronization. Technically, you could also do that with shared memory in multiprocessing, but it would be harder, and you would need to specify from the beginning that the object is intended to be shared.
> pretty much indistinguishable, at least on Linux.
Linux isn't the only OS people use python on.
> I would argue that shared mutable state is rarely a good idea, and an equivalent message based system is often preferable.
I agree, but even for message passing, having to serialize and deserialize the object across process boundaries hurts performance.
It's just that (some) people really want multithreading. And with that Python had the rare chance to say, ok you'll get it, but we will make it safe by default.
Now they said ok you get it, even though it's a big risk, we hope it'll work. If not, we'll take it back.
> It would be easier to get it in multithreading with an arena based allocation model where objects live in a shared or non-shared area of memory.
That still relies on IPC, and the associated kernel overhead to set up, control, and access the shared memory, IN ADDITION to the kernel overhead of having to do a context switch between processes.
Threads only rely on the context switch.
> I am just advocating it as the better model for Python to advance to
> That still relies on IPC, and the associated kernel overhead to set, ctl and access the shared memory, IN ADDITION to the kernel overhead of having to do a context switch between processes.
In multithreading there is no need for a context switch. Also memory is automatically shared by all threads. The arena model can be completely managed by Python.
I don't get why the SC has not chosen this. Well, I do get it, but I don't like what it implies.
Indeed, multiprocessing with pipes / queues and sometimes areas of shared memory is a pretty sane way to do parallel processing, when more than one CPU core works for you at once. It's pretty ergonomic, and it's largely equivalent to Node's workers.
It has nothing to do with multithreading or GIL, though, and happily works without threads and with GIL in place.
It’s virtually impossible to do free threading safely, especially with large codebases developed by multiple people. This includes tiny Python scripts that pull in a bunch of dependencies.
It’s like saying that C is a safe language, just “get good” at it.
There are safe alternatives such as structured concurrency.
Rust is a brilliant lesson in using traditional threading safely. It uses & for thread-shared types and constrains &mut to a single thread, which naturally causes people to keep single-threaded data on an object only accessible from a single thread, and make multithreaded data either immutable, mutex-protected, or atomic.
Alternatively, message-passing isn't traditional threading, but Erlang/Go-style languages are another way to approach concurrency or parallelism.
In my experience with multi-threaded programming, C++ code with "careless sharing issues" is often filled with multiple threads accessing the same object and relying on convention to avoid calling the wrong thread's methods, pervasive data races and unsynchronized variable access, mistaken use of mutexes on only one side of shared memory, and logical race conditions that require adding mutexes (risking deadlock) or rewriting code to address. Rust code, by contrast, tends not to have these issues in the first place (outside the implementation of synchronization primitives): it stores reader and writer methods on separate handle objects, uses Arc to manage cross-thread shared memory, and so on, which makes the code either correct or tractable to learn and make correct.
I also struggle to understand the threading model of COM and C libraries like libusb (https://libusb.sourceforge.io/api-1.0/libusb_mtasync.html), though that might just be me, and each library tends to have a different threading model. Rust's Send/Sync is a 90% solution which you can learn upfront, is checked by the compiler, and applies to all libraries and works for most use cases.
> In my experience with multi-threaded programming, C++ code with "careless sharing issues" is often filled with multiple threads accessing the same object and relying on convention to avoid calling the wrong thread's methods
Right, I can see the kind of codebase you're referring to.
I don't see Rust as a magical weapon solving concurrency issues though. Namely because Rust (the compiler) has a very limited view of what happens in the lifetime of a multithreaded system, and no view at all of the lifetime of a multiprocess system.
Even when writing purely single threaded Rust, you quickly end up having to let go of the strictly static memory sharing checks and switch to dynamic ones.
I have yet to find a use case where Rust solves anything but the most blatant synchronization issues.
> IMHO the far better and still my preferred approach would have been to favor a per-thread GIL with an explicit mode to share particular objects.
You just described Ractors in Ruby, which didn’t turn out great. The setup cost for either freezing or copying memory to the target ractor to guarantee memory safety is often higher than the perf gains of the parallelism.
Not that it can’t work or won’t be improved. But there is a real world case study of what you’re recommending that we can reference without having to guess.
>Free threading as introduced by PEP 703 is well known to be error-prone, hard to get right and generally advised against, unless you know exactly what you are doing.
Wouldn’t that require running an interpreter in each thread? How on earth could that be a “sane multi threading model”?
>offer a safe path to concurrent programming
Uh, doesn’t Python support this already? Python has “concurrency” from async and parallelism from multiprocessing. What it doesn’t support is thread-based parallelism. What you’re suggesting (if I understand it) is an implementation of parallelism that is barely different at all from the typical Python parallelism approach of using multiprocessing to achieve parallelism.
> a per-thread GIL with an explicit mode to share particular objects
This is like Ruby's Ractors and I haven't really seen that be super successful so far. The "Objects" that need to be shared are things like class definitions, etc.... there are a ton of subtle issues with objects that are being marked as sharable when they should really not be or vice versa.
That's not quite true. For example, most operations on lists and dicts are thread-safe in the current version of python. You can't say that about languages with true multithreading.
The difference is that it's a language feature in python whereas the java equivalents had to be written with locking or other approaches for handling concurrent access.
I want to agree with you. My hope is that PEP 703 can lead to better implementations of even your safe / structured concurrency model. It will after all still be possible to use, it just won't unfortunately be the only option.
Just like you can today spawn threading.Threads one by one in python, or just map a bunch of function calls over a threadpool using concurrent.futures thread pool.
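The two styles the comment mentions, sketched side by side (the `work` function is a stand-in): managing `threading.Thread` objects by hand versus mapping over a `concurrent.futures` thread pool.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def work(x):
    # stand-in for blocking I/O or a C call that releases the GIL
    return x * x

items = [1, 2, 3, 4]

# Style 1: spawn threading.Threads one by one
results = [None] * len(items)

def worker(i, x):
    results[i] = work(x)

threads = [threading.Thread(target=worker, args=(i, x))
           for i, x in enumerate(items)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Style 2: map the calls over a thread pool (usually more ergonomic)
with ThreadPoolExecutor(max_workers=4) as pool:
    pooled = list(pool.map(work, items))

print(results, pooled)  # both [1, 4, 9, 16]
```

Either way, under the current GIL these threads only overlap during blocking calls; free threading is what would let the `work` bodies themselves run in parallel.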
It's simply not "sane multithreading" not being able to run Python (byte)code concurrently. It's just a huge annoyance.
> because it might lose its beginner friendlyness
Python is not beginner friendly. It has some of the worst documentation out there. It also does things very differently compared to other languages (C#, PHP, Java, JS, ...). I would advise anyone against learning Python as their first language, even though Python is my favorite language.
> And Python already has an expert mode - called Cython or Numba, to name just two.
CPython is the "official" Python. How is it beneficial to depend on some 3rd party projects? I don't want to use such projects.
As to the PEP itself: It's optional, with the GIL being enabled by default. So what's the problem...
> It's simply not "sane multithreading" not being able to run Python (byte)code concurrently
Please read my post again. I am advocating a per-thread GIL model with an explicit feature for sharing selected objects. This allows for all-cores concurrency, essentially like free threading yet with safeguards. I like to call that a sane way because it builds on decades of research and industry experience in the software engineering community.
> As to the PEP itself: It's optional, with the GIL being enabled by the default.
It's not optional if you build tools and libraries that need be able to run with both gil and no-gil.
> I am advocating a GIL per-thread model with an explicit feature for sharing selected objects
Is there a PEP or something written about it? I don't like this idea at first glance. Feels like another hack. Though I have to admit it sounds better than the annoying multiprocessing approach.
> It's not optional if you build tools and libraries that need be able to run with both gil and no-gil.
Why do they need to? A library dev is free to not support one or the other. If a library then does not provide the preferred mode of the user, that's bad, but that's life. Ideally most libraries will support the GIL version (that's still the default anyway!), but provide a no-GIL version as a bonus. For example, PyTorch could provide thread-based data loaders if the no-GIL version is in use.
>> I am advocating a GIL per-thread model with an explicit feature for sharing selected objects
> Is there a PEP or something written about it? I don't like this idea at first glance. Feels like another hack. Though I have to admit it sounds better than the annoying multiprocessing approach.
It's basically becoming reality already, PEP 684 per-interpreter gil is the required structure for this (coming in Python 3.12), then only the interface on top of that remains to be exposed, see WIP like https://github.com/jsbueno/extrainterpreters The full interface to separately-locked interpreter threads is coming in <some future version of python>, independent of PEP 703 work.
* It's hard to program correctly using the thread model. And saying that locks are the solution to thread safety issues in Java, is like saying that malloc/free is the solution to memory safety issues in C.
* Threads are also relatively fat. You'd probably think twice about spinning up 1000 of them.
* CompletableFutures came in with Java 8 and are lighter weight, but have certain implementation defects - you can't cancel them, for example. Although there are some instances in which regular threads aren't cancellable either. Also, the API isn't terribly popular. I happen to like flatmapping my way around a codebase, but plenty of devs' eyes glaze over when they read that code.
* Fibers are released I think - at least in beta. More like JS's model I believe. They're even lighter weight and get rid of the flatmappy stuff again. I assume it's still up to the programmer to get the locking right.
If I remember correctly, Guido van Rossum did mention the status of the GIL in one of the Lex Fridman episodes [1] he was on (it's been a while, so I may be misremembering). Surprised to see a big decision like this happen so quickly. Did Meta's announcement play a big role in this [2]?
It sure did. Also the decision was pretty much rushed through, if you follow the latest discussion (which got shut down on the notion that everything has been said and trust the SC and coredevs).
Removing him as BDFL was probably the best thing to have happened to Python. He never made performance a top priority, at least not the way Lua, JavaScript and Java did. Even Ruby has a JIT now.
> Removing him as BDFL was probably the best thing to have happened to Python. He never made performance a top priority
That doesn't follow; you're assuming that Python should prioritize performance as a top priority, which is very much not a given. Python has always excelled at being easy to use, being flexible, being a great glue language - but performant? An interpreted, dynamically typed language? That's like making a C interpreter - you can do it, but that doesn't make it a good idea.
His leadership & decision arbitration are dearly missed. Now we have decision by committee, and as a result we see Python being pushed and pulled in all directions.
This seems somewhat delayed, and it may be considered too little, too late. The Python community had the chance to leapfrog and embrace alternative concurrency abstractions, such as goroutines, but it appears that this opportunity was not fully utilized.
After enduring the arduous process of migrating from Python 2 -> 3 and navigating the complex world of dependencies, my hope is that we won't encounter another nightmare of dependency management, forcing users to choose between GIL and no-GIL builds.
Is it really too late to not do this ? The only reason to get rid of the GIL is to help threading, but that's not a thing we should be doing. Threads need to just die, and be replaced by something less idiotic. Seriously, having the CPU run fragments of your program at random, so that all the previously ordered pieces are now contending with each other and even themselves ? How can anyone not see that this is the stupidest idea in the world ?
In Python, asyncio and multiprocessing packages can get nearly the same or better performance for IO- and CPU-intensive tasks respectively as no-GIL multithreading (and are more performant than GIL multithreading), with only a tiny fraction of the pitfalls. For any use case where the last few percent matter, consider not using Python (which will be much much more significant).
Regardless, we did somehow end up here, and there's plenty of multi-threaded Python code that would benefit from no-GIL, so I support the proposal just from a practical perspective. But when designing a new codebase, you'll almost almost almost always want to avoid Python threads, even with no-GIL.
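A minimal sketch of the multiprocessing route recommended above for CPU-bound work (function names are illustrative): each worker is a separate interpreter with its own GIL, so the map below can use multiple cores.

```python
from multiprocessing import Pool

def cpu_task(n):
    # stand-in for genuinely CPU-bound work
    return sum(i * i for i in range(n))

def run():
    with Pool(processes=4) as pool:
        # each argument is pickled to a worker and the result pickled
        # back -- the IPC overhead the sibling comments discuss
        return pool.map(cpu_task, [10_000] * 8)

if __name__ == "__main__":
    print(len(run()))  # 8 results, computed across processes
```

The pickling round trip is exactly the cost no-GIL threading would avoid; whether that matters more than Python's baseline interpreter overhead is the question raised below.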
> asyncio
...is useless for CPU-bound tasks. The event loop uses only one core.
> multiprocessing
...relies on IPC and running actual system processes, both of which have a lot more overhead than switching thread context and using shared memory.
> For any use case where the last few percent matter, consider not using Python (which will be much much more significant).
Here is an interesting question: if asyncio and multiprocessing already give us "nearly the same or better performance", then why is "use another language" such common advice for escaping parallelism problems in Python?
Because, curiously enough, the languages that are usually recommended for this (C, Go, Rust, C++, Java) all implement thread-based parallelism.
If you're interested, I wrote a more elaborate comment on asyncio vs. multiprocessing vs. multithreading here: https://news.ycombinator.com/item?id=36915581 (fyi, Python's multiprocessing has support for interprocess shared memory, with overhead)
Those languages you mentioned are not only recommended to escape parallelism problems in Python. They are recommended because they are much, much faster, period. Four of those you listed compile down to machine code, and the last one has a highly optimized JIT. Just two are garbage-collected, and none do reference counting. All of them are statically typed. These differences add up to orders of magnitude. If well-written Python code were equally fast as well-written C code but the only way to do parallelism was using processes, then I promise you wouldn't hear (most of) those voices telling you to escape Python. Plenty of well-written C programs choose a multiprocessing approach for parallelism over multithreading.
Long story short, if you worry about the performance overhead between multithreading and multiprocessing, make sure you worry about plenty of more significant factors that differentiate Python from faster languages first.
> (fyi, Python's multiprocessing has support for interprocess shared memory, with overhead)
I know. I have a werkzeug/gunicorn application that currently runs 60 worker processes on a 64 core server.
And I would love nothing better than to rip out every. single. last. one. of the IPC facilities, and replace them with the same easy and convenient mutex and CSP systems, that I can use in my Golang applications.
> They are recommended because they are much, much faster, period.
I know, but "rewrite it in Rust/Go/C++" isn't always an option, be it because of legacy status, constraints in available dev-hours, compatibility problems, library support or simply ease of use.
Keep the GIL and avoid the problems its removal will cause. Allow parts of your program to run in a separate namespace with explicit passing of objects. No sharing means no contention, so no overhead.
I wonder if there will ever be Python 4, it seems that the core developers want to avoid bumping the major version number ever again after 3 under any circumstances.
Yes, because they fear a Py3-to-4 transition would be perceived as a major burden.
I'm afraid we'll soon learn it's not the version number that's burdensome, but the real or perceived(!) incompatibility between versions.
I wonder if introducing such a monumental change in a build flag of a minor version is really wise. Certainly it's not in line with any interpretation of semantic versioning (to be fair, I don't think the PSF claims to use that).
Let us wait and watch, but I somehow feel that this no-GIL mode is just a band-aid for Python's performance problem. The cause goes deep inside the core of Python; it gradually came to this stage as more and more features got added to the language since the 3.x transition.
I think new language features shouldn't be added just to provide syntactic sugar or coding shortcuts to programmers, or just because a certain feature has become very cool (like lambda functions, for example).
I'm glad that the Python community has realized that performance is an issue and started working on things like no-GIL mode.
People often say that Python's biggest strength is its readability and easy syntax, but I disagree. Python's real strength is the enormous third-party library ecosystem: popular packages like numpy, pandas, scikit, etc. which have almost become addictive in most data science projects. But now, people are thinking of alternatives to these due to Python's performance issues. Other ecosystems like golang and rust are getting built at a rapid pace, and at some point they will also have (more performant) equivalents of these packages if the public shows enough interest.
The creator and lead maintainer of SQLAlchemy, one of the most popular and most used Python libraries for accessing databases (who doesn't?) gave a rather interesting response to PEP703.
> Basically for the moment the GIL-less idea would likely be burdensome for us and the fact that it's only an "option" seems to strongly imply major compatibility issues that we would not prefer.
(...)
> Adding an entirely new mode of operation to cPython that's optional would be an enormous burden for us as far as ensuring we use APIs appropriately, adding support, testing, we would have to spin up new test workers to test SQLAlchemy in both modes of operation, we would be getting strange new race condition related issues reported
It will be interesting to see how this will be executed. I think many in Ruby land wanted something similar but couldn't reach general agreement. Ractors tried and arguably failed. We have Samuel Williams as basically the one person pushing very hard for async changes.
Ruby could learn a lot once this is done, but at the moment a GIL-optional Python seems to be a 2030 goal.
I'm looking forward to this, Python plays a fairly significant role in our scientific computing code and not having to have entirely separate processes will be very convenient for cutting down data duplication.
Afaik, nogil will be a compile flag, which means that when there are two builds, you separately compile Gil and nogil builds. They will be two separate programs/binaries/packages. It could be possible for something like conda to install both binaries, then run your program with the one that matches the library flags, but python itself could not do this (afaik).
there's a lot of code I wrote (and saw people write) in Python over the years, conscious that no one will ever run it in threads (ofc it's possible, but typically pointless), thus going quite easy on things that wouldn't be thread-safe. this used to be quite a comfortable stance. community ended up inventing other ways to share state, other ways to vectorize, other ways to avoid blocking on I/O, that might sometimes be annoying, but evolved to be quite reasonable for Python.
giving up this stance? a lot of code instantly becomes legacy, and a lot of it is legacy people won't even know about before they notice the problems. and for what?
i must say that i have no experience running Python without GIL so my idea of the ways things can be not thread-safe is purely speculative/borrowed from very different languages (that I finally moved on to long ago, thank god). so maybe i'm wrong, i misunderstand the impact, and all this code is just fine
people in this thread mention that, for some reason, "even with GIL you still have to write thread-safe code", which is an admirable stance, but I don't think many people do it, because their webserver or whatever uses the many single-threaded processes model and they don't want to waste time on that
Indeed. And the reason they use multiprocessing is because they have learned that Python's multithreading is not a good option for CPU-bound tasks. The blessing in disguise, of course, is that multiprocessing is also a shared-nothing model, so (mostly) lock-free programming is the default. On the other hand, if you do need concurrently accessed shared memory/resources and the locks that go with them, it comes with an explicit cost. I think that's a good thing.
In the future the default concurrency model will be shared everything free threading, and all hell might break loose. Hopefully not.
IMO Python doesn't even offer enough in the way of good synchronisation primitives for the async code. I don't think the ecosystem is ready for this one.
I hope it'll be recognised by most "host" applications (uvicorn or whatever) that multi-threading is not a good idea anyway, and they'll discourage it. But there'll definitely be a macho-land of thread-"safe" programmers whose bugs we'll be downstream of.
Yeah it's going to be weird for some years where some libraries support no-GIL and others don't, while folks cry about the ones that don't support it holding them back.
Like asyncio's introduction we'll probably see core stuff like http requests, file IO etc. all now have an entirely new permutation of libraries made to support non-GIL mode. This is going to get pretty spicy as stuff like http already has regular (blocking IO) and asyncio (non blocking IO) versions, so now do they need regular non-GIL and asyncio non-GIL versions too? Is the default for a library author going forward to be creating four permutations of your library with vastly different behavior in each of them? Yuck.
Not really. The GIL doesn’t actually make threading easier for a typical developer as they still have to worry about thread safety. You can ignore locks if you know what Python operations are atomic. But that’s incredibly perilous and you really shouldn’t try given that relies on implementation details. Eg. What if you didn’t realize a setter was overridden and setting to a dict-like isn’t atomic anymore?
It’ll make the Python source code much more complex and complicated, which is probably not a big deal, though I’ll say the CPython source is quite brilliant.
It’ll also mean for C library developers that they can’t assume Python opcodes are atomic. But I’m not sure C library developers will really mind too much because they already worry about this kind of stuff.
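A hedged sketch of the advice above (the `SafeCache` class is illustrative): don't rely on dict operations happening to be atomic; take a lock so the code stays correct even if that implementation detail changes or a subclass overrides `__setitem__` with a multi-step version.

```python
import threading

class SafeCache:
    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def put(self, key, value):
        with self._lock:              # explicit, future-proof synchronization
            self._data[key] = value   # safe even if this stops being atomic

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

cache = SafeCache()
cache.put("a", 1)
print(cache.get("a"))  # 1
```

The lock costs almost nothing when uncontended, and the code no longer depends on which CPython operations happen to execute as a single bytecode.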
Only if the user explicitly uses threads. By default, people can still approach their code in the exact same way they do now. I imagine most users won't think about threads at all, but may rely on frameworks and libraries that take advantage of them under the hood.
It's only more performance if you're using the threading primitives: spawning new threads, moving work to them to run in parallel, etc. That isn't something any beginner will consciously be doing. It might actually be slower in the regular single-process use that 99% of Python users stick to (since the GIL is there for a very good reason, and synchronizing access to Python's internal state doesn't just happen for free or without some cost somewhere).
I think multi-threadedness is already an intermediate-level concept, so maybe it's not a big downside. In turn, the ones that understand it and need the performance get it.
The underlying pressure on the Python ecosystem is to transition to a post-Moore's law era and effectively become a HPC platform where the "same" code runs on a CPU, a GPU, multicore, clusters etc.
Python may feel the pressure more than others because of the GIL and the fact it is used in compute intensive tasks more than others.
But this major need to transition to easy and seamless HPC/heterogeneous computing is the same for all languages. The question is who will get there first.
Mojo is a superset of Python, with the goal of being able to run any Python code and import any Python module under its own execution model, which runs orders of magnitude faster. Now, if that actually works as advertised, it would render Python obsolete in favor of Mojo, even if the authors don't put it that way.
Lex Fridman's podcast had an interview with one of the creators recently: Chris Lattner, who is also the creator of LLVM and Swift. Recommended listening for those who haven't heard of Mojo yet, if you have 3 hours to spare.
"Python 4, but not really", because we want to squeeze out more multithreading performance and be cool again. Some questions from reading the OP:
- How much does performance improve due to this No-GIL thing? Is it greater than 2x? For what workloads?
- Do I have to compile two versions of every extension (gil/nogil)? I would prefer building extensions does not get any more complicated.
- Can I automatically update my code to handle nogil? (a tool like lib2to3 or six)
GIL is a promise from the internal implementation (eg CPython) of Python that things will happen atomically within the Python interpreter. This means that when multiple threads try to access and modify Python objects at the same time, the GIL ensures that only one thread can execute Python bytecode at any given moment, preventing potential conflicts and ensuring data integrity. However, this comes at the cost of limiting the full utilization of multiple CPU cores for certain CPU-bound tasks.
Non-GIL adds some complexity to the implementation and some risk when writing multithreaded code at the benefit of improving performance.
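One way to see why "one bytecode at a time" is weaker than "one statement at a time": a single `+=` on a dict entry expands to several instructions, and the interpreter may hand the GIL to another thread between any two of them (the `bump` function is illustrative).

```python
import dis
import io

def bump(counter):
    counter["n"] += 1

# Capture the bytecode listing for the function above
buf = io.StringIO()
dis.dis(bump, file=buf)
listing = buf.getvalue()
print(listing)
# The listing shows separate load / add / store-subscript steps
# (exact opcode names vary by CPython version).
```

So the GIL guarantees atomicity only per bytecode instruction, not per line of source, which is why the thread-safety caveats in the surrounding comments apply even today.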
But people are not doing CPU-bound tasks in native Python (the performance is a joke); it all comes down to calling a C library that is optimized for compute. What will removing the GIL change here?
I get that it's a hard problem, but it isn't a guessing game. Concurrency has always required one to carefully design their program. We will see how Python implements these new APIs, but I trust it will be approachable to those who want to do it.
It's not about the API. The multithreading API (in any language) is easy:
1. Start a thread
2. Give it work to do (data + code)
The problems start when multiple threads work on the same data and compete for the same system resources.
Agree, it's not a guessing game. But it is a huge state machine with unobvious hidden states. The cure is locks everywhere, and the magic is to find the smallest number of locks, in the right places, that still makes it fast and correct in any and all situations (i.e., free of race conditions).
It's a much harder problem to reason about than most people realize.
Python uses a GIL (global interpreter lock), which prevents Python bytecode in different threads from executing in parallel within one interpreter (native C modules can release the GIL, but only while not touching Python objects). Removing the GIL means Python could provide in-process parallelism.
Slightly faster performance when you write multithreading code correctly (much easier said than done). Very few python devs are actually running into this as a bottleneck day to day.
so that "PEOPLE WILL STOP WHINGING ABOUT THE GIL."
To be fair, there's another group of people who stand to benefit. Namely, any python programmers who currently believe that "threads are an easy way to make my program go faster" will soon be the recipients of a valuable learning experience.
Considering Python is probably the most popular language ever, I would say the members of the council keep a pretty low profile. There are declining languages with userbases orders of magnitude smaller that make the front page more often.
Approving a PEP isn’t just flipping a bit, there are other decisions which come with it; this is process transparency and, implicitly, calling for feedback relevant to those other bits.
The ruling class in python-dev are populists who are not threading experts. Python is run by the wrong people.
They will approve something if it serves a corporation. The submission here is likely CYA, so they can say that "they asked the community".
There is no appreciation for people doing grassroots open source software. If Instagram can add another hack instead of switching to Java, it will be approved.
It is important to remember that paid corporate developers will have job security every time new pain is introduced in the Python ecosystem.
Seems these guys never learn from the Python 2 -> 3 mess. GIL to no-GIL is even worse. Most changes from Python 2 to 3 were syntactic, but GIL to no-GIL could require a complete redesign and rewrite. Either it takes another 10+ years to migrate, or nobody with a realistic production codebase with 10k dependencies can turn this no-GIL thing on.
I think the GIL has been a blessing and a curse.