What's up, Python? The GIL removed, a new compiler, optparse deprecated (bitecode.dev)
400 points by BiteCode_dev 10 months ago | 292 comments



Recent and related:

Intent to approve PEP 703: making the GIL optional - https://news.ycombinator.com/item?id=36913328 - July 2023 (488 comments)


Historically I’ve written several services that load up some big datastructure (10s or 100s of GB), then expose an HTTP API on top of it. Every time I’ve done a quick implementation in Python of a service that then became popular (within a firm, so 100s or 1000s of clients), I’ve ended up having to rewrite it in Java so I can throw more threads at servicing the requests (often CPU heavy). I may have missed something but I couldn’t figure out how to get the multi-threaded performance out of Python, but of course no-GIL looks interesting for this!


I would consider the following optimizations first before attempting to rewrite an HTTP API since you already did the hard part:

1. For multiple processes, use `gunicorn` [1]. It runs your app across multiple processes without you having to touch your code much. It's the same as having n instances of the same backend app, where n is the number of CPU cores you're willing to throw at it. One backend process per core, full isolation.

2. For multiple threads, use `gunicorn` + `gevent` workers [2]. Provides multiprocessing + multithreaded functionality out of the box if your workload is IO-intensive. It's not perfect, but it works very well in some situations.

3. Lastly, if CPU is where you have a bottleneck, that means you have some memory to spare (even if it's not much). Throw an LRU cache or cachetools [3] over functions that return the same result or that do expensive I/O.

[1]: https://www.joelsleppy.com/blog/gunicorn-sync-workers/

[2]: https://www.joelsleppy.com/blog/gunicorn-async-workers-with-...

[3]: https://pypi.org/project/cachetools/


These don't really apply to the parent commenter's scenario.

1) gunicorn or any solution with multiple processes is going to just multiply the RAM usage. Using 10-100GB of RAM per effective thread makes this sort of problem very RAM bound, to the point that it can be hard to find hardware or VM support.

2) This isn't I/O bound.

3) If your service is fundamentally just looking up data in a huge in-memory data store, adding LRU caching around that is unlikely to make much of a difference because you're a) still doing a lookup in memory, just for the cache rather than the real data, and b) you're still subject to the GIL for those cache lookups.

I've also written services like this; we only loaded ~5GB of data, but that was enough to be difficult to manage in ways like these. The GIL-ectomy will probably have a significant impact on these sorts of use cases.


For #1, would copy on write help? Or does python store the counters on the objects?


Ha! Yes! Unfortunately I know this because of terrible reasons. Python is reference counted, and the counters live on the objects themselves, so merely reading an object writes to its memory page; copy-on-write doesn't work for this with Python objects (note: if your Python object is actually just a reference to a native object in a library, all bets are off, it may work or may not).

We had an issue with the service I mentioned above where VMs with ~6GB RAM weren't working, because at the point that gunicorn forked there was instantaneously >10GB of RAM usage, since everything got copied. We had to make sure that the data file was only loaded after the daemon fork, which unfortunately limits the benefits of that fork: part of the idea is that you do all your setup before forking so that you know you've started cleanly.


> 1. For multiple processes, use `gunicorn`

This will load up multiple processes, like you say. OP loads a large dataset, and Gunicorn would copy that dataset into each process. I have never figured out shared memory with Gunicorn.


> Gunicorn would copy that dataset into each process

Assuming you're on Linux/BSD/MacOS, sharing read-only memory is easy with Gunicorn (as opposed to actual POSIX shared memory, for which there are multiprocessing wrappers, but they're much harder to use).

To share memory in copy-on-write mode, add a call to load your dataset into something global (i.e. a global or class variable or an lru_cache of a free/class/static method) in gunicorn's "when_ready" config function[1].

This will load your dataset once on server start, before any processes are forked. After processes are forked, they'll gain access to that dataset in copy-on-write mode (this behavior is not specific to python/gunicorn; rather, it's a core behavior of fork(2)). If those processes do need to mutate the dataset, they'll only mutate their copy-on-write copies of it, so their mutations won't be visible to other parallel Gunicorn workers. In other words, if one request in a parallel=2 gunicorn mutates the dataset, a subsequent request has only a 50% likelihood of observing that mutation.
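For concreteness, here's a minimal sketch of that "when_ready" approach; the module name, loader, and path are hypothetical stand-ins for your own code:

    # gunicorn.conf.py
    import myapp.data  # hypothetical module holding a DATASET global

    def when_ready(server):
        # Runs once in the master process, before workers are forked, so
        # the loaded structure is inherited copy-on-write by every worker.
        myapp.data.DATASET = myapp.data.load_dataset("/path/to/big.file")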

If you do need mutable shared memory, you could either check out databases/caches as other commenters have mentioned (Redislite[2] is a good way to embed Redis as a per-application cache into Python without having to run or configure a separate server at all; you can launch it in gunicorn's "when_ready" as well), or try true shared memory[3][4]

1. https://docs.gunicorn.org/en/stable/settings.html#when-ready 2. https://pypi.org/project/redislite/ 3. https://docs.python.org/3/library/multiprocessing.html#share... 4. https://docs.python.org/3/library/multiprocessing.shared_mem...


One way to achieve similar performance is redis or memcached running on the same node. It really depends on the workload too. If it is lookups by key without much post-processing, that architecture will probably work well. If it's a lot of scanning, or a lot of post-processing, in-process caching might be the way to go, maybe with some kind of request affinity so that the cache isn't duplicated across each process.


> I may have missed something but I couldn’t figure out how to get the multi-threaded performance out of Python

Multiprocessing. The answer is to use the python multiprocessing module, or to spin up multiple processes behind wsgi or whatever.

> Historically I’ve written several services that load up some big datastructure (10s or 100s of GB), then expose an HTTP API on top of it.

Use the Python multiprocessing module. If you've already written it with the threading module, it is close to a drop-in replacement. Your data structure will live in shared memory and can be accessed by all processes concurrently without incurring the wrath of the GIL.

Obviously this does not fix the issue of Python just being super slow in general. It just lets you max out all your CPU cores instead of having just one core at 100% all the time.
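A minimal sketch of the process-based version (the handle function is a made-up stand-in for real work):

    from multiprocessing import Process  # mirrors threading.Thread's API

    def handle(n):
        # stand-in for CPU-heavy request work; each call gets its own core
        print(sum(i * i for i in range(n)))

    if __name__ == "__main__":
        procs = [Process(target=handle, args=(5_000_000,)) for _ in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()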


Multiprocessing is not a real solution; it's a break-glass procedure for when you just need to throw some cores at something without any hope of reliability. Unless something has changed since I used Python, it is essentially a wrapper around fork().

This means you need to deal with stuck/dead processes. I’ve used multiprocessing extensively and once you hit a certain amount of usage, even in a pool, you just get hangs and unresponsive processes.

I’ve also written a huge amount of Cython-wrapped C++ code which releases the GIL. This never hangs, and I can multithread there all I want without issue.


Why would they get stuck/dead, and why wouldn't that happen with threads, which might be even worse as they're more tightly bound? At least with zombies or inactive processes you can detect and kill them externally, if need be.

Haven't played with multiprocessing at scale, so am genuinely interested.


If subprocesses die (segfault, maybe), it isn't uncommon for them not to be cleaned up, and/or for the parent process to hang while it waits for the zombie to respond. That's one I experienced last week on Python 3.9. A thread that experienced that would likely kill the parent process, or maybe even exit with a stack trace. Way easier to debug, and it doesn't require me to search through running tasks and manually kill them after each debug cycle.

My impression is that the multiprocessing module is a heroic effort, but unfortunately making the whole system work transparently across multiple OSs and architectures is a nearly insurmountable problem.


You may be interested in the concurrent.futures library, available for over a decade now. It keeps you from shooting yourself in the foot like that.

https://docs.python.org/3/library/concurrent.futures.html


Why do you think it would help?

It provides a nice interface, but it uses multiprocessing or multithreading under the hood, depending on which executor you use:

> The ProcessPoolExecutor class is an Executor subclass that uses a pool of processes to execute calls asynchronously. ProcessPoolExecutor uses the multiprocessing module, which allows it to side-step the Global Interpreter Lock but also means that only picklable objects can be executed and returned.


Your trouble seems to involve not understanding how to set up signal handlers, which ProcessPoolExecutor handles for you and exposes via a BrokenProcessPool exception.


> Derived from BrokenExecutor (formerly RuntimeError), this exception class is raised when one of the workers of a ProcessPoolExecutor has terminated in a non-clean fashion (for example, if it was killed from the outside).

What if it hangs?


That isn’t the scenario originally described, but there is a timeout parameter in future.result().
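For illustration, a minimal sketch combining that timeout with the BrokenProcessPool handling mentioned above (the work function is made up):

    from concurrent.futures import ProcessPoolExecutor, TimeoutError
    from concurrent.futures.process import BrokenProcessPool

    def work(n):
        # stand-in for a CPU-heavy task
        return sum(i * i for i in range(n))

    if __name__ == "__main__":
        with ProcessPoolExecutor(max_workers=4) as pool:
            future = pool.submit(work, 10_000_000)
            try:
                print(future.result(timeout=30))  # don't wait forever on a hung worker
            except TimeoutError:
                print("worker did not respond in time")
            except BrokenProcessPool:
                print("a worker died non-cleanly (e.g. segfault or external kill)")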


Always setting a timeout on every IPC or network operation helps immensely. IIRC multiprocessing module allows that everywhere, but defaults to waiting forever in a couple of places.


Zombies don't respond; they merely have to be wait()'d for, which should take microseconds at most.

I've seen orphaned processes sometimes idle, sometimes busy doing god knows what. But zombies, OTOH, are rarely a problem, and should be easy to deal with.

Perhaps Python's desire to be Windows-compatible militates against some design more suitable for Unix.


Yep, multiprocessing is a cope.

If processes were a universal substitute for threads, we wouldn't have threads. That reasoning only gets stronger when you apply Python's heavy limitations, and strongest once you experience the awkwardness of multiprocessing firsthand.


There isn't much difference on Linux between threads and processes that share memory. Multiprocessing is fine, it's just slightly more isolated threads.


That's why I took special care to mention how python's multiprocessing module was particularly poor.


multiprocessing is a very good solution for scatter-and-gather (or map/reduce) type workloads: for example, ssh to 1000 machines, run some commands, grab the output, analyze it, take some action based on it, etc.

if you are managing a fleet of machines and have some tasks to do on each machine, then multiprocessing is a life saver.


There is a "fork" mode and a "spawn" mode. Fork (the default) tends to result in broken process pools as you say, spawn seems to work a lot better but the performance is worse.


I’m not a huge fan of Cython and the like. It seems more natural to open a TCP connection to a C/C++ program and let that do the heavy lifting. Anything else doesn't seem like a proper UNIX-style solution.


That's not natural at all. Eg pybind11 is more natural.


I want to warn people against multiprocessing in python though.

If you're thinking about parallelizing your Python process, chances are your Python code is CPU-bound. That's when you should stop and think, is Python really the right tool for this job?

From experience, translating a Python program into C++ or Rust often gives a speed-up of around 100x, without introducing threads. Go probably has a similar level of speed-up. So while you can throw a lot of time fighting Python to get it to consume 16x the compute resources for a 10x speed-up, you could often instead spend a similar amount of time rewriting the program for a 100x speed-up with the same compute resources. And then you could parallelize your Go/Rust/C++ program for another 10x, if necessary.

Of course, this is highly dependent on what you're actually doing. Maybe your Python code isn't the bottleneck, maybe your code spends 99% of its time in datastructure operations implemented in C and you need to parallelize it. Or maybe your use-case is one where you could use pypy and get the required speed-up. I just recognize from my own experience the temptation of parallelizing some Python code because it's slow, only to find that the parallelized version isn't that much faster (my computer is just hotter and louder), and then giving in and rewriting the code in C++.


The first thing you should do is profile the code (py-spy is my preferred option) and see if there are any obvious hotspots. Then I'd actually look at the code and understand its structure. For example, are you making lots of unnecessary copies of data? Are you recomputing something expensive you could store (functools.cache is one line and can make things much faster at the cost of memory)?
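For example, a minimal functools.cache sketch (the expensive function is a made-up stand-in):

    import functools

    @functools.cache  # one line; trades memory for not recomputing
    def expensive(n: int) -> int:
        # stand-in for something genuinely slow
        return sum(i * i for i in range(n))

    expensive(10_000_000)  # computed once
    expensive(10_000_000)  # served straight from the cache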

Once you've done that, you should be familiar enough with the code to know which bits are worth using multiprocessing on (i.e. the large embarrassingly parallel bits), which, if they are a significant part of your code, should scale near-linearly.

The other thing to check is which libraries you are using (and what your dependencies are using). numpy now includes OpenBLAS (though MKL may be faster for your use case), but sometimes you can achieve large speedups by choosing a different library, or by making sure the optimized builds are actually being used.


Is there a better resource than the py-spy docs for figuring out how to use it?


>Use the Python multiprocessing module. If you've already written it with the threading module, it is close to a drop-in replacement. Your data structure will live in shared memory

Only if it can be immutable. So it can't be shared and changed by multiple processes as needed (with synchronization).

And even if you can have it mostly immutable, if you need to refresh it (e.g. after some time read a newer large file from disk to load into your data structure), you can't without restarting the whole server and processes.

So, it could work for this case, but it's hardly a general solution for the problem.


For this use case it would be better to put the data in a shared SQLite database than relying on multiprocessing CoW.

Even accessing objects from the shared memory would cause the reference counter to increment and the data would be copied, causing a memory usage explosion.


>For this use case it would be better to put the data in a shared SQLite database than relying on multiprocessing CoW

In Python yes. In Java you could take advantage of shared memory and get spared the overhead of SQLite.


Nowadays multiprocessing is rarely the answer. Between all the gotchas (memory usage can be horrific, you have to be careful what you modify, etc.), it's usually more trouble than it's worth.

Numba is usually a better solution when you want to run some computationally expensive Python code that itself calls numpy, etc.

For the parent commenter's use case though that wouldn't be a great solution either. In general, Python does not have an optimal way of operating on a shared data structure across OS threads and certainly not in a way that doesn't require forking the interpreter.
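For reference, a minimal sketch of the kind of numba usage meant here; the rms function is invented for the example:

    import numpy as np
    from numba import njit, prange

    @njit(parallel=True)  # compiled to machine code; prange spreads iterations across cores
    def rms(a):
        total = 0.0
        for i in prange(a.size):
            total += a[i] * a[i]  # numba recognizes this as a parallel reduction
        return (total / a.size) ** 0.5

    print(rms(np.random.rand(10_000_000)))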


You have to be much more careful about what you modify when using multithreading, so I'm not sure what you mean by that.

A lot of people here mention that sharing data is much easier with multithreading, but doing this without races is not easy.

You can't just use values from different threads like you would in normal code; you need to synchronize access with locks, which can be difficult to do correctly and can harm performance in a lot of cases.

I think a lot of the people who complain about the GIL are going to become acutely aware of why it was useful when they attempt to use GIL-less multithreading, and realize that removing it wasn't as great as it sounded at first!

In my experience, most problems are inherently synchronous with lots of mutable state and complex data dependencies, or inherently parallel with lots of tasks that can run independently. Problems that can be easily parallelized already work fine with multiprocessing! Problems that can't be easily parallelized are not something you can just slap some threading on to get more performance, and will require a lot of work to keep state synced!

This is just my opinion though and I'm sure there are plenty of domains that I don't have experience with that will benefit from no-GIL python!


> Problems that can be easily parallelized already work fine with multiprocessing!

Yeah, except AFAIK you pay more in context switches, and sharing is more cumbersome. Also, the language runtime of each single process is likely working with less information, and you end up using more memory across multiple runtime instances.

Frankly I'd just use Java or Go at that point and not even bother


Multithreading is hard but once you have been doing it a while, it becomes easy and most importantly, it’s stable.

When you have to deal with processes, there’s a lot of external factors out of your control because processes are much more visible and carry a lot of extra baggage.

Hard multithreading problems are fun. Hard multi-process problems are just tedious.


As I understand it, on Linux processes and threads are implemented in almost the same way, except that threads share memory. I've heard it said several times that the idea that processes are "heavier" is a bit of a myth. I guess they need to allocate heap space and threads don't. I'm not an expert, just mentioning it because it sounds like you might believe something at odds with what people say about processes and threads on Linux.


I'm not a Linux kernel dev but I think this is true! Not sure what's up with the downvotes.

You can create a process/thread chimera with certain system calls (clone(2) takes flags controlling exactly what is shared), and get something in-between a thread and a process if you want, which is neat but maybe not that useful.

Creating processes on Linux is actually much faster than people seem to realize. I can spawn at least a few thousand a second from a quick test of spawning bash instances.


Not sure why this is directed at my comment-- I didn't touch on synchronization.

Yes, locks like mutexes, semaphores, etc. and approaches like atomics, lockfree datastructures come into play when writing multithreaded code. There's no getting around that.

> In my experience, most problems are inherently synchronous with lots of mutable state and complex data dependencies, or inherently parallel with lots of tasks that can run independently. Problems that can be easily parallelized already work fine with multiprocessing!

This is a hot take though-- most problems that are truly embarrassingly parallel don't work as well as you'd think w/ multiprocessing. There's a ton of overhead there and when you do need synchronization steps (eg; in reductions) it can get pretty messy.


[flagged]


I don't mean to pile on to what ghshephard already posted, but I'm afraid you've been breaking the site guidelines repeatedly lately - not just here but these:

https://news.ycombinator.com/item?id=36923922

https://news.ycombinator.com/item?id=36921060

... as well as others. Can you please not do this? We're trying for something different here. If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.


https://news.ycombinator.com/newsguidelines.html

> Be kind. Don't be snarky. Converse curiously; don't cross-examine. Edit out swipes.

> When disagreeing, please reply to the argument instead of calling names. "That is idiotic; 1 + 1 is 2, not 3" can be shortened to "1 + 1 is 2, not 3."


Over quite some time I've become convinced the multiprocessing module is better than an optional GIL removal.

It may leave many useful bits on the table (compared to pure multithreaded coding, like C++/pthreads), but I've still been able to get it to scale my application's performance (CPU-bound, large-memory) to the number of cores of even large boxes (96+ vCPUs). IIRC the concurrent.futures library was key to being productive.

20 years ago I would have said differently, as at the time IronPython demonstrated a real alternative to CPython that was faster and fully multithreaded (including the container classes).


Sure, with multiprocessing you can get 96 python processes running at 100% CPU while sharing a large dataset.

Only problem is that 99% of that CPU usage is for serializing/deserializing IPC messages and total throughput would have been higher using a single process.

There are use-cases for multiprocessing. As long as data sharing between processes is insignificant, it can be quite performant. Just like using a bash-wrapper script that orchestrates a bunch of python (or other) processes.


Whatever happened to IronPython? I used to do a lot of C# development and remember dabbling with IronPython back in the day. It seemed like it was important to Microsoft; .NET added the whole concept of dynamic data types mostly to support IronPython and IronRuby. But I never really used Python much until recently, so of course when I finally needed to do Python I looked for IronPython, and it doesn't appear to be a thing anymore.


It looks like Microsoft abandoned these dynamic language implementations in 2010. Maintaining parallel implementations of two complex, mature scripting languages is a huge feat. It would take some very expensive talent. That said, IronPython was loved by those who used it, which means it captured them in the DotNet ecosystem. Perhaps that win was not enough for Microsoft to continue the project. Ideally, Python foundation should "own" (and fund) Jython and IronPython development, but that takes (a lot of) money. (Sorry, I'm much less familiar with Ruby and IronRuby.)


It is still a thing, but it's open source now instead of maintained by Microsoft. There was a release that finally supports Python 3 in December last year.

I don't know how useful it is really, if you really want performance then you probably shouldn't choose Python to begin with, or you use the libraries which may not be compatible with IronPython. These days it barely takes me longer to build a simple script in C# than in Python either.


It's so-so. Python's core value is its huge stack of libraries, and the most important ones fall down with IronPython because they use C extensions and so on.

When we needed Python/C# interop, it was better to use Python.NET and integrate that way. Annoying to set up, but when it works you can get both to work seamlessly.


I don't really partake in programming "wars", but the idea of launching a set of separate processes instead of separate threads to do a bunch of IO has always seemed weird to me. Yes, I have built software using Python. Yes, I have done things as you suggest. Now I use asyncio, since the syntax has matured and I finally understand coroutines, runners, tasks, etc. Let's see where GIL-less Python takes us.


I'm confused. If you're doing a "bunch of IOs" then that's the situation where people use threads in Python, not processes. The argument for processes in Python is CPU-bound workloads.


Yup. I work at the Space Telescope Science Institute, where we maintain pipelines for astronomical data that move petabytes, among other things. All of the heavy lifting is done in Python.


Loading 100GB into RAM and then calling fork() is just painting a giant OOM Killer target on your back. It'll work until something breaks the CoWs or the parent gets restarted while some forks still linger or other fun things like that.

Threads make it transparent to the OS that this memory really must be shared between compute tasks.


While that does sometimes happen, I find the risk to be overstated. Most simple "allocate a large, complex data structure (e.g. dict of vectors of dataclasses) before creating a multiprocessing.Pool/Process/concurrent.futures.ProcessPoolExecutor and then refer to parts of it in the executor's jobs" work that deals in GBs of data does not suffer from copy-on-write-induced OOM issues in my experience. If the data in the shared memory isn't mutated in python, the refcount mutations are rarely enough to dirty more than a fraction of a percent of pages (though there are pathological allocation/reference schemes where that's not true).

If you do have memory issues, calling 'gc.freeze()' right before creating your multiprocessing.Pool/Process/concurrent.futures.ProcessPoolExecutor is sufficient to mitigate refcount-related page dirtying in the vast majority of cases. In the small remaining minority of cases, 'gc.disable()' as suggested by the freeze docs[1] may help. If that still doesn't do it, or if your page-dirtying is due to actual mutations of data (not just refcounts), it may be time to reach for actual shared memory instead[2][3].

1. https://docs.python.org/3/library/gc.html#gc.freeze 2. https://docs.python.org/3/library/multiprocessing.html#share... 3. https://docs.python.org/3/library/multiprocessing.shared_mem...
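To make the pattern concrete, a minimal sketch with a made-up structure and worker, assuming the "fork" start method (the Linux default):

    import gc
    import multiprocessing as mp

    DATASET = {i: [i] * 10 for i in range(1_000_000)}  # stand-in for a big structure

    def lookup(key):
        # worker reads the inherited structure; DATASET itself is never pickled
        return sum(DATASET[key])

    if __name__ == "__main__":
        gc.collect()  # collect garbage first so freeze captures a clean heap
        gc.freeze()   # park survivors in a permanent generation the cyclic GC won't scan
        with mp.Pool(processes=8) as pool:
            # workers forked here inherit DATASET copy-on-write
            print(sum(pool.map(lookup, range(1000))))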


This exists, but one of two things happens, which still significantly slows things down: either 1) you spawn multiple Python instances, or 2) you push the code into a different language. Both are cumbersome and have significant effects. The latter is more common in computational libraries like numpy or pytorch, but in this respect it is more akin to Python being a wrapper for C/C++/CUDA. Your performance is directly related to the percentage of time your code spends within those computation blocks; otherwise you get hammered by I/O operations.


You have to manually set up shared memory with its own API that has its own limitations, right? I thought some seamless integration was a new feature, but AFAICT, transfers between multiprocesses still leads to things being pickled and copied. Am I wrong?


> Am I wrong?

Only partially. When you send things to a multiprocessing.Pool/concurrent.futures.ProcessPoolExecutor, they're pickled and copied. "Sending" happens when passing arguments to e.g. "multiprocessing.Pool.apply_async()", "multiprocessing.Queue.put()" or "concurrent.futures.ProcessPoolExecutor.submit()".

However, there are two other ways to share data into your multiprocessing processes:

1. Copy-on-write via fork(2). In this mode, globally-visible data structures in Python that were created before your Pool/ProcessPoolExecutor are made accessible to code in child processes for (nearly) free, with no pickling, and no copying unless they are mutated in the child process. Two caveats here, which I've discussed in other comments on this thread: mutation may occur via garbage collection even if you don't explicitly change fork-shared data in Python[1]; and fork(2) is not used by default in multiprocessing on MacOS or Windows[2].

2. Using explicit shared memory data structures provided by Multiprocessing[3][4]. These do not incur the overhead (in CPU or copied memory) that pickle-based IPC does, but they are not without complexity or cost.

Unfortunately, truly "seamless integration" is not really possible with multiprocessing, so users will have to use one or more of the above strategies according to their application needs.

1. https://news.ycombinator.com/item?id=36940118 2. https://news.ycombinator.com/item?id=36941791 3. https://docs.python.org/3/library/multiprocessing.html#share... 4. https://docs.python.org/3/library/multiprocessing.shared_mem...
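As a concrete illustration of option 2, a minimal sketch of the explicit shared-memory API; the use of numpy here is an assumption of the example:

    import numpy as np
    from multiprocessing import shared_memory

    # Create a shared block and copy an array into it (one copy in).
    src = np.arange(1_000_000, dtype=np.float64)
    shm = shared_memory.SharedMemory(create=True, size=src.nbytes)
    shared = np.ndarray(src.shape, dtype=src.dtype, buffer=shm.buf)
    shared[:] = src

    # A worker process, given shm.name, attaches with no pickling or copying:
    attached = shared_memory.SharedMemory(name=shm.name)
    view = np.ndarray(src.shape, dtype=src.dtype, buffer=attached.buf)
    assert view[42] == 42.0

    del view           # drop buffer exports before closing
    attached.close()
    del shared
    shm.close()
    shm.unlink()       # free the block once every process is done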


If you have a non-trivial application, multiprocessing just takes a lot of memory. Every child process that you create duplicates the parent's memory. There are some interesting hacks like gc.freeze that exploit the copy-on-write behavior of fork to reduce memory, but ultimately you can only create a few hundred processes, compared to thousands of threads, because of memory consumption.


>If you have a non-trivial application, multiprocessing just takes a lot of memory. Every child process that you create duplicates the parent's memory.

Not really, unless you want to alter it. The OS uses copy-on-write behind the scenes for forked processes, so it will use the same memory locations already loaded until/unless you modify them. So parent memory isn't really duplicated.

As for any new memory allocated by each child process, that's its own.


Unfortunately, the Python garbage collector messes up copy-on-write. Here's a blog post from Instagram on how they fixed it: https://instagram-engineering.com/copy-on-write-friendly-pyt...


Unfortunately the generational GC modifies bits all over the heap, so you have to use some tricks to really leverage copy on write (as the commenter alludes to).


Fork's copy on write does not mix well with garbage collection.


The situation is a bit more complicated than this. While it's usually not the case that child processes always duplicate parent memory, that does happen on certain platforms (MacOS and Windows) on some Pythons. Additionally, the situation regarding unexpected page dirtying of copy-on-write memory is nuanced as well, which some of the sibling comments allude to.

I'll copy the tl;dr from another comment I've made nearby:

There are three main ways to share data into your multiprocessing processes:

1. By sending that data to them with IPC/pickling/copying, e.g. via "multiprocessing.Pool.apply_async()", "multiprocessing.Queue.put()" or "concurrent.futures.ProcessPoolExecutor.submit()".

2. Copy-on-write via fork(2). In this mode, globally-visible data structures in Python that were created before your Pool/ProcessPoolExecutor are made accessible to code in child processes for (nearly) free, with no pickling, and no copying unless they are mutated in the child process. Two caveats here, which I've discussed in other comments on this thread: mutation may occur via garbage collection even if you don't explicitly change fork-shared data in Python[1]; and fork(2) is not used by default in multiprocessing on MacOS or Windows[2].

3. Using explicit shared memory data structures provided by Multiprocessing[3][4].

1. https://news.ycombinator.com/item?id=36940118 2. https://news.ycombinator.com/item?id=36941791 3. https://docs.python.org/3/library/multiprocessing.html#share... 4. https://docs.python.org/3/library/multiprocessing.shared_mem...


Multiprocessing is great. But then every process keeps its own copy of hundreds of gigabytes of stuff. May be okay, depending on how many processes you spawn.

If the bulk of the data is immutable (or at least never mutated), it can be safely shared though, via shared memory.


> every process keeps its own copy of hundreds of gigabytes of stuff. May be okay, depending on how many processes you spawn

That depends on how you're using multiprocessing. If you're using the "spawn" multiprocessing-start method (which was set to the default on MacOS a few years ago[1], unfortunately), then every process re-starts python from the beginning of your program and does indeed have its own copy of anything not explicitly shared.

However, the "fork" and "forkserver" start methods make everything available in python before your multiprocessing.Pool/Process/concurrent.futures.ProcessPoolExecutor was created accessible for "free" (really: via fork(2)'s copy-on-write semantics) in the child processes without any added memory overhead. "fork" is the default startup mode on everything other than MacOS/Windows[2].

I find that those differing defaults are responsible for a lot of FUD around memory management regarding multiprocessing (some of which can be found in these comments!); folks who are watching memory while using multiprocessing on MacOS or Windows observe massively different memory consumption behavior than folks on Linux/BSD (which includes folks validating in Docker on MacOS/Windows). There's an additional source of FUD among folks who used Python on MacOS before the default was changed from "fork" to "spawn" and who assume the prior behavior still exists when it does not.

This sometimes results in the humorously counterintuitive situation of someone testing some Python code in Docker on MacOS/Windows observing far better performance inside Docker (and its accompanying virtual machine) than they observe when running that same code natively directly on the host operating system.

If you're on MacOS (not Windows) and wish to use the "fork" or "forkserver" behaviors of multiprocessing for memory sharing, do "export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES" in your shell before starting Python (modifying os.environ or calling os.putenv() in Python will not work), and then call "multiprocessing.set_start_method("fork", force=True)" in your entry point. Per the linked GitHub issue below, this can occasionally cause issues, but in my experience it does so rarely if ever.

1. https://github.com/python/cpython/issues/77906

2. https://docs.python.org/3/library/multiprocessing.html#conte...
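A minimal sketch of the entry-point side of that recipe (the shell export must happen before Python starts, as noted above):

    # run `export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES` in the shell first
    import multiprocessing

    if __name__ == "__main__":
        multiprocessing.set_start_method("fork", force=True)
        with multiprocessing.Pool(4) as pool:
            print(pool.map(abs, [-3, -2, -1]))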


Is what you're describing only true of the "Framework" Python build on MacOS? It sounds like that's the case from a quick read of the issue you linked. I would say that people should basically never use the "Framework" Python on MacOS. (There's some insanity IIRC where matplotlib wants you to use the Framework build? But that's matplotlib)


> Is what you're describing only true of the "Framework" Python build on MacOS?

No. This behavior is present on any Python 3.8 or greater running on MacOS, enforced via "platform == darwin" runtime check: https://github.com/python/cpython/pull/13626/files#diff-6836...

You can check the default process-start method of your Python's multiprocessing by running this command: "python -c 'import multiprocessing; print(multiprocessing.get_start_method())'"


Python is also going to get a JIT eventually, so they're fixing that too! One of the concerns with no-GIL was that it would make certain optimisations harder for the JIT, but it's very cool to see both being worked on.


Or just use a language that was actually designed to be something other than a scripting language?


> Multiprocessing. The answer is to use the python multiprocessing module, or to spin up multiple processes behind wsgi or whatever.

I assume mod_wsgi under Apache was not the answer here due to memory constraints. That being said, why not serve from disk and use Redis for a cache? This should work well unless the queries have high cardinality.


Serve what from disk? If they are using Python, they are almost certainly writing an API server, not serving static files.


No, that’s about right.

The response, which isn’t technically wrong, is “unless you’re CPU bound, your application should be parallelized with a WSGI server. You shouldn’t be loading all that up in memory, so it shouldn’t matter that you run 5 Python processes that each handle many many concurrent I/O bound requests.”

And this is kinda true… I’ve done it a lot. But it’s very inflexible. I hate programming architectures/patterns/whatnot where the answer is “no you’re doing it wrong. You shouldn’t be needing gigs of memory for your web server. Go learn task queues or whatever.” They’re not always wrong, but very regularly it’s the wrong time to worry about such “anti patterns.”


Yes, this is even more the case in languages that are popular with more "applied" programming audiences, like scientific computing. Telling them "no you should be using this complicated DBMS" (or whatever other acronym) is not productive.

It tends to get them exceptionally mad because their concern isn't the ideal way to write the code and architect the system, they simply want to write just enough code to continue their research, and even if they did care about proper architecture, they don't have the time or interest in learning/testing a new library for every little thing. They'd rather be putting that time reading up on their field of research.


This stance always rubbed me the wrong way a bit. Effectively, code is one of the tools a researcher uses to do their work. As soon as their work interacts with other people, for example when publishing a purportedly reproducible study or supplying novel algorithms to developers, they have a responsibility to deliver proper work that can be used and understood by other people. This is something we expect of every other profession, yet scientists appear to somehow have no concern for such lowly ambitions.

To be clear, I’m not advocating for data scientists to write production-grade webapps. But I absolutely think they should be bothered to write code that fulfills minimal requirements, is reproducible, documented, and mostly bug-free.


I think data scientists tend to have a lot of overlap with computer people so expectations for them may be a bit higher, my experience comes mainly from physicists.

Reproducible, documented, and bug-free is fine; they care plenty about those things too. The issue is the "no, you're doing it the wrong way, use this entirely different technology instead" being based almost entirely on ideological reasons.

If we take C multithreading as an example: with my supervising scientist, multithreading is fine; he's willing to put some time into learning how it works because it's valuable and has had a stable interface backed by a reliable body for a while now. But if tomorrow you came up to him and insisted that doing multithreading was wrong, without a solid technical reason (e.g. actual bugs and an explanation of why the only way to fix them is to dump the existing code and spend a few months redesigning), you'd get shot down.


Well, it's like showing your plan for painting a room, and asking "I seem to get stuck here after painting all but the corner, how do I get out of the corner?". The answer actually is "don't leave the corner for last".

Or like the martial arts student asking the master "how do I fight a guy 100m away with a rifle?" - "don't be there".


You have a single big data structure that can't be shared easily between multiple processes. Can't you use multiprocessing with that? Maybe mapping the data structure to a file and mmapping that in multiple processes? Maybe wrapping the whole thing in database instead of just using one huge nested dictionary? To me multi-threading sounds so much less painful than all the alternatives that I could imagine. Just adding multi-threading could give you >10x improvement on current hardware without much extra work if your data structure plays nice.


> You have a single big data structure that can't be shared easily between multiple processes. Can't you use multiprocessing with that? Maybe mapping the data structure to a file and mmapping that in multiple processes? Maybe wrapping the whole thing in database instead of just using one huge nested dictionary?

A ton of additional complexity, not worth it for many use-cases. And anything along the lines of "using multiple processes or threads to increase Python performance" does have (or at least did have) quite a bunch of additional foot-guns in Python.

In that context, porting a very trivial ad-hoc application to Java (or C# or Rust, depending on what know-how exists in the team) would be faster, or at least not much slower, to do. And it would be more reliably estimable, by reducing the chance of any unexpected issues, like less perf than expected.

Basically, the moment "use mmap" or "use multi-processing" is a reasonable recommendation for something ad-hoc-ish, there is something really wrong with the tools you use, IMHO.


How good is support for numpy / scipy / pandas or equivalents, if they exist, outside Python?

Actually the resulting structure should of course be dumped into an RDBMS or a graph DB and served from there more readily. Doing that takes skill and time though, which often are worth applying elsewhere.


The use case I'm thinking about is very simple: One big data structure that is mostly read from and sometimes written to. Use a single mutex with a shared lock for reading and an exclusive lock for writing. Then the readers are safe and would only block during updates when one writer is active. Everything else beside the data structure can be per-thread and wouldn't interfere.

The reason we wouldn't want to port this application to another language is the 100k lines of existing code that are best written in Python, with no resources to rewrite all that.
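For what it's worth, a minimal sketch of the shared/exclusive locking pattern described above; Python's stdlib has no built-in readers-writer lock, so this is a hand-rolled one (it mainly pays off when readers release the GIL, or under no-GIL):

    import threading

    class RWLock:
        """Many concurrent readers, or one exclusive writer."""
        def __init__(self):
            self._readers = 0
            self._count_lock = threading.Lock()  # guards the reader count
            self._write_lock = threading.Lock()  # held while readers or a writer are active

        def acquire_read(self):
            with self._count_lock:
                self._readers += 1
                if self._readers == 1:
                    self._write_lock.acquire()   # first reader blocks writers

        def release_read(self):
            with self._count_lock:
                self._readers -= 1
                if self._readers == 0:
                    self._write_lock.release()   # last reader lets writers in

        def acquire_write(self):
            self._write_lock.acquire()

        def release_write(self):
            self._write_lock.release()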


> Basically, the moment "use mmap" or "use multi-processing" is a reasonable recommendation for something ad-hoc-ish, there is something really wrong with the tools you use, IMHO.

Hmm. So you're saying only languages which bury the lock and mutex over shared data are appropriate for async parallelism over shared data? Because calling explicit lock() and release() isn't that hard. However, it does incur a function call overhead. I suppose some explicit in-language support could partially minimize that.


no I never said that


One annoying part of multiprocessing in Python is that you'd like to abuse the CoW mechanism to save on loading time when forking, but Python stores ref counters together with objects, so every single read will bust your CoW sharing.

Now, you wanted it simple, but got to fight with the memory model of a language that wasn't designed with performance in mind, for programs whose focus wasn't performance.


There's gc.freeze for that now https://docs.python.org/3/library/gc.html#gc.freeze

If you load something big before forking workers, there's no CoW issue with that big structure anymore.


gc.freeze prevents the GC from considering those objects, but it doesn't disable reference counting, so you'll still have CoW issues. PEP 683 introduces a way to make an object immortal, which disables reference counting and will address that issue.


I'd go for a db, yeah. Or, if that's a really painful mapping, this is actually the sort of thing Go is pretty good at: it's not too hard to write a fairly simple program that will traverse your data structure and communicate via a JSON API or something. That's a useful technique in general: separate the big heavy awkward thing from your main web processes.

While I hate how verbose and inexpressive it is, Go does hit a sweet spot of fairly good performance, even multi-core, while still being GCed so it's not nearly as foreign for a native python user.


It sounds I/O heavy, but you mention it being CPU-heavy in which case I’d say Python is just not the right tool for the job although you may be able to cope with multiprocessing.


Similar experience. Even with multiple processes and threads, Python is slow, very slow. Java, Go, and .NET all provide a very performant out-of-the-box experience.


Python is both interpreted and quite dynamic. Both of these lead to lower performance compared to less dynamic, compiled solutions. Java, Go, and .NET are all compiled and (much) less dynamic.

This is absolutely an expected outcome.


These days even elisp can be compiled. I think Python needs to be dragged kicking and screaming into cutting-edge '80s dynamic compilation technology.


I'm sure skilled volunteers would be very welcome.

There are numerous active, moderately serious efforts to both optimize and/or JIT Python bytecode. I think AOT compilation is mostly out-of-scope for 100% compatibility, but again, there's lots of different efforts to compile either subset languages or subsets of programs.

"Kicking and screaming" suggests some reluctance to embrace this, but I think that's probably unfair: it's just hard.


It isn't as if PyPy doesn't exist. Embracing it during the 16 years of its existence is another matter.


"absolutely an expected outcome."

Good day. Is it the right time to talk to you about Common Lisp?


To be fair, if you use CL in a similarly dynamic way as Python (don't compile anything, don't add any declarations etc) it won't be that much faster. You'll get some boost out of the stdlib stuff being compiled already, but otherwise it will incur similar performance penalties.


We can add Smalltalk, SELF, Dylan, JavaScript into the discussion then.


And maybe Strongtalk


Kind of, I left it out on purpose, as it was designed with strong typing in mind, and I only wanted to list dynamic languages with good JIT support.


Always a good time.


Node is pretty performant for anything IO related, not compiled and reasonably dynamic.


I think it's worth the clarification that Javascript is usually JITed; (C)Python isn't.

And that CPython's I/O isn't really the problem: some of its async event loop implementations are fairly competitive with Node.

But still ... yes.

Javascript has benefited from two decades of intensive, well-funded work by the best people in the business, with clear focus on performance as a high priority goal. Not to take away from those who work on Python, but I think it's fair to say the effort has had orders of magnitude difference.

I don't have a deep enough understanding to say whether the nature of Python or Javascript makes one better suited for performance optimization than the other. Python is perhaps able to benefit from seeing what's been done with Javascript, although of course Javascript has stood on the shoulders of its own giants.


3.11 and on should be comparable to Java for most use cases with multiprocessing (set up correctly of course)


How do you mean? 3.11 is something like 10-20% faster than earlier Python releases. Why should that make it comparable to Java? Typically Java is still several times faster than Python, and this is totally natural since Java performance benefits from static type declarations and the language is generally less dynamic than Python.

That said I still use Python for CPU intensive tasks since in my experience Numpy/Scipy/Numba etc does a good job speeding up the CPU intensive parts of Python code.


Static type declarations don't make Java fast. The compiler does. Dynamically typed languages with no type declarations can be very fast if the compiler can infer the types.

That's not to say that Python will ever get there. My understanding is that the design of the language and leaky implementation details make generally compiling Python to fast machine code nearly impossible.


Well, we already have a mature, real-world Python JIT in PyPy, with impressive performance.

I dunno if Python is ever gonna be as fast as Java or C#, but we know it can be much better.


I can't find any benchmarks of PyPy vs OpenJDK or GraalVM, but unless I'm mistaken it's still more than 100% difference, and maybe much, much more for pure-Python vs. Java.


Here ya go. On these, sometimes one is faster, sometimes the other: https://github.com/kostya/jit-benchmarks/blob/master/README.... Personally I don't like such comparisons. Benchmarking is hard and far from objective. Much of what makes Python popular is the developer experience. Generic benchmarks will only give a rough guide to what to expect in your application. If you are in a niche like the OP, you will have to figure out how to handle your bottlenecks.


Eagerly awaiting no-Gil Flask vs. Dropwizard performance analysis.


My tip for this is Node.js and some stream processing lib like Highland. You can get ridiculous IO parallelism with a very little code and a nice API.

Python just scales terribly, whether you use multiple processes or not. Java can get pretty good perf, but you'll need some libs or quite a bit of code to get nonblocking IO working well, or you're going to eat huge amounts of resources for moderate returns.

Node really excels at this use case. You can saturate the lines pretty easily.


0_o

Did I miss something? Does nodes/highland have good shared memory semantics these days?

I've always felt the best analogy to python concurrency was (node)js, but I admittedly haven't kept up all that well.


Wouldn't Elixir or Go be better for this use case? Node still blocks on compute heavy tasks.


I think they mentioned CPU intensive work, which I'm taking to imply that it's more CPU bound than I/O bound. So unless you're suggesting they use Node's web workers implementation for parallelism, the default single threaded async concurrency model probably won't serve them well.


Isn't Node single threaded, just like Python?


Python is technically multithreaded, but the GIL means only one thread can execute interpreter code at a time. If you use libraries written in C/C++, the library code can run in multiple threads simultaneously if they release the GIL.

I vaguely recall Node used to run multiple threads under the hood for disk I/O, but it might use kqueue/epoll these days.
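For example, many numpy operations release the GIL while they run, so plain threads can occupy multiple cores (numpy standing in here for any GIL-releasing C extension):

    import threading
    import numpy as np

    a = np.random.rand(1500, 1500)

    def multiply():
        np.dot(a, a)  # the BLAS call inside releases the GIL

    threads = [threading.Thread(target=multiply) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # with the GIL released in np.dot, these run on separate cores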


Node is essentially a single-threaded API to a very capable multithreaded engine.

https://youtu.be/ztspvPYybIY


I am not too deeply experienced with Python so forgive my ignorance.

But I am curious to understand why you were not able to utilize the concurrency tools provided in Python.

A quick google search gave me these relevant resources

1. An intro to threading in Python (https://realpython.com/intro-to-python-threading/#conclusion...)

2. Speed Up Your Python Program With Concurrency (https://realpython.com/python-concurrency/)

3. Async IO in Python: A Complete Walkthrough (https://realpython.com/async-io-python/)

Forgive me for my naivety. This topic has been bothering me for quite a while.

Several people complain about the lack of threading in Python but I run into plenty of blogs and books on concurrency in Python.

Clearly there is a lack in my understanding of things.


Re (3): asyncio does not give you a boost for CPU bound tasks. It's a single-threaded, cooperative multi-tasking system that can (if you're IO bound) give you a performance boost.


Ehhh I mean you're not wrong, but I wouldn't say you're fully right either.

You can absolutely send stuff to a thread pool executor or process pool executor and then never await the returned value / never have it return until interrupted, but the issues with shared memory (or really, the lack thereof in comparison to e.g. C) are still present, to my understanding.

Then again, you can always spin up a SQLite database or something on the same machine, but that's stupid heavy and more of a workaround than a solution. Super excited for nogil.

https://docs.python.org/3/library/concurrent.futures.html#co...
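A minimal sketch of that pattern, with a made-up CPU-bound task handed from the event loop to a process pool:

    import asyncio
    from concurrent.futures import ProcessPoolExecutor

    def crunch(n):
        # CPU-bound stand-in
        return sum(i * i for i in range(n))

    async def main():
        loop = asyncio.get_running_loop()
        with ProcessPoolExecutor() as pool:
            # offload the CPU-bound call so the event loop stays responsive
            print(await loop.run_in_executor(pool, crunch, 10_000_000))

    if __name__ == "__main__":
        asyncio.run(main())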


Not sure why you mention "thread pool executor", which of course does not get you concurrency due to the gil.


Pedantic nerd nitpick: it gives you concurrency but not parallelism. (Concurrent threads can be time sliced on one core)


It was clear from the context that he meant concurrently running not concurrently in progress. I wish nerds would give up on this parallelism/concurrency pedantry or at least choose some new nomenclature that didn't conflict so massively with the English meaning of "concurrent".

I mean it's not even right. Most parallel/concurrent pedants would consider multithreaded code to be "parallel" even if it is running on a single core.

I think the best thing is to talk about threads, because then you can distinguish e.g. OS threads and hardware threads.


> I mean it's not even right. Most parallel/concurrent pedants would consider multithreaded code to be "parallel" even if it is running on a single core.

If you're using these as technical terms which have specific technical definitions, then concurrency and parallelism are distinct but related concepts, and parallel computing means executing code simultaneously on separate execution units, not time sliced on a single core. So yes I am actually right about this. Parallel computing is defined this way in CS and it's not a matter of opinion.


> I mean it's not even right. Most parallel/concurrent pedants would consider multithreaded code to be "parallel" even if it is running on a single core.

Hm, I think running on a single core is the exact definition of what the "pedants" say is not parallel. If all you have is one core then you can't achieve parallelism under their definition.

I think the terminology is pretty well established now. But I do agree with you that it's a bad choice of words, and that it's annoying for people to pick a particular unintuitive definition and then go around brow-beating others for not using/understanding it.


You can throw python threads at it, but if each request traverses the big old datastructure using python code and serialises a result then you’re stuck with only one live thread at a time (due to the GIL). In Java it’s so much easier especially if the datastructure is read only or is updated periodically in an atomic fashion. Every attempt to do something like this in python has led me to having to abandon nice pythonic datastructures, fiddle around with shared memory binary formats, before sighing and reaching for java! Especially annoying if the service makes use of handy libraries like numpy/pandas/scipy etc!


The whole point of the GIL is that even if you use Python's threading or asyncio, you don't get any benefits from scaling beyond a single CPU core, because all of your threads (or coroutines) are competing for a single lock. They run "concurrently", but not actually in parallel. The pages you linked explain this in more detail.

In theory, multiprocessing could allow you to distribute the workload, but in a situation like OP describes -- just serving API requests based on a data structure -- the overhead of dispatching requests would likely be bigger than the cost of just handling the request in the first place. And your main server process is still a bottleneck for actually parsing the incoming requests and sending responses. So you're unlikely to see a significant benefit.


Threading in Python is fine if your threads are IO-bound or spend their time in a C extension which releases the GIL. If you are CPU-bound, the GIL means effectively one thread can run at a time and you gain no advantage from multiple threads.


I had this misunderstanding for a long time until I saw a Go talk explain the difference: https://go.dev/blog/waza-talk

The confusion here is parallelism vs concurrency. Parallelism is executing multiple tasks at once and concurrency is the composition of multiple tasks.

For example, imagine there is a woodshop with multiple people and there is only one hammer. The people would be working on their projects such as a chair, a table, etc. Everyone needs to use the hammer to continue their project.

If someone needed a hammer, they would take the single hammer and use it. There are still other projects going on but everyone else would have to wait until the hammer is free. This is concurrency but not parallelism.

If there are multiple hammers, then multiple people could use the hammer at the same time and their project continues. This is parallelism and concurrency.

The hammer here is the CPU and the multiple projects are threads. When you have Python concurrency, you are sharing the hammer across different projects, but it's still one hammer. This is useful for dealing with blocking I/O but not computing bottlenecks.

Let's say that one of the projects needs wood from another place. There is no point in this project to hold on to the hammer when waiting for wood. This is what those Python concurrency libraries are solving for. In real life, you have tasks waiting on other services such as getting customer info from a database. You don't want the task to be wasting the CPU cycles doing nothing, so we can pass the CPU to another task.

But this doesn't mean that we are using more of the CPU. We are still stuck with a single core. If we have a compute bottleneck such as calculating a lot of numbers, then the concurrency libraries don't help.

You might be wondering why Python only allows for a single hammer/CPU core. It's because it's very hard to get parallelism working properly; you can easily end up with your program stalling if you don't do it correctly. The underlying data structures of Python were never designed with that in mind, because it was meant to be a scripting language where performance wasn't key. Python grew massive, and people started to apply it to areas where performance was key. It's amazing that Python got so far even with the GIL, IMO.

As an aside, you might read about "multiprocessing" Python, where you can use multiple CPU cores. This is true, but there are heavy overhead costs. It's like building brand-new workshops, each with a single hammer, to handle more projects. This post would get even longer if I explained what a "process" is, but to put it shortly, it's how the OS, such as Windows or Linux, manages tasks. There is a lot of overhead because it has to work with all sorts of different programs written in different languages.


That’s right.

In the past, for read-only data, I've used a disk file and relied on the OS page cache to keep it performant.

For read-write, using a raw file safely gets risky quickly. And alternative languages with parallelism run rings around Python.

So getting rid of the GIL and allowing parallelism will be a big boon.


> I may have missed something

You did not miss anything. The GIL prevents parallel multi threading.


This is actually one of the reasons I was drawn to Ruby over Python. Ruby also has the GIL but jRuby is an excellent option when needed.


I wonder what led to JRuby attracting support while Jython didn't? I know the Jython creator went on to other things (was it e.g. IronPython for dotnet?). I suppose it was the inverse with dotnet - e.g. IronPython surviving while IronRuby seems dead.

Is it just down to corporate sponsorship?


JRuby has been pretty actively maintained for about 15 years and had a big release this year.

It’s an impressive project.


I looked into it a long time ago (~10-12 years?), and was disappointed JRuby could not use extensions written in C. It's not surprising in retrospect, for obvious reasons, but has there been some progress in this area?


Twitter used JRuby and invested heavily for a time.


May I ask why you didn't consider writing that quick implementation in Java in the first place?


I don't think that Python was designed for this. I found it largely unsuited for such work. It is much easier to saturate IO with (in no particular order) F#, Rust or Java, which I have used in the scenarios you mentioned.


If your data doesn't change, you can leverage HTTP caching and lift a huge burden off of your service.


Spin up as many processes as you need, map connections 1:1 to processes if possible.


You could have just used gunicorn and spawned multiple workers, maybe.


Why not load the data into a SQLite DB and let the clients query that? Is there a reason you're loading 10s/100s of GB into memory?


Are you just reading from this data structure? If so I wouldn't do any locking or threading, I'd just use asyncio to serve up read requests to the data and it should scale quite well. Multithreading/processing is best for CPU limited workloads but this sounds like you're really just IO-bound (limited by the very high IO of reading from that data structure in memory).

If you're allowing writes to the shared data structure... I'd ask myself am I using the right tool for the job. A proper database server like postgres will handle concurrent writers much, much better than you could code up hastily. And it will handle failures, backups, storage, security, configuration, etc. far better than an ad hoc solution.
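
Roughly what I have in mind for the read-only case, as a minimal sketch (the port and the line-based protocol are made up):

    import asyncio

    DATA = {}  # the big read-only structure, loaded once at startup

    async def handle(reader, writer):
        key = (await reader.readline()).decode().strip()
        writer.write(repr(DATA.get(key)).encode() + b"\n")
        await writer.drain()
        writer.close()

    async def main():
        server = await asyncio.start_server(handle, "127.0.0.1", 8888)
        async with server:
            await server.serve_forever()

    asyncio.run(main())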


> I'd just use asyncio to serve up read requests to the data and it should scale quite well.

Quoting GP:

>> often CPU heavy

We have to take their word for it that it's actually CPU heavy work, but if they're not lying and not mistaken then asyncio would do nothing for them.


Reading from memory is really not IO. Perhaps you're suggesting doing something like mmapping a file to memory, putting the data structure in that memory, and then using asyncio on the file to serve things, but this would only work if you can compute byte ranges inside the file to serve ahead of time, in which case there are much simpler solutions anyway. Most likely, when receiving a query they need to actually search through the datastructure based on the query, and it's very likely that this is the bottleneck, not just reading some memory.


When it was an in dev project, I felt the consensus on HN was that it was amazing work and a shame that it looked like the steering committee wouldn’t adopt it.

Now they have and everyone seems to hate it.


It's the eternal pendulum:

- take no risk, and people will blame the project for being static.

- take risks, and people will blame the project for being reckless.

E.g.:

- don't adopt a new feature, and your language is old, becoming irrelevant, and a wave of comments will tell you how they just can't use it for X because they don't have it.

- break compat, and you will have a horde stating you don't care about users that need stability. You got one comment in this thread talking about "the python treadmill"!

And all that for an open source project most don't contribute to and never paid a dime for.


The world needs one more language with a very barebones core, something like a very minimal Go or Python, but with strong metaprogramming features so you could extend the language if you need to.


You are thinking of Mojo [1], which claims full Python compatibility but can be extended with static typing for high-performance scenarios.

[1] https://www.modular.com/mojo


How many lines of mojo have you written?


LISP has been around since the 60s


It wouldn't stay barebone, or would stop being used. That's the point.


Well these likely will be entirely different groups of people voicing their opinions at different times. I don't imagine those who were enthusiastic about the project originally have done an about-face and now hate it.


My guess: it's easier to dismiss the downsides of something likely to fail, and likewise focus on the positives. Now that the unexpected has happened, reality demands more consideration of both.


Almost as if there was more than 1 person on the internet


It is probably language design enthusiasts who push all these backwards incompatibilities into Python, because they are not the users of the language.

They are a different group from those having their code broken in a never ending incompatibility churn.

Well, at least it gives us jobs ...


I'm one of those happy to see the GIL removed. I've had troubles with the 2->3 transition and more recently with the 3.6 EOL, which wasn't as traumatic but still a little bit troublesome. Despite that, I prefer another transition and being able to actually use parallelism in Python rather than rewriting a huge codebase in a different language, and losing the advantages of Python.


Summary:

- Python without the GIL, for good

- LPython: a new Python Compiler

- Pydantic 2 is getting usable

- PEP 387 defines "Soft Deprecation", getopt and optparse soft deprecated

- Cython 3.0 released with better pure Python support

- PEP 722 – Dependency specification for single-file scripts

- Python VSCode support gets faster

- Paint in the terminal


great recap thanks!


It's literally the summary at the top of the article

Doesn't anyone click on links anymore?


just trying to compliment the author on a useful blogpost, calm down.


Oh, I didn't realize it was the author of the blogpost that also posted this summary. I thought someone copy-pasted it because « it saves a click », as sometimes happens!


I started to write summaries for all articles everywhere I post. No need for people to waste their time if they are not interested in the topic.

This also keeps my traffic stats clean: the people that come are the ones that are interested in the content. The numbers are smaller, but closer to what my real readership looks like. Since I have no ads, I don't aim for volume, so I'd rather know the truth.


Still not encouraged by the no-GIL, "We don't want another Python 2->3 situation", yet very little proffered on how to avoid that scenario. More documentation on writing thread-safe code, suggested tooling to lint for race conditions (for whatever it is worth), discussions with popular C libraries, dedicated support channels for top tier packages, what about the enormous long-tail of abandoned extensions which still work today, etc.


The big and obvious difference is that all the GIL vs no-GIL stuff happens in the background and your average python dev can just ignore it if they want to. The interpreter will note if you have C extensions that don't opt in to no-GIL and then will give you the GIL version.

This is _very_ different to the 2-to-3 transition where absolutely every single person, even those who couldn't care less, had to change their code if they wanted to use python 3.


> your average python dev can just ignore it if they want to.

Oh, so naive... All the mutation code in Python that only "worked" because Python didn't really have any real concurrency. Add to it that there's no real plan for what to do with Python concurrency. Removing the GIL is only one half of the problem; you need to give developers some sort of framework to deal with concurrency. Python's threads are extremely underdeveloped and dangerous to use. Python doesn't even have anything like "synchronized" from the Java world. So, all synchronization requires dealing with locks, mutexes, condition variables...
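
To illustrate what "dealing with locks" means in practice, a toy sketch: even a trivial counter has to be guarded by hand, because there is no "synchronized" equivalent:

    import threading

    class Counter:
        def __init__(self):
            self._value = 0
            self._lock = threading.Lock()

        def increment(self):
            # every shared mutation must be wrapped manually
            with self._lock:
                self._value += 1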

Most Python programs today didn't bother to deal with threads because they didn't confer enough benefits to be worth using. So, "automatically" parallelizing Python code, as in allowing it to run in actual threads is going to bring about lots and lots of bugs in trivial code written by people with no clue about concurrency.


> So, all synchronization requires dealing with locks, mutexes, condition variables...

As always, by far the best way to interact between threads is to use thread-safe queues (AKA message passing). Luckily, Python has one of those [1]. No complicated synchronisation needed.

[1] https://docs.python.org/3/library/queue.html
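
A minimal producer/consumer sketch with the stdlib queue:

    import threading, queue

    q = queue.Queue()

    def worker():
        while True:
            item = q.get()    # blocks until an item arrives
            if item is None:  # sentinel: shut down
                break
            print("processed", item)

    t = threading.Thread(target=worker)
    t.start()
    for item in range(5):
        q.put(item)
    q.put(None)
    t.join()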


That's just completely missing the point of threads... but that wouldn't be the first nor the hundredth stupid thing found in Python's documentation.

The reason to want threads is to be able to share memory. That's literally why they were created. If you are sending messages instead of sharing memory, you don't need threads. You need something like Erlang processes.

The problem is that people who wrote Python never had a plan. They sucked and still suck as programmers. So... they knew there are threads. And it was easy to write a bunch of wrappers around pthreads. And that's what they did. And then they realized they don't know how to deal with concurrency, so they found a simple way out -- GIL.

The whole history of Python is the history of choosing the easy but wrong. And it's probably the only consistent thing about the language.


The object going onto the queue is the shared memory. The queue itself is essentially a fancy type of lock.

Yes you could use multiple processes, but you have the extra expense of serialisation, or you could use shared memory but you'd have to administer that and you'd still have the expense of context switches. And inevitably, yes, there is some actual shared state like a logger and modules that you've loaded, which again would be a pain in multiple processes.

You can call a Python thread plus a queue an Erlang process if you like, or say that I should use Erlang processes instead. But the fact is, the Python version works perfectly well for many problems. It does all the things that you typically need from threads: shares state (via the queues), lets you concurrently use the CPU (via C libraries that release the GIL – but no GIL would be even better), and lets you write blocking IO if you wish. Not missing the point at all.

The developers of Python didn't "suck as programmers" and it doesn't help your point to claim they do. Guido chose to use the GIL because he was OK with multiple threads, but not at the expense of single-thread performance, and no one showed a solution that beats the GIL – until now. (Personally I think the trade-off was wrong, and a small hit to single-threaded performance would have been worth it. But that's different from being ignorant of the fact that there was an actual reason.)


Which code is automatically going to run in threads? As you say, basically nobody uses Python threads. So even enabling no-gil, nothing is going to change because sequential code will still be sequential.


> As you say, basically nobody uses Python threads.

Not at all. I'm saying that a sizable portion of Python libraries is completely unaware of threads. But they can still take foreign-owned objects and operate on them as if threads didn't exist.

So, imagine a simplified hypothetical scenario, where one library has a function for counting keys in a dictionary. This library was written by someone unaware of and unwilling to acknowledge the existence of threads. So, if the dictionary it counts the keys of is modified in a separate thread -- boom! But third-party code using that library has no easy way of knowing if the library is prepared to deal with threads, and may have been using it for a while, until, again, boom!

Now, to make this more concrete: have you ever heard of Boto3, the AWS client library? Well, it does roughly what's described in the paragraph above -- it manipulates a bunch of its own objects in a non-thread-safe way. But, you would really want to use it in threads because that makes it so much easier to manage things like rate-limiting (across multiple clients), and, obviously, you don't want to deploy a large fleet of VMs one-by-one. The end result? -- boom!
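
A toy sketch of that first scenario (hypothetical code, but the failure mode is real): iterate a dict in one thread while another thread resizes it, and you can get a RuntimeError:

    import threading

    shared = {i: i for i in range(100_000)}

    def count_keys(d):
        # "library" code written with no awareness of threads
        return sum(1 for _ in d)

    def mutate():
        for i in range(100_000, 200_000):
            shared[i] = i
            del shared[i - 100_000]

    t = threading.Thread(target=mutate)
    t.start()
    while t.is_alive():
        # can raise "RuntimeError: dictionary changed size during iteration"
        count_keys(shared)
    t.join()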


Of course a lot of libraries are not thread safe. However, that's not at all rare; lots of libraries for other programming languages aren't thread safe either. My point is that those libraries won't start magically crashing when running in no-gil mode unless the dev using them starts using threads in Python. Yes, it's hard to know which libraries are thread-safe and which ones aren't, and just like in any other language you should default to "not thread safe" unless the developer explicitly says otherwise or you inspect the code.


> basically nobody uses Python threads

Not true at all. Plenty of people (including me) use threads in Python for:

* Blocking I/O

* CPU heavy libraries written in C (as those release the GIL)

They work fine, even with the GIL. They only work badly if you want to run a lot of pure-Python (non-I/O) code in multiple threads - which, fair enough, sometimes you might want to do, and the GIL is a problem for that.
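
For the blocking-I/O case, a minimal sketch (the URL is a placeholder):

    from concurrent.futures import ThreadPoolExecutor
    from urllib.request import urlopen

    def fetch(url):
        # the GIL is released while the socket blocks,
        # so the ten requests overlap
        with urlopen(url) as resp:
            return resp.status

    with ThreadPoolExecutor(max_workers=10) as pool:
        statuses = list(pool.map(fetch, ["https://example.com"] * 10))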


any existing async/await code.


Async is for asynchronous I/O, i.e. I/O that goes through file descriptors that support something like epoll() (i.e. network sockets).

It's a thing completely separate from threads, where no code is supposed to run concurrently. The idea of this feature is that a program may schedule a bunch of I/O operations and then wait for their completion instead of scheduling I/O operations one at a time.

As of now, this is an obsolete mechanism for dealing with I/O, as now we have io_uring. But, truth be told, it never really worked well... I mean, if you knew what you were doing, you could take advantage of this feature, but it was never in shape to be library-grade multi-purpose functionality.

Python made a stupid bet on it and encoded it into the language through async / await keywords. But if this was the only stupid thing Python has done in its history, flying cars and hoverboards would probably be an integral part of our daily lives.


i know what async/await is. the GIL is the only (technical) thing preventing async/await from being concurrent-by-default. nearly every other language that has async/await (or promises) is concurrent-by-default. people will want to run their async python code concurrently, but most async code will not "just work".


> the GIL is the only thing preventing async/await from being concurrent-by-default

Async/await is concurrent, that's the whole point. It's not usually parallel, because the asyncio runtime (and, IIRC, all the major alternate runtimes) schedules tasks on the same thread, and if there were a multithreaded runtime, its parallelism would be limited by the GIL to only actually having multiple threads making progress if all but one were in native code that released the GIL.

> nearly every other language that has async/await (or promises) is concurrent-by-default.

JavaScript isn't parallel for async/await. Ruby has multiple async/promises implementations, some of which are parallel (use separate threads) to some degree even with the GVL (which is like Python’s GIL), and others are not. (all are, of course, concurrent.)

The GIL limits the value of a multithreading async/await runtime, but it doesn’t prevent it, and a GILectomy doesn’t buy you one for free (or make a multithreading a cost-free choice.)


async/await code already runs in threads, so that's not really a change.


What do you mean it already runs in threads? It does so if you specify it with run_in_executor [1], or if you run multiple event loops at once, but it doesn't automatically.

[1] https://docs.python.org/3/library/asyncio-eventloop.html#asy...
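
i.e. something like this minimal sketch, where the blocking function is a placeholder:

    import asyncio

    def blocking_work():
        return sum(range(10_000_000))  # placeholder for blocking/CPU work

    async def main():
        loop = asyncio.get_running_loop()
        # passing None lazily creates a default ThreadPoolExecutor
        result = await loop.run_in_executor(None, blocking_work)
        print(result)

    asyncio.run(main())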


Woops, yeah, you're right. I thought the default executor was a threadpool, my mistake. However, in that case, I assume that the default executor will not change to multithreaded when no-gil comes.


There isn't an executor at all, at least not in the concurrent.futures.Executor sense. It just runs in the thread where you call asyncio.run.


But you need to pick your horse. In 5 years time, Python will either be GIL or no GIL, and it is hard to tell which. It might be a setting (which might be more ideal).

If you assume nogil, you need to choose dependencies that support that. You may need to trade off: eschew dependencies that aren't looking like they will be nogil compatible by the deadline. You are stuck on Python 3.18 maintenance branch or whatever, rather than the 3.19 (in reality .. 4.0) version.

Or choose gil, then you can use everything. But is there a prisoner's dilemma? Everyone picks gil and uses whatever dependencies; library maintainers, assuming this, don't bother to add nogil support; and then the decision becomes to stick with gil. If you suspect that will happen, it gives you an even stronger reason not to support nogil.


I don’t really understand this. Unless I am missing something you should always pick the “no GIL” version as that will work with or without a GIL. Thread safe No GIL code would be totally fine to run on python compiled with the GIL with zero modifications.

Because of this I don’t expect there to be multiple versions of any library. Once a library does the (admittedly heavy) lift to no GIL it will just be the main version of that library going forward.


Each library maintainer (probably mostly volunteers) has to decide whether to put effort into making their code thread safe. Clearly it won't be 100% of libraries that "upgrade".

Then on top of that, they know their effort might be for nothing if the decision is made to keep Python GIL-only all along (one of the 3 possible outcomes at the end of the 5 years: ["gil", "nogil", "both supported"]).


> Clearly it won't be 100% of libraries that "upgrade".

I'm wondering how many libraries with binary extensions are actually in common use. Like, maybe 90% of python projects use a subset of a few hundred such packages?

That's a hassle if you maintain one of those packages, and will be a bit disappointing if in 5 years' time you're still depending on GIL-reliant packages.

But it's nothing like the chaos of the python 2-3 changes, where ~100% of python files in every package and end-user project had to be fixed.

I only learned about this this morning though, it's very possible I'm missing something. A lot of the concerns people are raising look a bit overblown to me.

I take the point that after so many abortive GIL removal attempts, it's harder to be confident this one will happen. But having the go-ahead from the steering council seems like a good indicator this one has traction.


But thread-unsafe code is not the same as incompatible code. That's the point. You can just choose to say "NOT THREAD SAFE" (just as many C libraries aren't thread safe and need to be wrapped in locks to be used by multiple threads) and users will still be able to use it. More importantly, if it's a pure Python extension, you can just not modify the library and the users will still be able to use it whether or not they have gil or no-gil.


That’s true. I was more thinking from the perspective of a library user not library dev. I suspect for some classes of problem going no GIL will be so tantalizing that the work will definitely be done. Either in the incumbent library or an upstart will come out and take over the community with no GIL support.


The current plan says there have to be separate builds per module, as it is effectively an ABI break. It would be much better if it could be combined into one build. Hopefully necessity triggers some invention here.


There's no way to make it work with the old ABI. Because sizeof(PyObject) is fixed in the old ABI, there's simply no way to attach additional information (e.g. the new cross-thread ref count) to every Python object. The Python ABI (even the "limited" stable ABI) exposes too many implementation details, it's not really possible to make any fundamental changes to the Python interpreter without breaking that ABI.

You could have a single new ABI supporting both no-GIL and with-GIL, but it wouldn't be compatible with the existing stable ABI.


You're missing something, which is that a lot of libraries will be "i-don't-care-about-gil". Only native extensions need to choose GIL or noGIL due to the ABI difference, but pure Python libraries should run with the same code in both variants. And a lot of them will probably be thread safe at some level (function or class) without any changes. For those that aren't thread-safe, I bet that quite a lot can just get away with a "NOT THREAD SAFE" warning and letting the user wrap access to them with locks.

And that's talking about multithreaded code. I bet that even with noGIL, lots of Python code will still continue to be single-threaded, making the gil/no-gil decision irrelevant (save for those native extensions).


But at least after the transition you could stop caring. NoGIL makes maintainers’ lives worse permanently because now you have to care about it forever if you publish a library.


Why? Once you make your code thread safe it can be run as-is on python compiled with a GIL.


In a past life I hacked on PHP for a living, and in the time it took Python 2 to ride off into the sunset, PHP got two major migrations under its belt in 5.2 to 5.3, and then again 5.6 to 7.0.

It was amazing to see the contrast between the two languages. PHP gave you plenty of reasons to upgrade, and the amount of incompatible breaking changes was kept to a minimum, often paired with a way to easily shim older code to continue working.

I really hope to see no-GIL make it into Python, but in the back of my mind I also worry about what lessons were learned from the 2 to 3 transition. Does the Python team have a more effective plan this time around?


I’ve taken an application codebase from PHP 5.3 to 8.2 now and it was relatively easy the whole way.

The real key to minimizing the pain was writing effective integration tests with high coverage. We didn't have a good test suite to start, but once we added some utilities to easily call our various endpoints (an internal API client, if you will) and make assertions about the responses, the coverage came quickly.

Popular frameworks like Laravel offer such test utilities out of the box now.

That combined with static analysis tools like psalm make it so we can fearlessly move past major upgrades.

One thing I was surprised at was just how much crap PHP allowed with just a notice (not even a warning, for a long time). A lot of that stuff still works (although over time some notices have gradually progressed to warnings or errors). We have our test suite convert any notices or warnings to exceptions and fail the test case.


> The real key to minimize the pain was writing effective integration tests with high coverage

I think this makes it really hard to do comparisons: I’ve done Python 2 to 3 migrations which took an hour or two because the code had tests and was well-maintained, and PHP migrations which were painful slogs without tests and sloppy code (“is this ignored error new or something we should have fixed in the 2000s?”). Most developers don’t have enough data points to say whether the experience they had was due to the language or the culture.


I’m not familiar enough with the python transition to say much. I can think of a few things that the PHP developers did that helped make the transition easier:

- multibyte aware string functions were implemented as a separate (and optional) extension with separately named functions (prefixed with mb), and there was a popular community polyfill from the Symfony project (as there is for many new language functions).

- Weird sloppy behaviours (like performing array access on a Boolean, or trying to access a property on null, and many more that would silently just turn into null/false) had lengthy deprecation periods, and if you had error logging turned on you could clean these up relatively easily even without a big test suite.


> multibyte aware string functions were implemented as a separate (and optional) extension with separately named functions (prefixed with mb)

Python had a different take on this with some interesting psychology: you had a new string type which had to explicitly be converted (i.e. concatenating a Unicode string with a byte string causes an exception), which had a stark divide. Projects which had previously handled Unicode correctly converted almost trivially, but the projects which had been sloppy were a morass trying to figure out where Unicode was desirable and where you really needed raw bytes. Almost all of the code I saw where this was a problem didn’t handle Unicode properly but the developers _hated_ the idea of the language forcing them to fix those bugs.


There were valid reasons to be upset at Python 3's handling of Unicode.

- https://lucumr.pocoo.org/2014/5/12/everything-about-unicode/

- Discussion: https://news.ycombinator.com/item?id=7732572

- https://gregoryszorc.com/blog/2020/01/13/mercurial%27s-journ...

- Discussion: https://news.ycombinator.com/item?id=22036773

Chalking these complaints up to bad development practices is _precisely_ the reason why the Python 3 migration was handled so poorly. If this attitude is repeated for no-GIL Python, it will fail.


I was assuming that no-GIL will only be enabled if all imported libraries support it. That means that they are marked as no-GIL ready and otherwise the import would throw an exception. Not sure how it is implemented now but that sounded very reasonable to me. The no-GIL compatible code would start with the core libraries and then expand from that. Using legacy libraries just means that you have to revert back to GIL-mode. Any no-GIL enabled library should 100% still function in GIL-mode, so I don't expect the Python 2->3 transition situation to repeat.


> what about the enormous long-tail of abandoned extensions which still work today, etc.

I mean, there they're talking about keeping the GIL in (and I imagine that will be the case for many, many years), so those would still keep working. The fear is that some libraries just drop GIL-ful support, but there too I am hopeful that won't be the case.


> Note that if the program imports one single C-extension that uses the GIL on the no-GIL build, it's designed to switch back to the GIL automatically. So this is not a 2=>3 situation where non-compatible code breaks.

Sounds good enough to me, am I missing something?


The title says "GIL removed", but the article says "This means in the coming years, Python will have its GIL removed."

I'm assuming the article is correct and the GIL has not been removed yet (but there is a plan to remove it in the future). If that's not the case, please correct me!


It's not been removed. PEP 703 has been accepted and they've got a path forward to no-GIL. No-GIL versions will be available as experimental versions starting with 3.13 or 3.14.

https://peps.python.org/pep-0703/

https://discuss.python.org/t/a-steering-council-notice-about...


There's been an announcement that they are probably going to decide to start a development plan that can eventually lead to removing the GIL later, if it works out.

That plan is called PEP 703 and this is the factual basis: "We intend to accept PEP 703, although we’re still working on the acceptance details."


Yes.

I tried to come up with something that would convey in a few words that the GIL was going to be removed for sure this time. But as a Frenchman, I couldn't find better.

"GIL will be removed" was the closest, but it's very long, and it sounds like all those times we had the promise it would be, but it never did.

So the prophetic perfect tense is the best compromise: it asserts near certainty, it's short, and in the worst case scenario the article removes the ambiguity.

Plus the news popped up this week in HN front page, so a lot of people knew the context.


> the Prophetic perfect tense

That is not really a thing in normal English. I had to look up what it even means, and it apparently exists only in the translation of a few passages of Biblical Hebrew (and now, apparently, the title of your post).


I guess I always felt like the chosen one deep down.


Yeah, the use of past tense in the title here is clickbaity beyond all reason.


This is exactly what I have been looking forward to. Allow me to do no-gil; let me, the developer, make that choice. There are certainly issues with that, but I am conscious of this fact, and given an analysis of no-gil benefits, it is significantly more beneficial to have no-gil for certain use cases.

One of the most significant of these cases to me is threading outside an operating system context. What if I want to use both of the cores of a Cortex M0? Multiprocessing can't help me; there are no processes. If I need locking, I will implement it using the platform I have available to me.

The second is the fact that CPUs are increasingly scaling in core count. When I have to use multiprocessing, the program becomes far more complex than one would expect. Why am I doing message passing and shared memory when the OS and CPU supply better tools already? It also pollutes the process IDs. Imagine if we built every application in Python - there would be hundreds of thousands of individual processes, all just to use multiple cores. Because this problem is mostly unique to Python, we often end up having to build applications in other languages that otherwise would have been better in Python.

I want a world where I can say "just use Python" for almost anything. Telling new coders to drop their favorite language and use any other language to get the result they want immediately kills the innovation they are working on. Instead of spending time creating the idea, they're spending time learning languages I believe are unnecessary.


> let me the developer make that choice.

The final push towards making no-GIL as the only option is the big issue here. An optional no-GIL is ok (although a waste of time), making it default is bad.

> What if I want to use both of the cores of a Cortex M0

The solution to anything performant in Python is writing a C extension, just like NumPy did. Python isn't meant to be a performant language. The GIL allows you to write code without thinking about the complexities of parallelism.


> tools like pip-run already support running a script for which you have the deps described with such comments

> Packages are installed in a temporary virtual env and deleted after the run, like npx used to do for the JS world.

Is it efficient? Downloading packages and installing them only to delete them several seconds later wastes precious SSD cells.
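
For context, the dependency comments those tools read are just a block at the top of the script. Adapted from the example in PEP 722, so treat the details as approximate:

    # Script Dependencies:
    #     requests
    #     rich

    import requests
    from rich.pretty import pprint

    resp = requests.get("https://peps.python.org/api/peps.json")
    pprint(resp.json())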


There's a massive number of developers who unfortunately either don't know or don't care about efficiency. They'll blindly run commands with huge resource consumption with no second thought (or even an idea that such a thing is happening).

It wasn't long ago that a developer I was working with seemed to have entirely not comprehended the idea when I asked why he was searching for and downloading a dozen-MB PDF just to open it (i.e. delete it when closed) every time he wanted to look up one thing in it! I accumulate documentation for a project and keep most of it open throughout; I thought that was a usual thing to do, but apparently others will go online to search for that information every single time, then close the browser and reopen it whenever they need to look up something else.

More publicly, it's also not long ago that Docker, and more relevantly, PyPI, have been getting worried about their bandwidth usage: https://news.ycombinator.com/item?id=24262757 https://news.ycombinator.com/item?id=27205586


Which, except for optparse, was all on the front page yesterday. So optparse is deprecated. More work I guess apart from auditing extensions for threading.

Life is great in the Python treadmill.


IIRC, optparse was going to be removed in 3.5(?) but the outcry was large.

It has had a deprecation warning in the docs since 3.2.

It was in the "please just use argparse instead" state for a long time, this "just" adds an actual code warning.


I've tried to love argparse but it is so complicated. I always have to read the docs each time I use it.

getopt has its own brutal simplicity.
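
An entire getopt "parser" is a few lines. A sketch with made-up flags:

    import getopt, sys

    opts, args = getopt.getopt(sys.argv[1:], "vo:")
    verbose, output = False, None
    for flag, value in opts:
        if flag == "-v":
            verbose = True
        elif flag == "-o":
            output = value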


I think argparse works fine. What worries me is that it's also "soft-deprecated", because devs have said it should get no further development. I hope it stays around, because I use it by default as a no-dependencies solution that I know how it works.


Wait.. source? If argparse isn't getting more development, what's the current alternative?


Here it is https://discuss.python.org/t/argparser-subcommands-function-...

"Feature frozen" doesn't sound so bad. It just risks sliding into "we don't want to maintain it because it has known problems". (My opinion: everything well used has known problems, it's OK.)


If you aren't averse to using a third party package, on my personal projects I always found https://github.com/docopt/docopt to be nice.

You can kill 2 birds with one stone by documenting your scripts while also providing the argument structure / parsing.
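
A minimal sketch of what that looks like (the script and options are invented):

    """Frobnicate a file.

    Usage:
      frob.py <input> [--out=<path>] [--verbose]

    Options:
      --out=<path>  Where to write the result.
      --verbose     Chatty mode.
    """
    from docopt import docopt

    if __name__ == "__main__":
        args = docopt(__doc__)  # parses argv against the Usage block
        print(args["<input>"], args["--out"], args["--verbose"])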


Just last week I had to parse arguments again after a long time in a script of mine.

After some googling, getopt was chosen due to its simplicity.


Why didn't the Python community remove the GIL when migrating from Python 2 to Python 3?


Because at the time of the 2-3 migration, parallelism wasn’t viewed as being as important as it is today.


Is that really true? We already had multicore machines, and Herb Sutter's "The free lunch is over" article had been published for years by then.


> Is that really true?

For Python by Guido (this was still in the BDFL era)? Yes. For scripting languages generally? Also yes. For computing as a whole? While parallelism was more important than in either of the preceding contexts, it was still far less important than today, so, again, yes.

> We already had multicore machines, and Herb Sutter's "The free lunch is over" article had been published for years by then.

Barely, unless you are talking about multi-CPU SMP machines (which existed for PCs back to the 386 era); the first multicore x86 processors were released in May 2005 (and Sutter's article was in March), about a year before the major decisions about Python 3.0 (published April 2006).


Core 2 Duo was released in 2006, so people had multicore CPUs in their laptops the second half of that year.

I'm just saying, in general, parallelism was viewed as important, but Guido apparently didn't think so.


> Herb Sutter's "The free lunch is over" article had been published for years by then.

Python 3.0 was released in 2008 with the work starting in early 2006 (maybe earlier, going by PEP 3000 which was published in April 2006). Herb Sutter's "The Free Lunch is Over" was first published in 2005. I don't think a year between its publishing and the work on Python 3 beginning qualifies as "years".


Python is used much more widely and for more data tasks now than it was then, I think. Besides, we've all slowly adapted to using parallelism everywhere, it was not overnight.


I'm sure parallelism was understood and worked towards in computing at large and within certain programming languages. The then-contemporary and popular use cases for Python (what were they?) might have had very different challenges, solved by other means.

(I'm just guessing here)


The guy who was smart enough/motivated to do it showed up only a couple of years ago.


That guy shows up every couple of years, almost like a prophecy

https://github.com/larryhastings/gilectomy


Only Mr. Gross succeeded, the others are not relevant.


Right. We'll see if he does, I hope it goes well.


For the record, I found that LPython's homepage has a pretty complete list of all the "compilers" for Python. Really interesting list.


From reading the thread on HN the other day, it sounds like removing the GIL isn't really of much value. Maybe for somewhat obscure multithreading cases.

Is that right?


> Maybe for somewhat obscure multithreading cases.

They're only "somewhat obscure" because currently you can't do it at all, so you don't do it and you do something else: it's of value for any case where you're multithreading for computational parallelism (as opposed to IO concurrency). The PEP also outlines a bunch of other situations where using process-based parallelism is problematic: https://peps.python.org/pep-0703/#motivation

With the proviso that while it will work for all pure-python code out of the box[0] loading any native package which has not opted into "no gil" mode will re-enable the GIL: https://peps.python.org/pep-0703/#py-mod-gil-slot

[0] modulo new race conditions where the code relied on the GIL for correctness, something which isn't strictly correct and can already break when the scheduler logic changes


> currently you can't do it at all

That's not true. First of all, Python threads are mapped to OS threads, so you can do "something". Now, in the CPython C API there are tools for releasing and acquiring the global lock. They aren't complicated, and I have used them in my own extensions. I'm not sure how popular this practice is across other extensions, but at least some do use it. Some Python native functions release and acquire this lock while running in a non-main thread. For the most part, it's the functions that perform blocking I/O.

To sum this up and to make it easier to conceptualize, I describe this as Python can sleep concurrently, but can do no work concurrently.

As to "obscure multithreading cases"... well, ironically, some Python libraries use Python threads unironically... I believe Paramico uses them, but this is from memory, so please don't blame me if that's not the case. It's not very popular, but some have actually used threads. Typically it gives you no benefits when using Python, but on an odd day... There's also a thing about when Python threads can switch, which makes certain code impossible to race, but also makes some particular edge cases of errors harder to reproduce.

So, developers working on libraries that don't use any multithreading will probably not notice, but these cases are rare because Python is on the path of dependency bloat. Which means that in a large enough project, you are bound to get a library that uses threads. And then you will be impacted by bugs in multithreading even though you, personally, had nothing to do with it.


> So, you can do "something".

You can only do "something" if that doesn't involve interpreting actual Python code. Which is a pretty big deal since we're talking about Python programming.


Modern Python programming is highly reliant on all sorts of tricks that don't involve interpreting Python code -- all sorts of strap-on JIT compilations, interpreting code in some other interpreter (not Python)... or just compiling Python ahead of time.

Python library functions evolved from being simple and somewhat transparent into hugely complicated environments, possibly with their own interpreters that can be programmed by the same programmer who writes the Python program.

While this is an unguided, hugely inefficient, ad hoc process, this whole mess is what we have. Thus, people who write "Python programs" are actually writing programs in Python plus a bunch of crappy unter-languages that operate at, or even across, different layers of abstraction. And we still call this "mess" a Python program. E.g. Jinja templates or Nuitka decorators or IPython "magic" or Cython etc.

So, in practical terms, there's a lot going on in a Python program, because often a lot, and sometimes most, of it isn't written in Python proper. Python is just a kind of entry point to that mess, and is used as a label for a thing that nobody cared to give a proper name to.


That discussion was amusing. Removing the GIL opens up the possibility of actually getting a real performance benefit from multithreaded Python code. That's the value. Given every modern desktop and server is multicore (and increasingly getting to tens of cores if not hundreds), multithreading in Python unhampered by the GIL will be a useful thing. And no, multiprocessing is not a good alternative to multithreading. It's just an alternative, but it's slower, uses more memory, and coordination between processes is slower than between threads.


Heck, even my watch is dual-core.

Now i doubt I’ll be writing Python for it any time soon, but to call multithreading obscure is… really odd.


Python is not a language for writing fast code. Python is a relaxed language for things that don’t have to be fast. If you need something to be fast you are supposed to use a C extension and control it with Python - that’s been the dogma for as long as I can remember to avoid exactly this kind of pathological race to performance in a language that was never designed for it.

By using Python you are already leaving a ton of performance on the table in single-threaded code compared to a fast, compiled language.


It's always a tradeoff, but I'm surprised to see so many people say that just because Python isn't fast it shouldn't get multithreading.

Yes, by using Python we leave a lot of performance on the table. We also get a lot of dev performance, just because of the amount of libraries available, dynamic language features, quick development.. So it's not always a clear decision between Python or a compiled language.

> If you need something to be fast you are supposed to use a C extension and control it with Python

So what do we do in the case where we need to control the C extension from multiple threads? Because that's currently my problem. The C extension we developed does release the GIL, but because the Python code that makes the calls to the extension can't really be multithreaded, the performance gain we get is minimal.


Yup. I don't know why you would insist on having multiple Python threads, especially given the high risks. Python is only suitable for coordinating/scripting large libraries written in other languages or for quick and easy development. Python programs should not reach the stage where their use in production is hampered by lack of multi-threading.


There are plenty of other Python VMs that don't have a GIL and can be used already today, out of the box (examples include Jython and IronPython). Despite that fact - CPython remains the most popular Python VM out there (it utilizes a GIL).

Instead of waiting for the GIL to be removed out of CPython - take your fancy Python code and just run it using a different VM. It's literally as simple as that.

If the GIL were the bottleneck people make it out to be, people would have moved off of CPython a long time ago. But they haven't, despite having the options. This only serves to prove that 95%+ of the workflows people build with Python can be satisfied regardless of the GIL, often using some of the other parallelism mechanisms available in Python today (multiprocessing, asyncio, etc).

Most of the stuff people build with Python are CRUD apps, Jupyter notebooks, automations, tinkering, small hacks, etc. If you're okay with not utilizing all of your 64k CPUs at home - Python's multiprocessing and asyncio libraries should serve you just fine.

The whole GIL/no-GIL conversation is a complete waste of time and a distraction. People already have all the options they need, here and now - but slinging crap at each other over an issue tracker is so much fun that people can't help it.


People stay on CPython due to the performance of C extensions and the vast ecosystem based on them. The fact that people have stuck with CPython isn't at all evidence that they like the GIL or that it doesn't lead to significant technical problems.

Besides the C extension issue, Jython is based on Python 2.7 and IronPython appears to be on 3.4. These aren't serious alternatives.


Not true. There is GraalPython, which is for Python 3 and also supports native code extensions.

https://github.com/oracle/graalpython


What is not true? I don’t see what this is supposed to address in my post. I cited Jython and IronPython because that’s what the person I was responding to mentioned.

GraalPy looks neat, but is experimental/young still. Notably, it has a GIL specifically to be compatible with CPython.


Would a large codebase seamlessly run on another interpreter?


Not likely, particularly if you depend on modules written (partly) in C like numpy/scipy etc


I just did some searching around PyPy and that seems to be the case. IronPython is out of support now by the looks of it. Which is a shame. I heard of it 10 years ago, but assumed it was some "Microsoftized Python" and not at all a compatible thing :-)


The same happened to Jython, which is all but dead and stuck forever at Python 2.7.

If you stick to "pure Python" there's a larger chance you can use any python runtime and be able to run your code


I think you underestimate the problems that will occur in a large code base when the GIL is gone. It'll play out like this:

Tests will be fine, but production will have some weird bugs. Nobody will understand them. The devs will end up adding locks everywhere, bringing down performance, or creating deadlocks. In the end, they migrate back to Python 3.16.

Here's free lesson number 1: start adding stress tests now.


If you have a lot of code, there’s plenty of Internet drama to be had in moving to another runtime, too.


I would disagree with that.

The GIL means you can't use Python multithreading in order to take advantage of more CPU time by parallelism. Obviously getting rid of the GIL makes that a real option, just as it is in other languages.


Currently, yes that's kind of true. But it's really only considered obscure because the GIL makes it so you either have to do some weird non thread pattern or go with a different language, and people often go with a different language.

Kind of a Catch-22 of "Well no one uses it that way, so why should we make it possible to use it that way? Well, no one uses it that way because it's impossible to use it that way"


Well, Python doesn't really do proper multi-threading currently, thanks to the GIL blocking any additional execution threads. So removing it would enable writing Python code that is actually multi-threaded without resorting to extra processes and their overhead.

So if you are writing a small single-process Python script, then removing the GIL shouldn't really change much. If you are doing some heavier computing or e.g. running a server back-end, then there are significant performance gains available with this change.


You don’t have to use separate processes to get the benefit of multithreading in Python today — you can also call into a library written in native code that drops the GIL (e.g. Numpy or Pytorch).
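
e.g. a rough sketch, assuming a numpy build whose BLAS-backed operations release the GIL (mainstream builds do):

    import threading
    import numpy as np

    a = np.random.rand(2000, 2000)
    b = np.random.rand(2000, 2000)

    def work():
        np.dot(a, b)  # the GIL is released inside the BLAS call

    threads = [threading.Thread(target=work) for _ in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()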


Even then the GIL can cause issues, concerns of PyTorch are specifically one of the motivations of the PEP, and one of the reasons Meta / FB really really wants this:

> In PyTorch, Python is commonly used to orchestrate ~8 GPUs and ~64 CPU threads, growing to 4k GPUs and 32k CPU threads for big models. While the heavy lifting is done outside of Python, the speed of GPUs makes even just the orchestration in Python not scalable. We often end up with 72 processes in place of one because of the GIL. Logging, debugging, and performance tuning are orders-of-magnitude more difficult in this regime, continuously causing lower developer productivity.


I feel like orchestrating thousands of GPUs is such a niche use case that it’s fair to expect the people wanting to do it to learn a more suited language, rather than ruining Python for everyone else.


I notice you used the strong emotional word "ruining" when talking about the effect on Python of this change. Why do you believe an obscure runtime concurrency detail which will make more things possible will "ruin" the language?

Now match and :=? Those definitely ruin the language. ;-)

But seriously, relax, nothing bad is happening here. It's not just people who have to use the torch launcher who have been bitten by Python's currently-terrible multicore story. I've been a Python programmer for 15 years and I think this is a wonderful change.


> Why do you believe an obscure runtime concurrency detail

It is not obscure. It will make it much more difficult to write native-code extensions which is IMO the whole point of Python.


The point of Python in your opinion is to write non-Python code?


Yes, like bash, Python is a language that exists mainly as glue for code written in other languages. Do you think we need to add multithreading to bash?


It's (likely) much less expensive (in many ways, not just financially) to employ a larger number of Python programmers than a smaller number of programmers skilled in a language more appropriate for the use case. Engineer flexibility, salary costs, maintenance/correctness concerns with implications for development time, etc., are all factors here. The technical choice of "python or not python" is rarely the only--or even most important--choice to make.


You are completely right. Why don't they write their stuff in another language? They've got the resources. Now the rest of the world will suffer the consequences, one of which may be that the devs of native libs will simply abandon the work, or that those libs will become too difficult to use for the casual or starting programmer, completely defeating the purpose.

I'm fine with two builds, but not a single non-GIL build.


> Now the rest of the world will suffer the consequences, one of which may be that the devs of native libs will simply abandon the work, or that those libs will become too difficult to use for the casual or starting programmer, completely defeating the purpose.

Or get the benefits, so casual or starting programmers won't be wondering why their python program refuses to go above 100% CPU, or have to deal with the bullshit of multiprocessing.

> I'm fine with two builds, but not a single non-GIL build.

The "no-GIL" build has GIL machinery included, it just runs with the GIL disabled by default. You can force it on (https://peps.python.org/pep-0703/#pythongil-environment-vari...), and it will automatically enable itself when loading a non-no-gil library (https://peps.python.org/pep-0703/#py-mod-gil-slot).


That might have been the original intention. The latest notice from the SC says:

Our base assumptions are:

* Long-term (probably 5+ years), the no-GIL build should be the only build. We do not want to create a permanent split between with-GIL and no-GIL builds (and extension modules).

They repeat it later. It looks as if they really want to remove it.

> have to deal with the bullshit of multiprocessing

The problems multi-threading introduces outweigh that by far.


That only works in some cases, if the boundary between Python and native code is absolute. In many cases users want to extend/configure the behavior of that native code, e.g. through callbacks or subclassing, and the GIL makes the behavior prohibitively slow (needing to lock/unlock to serialize at any of these potential Python<->native boundaries) or unsafe (deadlocks/corruption if the GIL isn't handled).

There's a lot of C++ code bound in python (e.g. via pybind11) where the GIL currently imposes a hard bound on how users can employ parallelism, even in "nominally" native code.


That was the opinion of a handful of vocal posters. The overhead of using multiprocessing and/or some network service is extremely high for a lot of applications.


Right now multi-threading makes your Python code (that isn't really C) slower. The only real use of it is time slicing so you don't starve more important code like the web server or UI thread. You still have all the concurrency issues because your threads can still be paused and resumed at arbitrary times. It does allow some operations in Python to be atomic, but I, maybe naively, assume that those cases will be guarded by new, not-whole-interpreter locks.

With no-gil, your multithreading code can, with no change to your code, take advantage of multiple cores and actually speed up your program.


Correct, it will help with CPU-limited, embarrassingly parallelizable problems... which are much less common than you think.


Embarrassingly parallelizable problems are extremely common in my life. I end up breaking out of python to use gnu-parallel, which is fine but annoying.


why? gnu parallel is strictly inferior to the multiprocessing module


You don’t need embarrassingly parallel problems, you just need code doing lots of the same thing at the same time for this to be a win.


The big thing that is driving no GIL is the speed up of processing data for ML, which afaik cannot be done with multiprocessing.

