
I just wrote a post about how CPython is much faster without the GIL: https://news.ycombinator.com/item?id=40988244



I mean, only the threaded version, which is expected. For tons of cases Python without the GIL is not just slower, but significantly slower; "somewhere from 30-50%" according to one of the people working on this: https://news.ycombinator.com/item?id=40949628

All of this is why the GIL wasn't removed 20 years ago. There are real trade-offs here.
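
For anyone who wants to check both sides of that trade-off on their own machine, here is a minimal sketch, not a rigorous benchmark. The function names (`burn`, `run_sequential`, `run_threaded`, `gil_status`) and the workload are made up for illustration. It runs the same CPU-bound work sequentially and in threads, and reports whether the GIL is on via `sys._is_gil_enabled()`, which only exists in 3.13+, hence the `getattr` guard.

```python
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def burn(n):
    """CPU-bound toy workload: sum of squares in pure Python."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_sequential(n, jobs):
    """Run the workload `jobs` times on the calling thread."""
    start = time.perf_counter()
    results = [burn(n) for _ in range(jobs)]
    return results, time.perf_counter() - start


def run_threaded(n, jobs):
    """Run the workload `jobs` times, one thread per job."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        results = list(pool.map(burn, [n] * jobs))
    return results, time.perf_counter() - start


def gil_status():
    """Report GIL state; sys._is_gil_enabled() is only present in 3.13+."""
    check = getattr(sys, "_is_gil_enabled", None)
    return "unknown (Python < 3.13)" if check is None else str(check())


if __name__ == "__main__":
    _, seq = run_sequential(2_000_000, 4)
    _, thr = run_threaded(2_000_000, 4)
    print(f"GIL enabled: {gil_status()}")
    print(f"sequential: {seq:.2f}s  threaded: {thr:.2f}s")
```

Run it under both a regular and a free-threaded build and compare the sequential numbers across builds as well as the threaded ones. That is where the single-threaded cost would show up, if it does for this workload.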


30-50% is an understatement. The latest beta is more than 100% slower in a simple benchmark:

https://news.ycombinator.com/item?id=41019626


How is single-threaded code slower without GIL?


Because in the --disable-gil build, reference counting and data structures like dicts, freelists, etc. are locked, even when there is only a single thread.

This is the reason why previous attempts were rejected. But those attempts came from single individuals and not from a photo sharing website.

This matters if --disable-gil becomes the default in the future and is forced on everyone.


That cannot be the reason for a 30-50% slowdown. Uncontested locks are very fast.


They may be fast in C++, but not in the context of CPython. Here are the dirty details. Note that fine-grained locking has also been tried before:

https://dabeaz.blogspot.com/2011/08/inside-look-at-gil-remov...


Thanks for the link, that's an interesting read. Actually the referenced PyMutex is a good old pthread_mutex_t, the same you'd use in C or C++. But I shouldn't have sounded so certain. Although uncontested locks are very fast, if the loop is tight enough, the added locking will have a significant cost.

However, PEP 703 specifically points out that performance-critical container operations (__getitem__/iteration) avoid locking, so I'm still highly skeptical that those locks are the cause of the 30-50%.

https://peps.python.org/pep-0703/#optimistically-avoiding-lo...
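
A rough way to see the tight-loop effect from Python itself is the sketch below. Big caveat: `threading.Lock` acquired from Python code is not the internal lock discussed here, and most of the overhead measured is interpreter overhead for the `with` statement, so treat it as an analogy only. The function names are made up for illustration.

```python
import threading
import timeit


def count_plain(n):
    """Tight loop with no locking."""
    total = 0
    for i in range(n):
        total += i
    return total


def count_locked(n, lock):
    """Same loop, taking an uncontended lock on every iteration."""
    total = 0
    for i in range(n):
        with lock:
            total += i
    return total


def compare(n=100_000, repeat=5):
    """Return the best-of-`repeat` wall time for each variant, in seconds."""
    lock = threading.Lock()
    plain = min(timeit.repeat(lambda: count_plain(n), number=1, repeat=repeat))
    locked = min(timeit.repeat(lambda: count_locked(n, lock), number=1, repeat=repeat))
    return plain, locked


if __name__ == "__main__":
    plain, locked = compare()
    print(f"plain: {plain:.4f}s  locked: {locked:.4f}s")
```

Nothing ever contends for the lock, yet the locked loop does extra work on every iteration. How much that matters depends entirely on how little work the loop body does otherwise.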


The pthread_mutex_t is focused on compatibility at any cost. So while you're right that the C++ stdlib chooses this too, it's not actually a good choice for performance.

But I think you're right to be sceptical that somehow this is to blame for the Python perf leak.


One of the things this spends some time on that was already obsolete in 2011 is using a pool of locks. In 1994 locks were a limited OS resource, so Python couldn't afford to sprinkle millions of them in the codebase. But long before 2011 Linux had the futex, so locks only need to be aligned 32-bit integers. In 2012 Windows got a similar feature, but it can work on bytes instead of 32-bit integers if you want.

If a Linux process wants a million locks that's fine, that's just 4MB of RAM now.
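
The arithmetic, spelled out as a tiny sketch (the helper name is made up, and this only counts the 32-bit lock words themselves, not any padding or per-object overhead):

```python
import ctypes

# A futex-style lock only needs one aligned 32-bit word.
LOCK_WORD_BYTES = ctypes.sizeof(ctypes.c_uint32)  # 4 bytes


def lock_memory_bytes(n_locks):
    """Bytes needed for n_locks bare 32-bit lock words."""
    return n_locks * LOCK_WORD_BYTES


if __name__ == "__main__":
    total = lock_memory_bytes(1_000_000)
    print(f"{total} bytes = {total / 1e6:.1f} MB")  # 4000000 bytes = 4.0 MB
```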



