Hacker News new | past | comments | ask | show | jobs | submit login

That cannot be the reason for a 30-50% slowdown. Uncontested locks are very fast.



They may be fast in C++, but not in the context of CPython. Here are the dirty details. Note that fine-grained locking has also been tried before:

https://dabeaz.blogspot.com/2011/08/inside-look-at-gil-remov...


Thanks for the link, that's an interesting read. Actually the referenced PyMutex is a good old pthread_mutex_t, the same you'd use in C or C++. But I shouldn't have written so surely. Although uncontested locks are very fast, if the loop is tight enough, adding locks will be significant.

However, PEP 703 specifically points out that performance-critical container operations (__getitem__/iteration) avoid locking, so I'm still highly skeptical that those locks are the cause of the 30-50%.

https://peps.python.org/pep-0703/#optimistically-avoiding-lo...


The pthread_mutex_t is focused on compatibility at any cost. So while you're right that the C++ stdlib chooses this too, it's not actually a good choice for performance.

But I think you're right be sceptical that somehow this is to blame for the Python perf leak.


One of the things this spends some time on that was already obsolete in 2011 is using a pool of locks. In 1994 locks are a limited OS resource, Python can't afford to sprinkle millions of them in the codebase. But long before 2011 Linux had the futex, so locks only need to be aligned 32-bit integers. In 2012 Windows gets a similar feature but it can do bytes instead of 32-bit integers if you want.

If a Linux process wants a million locks that's fine, that's just 4MB of RAM now.




Consider applying for YC's W25 batch! Applications are open till Nov 12.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: