Totally agree. Debugging features -- e.g. deadlock detection / introspection -- easily pay for themselves. If you're actually acquiring locks so frequently that they matter, you should revisit your design. Sharing mutable state between threads should be avoided.
> if you're going to grab the lock so frequently that the uncontended lock/unlock time shows up as a significant percentage of your execution time, then use a spinlock.
Yeah, and maybe also consider changing your design because usually this isn't needed.
This is (in my experience) a byproduct of good design, so changing the design wouldn't be a great idea.
Every time I've seen this happen it's in code that scales really well to lots of CPUs while having a lot of shared state, and the way it gets there is that it's using very fine-grained locks combined with smart load balancing. The idea is that the load balancer makes it improbable-but-not-impossible that two processors would ever want to touch the same state. And to achieve that, locks are scattered throughout, usually protecting tiny critical sections and tiny amounts of state.
Whenever I've gotten into that situation, or found code that was already in it, the code performed better than it would have with coarser locks, and better than with lock-free algorithms (because those usually carry their own baggage and "tax"). It also performed better than the serial version of the code that had no locks.
So, the situation is: you've got code that performs great, but does in fact spend maybe ~2% of its time in the CAS to lock and unlock locks. So... you can get a tiny bit faster if you use a spinlock, because then unlocking isn't a CAS, and you run 1% faster.
Read and load mean the same thing. (I think GP just missed the end of your comment.)
You care about exchange vs read/load because of cache line ownership. Every time you try to do the exchange, the attempting CPU must take exclusive ownership of the cacheline (stealing it from the lock owner). To unlock, the lock owner must take it back.
If the attempting CPU instead only reads, the line ownership stays with the lock holder and unlock is cheaper. In general you want cache line ownership to change hands as few times as possible.
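For illustration, here is roughly what that looks like as a test-and-test-and-set spinlock in C++ (a sketch of the general technique, not anyone's production lock): waiters spin on a plain load so the line can stay shared, the exclusive-ownership exchange is only attempted once the lock looks free, and unlocking is just a store.

    #include <atomic>

    class TtasSpinlock {
    public:
        void lock() {
            for (;;) {
                // Try to grab the lock: the exchange forces exclusive
                // ownership of the cache line, so only do it when we
                // expect to succeed.
                if (!flag_.exchange(true, std::memory_order_acquire)) {
                    return;
                }
                // Contended path: spin on a read-only load, which leaves
                // the line in a shared state instead of bouncing
                // ownership back and forth.
                while (flag_.load(std::memory_order_relaxed)) {
                    // (a pause/yield hint would normally go here)
                }
            }
        }

        void unlock() {
            // Plain store, not a CAS -- the unlock side stays cheap.
            flag_.store(false, std::memory_order_release);
        }

    private:
        std::atomic<bool> flag_{false};
    };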
On x86 you can. When xchg is used with a memory operand it locks the bus, even in the absence of an explicit lock prefix. I included a spinlock implementation in the blog post. If you see any errors in it, please let me know!
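As a concrete illustration of the implicit locking (a sketch assuming x86-64 with GCC/Clang inline asm, not the blog post's code):

    #include <cstdint>

    // xchg with a memory operand is a locked read-modify-write even
    // though no explicit "lock" prefix is written.
    static inline uint32_t locked_exchange(volatile uint32_t* addr,
                                           uint32_t value) {
        asm volatile("xchg %0, %1"   // implicitly locked
                     : "+r"(value), "+m"(*addr)
                     :
                     : "memory");
        return value;                // previous contents of *addr
    }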
MSVC 2022's std::mutex is listed, though. (That said, GCC's / clang's std::mutex is not listed for Linux or macOS.)
absl::Mutex does come with some microbenchmarks that include a handful of comparison points (std::mutex, absl::base_internal::SpinLock), which might be useful for getting an approximate baseline.
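If you just want a rough number on your own machine, a trivial hand-rolled loop is often enough as a starting point (this is just an uncontended single-threaded baseline I'm sketching here, not the absl benchmark itself):

    #include <chrono>
    #include <cstdio>
    #include <mutex>

    int main() {
        std::mutex m;
        constexpr int kIters = 10'000'000;

        auto start = std::chrono::steady_clock::now();
        for (int i = 0; i < kIters; ++i) {
            m.lock();    // uncontended: no other thread ever holds it
            m.unlock();
        }
        auto stop = std::chrono::steady_clock::now();

        double ns = std::chrono::duration<double, std::nano>(stop - start).count();
        std::printf("uncontended lock/unlock: %.1f ns per pair\n", ns / kIters);
        return 0;
    }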