
They are specific. See paragraph 5 of 5.1.2.4 "Multi-threaded executions and data races" in N1570.

In terms of real hardware, it maps well onto MESI/MESI-like protocols on cache lines with out-of-order cores, without many additional constraints (of course, some architectures are weak enough that they still require special instructions, but on the other hand x86 doesn't need anything for acq/rel). If different cores never touch the same cache line, they don't need to interact at all, even when both execute unrelated acq/rel atomics.
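To make the acq/rel pairing concrete, here is a minimal C++ sketch of a release store paired with an acquire load (names are illustrative, not from anything above). On x86 both atomics typically compile to plain MOVs; weaker architectures may emit extra ordering instructions:

    // Minimal acquire/release example (C++11 and later).
    #include <atomic>
    #include <cassert>
    #include <thread>

    int payload = 0;
    std::atomic<bool> ready{false};

    void producer() {
        payload = 42;                                  // ordinary write
        ready.store(true, std::memory_order_release);  // publishes the write above
    }

    void consumer() {
        while (!ready.load(std::memory_order_acquire)) // pairs with the release store
            ;                                          // spin until published
        assert(payload == 42);                         // guaranteed visible here
    }

    int main() {
        std::thread t1(producer), t2(consumer);
        t1.join();
        t2.join();
    }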



Interesting. I bought C++ Concurrency in Action today and am learning all these details of the memory model.

Your mention of caches makes me realize that a single-reader single-writer queue can probably also be optimized by putting the head pointer and the tail pointer on different cache lines. The reader and writer can each keep a cached copy of the other's value next to their own, and only reload it when the queue otherwise looks empty or full, respectively. This should allow the reader and writer to run for many operations without needing to synchronize cache lines. Something like the sketch below.
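A rough C++ sketch of that layout (class and member names are illustrative, and the capacity handling is simplified): head and tail sit on separate cache lines, and each side keeps a private cached copy of the other's index, refreshing it only when the queue looks empty or full.

    // Sketch of an SPSC ring buffer with split, cache-line-aligned indices
    // and locally cached copies of the other side's index.
    #include <atomic>
    #include <cstddef>
    #include <optional>

    template <typename T, std::size_t N>
    class SpscQueue {
        T buf_[N];

        alignas(64) std::atomic<std::size_t> head_{0}; // written by consumer
        std::size_t cached_tail_ = 0;                  // consumer's copy of tail_

        alignas(64) std::atomic<std::size_t> tail_{0}; // written by producer
        std::size_t cached_head_ = 0;                  // producer's copy of head_

    public:
        bool push(const T& v) {                          // producer thread only
            std::size_t t = tail_.load(std::memory_order_relaxed);
            std::size_t next = (t + 1) % N;
            if (next == cached_head_) {                  // looks full: refresh cache
                cached_head_ = head_.load(std::memory_order_acquire);
                if (next == cached_head_) return false;  // really full
            }
            buf_[t] = v;
            tail_.store(next, std::memory_order_release); // publish the element
            return true;
        }

        std::optional<T> pop() {                         // consumer thread only
            std::size_t h = head_.load(std::memory_order_relaxed);
            if (h == cached_tail_) {                     // looks empty: refresh cache
                cached_tail_ = tail_.load(std::memory_order_acquire);
                if (h == cached_tail_) return std::nullopt; // really empty
            }
            T v = buf_[h];
            head_.store((h + 1) % N, std::memory_order_release); // free the slot
            return v;
        }
    };

Each shared index is only loaded with acquire when the cached copy makes the queue look empty/full, so in the common case neither side touches the other's cache line.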


I think that works on some architectures but wouldn't be guaranteed behavior. In theory, on a system with no memory ordering constraints, the pointer updates could become visible before the cache lines holding the underlying ring buffer. So without an acquire barrier at the beginning of aring_take (which, without an item count, would require writes to the shared head/tail pointers to sync anyway), you can't ensure that the data written into the ring buffer is visible to the consumer thread, even though the pointers indicate there's data there, and you may load torn or completely different data in the consumer.

That means that if your memory barriers are operating correctly and the pointers are split across two lines, they're just causing two cache lines to sync for the metadata instead of one.

With an item count, by contrast, only one line ever has to sync, even if the pointers are split across two, since the pointers themselves aren't shared between threads.

In practice, though, the compiler might not be smart enough to realize this and might enforce ordering for all side effects before the barriers, even if the other thread never reads them. So maybe this isn't a good approach.
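For comparison, a rough C++ sketch of the item-count variant described above (again, names are illustrative and this is only a sketch): the indices stay private to their owning threads, and the single shared atomic count is the only state that ever has to bounce between cores.

    // Sketch of an SPSC ring buffer where the only shared state is a counter.
    #include <atomic>
    #include <cstddef>
    #include <optional>

    template <typename T, std::size_t N>
    class CountedSpscQueue {
        T buf_[N];
        std::size_t head_ = 0;                          // consumer-private
        std::size_t tail_ = 0;                          // producer-private
        alignas(64) std::atomic<std::size_t> count_{0}; // only shared state

    public:
        bool push(const T& v) {                         // producer thread only
            if (count_.load(std::memory_order_acquire) == N)
                return false;                           // full
            buf_[tail_] = v;
            tail_ = (tail_ + 1) % N;
            count_.fetch_add(1, std::memory_order_release); // publish the element
            return true;
        }

        std::optional<T> pop() {                        // consumer thread only
            if (count_.load(std::memory_order_acquire) == 0)
                return std::nullopt;                    // empty
            T v = buf_[head_];
            head_ = (head_ + 1) % N;
            count_.fetch_sub(1, std::memory_order_release); // free the slot
            return v;
        }
    };

The acquire load / release RMW on count_ is what carries the visibility of the buffer contents across threads; head_ and tail_ never need atomics at all.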





