
Memory Barriers (2003) - luu
https://yarchive.net/comp/linux/memory_barriers.html
======
gpderetta
> The notion of making the language more complex to be able to tell the
> compiler details like the above so that it would just do the RightThing(tm)
> is just crazy talk. Compilers are good at some things (mindless instruction
> scheduling, register allocation etc), but that shouldn't make you think they
> should be smart.

Compare this to Boehm's 'Threads Cannot Be Implemented as a Library'.

The issue, of course, was with volatile, which is useless for threading, not
with the notion of letting a compiler know about multi-threading assumptions.

I think today Linus has begrudgingly accepted that slowly moving the kernel to
the C/C++11 acquire-release memory model, where possible, is a good thing.

Personally I had never fully grokked the reordering-based memory barriers (I
understood what they did, but it was hard for me to see all the implications);
on the other hand the C++11 memory model, with its happens-before relation,
somehow seems much more natural, and it even helps me retrospectively
understand barrier-based code.
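
To make the contrast concrete, here is the same message-passing idiom written
both ways in C++11 (a minimal sketch; the variable names are mine):

    #include <atomic>
    #include <cassert>

    int payload = 0;                  // plain, non-atomic data
    std::atomic<bool> ready{false};

    // Barrier style: reason about which reorderings the fence forbids.
    void producer_with_fence() {
        payload = 42;
        std::atomic_thread_fence(std::memory_order_release);
        ready.store(true, std::memory_order_relaxed);
    }

    // Happens-before style: the release store publishes, the acquire
    // load subscribes; everything sequenced before the store is
    // visible to whoever observes it.
    void producer_with_release() {
        payload = 42;
        ready.store(true, std::memory_order_release);
    }

    void consumer() {
        while (!ready.load(std::memory_order_acquire)) { }
        assert(payload == 42);  // guaranteed by happens-before
    }

In the first version you have to check, fence by fence, that nothing sinks
below or floats above; in the second you just follow the release/acquire edge.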

~~~
kjeetgill
Absolutely agree. I can't wrap my head around the StoreLoad, LoadStore, etc.
stuff, but I think the Java memory model's happens-before guarantees are quite
digestible.

I think C++'s memory model is based on that one, as Doug Lea worked on both. I
should double-check that factoid.

~~~
BonesJustice
It doesn’t help that there are sometimes subtle but _extremely important_
differences in the guarantees that certain types of barriers provide, be it at
the hardware level, language level, VM level, etc.

Example: does a write with release semantics prevent writes to _any address_
from being reordered past the barrier, or only writes to the _same address_?
The answer for the general definition of “write-release semantics” depends on
who you ask. The specifics for a given language, VM, or architecture may or
may not be clearly stated. And even if they are, incomplete or wrong versions
of the rules will no doubt spread across the Internet via the likes of Stack
Overflow.
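
Concretely, in C++11 terms (names here are just for illustration):

    #include <atomic>

    std::atomic<bool> flag{false};
    int unrelated = 0;

    void writer() {
        unrelated = 1;                                // store to a *different* address
        flag.store(true, std::memory_order_release);  // may the line above be
    }                                                 // reordered past this?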

~~~
BeeOnRope
It's writes to _any_ address. It would be quite useless otherwise, since it
wouldn't then have a happens-before ordering with regard to all the plain
(non-atomic) writes before it.

Also, reordering writes to the same address is uncommon to non-existent, since
the obvious meaning of that would break even single-threaded semantics.
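
Taking your sketch above: once a reader's acquire load observes the flag, the
plain store to `unrelated` is guaranteed visible, which is the whole point of
publishing:

    #include <atomic>
    #include <cassert>

    // (same declarations as in the parent comment's sketch)
    extern std::atomic<bool> flag;
    extern int unrelated;

    void reader() {
        if (flag.load(std::memory_order_acquire)) {
            // The release/acquire pair establishes happens-before, so
            // the non-atomic write to `unrelated` must be visible here.
            assert(unrelated == 1);
        }
    }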

~~~
BonesJustice
_> It's writes to any address. It would be quite useless otherwise ..._

Ugh, you're right, of course. My brain hasn't quite woken up today :-/. I
remember there being some subtlety about read/acquire and write/release
semantics that I consistently see different takes on.

I think it was _actually_ whether write/release semantics prevent _any memory
operation_ or _only other writes_ from being reordered, i.e., is it a
LoadStore+StoreStore barrier or just StoreStore? And, similarly, whether
read/acquire semantics imply LoadLoad+LoadStore or just LoadLoad. My
understanding is that write-release only guarantees a StoreStore barrier, and
read-acquire only guarantees LoadLoad (and maybe LoadStore for direct
dependencies?). I have, however, seen the stronger guarantees implied many
times. The point is, regardless of which is 'correct', you are likely to
stumble across an incorrect explanation from someone who ostensibly knows what
they're talking about. And that itself is a problem.

I think the confusion stems from various languages and platforms providing
stronger guarantees than are generally required to satisfy read/acquire and
write/release. Java's `volatile`, I believe, requires both acquire+release
semantics on writes. People get used to the stronger guarantees and then
forget that language A's stronger guarantees aren't necessarily provided by
language B.
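
For what it's worth, here is how the C++11 definitions cash out in barrier
terms; the comments state what C++11 guarantees, other languages and platforms
may promise less (a sketch, names are mine):

    #include <atomic>

    std::atomic<int> sync_flag{0};
    int data = 0;

    void release_side() {
        data = 1;  // prior stores may NOT sink below the release store
                   // (StoreStore), and prior loads may not sink either
                   // (LoadStore)
        sync_flag.store(1, std::memory_order_release);
        // But later operations MAY float above the release store: there
        // is no StoreLoad barrier here, which is why IRIW-style tests
        // need seq_cst (or a full fence).
    }

    void acquire_side() {
        if (sync_flag.load(std::memory_order_acquire) == 1) {
            int r = data;  // later loads may NOT float above the acquire
                           // (LoadLoad), nor may later stores (LoadStore)
            (void)r;
        }
    }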

~~~
dragontamer
I think you're trying to talk about the Independent Reads of Independent
Writes (IRIW) sequential-consistency litmus test
(https://stackoverflow.com/questions/50462948/acquire-release-vs-sequential-consistency-in-c11)

It seems like there are roughly 4 levels of consistency:

1. Sequential Consistency -- A total ordering exists over all atomic
operations.

2. Acquire-Release Consistency -- An acquire barrier ensures that all memory
operations after the acquire "happen after" the read of the spinlock; a
release barrier ensures that all memory operations before the release "happen
before" the write of the spinlock.

3. Consume-Release Consistency -- a C++11 hypothetical: it has not been
implemented by C++ compilers yet. This weaker memory ordering is a native
guarantee on some older chips, but very few people seem to understand it.

4. Relaxed Consistency -- Atomics remain atomic, but no ordering is
specified.
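
In C++11 these four levels map directly onto std::memory_order (rough sketch):

    #include <atomic>

    std::atomic<int> x{0};

    void four_levels() {
        x.store(1);                                 // 1. seq_cst is the default
        x.store(1, std::memory_order_release);      // 2. paired with an acquire
        int a = x.load(std::memory_order_acquire);  //    load on the other side
        int b = x.load(std::memory_order_consume);  // 3. compilers currently just
                                                    //    promote consume to acquire
        int c = x.load(std::memory_order_relaxed);  // 4. atomic but unordered
        (void)a; (void)b; (void)c;
    }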

---------

It seems like #2 is sufficient for most cases; #1 is needed for a few obscure
cases (the IRIW / independent-reads-of-independent-writes case). #3 is
(probably) sufficient in cases where pointers are used as the synchronization
point, but very few programmers have studied consume-release, and it's simply
not implemented in any major compiler yet. (Some CPUs do offer consume-release
natively, which is why it was included in the C++11 standard.)

#4 is rare, but useful if you want the absolute fastest code and know that
atomicity alone is enough (ex: a global multithreaded counter, sketched below).
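
A minimal sketch of that counter case:

    #include <atomic>
    #include <thread>
    #include <vector>

    std::atomic<long> hits{0};

    void worker() {
        for (int i = 0; i < 100000; ++i)
            // Atomic increment, but imposes no ordering on surrounding memory.
            hits.fetch_add(1, std::memory_order_relaxed);
    }

    int main() {
        std::vector<std::thread> threads;
        for (int i = 0; i < 4; ++i) threads.emplace_back(worker);
        for (auto& t : threads) t.join();
        // The total is exact: atomicity alone suffices when no other data
        // is being published through the counter.
        return hits.load() == 400000 ? 0 : 1;
    }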

------
caf
The (2003) is not entirely precise - the first posting on that page is from
2003, but the majority are from 2007 and the most recent from 2009.

