
CPU cache misconceptions, and the MESI cache coherence protocol - ingve
https://software.rajivprab.com/2018/04/29/myths-programmers-believe-about-cpu-caches/
======
opencl
A lot of these 'misconceptions' are absolutely true though, for certain
architectures. Every architecture has a specific set of guarantees about
memory ordering and the coherence protocol used is an implementation detail
relevant only to performance. The article is about x86, but ARM, for example, has much
weaker guarantees about memory ordering. Relying on the behavior of a specific
architecture here is not a good idea. This Linux document about memory
barriers[1] repeatedly calls out problems specific to the Alpha architecture
because of its extremely weak memory ordering guarantees.

Also, in principle, if you're not writing assembly and your compiler is not
buggy, it is the programming language's memory model that matters rather than
the hardware's. The CPU's memory ordering guarantees don't matter if your
compiler reorders memory accesses anyway to "optimize" your code. Modern C/C++
compilers WILL do this if you don't use explicit synchronization, because the
C/C++ model's ordering guarantees for unsynchronized accesses are very weak,
roughly comparable to Alpha's.
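
To make the "explicit synchronization" point concrete, here is a minimal C++11 sketch of publishing a value with a release store and an acquire load. The function name `run_message_passing` and the payload value are made up for illustration; this is one common pattern under the language memory model, not the only way to do it.

```cpp
#include <atomic>
#include <cassert>  // only needed for the usage check mentioned below
#include <thread>

// Hypothetical helper: publishes a plain int from one thread to another.
int run_message_passing() {
    int payload = 0;                  // ordinary, non-atomic data
    std::atomic<bool> ready{false};   // flag that carries the synchronization
    int observed = -1;

    std::thread producer([&] {
        payload = 42;                                  // (1) write the data
        ready.store(true, std::memory_order_release);  // (2) publish it
    });
    std::thread consumer([&] {
        // The acquire load pairs with the release store above. Once it
        // returns true, the write to payload is guaranteed to be visible.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        observed = payload;
    });

    producer.join();
    consumer.join();
    return observed;
}
```

Calling `assert(run_message_passing() == 42);` from `main` should hold on any conforming implementation. If `ready` were a plain `bool`, the program would have a data race, and neither the compiler nor the CPU would be obliged to keep (1) before (2).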

[1] [https://www.kernel.org/doc/Documentation/memory-barriers.txt](https://www.kernel.org/doc/Documentation/memory-barriers.txt)

~~~
gpderetta
The last mainstream architecture (edit: [1]) that wasn't fully cache coherent
was Alpha, and it hasn't been relevant for at least 15 years. And even Alpha
was almost always cache coherent, except for some weird corner cases that
required a barrier even in the dependent-load case.

What the article has been trying, and apparently failing, to explain is that
cache coherency has nothing to do with memory ordering.

[1] Mainstream _CPUs_, that is; GPUs are still not fully coherent AFAIK.

~~~
jcranmer
Alphas were cache coherent. The corner case you're talking about isn't a cache
incoherency, it's ultra-weak consistency.

Some more detail:

Cache coherency essentially boils down to a guarantee that every thread sees
the same total order of reads and writes to any given memory location. At any
point in time, everyone agrees on what the value of a memory location is.
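
The per-location guarantee has a direct analogue in the C++ memory model, which can be sketched as follows. Even with `memory_order_relaxed`, reads of a single atomic object may never go backwards in that object's modification order. The function name `reads_are_monotonic` and the iteration count are made up for illustration; on coherent hardware, relaxed loads and stores of an `int` typically compile to plain load and store instructions.

```cpp
#include <atomic>
#include <cassert>  // only needed for the usage check mentioned below
#include <thread>

// Hypothetical helper: returns true if a reader never observes the single
// location x moving backwards while a writer stores 1, 2, ..., writes.
bool reads_are_monotonic(int writes = 100000) {
    std::atomic<int> x{0};
    bool monotonic = true;  // written only by the reader thread

    std::thread writer([&] {
        for (int i = 1; i <= writes; ++i) {
            x.store(i, std::memory_order_relaxed);
        }
    });
    std::thread reader([&] {
        int last = 0;
        while (last < writes) {  // stop once the final value is seen
            int now = x.load(std::memory_order_relaxed);
            if (now < last) {
                monotonic = false;  // would violate per-location coherence
            }
            last = now;
        }
    });

    writer.join();
    reader.join();
    return monotonic;
}
```

`assert(reads_are_monotonic());` should always pass. Note that this says nothing about how accesses to two different locations are ordered relative to each other; that is the consistency question.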

That said, defining time in this sense is tricky. The interconnects take time
to send the messages back and forth, and it is possible for one memory request
to issue after another one but complete first. (We can generally take the time
of completion as the time of a memory operation.) The problem of
_consistency_ is a reflection of the fact that it is _usually_ not necessary
to wait for all of this traffic to settle down before progressing to other
operations.

On many hardware systems, there are multiple banks of caches that can each
talk to the interconnect independently. On Alpha, there is no logic to enforce
ordering between banks. You can load a value from one bank, and then use that
value to load an address from the other bank. And because of traffic, a write
to the address may be visible on the main bus before the write to the value,
so it's possible for a CPU to see the update of the pointer but not the value
being pointed to.
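
In C++ terms, the pointer-then-pointee scenario above looks roughly like the sketch below. `Node` and `read_through_published_pointer` are hypothetical names. The acquire load asks the compiler to emit whatever barrier the target needs; on most CPUs the address dependency alone already orders the second load, and Alpha is the case where it does not.

```cpp
#include <atomic>
#include <cassert>  // only needed for the usage check mentioned below
#include <thread>

struct Node {
    int value;
};

// Hypothetical helper: one thread publishes a pointer, another follows it.
int read_through_published_pointer() {
    Node node{0};
    std::atomic<Node*> head{nullptr};
    int seen = -1;

    std::thread writer([&] {
        node.value = 7;                                // write the pointee first
        head.store(&node, std::memory_order_release);  // then publish the pointer
    });
    std::thread reader([&] {
        Node* p = nullptr;
        // Load the pointer, then load through it (the dependent load).
        while ((p = head.load(std::memory_order_acquire)) == nullptr) {
            std::this_thread::yield();
        }
        seen = p->value;  // must observe 7, never the stale 0
    });

    writer.join();
    reader.join();
    return seen;
}
```

`assert(read_through_published_pointer() == 7);` holds under the C++ model. With a relaxed load instead of acquire, the language would no longer promise that the reader sees the updated `value`, which is the same gap the Alpha hardware exposes.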

The problem here is that, in multiprocessor systems, it is not generally the
case that the total orders of individual memory locations (guaranteed by cache
coherency) can be combined into a single, global total order of all of memory.
Coherency concerns the order of operations on a single memory location, while
consistency concerns the order of operations across all of memory at once.
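
The classic store-buffering litmus test is one way to see this, sketched here in C++. Two threads each write one location and then read the other. If all operations fit into one global total order, at least one thread must see the other's write. The name `count_both_zero` and the round count are made up for illustration.

```cpp
#include <atomic>
#include <cassert>  // only needed for the usage check mentioned below
#include <thread>

// Hypothetical helper: runs the litmus test `rounds` times and counts how
// often both threads read 0. `order` should be memory_order_seq_cst or
// memory_order_relaxed, since those are valid for both loads and stores.
int count_both_zero(std::memory_order order, int rounds) {
    int both_zero = 0;
    for (int i = 0; i < rounds; ++i) {
        std::atomic<int> x{0};
        std::atomic<int> y{0};
        int r1 = -1;
        int r2 = -1;

        std::thread a([&] {
            x.store(1, order);
            r1 = y.load(order);
        });
        std::thread b([&] {
            y.store(1, order);
            r2 = x.load(order);
        });
        a.join();
        b.join();

        if (r1 == 0 && r2 == 0) {
            ++both_zero;  // no interleaving of the four operations allows this
        }
    }
    return both_zero;
}
```

With `std::memory_order_seq_cst` the language requires a single total order over these operations, so `count_both_zero(std::memory_order_seq_cst, 2000)` must return 0. With `std::memory_order_relaxed` each location is still individually coherent, but a nonzero count is permitted; whether you actually observe one depends on the compiler, the CPU, and timing.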

~~~
gpderetta
I have seen this failure to preserve causality described as a failure to
maintain cache coherency, but I guess that you are right.

------
modells
That's not the whole story for real-world x86_64 implementations: there are
other states, such as Owned and Forward. IIRC, modern ARM uses AXI4 and ACE
for cache coherency.

[https://en.wikipedia.org/wiki/MOESI_protocol](https://en.wikipedia.org/wiki/MOESI_protocol)

[https://en.wikipedia.org/wiki/MESIF_protocol](https://en.wikipedia.org/wiki/MESIF_protocol)
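
For reference, the baseline four-state protocol that those variants extend can be sketched as a small transition function. This is a simplified textbook model of one cache line in one cache, with made-up names (`State`, `Event`, `next_state`); it ignores bus messages and write-backs and does not describe any specific CPU.

```cpp
#include <cassert>  // only needed for the usage check mentioned below

enum class State { Modified, Exclusive, Shared, Invalid };
enum class Event { LocalRead, LocalWrite, RemoteRead, RemoteWrite };

// Hypothetical helper: next state of a line in this cache.
// others_have_copy only matters when a read misses (line is Invalid).
State next_state(State s, Event e, bool others_have_copy) {
    switch (e) {
    case Event::LocalRead:
        if (s == State::Invalid) {
            // Miss: fetch the line; Exclusive only if no other cache has it.
            return others_have_copy ? State::Shared : State::Exclusive;
        }
        return s;  // hit: state unchanged
    case Event::LocalWrite:
        // Writing needs ownership; other copies get invalidated.
        return State::Modified;
    case Event::RemoteRead:
        if (s == State::Modified || s == State::Exclusive) {
            // Another cache now has a copy (Modified is written back first).
            return State::Shared;
        }
        return s;
    case Event::RemoteWrite:
        return State::Invalid;  // another cache took ownership
    }
    return s;
}
```

For example, `next_state(State::Modified, Event::RemoteRead, true)` yields `State::Shared` here. In MOESI that same transition would go to Owned instead, so the dirty line can be shared without an immediate write-back; MESIF adds Forward to pick which sharer answers a request.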

~~~
jplayer01
Uh, he mentions that some details such as additional states might differ
depending on the CPU/architecture. He isn’t trying to write a comprehensive
reference for everything you’ll see in the wild.

------
Dayshine
Poor title, it's a short introduction to MESI.

~~~
gpderetta
The title is correct: the cache-flushing myth is very widespread, and the
article tries to dispel it.

~~~
Bootvis
Should be a Myth instead of Myths then. I was slightly misled by this title
(but still found the article interesting).

~~~
JdeBP
Yes. It's not a _falsehoods_ list.

* [https://github.com/kdeldycke/awesome-falsehood](https://github.com/kdeldycke/awesome-falsehood)

------
RootKitBeerCat
Poor reading comprehension on my part; I thought “Myth: The Fallen Lords” game
programmers were going to share their beliefs about CPU caches.

