CPU cache misconceptions, and the MESI cache coherence protocol (rajivprab.com)
87 points by ingve on Aug 2, 2018 | 15 comments

A lot of these 'misconceptions' are absolutely true, though, for certain architectures. Every architecture has a specific set of guarantees about memory ordering, and the coherence protocol it uses is an implementation detail relevant only to performance. The article is about x86, but ARM, for example, has much weaker guarantees about memory ordering. Relying on the behavior of a specific architecture here is not a good idea. This Linux document about memory barriers[1] repeatedly calls out problems specific to the Alpha architecture because of its extremely weak memory ordering guarantees.

In principle, though, if you're not writing assembly and your compiler isn't buggy, it is the programming language's memory model that matters, not the hardware's. The CPU's memory ordering guarantees don't matter if your compiler reorders memory accesses anyway to "optimize" your code. Modern C/C++ compilers WILL do this if you don't use explicit synchronization, because the C/C++ memory model has very weak ordering guarantees, roughly comparable to Alpha's.

[1] https://www.kernel.org/doc/Documentation/memory-barriers.txt
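
To make the compiler point concrete, here is a minimal C++ sketch of the classic message-passing pattern, assuming C++11 or later (`message_pass_once` is a hypothetical name chosen for illustration). With a plain `bool` flag the compiler could legally keep the flag in a register or move the payload store past it, and the program would contain a data race. `std::atomic` with release/acquire ordering rules that out at the language level, regardless of what the hardware would have done.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: one round of message passing between two threads.
// If `ready` were a plain bool, the compiler could hoist the load out of the
// spin loop or sink `data = 42` below the flag store, and the concurrent
// access would be a data race (undefined behavior in C/C++).
int message_pass_once() {
    int data = 0;                    // ordinary, non-atomic payload
    std::atomic<bool> ready{false};  // synchronization flag

    std::thread producer([&] {
        data = 42;                                     // (1) write the payload
        ready.store(true, std::memory_order_release);  // (2) (1) may not move below this
    });

    int observed = -1;
    std::thread consumer([&] {
        // The acquire load pairs with the release store above, so once the
        // flag reads true, everything written before (2) is visible here.
        while (!ready.load(std::memory_order_acquire)) {
        }
        observed = data;
    });

    producer.join();
    consumer.join();
    return observed;
}
```

Calling `message_pass_once()` repeatedly should return 42 every time; the guarantee comes from the release/acquire pair in the language model, not from x86's hardware ordering.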

Cache coherency is not the same thing as a memory consistency model. This article is talking about cache coherence, and you are talking about memory consistency.

As an example, both ARM and x86 have coherent caches, but neither guarantees sequential consistency. x86 guarantees (something similar to) TSO (total store order), and ARM is way more relaxed than that.
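
One way to see the SC/TSO difference is the store-buffering litmus test. Below is a toy C++ model, not a description of real x86 hardware: under SC, stores go straight to memory, while in the TSO variant each thread has a FIFO store buffer that drains at arbitrary points and forwards to the thread's own loads. `explore`, `sb_outcomes` and the other names are hypothetical. Exhaustively exploring every interleaving shows that only the TSO model lets both loads return 0.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Store-buffering (SB) litmus test, locations 0 = x and 1 = y:
//   thread 0: x = 1; r0 = y;      thread 1: y = 1; r1 = x;
struct BufferedWrite { int loc; int val; };

struct SbState {
    int pc[2] = {0, 0};                 // next instruction: 0 = store, 1 = load
    int mem[2] = {0, 0};                // shared memory (x, y)
    int reg[2] = {-1, -1};              // r0, r1
    std::vector<BufferedWrite> buf[2];  // per-thread FIFO store buffers (TSO only)
};

using Outcomes = std::set<std::pair<int, int>>;

// Depth-first exploration of every possible execution.
void explore(const SbState& s, bool tso, Outcomes& out) {
    bool finished = true;
    for (int t = 0; t < 2; ++t) {
        const int own = t;        // thread t stores to its own location...
        const int other = 1 - t;  // ...and then loads the other one
        if (s.pc[t] < 2) {        // step A: thread t runs its next instruction
            finished = false;
            SbState n = s;
            if (n.pc[t] == 0) {
                if (tso) n.buf[t].push_back({own, 1});  // store waits in the buffer
                else n.mem[own] = 1;                    // SC: store is immediately global
            } else {
                int v = n.mem[other];
                for (const BufferedWrite& w : n.buf[t])  // store-to-load forwarding
                    if (w.loc == other) v = w.val;
                n.reg[t] = v;
            }
            ++n.pc[t];
            explore(n, tso, out);
        }
        if (!s.buf[t].empty()) {  // step B: drain thread t's oldest buffered store
            finished = false;
            SbState n = s;
            const BufferedWrite w = n.buf[t].front();
            n.mem[w.loc] = w.val;
            n.buf[t].erase(n.buf[t].begin());
            explore(n, tso, out);
        }
    }
    if (finished) out.insert(std::make_pair(s.reg[0], s.reg[1]));
}

Outcomes sb_outcomes(bool tso) {
    Outcomes out;
    explore(SbState{}, tso, out);
    return out;
}
```

`sb_outcomes(false)` yields three (r0, r1) outcomes and `sb_outcomes(true)` yields four, the extra one being (0, 0). Memory in this model is implicitly coherent; the relaxed outcome comes entirely from the store buffers.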

The article is confusing because it is talking about cache coherency but keeps calling it consistency.

"The most common protocol that’s used to enforce consistency amongst caches, is known as the MESI protocol."

But MESI is a coherence protocol.
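
For reference, here is a toy C++ model of MESI for a single cache line, under simplifying assumptions: a snooping bus, a line that holds one `int`, and no transient states. `MesiLine` and its methods are hypothetical. It illustrates the coherence guarantee itself: at most one cache can hold a writable (M or E) copy, so every read returns the latest write to that line.

```cpp
#include <cassert>
#include <vector>

enum class MesiState { Modified, Exclusive, Shared, Invalid };

// Toy model: one cache line (a single int), N private caches, snooping bus.
class MesiLine {
public:
    explicit MesiLine(int num_caches)
        : state_(num_caches, MesiState::Invalid), value_(num_caches, 0) {}

    int read(int c) {
        if (state_[c] == MesiState::Invalid) {
            // Read miss: broadcast a bus read; other valid copies snoop it.
            bool shared = false;
            for (int o = 0; o < size(); ++o) {
                if (o == c || state_[o] == MesiState::Invalid) continue;
                shared = true;
                if (state_[o] == MesiState::Modified) memory_ = value_[o];  // write back
                state_[o] = MesiState::Shared;                               // M/E/S -> S
            }
            value_[c] = memory_;
            state_[c] = shared ? MesiState::Shared : MesiState::Exclusive;
        }
        return value_[c];  // hits in M, E or S need no bus traffic
    }

    void write(int c, int v) {
        if (state_[c] == MesiState::Shared || state_[c] == MesiState::Invalid) {
            // Gain ownership by invalidating every other copy. (A real
            // read-for-ownership also fetches the line; here the single int
            // is overwritten completely, so the fetch is skipped.)
            for (int o = 0; o < size(); ++o) {
                if (o == c) continue;
                if (state_[o] == MesiState::Modified) memory_ = value_[o];
                state_[o] = MesiState::Invalid;
            }
        }
        state_[c] = MesiState::Modified;  // E -> M is a silent upgrade
        value_[c] = v;
    }

    MesiState state(int c) const { return state_[c]; }

    // Single-writer / multiple-reader: an M or E copy excludes all others.
    bool invariant_holds() const {
        int owners = 0, valid = 0;
        for (MesiState s : state_) {
            if (s == MesiState::Modified || s == MesiState::Exclusive) ++owners;
            if (s != MesiState::Invalid) ++valid;
        }
        return owners == 0 || (owners == 1 && valid == 1);
    }

private:
    int size() const { return static_cast<int>(state_.size()); }
    std::vector<MesiState> state_;
    std::vector<int> value_;
    int memory_ = 0;
};
```

For example, a read by cache 0 takes the line to Exclusive, a write then silently upgrades it to Modified, and a later read by cache 1 forces a write-back and leaves both copies Shared.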

The last mainstream architecture (edit: [1]) that wasn't fully cache coherent was Alpha, and it hasn't been relevant for at least 15 years. And even Alpha was almost always cache coherent, except for some weird corner cases that required a barrier even in the load-dependent case.

What the article has been trying, and apparently failing, to explain is that cache coherency has nothing to do with memory ordering.

[1] Mainstream CPUs, that is; GPUs are still not fully coherent AFAIK.

Alphas were cache coherent. The corner case you're talking about isn't a cache incoherency, it's ultra-weak consistency.

Some more detail:

Cache coherency essentially boils down to a guarantee that every thread sees the same total order of reads and writes to any given memory location. At any point in time, everyone agrees on what the value of a memory location is.

That said, there is some trickery in defining time in this sense. The interconnects take time to send messages back and forth, and it is possible for one memory request to be issued after another but complete first. (We can generally take the time of a memory operation to be its time of completion.) The problem of consistency reflects the fact that it is usually not necessary to wait for all of this traffic to settle down before moving on to other operations.

On many hardware systems, there are multiple banks of caches that can each talk to the interconnect independently. On Alpha, there is no logic to enforce ordering between banks. You can load a pointer from one bank and then use it to load a value held in the other bank. Because of traffic, the write to the pointer may become visible before the write to the value it points to, so it's possible for a CPU to see the updated pointer but not the updated value being pointed to.
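
Here is a toy C++ model of that scenario, loosely following the explanation above; real Alpha hardware is more involved, and `CacheBank` and `reader_sees` are hypothetical names. The reader's two banks apply incoming invalidations independently, so the pointer's bank can be up to date while the data's bank still serves a stale hit. The optional drain step stands in for what the Linux memory-barriers document calls a data dependency barrier.

```cpp
#include <cassert>
#include <deque>
#include <map>

// One bank of the READER's cache. Banks process invalidations independently.
struct CacheBank {
    std::map<int, int> lines;       // address -> cached (possibly stale) value
    std::deque<int> pending_inval;  // invalidations received but not yet applied

    int read(int addr, const std::map<int, int>& memory) {
        auto it = lines.find(addr);
        if (it != lines.end()) return it->second;  // hit: may be stale!
        int v = memory.at(addr);                   // miss: fetch from memory
        lines[addr] = v;
        return v;
    }

    void apply_pending() {
        while (!pending_inval.empty()) {
            lines.erase(pending_inval.front());
            pending_inval.pop_front();
        }
    }
};

// Returns the value the reader gets by dereferencing the published pointer.
int reader_sees(bool dependency_barrier) {
    const int kData = 0;  // even address -> bank 0
    const int kPtr = 1;   // odd address  -> bank 1
    const int kNull = -1;
    std::map<int, int> memory = {{kData, 0}, {kPtr, kNull}};
    CacheBank bank[2];
    auto bank_of = [](int addr) { return addr % 2; };

    // The reader starts with both lines cached.
    bank[bank_of(kData)].read(kData, memory);
    bank[bank_of(kPtr)].read(kPtr, memory);

    // Writer: store the data, then (after a write barrier) publish the pointer.
    memory[kData] = 42;
    bank[bank_of(kData)].pending_inval.push_back(kData);
    memory[kPtr] = kData;
    bank[bank_of(kPtr)].pending_inval.push_back(kPtr);

    // Bank 1 is idle and applies its invalidation at once; bank 0 is busy.
    bank[bank_of(kPtr)].apply_pending();

    // Reader: load the pointer, then dereference it.
    int p = bank[bank_of(kPtr)].read(kPtr, memory);
    if (dependency_barrier) {
        for (CacheBank& b : bank) b.apply_pending();  // drain every bank first
    }
    return bank[bank_of(p)].read(p, memory);
}
```

`reader_sees(false)` returns the stale 0 even though the writer stored 42 before publishing the pointer, while `reader_sees(true)` returns 42.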

The problem here is that, in multiprocessor systems, it is not generally the case that the total orders of individual memory locations (guaranteed by cache coherency) can be combined into a single, global total order over all of memory. Coherency concerns the order of accesses to a single memory location, while consistency concerns the order of accesses to all of memory at once.
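
This can be checked mechanically on the IRIW (independent reads of independent writes) pattern. In this simplified C++ sketch, each observer reports the order in which writes became visible to it; in IRIW, an observer that reads x == 1 and then y == 0 has effectively seen "x=1" before "y=1". Per-location coherence asks whether the orderings restricted to each location can be merged into one order; a global order asks the same of all writes together. Both reduce to checking that the union of "before" edges is acyclic. `Observation`, `per_location_coherent` and `globally_ordered` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

// A write is labeled "loc=value", e.g. "x=1". Each observer lists writes in
// the order it saw them become visible.
using Observation = std::vector<std::string>;

static std::string location_of(const std::string& write) {
    return write.substr(0, write.find('='));
}

// True if every observer's "a before b" edges fit into one total order,
// i.e. the combined edge graph is acyclic.
static bool has_total_order(const std::vector<Observation>& observers,
                            bool same_location_only) {
    std::map<std::string, std::set<std::string>> edges;
    for (const Observation& obs : observers)
        for (std::size_t i = 0; i < obs.size(); ++i)
            for (std::size_t j = i + 1; j < obs.size(); ++j)
                if (!same_location_only || location_of(obs[i]) == location_of(obs[j]))
                    edges[obs[i]].insert(obs[j]);

    std::map<std::string, int> color;  // 0 = unvisited, 1 = on DFS path, 2 = done
    std::function<bool(const std::string&)> has_cycle = [&](const std::string& u) -> bool {
        color[u] = 1;
        auto it = edges.find(u);
        if (it != edges.end()) {
            for (const std::string& v : it->second) {
                if (color[v] == 1) return true;  // back edge: cycle
                if (color[v] == 0 && has_cycle(v)) return true;
            }
        }
        color[u] = 2;
        return false;
    };
    for (const auto& entry : edges)
        if (color[entry.first] == 0 && has_cycle(entry.first)) return false;
    return true;
}

bool per_location_coherent(const std::vector<Observation>& observers) {
    return has_total_order(observers, true);
}

bool globally_ordered(const std::vector<Observation>& observers) {
    return has_total_order(observers, false);
}
```

For IRIW, with one observer reporting {"x=1", "y=1"} and another {"y=1", "x=1"}, `per_location_coherent` returns true while `globally_ordered` returns false: each location is coherent on its own, yet there is no single order over all of memory.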

I have seen this failure to preserve causality described as a failure to maintain cache coherency, but I guess you are right.

Old ARM had software-maintained TAG RAM (CAM? state of cache lines), ICache and DCache.

Yeah, it seems like a really weird article. And the main misconception he lists is one of those "true but completely irrelevant" things: it doesn't matter whether different processes can read different values from the same memory location at the same time. What matters is that, for a variety of reasons, you can't define "the same time" in a useful way, so defining the order in which events happen becomes really important.

That's not the whole story for x86_64 in the real world: there are other states, like Owned (AMD's MOESI) and Forward (Intel's MESIF). IIRC, modern ARM uses AXI4 and ACE (the AXI Coherency Extensions) for cache coherence.



Uh, he mentions that some details such as additional states might differ depending on the CPU/architecture. He isn’t trying to write a comprehensive reference for everything you’ll see in the wild.

Poor title, it's a short introduction to MESI.

The title is correct: the cache-flushing myth is very widespread, and the article tries to dispel it.

Should be a Myth instead of Myths then. I was slightly misled by this title (but still found the article interesting).

Yes. It's not a falsehoods list.

* https://github.com/kdeldycke/awesome-falsehood

Poor reading comprehension on my part; I thought “Myth: The Fallen Lords” game programmers were going to share their beliefs about CPU caches.
