Though in principle, if you're not writing assembly and your compiler isn't buggy, it's the programming language's memory model that matters rather than the hardware's. The CPU's memory ordering guarantees don't help you if the compiler reorders your memory accesses anyway to "optimize" the code. Modern C/C++ compilers WILL do this if you don't use explicit synchronization, because the C/C++ model gives very weak ordering guarantees for unsynchronized accesses, roughly as weak as Alpha's.
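To make "explicit synchronization" concrete, here's a minimal sketch of the usual flag-plus-data handoff in C++11 atomics. The names (`payload`, `ready`, `run_message_passing`) are made up for illustration. The release store and acquire load are what forbid both the compiler and the CPU from moving the data write past the flag write; with a plain `bool` flag this would be a data race and the compiler could reorder freely.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names, for illustration only.
int payload = 0;                 // plain, non-atomic data
std::atomic<bool> ready{false};  // synchronization flag

// Runs one producer/consumer handoff and returns what the consumer read.
int run_message_passing() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);

    int seen = -1;
    std::thread producer([] {
        payload = 42;                                  // (1) write the data
        ready.store(true, std::memory_order_release);  // (2) publish: (1) may not sink below this
    });
    std::thread consumer([&seen] {
        // (3) spin until the flag is observed; acquire pairs with the release in (2)
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;  // (4) must observe 42, because (2) synchronizes-with (3)
    });
    producer.join();
    consumer.join();
    return seen;
}
```

Calling `run_message_passing()` should always return 42 on a conforming implementation, regardless of how strong or weak the underlying hardware is.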
As an example, both ARM and x86 have coherent caches, but neither guarantees sequential consistency. x86 guarantees (something similar to) TSO, and ARM is way more relaxed than that.
"The most common protocol that’s used to enforce consistency amongst caches, is known as the MESI protocol."
But MESI is a coherence protocol, not a consistency protocol.
What the article has been trying, and apparently failing, to explain is that cache coherency has nothing to do with memory ordering.
(Mainstream CPUs, that is. GPUs are still not fully coherent, AFAIK.)
Some more detail:
Cache coherency essentially boils down to a guarantee that every thread sees the same total order of reads and writes to any given memory location. At any point in time, everyone agrees on what the value of a memory location is.
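The language-level counterpart of this per-location guarantee can be sketched with relaxed atomics. In the C++ model, even `memory_order_relaxed` operations on a single atomic object respect that object's modification order, so a reader can never see the value "go backwards". The function name `reads_are_monotonic` is an assumption for illustration, and this is only a sketch: it shows the guarantee holding, it can't prove it.

```cpp
#include <atomic>
#include <thread>

// Returns true if the reader never observed the counter decreasing.
// Only one location is involved, so relaxed ordering is enough.
bool reads_are_monotonic(int iterations) {
    std::atomic<int> counter{0};
    bool monotonic = true;  // written only by the reader, read after join

    std::thread writer([&] {
        for (int i = 1; i <= iterations; ++i) {
            counter.store(i, std::memory_order_relaxed);
        }
    });
    std::thread reader([&] {
        int last = 0;
        while (last < iterations) {
            int now = counter.load(std::memory_order_relaxed);
            if (now < last) {
                monotonic = false;  // would violate per-location coherence
            }
            last = now;
        }
    });
    writer.join();
    reader.join();
    return monotonic;
}
```

Note that this says nothing about how accesses to two different locations are ordered relative to each other, which is the consistency question.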
That said, there is some trickery in defining "time" here. The interconnects take time to send messages back and forth, and it is possible for one memory request to issue after another one but complete first. (We can generally take the time of a memory operation to be its time of completion.) The consistency problem reflects the fact that it is usually not necessary to wait for all of this traffic to settle down before moving on to other operations.
On many hardware systems, there are multiple cache banks that can each talk to the interconnect independently. On Alpha, there is no logic to enforce ordering between banks. You can load a pointer from one bank, and then use that pointer to load the data it points to from the other bank. Because of traffic, the write to the pointer may become visible on the main bus before the write to the pointed-to data, so it's possible for a CPU to see the updated pointer but not the updated value behind it.
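The pointer case looks like this in C++, as a minimal sketch with hypothetical names (`Node`, `published`, `run_pointer_publication`). The reader's second load depends on the first, and the release/acquire pair is what tells the compiler and the hardware that the pointed-to data must be visible once the pointer is. `memory_order_consume` was intended for exactly this dependent-load pattern, but as far as I know compilers simply treat it as acquire, so acquire is used here.

```cpp
#include <atomic>
#include <thread>

// Hypothetical types and names, for illustration only.
struct Node { int value; };

Node node{0};
std::atomic<Node*> published{nullptr};

// Publishes a pointer to an initialized Node and returns what the reader saw.
int run_pointer_publication() {
    node.value = 0;
    published.store(nullptr, std::memory_order_relaxed);

    int seen = -1;
    std::thread writer([] {
        node.value = 7;                                     // write the pointed-to data
        published.store(&node, std::memory_order_release);  // then publish the pointer
    });
    std::thread reader([&seen] {
        Node* p = nullptr;
        // Load the pointer; acquire orders the dependent load below after it.
        while ((p = published.load(std::memory_order_acquire)) == nullptr) {
            std::this_thread::yield();
        }
        seen = p->value;  // dependent load: must observe 7, never the stale 0
    });
    writer.join();
    reader.join();
    return seen;
}
```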
The problem here is that, in multiprocessor systems, it is not generally the case that the total orders of individual memory locations (guaranteed by cache coherency) can be combined into a single, global total order of all of memory. Coherency is about the total order of accesses to a single memory location, while consistency is about how accesses to all memory locations are ordered relative to one another.
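The classic store-buffering litmus test shows the difference. Two threads each write one location and then read the other. Each location on its own is perfectly coherent, yet without a global order both threads may read 0, and x86's TSO permits that outcome because a store can sit in the store buffer while a later load completes. The sketch below uses `memory_order_seq_cst`, which does ask for a single total order over these operations, so the both-zero outcome is forbidden. Function names are assumptions for illustration; switching the orderings to `memory_order_relaxed` would make the both-zero result legal.

```cpp
#include <atomic>
#include <thread>

// One run of the store-buffering test.
// Returns true if both threads read 0, i.e. neither saw the other's store.
bool both_read_zero_once() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}

// Repeats the test and counts how often the forbidden outcome appeared.
int count_both_zero(int trials) {
    int hits = 0;
    for (int i = 0; i < trials; ++i) {
        if (both_read_zero_once()) {
            ++hits;
        }
    }
    return hits;
}
```

With sequentially consistent operations, `count_both_zero` should return 0 for any number of trials, since one of the two stores has to come first in the single total order.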