

Store-to-Load Forwarding and Memory Disambiguation in x86 Processors - nkurz
http://blog.stuffedcow.net/2014/01/x86-memory-disambiguation/

======
rayiner
Dave Kanter on RWT has published a few articles going into more depth about
the memory disambiguation on Merom (Core 2+) and Haswell processors:
[http://www.realworldtech.com/merom/7](http://www.realworldtech.com/merom/7);
[http://www.realworldtech.com/haswell-tm-alt](http://www.realworldtech.com/haswell-tm-alt)
(in the context of how the traditional memory order buffer had to be updated
to support transactional memory).

This is a really interesting area of modern OOO processor design. Every entry
of the store buffer has to be probed on every load to see if there is an
earlier store to that address that hasn't hit the cache yet. If you make it
bigger, you can perform more stores without waiting for the cache, but you
also need a bigger, more power-hungry CAM to implement the store buffer. That
structure tends to be a major point of contention in trading off between
increased memory parallelism and the cycle-time/power usage of the design.
Structures for predicting the addresses of stores use even more power.
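
To make the probing cost concrete, here is a rough toy model of a store buffer with store-to-load forwarding. It is only a sketch: the `StoreBuffer` class, its `probes` counter, and the dict standing in for the cache are all made up for illustration, and it ignores partial overlaps, access sizes, and everything else real hardware has to handle.

```python
from collections import deque


class StoreBuffer:
    """Toy model (hypothetical): a FIFO of pending (address, value) stores."""

    def __init__(self, capacity, memory):
        self.capacity = capacity
        self.memory = memory      # dict standing in for the cache
        self.pending = deque()    # oldest store on the left
        self.probes = 0           # entries compared so far, across all loads

    def store(self, addr, value):
        if len(self.pending) == self.capacity:
            self.drain_one()      # full buffer: wait for the oldest store to commit
        self.pending.append((addr, value))

    def drain_one(self):
        addr, value = self.pending.popleft()
        self.memory[addr] = value

    def load(self, addr):
        # Hardware compares against every entry at once (the CAM); counting
        # one probe per entry shows how the work grows with buffer size.
        self.probes += len(self.pending)
        for a, v in reversed(self.pending):   # youngest matching store wins
            if a == addr:
                return v                      # forwarded from the buffer
        return self.memory.get(addr, 0)       # no match: read the cache
```

A load of an address with two pending stores returns the younger value even though the cache still holds the old one, and every load pays for a comparison against each occupied entry, which is the cost that grows when you enlarge the buffer.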

See this discussion of why Silvermont (the OOO Atom core in Bay Trail) avoids
memory disambiguation by simply stalling on stores with unknown addresses, in
order to save power:
[http://www.realworldtech.com/silvermont/7](http://www.realworldtech.com/silvermont/7).
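
As a rough illustration of the difference between the two policies (not a description of Silvermont's actual logic; both function names are invented here), the conservative rule is a single check before issue, while a disambiguating core has to check after the fact whether its guess was wrong:

```python
def load_may_issue(older_store_addrs):
    """Conservative policy: issue only when every older store address is known.

    older_store_addrs holds one entry per older in-flight store;
    None means that store's address has not been computed yet.
    """
    return all(addr is not None for addr in older_store_addrs)


def speculation_was_wrong(load_addr, resolved_store_addrs):
    """Check a disambiguating core needs after issuing a load early:
    did any older store turn out to write the address already read?"""
    return load_addr in resolved_store_addrs
```

The first function needs no predictor and no recovery path, which is where the power saving would come from; the second implies replaying the load and its dependents whenever it returns true.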

In order to avoid expensive memory disambiguation, Itanium punts on the
problem entirely and uses a software-visible structure called an ALAT:
[http://www.realworldtech.com/poulson/6](http://www.realworldtech.com/poulson/6).
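
For anyone unfamiliar with the idea, here is a very simplified sketch of how an ALAT-style table behaves: an advanced load records its address, any store to that address knocks the entry out, and a later check either keeps the early value or reloads. The `ALAT` class and its method names are hypothetical, and a real ALAT has limited capacity, tracks access sizes, and may use partial address tags, none of which is modeled here.

```python
class ALAT:
    """Toy advanced-load address table (hypothetical simplification)."""

    def __init__(self):
        self.entries = {}   # destination register name -> tracked address

    def advanced_load(self, reg, addr, memory):
        # Load hoisted by the compiler above possibly-aliasing stores.
        self.entries[reg] = addr
        return memory.get(addr, 0)

    def store(self, addr, value, memory):
        memory[addr] = value
        # A store to a tracked address invalidates the matching entries.
        self.entries = {r: a for r, a in self.entries.items() if a != addr}

    def check_load(self, reg, addr, memory, early_value):
        # Entry survived: no intervening store aliased, keep the early value.
        if self.entries.get(reg) == addr:
            return early_value
        # Entry gone: the early value may be stale, so reload.
        return memory.get(addr, 0)
```

The point is that the compiler decides where to speculate and where to check, so the hardware only needs this table rather than a predictor plus replay machinery.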

