> "One operation per port, per cycle" > I didn't catch the issue, can you elabor...

BeeOnRope · on June 11, 2019

Loads are only a single uop, and this uop is responsible for both the address calculation and starting the fetch. The loaded value is usually forwarded directly to the uops that consume it, so the "receive the fetch" part is effectively their job, not a second load uop. So logically several things are going on, but they are all covered by a single uop dispatched to either p2 or p3, which does not break any "one port, one op" rule at that level.
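To make the "one port, one op" accounting concrete, here is a toy Python model, not a description of real hardware. Each cycle, every pending uop takes the first free port in its allowed set, and no port accepts more than one uop per cycle. The {p2, p3} port set for loads comes from the text above. The greedy assignment policy and the names `dispatch` and `LOAD_PORTS` are hypothetical simplifications; real cores bind ports and pick ready uops with more involved heuristics.

```python
# Toy model: at most one uop per port per cycle (not a cycle-accurate simulator).
LOAD_PORTS = ("p2", "p3")  # a load is ONE uop that may issue on either load port


def dispatch(uops):
    """Greedily assign uops to ports, at most one uop per port per cycle.

    `uops` is a list of (name, allowed_ports) pairs, all assumed ready at cycle 0.
    Returns a list with one dict per cycle, mapping port -> uop name.
    """
    pending = list(uops)
    cycles = []
    while pending:
        busy = {}      # ports already used this cycle
        waiting = []   # uops that found no free port this cycle
        for name, ports in pending:
            free = [p for p in ports if p not in busy]
            if free:
                busy[free[0]] = name
            else:
                waiting.append((name, ports))
        cycles.append(busy)
        pending = waiting
    return cycles


# Three independent loads: each is a single uop, so two fit in cycle 0
# and the third has to wait for cycle 1.
schedule = dispatch([("load_a", LOAD_PORTS), ("load_b", LOAD_PORTS), ("load_c", LOAD_PORTS)])
for cycle, ports in enumerate(schedule):
    print(cycle, ports)
```

Because each load is one uop, the address calculation and fetch start never need a second port slot in this model.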

Stores are different: they are two uops, STA (store address) and STD (store data), and they use different ports (p2, p3, or p7 for STA, and p4 for STD), so they play within the rules as well. Basically, the two uops exist because a store has two separate inputs, the address and the data, and each uop can execute independently once its own input is ready. Loads, on the other hand, have one input (the address) and one output.
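A toy Python sketch of the store split, under the same simplified one-uop-per-port model (hypothetical names throughout, not real hardware behavior). It shows two things from the paragraph above: a store's STA and STD land on different ports, and because they have separate inputs, the STA can dispatch even when the store data is not ready yet. It ignores details such as p7 only handling simple addressing modes.

```python
# Toy model: a store becomes two uops with different port sets and separate inputs.
STA_PORTS = ("p2", "p3", "p7")  # store-address uop
STD_PORTS = ("p4",)             # store-data uop
LOAD_PORTS = ("p2", "p3")       # a load is a single uop


def split_store(name, addr_ready, data_ready):
    """Return the two uops a store splits into (hypothetical naming)."""
    return [
        (name + ".sta", STA_PORTS, addr_ready),  # needs only the address inputs
        (name + ".std", STD_PORTS, data_ready),  # needs only the data input
    ]


def dispatch_one_cycle(uops):
    """Dispatch ready uops for one cycle, at most one uop per port.

    `uops` is a list of (name, allowed_ports, ready) tuples.
    Returns (busy, leftover): port -> uop name, and names not dispatched.
    """
    busy = {}
    leftover = []
    for name, ports, ready in uops:
        free = [p for p in ports if p not in busy]
        if ready and free:
            busy[free[0]] = name
        else:
            leftover.append(name)
    return busy, leftover


# Two loads plus one store: four uops, four different ports, one cycle.
uops = [("load_a", LOAD_PORTS, True), ("load_b", LOAD_PORTS, True)]
uops += split_store("store_x", addr_ready=True, data_ready=True)
print(dispatch_one_cycle(uops))

# Store whose data is still being computed: only the STA uop goes now.
print(dispatch_one_cycle(split_store("store_y", addr_ready=True, data_ready=False)))
```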

Note that loads _can_ actually end up dispatching multiple uops, in the case of a cache miss!

If the load uop misses in L1 when it executes, it is replayed 7 cycles later, on the expectation that the data will be arriving from L2 if there is an L2 hit. If that also misses, the load is replayed a final time when the data arrives from L3 or DRAM. There is no additional replay for an L3 miss: because the latency is variable, the load just goes to sleep waiting for the result, whether it comes from L3, L4, DRAM or wherever. The replayed uops still have to play by the one port, one op rule.
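The replay sequence above can be sketched as a tiny Python function that lists the cycles at which a single load uop is dispatched. This is a toy model, not a documented hardware algorithm: the 7-cycle delay is the figure from this comment, and `fill_cycle` is a hypothetical input standing in for whenever L3 or DRAM actually delivers the data. Each entry in the result is one dispatch that occupies a load port for that cycle, which is why replays still count against the one port, one op rule.

```python
REPLAY_DELAY = 7  # cycles until the replay timed for an L2 hit (figure from the comment)


def load_dispatch_cycles(start, l1_hit, l2_hit=False, fill_cycle=None):
    """Cycles at which one load uop is dispatched to a load port (toy model).

    start      -- cycle of the first dispatch
    l1_hit     -- whether the first attempt hits in L1
    l2_hit     -- whether the data is in L2 when the first replay happens
    fill_cycle -- hypothetical cycle the data arrives from L3/DRAM (L2 miss only)
    """
    cycles = [start]
    if l1_hit:
        return cycles
    cycles.append(start + REPLAY_DELAY)  # replay speculating on an L2 hit
    if l2_hit:
        return cycles
    if fill_cycle is None:
        raise ValueError("an L2 miss needs the cycle the data arrives")
    # No separate replay for an L3 miss: the load sleeps until the data shows up,
    # then is replayed one final time.
    cycles.append(fill_cycle)
    return cycles


print(load_dispatch_cycles(0, l1_hit=True))                                 # [0]
print(load_dispatch_cycles(0, l1_hit=False, l2_hit=True))                   # [0, 7]
print(load_dispatch_cycles(0, l1_hit=False, l2_hit=False, fill_cycle=250))  # [0, 7, 250]
```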