Because that's not how the ROB or any of the other "in order" structures like the load buffers, etc, work. Fundamentally, retirement happens in order.
uops only leave the ROB when they retire, and they retire in the same order as they appear in the dynamic instruction stream. The ROB is needed to preserve the ability to roll back if any instruction faults, to handle interrupts, to generate precise exceptions, etc: think about what would happen if you had retired a bunch of random instructions on either side of an un-retired instruction that ended up faulting?
Everything is speculative. The CPU always operates as if any unretired instruction is speculative. In reality it almost is: probably at least a quarter of all instructions can fault in some way, and as soon as there is any such instruction in the unretired stream, all younger instructions are "speculative". I'm putting it in quotes because "speculative" as we're discussing it is just something we are making up. CPUs don't have an "I'm speculating" flag: everything is treated as speculative all the time (and if you consider interrupts, maybe even a stream of cannot-fault instructions would also be "speculative").
There is a structure that works as you describe: the scheduler/reservation station. That's the place where everything happens out of order, and slow instructions can stall and be passed by younger ones that leave the scheduler and make room for others - but this sits within the in-order allocate + retire machinery.
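To make the "out-of-order in the middle, in-order at both ends" point concrete, here is a minimal toy model in Python. It is only a sketch: `simulate_rob`, the per-uop latencies, and the one-step-per-cycle timing are all made up for illustration and don't correspond to any real core.

```python
from collections import deque

def simulate_rob(latencies, rob_size):
    """Toy model: uops allocate in order, complete out of order, retire in order.

    latencies[i] is a hypothetical number of cycles uop i needs after
    allocation. Returns (retire_order, total_cycles).
    """
    rob = deque()      # entries are [uop_id, cycles_left], oldest at the left
    next_uop = 0
    retired = []
    cycle = 0
    while len(retired) < len(latencies):
        cycle += 1
        # Execute: every in-flight uop makes progress, regardless of age.
        for entry in rob:
            if entry[1] > 0:
                entry[1] -= 1
        # Retire: only from the head, and only once the head has completed.
        while rob and rob[0][1] == 0:
            retired.append(rob.popleft()[0])
        # Allocate: in program order, only while the ROB has room.
        while next_uop < len(latencies) and len(rob) < rob_size:
            rob.append([next_uop, latencies[next_uop]])
            next_uop += 1
    return retired, cycle

# One slow uop followed by fast ones: the fast ones finish early but
# cannot retire (or free their ROB slots) until the slow head is done.
order, cycles = simulate_rob([10, 1, 1, 1, 1, 1], rob_size=4)
print(order, cycles)
```

In this toy, the younger uops complete long before the slow one at the head, yet they sit in the ROB until it finishes, and a smaller `rob_size` means allocation stalls sooner behind it.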
Thanks. So because of in-order retirement, the ROB is indeed limited to holding only consecutive µops. And since nothing can be executed until it is put in the ROB, this means that there is a hard limit on the distance in µops between the last unretired µop and what can possibly be executed. Does this also mean that a load is only ever retired after it has been successfully fulfilled? I haven't understood the role of retirement for loads and stores.
> Thanks. So because of in-order retirement, the ROB is indeed limited to holding only consecutive µops. And since nothing can be executed until it is put in the ROB, this means that there is a hard limit on the distance in µops between the last unretired µop and what can possibly be executed.
Yes. The ROB holds even things that will never be executed, such as nops and zeroing idioms.
Note that the ROB is not really the bad guy here: if you invent some way to make the ROB retire out-of-order to break that restriction, you'll just immediately run into the PRF limit, since almost all pending instructions need a destination register. Since these values are all live (from the CPU's point of view) until all older instructions have retired, you'll get the same kind of limit from the PRF.
The PRF is a big structure using lots of area and power, so the ROB size is probably more or less determined by the PRF size: how big are we going to make the PRF? X? OK, let's make the ROB size X * 1.5, since then the ROB will rarely limit performance. Note: we know the ROB increased dramatically in size from 224 to 354 in Sunny Cove (SNC), but we don't yet know how big the reg files are!
There are good papers out there about super-high ILP designs if you are interested in how this kind of stuff can be solved, but there are many problems.
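As a back-of-the-envelope illustration of why the two sizes go together, here is a small sketch. Everything in it is an assumption for the sake of the arithmetic: `max_in_flight` is a made-up helper, the sizes are hypothetical, and real cores have separate integer and vector register files plus other limits (load/store buffers, scheduler) that this ignores.

```python
from fractions import Fraction

def max_in_flight(rob_size, prf_free_regs, dest_fraction):
    """Rough cap on in-flight uops from the ROB and the PRF.

    prf_free_regs: physical registers available for renaming (hypothetical).
    dest_fraction: fraction of uops that need a destination register,
                   given as a Fraction to keep the arithmetic exact.
    """
    # Each uop that writes a register holds one PRF entry until it retires,
    # so the PRF runs out after prf_free_regs / dest_fraction uops.
    prf_limit = int(prf_free_regs / dest_fraction)
    return min(rob_size, prf_limit)

# Hypothetical numbers: a PRF with X = 150 free registers and a ROB of 1.5 * X.
x = 150
print(max_in_flight(rob_size=225, prf_free_regs=x, dest_fraction=Fraction(2, 3)))
print(max_in_flight(rob_size=225, prf_free_regs=x, dest_fraction=Fraction(1, 1)))
```

Under these made-up numbers, a 1.5x ROB is exactly balanced when two thirds of uops write a register; if every uop needs a destination, the PRF becomes the limit well before the ROB fills.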
> Does this also mean that a load is only ever retired after it has been successfully fulfilled? I haven't understood the role of retirement for loads and stores.
Yes. Loads and stores both go in the ROB, and also in the load/store buffers, which are in-order just like the ROB.
A load can't retire until it completes: only then are the physical resources, like the register it writes to and the load buffer entry, able to be freed.
Note that this is a huge difference between loads and prefetches: prefetches can retire immediately (well, as soon as the load address has been calculated). They just kick off the load in the memory subsystem and then their work is done.
Stores are different: they can retire as soon as the store address and store data are known (i.e., as soon as their inputs are available). At this point they become so-called senior stores: stores which have retired, and hence are non-speculative, but haven't become visible to the rest of the system yet. They must eventually become visible, which means that on an interrupt this part of the store buffer is preserved or drained, never thrown away like the rest of the OoO buffers. When the time is right (write?) they commit to L1.
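Here is a toy sketch of the senior-store idea in Python, in case it helps. The `StoreBuffer` class and its method names are invented for illustration, and the dictionary standing in for L1 is obviously a simplification; the point is only that the retired prefix survives a flush while the speculative tail does not.

```python
from collections import deque

class StoreBuffer:
    """Toy in-order store buffer with a retired ('senior') prefix."""

    def __init__(self):
        self.entries = deque()   # oldest store at the left
        self.memory = {}         # stands in for L1 / globally visible state

    def allocate(self, addr, data):
        # New stores enter in program order and start out speculative.
        self.entries.append({"addr": addr, "data": data, "senior": False})

    def retire_oldest(self):
        # Retirement is in order: the oldest non-senior store becomes senior.
        for entry in self.entries:
            if not entry["senior"]:
                entry["senior"] = True
                return True
        return False

    def flush_speculative(self):
        # On an interrupt or rollback, drop only the non-retired tail.
        while self.entries and not self.entries[-1]["senior"]:
            self.entries.pop()

    def commit_one(self):
        # Senior stores become visible in order, oldest first.
        if self.entries and self.entries[0]["senior"]:
            entry = self.entries.popleft()
            self.memory[entry["addr"]] = entry["data"]
            return True
        return False

sb = StoreBuffer()
for addr, data in [(0x10, 1), (0x20, 2), (0x30, 3)]:
    sb.allocate(addr, data)
sb.retire_oldest()
sb.retire_oldest()
sb.flush_speculative()      # the third store was never retired, so it is dropped
while sb.commit_one():
    pass
print(sb.memory)
```

Because retirement is in order, the senior stores always form a contiguous prefix of the buffer, which is what lets the flush simply chop off the tail.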