The 3-cycle latency casts massive suspicion on the bypass network. But I don't ...

eigenform · 2025-01-03T05:00:36 1735880436

> But I don't see how the bypass network could be bugged without causing the incorrect result.

Maybe if they really rely on this kind of forwarding in many cases, it's not unreasonable to expect that the extra latency comes from having to recover from an "incorrect PRF read" (I imagine there's also a case for recovery from "incorrect forwarding").

phire · 2025-01-03T06:20:31 1735885231

Yeah, "incorrect PRF read" is something that might exist.

I know modern CPUs will sometimes schedule uops that consume the result of a load instruction on the assumption that the load will hit L1 cache. If the load actually missed L1, the CPU won't find out until that uop tries to read the value coming in from L1 over the bypass network. So that uop needs to be aborted and rescheduled later. And I assume this is generic enough to catch any "incorrect forwarding", because there are other variable-latency instructions (like division) that would benefit from this optimistic scheduling.

But my gut feeling is that you'd only have these checks on the bypass network, and only ever schedule PRF reads after you know the correct value has been stored.
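
To make the optimistic-scheduling idea above concrete, here is a toy Python sketch. It is not how any real scheduler is built: the latencies, the `Result` type and the `run_dependent_uop` function are all made up for illustration. It only shows that a uop issued on the assumption of an L1 hit either picks its operand off the bypass network on time, or fails the check and gets replayed once the data really arrives.

```python
from dataclasses import dataclass

# Hypothetical latencies in cycles, chosen only for illustration.
L1_HIT_LATENCY = 4
L1_MISS_LATENCY = 14


@dataclass
class Result:
    finish_cycle: int  # cycle in which the dependent uop completes
    replays: int       # how many times it was aborted and rescheduled


def run_dependent_uop(load_hits_l1: bool) -> Result:
    """Toy model of a 1-cycle uop that consumes a load issued at cycle 0."""
    # Cycle in which the load data really appears on the bypass network.
    data_ready = L1_HIT_LATENCY if load_hits_l1 else L1_MISS_LATENCY

    # The scheduler guesses "L1 hit" and issues the consumer so that it
    # reads the bypass network exactly when a hit would deliver the data.
    issue_cycle = L1_HIT_LATENCY
    replays = 0

    # Check at the bypass network: if the data is not there yet, the uop
    # is aborted and rescheduled for when the data is known to arrive.
    while issue_cycle < data_ready:
        replays += 1
        issue_cycle = data_ready

    return Result(finish_cycle=issue_cycle + 1, replays=replays)


if __name__ == "__main__":
    print("L1 hit: ", run_dependent_uop(True))
    print("L1 miss:", run_dependent_uop(False))
```

In this model the check lives only at the bypass network, which matches the gut feeling above; an "incorrect PRF read" recovery path would need a similar check on the register file read as well.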

Bulat_Ziganshin · 2025-01-03T13:35:50 1735911350

Maybe the bypass network doesn't include these "constant registers"? A bit like Zen 5, where some 1-cycle SIMD ops execute in 2 cycles, probably due to shortcomings of the same network.