Replay: Unknown Features of the NetBurst Core (2005)

nkurz · on Nov 1, 2014

While it might sound from the title like this article is hopelessly out of date, I think it's still highly relevant to the current generation of Intel processors. I came across it in a footnote to Agner Fog's excellent http://www.agner.org/optimize/microarchitecture.pdf [1].

The article describes what I think is an otherwise undocumented 'replay' feature, which governs how the processor deals with data dependencies that aren't resolved on the expected schedule: among others, L1 cache misses, TLB misses, and failed store-to-load forwards.

[1] Footnote to self: Read the rest of Agner's footnotes!

raverbashing · on Nov 1, 2014

AFAIK the replay feature is present only on NetBurst processors, which basically means the P4 line (Northwood onwards; I'm not sure it was on Willamette).

Modern Intel processors do not use the NetBurst architecture.

nkurz · on Nov 1, 2014

Do you have specific knowledge about how more modern processors handle these cases? I was particularly excited to find this article because it was the only source that explained the hardware counter activity I found here: http://fastcompression.blogspot.com/2014/09/counting-bytes-f...

Yann was trying to write a fast histogram to record the number of occurrences of each character. But the simple version of the program was much slower than expected. After a fair amount of digging, it was determined that "impossible" store forwarding was a factor. Checking the performance counters seemed to confirm that many of the loads were being replayed (executed many times before being retired). Is there another explanation for this?
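To make the dependency concrete, here is a minimal C sketch of a naive byte histogram next to one commonly used workaround that spreads the counts over several tables. The function names are hypothetical and this is not the code from the linked post; it only assumes that runs of identical bytes produce back-to-back load/increment/store sequences on the same counter. Whether a given CPU replays those loads is exactly the open question here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Naive histogram (hypothetical name): one counter per byte value.
   When the input contains runs of identical bytes, each load of
   counts[b] must receive its value from the still-in-flight store of
   the previous iteration to the same address (store-to-load forwarding). */
static void histogram_naive(const uint8_t *src, size_t len, uint32_t counts[256])
{
    memset(counts, 0, 256 * sizeof(uint32_t));
    for (size_t i = 0; i < len; i++)
        counts[src[i]]++;
}

/* Workaround sketch (hypothetical name, an assumption rather than the
   linked post's code): rotate increments across 4 tables so that
   adjacent identical bytes update different addresses, then sum the
   tables at the end. A given counter is touched at most every 4th byte
   in the main loop. */
static void histogram_4tables(const uint8_t *src, size_t len, uint32_t counts[256])
{
    uint32_t t[4][256];
    memset(t, 0, sizeof t);

    size_t i = 0;
    for (; i + 4 <= len; i += 4) {
        t[0][src[i]]++;
        t[1][src[i + 1]]++;
        t[2][src[i + 2]]++;
        t[3][src[i + 3]]++;
    }
    for (; i < len; i++)          /* leftover tail bytes */
        t[0][src[i]]++;

    for (int b = 0; b < 256; b++) /* merge the partial tables */
        counts[b] = t[0][b] + t[1][b] + t[2][b] + t[3][b];
}
```

Both functions produce identical counts; the second only trades about 3 KB of extra stack for shorter per-address dependency chains, which is the kind of change one would expect to matter if the slowdown really comes from forwarding stalls and replays.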

raverbashing · on Nov 1, 2014

From this, it looks like it might really be something similar: https://docs.google.com/document/d/18gs0bkEwQ5cO8pMXT_MsOa8X...

I'm not very familiar with the newer details of the Core architecture.

nkurz · on Nov 1, 2014

Yes, I think so too! But I'm almost surely biased. You probably didn't notice, but the "Nathan Kurz" who wrote that email and the 'nkurz' who posted this are both me. That's why I was excited to finally find some sort of external confirmation. :)

raverbashing · on Nov 2, 2014

Aaah sorry I didn't notice :)

Sorry, my knowledge of microprocessor internals has been slipping these past years (and they are becoming more complex).