
Intel’s port 7 AGU blunder (2019) - nkurz
https://blogs.fau.de/hager/archives/8683
======
nkurz
Does anyone have a good answer to his "Why" at the end?

 _What’s a total mystery to me is why Intel chose to build an AGU that cannot
handle all kinds of addresses. In 2017, it was indicated to me that there “was
not enough space on the die.” I find this hard to believe, especially because
the problem prevailed in (at least) three further generations of Intel CPUs
after Haswell._

Is die space really a plausible answer as to why Intel would bother to put in
an extra AGU, but cripple it so that it can only work on "simple" addresses
that even their own compiler is not "smart" enough to generate?

~~~
bdonlan
Others have pointed out limitations on register read bandwidth and other
possible hardware-level issues, but I'll also point out that avoiding indexed
addressing modes in the generated assembly doesn't look that difficult. If
you're doing a big unrolled SIMD kernel, it's probably not a stretch to have
the compiler emit some LEA instructions as well, so that the loop itself
avoids indexed addressing modes. Apparently the compiler authors simply
haven't decided that it's important enough to do so yet, but compilers are
much easier to change than hardware...
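
As a rough source-level illustration of the idea above (a sketch, not anything
taken from the article or from Intel's compiler): the first loop below indexes
three arrays with a shared counter, which compilers commonly encode with an
indexed addressing mode such as [base + index*8]. The second advances the
pointers themselves, so each access needs only a base register. The function
names are made up for illustration, and an optimizer is free to rewrite either
loop into the other form, so the generated assembly has to be inspected to see
which addressing mode actually ends up in the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Indexed form: every access is base + i*8, which a compiler may encode
   with an indexed addressing mode (base register plus scaled index). */
void triad_indexed(double *a, const double *b, const double *c,
                   double s, size_t n)
{
    assert(n == 0 || (a && b && c));
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + s * c[i];
}

/* Pointer-bump form: each access uses only its own pointer, so a simple
   base(+displacement) address is sufficient. Whether the compiler keeps it
   that way is NOT guaranteed; this only makes the intent explicit. */
void triad_pointer(double *a, const double *b, const double *c,
                   double s, size_t n)
{
    assert(n == 0 || (a && b && c));
    const double *end = b + n;
    while (b != end) {
        *a = *b + s * *c;
        a++;
        b++;
        c++;
    }
}

/* Returns 1 if both variants produce identical results on a small input. */
int triads_agree(void)
{
    double b[8], c[8], x[8], y[8];
    for (size_t i = 0; i < 8; i++) {
        b[i] = (double)i;
        c[i] = 2.0 * (double)i;
        x[i] = 0.0;
        y[i] = 0.0;
    }
    triad_indexed(x, b, c, 3.0, 8);
    triad_pointer(y, b, c, 3.0, 8);
    for (size_t i = 0; i < 8; i++) {
        if (x[i] != y[i])
            return 0;
    }
    return 1;
}
```

The trade-off is three pointer increments per iteration instead of one
counter increment, which is part of why unrolling matters here: in an
unrolled loop the increments are amortized and the extra offsets fold into
displacements.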

~~~
waltpad
Or they didn't want to end up producing code with noticeable fluctuations in
efficiency, as described later in the article. Sometimes having consistent
(read: predictable) computation times is better.

------
jeffbee
nkurz, just curious if this is on your mind due to our recent conversation
about whether or not the AGU even exists.

~~~
nkurz
Sort of. I did find this article when searching for evidence of whether the
AGU ever assists in LEA, but I'm mostly interested in the topic because I've
been working on-and-off with Daniel Lemire and 'BeeOnRope' on a research paper
that involves getting intimately familiar with the innards of the AGUs on
recent Intel CPUs. I have no doubt that the AGU exists!

Related, you might be interested in this StackOverflow question, where among
other things, Peter Cordes (is he here?) and BeeOnRope conclude that the Port 7
AGU cannot be acting as an assist to the 3-cycle "complex" LEA:
https://stackoverflow.com/questions/50557636/what-type-of-addresses-can-the-port-7-store-agu-handle-on-recent-intel-x86
(expand the comments on the answer).

~~~
zwegner
Thanks for pointing out that link (I didn't click on it when I originally read
the article). Deep in the comments, there's a hint of the reason for this port
7 AGU weakness: not because of the extra gates for another adder, but due to
the wiring of the scheduler/bypass network, which is presumably much simpler
when port 7 only handles latency-1 instructions (at least if I'm understanding
that correctly).

Looking forward to that paper, too!

