
Low-level details of the Zen 2 microarchitecture [pdf] - ekoutanov
https://www.agner.org/optimize/microarchitecture.pdf
======
davidtgoldblatt
Of particular interest is section 20.18 -- "Mirroring memory operands";
extending (something like) register renaming to memory.

~~~
boxfire
I wonder what the limits are on that, and whether it chains very deeply, e.g.
mreg ("memory register") to op to mreg to op to mreg, and so on. Can this be
used to extend the optimization of some chained computations via a kind of
unrolling?

~~~
dkersten
In Agner's forum post linked a few days ago[1], it sounded like it was quite
limited. I mean, super impressive, but as far as I understood, it didn't
handle nesting (just the single pointer/memory access + offset)[2]. I'm not
very well versed in this stuff though, so maybe I misunderstood.

[1] https://news.ycombinator.com/item?id=24302057

[2] > _The mechanism works only under certain conditions. It must use general
purpose registers, and the operand size must be 32 or 64 bits. The memory
operand must use a pointer and optionally an index. It does not work with
absolute or rip-relative addresses._

> _It seems that the CPU makes assumptions about whether memory operands have
> the same address before the addresses have been calculated. This may cause
> problems in case of pointer aliasing._

Or from the PDF:

• The instructions must use general purpose registers.

• The memory operands must have the same address.

• The operand size must be 32 or 64 bits.

• You may have a 32 bit read after a 64 bit write to the same address, but not
vice versa.

• The memory address must have a base pointer, no absolute address, and no
rip-relative address. The memory address may have an index register, a scale
factor, and an offset no bigger than 8 bits.

• The memory operand must be specified in exactly the same way with the same
unmodified pointer and index registers in all the instructions involved.

• The memory address cannot cross a cache line boundary.

• The instructions can be simple MOV instructions, read-modify instructions, or
read-modify-write instructions. It also works with PUSH and POP instructions.

• Complex instructions with multiple μops cannot be used.
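
To make the conditions above a bit more concrete, here is a minimal C sketch
(mine, not from Agner's manual) of a loop whose accumulator lives in memory.
The function names and the use of `volatile` are assumptions for
illustration: `volatile` only forces the compiler to really load and store
`*acc` on every iteration. Whether the generated instructions meet the listed
conditions (same base register, 64-bit operand, no rip-relative address)
depends on the compiler and flags, so inspect the assembly before drawing
conclusions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Accumulates through memory: every iteration reads and writes the same
   64-bit location via the same pointer, i.e. a store followed by a reload
   of an identical address. */
uint64_t sum_via_memory(const uint32_t *values, size_t n,
                        volatile uint64_t *acc)
{
    assert(acc != NULL);
    *acc = 0;
    for (size_t i = 0; i < n; i++) {
        *acc += values[i];  /* read-modify-write on *acc */
    }
    return *acc;
}

/* Baseline for comparison: the accumulator can stay in a register. */
uint64_t sum_via_register(const uint32_t *values, size_t n)
{
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += values[i];
    }
    return acc;
}
```

Timing the two versions on a Zen 2 part and on an older core would be one way
to see whether the store-to-load dependency through `*acc` is being
shortened, though this sketch makes no claim about the result.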

------
enchiridion
Can anyone comment on how secure this architecture is against speculative
execution attacks vs. Intel?

------
Reelin
I have a (likely silly) question for anyone well versed in low level CPU
stuff. §20.7 says there's a stack engine that optimizes manipulation of the
stack pointer. Does this only apply to the dedicated hardware register (i.e.
%rsp) or to other registers as well?

(Potentially related: assuming it's of benefit, are modern compilers smart
enough to repurpose %rsp (is this even allowed?) if I use a block of memory as
a stack inside a hot loop?)
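
For concreteness, the kind of loop in question might look like the following
minimal C sketch. The function name, the bracket-matching task, and the
`STACK_CAP` capacity are all hypothetical, chosen only to show a block of
memory used as a stack. A compiler would normally keep `top` in an ordinary
general-purpose register and emit plain loads and stores for the push and
pop, not PUSH/POP on %rsp.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_CAP 64  /* hypothetical capacity for this sketch */

/* Returns 1 if every '(' and '[' in s is closed in the right order,
   0 otherwise (including when the fixed-size stack would overflow). */
int balanced(const char *s)
{
    assert(s != NULL);

    char stack[STACK_CAP];
    char *top = stack;  /* software stack pointer */

    for (; *s != '\0'; s++) {
        if (*s == '(' || *s == '[') {
            if (top == stack + STACK_CAP) {
                return 0;           /* stack full */
            }
            *top++ = *s;            /* "push" */
        } else if (*s == ')' || *s == ']') {
            if (top == stack) {
                return 0;           /* "pop" from an empty stack */
            }
            char open = *--top;     /* "pop" */
            if ((*s == ')' && open != '(') ||
                (*s == ']' && open != '[')) {
                return 0;           /* mismatched pair */
            }
        }
    }
    return top == stack;            /* balanced only if nothing is left */
}
```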

~~~
cepp
Take this with a grain of salt since I'm not specifically versed in the Zen
architecture, but it's likely only the SP. Usage patterns are fairly easy to
deduce at compile time and are optimized (e.g. loop tiling), so I think it's
fair to assume compilers take advantage of this. For example, if you can
predict the loop pattern you can repurpose the SP.

------
foota
Funnily enough, I Ctrl+F'd for an optimization problem I've been wondering
about and found a couple of mentions of it (branch vs. conditional move).
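
As a rough illustration of that trade-off, here is a minimal C sketch of the
same conditional sum written two ways. The function names are hypothetical,
and a compiler is free to turn either form into a branch or a CMOV, so the
only reliable way to know what you got is to read the generated assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sums the elements that are >= threshold, written with an explicit branch. */
int64_t sum_ge_branchy(const int *data, size_t n, int threshold)
{
    assert(data != NULL || n == 0);
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (data[i] >= threshold) {
            sum += data[i];
        }
    }
    return sum;
}

/* Same result without a data-dependent branch: the comparison yields 0 or 1,
   which is negated into an all-zeros or all-ones mask. */
int64_t sum_ge_branchless(const int *data, size_t n, int threshold)
{
    assert(data != NULL || n == 0);
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int64_t mask = -(int64_t)(data[i] >= threshold);
        sum += (int64_t)data[i] & mask;
    }
    return sum;
}
```

The branchy form tends to win when the condition is well predicted, and the
branchless form when it is not, but that is a general rule of thumb and not a
measurement on Zen 2.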

------
jordiburgos
Could all these insights be fed to an Artificial Intelligence (AI)? Then the
AI would find the best way to compile, re-arrange instructions, etc...

Just thinking...

