
Test Results for AMD Ryzen - matt_d
http://agner.org/optimize/blog/read.php?i=838
======
fulafel
Interesting:

> The gain in total performance that you get from running two threads per core
> is much higher in the Ryzen than in Intel processors because of the higher
> throughput of the AMD core

~~~
geezerjay
I've seen the throughput argument being made continuously for the past decade,
including in academic papers where the Opteron's higher throughput has been
shown to beat Xeons in HPC applications.

Another interesting aspect is that in HPC applications, where floating point
performance matters most, only about 1 in 7 operations is actually a
floating point operation. The remaining 6 operations are there to move data
around, including requests to push data down the cache memory hierarchy. That's
one of the reasons why the performance of AMD's Bulldozer and Piledriver lines
scaled practically linearly with core count, in spite of each floating
point unit being shared between a pair of CPU cores.
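
A back-of-the-envelope sketch of why a shared FPU need not be a bottleneck at
that instruction mix. This is a toy model, not a description of the real
Bulldozer pipeline: it assumes each core issues one operation per cycle and
the shared FPU accepts one floating point operation per cycle. The names
`shared_fpu_utilization`, `fp_fraction` and `cores_per_fpu` are made up for
illustration.

```python
from fractions import Fraction


def shared_fpu_utilization(fp_fraction, cores_per_fpu, fpu_ops_per_cycle=1):
    """Fraction of a shared FPU's capacity demanded by the cores attached to it.

    Toy model (assumption): every core issues one operation per cycle and
    fp_fraction of those operations are floating point.
    """
    demand = fp_fraction * cores_per_fpu  # FP operations requested per cycle
    return demand / fpu_ops_per_cycle


# 1 in 7 operations is floating point, and two cores share one FPU.
util = shared_fpu_utilization(Fraction(1, 7), cores_per_fpu=2)
print(util, util < 1)
```

Under these assumptions the two cores together ask for well under one floating
point operation per cycle, so the shared unit is not saturated and adding
cores can still scale close to linearly.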

Consequently, HPC research tends to focus on strategies to minimize the
amount of data being moved around and the number of cache misses, or to
take advantage of technology with higher throughput, such as GPGPU. As AMD's
Ryzen offers greater throughput, performance also increases.

------
godmodus
So glad AMD's back.

~~~
arthursilva
We should all be, no matter what your preference is. Ultimately it means lower
prices (and better motivation for further development).

------
arielweisberg
What is this about time measurement? Can you get frequency invariant
nanosecond resolution timestamps from Ryzen with a single instruction like you
can with Intel?

~~~
valarauca1
Intel's "real clock" is `TSC`. This counts processor cycles. Sandy Bridge (and
up, I think) ensures the `TSC` ticks at a constant rate that never fluctuates
(at the CPUID quoted clock speed _ish_ ). Generally this _real clock_ is called
`RDTSC`, even though that is just the instruction that reads the `TSC`.

Some rumblings from Intel suggest they may discontinue this. Either way, this
is addressed.

`APERF` seems to be AMD's version of `RDTSC` but that requires the code
execute in Ring-0. So it sounds like Agner was building a kernel module to
wrap the existing test suite.

~~~
zlynx
From the article I think that's backward. AMD's RDTSC is a constant-rate cycle
counter and APERF counts actual cycles. So with clock boosting from XFR, etc,
APERF will increase while RDTSC will not.
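
To make the distinction concrete, here is a minimal sketch of how the two
counters diverge under boost. It is a plain arithmetic model, not code that
reads the hardware: the function names, the 3.6 GHz base clock and the 4.0 GHz
boost clock are assumptions chosen for illustration, and idle states are
ignored.

```python
def counter_deltas(base_hz, actual_hz, seconds):
    """Return (tsc_delta, aperf_delta) over an interval.

    Model (assumption): the TSC ticks at the fixed base rate regardless of
    the current clock, while APERF ticks once per actual core cycle.
    """
    tsc_delta = int(base_hz * seconds)
    aperf_delta = int(actual_hz * seconds)
    return tsc_delta, aperf_delta


def effective_hz(base_hz, tsc_delta, aperf_delta):
    """Estimate the average actual clock from the two counter deltas."""
    return base_hz * aperf_delta / tsc_delta


# Half a second at a boosted 4.0 GHz on a part with a 3.6 GHz base clock.
tsc, aperf = counter_deltas(3.6e9, 4.0e9, 0.5)
print(tsc, aperf, effective_hz(3.6e9, tsc, aperf))
```

In this model the TSC delta depends only on elapsed time, which is what makes
it usable as a timestamp, while the APERF delta is what you want when counting
the cycles a piece of code actually took.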

