
A Timely Discovery: Examining Our AMD 2nd Gen Ryzen Results - artsandsci
https://www.anandtech.com/show/12678/a-timely-discovery-examining-amd-2nd-gen-ryzen-results
======
notaplumber
For those unaware, the TSC is an on-die timer on modern x86 processors that
has lower access latency than the HPET, which is located on the chipset. The
TSC is read with a dedicated instruction (RDTSC) and can be used by both the
kernel and user application code, if permitted.

Historically the TSC had problems on SMP systems, and for a time the HPET was
pretty much the only high-resolution timer available. Nowadays, modern CPUs
from both Intel and AMD support constant or invariant TSC, which operating
systems prefer when it is available.
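
On x86 Linux, the kernel reports these TSC properties in the `flags` line of
/proc/cpuinfo as `constant_tsc` and `nonstop_tsc`. The minimal Python sketch
below checks for them. Treating "both flags present" as "invariant TSC" is a
simplifying assumption, and the helper names are hypothetical.

```python
from pathlib import Path


def tsc_flags(cpuinfo_text: str) -> dict:
    """Report which TSC-related flags appear on the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        # x86 Linux lists CPU features on lines like "flags\t\t: fpu vme ... tsc ..."
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            return {name: name in flags for name in ("tsc", "constant_tsc", "nonstop_tsc")}
    return {}  # no x86 flags line (e.g. non-x86 or non-Linux)


def looks_invariant(flags: dict) -> bool:
    # Heuristic assumption: constant rate + keeps ticking in deep C-states.
    return flags.get("constant_tsc", False) and flags.get("nonstop_tsc", False)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        found = tsc_flags(path.read_text())
        print(found, "invariant-looking TSC:", looks_invariant(found))
    else:
        print("/proc/cpuinfo not found (not Linux?)")
```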

This was a good article explaining their findings, but it missed the industry
shift back toward favouring TSC.

~~~
dragontamer
I'm not sure if AMD properly supports invariant TSC. See this post from Agner
Fog with regards to the Zen cores:

[http://www.agner.org/optimize/blog/read.php?i=838](http://www.agner.org/optimize/blog/read.php?i=838)

> The varying clock frequency was a big problem for my performance tests
> because it was impossible to get precise and reproducible measurements of
> computation times. It helps to warm up the processor with a long sequence of
> dummy calculations, but the clock counts were still somewhat inaccurate. The
> Time Stamp Counter (TSC), which is used for measuring the execution time of
> small pieces of code, is counting at the nominal frequency. The Ryzen
> processor has another counter called Actual Performance Frequency Clock
> Counter (APERF) which is similar to the Core Clock Counter in Intel
> processors. Unfortunately, the APERF counter can only be read in kernel
> mode, unlike the TSC which is accessible to the test program running in user
> mode. I had to calculate the actual clock counts in the following way: The
> TSC and APERF counters are both read in a device driver immediately before
> and after a run of the test sequence. The ratio between the TSC count and
> the APERF count obtained in this way is then used as a correction factor
> which is applied to all TSC counts obtained during the running of the test
> sequence. This method is awkward, but the results appear to be quite
> precise, except in the cases where the frequency is varying considerably
> during the test sequence. My test program is available at
> www.agner.org/optimize/#testp
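
The correction described in the quote is simple arithmetic. Over the whole
run, the ratio of APERF ticks to TSC ticks estimates core clock cycles per TSC
tick, and each user-mode TSC measurement is scaled by it. The Python sketch
below assumes the four counter values were already read (in a driver, as in
the quote). The function names and sample numbers are hypothetical.

```python
def correction_factor(tsc_before, tsc_after, aperf_before, aperf_after):
    """Estimated core clock cycles per TSC tick over the whole test run."""
    tsc_delta = tsc_after - tsc_before
    aperf_delta = aperf_after - aperf_before
    if tsc_delta <= 0:
        raise ValueError("TSC did not advance between the two reads")
    return aperf_delta / tsc_delta


def corrected_cycles(tsc_counts, factor):
    """Scale TSC-based measurements (nominal ticks) to estimated core cycles."""
    return [count * factor for count in tsc_counts]


if __name__ == "__main__":
    # Hypothetical readings taken 1 ms apart: TSC ticking at a nominal 3.2 GHz,
    # core actually running at 4.0 GHz.
    factor = correction_factor(0, 3_200_000, 0, 4_000_000)
    print(factor, corrected_cycles([100, 80], factor))
```

In this example the factor is 1.25, so a 100-tick TSC measurement corresponds
to about 125 core cycles. As the quote notes, the method breaks down when the
frequency varies a lot during the run.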

---------------

EDIT: Odd. AMD's documentation for Family 17h (aka Zen) suggests that the TSC
is invariant:
[https://support.amd.com/TechDocs/54945_PPR_Family_17h_Models...](https://support.amd.com/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf)

See bottom of page 82.

I can't say I've run tests as thoroughly as Agner Fog has, however, nor do I
have access to Zen cores.

~~~
wtallis
I think that's saying that TSC is invariant with respect to wall time, but
what Agner Fog wants is a counter that's invariant with respect to actual CPU
clock cycles, so that he can count how many cycles an instruction takes even
if the clock speed of the CPU changes during a test run.

~~~
dragontamer
That makes sense actually. I never thought of it that way.

------
josteink
We were wrong. This is how we were wrong. This is how we’re going to avoid the
same mistake in the future. We’re retracting our results and will publish
updated results when ready.

Clear and simple.

I wish more media outlets were as straightforward and honest as this.

------
amluto
> Instead of being a benefit to testing, what our investigation found is that
> when HPET is forced as the sole system timer, it can sometimes be a hindrance
> to system performance, particularly gaming performance.

That doesn’t surprise me at all. The HPET is _awful_. Reading it takes
thousands of cycles. Reading it concurrently on multiple cores is even worse.
Even using it as a timer (to fire an interrupt at a given time) is highly
dubious: IIRC it only supports “interrupt if time == X” and not “interrupt
the first time that time >= X”, which means that setting a timer without races
requires manually checking that the timeout hasn’t already expired after
programming the timer, which is extra bad given how incredibly slow the HPET
is.
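
To make the race concrete: with an equality-only comparator, a target that the
counter has already passed by the time the write lands will not fire (until the
counter wraps around), so the code must read the counter again after
programming it. The toy Python model below shows that re-check. `FakeHpet`, its
fixed per-access cost, and `arm_oneshot` are hypothetical, and the model
ignores 64-bit wraparound.

```python
class FakeHpet:
    """Hypothetical HPET model: every register access costs `access_cost` ticks."""

    def __init__(self, start=0, access_cost=0):
        self.counter = start
        self.access_cost = access_cost
        self.comparator = None

    def read_counter(self):
        self.counter += self.access_cost  # slow access: time passes while we read
        return self.counter

    def write_comparator(self, value):
        self.counter += self.access_cost  # the write is slow too
        self.comparator = value


def arm_oneshot(hpet, delta):
    """Program an interrupt `delta` ticks from now; return False if it already expired."""
    target = hpet.read_counter() + delta
    hpet.write_comparator(target)
    # Re-check after programming: if the counter has reached or passed the target,
    # the "time == X" match may never happen, so (conservatively) report expiry
    # and let the caller run the timeout handler itself.
    if hpet.read_counter() >= target:
        return False
    return True


if __name__ == "__main__":
    print(arm_oneshot(FakeHpet(access_cost=1), 100))   # plenty of slack
    print(arm_oneshot(FakeHpet(access_cost=50), 60))   # slow accesses eat the deadline
```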

(I maintain the Linux vDSO, and the HPET has so many problems that I disabled
vDSO HPET support entirely a while back.)

~~~
cesarb
> (I maintain the Linux vDSO, and the HPET has so many problems that I
> disabled vDSO HPET support entirely a while back.)

For those who don't know: the vDSO is a "dynamic library" provided by the
Linux kernel to every process. On some architectures (32-bit x86, for
example), system calls are made by calling a function exported by the vDSO,
which then does the real system call using the fastest method available.
There are also a few special cases in the vDSO for time-related calls such as
clock_gettime() and gettimeofday(): depending on the currently active clock
source, it can do a few calculations and return without actually making a
system call. This makes getting the current time much faster.
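
A rough way to see this from user space is to time many clock reads. The
sketch below assumes Linux and CPython 3.7+ (`time.clock_gettime_ns`).
Interpreter overhead dominates the absolute number, so the result is only
meaningful as a comparison. For example, you can compare the figure before and
after switching the clocksource from tsc to hpet. After that switch, per the
comment above, the vDSO no longer answers the call itself and a real system
call is made instead. The function name is hypothetical.

```python
import time


def avg_clock_read_ns(clock_id=time.CLOCK_MONOTONIC, n=100_000):
    """Average wall time per clock_gettime() call, as seen from Python."""
    read = time.clock_gettime_ns  # libc clock_gettime(), vDSO-backed when possible
    start = time.perf_counter_ns()
    for _ in range(n):
        read(clock_id)
    elapsed = time.perf_counter_ns() - start
    return elapsed / n


if __name__ == "__main__":
    print(f"~{avg_clock_read_ns():.1f} ns per CLOCK_MONOTONIC read (incl. Python overhead)")
```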

------
ysleepy
Summary: forcing the HPET on has a significant performance impact, especially
on Intel. AnandTech forced it on, which is not the default, and now has lots
of results that can't be compared with other reviewers' benchmarks.

I'm interested in first-generation Ryzen numbers, especially considering that
AMD explicitly advised turning the HPET off while AnandTech turned it on.

------
ComputerGuru
Really, the article should be titled “Intel CPUs suffer a performance hit
with HPET bug”.

~~~
dogma1138
There is no actual performance hit: the number of frames per given unit of
real time is identical. The only thing that happens is that the frame counter
on the machine does not count them accurately.

~~~
wtallis
That may be true for some applications, but in general the FPS meter isn't the
only thing that cares about wall time.

~~~
dogma1138
Yes, but we're talking about this specific use case. I'm not entirely sure
this would actually affect more significant things; the HPET was always a
PITA.

I am wondering where they got the claim that Intel uses a 24 MHz timer,
though...

hpet0: 4 comparators, 64-bit 14.318180 MHz counter

~~~
wtallis
On my Skylake system, I have:

hpet0: 8 comparators, 64-bit 24.000000 MHz counter

~~~
dogma1138
I've tested on Broadwell/Haswell-E, KL and SKL-X; I wonder wtf is going on.

24.0 MHz is also not an integer multiple of the standard 8254/8253 PIT
frequency....
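
The arithmetic behind that observation: the legacy PC timers derive from a
14.31818 MHz crystal (four times the NTSC colour-burst frequency), and the
8253/8254 PIT runs at that frequency divided by 12, about 1.193182 MHz. The
small Python check below computes tick periods and PIT ratios for the two HPET
frequencies reported above. The helper names are illustrative.

```python
BASE_HZ = 14_318_180   # legacy PC timer crystal (4x NTSC colour burst)
PIT_HZ = BASE_HZ / 12  # ~1.193182 MHz, the 8253/8254 PIT input clock


def period_ns(freq_hz):
    """Length of one counter tick in nanoseconds."""
    return 1e9 / freq_hz


def pit_ratio(freq_hz):
    """HPET ticks per PIT tick."""
    return freq_hz / PIT_HZ


def is_integer_multiple(freq_hz, tol=1e-6):
    r = pit_ratio(freq_hz)
    return abs(r - round(r)) < tol


if __name__ == "__main__":
    for hz in (14_318_180, 24_000_000):
        print(f"{hz / 1e6:.6f} MHz: {period_ns(hz):.2f} ns/tick, "
              f"{pit_ratio(hz):.4f}x PIT, integer multiple: {is_integer_multiple(hz)}")
```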

------
headsoup
What I don't understand is why AMD and Intel differ so much.

The subsequent question I haven't seen clarified is:

In any of these situations, are Intel's results actually accurate FPS figures,
or could they be showing 'accelerated' FPS?

I.e. is Intel taking a performance hit, or are their 'non-HPET' numbers
cosmetically showing faster performance that isn't there?

~~~
wtallis
> What I don't understand is why AMD and Intel differ so much.

At least one of the factors seems to be the Meltdown workarounds that are
required on Intel but not AMD processors. My intuition is that this
explanation is sufficient to account for the entire discrepancy between
vendors, but I can't rule out other factors. (Querying the HPET incurs all the
Meltdown workaround overhead of accessing peripherals like storage, but with
none of the actual waiting on a peripheral device, so on a percentage basis
this is probably the most severely affected task.)

> I.e. is Intel taking a performance hit, or are their 'non-HPET' numbers
> cosmetically showing faster performance that isn't there?

When the system is forced to always use the HPET instead of the TSC, the
performance hit is real and not just an artifact of inaccurate timers. When
you're not overclocking, the TSC is usually reliable and TSC-based time
measurements should differ from HPET-based time measurements only by the
overhead of accessing the HPET. That overhead got much larger this year for
Intel processors because of the Meltdown workarounds.
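
A back-of-the-envelope model shows why the overhead is real: if each frame
performs some number of timer reads, the read cost adds directly to frame time.
The Python sketch below uses hypothetical numbers (a 10 ms base frame, 2000
reads per frame, 0.02 µs versus 1 µs per read). These numbers are illustrative
only and are not measurements from the article.

```python
def fps_with_timer_overhead(base_frame_ms, reads_per_frame, read_cost_us):
    """Frames per second once per-frame timer-read cost is added to frame time."""
    frame_ms = base_frame_ms + reads_per_frame * read_cost_us / 1000.0
    return 1000.0 / frame_ms


if __name__ == "__main__":
    # Hypothetical: cheap TSC-style reads vs. expensive HPET-style reads.
    for cost_us in (0.02, 1.0):
        print(f"{cost_us} us/read -> {fps_with_timer_overhead(10.0, 2000, cost_us):.1f} FPS")
```

With these assumed inputs the model gives roughly 99.6 FPS versus 83.3 FPS.
The wall-clock frame rate really drops; this is not a counting artifact.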

Ian's initial benchmarks didn't include overclocked processors, just a test
configuration intended to be suitable for future comparison against
overclocked processors.

~~~
headsoup
Thanks for the explanation

------
brianolson
So, this is all about the HPET _as it is used by Windows_, right? When they
talk about 'forced on in the OS', they mean Windows. Does anyone know how the
various PC timers are used in Linux?

~~~
paol
I think Linux will use HPET if the hardware makes it available. You can
prevent that with "hpet=disable" in the kernel boot parameters.

~~~
cesarb
Yes, Linux will use the HPET if it's available, but it prefers the TSC when
possible, which is the case in most modern CPUs. For instance, you will see
the following in the kernel log:

    [    0.062020] clocksource: Switched to clocksource hpet
    [...]
    [    2.383237] clocksource: Switched to clocksource tsc

Here, it started with the HPET, calibrated the TSC, and switched to it.
There's no need to disable the HPET for it to choose something else as the best
timing source.
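
The same information is exposed in sysfs: current_clocksource holds the
active clock source, and available_clocksource lists the candidates. The
minimal Python sketch below reads both. The function name is hypothetical; the
paths are the standard Linux ones.

```python
from pathlib import Path

CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def clocksource_info(base=CLOCKSOURCE_DIR):
    """Return (current, [available...]), or None if the sysfs files are missing."""
    cur = base / "current_clocksource"
    avail = base / "available_clocksource"
    if not (cur.exists() and avail.exists()):
        return None
    return cur.read_text().strip(), avail.read_text().split()


if __name__ == "__main__":
    info = clocksource_info()
    print(info if info else "clocksource sysfs not available")
```

As root, you can switch the clock source at runtime by writing one of the
available names into current_clocksource. This is a convenient way to compare
tsc and hpet without rebooting.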

