
The Effects of CPU Turbo: 768X Stddev - ingve
https://www.alexgallego.org/perf/compiler/explorer/flatbuffers/smf/2018/06/30/effects-cpu-turbo.html
======
richardwhiuk
With credit to the author, it seems like they found out something interesting,
but the article feels like a mess.

On one hand, they made a performance optimisation to a library - cool.

On the other hand, the title and the conclusion talk about the difference
turbo boost makes on a CPU, which seems to be completely missing from the
article's contents...

~~~
SiempreViernes
A few connecting transitions are missing, but basically I understand it
thus:

He was using a slower buffer serialisation library (for reasons unexplained),
and when challenged set out to prove _his_ choice was the fastest, but pretty
soon found out he was wrong. Then he wanted to do the test himself, and along
the way he discovered how his processor made benchmarking very noisy in turbo
mode.

~~~
agallego
Thanks for summarizing it better than I did. I'll edit and make that a bit
more clear.

------
ianhowson
Plenty of other data points in the table show _reduced_ stddev with turbo
enabled. The 768x line looks like an anomaly.

Without knowing how many runs were made and the other test conditions, it's
difficult to know what actually changed. There's far too much noise in the
table to come to the conclusion that turbo was the culprit.

We're only talking about a 30% difference in clock rate for 3.2GHz vs 4.2GHz.
If there's more change than that across turbo vs. non-turbo, something else in
the benchmark setup is messed up.
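
For reference, the size of the clock-rate gap depends on which clock you measure it against, which may be why the two figures in this exchange differ. A quick check, using only the 3.2 GHz and 4.2 GHz numbers quoted above:

```python
# Relative clock-rate difference between the two frequencies quoted above.
base_ghz = 3.2   # non-turbo clock
turbo_ghz = 4.2  # turbo clock

# Measured against the base clock: how much faster turbo is.
speedup = (turbo_ghz - base_ghz) / base_ghz
# Measured against the turbo clock: how much slower the base clock is.
slowdown = (turbo_ghz - base_ghz) / turbo_ghz

print(f"turbo is {speedup:.1%} faster than base")
print(f"base is {slowdown:.1%} slower than turbo")
```

That is roughly 31% measured up from the base clock and roughly 24% measured down from turbo. Either way, a pure frequency change would explain far less than a 2.3x change in mean run time, which is the point being made above.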

~~~
agallego
~20%. The difference, however, is a max of ~2.3x in that direction. 100% of
the code is there; what do you suggest is 'messed' up? I probably ran the
benchmarks hundreds of times over 7 days trying to understand it. The
particular run I posted seemed aligned with the others, but that was the
stddev of 3 runs provided by Google Benchmark with --benchmark_repetitions=3.
I raised it to 10 and there was not much difference. That was the point of
the 768X anomalies: the values are never like that with capped frequencies.

~~~
ianhowson
Hang on, so we've got:

- SD increase of 768x

- Mean increase of 1.2 to 2.3x

- But n=3?

You need a _lot_ more samples to make a reasonable statement about whether
Turbo had an impact or not. Noise is completely overpowering your
measurements.

AFAICT, you're measuring SD of the time (nanoseconds) to run your task, and
mean is around 9000ns. 1ns SD is abnormally low, and 768ns SD seems about
right (this _is_ a desktop machine, after all). But I couldn't really figure
this out from the article.
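
The n=3 point can be made concrete with a small simulation: draw run times from one fixed distribution, so nothing about the "machine" ever changes, and see how far the sample SD of a few repetitions moves by chance alone. The ~9000 ns mean comes from the comment above; the normal distribution and the 100 ns "true" SD are assumptions for illustration, not data from the article. The sample SD uses the n-1 denominator (Python's `statistics.stdev`).

```python
import random
import statistics

# Assumptions for illustration only: run times are normally distributed with
# a mean of 9000 ns (the figure from the comment above) and a fixed "true"
# SD of 100 ns. Nothing about this simulated machine changes between trials.
TRUE_MEAN_NS = 9000.0
TRUE_SD_NS = 100.0


def sd_estimates(n_samples, n_trials, rng):
    """Sample SD (n-1 denominator) of n_samples runs, repeated n_trials times."""
    estimates = []
    for _ in range(n_trials):
        runs = [rng.gauss(TRUE_MEAN_NS, TRUE_SD_NS) for _ in range(n_samples)]
        estimates.append(statistics.stdev(runs))
    return estimates


rng = random.Random(42)  # fixed seed so the sketch is reproducible
ratios = {}
for n in (3, 10, 100):
    cuts = statistics.quantiles(sd_estimates(n, 2000, rng), n=100)
    p1, p99 = cuts[0], cuts[-1]  # 1st and 99th percentiles of the SD estimates
    ratios[n] = p99 / p1
    print(f"n={n:>3}: SD estimate spans {p1:6.1f} to {p99:6.1f} ns "
          f"(p99/p1 = {ratios[n]:.1f}x)")

# Coefficient of variation for the two SDs mentioned above, on a ~9000 ns mean.
for sd_ns in (1.0, 768.0):
    print(f"SD {sd_ns:g} ns / mean {TRUE_MEAN_NS:g} ns = CV {sd_ns / TRUE_MEAN_NS:.3%}")
```

With 3 repetitions the spread of the SD estimate is far wider than with 100, even though the underlying distribution never changes. This only shows that n=3 SDs are noisy; it says nothing about whether turbo contributed in the article's runs.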

~~~
agallego
No no.

- Capped frequencies: the stddev differs by 768x in the non-turbo-to-turbo
direction.

- In the other direction it is 2.3x. So I was just acknowledging that 20% of
the samples in that table support your previous point.

~~~
ianhowson
25% of 'samples' in the stddev table show absolutely no difference between
Turbo and non-Turbo. Assuming that this line of argument makes sense (it
doesn't), that leaves merely 55% of samples that support your case.

I appreciate what you're trying to do, and I'm intensely interested in this
issue (I have a feeling that we face very similar issues in our day jobs) --
but I can't see a reasonable way to interpret your data that supports your
assertion that Turbo meaningfully increases variance.

There's just not enough data.

~~~
BeeOnRope
Turbo and Hyperthreading generate a ton of noise when benchmarking. Disabling
both results in a one- or two-order-of-magnitude reduction in outlier results
(maybe you want to call it noise, but it is not symmetric like classic
noise).

The primary way turbo introduces variance is due to the turbo transitions
forced by the "max turbo ratio varies by active core count" behavior. E.g., on my
Skylake chip, the turbo speed is either 3.5, 3.3, 3.2 or 3.1 GHz depending on
whether 1, 2, 3 or 4 cores are active.
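
The behavior described above can be read as a lookup from active-core count to turbo ceiling. A short sketch using only the Skylake numbers quoted in the comment (the helper name `turbo_ceiling_change` is hypothetical) shows that each change in the core count forces a frequency change, and that the frequency step itself is only a few percent:

```python
# Max turbo frequency (GHz) by number of active cores, using the Skylake
# figures quoted above; other chips have different tables.
MAX_TURBO_GHZ = {1: 3.5, 2: 3.3, 3: 3.2, 4: 3.1}


def turbo_ceiling_change(active_before, active_after):
    """Return (old_ghz, new_ghz, relative_change) when the active-core count changes."""
    old = MAX_TURBO_GHZ[active_before]
    new = MAX_TURBO_GHZ[active_after]
    return old, new, (new - old) / old


# One core busy with the benchmark, then other cores wake up and go idle again.
for before, after in ((1, 2), (2, 1), (1, 4)):
    old, new, change = turbo_ceiling_change(before, after)
    print(f"{before} -> {after} active cores: {old} GHz -> {new} GHz ({change:+.1%})")
```

Going from 1 to 2 active cores lowers the ceiling by under 6%, so the lower clock on its own is modest; every one of these steps is still a transition the chip has to perform.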

Let's say you are running a threaded benchmark with "nothing else" running on
the box. Of course other cores will still occasionally fire up, to handle
scheduler ticks, interrupts from your network card, background processes,
whatever. Every time this happens, the chip has to immediately undergo a
frequency transition which leaves it running at 3.3 GHz. The main problem
isn't the lowered speed, it's the fact that the transition itself puts the
chip into a halted state for 10 to 20 µs, presumably to allow the multiplier
transition to occur, for voltages to stabilize, etc. Especially for short,
precise benchmarks, this shows up as a lot of noise.
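
A toy model of why the halt matters more than the lower clock for short benchmarks: each iteration takes a fixed time plus small symmetric jitter, and occasionally another core wakes and the benchmark core is halted for 10 to 20 µs. Only the halt length comes from the comment; the 9000 ns iteration time (borrowed from earlier in the thread), the 1% wake-up probability, and the jitter are assumptions, and real transitions are neither independent per iteration nor uniformly sized.

```python
import random
import statistics

# Hypothetical parameters for illustration (only the halt length comes from
# the comment above):
ITER_NS = 9_000             # nominal time of one benchmark iteration
JITTER_NS = 50              # small symmetric noise on every iteration
WAKEUP_PROB = 0.01          # chance another core wakes during an iteration
HALT_NS = (10_000, 20_000)  # 10-20 us halt per forced frequency transition


def simulate(n_iters, transitions, rng):
    """Per-iteration times; with transitions on, some iterations absorb a halt."""
    times = []
    for _ in range(n_iters):
        t = rng.gauss(ITER_NS, JITTER_NS)
        if transitions and rng.random() < WAKEUP_PROB:
            t += rng.uniform(*HALT_NS)  # core is halted while the ratio changes
        times.append(t)
    return times


rng = random.Random(1)  # fixed seed so the sketch is reproducible
results = {}
for label, transitions in (("fixed frequency", False), ("turbo transitions", True)):
    times = simulate(10_000, transitions, rng)
    r = {
        "median": statistics.median(times),
        "mean": statistics.fmean(times),
        "stdev": statistics.stdev(times),
        "max": max(times),
    }
    results[label] = r
    print(f"{label:>17}: median {r['median']:6.0f} ns, mean {r['mean']:6.0f} ns, "
          f"stdev {r['stdev']:5.0f} ns, max {r['max']:6.0f} ns")
```

In this model the median barely moves while the stddev grows by more than an order of magnitude, and every outlier is on the slow side, which matches the "not symmetric like classic noise" remark at the top of the comment.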

Turning off turbo fixes it, but so does just setting the max turbo ratio to
the "all cores" value (3.1 GHz in my case) since then you don't have these
forced transitions.
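
On Linux, a rough software approximation of both options is to turn turbo off through the intel_pstate driver's `no_turbo` knob, or to cap each CPU's cpufreq `scaling_max_freq` (in kHz) at the all-core turbo frequency. This is an assumption, not necessarily what the commenter did: the turbo ratio limit can also be set in firmware or via an MSR, and whether a sysfs cap removes every forced transition depends on the driver and hardware. The helpers `cap_max_freq` and `disable_turbo` are hypothetical names; writing these files needs root, and on systems using acpi-cpufreq the turbo switch is `/sys/devices/system/cpu/cpufreq/boost` instead.

```python
from pathlib import Path

# Standard Linux CPU sysfs directory; `root` is a parameter so the helpers can
# be pointed at a scratch directory for testing.
CPU_SYSFS = Path("/sys/devices/system/cpu")


def cap_max_freq(khz, root=CPU_SYSFS):
    """Write `khz` to every cpuN/cpufreq/scaling_max_freq under `root`.

    Needs root privileges on a real system. Returns the files written.
    """
    written = []
    for f in sorted(root.glob("cpu[0-9]*/cpufreq/scaling_max_freq")):
        f.write_text(f"{khz}\n")
        written.append(f)
    return written


def disable_turbo(root=CPU_SYSFS):
    """Set intel_pstate's no_turbo knob; return False if that driver isn't in use."""
    knob = root / "intel_pstate" / "no_turbo"
    if not knob.exists():
        return False
    knob.write_text("1\n")
    return True
```

On the Skylake chip described above, capping would be `cap_max_freq(3_100_000)` (3.1 GHz, the all-core value), run as root before benchmarking and reverted afterwards.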

------
tomalpha
_Ensure that your BIOS says performance when connected to AC_

This was the most interesting tidbit for me. It wouldn’t occur to me to run
benchmarks on a laptop (I guess due to my advancing years...)

~~~
agallego
Thanks. That machine has a server processor and 64GB of ECC memory. Specs are
pretty similar to a small server. But yeah, hardware has changed a ton :)

------
dagenix
Minor: "Friday, June 28 2018". June 28th was actually a Thursday this year.

~~~
agallego
... I guess I was tired when I wrote it. I meant to write 29th! Thanks!!

------
regularfry
That page seems to make Android Chrome hang.

~~~
shakna
If that's happening to you, then it's probably an issue with asciinema and
your particular version of Android's Chrome.

That's most of the JS on the page. That being said, the page transfers ~500Kb,
and loads into memory ~30Mb, which shouldn't be that substantial. Most of it
lies in the asciinema player (which should be mostly JITing), or an array.

~~~
vanderZwan
> _loads into memory ~30Mb_

Is that the asciinema player, or the "video" it plays? Either way: I thought
part of the point of asciinema was that it compresses better for terminal
capture? Just how much data was stored by that particular session?

~~~
mkhalil
I think it reduces file size/network transfer, but in order for playback to
occur, the recording must be decompressed into memory, even though the
terminal colors will mostly be the same color hex.

------
praseodym
Duplicate of
[https://news.ycombinator.com/item?id=17485536](https://news.ycombinator.com/item?id=17485536)

~~~
ddorian43
That one is empty, old, and has no upvotes.

