Time(1) and CPU Frequency Scaling (uninformativ.de)
81 points by zdw 14 days ago | 23 comments



Try to compile ATLAS and you will become intimately familiar with this problem. It refuses to compile if you have any kind of throttling or scaling enabled because it relies on benchmarking all of the alternative compute kernels to determine which is best for your specific hardware. Just following the instructions in the README to set the governor from userspace is not nearly enough. I had to completely re-compile my kernel with the default governor set to performance, on-demand enabled, and all thermal throttling not even available as an option.
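
For reference, setting the governor from userspace on Linux looks roughly like this. This is just a sketch; the exact sysfs paths depend on which cpufreq driver is in use:

  # set the scaling governor to "performance" on every core
  for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
      echo performance | sudo tee "$g" > /dev/null
  done

  # or, if the cpupower tool (from linux-tools) is installed:
  sudo cpupower frequency-set -g performance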

The basic issue is that consumer kernels are general-purpose and have to work on laptops and mobile devices, so a whole lot of options are set that make no sense on workstations, and you end up throttling when it isn't necessary, i.e. when you're not on battery and you have plenty of fan power or a water loop to prevent actual damage.

Heck, in many cases, you may even need to disable throttling in the BIOS, which is doubly disappointing because a desktop BIOS definitely doesn't need settings that work on laptops and mobile.


> Heck, in many cases, you may even need to disable throttling in the BIOS, which is doubly disappointing because a desktop BIOS definitely doesn't need settings that work on laptops and mobile.

For you, probably, but companies in Asia (for shareholder and regulatory reasons) need to power-manage their systems hard and will only disable power management when really required. That option exists not for you but for them.


Can you say more? I’m not even sure how I’d search for more information about this.

Because of the large and dense population centers, and therefore power requirements, governments require aggressive power management settings on computers??


Honestly, I had no idea that was the case, but it explains a lot, so thanks.


wow you weren't kidding

> If ATLAS's configure detects that CPU throttling is enabled, it will kill itself.

http://math-atlas.sourceforge.net/atlas_install/node5.html

a little dramatic but ok


I certainly understand the need for consistency, but the default throttling regime does offer some benefits.

On modern systems, the peak speed is usually higher than the sustained speed, so if you're running software with short CPU peaks, you can get better results with throttling than without. Also, if your software doesn't use all the cores, you may be able to run the ones it does use faster than the sustained speed.
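
You can see that gap on Linux; with the intel_pstate driver the relevant sysfs entries look roughly like this (base_frequency is only exposed by some drivers, intel_pstate among them):

  # maximum (turbo) frequency vs. sustained base frequency, in kHz
  cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq
  cat /sys/devices/system/cpu/cpu0/cpufreq/base_frequency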

Power savings are still useful even when the power comes from the wall, and thermal savings often mean less noise and can have secondary benefits in conditioned environments. (Of course, there are environments where computers make a good replacement for electrical resistance heaters.)


I understand why ATLAS would have needed to do something like this maybe up until 10 years ago. I also understand the hard time and performance constraints the library needs to meet for applications, especially in some scientific experiments.

However, this type of static linking is very weird in today's architectures, especially considering that an application can run anywhere, from a light bulb to a top-notch server hosted in the cloud and elsewhere. It especially bugs me to think that people leading some of these projects are stubborn to the point of not accepting the new reality imposed by today's software and system architectures.


I agree that refusing to compile is stubborn on their part. But tailoring the build to your system is very valuable, at least as an option. And a big part of the blame lies with the CPU and OS manufacturers, who fail to provide good, consistent performance measurement options that traverse all the endless layers of abstraction between the transistors and a user-space program nowadays.


On the face of it, it seems like CPU manufacturers should be able to put out actual performance numbers; until we realize that the temperature characteristics - and therefore performance - are non-deterministic.

This fact allows hardware RNGs based on thermal noise in consumer-grade CPUs to output at least 3 megabits of 'entropy' per second, the equivalent of 3,000,000 coin flips every second.

I may be misunderstanding what is going on, though.


A minor nitpick: the author most likely isn't using time(1), which is usually GNU time (https://www.gnu.org/software/time/), but the shell's internal time keyword. See:

  $ type -a time
  time is a shell keyword
  time is /usr/bin/time
  time is /bin/time


You can skip shell built-ins by prepending "command":

  $ time -V
  -V: command not found

  $ command time -V
  GNU time 1.7


Yes, and this is why most benchmarks today are somewhat misleading as well.

My personal laptop's CPU is an Intel i7-9750H. I always run it with turbo boost disabled for predictable, sustained performance. Turbo boost is weird in that it can lead to a sub-par user experience (lag) when your processor throttles.
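
For anyone who wants to try the same on Linux, disabling turbo is a one-liner; the first path assumes the intel_pstate driver, the second is for acpi-cpufreq:

  # intel_pstate: 1 = turbo disabled
  echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo

  # acpi-cpufreq: 0 = boost disabled
  echo 0 | sudo tee /sys/devices/system/cpu/cpufreq/boost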

Interestingly enough, I can simultaneously compile a ~250k LOC PureScript project and attend a Google Meet call with TB disabled, but not with it enabled. (This is on an MBP-16.)


The OS is doing something screwy if frequency scaling causes a noticeable performance regression. A modern Intel CPU like the 9750H supports very low latency (on the order of microseconds) hardware-driven frequency scaling, which should pretty much always perform better than leaving the CPU locked at the base frequency.

Intel MacBooks are also notoriously thermally limited, so I wonder if the settings you're using are just causing the CPU to run hotter than it would at stock.


I have a thin gaming laptop with the same CPU, and stock settings let it climb to 100C and then figure out how fast it can run without getting hotter, usually also with far too much voltage. Decreasing the voltage, and even power limit, and disabling boost, lets you set that temp ceiling to something closer to 60-80C, which will make the CPU more comfortable running at higher frequencies, although with a theoretically slightly reduced stability margin due to less voltage overhead. However, long-term use at high temps and voltage can also degrade stability, so it's a delicate balancing act. The only way to guarantee sustained performance at low temps is to improve cooling, with higher fan speeds or a laptop cooler.

>My personal laptop's CPU is Intel i7-9750H. I always run it with turbo boost disabled for predictable, sustained performance. Turbo boost is weird as in it can lead to a sub-par user experience (lag) when your processor throttles.

Does your laptop have such atrocious cooling that it can't sustain frequencies higher than base? When my laptop undergoes thermal throttling (i.e. hits 95C), the frequency is still above the base frequency, so I'm still getting more performance than if turbo were disabled.

edit: tested with Cinebench. On my laptop with an Intel CPU, the sustained all-core turbo frequency is 45% higher than the base frequency. This is with ThrottleStop enabled; otherwise TDP/tau throttling kicks in before thermal throttling does.
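
On Linux, turbostat is a rough way to see which limit you're hitting, showing frequency, package temperature, and package power side by side (needs root, and column names vary a bit between versions):

  sudo turbostat --quiet --interval 1 \
      --show Busy%,Bzy_MHz,PkgTmp,PkgWatt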


It's an MSI GF65 Thin, which has excellent cooling. However, with TB enabled, both Ubuntu & Windows randomly start a process (usually updates) that hits the CPU hard, and consequently all the fans spin up.

At base clock, I see no visible loss in performance and get a much quieter workstation.

The performance loss in compilation is offset by compiling continuously (watch mode), which I can do with TB disabled but not with TB enabled.

For gaming, I limit turbo boost to 3.2 GHz and get more consistent performance with no sudden drops.
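
(The rough Linux equivalent of that cap, assuming the cpupower tool is installed:)

  # cap the maximum frequency at 3.2 GHz on all cores
  sudo cpupower frequency-set -u 3.2GHz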


Wouldn’t this be due to thermal throttling? In which case a better heat dissipation system would help avoid it.


Not really. Intel tries to push the thermal envelope as much as it can for as long as it can. While the CPU is rated at 45W, TB actually pushes it to 70W+ for short intervals.

On Windows, instead of disabling TB, I just limit the CPU TDP to 40W via ThrottleStop.
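
The Linux analogue would be the RAPL powercap interface; something like this should set the long-term package limit to 40W (the exact path varies and may be nested under intel-rapl/ on some kernels):

  # long-term package power limit, in microwatts (40 W)
  echo 40000000 | sudo tee \
      /sys/class/powercap/intel-rapl:0/constraint_0_power_limit_uw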

TB just makes the laptop loud and hot even when the added performance is not required.


As a reference, my laptop has a Ryzen 4800H and I hear the fans... well, never. I even have the laptop set to "aggressive cooling", which turns an LED red, indicating I am not being green.

I was using the laptop to mine Verium during this past winter in the US South to keep my hands and arms warm in lieu of running HVAC more. I never bothered to switch it back to normal or "quiet" mode.

I take my first statement back: running Blender benchmarks or some other benchmark that maxes out the GPU tends to make a bit of noise, but with normal gaming and stuff like HandBrake I can't hear anything.


Why turbo boost disabled instead of always-enabled?

See also https://news.ycombinator.com/item?id=27725286

My thoughts on this are that we could do with a better CPU usage measurement, which the author of this article found to be cycles.
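
On Linux, perf can already count cycles rather than wall-clock time, assuming perf is installed and perf_event_paranoid permits it (./some-task below is just a placeholder for your workload):

  # count cycles and instructions instead of just elapsed time
  perf stat -e cycles,instructions ./some-task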


One issue with cycles is that the results you get may not reflect what end users will experience, since they probably have throttling enabled if the software is meant to run on laptops or desktop computers.

Not all instructions have the same thermal cost, and even though my laptop's CPU can hold 3.99 GHz when running stress tests whereas its base clock is 2.59 GHz, I doubt it could do the same running AVX2 instructions (or else it has quite impressive thermals for a laptop).

Also, does disabling throttling disable dynamic frequency scaling of the CPU's cache too?


> I really wanted to know:

> “How much work did the CPU have to do in order to complete this task?”

> But what time(1) tells me is:

> “How long did the CPU work to complete this task?”

Yes, exactly. It's essentially impossible to time things accurately, especially on a laptop. To answer the second question, the best bet is the RDTSCP instruction (e.g. via the __rdtscp intrinsic) when micro-benchmarking, but even that has pitfalls.



