
Measuring the system clock frequency using loops (Intel and ARM) - ingve
https://lemire.me/blog/2019/05/19/measuring-the-system-clock-frequency-using-loops-intel-and-arm/
======
alain94040
I don't think this is how you measure clock frequencies.

Don't assume each iteration will take one cycle and then divide the time by
the number of iterations.

Instead, you measure the time it takes for several loops that are likely to
have prime ratios in terms of execution time, and then you figure out the
greatest common divisor of those times, which is your clock period (its
inverse is your clock frequency).

So you never have to do something ugly like:

    
    
      #ifdef __aarch64__
        frequency *= 2; // Many ARM processors need 2 cycles per iteration due to lack of flag renaming and fusion
      #endif
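
The common-divisor idea described above could be sketched roughly as follows. This is an idealized illustration, not code from the article: the helper names `gcd_u64` and `common_period_ps` are hypothetical, and it assumes noise-free per-iteration times given as whole picoseconds. Real timings are noisy, so an exact GCD would need to be replaced by some approximate fit.

```c
#include <assert.h>
#include <stdint.h>

/* Greatest common divisor by Euclid's algorithm. */
static uint64_t gcd_u64(uint64_t a, uint64_t b) {
    while (b != 0) {
        uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/*
 * Hypothetical helper: given the per-iteration time of n different loops
 * (in picoseconds), return the largest period that divides all of them.
 * Assumption: every loop takes a whole number of cycles per iteration and
 * the measurements are exact, which real hardware will not give you.
 */
static uint64_t common_period_ps(const uint64_t *durations_ps, int n) {
    assert(n > 0);
    uint64_t g = 0; /* gcd(0, x) == x, so 0 is a neutral starting value */
    for (int i = 0; i < n; i++) {
        g = gcd_u64(g, durations_ps[i]);
    }
    return g;
}
```

For example, loops taking 1000 ps, 1500 ps and 2500 ps per iteration (2, 3 and 5 cycles) share a period of 500 ps, i.e. 2 GHz. If the cycle counts had a common factor, the result would be a multiple of the true period, which is why the loops should have prime ratios.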

~~~
pjc50
The cycle time of all the instructions involved is specified in the
architecture reference manual. The issue here is that the Intel decoder will
fuse "dec %[counter]; jnz cyclemeasure2" into a single instruction. So the
Intel loop is doing one cycle per count and the ARM one is two cycles per
count.

~~~
alain94040
That's not true for any high-end processor from the last 15 years or so. You
just can't predict how instructions will interleave and how much parallelism
the processor decoder will be able to extract at run-time.

~~~
pjc50
In general no, but if you're only using two instructions, subtract and branch,
it's feasible. There's only one register involved, so each subtract is always
waiting on the previous one and they cannot run in parallel.
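
A rough C sketch of such a dependent countdown loop, for illustration only. The article uses hand-written assembly; here the helper name `countdown_rate` is hypothetical, the empty `__asm__` statement is a GCC/Clang extension, and the instructions the compiler actually emits (and therefore the cycles per iteration) depend on the compiler and optimization level.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper: time a loop whose only work is a decrement that
 * depends on the previous decrement, plus the loop branch. Returns the
 * number of iterations completed per second.
 */
static double countdown_rate(uint64_t iterations) {
    assert(iterations > 0);
    struct timespec start, stop;
    uint64_t counter = iterations;

    clock_gettime(CLOCK_MONOTONIC, &start);
    while (counter != 0) {
        counter--;
        /* Empty asm (GCC/Clang extension): tells the compiler that counter
           is read and modified here, so the loop cannot be collapsed. */
        __asm__ volatile("" : "+r"(counter));
    }
    clock_gettime(CLOCK_MONOTONIC, &stop);

    double seconds = (double)(stop.tv_sec - start.tv_sec)
                   + (double)(stop.tv_nsec - start.tv_nsec) * 1e-9;
    return (double)iterations / seconds;
}
```

Multiplying the returned rate by the cycles per iteration (one with fusion on Intel, two on many ARM cores, per the discussion above) would give a frequency estimate, assuming the compiled loop really is just a subtract and a branch.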

------
CaliforniaKarl
I wonder how this method responds to things like Intel® Turbo Boost™. My
understanding is that it boosts the clock speed when within thermal limits; I
expect that the CPU would be at its lowest temperature earlier in boot, and so
your loop would return a higher frequency than it would, say, after dealing
with a Hacker News flood.

