
Benchmarking is just insanely hard to do well. There are so many things which can mislead you.

I recently discovered a way to make an algorithm about 15% faster. At least, that's what all the benchmarks said. At some point I duplicated the faster function in my test harness but did not call it, only the original slower one... and it was still 15% faster. So code that never executed sped up the original code! Obviously this was a code and memory layout effect: moving something around so that it happened to line up better with the CPU caches.

It's actually really really hard to know if speedups you get are because your code is actually "better" or just because you lucked out with some better alignment somewhere.
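To make that concrete, here's roughly the shape of the situation as a minimal hedged sketch in C (made-up names and a made-up workload, not my actual harness): the timing loop only ever calls the original function, yet adding or deleting the uncalled copy can move the measured time simply by shifting where the hot code lands in memory.

    #define _POSIX_C_SOURCE 199309L   /* for clock_gettime */
    #include <stdio.h>
    #include <time.h>

    /* The function actually being measured. */
    unsigned long slow_sum(const unsigned long *a, long n) {
        unsigned long s = 0;
        for (long i = 0; i < n; i++)
            s += a[i] * a[i];
        return s;
    }

    /* A "faster" variant that is NEVER called. It has external linkage, so the
       compiler keeps it; its mere presence shifts the addresses of the code
       linked around it, which on some machines is enough to change the
       measured time of slow_sum. */
    unsigned long fast_sum_unused(const unsigned long *a, long n) {
        unsigned long s = 0;
        for (long i = 0; i + 1 < n; i += 2)
            s += a[i] * a[i] + a[i + 1] * a[i + 1];
        return s;
    }

    int main(void) {
        enum { N = 1 << 20, REPS = 200 };
        static unsigned long data[N];
        for (long i = 0; i < N; i++)
            data[i] = i & 0xff;

        struct timespec t0, t1;
        unsigned long sink = 0;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int r = 0; r < REPS; r++)
            sink += slow_sum(data, N);   /* fast_sum_unused is never called */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ms = (t1.tv_sec - t0.tv_sec) * 1e3
                  + (t1.tv_nsec - t0.tv_nsec) / 1e6;
        printf("sink=%lu  %.2f ms\n", sink, ms);
        return 0;
    }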

Casey Muratori has a really interesting series about things like this on his Substack.




That linker lottery led to a 15% improvement? I'm surprised. Do you know in what cases you get such a big improvement from something like that? Is it rare? How did you end up reasoning about it?


Various research has shown that the variation can be much higher than 15% due to things like this. It's not that rare; I keep bumping into it. Compilers and linkers do a reasonable job but fundamentally modern CPUs are extremely complex beasts.

I found Casey Muratori's series the best explanation of what is going on at the CPU level.


Some additional context: I was actually writing a benchmarking tool for certain kinds of search algorithms. I spent a long time reducing and controlling for external sources of noise: CPU pinning, doing hundreds of runs with different data, and then repeating each run several times with the same data and taking the best score (to control for transient performance issues caused by the machine doing other things).
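For the curious, the noise-control part looked roughly like this. This is a minimal, Linux-specific sketch of the idea only (made-up stand-in function and run counts, not the real tool):

    #define _GNU_SOURCE          /* for sched_setaffinity / CPU_SET */
    #include <sched.h>
    #include <stdio.h>
    #include <time.h>

    /* Pin the process to one core so the scheduler can't migrate it
       in the middle of a measurement. */
    static void pin_to_cpu(int cpu) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        (void)sched_setaffinity(0, sizeof(set), &set);
    }

    static double now_ms(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1e3 + ts.tv_nsec / 1e6;
    }

    /* Stand-in for the search algorithm under test. */
    static unsigned long search(unsigned long seed) {
        unsigned long x = seed;
        for (int i = 0; i < 1000000; i++)
            x = x * 1103515245u + 12345u;
        return x;
    }

    int main(void) {
        pin_to_cpu(0);

        /* Run the same input several times and keep the best (lowest) time,
           so transient interference from the rest of the machine doesn't
           count against the algorithm; repeat over many different inputs. */
        enum { REPEATS = 5 };
        unsigned long sink = 0;
        double best = 1e30;
        for (int r = 0; r < REPEATS; r++) {
            double t0 = now_ms();
            sink += search(42);
            double elapsed = now_ms() - t0;
            if (elapsed < best)
                best = elapsed;
        }
        printf("best of %d runs: %.3f ms (sink=%lu)\n", REPEATS, best, sink);
        return 0;
    }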

I got the benchmarking tool itself to give reasonably repeatable measurements.

The tool had high precision, but its accuracy on the question of "which algorithm is better" was still unreliable, purely because of these code layout issues.

I basically gave up and shelved the benchmarking project at that point, because it wasn't actually useful for determining which algorithm was better.


I vaguely remember some benchmarking project that deliberately randomised these compiler and linker decisions, so that it could give you a more stable estimate of how well your code actually performed, and not just of whether you won or lost the linker lottery.


You're probably thinking of "Performance Matters" by Emery Berger, a Strange Loop talk. https://youtube.com/watch?v=r-TLSBdHe1A


There was Stabilizer [1], which did this, although it is no longer maintained and doesn't work with modern versions of LLVM. I think there is something more current that automates this now, but I can't remember what it's called.

[1] https://emeryberger.com/research/stabilizer/


The Coz profiler from Emery Berger.

It can actually go a step further and give you a decent estimate of which functions you need to change to get the desired latency/throughput improvements.
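For context on how it does that: Coz is a "causal profiler". You mark progress points in your code, and it virtually speeds up one line at a time (by briefly slowing everything else down) to predict what a real optimisation of that line would buy you. A hedged sketch of the C-side usage, with a made-up workload (build flags and the exact invocation are in the Coz README):

    #include <coz.h>    /* from the Coz profiler; provides the COZ_PROGRESS macro */
    #include <stdio.h>

    /* Stand-in for the work whose throughput we care about. */
    static unsigned long handle_request(unsigned long i) {
        unsigned long x = i;
        for (int k = 0; k < 10000; k++)
            x = x * 1103515245u + 12345u;
        return x;
    }

    int main(void) {
        unsigned long sink = 0;
        for (unsigned long i = 0; i < 100000; i++) {
            sink += handle_request(i);
            /* One unit of useful work done. Coz uses this to report how much
               throughput would improve if particular lines were made faster. */
            COZ_PROGRESS;
        }
        printf("%lu\n", sink);
        return 0;
    }

You then run the binary under the profiler (roughly "coz run --- ./your_program", per the README) and it plots the predicted program speedup per source line.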


Thanks, I was trying to remember that one!


LLD has a new option "--randomize-section-padding" for this purpose: https://github.com/llvm/llvm-project/pull/117653
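A hedged sketch of how you might use it: relink the same benchmark under a few different paddings and look at the spread of timings instead of a single layout-dependent number. The exact flag syntax is documented in the linked PR; the seed-taking form in the comment below is my assumption.

    /* bench.c -- trivial benchmark body to relink under different paddings.

       Assumed build loop (check the PR above for the exact flag syntax):

         for seed in 1 2 3 4 5; do
           clang -O2 bench.c -fuse-ld=lld \
                 -Wl,--randomize-section-padding=$seed -o bench_$seed
           ./bench_$seed
         done
    */
    #define _POSIX_C_SOURCE 199309L
    #include <stdio.h>
    #include <time.h>

    static unsigned long work(long n) {
        unsigned long s = 0;
        for (long i = 0; i < n; i++)
            s += ((unsigned long)i * i) % 7;
        return s;
    }

    int main(void) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        unsigned long s = work(50 * 1000 * 1000);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double ms = (t1.tv_sec - t0.tv_sec) * 1e3
                  + (t1.tv_nsec - t0.tv_nsec) / 1e6;
        printf("s=%lu  %.2f ms\n", s, ms);
        return 0;
    }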


Interesting, thanks!


"Producing wrong data without doing anything obviously wrong!"

https://doi.org/10.1145/1508244.1508275


"Producing wrong data without doing anything obviously wrong!"

[pdf]

https://users.cs.northwestern.edu/~robby/courses/322-2013-sp...


As already mentioned, this is likely Emery Berger's project. The idea is to intentionally slow down different parts of the program to find out which parts are most sensitive to slowdowns (i.e. which have the biggest effect on overall performance), on the assumption that these are also the parts that would profit most from optimisation.


Aleksey Shipilёv, a long-time Java "performance engineer" (my term), has written and spoken extensively about the challenges of benchmarking. I highly recommend reading some of his blog posts or watching one of his talks about it.



