
Benchmarking is just insanely hard to do well. There are so many things which can mislead you.

I recently discovered a way to make an algorithm about 15% faster. At least, that's what all the benchmarks said. At some point I duplicated the faster function in my test harness but did not call it, only the original slower one... and it was still 15% faster. So code that never executed sped up the original code! Obviously this was a code and memory layout effect: moving something around so that it happened to line up better with the CPU caches.

It's actually really really hard to know if speedups you get are because your code is actually "better" or just because you lucked out with some better alignment somewhere.
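To make that concrete, here's roughly the shape of the situation as a minimal hedged sketch in C (made-up names and a made-up workload, not my actual harness): the timing loop only ever calls the original function, yet adding or deleting the uncalled copy can move the measured time simply by shifting where the hot code lands in memory.

    #define _POSIX_C_SOURCE 199309L   /* for clock_gettime */
    #include <stdio.h>
    #include <time.h>

    /* The function actually being measured. */
    unsigned long slow_sum(const unsigned long *a, long n) {
        unsigned long s = 0;
        for (long i = 0; i < n; i++)
            s += a[i] * a[i];
        return s;
    }

    /* A "faster" variant that is NEVER called. It has external linkage, so the
       compiler keeps it; its mere presence shifts the addresses of the code
       linked around it, which on some machines is enough to change the
       measured time of slow_sum. */
    unsigned long fast_sum_unused(const unsigned long *a, long n) {
        unsigned long s = 0;
        for (long i = 0; i + 1 < n; i += 2)
            s += a[i] * a[i] + a[i + 1] * a[i + 1];
        return s;
    }

    int main(void) {
        enum { N = 1 << 20, REPS = 200 };
        static unsigned long data[N];
        for (long i = 0; i < N; i++)
            data[i] = i & 0xff;

        struct timespec t0, t1;
        unsigned long sink = 0;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int r = 0; r < REPS; r++)
            sink += slow_sum(data, N);   /* fast_sum_unused is never called */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ms = (t1.tv_sec - t0.tv_sec) * 1e3
                  + (t1.tv_nsec - t0.tv_nsec) / 1e6;
        printf("sink=%lu  %.2f ms\n", sink, ms);
        return 0;
    }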

Casey Muratori has a really interesting series about things like this on his Substack.




That linker lottery led to a 15% improvement? I'm surprised. Do you know in what cases you get such a big improvement from something like that? Is it rare? How did you end up reasoning about it?


Various research has shown that the variation can be much higher than 15% due to things like this. It's not that rare; I keep bumping into it. Compilers and linkers do a reasonable job but fundamentally modern CPUs are extremely complex beasts.

I found Casey Muratori's series the best explanation of what is going on at the CPU level.


Some additional context: I was actually writing a benchmarking tool for certain kinds of search algorithms. I spent a long time reducing and controlling for external sources of noise: CPU pinning, doing hundreds of runs with different data, and then repeating each run several times with the same data and taking the best score (to control for transient performance issues caused by the machine doing other things).
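For the curious, the noise-control part looked roughly like this. This is a minimal, Linux-specific sketch of the idea only (made-up stand-in function and run counts, not the real tool):

    #define _GNU_SOURCE          /* for sched_setaffinity / CPU_SET */
    #include <sched.h>
    #include <stdio.h>
    #include <time.h>

    /* Pin the process to one core so the scheduler can't migrate it
       in the middle of a measurement. */
    static void pin_to_cpu(int cpu) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        (void)sched_setaffinity(0, sizeof(set), &set);
    }

    static double now_ms(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1e3 + ts.tv_nsec / 1e6;
    }

    /* Stand-in for the search algorithm under test. */
    static unsigned long search(unsigned long seed) {
        unsigned long x = seed;
        for (int i = 0; i < 1000000; i++)
            x = x * 1103515245u + 12345u;
        return x;
    }

    int main(void) {
        pin_to_cpu(0);

        /* Run the same input several times and keep the best (lowest) time,
           so transient interference from the rest of the machine doesn't
           count against the algorithm; repeat over many different inputs. */
        enum { REPEATS = 5 };
        unsigned long sink = 0;
        double best = 1e30;
        for (int r = 0; r < REPEATS; r++) {
            double t0 = now_ms();
            sink += search(42);
            double elapsed = now_ms() - t0;
            if (elapsed < best)
                best = elapsed;
        }
        printf("best of %d runs: %.3f ms (sink=%lu)\n", REPEATS, best, sink);
        return 0;
    }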

I got the benchmarking tool itself to give reasonably repeatable measurements.

The tool had high precision, but its accuracy on the question of "which algorithm is better" was still unreliable, purely because of these code layout issues.

I basically gave up and shelved the benchmarking project at that point, because it wasn't actually useful for determining which algorithm was better.


I vaguely remember some benchmarking project that deliberately randomised these compiler and linker decisions, so that it could give you a more stable estimate of how well your code actually performed, and not just of whether you won or lost the linker lottery.


You're probably thinking of "Performance Matters" by Emery Berger, a Strange Loop talk. https://youtube.com/watch?v=r-TLSBdHe1A


There was Stabilizer [1], which did this, although it is no longer maintained and doesn't work with modern versions of LLVM. I think there is something more current that automates this now, but I can't remember what it's called.

[1] https://emeryberger.com/research/stabilizer/


The Coz profiler from Emery Berger.

It can actually go a step further and give you a decent estimate of which functions you need to change to get the desired latency/throughput improvements.
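For context on how it does that: Coz is a "causal profiler". You mark progress points in your code, and it virtually speeds up one line at a time (by briefly slowing everything else down) to predict what a real optimisation of that line would buy you. A hedged sketch of the C-side usage, with a made-up workload (build flags and the exact invocation are in the Coz README):

    #include <coz.h>    /* from the Coz profiler; provides the COZ_PROGRESS macro */
    #include <stdio.h>

    /* Stand-in for the work whose throughput we care about. */
    static unsigned long handle_request(unsigned long i) {
        unsigned long x = i;
        for (int k = 0; k < 10000; k++)
            x = x * 1103515245u + 12345u;
        return x;
    }

    int main(void) {
        unsigned long sink = 0;
        for (unsigned long i = 0; i < 100000; i++) {
            sink += handle_request(i);
            /* One unit of useful work done. Coz uses this to report how much
               throughput would improve if particular lines were made faster. */
            COZ_PROGRESS;
        }
        printf("%lu\n", sink);
        return 0;
    }

You then run the binary under the profiler (roughly "coz run --- ./your_program", per the README) and it plots the predicted program speedup per source line.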


Thanks, I was trying to remember that one!


LLD has a new option "--randomize-section-padding" for this purpose: https://github.com/llvm/llvm-project/pull/117653
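A hedged sketch of how you might use it: relink the same benchmark under a few different paddings and look at the spread of timings instead of a single layout-dependent number. The exact flag syntax is documented in the linked PR; the seed-taking form in the comment below is my assumption.

    /* bench.c -- trivial benchmark body to relink under different paddings.

       Assumed build loop (check the PR above for the exact flag syntax):

         for seed in 1 2 3 4 5; do
           clang -O2 bench.c -fuse-ld=lld \
                 -Wl,--randomize-section-padding=$seed -o bench_$seed
           ./bench_$seed
         done
    */
    #define _POSIX_C_SOURCE 199309L
    #include <stdio.h>
    #include <time.h>

    static unsigned long work(long n) {
        unsigned long s = 0;
        for (long i = 0; i < n; i++)
            s += ((unsigned long)i * i) % 7;
        return s;
    }

    int main(void) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        unsigned long s = work(50 * 1000 * 1000);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double ms = (t1.tv_sec - t0.tv_sec) * 1e3
                  + (t1.tv_nsec - t0.tv_nsec) / 1e6;
        printf("s=%lu  %.2f ms\n", s, ms);
        return 0;
    }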


Interesting, thanks!


"Producing wrong data without doing anything obviously wrong!"

https://doi.org/10.1145/1508244.1508275


"Producing wrong data without doing anything obviously wrong!"

[pdf]

https://users.cs.northwestern.edu/~robby/courses/322-2013-sp...


As already mentioned, this is likely Emery Berger's project. The idea is to intentionally slow down different parts of the program to find out which parts are most sensitive to slowdowns (i.e. which have the biggest effect on overall performance), on the assumption that these are also the parts that would profit most from optimisation.


Aleksey Shipilёv, a long-time Java "performance engineer" (my term), has written and spoken extensively about the challenges of benchmarking. I highly recommend reading some of his blog posts or watching one of his talks about it.



