> deleting this one line of code can have a dramatic effect on seemingly unrelated performance micro-benchmarks. Some numbers get better (e.g. +5%), some numbers get worse (e.g. -10%). The same micro-benchmark can get faster on one C compiler but slower on another

There's a fundamental disconnect that makes it difficult for humans to reason about the performance of computer programs. Because the speed of light is so slow, computer architecture as we know it will always rely on caches and out-of-order (OoO) execution to be fast. The human brain may itself work out of order, but it is only used to thinking about a world that runs in order. When we use theory of mind, we don't model other people's minds; we use our own as a model for theirs (see mirror neurons[1]).

Because of this, standard code benchmarks are not very useful unless they demonstrate order-of-magnitude speedups. Even a causal profiler[2][3][4], which attempts to control for the volatile aspects of performance, is of limited use: it cannot control for every variable, and its results will likely be invalidated by the same architectural variation it tries to control for. Instead, with respect to performance, we should focus on three factors:

- Code maintainability

- Algorithmic complexity

- Cache locality (cache-friendly data access)

Everything else is a distraction.
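
To make the "order of magnitude or ignore it" rule concrete, here is a rough sketch in Python. Everything in it is my own illustration: the two duplicate-detection functions, the `median_seconds` helper, and the 10x threshold are assumptions chosen for the example, not something taken from the quoted article or from any profiler.

```python
import statistics
import time


def has_duplicates_quadratic(items):
    """O(n^2): compare every pair of elements."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_linear(items):
    """O(n) expected time: one pass with a hash set."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


def median_seconds(func, arg, repeats=5):
    """Median wall-clock time of func(arg) over several runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(arg)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def is_meaningful_speedup(slow_seconds, fast_seconds, threshold=10.0):
    """True only if the speedup clears the threshold (10x is an assumed cutoff)."""
    return slow_seconds >= threshold * fast_seconds


if __name__ == "__main__":
    data = list(range(2000))  # no duplicates, so both versions scan everything
    slow = median_seconds(has_duplicates_quadratic, data)
    fast = median_seconds(has_duplicates_linear, data)
    ratio = slow / fast if fast > 0 else float("inf")
    print(f"quadratic: {slow:.6f}s  linear: {fast:.6f}s  ratio: {ratio:.1f}x")
    print("meaningful speedup:", is_meaningful_speedup(slow, fast))
```

The point of the threshold is that a change in algorithmic complexity should clear it comfortably on any compiler or machine, while a ±5-10% swing like the one in the quote never will, so it gets treated as noise.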

1. https://en.wikipedia.org/wiki/Mirror_neuron

2. https://www.youtube.com/watch?v=r-TLSBdHe1A

3. https://arxiv.org/pdf/1608.03676v1.pdf

4. https://github.com/plasma-umass/coz