
Stabilizer: Statistically sound performance evaluation [pdf] - fanf2
https://people.cs.umass.edu/~emery/pubs/stabilizer-asplos13.pdf
======
peterwaller
Great, we can control for the layout of code, heap, stack and other effects
that mess with a performance measurement. However, why do those things have a
(statistically significant?) impact in the first place? I guess that hints
that with some engineering you could in principle get a speed boost by
specifying the layout. Worst case, you sample layouts randomly and then pick
the fastest arrangement, wherever the difference is statistically significant.

It could be that the problem arises when trying to measure very small speed
increases (small relative differences => noise matters more). But in that case
the fact that such a small speed increase is wiped out by random layout
effects surely means that time would be better invested in finding a more
performant layout?

~~~
titzer
> why do those things have a (statistically significant?) impact in the first
> place?

In a word, caches. Not just the instruction / data cache, but also page faults
and micro-architectural features like micro-op caches, instruction TLB
entries, loop stream buffers, cache-line alignment, and aliasing in the branch
predictor tables (which can also be thought of as caches).

~~~
peterwaller
I suppose I was musing more along the lines of "why isn't this a solved
problem". Clearly it isn't an easy one, or compilers would already take this
into account and the statistical variance would be reduced.

------
nestorD
> _We find that, while -O2 has a significant impact relative to -O1, the
> performance impact of -O3 over -O2 optimizations is indistinguishable from
> random noise._

~~~
gnufx
That's definitely not true in general for GCC on floating-point computation.
-O3 gets you vectorization, especially with -ffast-math (and extra loop
optimizations recently). That's not to say it's always faster, of course;
measure, as ever.
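
~~~
For a concrete check of the vectorization point: a floating-point reduction
like the one below is the classic case, since vectorizing it reorders
non-associative FP additions, which GCC typically only allows under
-ffast-math. The flags in the comment are real GCC options; exact behaviour
varies by GCC version (recent releases enable some vectorization at -O2).

```c
#include <stdio.h>

/* A reduction GCC will usually only vectorize with -ffast-math.
 * Compare the vectorizer reports from:
 *   gcc -O2 -fopt-info-vec sum.c
 *   gcc -O3 -ffast-math -fopt-info-vec sum.c */
double sum(const double *a, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

int main(void) {
    double a[1000];
    for (int i = 0; i < 1000; i++)
        a[i] = 0.5;
    printf("%.1f\n", sum(a, 1000));
    return 0;
}
```

Whether the vectorized version is actually faster on your data sizes is a
separate question, which is the "measure, as ever" part.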

