
Optimizing Clang: A Practical Example of Applying BOLT - ingve
https://github.com/facebookincubator/BOLT/blob/master/docs/OptimizingClang.md#optimizing-clang--a-practical-example-of-applying-bolt
======
nkurz
Wow, that's an impressive result. I wouldn't have guessed that L1 icache
misses were a big enough time sink that reducing them by 1/3 would knock 15%
off the total runtime.

Most of the time when I read about something that provides this much
improvement on a major application, it turns out that the comparison isn't
quite what it seems (as in, they are using -O0 as a baseline or some such).
But from what I can tell, this is a genuine improvement over the best
alternative.

Congratulations to the authors on a great job!

~~~
cperciva
I doubt the improvement comes from icache misses. Rather, the improvement
comes from avoiding pipeline flushes by reducing mispredicts, and said
mispredict reduction also reduces icache misses.

~~~
nkurz
I'm a little doubtful that reduced prediction error can be the main driver
here. Hardware branch predictors are really good at anything that's
predictable after a short learning period, so layout changes will only affect
prediction errors for the first couple of iterations after a cold start. Can
there really be hundreds of billions of cold starts? Maybe, but I suspect the
bigger effect comes from laying out the straight-line code path to follow the
most common branch, since consecutive instructions are almost always going to
be in cache. I'm still surprised the effect is this large, but I think the
icache explanation is a little more likely, and it does match the blog post.
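To make the "short learning period" point concrete, here's a toy simulation. It assumes a single textbook 2-bit saturating counter, which is far simpler than what real CPUs use, and `simulate_two_bit_predictor` is a hypothetical helper, not anything from BOLT. A loop branch that is taken 999 times and then falls through costs 2 extra mispredicts while the counter warms up from cold. After that, each pass only mispredicts the loop exit.

```python
def simulate_two_bit_predictor(outcomes):
    """Count mispredictions of one 2-bit saturating counter (toy model).

    outcomes: iterable of bools, True meaning the branch was taken.
    States 0-1 predict not-taken, states 2-3 predict taken.
    The counter starts at 0 (strongly not-taken) to model a cold start.
    """
    state = 0
    mispredicts = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            mispredicts += 1
        # Saturating update toward the actual outcome.
        if taken:
            state = min(state + 1, 3)
        else:
            state = max(state - 1, 0)
    return mispredicts


# One pass of a 1000-iteration loop: 999 taken back-edges, then the exit.
one_pass = [True] * 999 + [False]

cold = simulate_two_bit_predictor(one_pass)         # 2 warm-up misses + exit
ten = simulate_two_bit_predictor(one_pass * 10)     # warm-up paid only once
print(cold, ten)
```

This prints `3 12`. Once warm, each pass costs a single mispredict no matter how the blocks are laid out, which is why layout should mostly matter on cold starts under this simplified model.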

------
exikyut
MySQL isn't the best possible test. Compile Android instead. That builds
Chromium (twice, IIUC) and Java.

For bonus points, collect discrete perf data from each component build.

Apparently the conversion tool can fold multiple datasets into a single
profile. It would be interesting to see whether an aggregate profile is as
effective as a dedicated profile for each major tool.
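Here's a rough sketch of why an aggregate profile might differ from a dedicated one. It uses a made-up, simplified record format of `(function, branch_source, branch_target) -> count`. This is not BOLT's actual profile format, and the function name `hash_lookup`, the addresses, and the counts are all hypothetical. If two tools share code but exercise it differently, summing their counts can flip which successor looks "hot". A layout pass would then optimize for the combined workload rather than for either tool alone.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

# Hypothetical simplified profile: (function, branch source, branch target) -> count.
Profile = Dict[Tuple[str, int, int], int]


def merge_profiles(*profiles: Profile) -> Profile:
    """Sum branch counts across profiles (Counter.update adds counts)."""
    merged: Counter = Counter()
    for profile in profiles:
        merged.update(profile)
    return dict(merged)


def hot_successor(profile: Profile, function: str, source: int) -> Optional[int]:
    """Return the most frequently taken target of a branch, or None if unseen."""
    targets = {
        tgt: count
        for (fn, src, tgt), count in profile.items()
        if fn == function and src == source
    }
    if not targets:
        return None
    return max(targets, key=targets.get)


# Two hypothetical tools exercising the same shared function differently.
tool_a: Profile = {("hash_lookup", 0x10, 0x20): 900, ("hash_lookup", 0x10, 0x40): 100}
tool_b: Profile = {("hash_lookup", 0x10, 0x20): 50, ("hash_lookup", 0x10, 0x40): 2000}

combined = merge_profiles(tool_a, tool_b)
print(hex(hot_successor(tool_a, "hash_lookup", 0x10)))    # hot path for tool_a alone
print(hex(hot_successor(combined, "hash_lookup", 0x10)))  # hot path after merging
```

With these made-up numbers, `tool_a` alone prefers `0x20`, but the merged profile prefers `0x40`. Whether that trade-off matters in practice is exactly the experiment suggested above.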

------
pyler
They should integrate BOLT into Clang/LLVM. Any BOLT dev here?

~~~
Joky
BOLT doesn’t have much to do with Clang/LLVM, I think, unless you’re thinking
it could be integrated with lld, the linker developed under the LLVM
umbrella?

~~~
kibwen
I'm still reading the paper now, but I had the impression that part of its
pipeline involves disassembling binary artifacts back into an IR resembling
what the compiler used to produce the binary in the first place, which
suggests that this step could benefit from tighter integration with LLVM.

