
Bolt: A Practical Binary Optimizer for Data Centers and Beyond - matt_d
https://arxiv.org/abs/1807.06735
======
compilerdev
Not to take anything away from this (it's great that such a tool is available),
but Microsoft had this kind of technology 20 years ago, known as BBT. It's
still used in some places, but overall systems moved to the profile-guided
optimization done by Visual C++ (an overall improvement over BBT). It is mostly
focused on block/function placement optimizations to reduce paging and separate
hot/cold code. Some info here:
[https://blogs.msdn.microsoft.com/reiley/2011/08/06/microsoft...](https://blogs.msdn.microsoft.com/reiley/2011/08/06/microsoft..).
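For context, the Visual C++ PGO workflow mentioned above is a three-step
instrument/train/optimize cycle. A rough sketch (the `/GENPROFILE` and
`/USEPROFILE` linker options are from MSVC's documentation; the file names and
training input are hypothetical examples):

```shell
# 1. Instrumented build (whole-program optimization + profile instrumentation)
cl /O2 /GL app.c /link /LTCG /GENPROFILE

# 2. Run a representative training workload to collect profile counts (.pgc)
app.exe typical-input.txt

# 3. Rebuild, letting the linker lay out hot/cold code from the profile
cl /O2 /GL app.c /link /LTCG /USEPROFILE
```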

------
pella
discussions:
[https://news.ycombinator.com/item?id=17350122](https://news.ycombinator.com/item?id=17350122)
(173 points, 33 days ago)

other link: [https://code.fb.com/data-infrastructure/accelerate-large-scale-
applications-with-bolt/?r=1](https://code.fb.com/data-infrastructure/accelerate-large-scale-
applications-with-bolt/?r=1)

------
pella
[https://github.com/facebookincubator/BOLT](https://github.com/facebookincubator/BOLT)

------
zokier
Seems odd to have "data center" pitched here, when, as far as I can tell, it
is really suitable for any good old large applications. Or does "data center"
have some further implications that I'm missing?

~~~
rbanffy
Having our laptops finish a workload in 15% less time is nice. Running
Facebook on 15% fewer computers saves the power consumption of a midsized
country. For them, a 1% improvement is a great day; 15% is winning the
lottery. Every week. ;-)

~~~
smolder
It's worth saying that when tons of deployed clients are 15% faster, that adds
up to large aggregate energy and time savings, too. But yes, companies tend
not to care about that as long as their users' machines can handle the burden
of something less optimized. Good enough is good enough. For them it makes
economic sense, but logically it doesn't, to treat client-side optimizations
as less valuable at similar scales.

------
dev_dull
> _We have also applied BOLT to GCC and Clang binaries, and our evaluation
> shows that BOLT speeds up these binaries by up to 15.3%_

How is this even possible? Doesn’t this mean that some compiler settings were
probably just inefficient or unoptimized?

~~~
mikepurvis
The new optimizations are about code layout and, it seems, are based on real-
world perf data being fed back into the compilation process. The hot code path
wouldn't necessarily be apparent during an initial compile, but with that
profiling information present you can arrange your binary to maximize
instruction-cache hits.
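Concretely, that feedback loop looks roughly like this with BOLT's own tooling
(command names from the BOLT README; the sampling event and optimization flags
are illustrative and vary by version):

```shell
# 1. Collect an execution profile with Linux perf (LBR sampling where available)
perf record -e cycles:u -j any,u -o perf.data -- ./myapp typical-workload

# 2. Convert the perf profile into BOLT's profile format
perf2bolt -p perf.data -o perf.fdata ./myapp

# 3. Rewrite the binary with profile-driven block/function layout.
#    The original ./myapp is left untouched; BOLT emits a new binary.
llvm-bolt ./myapp -o myapp.bolt -data=perf.fdata \
    -reorder-blocks=cache+ -reorder-functions=hfsort+ -split-functions=2
```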

~~~
hoosieree
This sounds a lot like JIT compilation techniques, but applied to an already-
compiled binary.

~~~
tgtweak
I would think it's closer to branch prediction but on a larger scale than just
instruction pipelines.

I'm curious if it modifies the original binary.

------
stochastic_monk
Finally. When the tool was first announced, it was only described in a press
release on Facebook’s website: no GitHub, no arXiv.

Now it’s properly released.

------
nwmcsween
So Facebook runs non-PIE binaries in production?

------
szemet
Are there any Linux/x86-64 binaries somewhere? (Apparently my 16GB laptop is
not enough to compile LLVM...)

~~~
orbifold
Are you not using ninja? Both of my private machines have 16GB and I can
compile LLVM without a problem.

~~~
rurban
I used ninja on my big desktop machine, not laptop, and half of the link
targets were killed by the linux oom killer. Linux really became a joke.

Only ninja -j1 helped. My other llvm builds always succeeded the old way:
cmake && make -s -j16
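For what it's worth, LLVM's CMake setup has knobs to cap link-time memory
without serializing the whole build (option names are from LLVM's CMake
documentation; the job count is a guess to tune against your RAM):

```shell
# Keep compile jobs fully parallel but limit concurrent link jobs,
# which is where the memory blow-up happens; gold (or lld) also uses
# considerably less memory than BFD ld for these links.
cmake -G Ninja ../llvm \
    -DCMAKE_BUILD_TYPE=Release \
    -DLLVM_PARALLEL_LINK_JOBS=2 \
    -DLLVM_USE_LINKER=gold
ninja
```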

------
dmoreno
Does anybody know if this can also help improve the performance of language
interpreters such as CPython?

~~~
rurban
Sure

~~~
RayDonnelly
If this supported non-PIE binaries I'd try to use it on the Anaconda
Distribution in a flash.

