
BOLT: Binary Optimization and Layout Tool - lelf
https://github.com/facebookincubator/BOLT
======
uwu
similar thing from google: [http://lists.llvm.org/pipermail/llvm-
dev/2019-September/1353...](http://lists.llvm.org/pipermail/llvm-
dev/2019-September/135393.html)

> Inspired by the performance gains and to address the scalability issue of
> BOLT, we went about designing a scalable infrastructure that can perform
> BOLT-like post-link optimizations.

> Our experiments on large real-world applications and SPEC with code layout
> show that Propeller can optimize as effectively as BOLT, with just 20% of
> its memory footprint and time overhead.

~~~
thedance
One thing I really wanted to be able to do with Propeller was a post-link
optimization of a Go program. Go's compiler is both fast and terrible. If you
glance at the text, opportunities for simple optimizations abound.
Unfortunately the layout of the program, references between the debug
information and the function entry points, and Propeller having been designed
for C++ programs kept me from getting very far. I did get a little ways, but
not far enough that I could keep a long-running Go program from crashing.

~~~
BubRoss
Did you mean bolt instead of propeller? Bolt optimizes binaries, propeller is
an llvm fork with extra information and a special linker.

Neither is made to do the optimizations you are talking about, that is what
regular llvm does.

------
EvanAnderson
It's like Nat Freidman's "GNU Rope"[1] from 1998 which, if memory serves, was
inspired by a tool that did the same thing in IRIX. I'd like to see this
functionality built into an OS loader such that the on-disk binaries can
remain unchanged (think verifying reproducible builds) but the loader will
relocate code in memory to maximize locality of reference.

[1]
[http://lwn.net/1998/1029/als/rope.html](http://lwn.net/1998/1029/als/rope.html)

(An aisde: I was at that talk in 1998 at Atlanta Linux Showcase and remember
it fairly well. I'm the guy who asked Nat if he'd considered using a genetic
algorithm for the optimization processing.)

~~~
tjalfi
GNU Rope never shipped but it looks like the source code is here[0].

[0] www.hungry.com/~shaver/gropt.tar.bz2

------
q3k
Okay, I don't get it.

Why not hook this into LLVM's compiler infrastructure, instead of recovering
the CFG from a binary - and not any binary, but one that has to built with
special flags anyway? This seems backwards.

~~~
marklacey
Because if you statically link libraries like the language runtime it allows
you to reorder the blocks in those as well to improve locality and instruction
cache utilization.

For example if your code calls a particular runtime function shortly after
starting, that function (or the basic blocks from that function that are
executed frequently) can be placed close to the call site.

~~~
ndesaulniers
If you're statically linking, then LTO would give you cross TU visibility to
do the same optimization at link time, not post link when you've lost a lot of
fidelity.

~~~
marklacey
Not if it’s a library that you’re not compiling, like language runtimes or
system libraries.

LTO requires that you’re compiling all the code to get the benefits.

I’m also not sure if either gcc or clang’s LTO does layout at the basic block
level or at the function level. There is a lot of benefit from doing it at the
basic block level. You can literally lay out all the code that runs at
application start time so that it is mostly adjacent and fall-through from
block to block during execution.

This isn’t something you necessarily do instead of LTO, but rather something
you can do in addition to it.

------
saagarjha
Some discussion from two years ago:
[https://news.ycombinator.com/item?id=17350122](https://news.ycombinator.com/item?id=17350122)

------
fefe23
Why would you use this over the profile guided optimizations that gcc and
clang already offer, that use the same data?

What kind of perf benefits can you expect?

What kind of application have you tested this with?

~~~
ahupp
From the paper:

"For datacenter applications, BOLT achieves up to 7.0% performance speedups on
top of profile-guided function reordering and LTO."

This is used for large binaries, like hhvm. There's a lot more details here:
[https://research.fb.com/publications/bolt-a-practical-
binary...](https://research.fb.com/publications/bolt-a-practical-binary-
optimizer-for-data-centers-and-beyond/)

------
rurban
Some usage description of mine:
[http://perl11.org/blog/bolt.html](http://perl11.org/blog/bolt.html)

Propeller looks much better now.

------
CalChris
I guess maybe the original version of something like this was _cord_ and
_cord2_ , done at MIPS in the mid-80s.

[http://www.bitsavers.org/pdf/mips/RISCos/3204DOC_RISC_os_Use...](http://www.bitsavers.org/pdf/mips/RISCos/3204DOC_RISC_os_Users_Reference_Manual_Volume_I_System_V_Mar1989.pdf)

I wonder if those have been open sourced.

------
psandersen
What would it take to use this on a complete linux distro like ubuntu? Seems
like it would be a respectable improvement.

~~~
thedance
I think it's a demonstrated fact that nobody cares about the out-of-the-box
distro performance, because recompiling Ubuntu for your actual CPU (or
anything newer than Opteron) gives large speedups on every program. Doing
whole-distro PGO and LTO would be challenging because there's no one true
workload upon which all users agree. An optimization for me might be bad for
you. Also PGO and LTO will tend to think initialization and startup code is
"cold" whereas in a thing like a shell pipeline that exec's many programs may
be bottlenecked by those functions. If I profile a long-running daemon and
rebuild libc with those profiles, it could make startup times slower for an
exec-heavy workload.

------
cwzwarich
The README doesn't link to the final version of their article:
[https://research.fb.com/publications/bolt-a-practical-
binary...](https://research.fb.com/publications/bolt-a-practical-binary-
optimizer-for-data-centers-and-beyond/)

~~~
CalChris
I believe the version published at CGO 19 is the final version. It is longer.

[https://www.ic.unicamp.br/~ra045840/bolt-
cgo19.pdf](https://www.ic.unicamp.br/~ra045840/bolt-cgo19.pdf)

