
Software Multiplexing: Share Your Libraries and Statically Link Them Too [pdf] - gbrown_
https://dl.acm.org/citation.cfm?id=3276524
======
glandium
TL;DR: This statically links multiple programs and their common shared
libraries into one program, replacing main() with a function that calls the
right entry point based on the name of the executable. With this, you can
essentially create a busybox-like binary from the sources of coreutils.
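
A minimal C sketch of such a dispatcher, assuming each program's original main() has already been renamed so they can coexist in one binary. hello_main, goodbye_main and mux_dispatch are hypothetical placeholder names, not the paper's tooling:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical renamed entry points. They return distinct values only so
 * the dispatch is easy to observe. */
static int hello_main(int argc, char **argv) {
    (void)argc; (void)argv;
    puts("hello from hello_main");
    return 10;
}

static int goodbye_main(int argc, char **argv) {
    (void)argc; (void)argv;
    puts("hello from goodbye_main");
    return 20;
}

/* Maps the name the binary was invoked under to its entry point. */
struct mux_entry {
    const char *name;
    int (*entry)(int, char **);
};

static const struct mux_entry mux_table[] = {
    { "hello",   hello_main },
    { "goodbye", goodbye_main },
};

/* Strip any leading directory, so "/usr/bin/hello" dispatches as "hello". */
static const char *prog_basename(const char *path) {
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Replacement for main(): choose the entry point from argv[0]. */
int mux_dispatch(int argc, char **argv) {
    const char *name = (argc > 0 && argv[0]) ? prog_basename(argv[0]) : "";
    for (size_t i = 0; i < sizeof mux_table / sizeof mux_table[0]; i++) {
        if (strcmp(name, mux_table[i].name) == 0)
            return mux_table[i].entry(argc, argv);
    }
    fprintf(stderr, "%s: unknown program name\n", name);
    return 127;
}
```

In the combined binary, main would just be int main(int argc, char **argv) { return mux_dispatch(argc, argv); }, and each program name is installed as a hard link or symlink to that one executable.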

The speed advantage compared to static linking is not due to magic, but to the
fact that their tooling does LTO.

It's all based on LLVM IR, so this is not something that takes programs and
shared libraries and links them all with some tricks. It builds from source
(AIUI from skimming).

~~~
phkamp
This is not exactly a new idea.

In 1994 we added crunchgen(1) to FreeBSD, which does the exact same thing for
use on install and "Fixit" floppies.

I wrote the first prototype as a mess of shell-commands, and James da Silva
from Maryland Uni turned that into a really neat tool, which to this day
creates the "magic rescue binary" on FreeBSD systems:

    
    
        critter phk> ls -li /rescue/* | head
        2086658 -r-xr-xr-x  146 root  wheel  12245968 Dec  6 12:34 /rescue/[
        2086658 -r-xr-xr-x  146 root  wheel  12245968 Dec  6 12:34 /rescue/bectl
        2086658 -r-xr-xr-x  146 root  wheel  12245968 Dec  6 12:34 /rescue/bsdlabel
        2086658 -r-xr-xr-x  146 root  wheel  12245968 Dec  6 12:34 /rescue/bunzip2
        2086658 -r-xr-xr-x  146 root  wheel  12245968 Dec  6 12:34 /rescue/bzcat
        2086658 -r-xr-xr-x  146 root  wheel  12245968 Dec  6 12:34 /rescue/bzip2
        [...]
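
Every /rescue entry in that listing has the same inode number (2086658) and a link count of 146: they are hard links to a single crunched executable, and the argv[0] dispatch is what makes each name behave as a different program. As a small illustration (not part of crunchgen itself), a C sketch that checks this with stat(2) by comparing st_dev/st_ino and reading st_nlink. same_file and link_count are hypothetical helper names:

```c
#include <stdbool.h>
#include <sys/stat.h>

/* Two paths name the same file (e.g. hard links such as /rescue/bzip2 and
 * /rescue/bzcat) when they share a device and an inode number; the inode is
 * the first column of ls -li. Follows symlinks, like stat(2). */
bool same_file(const char *a, const char *b) {
    struct stat sa, sb;
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return false;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Number of hard links to a file (the "146" column in ls -l),
 * or -1 if the path cannot be stat'ed. */
long link_count(const char *path) {
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_nlink;
}
```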

~~~
vadve
Sorry for the delayed reply. We did find crunchgen well after the final
version of the paper was turned in, so we weren’t able to compare with it.

One key part of allmux does replicate what you did in crunchgen: combining
multiple binaries into one and dispatching on argv[0]. There are also some
important differences (and I wish we had known about the tool in time to
discuss these in the paper):

\-- crunchgen works on binary code, whereas allmux works on compiler IR
(LLVM). Compiler IR enables much more sophisticated compiler optimizations to
be applied to the mux’ed (or crunched) program. For example, we are able to
apply link-time optimizations (LTO) across the application-library boundary
for both static and dynamic libraries. In fact, we’re able to get more than
45% code size reductions even for a _single_ application and its libraries
using LTO (e.g., see the top chart in Fig. 9 – Single Programs).

\-- Judging from the man page online, crunchgen works by parsing Makefiles to
understand dependencies and by using its DSL to specify libs, libs_so, etc.
allmux works by adding passes to the compiler (Clang) and relying on the build
process to invoke clang for all compile/link steps for all relevant
components. This does limit allmux to software whose source code is available.

\-- crunchgen does not change the behavior of shared libraries, judging by the
man page at least. In contrast, allmux also includes shared libraries (linking
them in statically in the mux’ed binary) and deduplicates them across
applications. This has two benefits: (1) It speeds up program startup, which
can be a valuable win in some scenarios, e.g., using a Mux’ed compiler to
build a large system with 1000s of source files (see Fig. 1). (2) It achieves
large disk and memory reductions compared with static linking, and even some
reductions compared with dynamically shared libraries because of the added
benefits of LTO (note that LTO is essentially not applicable to dynamically
loaded libraries). The combination of _BOTH_ memory reductions and speed
improvements compared with either static or dynamic linking is a key benefit
of allmux that the crunchgen approach doesn’t aim to get. Of course, this is
limited to predetermined sets of applications and libraries.
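
To see where the disk savings over static linking come from, here is a back-of-the-envelope C sketch under a deliberately simplified model that is an assumption, not the paper's accounting: it ignores LTO and dead-code elimination, and just counts that static linking gives every program its own copy of each library it uses, while a multiplexed binary carries each library once. struct program, static_total and muxed_total are hypothetical names:

```c
#include <stddef.h>

/* A program is its own code size plus a bitmask of the libraries it links
 * against (bit l set => uses library l; so at most 32 libraries here). */
struct program {
    size_t own_bytes;
    unsigned libs;
};

/* Static linking: each program embeds its own copy of every library it uses. */
size_t static_total(const struct program *p, size_t np,
                    const size_t *lib_bytes, size_t nlibs) {
    size_t total = 0;
    for (size_t i = 0; i < np; i++) {
        total += p[i].own_bytes;
        for (size_t l = 0; l < nlibs; l++)
            if (p[i].libs & (1u << l))
                total += lib_bytes[l];
    }
    return total;
}

/* Multiplexed: all program code, plus one copy of each library that any
 * program in the set uses. */
size_t muxed_total(const struct program *p, size_t np,
                   const size_t *lib_bytes, size_t nlibs) {
    size_t total = 0;
    unsigned used = 0;
    for (size_t i = 0; i < np; i++) {
        total += p[i].own_bytes;
        used |= p[i].libs;
    }
    for (size_t l = 0; l < nlibs; l++)
        if (used & (1u << l))
            total += lib_bytes[l];
    return total;
}
```

With two hypothetical libraries of 1000 and 500 bytes and three programs of 100, 200 and 50 bytes (the first using both libraries, the other two one each), this model gives 3350 bytes statically linked versus 1850 multiplexed. Dynamic linking would also store each library once, so in this model any further savings over shared libraries have to come from LTO, which the sketch does not capture.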

Thanks very much for the note, and responses / comments welcome.

\--Vikram (on behalf of the authors)

------
heyjudy
The better approach is to do only dynamic linking with dependency isolation
(multiple versions/configurations of a library installed side by side without
conflicting with each other; no more RPM upgrade dependency hell), with
packages managed as idempotent transactions (preferably snapshotted before and
after with something like ZFS or Btrfs) and library packages garbage-collected
based on tracked dependencies, the way Habitat or Nix does it: every unique
library installed exactly once, in its own unique directory. This means no
duplicates, no missing dependencies, no broken dependencies, and no hacky
toolchain kludges that duplicate static and dynamic code. It works with
existing systems and makes them more organized, instead of redoing things and
adding more complexity.

tl;dr: Habitat (hab) is really neat. From the opscode folks.

