TL;DR: This statically links multiple programs and their common shared libraries into one program, replacing main() with a function that calls the right entry point based on the name of the executable. With this, you can essentially create a busybox-like binary from the sources of coreutils.
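A minimal sketch of that argv[0]-dispatch idea, assuming hypothetical ls_main/cat_main names for the renamed per-tool entry points (this is illustrative, not the code allmux or busybox actually generates):

/* Dispatch on the name the binary was invoked as (argv[0]).
 * ls_main/cat_main are illustrative stand-ins, not real coreutils entry points. */
#include <stdio.h>
#include <string.h>

static int ls_main(int argc, char **argv)  { (void)argc; (void)argv; puts("would run ls");  return 0; }
static int cat_main(int argc, char **argv) { (void)argc; (void)argv; puts("would run cat"); return 0; }

int main(int argc, char **argv) {
    /* Use only the final path component, so /rescue/ls and ./ls both match. */
    const char *slash = strrchr(argv[0], '/');
    const char *name = slash ? slash + 1 : argv[0];

    if (strcmp(name, "ls") == 0)  return ls_main(argc, argv);
    if (strcmp(name, "cat") == 0) return cat_main(argc, argv);

    fprintf(stderr, "%s: no such applet in this multi-call binary\n", name);
    return 1;
}

On disk, a single binary like this is typically installed under many names via hard links or symlinks, so each name selects its tool.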
The speed advantage compared to static linking is not due to magic, but to the fact that their tooling does LTO.
It's all based on LLVM IR, so this is not something that takes existing programs and shared libraries and links them together with some tricks. It builds from source (AIUI from skimming).
In 1994 we added crunchgen(1) to FreeBSD, which does the exact same thing, for use on install and "Fixit" floppies.
I wrote the first prototype as a mess of shell commands, and James da Silva from Maryland Uni turned that into a really neat tool, which to this day creates the "magic rescue binary" on FreeBSD systems:
critter phk> ls -li /rescue/* | head
2086658 -r-xr-xr-x 146 root wheel 12245968 Dec 6 12:34 /rescue/[
2086658 -r-xr-xr-x 146 root wheel 12245968 Dec 6 12:34 /rescue/bectl
2086658 -r-xr-xr-x 146 root wheel 12245968 Dec 6 12:34 /rescue/bsdlabel
2086658 -r-xr-xr-x 146 root wheel 12245968 Dec 6 12:34 /rescue/bunzip2
2086658 -r-xr-xr-x 146 root wheel 12245968 Dec 6 12:34 /rescue/bzcat
2086658 -r-xr-xr-x 146 root wheel 12245968 Dec 6 12:34 /rescue/bzip2
[...]
Sorry for the delayed reply. We only found crunchgen well after the final version of the paper was turned in, so we weren’t able to compare with it.
One key part of allmux does replicate what you did in crunchgen: combining multiple binaries into one and dispatching on argv[0]. There are also some important differences (and I wish we had known about the tool in time to discuss these in the paper):
-- crunchgen works on binary code, whereas allmux works on compiler IR (LLVM). Compiler IR enables much more sophisticated compiler optimizations to be applied to the mux’ed (or crunched) program. For example, we are able to apply link-time optimizations (LTO) across the application-library boundary for both static and dynamic libraries (a minimal sketch of this cross-boundary LTO appears after this list). In fact, we’re able to get more than 45% code size reduction even for a single application and its libraries using LTO (e.g., see the top chart in Fig. 9 – Single Programs).
-- Judging from the man page online, crunchgen works by parsing Makefiles to understand dependencies and by using the DSL to specify libs, libs.so, etc. allmux works by adding passes to the compiler (Clang) and relying on the build process to invoke clang for all compile/link steps for all relevant components. This does limit allmux to cases where source is available.
-- crunchgen does not change the behavior of shared libraries, judging by the man page at least. In contrast, allmux also includes shared libraries (linking them in statically in the mux’ed binary) and deduplicates them across applications. This has two benefits: (1) It speeds up program startup, which can be a valuable win in some scenarios, e.g., using a mux’ed compiler to build a large system with 1000s of source files (see Fig. 1). (2) It achieves large disk and memory reductions compared with static linking, and even some reductions compared with dynamically shared libraries because of the added benefits of LTO (note that LTO is essentially not applicable to dynamically loaded libraries). The combination of BOTH memory reductions and speed improvements compared with either static or dynamic linking is a key benefit of allmux that the crunchgen approach doesn’t aim for. Of course, this is limited to predetermined sets of applications and libraries.
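To make the cross-boundary LTO point in the first bullet concrete, here is a minimal sketch using plain clang LTO flags rather than the allmux tooling itself; the file names and functions are illustrative, and linking with -flto needs an LTO-aware linker such as lld or the gold plugin:

/* Two translation units, shown together for brevity. Built roughly as:
 *   clang -O2 -flto -c libgreet.c
 *   clang -O2 -flto -c app.c
 *   clang -O2 -flto app.o libgreet.o -o app
 * Because the object files carry LLVM IR, the link step can inline greet()
 * into main() and typically discard unused_helper() entirely; the paper
 * applies this kind of optimization across whole sets of programs and
 * their libraries. */

/* ---- libgreet.c (stand-in for a library) ---- */
#include <stdio.h>

void greet(const char *who) { printf("hello, %s\n", who); }

void unused_helper(void) { puts("never called"); }  /* dead code LTO can drop */

/* ---- app.c (stand-in for an application) ---- */
void greet(const char *who);

int main(void) {
    greet("world");
    return 0;
}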
Thanks very much for the note, and responses / comments welcome.
> There are three specific limitations to multiplexing,
> at present. First, the benefits of multiplexing are limited to a predetermined set of applications
> combined together, unlike either shared libraries or Slinky, both of which share code across arbitrary
> applications on an end-user’s system. As noted earlier, combining multiplexing with Slinky would
> get both kinds of benefits. As a direct consequence, sets of applications to multiplex must be
> predetermined, cannot be varied from one end-user to another, and adding new applications to
> an existing set is difficult (short of replacing the entire multiplexed binary for the set). Second,
> multiplexing makes it difficult or more cumbersome to update software by upgrading or patching
> dynamic libraries. Third, the current design of multiplexing disallows introspection techniques like
> the use of dlopen and dlsym.
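For context on that third limitation, this is the kind of runtime loading and symbol lookup it refers to; the example below is generic (with a Linux/glibc-specific library name) and is not taken from the paper:

/* Classic dlopen/dlsym pattern: load a shared library and resolve a symbol
 * at runtime. Link with -ldl on glibc-based systems. Under the muxed design
 * described above, this kind of introspection is disallowed. */
#include <dlfcn.h>
#include <stdio.h>

int main(void) {
    void *handle = dlopen("libm.so.6", RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    /* Look up cos() by name and call it through the returned pointer. */
    double (*cosine)(double) = (double (*)(double))dlsym(handle, "cos");
    if (cosine)
        printf("cos(0.0) = %f\n", cosine(0.0));

    dlclose(handle);
    return 0;
}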
So yeah, this is not something the distros could use; you would have to ship libs in allexe format (essentially LLVM IR). It seems their motivation is systems running a fixed set of binaries on production servers, which allows some nice memory/disk usage reductions.
We do not ship muxed binaries in allexe format: that's just used during development, and we generate and ship binary code.
Distros could use multiplexing any time a set of programs are commonly used together, e.g., the programs in Clang / LLVM, or an entire LAMP/LEMP stack. It does require distros to distribute, e.g., the LEMP stack as a separate package, but the benefits seem to justify that: e.g., ~120MB total size for LEMP when muxed, vs. ~220MB for dynamic linking. This usually leads to comparable benefits in memory usage, as well.
The better approach is to do only dynamic linking, with dependency isolation (side-by-side multiple versions/configurations of a library that don't conflict with each other; no more RPM upgrade dependency hell), package management as idempotent transactions (preferably snapshotted before and after with something like ZFS or Btrfs), and garbage-collected (dependency-tracked) library packages the way Habitat or Nix does it: every unique library installed in its own unique directory. That means no duplicates, no missing dependencies, no broken dependencies, and no hacky toolchain kludges that duplicate static and dynamic code. It works with existing systems instead of redoing things and adding more complexity rather than making them more organized.
tl;dr: Habitat (hab) is really neat. From the opscode folks.