Static Linux (sta.li)
154 points by joseflavio on Dec 31, 2014 | 57 comments

I wish Linux would replace dynamic libraries (especially ones referencing specific paths) with a system based on the library's hash. Then we could have a single lib folder with 1 copy of each library, and get ourselves out of dependency hell by just making dynamic loading act like static loading. We could even download libraries from the web on the fly, if needed. Heck it would even remove a lot of the need to compile every_single_time because apps could reference binaries.

The notion of being able to fix an app by merely upgrading a library it depends on has not worked out in practice. More often than not, when I upgrade a library, I find myself having to upgrade my app’s code because so much has changed. The burden of having to constantly back up, upgrade, and manually tweak config files, over and over and over again for days/weeks/months, was SO not worth the few hundred megabytes or whatever dynamic loading was supposed to have saved.
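To make the hash-based idea above concrete, here is a minimal sketch of a content-addressed library folder. It assumes SHA-256 as the hash and a flat directory of `<digest>.so` files; `store_library` and `load_library` are hypothetical names for illustration, not part of any existing loader.

```python
import hashlib
import tempfile
from pathlib import Path


def store_library(store: Path, data: bytes) -> Path:
    """Store library bytes under a name derived from their SHA-256 digest."""
    digest = hashlib.sha256(data).hexdigest()
    path = store / f"{digest}.so"
    if not path.exists():  # identical content is stored only once
        path.write_bytes(data)
    return path


def load_library(store: Path, digest: str) -> bytes:
    """Fetch a library by hash and verify that the content matches the name."""
    data = (store / f"{digest}.so").read_bytes()
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError("library content does not match its hash")
    return data


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp)
    a = store_library(store, b"fake libfoo build 1")
    b = store_library(store, b"fake libfoo build 1")  # same bytes, same file
    c = store_library(store, b"fake libfoo build 2")  # any change, new name
    print(a == b, a == c, len(list(store.iterdir())))
```

Because the name is derived from the content, a patched library necessarily gets a new name, which is exactly why the security-patch question raised in the replies comes up.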

> I wish Linux would replace dynamic libraries (especially ones referencing specific paths) with a system based on the library's hash.

Given this scheme, how would you distribute a security patch? Is each user of the library supposed to re-compile against the patched library?

Also, a program A depends on library B and library C v1.1. Library B also depends on C, but v1.2. Which gets used?

> More often than not, when I upgrade a library, I find myself having to upgrade my app’s code because so much has changed.

To me, this is the point of major version numbers. If you break clients, you increment the major version number, resulting in libfoo.so.2 and libfoo.so.3. Then, the scheme becomes much like hashes, in that newer versions won't break older clients, except you get security patches and a single copy of the library. However, the responsibility of knowing when to increment the major is left to a human, and all the error that entails.

As a sibling notes, there are distros out there that do this. (They are not my preference, for the above reasons.)
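A rough sketch of the major-version rule described above: two library files are treated as interchangeable only if they share a name and a major number. The regular expression and the helper names `parse_soname` and `can_substitute` are assumptions for illustration; the real dynamic loader matches on the SONAME recorded in the ELF file, not on file names.

```python
import re

# Matches names such as libfoo.so.2 or libfoo.so.2.1.4 (illustrative pattern).
SONAME_RE = re.compile(r"^(lib[\w+.-]+?)\.so\.(\d+)")


def parse_soname(filename):
    """Return (library name, major version) for a versioned .so file name."""
    m = SONAME_RE.match(filename)
    if m is None:
        raise ValueError(f"not a versioned shared library name: {filename}")
    return m.group(1), int(m.group(2))


def can_substitute(needed, available):
    """True if 'available' has the same name and major version as 'needed'."""
    return parse_soname(needed) == parse_soname(available)


print(can_substitute("libfoo.so.2", "libfoo.so.2.1.4"))
print(can_substitute("libfoo.so.2", "libfoo.so.3.0.0"))
```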

Ya it would basically ignore names/versions and just use the hash as the identifier, a bit like BitTorrent.

Unfortunately you do bring up a good point that security patches generally won't work without a recompile of the parent binary. One possible way out of this is that the external interface/unit tests for a library could also have their own hash, so if a library fixes something like a buffer exploit without changing its interface, the parent binary could use the new drop-in replacement. In practice I’m skeptical if this would work reliably though, because binaries may be relying on idiosyncrasies.

I'm thinking that a simpler method would be to have patches use the hash system. So say curl uses libssl and libssl releases a security update, then someone could drop the new libssl into the curl project and rebuild it without having to touch any code, giving curl a new hash that could be installed by other users. I think we are used to this almost never working so we are hesitant to upgrade. But the idea would be that we’d upgrade binaries that depend on libraries (rather than just libraries) and it would be a really cheap operation compared to today.

I'm no expert, but based on the parent, I think it comes down to what he means by "library hash." If it's a hash (like a checksum) of the whole shared library, then different versions would have different hashes, theoretically.

Distro packagers generate a listing of public symbols (tagging a symbol with the version number when it is introduced) to catch incompatibilities. It's like static typing, it takes a bit of getting used to but it's very reliable.
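The symbol-listing check mentioned above could look roughly like this. It is a simplified sketch: the listing is assumed to be a plain mapping from symbol name to the version that introduced it, and `check_symbols` is a hypothetical helper, not the format or tool any particular distro uses.

```python
def check_symbols(old_listing, new_exports, new_version):
    """Compare a recorded symbol listing against a new build's exports.

    old_listing: dict mapping symbol name -> version in which it appeared.
    new_exports: iterable of symbol names exported by the new build.
    Returns (removed symbols, updated listing).
    """
    new_exports = set(new_exports)
    # A symbol that disappeared breaks any client still referencing it.
    removed = sorted(set(old_listing) - new_exports)
    # Known symbols keep their original tag; new ones are tagged now.
    updated = {s: old_listing.get(s, new_version) for s in sorted(new_exports)}
    return removed, updated


listing = {"foo_init": "1.0", "foo_run": "1.0"}
removed, listing = check_symbols(listing, ["foo_init", "foo_run", "foo_stop"], "1.1")
print(removed, listing)
```

A non-empty `removed` list is the signal that the major version should have been bumped.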


>Given this scheme, how would you distribute a security patch? Is each user of the library supposed to re-compile against the patched library?

No, this is the responsibility of the server which distributes binaries to users via the package manager.

User, as in programmer who uses the library.

Programmers are used to recompiling all the time. What's the problem here?

> I wish Linux would replace dynamic libraries (especially ones referencing specific paths) with a system based on the library's hash.

This is how NixOS works: http://nixos.org/

> The notion of being able to fix an app by merely upgrading a library it depends on has not worked out in practice.

It works in practice if developers and maintainers adhere to semantic versioning. Unfortunately, there are numerous packages that don't adhere to this standard and that's when widespread breakage occurs.
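As a sketch of what adhering to semantic versioning buys you: an installed library counts as a drop-in replacement when the major number matches and it is at least as new as the version the app was built against. This assumes plain MAJOR.MINOR.PATCH strings and ignores pre-release tags and the special treatment of 0.x versions; `is_drop_in` is a hypothetical name.

```python
def parse_version(text):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of three integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def is_drop_in(built_against, installed):
    """True if 'installed' should be API-compatible under semantic versioning."""
    built = parse_version(built_against)
    inst = parse_version(installed)
    # Same major version, and not older than what the app was built against.
    return inst[0] == built[0] and inst[1:] >= built[1:]


print(is_drop_in("1.2.3", "1.4.0"))
print(is_drop_in("1.2.3", "2.0.0"))
```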

An example of where this has worked (though admittedly the reason it has continued to work as long as it has is that the size of the community has dwindled down to a very small, manageable number over the last 20 years) is AmigaOS, where there are 30 year old libraries that are still updated, and where the updates are still expected to be drop-in replacements for the previous version.

And if developers would just write bug-free code then also life would be much simpler. Sadly neither of these things is going to happen, so we need to deal with it somehow.

Cool idea. Not enough people seem aware of executable packers like UPX http://upx.sourceforge.net/ though.

These are excellent tools to keep the size down when using large static binaries. By compressing the file on disk and decompressing in memory you often wind up with a smaller and sometimes even faster loading (depending on disk IO speed vs decompression speed) package. I got a static Qt binary from 4 MB down to 1.3 with upx --lzma. Very nice stuff.

The downside to tools like UPX is that the executable code is actively transformed on load. This limits the ability to use shared memory for multiple concurrent executions of the same executable.

If the OS loads executables via mmap and faults pages in on first access, you can potentially save memory by never loading unused parts of an executable. A transform-on-load scheme requires the entire program to be loaded before execution begins.
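The mmap behaviour described above can be illustrated with an ordinary data file instead of an executable. This sketch maps a whole file but touches only one page of it; on a typical demand-paged OS only the touched pages (plus whatever readahead the kernel chooses) need to be brought into memory. The file layout and the helper names are made up for the demonstration.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def make_demo_file(path, pages):
    """Write a file in which page i is filled with the byte value i."""
    with open(path, "wb") as f:
        for i in range(pages):
            f.write(bytes([i % 256]) * PAGE)


def read_one_page(path, page_index):
    """Map the whole file read-only but access only one page of it."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            start = page_index * PAGE
            return m[start:start + PAGE]  # only this range is accessed


with tempfile.TemporaryDirectory() as demo_dir:
    demo_path = os.path.join(demo_dir, "demo.bin")
    make_demo_file(demo_path, 64)
    page = read_one_page(demo_path, 10)
    print(len(page) == PAGE, page[0])
```

A packed executable cannot be used this way, since the bytes on disk are not the bytes that run.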

'... never loading unused parts of an executable...' - this is one of the benefits of a paging system. Paging was (is?) the best way to keep memory requirement down. The way Multics worked was by having paged segments. http://www.osinfoblog.com/post/136/segmentation-with-paging:... The x86(-64) can support something like this, but as far as I know, no modern OS supports this feature.

Linux does this, as does basically every other modern OS.

Pick a random large process on a linux host - like your web browser. cat /proc/$pid/status.

VmExe is the size of the mapped executable; VmLib is the size of all the other mapped libraries and executable pages. Add those two numbers together to find the size of all the executable code mapped into this process.

VmRSS is the amount of physical memory that the process is currently using. You'll find that this is a lot smaller than the code mapped into the process. That's because the kernel hasn't loaded any of that into physical memory.
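For anyone who would rather script this than eyeball the output, here is a small sketch that pulls the three fields out of a status file. It assumes the Linux `/proc/<pid>/status` layout of `Key:<whitespace>value kB`; `parse_status` is a hypothetical helper.

```python
def parse_status(text):
    """Extract VmExe, VmLib and VmRSS (in kB) from /proc/<pid>/status text."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key in ("VmExe", "VmLib", "VmRSS"):
            fields[key] = int(value.split()[0])  # first token is the kB count
    return fields


try:
    with open("/proc/self/status") as f:
        print(parse_status(f.read()))
except OSError:
    print("no readable /proc/self/status on this system")
```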

Thank you. I'm not at all sure why the 'old hands' here on HN have decided to so viciously downvote this. I didn't call anybody names, didn't violate any posting rules, did not violate etiquette, and provided a very nice link to Multics.

If there is a perceived error in what I wrote, then, like the one nice responder, explain, please.

HN is such a different community now than when I joined 1,835 days ago. It brings tears to my eyes.

Ah, and pklite pro and lzexe from the olden days for winDOwS shit once upon a time.

Though the goal should be generating less code. Link in fewer dependencies, reduce features, DRY up duplicate logic and cut LoC. Also compile with -DNDEBUG -O2 -g- and whatever LTO switches are available for whole program optimization if you're statically linking everything together. Also be sure to include static dependencies of other static dependencies like zlib (-lz), or you'll inevitably end up with missing symbol errors when compiling a final program. LTO cuts out all (most) of the shit that you don't need and attempts to optimize across translation units.

Furthermore, a consideration against static linking: on most platforms, if the same shared library is already loaded, it's reused by mmapping it into a process. Not sure that duplicating code is going to reduce memory usage or the IO it takes to load from disk. Giant runtimes like Go, Ruby, Python and fucking Java shouldn't be duplicated N times... That's just wasteful. (I hate any language with an epic runtime or VM that includes the world to do anything.) Libraries should be reserved for the few redundant things that take tons of code to implement and change very little.

If anyone wants to compile a Linux system from scratch, try LFS and hack around with static linking. It may take patches and extra flags to get what you want.

Hope Static Linux scales, because it's easier to upgrade static programs without dependency hell but the increased memory usage of duplicated code might not be so great of a tradeoff.

Another hack would be to statically compile every system program in each directory together (/sbin/, /bin/, and parts of /usr/bin, etc) into a single executable per directory that is then symlinked to itself to select which "program" to load via argv[0]. It will be one giant exe per directory, but it will be cached basically all the time and with LTO, there won't be much duplication as with N programs compiled separately. This would take a main which dispatches to other renamed original mains and renaming all symbol conflicts across all translation units.

    /bin/[ -> /bin/static
    /bin/false -> /bin/static
    /bin/true -> /bin/static
(Probably want to use hard links also)
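The argv[0] dispatch trick is easier to see in a few lines of code. This is a sketch in Python rather than C, with three toy applets standing in for the renamed original mains; the applet names and the 127 exit status for an unknown name are illustrative choices.

```python
import os
import sys


def main_true(args):
    return 0


def main_false(args):
    return 1


def main_echo(args):
    print(" ".join(args))
    return 0


# One entry per "program"; each link name maps to its original main.
APPLETS = {"true": main_true, "false": main_false, "echo": main_echo}


def dispatch(argv):
    """Pick the applet from the name the binary was invoked as."""
    name = os.path.basename(argv[0])
    applet = APPLETS.get(name)
    if applet is None:
        print(f"{name}: unknown applet", file=sys.stderr)
        return 127
    return applet(argv[1:])


print(dispatch(["/bin/true"]), dispatch(["/bin/false"]))
```

In a real multi-call binary the entry point would pass the process's own argument vector to the dispatcher and exit with its return value.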

Crunchgen [1] does that static compile, merge and symbol rename trick for you

[1] http://netbsd.gw.com/cgi-bin/man-cgi?crunchgen++NetBSD-curre...

I found a bug with an offline documentation reader (can't remember which): if you pack one of the Qt components (qwindows.dll), the application won't load anymore. Still not sure whether to report a bug and who to report it to.

Yes UPX is cool, but I don't think it's 100% compatible.

You can usually set UPX options to fix that kind of thing. You may have stripped necessary linking information. I usually use it on statically linked things though, so no issue there.

I have been keeping an eye on this "project" for years and have yet to see anything come of it except lightning talks.

suckless.org seems to focus on their web browser and their xterm clone these days, judging by the listserv traffic.

Sin, the original author and maintainer of a few of the suckless utils necessary for Stali, has a similar project called Morpheus[0]. He has actually shipped a bootable image for testing and has more or less gotten the packaging sorted out[1].

If you're interested in the idea, definitely check it out.

0: http://morpheus.2f30.org
1: http://morpheus.2f30.org/0.0/packages/x86_64/

A similarly interesting project is the musl-based Sabotage Linux with its own tiny but concurrent package manager: https://github.com/sabotage-linux/sabotage

Much of the bootstrapping of Morpheus appears to have come from Sabotage Linux, especially the use of musl-cross[0]. I thought I remember seeing numerous references to Sabotage in the docs but now I can't seem to find them.

Anyhow, definitely another interesting project in a similar vein. The biggest difference I see, from a base perspective, is the choice of coreutils replacement: Sabotage uses Busybox while Morpheus uses a mixture of sbase[1], ubase[2], hbase[3] (rewrite of heirloom utils[4]), and 9base[5] (some utils from plan9port[6]).

0. https://github.com/sabotage-linux/musl-cross
1. http://git.2f30.org/sbase/
2. http://git.2f30.org/ubase/
3. http://git.2f30.org/hbase/
4. http://heirloom.sourceforge.net
5. http://git.suckless.org/9base/
6. http://swtch.com/plan9port/

The README for hbase characterizes it differently:

   hbase is a collection of programs that complements sbase
   and ubase. It's meant to be a temporary project that
   will shrink and die once sbase and ubase gets
   implementations of most of the programs included. hbase
   mostly contains programs taken from the Heirloom project,
   but also has other programs, such as patch taken from
   FreeBSD and mk taken from plan9port.
So it's not a rewrite, but rather a temporary code dump for "miscellaneous utilities". Skimming the code confirms this.

Correct. When I said 'rewrite', I meant 'update of some'.

Sabotage is such a wonderful idea and I really hope it succeeds. It's not nearly usable right now, though.

What do you have a problem with?

Never even heard of it, amazing. #xmas

"Of course Ulrich Drepper thinks that dynamic linking is great, but clearly that’s because of his lack of experience and his delusions of grandeur." I find these kind of comments reflect veerryyy poorly on their authors..

  > Because dwm is customized through editing its source code, it’s
  > pointless to make binary packages of it. This keeps its userbase
  > small and elitist. No novices asking stupid questions.
I never understood why the authors of dwm thought this was a "nice" feature of configuration via source code.

They don't mind if DWM is only ever used by a handful of people if only a handful of people agree with the principles behind it.

Consider that their goal is not to have a huge community, but to have a piece of software that does what they want, and that dealing with a community is a lot of work. So they've chosen to use this as a filter in order to exclude people they perceive as causing more trouble than it's worth, and limit their community to people who pass through their "trial by fire" of having to be both willing and able to deal with compiling from source.

The other kinds of configuration aren't that much better.

The most common method of configuration on linux is to include a parser for one of many shitty text or markup formats (whatever is currently "hip", so JSON at the moment), then carefully bind each variable you might want to modify to a key/value mapping extracted from the config file - and if you want to keep the sanity of your users, include verbose error messages or even a debugger so they can fix their inevitable typos.

The way configuration works on Windows and Mac is largely the same, except you wrap a GUI around the text file to handle the validation of inputs, which is a slight improvement over text input.

The problem with those input methods is they don't exactly allow you to configure much. You have to decide ahead of time all of the possible variables that one might want to change - and even then, you can't even compute new values to set the variables to, unless you embed an interpreter into your configuration format. As the program grows and gains more features, the configuration format needs amending, and grows uglier - which is what leads to Greenspun's tenth rule. Configuration files have their place - but most of the time, they're used where it'd be best to just have a programming language available.

I don't necessarily think dwm's idea of configuration via C is a great idea though, since they're not interpreting it and recompiling the whole program to make and test changes is a headache. Configuration via source code is the way to go, except it should be interpreted while the program is running, such that you only need to recompile for major breaking changes. Xmonad is configured via source code, but they have a separate process for your configuration, such that when you change it, the config is recompiled and the program relaunched without restarting the whole system. I'd personally opt to embed a Scheme into a WM, but that would probably go against suckless's minimalist philosophy.

An advantage of dynamic libraries is that the memory used to hold the library's executable pages can be shared across processes. So using static only binaries will lead to less free memory on the OS.

That's the party line. It's often wrong. If two copies of the same program are running, they share memory for code. For a shared library to reduce memory consumption, there must be multiple different programs using the same version of the same library. That's not all that common, beyond very basic libraries such as "libc".

Linking to a shared library brings in and initializes the whole library, even if you only need one function from it. So you tend to get stuff paged in during load that never gets used.

> That's not all that common

Isn't it? Usually distros target their packages to a single library version, and often people run suites (Gnome, KDE, etc) that use a similar set of libraries in their different processes.

Indeed. ldd any substantial GTK app and scroll past the dependencies. They are huge. Most of them are shared across applications.

Desktop would be crippled if every app was compiled with the whole stack of X, toolkit and Gnome libraries linked in statically.

I'd argue that libraries like GTK were only allowed to become so bloated because dynamic linking masked their true impact on the system. If static linking were the norm, we'd be using much simpler, cleaner libraries because people would think twice about adding 100+ megabytes to their binaries for basic GUIs.

Apart from the general system management advantages of dynamic libraries, they provide an important extensibility/customization mechanism (e.g. https://www.gnu.org/software/emacs/emacs-paper.html for an early mention).

High performance computing systems typically use dynamic linking extensively for that. One example: The hooks for profiling and tool support in the MPI standard for parallel programming pretty much depend on an LD_PRELOAD-type mechanism to be useful. Another: You can cope with the mess due to missing policy on BLAS libraries in Fedora/EPEL (unlike Debian) by using OpenBLAS replacements for the reference and ATLAS libraries; have them preferred by ld.so.conf and get a worthwhile system-wide speed increase on your Sandybridge nodes with EPEL packages.

Anyhow, rebuilding a static system to address a problem with a library ignores all its uses in user programs. The ability to adjust things via properly-engineered dynamic libraries really has a lot more pros than cons in my non-trivial experience. The use of rpath ("ones referencing specific paths"?) is mostly disallowed by packaging rules in the GNU/Linux distributions I know, so I'm not sure where that comment came from, and it tends to defeat the techniques above.

Static linking OpenSSL is probably not a good idea

You mean for security patches? i.e rebuilding all your binaries, instead of just openssl's shared libs

One downside to static linking is security vulnerabilities. Say, for example that all the programs on your computer that use OpenSSL statically link. When OpenSSL has a security flaw, you have to not only update OpenSSL, but all of those other programs too.

Are there any static BSDs?

You can build NetBSD to be statically linked [1]. It is not the default, but it is trivial to do: a one-line config change. Note that you can build NetBSD from source on any system, e.g. Linux or OS X, and then boot it in a VM, so you don't really need a binary "distro".

[1] https://www.netbsd.org/docs/guide/en/chap-build.html#chap-bu...

Picobsd maybe?

I would like to see a "static" USE flag in Gentoo for all packages, with the ability to statically link the whole system. Also, profiles or env with other libcs would be nice. Dunno why they need to create a new distribution instead of expanding ones that are already there.

Stali has been in 'planning' for as long as I can remember. At least 4 years.

Wonder if it helps with dep hell...

It would for a certain class of hell. You won't get issues w/ resolving symbols in dynamic libs, but you _would_ open yourself to feature disparity across different apps that use similar libs. For example, if you've got a Client-A that uses libfoo feature X(v1), and Client-B that shipped w/ and links against libfoo feature X(v2), it might be frustrating -- this scenario is part of the promise (and responsibility) of shared libraries.

Can't have your cake and eat it too.

And that feature disparity includes security updates. If a library is updated with a security fix, you'll need to update everything that uses that library to get the fix, rather than just the shared dynamic library.

The tools for updating security fixes should be applicable for the applications just as easily as the libraries. The applications would have to be rebuilt which should be automated. If the library changes anything that causes the build to fail, much better to find that out at that time than to have the failure occur when it dynamically links on end user machines.

There would be inevitable bandwidth costs in updating like this, but that is the trade-off that is explicitly made by choosing to go with static.

> The tools for updating security fixes should be applicable for the applications just as easily as the libraries...

I don't think anybody would disagree, but you can't dismiss this required effort out of hand. The point is that there are pros and cons. It's arguable that one really ought to have a build server to mitigate the effect/work. For an OS/distribution, this would be a maintained repository of binaries, and you could do an (e.g.) apt-get update and have the proper software fixed (for your "enterprise" or similar software, a similar in-house mechanism). If everything is static, replacing the binaries on the end machine ought to be relatively simple, with the effort for library maintenance moved to maintaining an "out of band" record of what libs each app is using, so that when you have a flaw in libxyz that client-a, client-b, and client-c are using, you _know_ you need to update the source for client-[a-c] one way or another. It boils down to a question of responsibility: are you going to build safeguards into the link/run mechanism (dynamic libs) and have it adopt a certain amount of responsibility, or move the cost upfront to build/maintenance and manage the responsibility yourself (with some other appropriate tooling)?
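The "out of band" record could be as simple as a manifest mapping each statically linked app to the library versions baked into it. A minimal sketch, assuming versions are stored as integer tuples; the manifest contents and `apps_to_rebuild` are hypothetical.

```python
def apps_to_rebuild(manifest, lib, fixed_version):
    """List apps that statically link a version of 'lib' older than the fix."""
    return sorted(
        app
        for app, libs in manifest.items()
        if lib in libs and libs[lib] < fixed_version
    )


# Hypothetical record of what each static binary was built with.
manifest = {
    "client-a": {"libxyz": (1, 0, 1), "zlib": (1, 2, 8)},
    "client-b": {"libxyz": (1, 0, 2)},
    "client-c": {"libxyz": (1, 0, 1)},
    "client-d": {"zlib": (1, 2, 8)},
}

print(apps_to_rebuild(manifest, "libxyz", (1, 0, 2)))
```

A build server would feed the resulting list into its rebuild queue whenever a fixed library version is published.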

> There would be inevitable bandwidth costs in updating like this

I don't think that's true. You could transfer only binary differences with bsdiff or something and if there are a lot of them with the same security update - you could go even farther and establish a single patch as a base and all the other patches as differences with the base (or other appropriate compression algorithm). Bandwidth should be very tiny.
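To give a feel for why binary differences can be tiny, here is a toy copy/insert delta built on Python's difflib. It is not the bsdiff algorithm (which works differently and compresses its output); it only illustrates that a small change in a large binary needs few literal bytes. The helper names and the fake binaries are made up.

```python
import difflib


def make_delta(old, new):
    """Describe 'new' as copies from 'old' plus literal data."""
    ops = []
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))       # reuse bytes the client has
        elif j2 > j1:                          # 'replace' or 'insert'
            ops.append(("data", new[j1:j2]))   # bytes that must be sent
        # 'delete' contributes nothing to the new file
    return ops


def apply_delta(old, ops):
    """Rebuild the new file from the old file and a delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


def literal_bytes(ops):
    """Count the bytes of new data carried by the delta."""
    return sum(len(op[1]) for op in ops if op[0] == "data")


old = b"A" * 200 + b"vulnerable" + b"B" * 200
new = b"A" * 200 + b"patched!!" + b"B" * 200
delta = make_delta(old, new)
print(apply_delta(old, delta) == new, literal_bytes(delta), len(new))
```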

That's a problem for me - most of my stuff lives on sub-56k radio or satellite links. Thanks for the explanation!!!

There is a tradeoff between bandwidth and local processing in that you may download all updated dependent binaries or just get the new updated file and relink (or recompile) the affected programs locally.
