The notion of being able to fix an app by merely upgrading a library it depends on has not worked out in practice. More often than not, when I upgrade a library, I find myself having to upgrade my app’s code because so much has changed. The burden of having to constantly back up, upgrade, and manually tweak config files, over and over and over again for days/weeks/months was SO not worth the few hundred megabytes or whatever dynamic loading was supposed to have saved.
Given this scheme, how would you distribute a security patch? Is each user of the library supposed to re-compile against the patched library?
Also, suppose a program A depends on library B and on library C v1.1, while library B itself depends on C v1.2. Which version gets used?
> More often than not, when I upgrade a library, I find myself having to upgrade my app’s code because so much has changed.
To me, this is the point of major version numbers. If you break clients, you increment the major version number, resulting in libfoo.so.2 and libfoo.so.3. Then, the scheme becomes much like hashes, in that newer versions won't break older clients, except you get security patches and a single copy of the library. However, the responsibility of knowing when to increment the major is left to a human, and all the error that entails.
As a sibling notes, there are distros out there that do this. (They are not my preference, for the above reasons.)
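The soname scheme described above is just symlinks on disk. A sketch with a hypothetical libfoo (all file names made up):

```shell
# Typical soname layout: binaries record "libfoo.so.2" at link time, so
# a compatible patch release can be dropped in without relinking them.
ln -sf libfoo.so.2.1.3 libfoo.so.2   # runtime name the loader resolves
ln -sf libfoo.so.2 libfoo.so         # dev symlink that -lfoo uses at link time
readlink libfoo.so.2                 # prints: libfoo.so.2.1.3
# An ABI break ships as libfoo.so.3 alongside; old clients keep .so.2.
```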
Unfortunately you do bring up a good point: security patches generally won't work without a recompile of the parent binary. One possible way out of this is that the external interface/unit tests for a library could also have their own hash, so if a library fixes something like a buffer exploit without changing its interface, the parent binary could use the new drop-in replacement. I'm skeptical that this would work reliably in practice, though, because binaries may be relying on undocumented idiosyncrasies.
I'm thinking that a simpler method would be to have patches use the hash system. So say curl uses libssl and libssl releases a security update, then someone could drop the new libssl into the curl project and rebuild it without having to touch any code, giving curl a new hash that could be installed by other users. I think we are used to this almost never working so we are hesitant to upgrade. But the idea would be that we’d upgrade binaries that depend on libraries (rather than just libraries) and it would be a really cheap operation compared to today.
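A sketch of that hash scheme (Nix-style): a package's identity is a hash over its own source plus the exact hashes of its dependencies, so swapping in the patched libssl automatically yields a new curl hash. The file names and version strings here are invented:

```shell
printf 'pretend libssl 1.0.1g source\n' > libssl.tar        # hypothetical dep
dep=$(sha256sum libssl.tar | cut -d' ' -f1)
# curl's store hash covers its own source *and* the dep's hash, so a
# new libssl means a new curl hash, rebuilt without touching curl code.
printf 'curl-7.35.0-src:%s\n' "$dep" | sha256sum | cut -d' ' -f1
```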
No, this is the responsibility of the server which distributes binaries to users via the package manager.
This is how NixOS works: http://nixos.org/
It works in practice if developers and maintainers adhere to semantic versioning. Unfortunately, there are numerous packages that don't adhere to this standard and that's when widespread breakage occurs.
These are excellent tools to keep the size down when using large static binaries. By compressing the file on disk and decompressing in memory you often wind up with a smaller and sometimes even faster loading (depending on disk IO speed vs decompression speed) package. I got a static Qt binary from 4 MB down to 1.3 MB with upx --lzma. Very nice stuff.
If the OS loads executables by mmap and demand-loads pages on first touch, you can potentially save memory by never loading unused parts of an executable. A transform-on-load scheme, by contrast, requires the entire program to be decompressed before execution begins.
Pick a random large process on a Linux host, like your web browser, and cat /proc/$pid/status.
VmExe is the size of the mapped executable; VmLib is the size of all the other mapped libraries and executable pages. Add those two numbers together to find the size of all the executable code mapped into this binary.
VmRSS is the amount of physical memory that the process is currently using. You'll find that this is a lot smaller than the code mapped into the binary. That's because the kernel only faults pages into physical memory as they're actually touched.
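Putting those three fields together, assuming a Linux /proc and using the shell's own pid as the "random process":

```shell
pid=$$
# Sum VmExe + VmLib (code mapped) and compare against VmRSS (resident).
awk '/^VmExe:|^VmLib:/ { code += $2 }
     /^VmRSS:/         { rss   = $2 }
     END { printf "code mapped: %d kB, resident: %d kB\n", code, rss }' \
    "/proc/$pid/status"
# On a typical browser process, "code mapped" far exceeds "resident".
```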
If there is a perceived error in what I wrote, then, like the one nice responder, explain, please.
HN is such a different community now than when I joined 1,835 days ago. It brings tears to my eyes.
Though the goal should be generating less code: link in fewer dependencies, reduce features, DRY up duplicate logic, and cut LoC. Also compile with -DNDEBUG -O2 -g0 and whatever LTO switches are available for whole-program optimization if you're statically linking everything together. Also be sure to include static dependencies of other static dependencies like zlib (-lz), or you'll inevitably end up with missing-symbol errors when linking the final program. LTO cuts out all (most) of the shit that you don't need and attempts to optimize across translation units.
Furthermore, a consideration against static linking: on most platforms, if the same shared library is already loaded, it's reused by mmapping it into the process. I'm not sure that duplicating code is going to reduce memory usage or the IO it takes to load from disk. Giant runtimes like Go, Ruby, Python and fucking Java shouldn't be duplicated N times... That's just wasteful. (I hate any language with an epic runtime or VM that includes the world to do anything.) Libraries should be reserved for the few redundant things that take tons of code to implement and change very little.
If anyone wants to compile a Linux system from scratch, try LFS and hack around with static linking. It may take patches and extra flags to get what you want.
Hope Static Linux scales; it's easier to upgrade static programs without dependency hell, but the increased memory usage of duplicated code might not be such a great tradeoff.
Another hack would be to statically compile every system program in each directory (/sbin, /bin, parts of /usr/bin, etc.) together into a single executable per directory, with each program name symlinked to it so that the "program" to run is selected via argv[0]. It will be one giant exe per directory, but it will be cached basically all the time, and with LTO there won't be as much duplication as with N programs compiled separately. This would take a main which dispatches to the renamed original mains, plus renaming all symbol conflicts across translation units.
/bin/[ -> /bin/static
/bin/false -> /bin/static
/bin/true -> /bin/static
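A toy version of that dispatch, done here as a shell script for brevity (the real thing would be one static C binary switching on basename(argv[0])):

```shell
cat > static <<'EOF'
#!/bin/sh
# Pick the "applet" from the name we were invoked under (argv[0]).
case "$(basename "$0")" in
    true)  exit 0 ;;
    false) exit 1 ;;
    *)     echo "unknown applet: $0" >&2; exit 127 ;;
esac
EOF
chmod +x static
ln -sf static true
ln -sf static false
./true;  echo "true exits $?"    # prints: true exits 0
./false; echo "false exits $?"   # prints: false exits 1
```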
Yes UPX is cool, but I don't think it's 100% compatible.
suckless.org seems to focus on their web browser and their xterm clone these days, judging by the listserv traffic.
If you're interested in the idea, definitely check it out.
Anyhow, definitely another interesting project in a similar vein. The biggest difference I see, from a base perspective, is the choice of coreutils replacement: Sabotage uses Busybox while Morpheus uses a mixture of sbase, ubase, hbase (a rewrite of the Heirloom utils), and 9base (some utils from plan9port).
> hbase is a collection of programs that complements sbase and ubase. It's meant to be a temporary project that will shrink and die once sbase and ubase get implementations of most of the programs included. hbase mostly contains programs taken from the Heirloom project, but also has other programs, such as patch taken from FreeBSD and mk taken from plan9port.
> Because dwm is customized through editing its source code, it’s
> pointless to make binary packages of it. This keeps its userbase
> small and elitist. No novices asking stupid questions.
The most common method of configuration on linux is to include a parser for one of many shitty text or markup formats (whatever is currently "hip", so JSON at the moment), then carefully bind each variable you might want to modify to a key/value mapping extracted from the config file - and if you want to keep the sanity of your users, include verbose error messages or even a debugger so they can fix their inevitable typos.
The way configuration works on Windows and Mac is largely the same, except you wrap a GUI around the text file to handle the validation of inputs, which is a slight improvement over text input.
The problem with those input methods is they don't exactly allow you to configure much. You have to decide ahead of time all of the possible variables that one might want to change - and even then, you can't even compute new values to set the variables to, unless you embed an interpreter into your configuration format. As the program grows and gains more features, the configuration format needs amending, and grows uglier - which is what leads to Greenspun's tenth rule. Configuration files have their place - but most of the time, they're used where it'd be best to just have a programming language available.
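For instance, the minimal key=value variant of that pattern, with the verbose errors mentioned above (file name and keys are made up):

```shell
cat > app.conf <<'EOF'
width=800
heigth=600
EOF
lineno=0
while IFS='=' read -r key val; do
    lineno=$((lineno + 1))
    case "$key" in
        width|height) echo "set $key=$val" ;;
        # A real parser would fuzzy-match for suggestions; and every new
        # option means amending this case statement, which is exactly
        # the "format grows uglier" problem.
        *) echo "app.conf:$lineno: unknown key '$key'" >&2 ;;
    esac
done < app.conf
```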
I don't necessarily think dwm's idea of configuration via C is a great idea though, since the config isn't interpreted, and recompiling the whole program to make and test changes is a headache. Configuration via source code is the way to go, except it should be interpreted while the program is running, so that you only need to recompile for major breaking changes. Xmonad is configured via source code, but they have a separate process for your configuration, such that when you change it, the config is recompiled and the program relaunched without restarting the whole session. I'd personally opt to embed a Scheme in a WM, but that would probably go against suckless's minimalist philosophy.
Linking to a shared library maps in the whole library and runs its initializers, even if you only need one function from it. So you tend to get pages touched during load that never get used afterwards.
Isn't it? Usually distros target their packages to a single library version, and often people run suites (Gnome, KDE, etc) that use a similar set of libraries in their different processes.
Desktop would be crippled if every app was compiled with the whole stack of X, toolkit and Gnome libraries linked in statically.
High performance computing systems typically use dynamic linking extensively for that. One example: The hooks for profiling and tool support in the MPI standard for parallel programming pretty much depend on an LD_PRELOAD-type mechanism to be useful. Another: You can cope with the mess due to missing policy on BLAS libraries in Fedora/EPEL (unlike Debian) by using OpenBLAS replacements for the reference and ATLAS libraries; have them preferred by ld.so.conf and get a worthwhile system-wide speed increase on your Sandybridge nodes with EPEL packages.
Anyhow, rebuilding a static system to address a problem with a library ignores all its uses in user programs. The ability to adjust things via properly-engineered dynamic libraries really has a lot more pros than cons in my non-trivial experience. The use of rpath ("ones referencing specific paths"?) is mostly disallowed by packaging rules in the GNU/Linux distributions I know, so I'm not sure where that comment came from, and it tends to defeat the techniques above.
Can't have your cake and eat it too.
There would be inevitable bandwidth costs in updating like this, but that is the trade-off that is explicitly made by choosing to go with static.
I don't think anybody would disagree, but you can't dismiss this required effort out of hand. The point is there are pros and cons. It's arguable that one really ought to have a build server to mitigate the work. For an OS/distribution, this would be a repository of maintained binaries, so you could do an (e.g.) apt-get update and have the proper software fixed (for your "enterprise" or similar software, a similar in-house mechanism). If everything is static, replacing the binaries on the end machine ought to be relatively simple; the effort of library maintenance moves to keeping an "out of band" record of which libs each app is using, so that when you have a flaw in libxyz that client-a, client-b, and client-c are using, you _know_ you need to update the source for client-[a-c] one way or another. It boils down to a question of responsibility: do you build safeguards into the link/run mechanism (dynamic libs) and have it adopt a certain amount of responsibility, or move the cost upfront to build/maintenance and manage the responsibility yourself (with some other appropriate tooling)?
I don't think that's true. You could transfer only binary differences with bsdiff or something, and if there are a lot of binaries carrying the same security update, you could go even farther and establish a single patch as a base with all the other patches as differences against that base (or use some other appropriate compression scheme). Bandwidth should be very tiny.
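The real tools for this are bsdiff/bspatch; since they may not be installed everywhere, plain diff/patch illustrates the same delta idea here on a stand-in text "binary":

```shell
printf 'libssl 1.0.1f\nrest of the program\n' > app-v1   # installed version
printf 'libssl 1.0.1g\nrest of the program\n' > app-v2   # patched rebuild
# Ship only the small delta (diff exits 1 when files differ).
diff -u app-v1 app-v2 > update.patch || true
patch -s app-v1 < update.patch     # client reconstructs the new build
cmp -s app-v1 app-v2 && echo "identical"   # prints: identical
```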