An alternative to shared libraries (kix.in)
30 points by Fenume on Apr 26, 2015 | 22 comments

This just illustrates that the real problem is versioning, not linking. It's nice to hand wave off the versioning problems of this approach, but in practice this isn't really different from the mechanisms used to version shared libraries and suffers all the same pitfalls. You've just replaced ld.so with a bunch of running daemons and turned it into a distributed computing problem. Don't get me wrong, I love plan 9 and its namespaces and long for them in the practical world, but this is 6 of one, half dozen of the other.

Also, filesystem namespaces in Linux are a privileged operation, so these kinds of approaches don't work at all like they do in plan 9.

What fascinates me is how close this synthetic filesystem is to making REST calls with a small client library. All your application needs to know is the path to call and what HTTP verb to use. Even the way the author introduces versioning sounds familiar: using a versioning file is similar to having a version at the beginning of your URL.

I appreciate how good architectures all resemble each other at some point, with only the transport layers differing between applications.

REST is just CRUD over HTTP. The file interface and Unix everything-is-a-file concept are one of the earliest well-defined CRUD interfaces in computing. The author (and Plan 9) are taking these concepts to their logical conclusion.
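A minimal sketch of the file-as-CRUD idea, using ordinary files in place of a synthetic filesystem. The function names and the loose mapping to HTTP verbs in the comments are illustrative assumptions, not anything from the article:

```python
import os
import tempfile


def create(path, data):
    # Mode "x" fails if the file already exists (roughly POST).
    with open(path, "xb") as f:
        f.write(data)


def read(path):
    # Roughly GET.
    with open(path, "rb") as f:
        return f.read()


def update(path, data):
    # Mode "r+" fails if the file is missing (roughly PUT on an existing resource).
    with open(path, "r+b") as f:
        f.write(data)
        f.truncate()  # drop any leftover bytes from the old content


def delete(path):
    # Roughly DELETE.
    os.remove(path)


# Usage: the "resource" is just a path, the "verbs" are file operations.
resource = os.path.join(tempfile.mkdtemp(), "resource")
create(resource, b"v1")
update(resource, b"v2")
print(read(resource))
delete(resource)
```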

An excellent read. Moreover, most linkers these days have the ability to strip out unused code from your executable if you are using static linkage.

That means if your tiny executable uses just some methods of a gigantic library, it can do so and it will still stay tiny. Contrast that with dynamic linkage where in practice the whole gigantic library has to ship with your executable because you can never be sure that the host will already have the right version of libGigantic installed. What a mess.
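The stripping described above amounts to a reachability pass over the call graph: keep whatever the entry point can reach, directly or transitively, and drop the rest. Here is a toy sketch of that idea. The call graph and function names are made up, and real linkers (for example GNU ld with `--gc-sections`) work on sections rather than on individual functions:

```python
from collections import deque


def reachable(call_graph, roots):
    """Return every function reachable from the given entry points."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        fn = queue.popleft()
        for callee in call_graph.get(fn, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


# Hypothetical "gigantic" library: the app only calls png_decode.
CALL_GRAPH = {
    "main": ["png_decode"],
    "png_decode": ["inflate", "crc32"],
    "inflate": [],
    "crc32": [],
    "jpeg_decode": ["huffman"],
    "huffman": [],
    "gif_decode": ["lzw"],
    "lzw": [],
}

kept = reachable(CALL_GRAPH, ["main"])
dropped = set(CALL_GRAPH) - kept
print(sorted(kept), sorted(dropped))
```

Note that the kept set includes everything the called functions drag in, not just the functions named in the application's own source.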

Shared libraries are the reason a lot of app installations are several GB these days. And I dare say that 99.x % of the code that's shipped with an installation package never gets used by the app installed and that most compiled app code would comfortably fit on a couple of floppy disks if static linkage was used rigorously.

> if your tiny executable uses just some methods of a gigantic library, it can do so and it will still stay tiny

How many other functions do the ones you explicitly call drag into the executable? How do you propose to deduplicate them and their resources between processes? I'm specifically thinking about UI code. Having 30 different copies of your UI toolkit and its resources would be silly, even if each was stripped to 1/3 the size of the original. "Just use IPC and do UI in the window server" sounds an awful lot like X11, whose architecture we were just starting to migrate away from!

> Shared libraries are the reason a lot of app installations are several GB these days.

What are you referring to? vcredist is 7MB. DirectX runtimes are ~100MB. Qt is 20MB. Not small, but not the majority of several GB. Sure it's silly to ship an app with a shared library rather than statically link it, but I think you're exaggerating the scale of the problem and understating the number of profitable examples of library sharing, especially when it comes to UI libraries.

The X Window System architecture solves a specific problem - allowing clients to use graphical applications no matter where the app actually runs. This was important in the 1980s and is still worthwhile today - no matter how much compute power you can stash under your desk or wear on your wrist, you can cram orders of magnitude more into a data center. Having a uniform interface that's independent of where the heavy lifting actually happens is a crucial accomplishment for X, and throwing that away with the focus on Direct X is ultimately a mistake.

Now, if only X actually worked when you want to run it in a data center and see it on your laptop. X is horrible when the latency goes above 5ms.

Not saying X is anywhere near perfect. But losing network transparency is a mistake.

Personally I always preferred MGR. I suppose that evolved into Plan 9's window system, but I've never tried the latter. My attempt to bring MGR into the modern age is clunky at best. https://github.com/hyc/mgr But there's a lot to be said for a lightweight network-transparent protocol with a braindead-simple runtime.

> losing network transparency is a mistake.

Why? X11 competes with VNC to provide remote interaction. VNC wins, and not by a small factor, on the apps I use day-to-day.

Don't forget, X11 is the reason why Unix got dynamic libraries.

"If you think about it, if your code is small and clean, you wouldn’t feel the need for shared libraries."

Yes you would, for any sane definition of "small and clean". Code can't be made arbitrarily small; some problem spaces are fundamentally complex.

For what it's worth I have played around some with dynamic libraries and with FUSE, and I found the former incomparably easier to work with. Maybe that speaks more to FUSE in particular than to the idea in general (or maybe it's just me being bad at FUSE), but that's been my experience.

OS X never[1] seems to require more than 8 GB, no matter how many applications you have running - I've got 28 right now, including the Office apps, Google Earth, various browsers, etc., and the system is ticking along nicely. I've got to believe that is a testament to the power of shared libraries.

[1] Yes, I'm aware there is an infinite range of workloads (video, audio, Photoshop, virtualization, Oracle, etc.) that can use up as much memory as you throw at them. I'm talking about average-Joe user workloads here.

"In summary, the answer is to write lean, efficient and small pieces of code..."

What if the user could avoid "non trivial" programs, i.e. the ones that purportedly make it impossible to avoid shared libraries?

To put it another way, what if a user could have a system containing only trivial programs that each do one thing and then use them in combination to do "complex" tasks?

The term "non trivial software" is one I see continuously used as an underlying assumption and hence a justification for maintaining the status quo of all manner of existing software problems.

I do not want more "non trivial" software. I want simplicity and reliability. Not to mention comprehensibility. I get those things from so-called "trivial" software.

When some of the "non trivial software" I am forced to use becomes too reliant on too many resources or too many dependencies, I stop using it and find an alternative.

This strategy has worked beautifully for me over the years.

Shared libraries were a useful concept in their day.

In my humble opinion, those days have passed. GB of memory is more than enough for me personally.

I like to use crunched binaries in my systems. As such, I do not seek out "non-trivial" software and am always looking to eliminate any existing dependencies on it.

A web browser is an extremely non-trivial program, and still I am sure we both use one.

Not all of them. Some folks still browse the web via Emacs, Lynx, and Surf - all rather simplistic browsers which work fine for a large number of sites, including this one.

Simplistic, but not trivial. None of them uses curl-the-binary for HTTP requests, I think.

Isn't Lynx 175,000 lines of code?

Shared code is the only sane way to manage secure systems. Not only do I not want to wait 6 months for an lzo or libpng RCE to be fixed separately in 28 different pieces of software (some are no longer maintained so I'll have to wait for a kind package maintainer or modify/compile/distribute myself), I also don't want the job of finding which of my software constitute the list of programs making my system vulnerable and require this attention - or don't, if they've already been fixed (how do you check?).

Performance and disk space have almost nothing to do with why we use shared libraries, IMHO.

I'm afraid that VFS-based APIs will be even more fragile as the life cycle goes, and even harder to reason about.

Abstracting away the parts that are Plan-9-implementation-specific, this article seems to be advocating replacing shared libraries with remote procedure calls / a network API, or more fundamentally, calls that can cross address spaces. It's worth noting that this was an approach that, as I understand it, predated the advent of shared libraries on UNIX. Terminal handling (termios) and the X Window System protocol both come to mind, and we've been slowly moving away from that at least for X (libGL, Wayland/Mir, etc.). It's also strongly reminiscent of Mach's approach of message-passing between daemons, which was a decent idea, but ultimately failed because of performance.

There are definitely advantages to address-space isolation: an unintentional mistake in one component is much less likely to affect the other, the two components can pull in conflicting versions of dependencies, etc. But versioning and ABI compatibility remain issues. I think this post briefly touches on the versioning problem and assumes that providing both the old and new version of the library-daemon would solve it: that's probably technically true, but you'd need to keep every version of the library around to avoid the problem of libc introducing bugs in the process of fixing other bugs (the only concrete problem mentioned here). So yes, there's definitely more flexibility to solve problems than in the current implementations of dynamic linkers, but the problems themselves remain hard.

Meanwhile, you've also introduced the difficult constraint that libraries have to operate on copies of all your data. The hypothetical crypto library here is copying every block of ciphertext over an inter-process call, decrypting it, and copying it back to the original program. Apart from making security folks generically twitchy at all the copies of secret data running around, this is going to be awful for performance. And each side either has to trust the other side not to be trying to exploit it (which reduces the benefits of address-space isolation), or verify the data structures' integrity (which makes things even slower). It's possible that with good implementations of cross-process shared memory and low-overhead, secure message encodings (like Cap'n Proto), you could make this better, but it'll be a bit of a project.
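To make the copying concrete, here is a rough sketch comparing an in-process call with the same operation done by a separate process over pipes. Everything here is a stand-in: the "daemon" is a one-shot child process rather than a long-running server, and a single-byte XOR takes the place of a real cipher.

```python
import subprocess
import sys

KEY = 0x5A  # placeholder key; XOR is not real cryptography

# Hypothetical "library daemon": reads a block on stdin, writes the result on stdout.
DAEMON_SOURCE = r"""
import sys
data = sys.stdin.buffer.read()
sys.stdout.buffer.write(bytes(b ^ 0x5A for b in data))
"""


def xor_in_process(block):
    # Shared-library style: the callee works on the caller's memory directly.
    return bytes(b ^ KEY for b in block)


def xor_via_daemon(block):
    # Daemon style: the block is copied into a pipe, read into the other
    # address space, transformed, then copied back through a second pipe.
    result = subprocess.run(
        [sys.executable, "-c", DAEMON_SOURCE],
        input=block,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout


ciphertext = xor_in_process(b"secret block")
plaintext = xor_via_daemon(ciphertext)
print(plaintext)
```

Both paths compute the same thing; the second one simply moves every byte across a process boundary twice, which is where the performance and secret-handling concerns come from.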

I'm happy to admit that the implementations of dynamic linking are all less than awesome. Fundamentally, there's no reason that you can't design a shared-library system with all of the properties in this design, including the ability to load two copies of the same library that differ only by minor version, to satisfy dependencies of two different components. Even the current GNU linker (which is not my favorite dynamic linker) supports symbol versioning, so it could offer both the GLIBC_2.18 and GLIBC_2.19 versions of a function in the same library, although this facility isn't used very much.
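As a toy model of what symbol versioning buys you (this is not how ld.so is implemented; in the GNU toolchain it is done with linker version scripts and `.symver` directives), picture one library object holding several versions of the same symbol. The class, the `parse_flag` function, and the rule that the highest version is the default are all assumptions of this sketch:

```python
class VersionedLibrary:
    """Toy model of one library exporting several versions of a symbol."""

    def __init__(self):
        self._symbols = {}  # symbol name -> {version tuple: implementation}

    def export(self, name, version, impl):
        self._symbols.setdefault(name, {})[version] = impl

    def resolve(self, name, version=None):
        versions = self._symbols[name]
        if version is None:
            version = max(versions)  # assumption: newest version is the default
        return versions[version]


libc = VersionedLibrary()
# Hypothetical function whose behaviour changed between two releases.
libc.export("parse_flag", (2, 18), lambda s: s == "1")
libc.export("parse_flag", (2, 19), lambda s: s in ("1", "true"))

old_client = libc.resolve("parse_flag", (2, 18))  # built against the old behaviour
new_client = libc.resolve("parse_flag")           # picks up the newest version

print(old_client("true"), new_client("true"))
```

A client that was built against the old behaviour keeps getting it, while new clients get the fix, all from a single loaded library.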

My experience from Windows is that DLLs are both a blessing and a pain. I'll give a concrete example: a not-so-small Qt application.

If you use the DLL version of Qt you get the following benefits:

  - Faster link time (More on this later). This is the big winner.

  - Minor versions can be updated independently of the executables or other DLLs using them. This requires the library itself to be well written and to respect that (Qt follows good procedures, sqlite is an awesome example, but there are some terrible ones, like the P4 C++ API, which constantly adds/removes virtual members, enums, etc.)

  - Fully optimized (/LTCG) DLLs; if Qt allows building everything into one DLL, even better (calls between QtCore and QtGui could be further reduced, code inlined, etc.). Okay, you don't get whole-app optimization (only a full static link would do that), but you still get good link times with overall good optimization.

  - Exceptions kept there, not propagated (controversial whether this is a good idea, but I like it).

  - No clashes with other (usually) statically linked libraries - like png, zlib, etc. Unless you really want both QtCore and your app to use exactly the same versions (for one reason or another).

  - Smaller executable size (this has not mattered lately, but it may again)

Minuses of DLL, pluses of static linking:

  - Deployment madness no more. You push one executable, everyone is happy. You don't need to make sure that pushed DLLs (.so) won't break other executables. You might be able to roll back or work with a specific version if things go badly (this could also be done with DLLs or .so files, but they have to reside alongside the executable).

  - Somewhat faster execution time (less time to resolve symbols, load DLLs, etc.)

  - You get RTTI, exceptions, and in older versions of certain systems __declspec(thread) and other things working correctly.

  - Real full whole code optimization. But you should have other release targets (for development).

My biggest pain with DLLs (and executables) was on Windows, where people would sync directly from P4 (Perforce) and run the executables directly from where they were synced (this way, there is no extra step like "Press or run this thing after syncing"). But this comes at a price: on Windows you can't replace a running executable (or DLL) with another. I've tried the various hacks where you set the executable/DLL to be loaded from the NET or CD so that it's fully put in the swap, but it still does not work. The correct way seems to be a proper deployment: you sync, you run some kind of "Deployment" tool, and then work.

From the Windows point of view, I really liked the idea of loading the DLL by first looking in your executable's path, then in other places. It seems UNIX does not work this way, but then on UNIX people had stabilized places and locations for things (/usr/lib, /usr/local/lib, etc.).
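The "executable directory first, then other places" lookup can be sketched as an ordered search. This is a deliberate simplification: the function and file names are hypothetical, the real Windows search order has more steps, and UNIX loaders consult things like rpath entries and `LD_LIBRARY_PATH` before the default directories.

```python
import os
import tempfile


def find_library(name, exe_dir, system_dirs):
    """Return the first match, checking the executable's directory first."""
    for directory in [exe_dir, *system_dirs]:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


# Usage with throwaway directories standing in for an app dir and a system dir.
root = tempfile.mkdtemp()
app_dir = os.path.join(root, "app")
sys_dir = os.path.join(root, "system")
os.makedirs(app_dir)
os.makedirs(sys_dir)

open(os.path.join(sys_dir, "libfoo.so"), "w").close()
system_hit = find_library("libfoo.so", app_dir, [sys_dir])

# Dropping a private copy next to the executable shadows the system one.
open(os.path.join(app_dir, "libfoo.so"), "w").close()
private_hit = find_library("libfoo.so", app_dir, [sys_dir])
print(system_hit, private_hit)
```

The second lookup shows why the scheme is convenient for deployment: a copy shipped beside the executable wins without touching anything system-wide.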

Another problem is if you want to have 32-bit and 64-bit DLLs/.so files simultaneously. I like OS X's solution of fat binaries most, but it seems Linux folks do not like that (there was a proposal some time ago), and on Windows that's out of the question.

It doesn't really scale when you get more architectures/models, but if you have mainly 2 it might work pretty well (apart from being a pain for the build system).

Regarding deployment via a single executable: that's only true if your application consists solely of code (or resources which can be deployed in code, such as XBMs). As soon as you start having resources which aren't code, you end up having to deploy them as well, which means you have to start versioning them, etc, etc. The version of Chrome I'm typing this into is 120 files, one of which is the executable.

Regarding faster execution time: that's debatable. You certainly won't necessarily get faster application startup, because you're going to have to page in all that code, where with a DLL it's probably already in memory and used by another process. You're also likely to use a lot more memory, because each process will have its own Qt instance, and they won't be shareable.

Re RTTI and exceptions: works for me on Linux! Does this really not work on Windows?

Re whole code optimisation: that I'll grant you. And DLL code is typically terrible (because of hacks needed to allow text pages to be shared between processes).
