
Dynamic linking seems to me like solving a problem that filesystems should solve: deduplicating data.



They are complementary. If you want to reuse the shared parts of binaries, then you need a way to separate the binary image into parts that are specific to the application and libraries that can be reused across binary images, plus some metadata describing how to reconstruct the image from its parts. That's exactly what dynamic libraries provide.

In general, it's much easier to link a binary with its libraries than to go in the opposite direction (extracting common code from statically linked binaries), because once you statically link a binary the library code will vary slightly due to differences in memory addresses, compiler optimizations, unused code that has been omitted, different input library versions, etc.

Even if you were, in theory, able to write a complex filesystem driver that is able to extract the common parts of statically linked libraries so they can be deduplicated, then to reconstruct the original binary in memory, you'd have to perform something similar to dynamic linking, except now in the kernel, which really isn't an improvement.


> If you want to reuse the shared parts of binaries

But aren't we, when we use OCI images as a packaging mechanism, using containers to essentially throw away that sharing and arrive at a complicated version of static linking, where everything dynamically linked is shipped with the program?

Same goes for arguments about ease of patching things. When the software's package is actually an image, you are patching each image individually that is running on the system.


No, because you can share common libraries across containers by putting them in a separate layer.

For example, if you have a complex service that consists of multiple binaries all written in C++ using boost, then for each binary you can create a container that contains a layer of a base OS (shared), C++ libraries (shared), boost libraries (shared), application binary (unique).

All the services can now share their common libraries, both on disk and in memory, which reduces I/O and memory use. That's one of the main advantages of containers over virtual machines (VMs): each VM instance has a distinct region of memory that is not shared with others even if they happen to load bit-for-bit identical binaries into memory.

(I know, VM memory deduplication exists to ameliorate this problem, but here my previous comment applies: it's much easier to start from shared components and link them together than extract the shared data after the fact. And typically VMs have lots of nonsharable state that containers do share, like pretty much all writable kernel pages.)
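To put rough numbers on the layering argument, here is a minimal sketch. The layer names and sizes are made up for illustration, and real layers are identified by content digest rather than by a name/size pair, so treat this as back-of-the-envelope arithmetic, not a model of how Docker accounts for storage.

```python
# Hypothetical layers as (name, size in MB). All values are invented.
BASE = ("base-os", 80)
CXX = ("c++-runtime", 20)
BOOST = ("boost", 120)

# Three hypothetical service images that differ only in their top layer.
images = {
    "service-a": [BASE, CXX, BOOST, ("service-a-binary", 5)],
    "service-b": [BASE, CXX, BOOST, ("service-b-binary", 7)],
    "service-c": [BASE, CXX, BOOST, ("service-c-binary", 3)],
}


def size_without_sharing(images):
    """Total size if every image carried its own copy of every layer."""
    return sum(size for layers in images.values() for _, size in layers)


def size_with_sharing(images):
    """Total size if each distinct layer is stored exactly once."""
    unique_layers = {layer for layers in images.values() for layer in layers}
    return sum(size for _, size in unique_layers)


print(size_without_sharing(images))  # 675
print(size_with_sharing(images))     # 235
```

The same reasoning applies to memory: pages backed by the same file in a shared layer only need to be cached once.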


Are you saying that two containers running the same image will share their common libraries in the host kernel's memory?

Based on my understanding of cgroups, that seems unintuitive to me. Are you certain that's the case? I may try testing this out when I get a chance.


Yes. And even containers running different images will share the libraries so long as they come from shared layers.


I guess thinking about it more, that does check out. The kernel loads shared libraries, and containers share the kernel.


> All the services can now share their common libraries, both on disk and in memory, which reduces I/O and memory use.

Wow, how is this possible using layers? How does docker handle it if I subsequently modify one of the files of my layer in my container?


A Docker image consists of several layers, each of which contains only the modifications to the layers below it. Each layer and the final image are immutable. Docker uses OverlayFS to provide a unified view of the various layers.

A running container is based on an immutable image plus a single writable layer. That writable layer is unique to the container and holds all modifications made to the immutable image by processes running in the container.

Docker relies on the immutability of layers to share them between containers. This is not much different from how regular Linux processes all share the readonly contents of binaries and libraries that they load, while each process has its own private heap space that is not shared with other processes.

That means that deleting a file from a base layer, either when building an image or at runtime from the container, doesn't actually modify the contents of that layer. It only adds a tombstone marker to the writable layer indicating that the file was deleted, and OverlayFS creates the illusion that the file no longer exists inside that container.

(The flipside is that deleting files from immutable layers doesn't actually free up space because the actual file contents don't go anywhere, but that's rarely a problem.)
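A toy model of the lookup may make this more concrete. This is only a sketch: layers are plain dictionaries, `WHITEOUT` is a stand-in I made up for the tombstone marker (OverlayFS calls it a whiteout), and `lookup` is a hypothetical helper, not anything from Docker or the kernel.

```python
# Stand-in for a tombstone ("whiteout") entry; purely illustrative.
WHITEOUT = object()


def lookup(path, layers):
    """Resolve `path` against `layers`, ordered from top (writable) to bottom."""
    for layer in layers:
        if path in layer:
            entry = layer[path]
            # A whiteout hides the file in every layer below it.
            return None if entry is WHITEOUT else entry
    return None


# One immutable base layer shared by two containers.
base = {"/etc/motd": "hello", "/lib/libfoo.so": "v1"}

# Each container gets its own, initially empty, writable layer.
upper_a = {}
upper_b = {}

# Container A deletes one file and modifies another. Both changes land
# in A's writable layer; the base layer is never touched.
upper_a["/etc/motd"] = WHITEOUT
upper_a["/lib/libfoo.so"] = "v1-patched"

print(lookup("/etc/motd", [upper_a, base]))       # None
print(lookup("/etc/motd", [upper_b, base]))       # hello
print(lookup("/lib/libfoo.so", [upper_a, base]))  # v1-patched
print(lookup("/lib/libfoo.so", [upper_b, base]))  # v1
```

Container B still sees the original files because the base layer is unchanged, which is what makes it safe to share.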


Someone did once joke that Docker is just static linking for millennials. I chuckle because there's a kernel of truth in there. ;)


Yeah, maybe, but the cute thing here is that docker also "statically links" the config files, history db, image directory, etc etc.


In the same way a zip file is static linking, I guess, since Docker images are just gzipped tarballs. But .deb packages can include files and scripts that create directories, too.


Available in static linking via resource files.


I think a better analogy is, with containers you maintain more "servers" in the end, and patch and reboot them all.

So there's no free lunch and everything is a trade off. People thinking that containers are no more work than managing servers or services are in the wrong.


OCI is not the problem. You can create reproducible images using Nix that reuse all the packages between different images.

The problem is the way we usually package apps with Docker or Docker-like builders.


They do more than that. For instance, they can be swapped easily. Think for instance a security library being updated by the distro security team. It also makes it easier to depend on an LGPL library, nicely allowing the users to modify the LGPL library without having to recompile your program.


> Think for instance a security library being updated by the distro security team.

And when your distro doesn't update? When the distro is slow to change?

Flatpak, Appimage, Snaps (why?)... upstream devs are bypassing distro maintainers in a lot of cases.


I am not saying that they solve all problems in the world. I am just saying that they have merits.

In your particular example, if you are not happy with your distro, you are free to find another one. Some distros are extremely fast to change.

> upstream devs are bypassing distro maintainers in a lot of cases

IMO, for open source projects, upstream devs should not distribute their library. They should let package maintainers do it for their distro.


Then there's probably a reason, like the changes haven't been tested yet or verified to make sure they don't FUBAR your machine. It's okay to go a little slower to ensure reliability. Plus, do you really want the latest bugs in HEAD anyway?


As a bonus, shared libraries are deduplicated in RAM, too.

Plus you don't need to recompile the universe because of a patch.


No, mostly it solves the problem of runtime library loading and swapping. E.g., the Linux vDSO can only work as a shared library.


That's an interesting way to look at it. If all dynamic libs lived on a read-only location, then the file system could actually only store the libs in one place and the other "copies" would be just symlinks to that... and when the OS loaded such lib, it would automatically know that despite being in different locations, the libs were the same (they're all symlinks to the same place). Is this something that has been attempted before?


Yes. If you store each file at a location named by a hash of its contents and symlink the file to that location, you naturally get deduplication. Fuchsia does this [1]. You still end up wanting to try and coordinate your packages to share as many deps as possible for resource optimization reasons, but you no longer depend on it.

[1]: https://fuchsia.dev/fuchsia-src/concepts/filesystems/blobfs
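The general idea (not Fuchsia's actual implementation) can be sketched in a few lines. Assumptions here: SHA-256 as the hash, a flat `store` directory, and the hypothetical helpers `add_file` and `demo`. Creating symlinks may need extra privileges on Windows.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def add_file(store: Path, link: Path, data: bytes) -> Path:
    """Store `data` under its content hash and make `link` point at it."""
    digest = hashlib.sha256(data).hexdigest()
    blob = store / digest
    if not blob.exists():
        # Identical contents hash to the same name, so they are written once.
        blob.write_bytes(data)
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink() or link.exists():
        link.unlink()
    link.symlink_to(blob)
    return blob


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        store = root / "store"
        store.mkdir()

        lib = b"pretend this is libfoo.so"
        blob_a = add_file(store, root / "app-a" / "libfoo.so", lib)
        blob_b = add_file(store, root / "app-b" / "libfoo.so", lib)
        add_file(store, root / "app-b" / "libbar.so", b"a different library")

        blob_count = len(list(store.iterdir()))
        same_file = os.path.samefile(
            root / "app-a" / "libfoo.so", root / "app-b" / "libfoo.so"
        )
        return blob_count, same_file, blob_a == blob_b


print(demo())  # (2, True, True)
```

Three "installed" libraries end up as two blobs on disk, and both copies of libfoo.so resolve to the same underlying file.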


Slight tangent, but in the nodejs / npm ecosystem, that's one of the things that makes pnpm unique (and IMHO far superior to npm or yarn) -- its node_modules are deduped using symlinks.


This is exactly how Nix works, except that instead of symlinks it actually modifies the binaries at build time so that their dynamic library paths point to absolute paths in the store, which include a source-derived hash in place of a version.


This is basically dynamic linking: you have foo.so sitting somewhere on disk, and multiple processes can just load it whenever. It can't be read-only, though; you need to add new libraries to that directory eventually.


Traditionally this is what /usr/lib is for.


> Is this something that has been attempted before?

Yes, some file systems implement "deduplication".


Deduplication, in the sense of physical space reduction, seems like the least strong argument one could make for dynamic linking these days.

The strongest argument I can think of is enabling system managers (distro maintainers, even end users) to update dependencies. This might be to apply a security patch, enable some kind of tracing for profiling an application, and so on.


In practice, dynamic linking might not be achieving that much in the sense of deduplication: https://drewdevault.com/dynlib.html


Filesystems solve linking insofar as the libraries themselves are on disk; otherwise, this is squarely the responsibility of the loader.


Plugins have nothing to do with filesystems.



