Linux Container Internals (docker-saigon.github.io)
201 points by deepakkarki on Dec 21, 2016 | hide | past | favorite | 34 comments



If you want to learn about linux container internals, I can recommend just trying to implement one yourself in some random language for fun. I wrote a basic runtime in python that can run docker images:

https://github.com/kragniz/omochabako

Quick demo: https://asciinema.org/a/77296?speed=2&autoplay=1
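
For a sense of how little is involved, here is a minimal sketch of the same idea (not the code from the repo above; the rootfs path and the flag values copied from <sched.h> are filled in for illustration, and it needs root):

    import ctypes, os

    libc = ctypes.CDLL("libc.so.6", use_errno=True)

    # Namespace flags from <sched.h>
    CLONE_NEWNS  = 0x00020000   # mount namespace
    CLONE_NEWUTS = 0x04000000   # hostname namespace
    CLONE_NEWPID = 0x20000000   # pid namespace
    CLONE_NEWNET = 0x40000000   # network namespace

    def contain(rootfs, cmd):
        # Detach from the parent's namespaces
        if libc.unshare(CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWPID | CLONE_NEWNET) != 0:
            raise OSError(ctypes.get_errno(), "unshare failed")
        pid = os.fork()              # the child becomes PID 1 in the new pid namespace
        if pid == 0:
            libc.sethostname(b"container", len(b"container"))
            os.chroot(rootfs)        # switch into the unpacked image filesystem
            os.chdir("/")
            os.execvp(cmd[0], cmd)
        os.waitpid(pid, 0)

    # contain("/tmp/rootfs", ["/bin/sh"])

Everything else (pulling and unpacking image layers, wiring up the network, cgroup limits) is bookkeeping around these few calls.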


I second this! I watched Liz Rice's Golang UK 2016 talk and was inspired to write something to create containers. I was also interested in x86_64 assembly so decided to go with that.

In the end it is really simple to create something to run containers, even in assembly, since it is just a few syscalls to set things up. Ended up with: https://github.com/archevel/quic

The main difficulty was figuring out how to do the netns part.
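
For anyone else stuck on that part, the usual recipe is roughly this (a sketch only; the interface names, addresses, and the choice of iproute2/nsenter instead of raw syscalls are my own):

    import subprocess

    def setup_netns(container_pid):
        run = lambda cmd: subprocess.run(cmd.split(), check=True)
        # Create a veth pair on the host and push one end into the
        # container's network namespace, identified by its PID
        run("ip link add veth0 type veth peer name veth1")
        run(f"ip link set veth1 netns {container_pid}")
        # Host side of the pair
        run("ip addr add 10.0.0.1/24 dev veth0")
        run("ip link set veth0 up")
        # Container side, configured by entering that PID's net namespace
        run(f"nsenter -t {container_pid} -n ip addr add 10.0.0.2/24 dev veth1")
        run(f"nsenter -t {container_pid} -n ip link set veth1 up")
        run(f"nsenter -t {container_pid} -n ip link set lo up")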


Mild side note: this technique is the only way I finally wrapped my head around Lisp ... building a Scheme. It's definitely a good way to learn things about a system if you can find an appropriately toy-level plan to implement.


That sounds pretty cool. Sorry for my ignorance (heavy Windows background), but where does one start with something like that? Is there an RFC type of thing for these images, or a runtime spec?


https://lwn.net/Articles/531114/ and https://www.kernel.org/doc/Documentation/cgroup-v1/ (or the newer https://www.kernel.org/doc/Documentation/cgroup-v2.txt )

Actually, all of that was said (and somewhat explained) in the article in question, although I would recommend trying cgroups directly with the kernel.
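
Trying it directly really is just creating a directory and writing a few files; a small cgroup v1 sketch (the group name "demo" and the 64 MB limit are arbitrary, and the memory controller has to be mounted at the usual path):

    import os

    cg = "/sys/fs/cgroup/memory/demo"
    os.makedirs(cg, exist_ok=True)

    # Cap the group at 64 MB of memory
    with open(os.path.join(cg, "memory.limit_in_bytes"), "w") as f:
        f.write(str(64 * 1024 * 1024))

    # Move the current process into the group; children inherit it
    with open(os.path.join(cg, "tasks"), "w") as f:
        f.write(str(os.getpid()))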


I started by watching https://www.youtube.com/watch?v=HPuvDm8IC-4 and reading https://www.infoq.com/articles/build-a-container-golang

Also "man namespaces" on a linux machine will give some deeper documentation.


The URL is correct, 'Docker-internals', but the title, and the further conflation of 'Linux Containers' with 'Docker', is a bit confusing. I'm a happy lxc/lxd user, and had to stop reading because of the cognitive dissonance.


Good point, updated the title. Thanks for the comments (and yes, it's almost a year ago that I wrote this post).


Thanks, it feels like a nitpick, but I gave it another go and I see some other changes now? Seems like a great amount of information.



Neat, I guess I just need to be more patient and set my expectations :)


I also thought it was about LXC. Sad.


Why is it sad?


Because LXC, on a technical level, is much simpler, doesn't make a mess of the network and filesystem setup, and is easy to understand. Docker does some heavy magic to make things work for programmers (the ones who can't be bothered to learn how Ethernet bridging or IP routing works), so the whole thing feels brittle.

In short, LXC is vastly underappreciated.


OpenVZ is dated 2005 in the document, which makes FreeBSD look a bit alone back in 2000, but that's not really accurate. OpenVZ was just a re-branding of Virtuozzo, which was released in 2000, in an effort to upstream it.

Virtuozzo got really popular quickly in the cheap webhosting market, as a more secure and powerful alternative to shared hosting. I worked with it a lot in the early 2000s. I don't think they ever outgrew that market, however, and when "real" virtualization came with VMware and friends, that's where all the money went.

It's only fair to mention where it really started in the Linux world. (Also a bit funny to see the pendulum of tech swing back again. It's about time!)


Cool! Thanks for the info. I've added a link back to these comments in the blog source (haven't regenerated the static HTML yet, though).


This is from February, which is ancient in Docker years, but the container history and references are quite useful.


yet it covers containerd ;)



I have some small critiques of some of the hyperbole in the article:

"Package managers failed us due to shared libraries version differences causing dependency issues"

Incorrect. The software administrators (read: The Users) failed to understand that installing duplicate incompatible software does not work, was never intended to happen, and shouldn't even be possible. But users are stubborn and will force a conflict if at all possible.

Containers allow users to side-step package management. They don't replace it or help it at all, because they completely ignore all the work that went into the package. Imagine putting on tennis shoes, and then trying to put on snow boots. Containers give users a second pair of feet.

And this is not a container innovation. Chroot environments have been providing the exact same functionality (installing side-by-side conflicting packaged software in a simple manner) for decades. You don't even need any extra software to use it.
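
That decades-old approach is literally a couple of lines (a sketch; "/srv/jail" stands in for any directory holding the second, conflicting set of software, and it needs root):

    import os

    os.chroot("/srv/jail")               # /srv/jail becomes / for this process
    os.chdir("/")
    os.execvp("/bin/sh", ["/bin/sh"])    # run the other userland's shell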

"Docker provides a self-contained image that is exactly that same image running on your laptop vs in the cloud while i.e. Puppet/Chef are procedural scripts that need to rerun to converge your cluster machines. This enables approaches also know as Immutable Infrastructure or Phoenix Deploys."

Unless you designed your software to be immutable, it probably isn't. Software changes as it runs, and different hardware changes software differently, so at best this claim is disingenuous. Different networks and systems interacting in different locations add complications. If you tested it on your laptop, do not expect it to run the same in production, period.

"Before Docker, LXC would create a full copy of FileSystem when creating a container. This would be slow and take up a lot of space."

Loop and COW filesystems (Unionfs, Aufs, Overlayfs, etc) on Linux pre-date Docker by a long time, and were used with containers and container alternatives.
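
For reference, the copy-on-write trick those filesystems provide is a single mount: a writable "upper" directory stacked over a read-only "lower" layer. A sketch with made-up paths, using overlayfs as the example:

    import ctypes, ctypes.util

    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)

    opts = b"lowerdir=/img/base,upperdir=/img/diff,workdir=/img/work"
    ret = libc.mount(b"overlay", b"/mnt/merged", b"overlay",
                     ctypes.c_ulong(0), opts)
    if ret != 0:
        raise OSError(ctypes.get_errno(), "overlay mount failed")
    # /mnt/merged now shows /img/base, with writes redirected into /img/diff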

--

I thought I'd see more about Linux container internals, not a description of how Docker works, but I guess the host name should have been a dead giveaway. Don't read this if you want to know about the kernel.


> Incorrect. The software administrators (read: The Users) failed to understand that installing duplicate incompatible software does not work, was never intended to happen, and shouldn't even be possible. But users are stubborn and will force a conflict if at all possible.

Why doesn't it work? By whom was it never intended to happen? Why should it not even be possible?

I've shipped production software that - very carefully - links multiple versions of OpenSSL within the same process, so it's not a matter of some law of physics that I can't have two versions of OpenSSL on my system used by separate binaries. It's a design choice that this is how things are going to work. You don't need containers to pick a different design choice, yes, but neither do you need chroots - just careful use of shared library versioning and symbol versioning.

Containers won because containerization tools made all of this easy. Nobody wants to piece together shell scripts to do things in chroots any more than they want to piece together shell scripts to set LD_LIBRARY_PATHs. (And way more commercial software actually does the latter, because they want to side-step package management because they have no idea what libraries are on your system.)
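
That LD_LIBRARY_PATH pattern is worth spelling out, since it is what a lot of vendored software actually ships (a sketch; the paths are hypothetical):

    import os, subprocess

    env = dict(os.environ)
    # Point the dynamic loader at the vendor's bundled libraries first
    env["LD_LIBRARY_PATH"] = "/opt/vendorapp/lib:" + env.get("LD_LIBRARY_PATH", "")
    subprocess.run(["/opt/vendorapp/bin/app"], env=env)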


> Why doesn't it work? By whom was it never intended to happen? Why should it not even be possible?

It doesn't work because it's incompatible, and so it's complicated. If I build A with B1, and you build C with B1.1, and the user wants both A and C, they need both B1 and B1.1. Which is fine - IF they built B* with unique symbol names, and built their apps against those unique symbol names, and if everyone else in the world follows exactly the same convention. Of course, if anything else changes (cpu architecture, features, ABI, whatever) everything may break anyway. But in general the biggest problem is not everyone builds software the same way.

Both the software developers and the package managers never intended for incompatible software to be installed at the same time. The software devs could make it handle these cases, but they usually don't, so it doesn't work. The package managers could package their software uniquely every time, but that would be annoying, cumbersome and not very useful for managing systems ("do I need to remove db3 before I install db4? what are all the packages called? what's the order? what else will be affected? do I rename everything and rebuild everything with names specific to this one library package name?" etc).

It shouldn't be possible to install conflicting software because the package should be built to fail to install if conflicting software exists, or remove the conflicting software before install. But sadly there also exists the ability to remove all these safeguards, or to install unpackaged software.

Containers are just a wrapper around existing tools, such as package managers. They don't add functionality, they just simplify it. With Docker, you aren't linking to multiple versions of OpenSSL within the same process: you're running one process in one environment with one version of OpenSSL, unless you intentionally get really fancy, which really isn't easy. Package managers never failed, they simply weren't being used right.

Containers won because someone finally realized users don't care how they do what they want, as long as they get to do it without having to know how it actually works. Devs get to pretend they know how to deploy software or manage systems and Ops people get less responsibility because they didn't build the shit so they don't support it. It's a win-win, but it's still a mess, and none of it is new or novel.


> IF they built B* with unique symbol names, and built their apps against those unique symbol names

This isn't necessary. If you're not going to load both versions into the same process, they can overlap symbol names. This is how Linux distro version upgrades work: the system installs libfoo2, then upgrades binaries that use libfoo1 to versions that use libfoo2, then removes libfoo1 when nothing needs it any more. At all times, the system is in a working state; any given binary will load either libfoo1 or libfoo2.

The trouble is that Linux distros tend not to want to provide more security support for libfoo1 than they have to, so if you have software that still requires libfoo1, the easiest approach is to use a container/chroot/VM/whatever with an older distro release, possibly from a different vendor, that's hopefully still under security support.

(If you do care about loading both libraries into the same process, you need symbol versioning / two-level namespaces / direct binding / whatever your ld.so wants to call it, which means that every reference to a dynamic symbol specifies which dynamic library the symbol comes from. The names themselves remain unchanged, but they're referenced by a tuple of library and name. This works. Again, I've shipped software that would crash horribly if this didn't work.)

> Both the software developers and the package managers never intended for incompatible software to be installed at the same time.

I'm not sure that's true for software developers: I can't imagine that, say, the OpenSSL developers do their work by replacing their system OpenSSL every time they recompile. They already know full well how to test an OpenSSL in ~/src/openssl and keep it separate from the one in /usr/lib, without using chroots.

It's true for package managers, but that just means that package managers are failing at delivering a thing users want.

In particular, forcing upstream software to follow conventions and building all software the same way, and patching things as necessary, is the entire job of a distro. If two versions of a distro package conflict, that's because the distro chose not to make them coinstallable. If only one version of a library is available in a distro, that's because the distro chose not to make other versions available. They might have reasons for this (e.g., security support effort) but none of it is fundamental impossibility.

(Also, if you mean B1.1 in a semver sense, or equivalently a libb.so.1.1 sense, upstream is promising that it's backwards-compatible with B1, such that A can dynamically use B1.1 despite being compiled against B1. If that's not true and B1.1 is ABI-incompatible with B1, either upstream or the distro needs to rename B1.1 to B2 / rename libb.so.1.1 to libb.so.2.)

> Devs get to pretend they know how to deploy software or manage systems

I submit that the only valid measure of whether you know how to deploy software or manage systems is whether systems get deployed or systems get managed.


I'm not saying you can't run software with duplicate libraries installed. I'm saying there is conflicting software, both on individual distros and across distros, that is simply not currently created in a way that can be installed side by side and run without extra steps involved. Specifically conflicting file names, but also conflicting functionality, which extends beyond just shared library conflicts. And I'm saying that Docker serves the function of "fixing" a problem which package managers did not create.

> I submit that the only valid measure of whether you know how to deploy software or manage systems is whether systems get deployed or systems get managed.

If you don't care at all about the result, sure.


You're going too far the other way.

> The Users) failed to understand that installing duplicate incompatible software does not work

I still think that means package managers failed, if only because that's the perception - that package management solves more than it actually does.


Nope, it is partially right.

There already exist mechanisms like sonames to differentiate library versions from each other.

Package managers do not take this into account, and insist on there only being one version present per package name.

That said, containers are shooting tweety birds with AA guns.


> insist on there only being one version present per package name

That's a partial truth, at best. It is common for multiple versions of popular libraries to be installed at the same time. The whole of Debian or Red Hat isn't necessarily compiled with the same libcpp or Boost, for example, and that's expected. Some software runs on Python 2 and some on Python 3. The packager would need to make sure they don't conflict, but Linux handles different sonames just fine, and as long as you separate module paths you'll be fine.


No it is not. If you want to install two minor versions of glib side by side, it can be done on the soname level (not to say that it is a smart idea, because Gnome devs are crap at keeping API stable). But it can't be done on the package level without playing musical chairs with package naming to work around collisions.


This is primarily why Debian and derivatives put the soname version in the package name. Fedora is designed on the assumption that you'll only ever have one version of a library installed and everything is built against it; special cases then get made for compat packages as needed.


Sadly, more and more dev time is spent with Fedora as the target, leading to massive myopia. This while Debian has to adopt more and more Fedora-isms, because they no longer have the man-hours to go their own way.


Well, even though Debian supports this, I rarely see it used to install multiple .so versions side by side. Fedora will let you do it, but the entire development process tries to avoid compat libraries where possible (you are supposed to send a message out to the mailing list if a package you maintain is getting a soname bump, so others that depend on it can rebuild during the rawhide cycle; this stops being an issue once Beta hits, since versions get locked down).


Totally agree, the post was written almost a year ago across 3 days while preparing a talk (see intro) and while still learning about Docker. Thanks for the feedback :)


I've been struggling with fully understanding containers. This article helps but it's a little too low level for me.

A quick question for HN'ers: if you've got a machine running, say, 4 Docker containers, does it help resource usage if all of them are running the same Linux distro?

Or, since the kernel is the only thing shared between them, does it even matter?


The site looks broken because the CSS is loaded over HTTP, which gets blocked when the site is loaded over HTTPS.



