Linux Containers: Parallels, LXC, OpenVZ, Docker and More (aucouranton.com)
55 points by tacon on Dec 5, 2014 | 14 comments



So, the question that jumps to mind reading this is:

At what point do we acknowledge that we're re-inventing Plan 9 poorly?

I don't mean to be glib (well, maybe a little), but all of this (with some exceptions — zones and jails, mainly) feels incredibly hacked-together.

Perhaps someone paying more attention could tell me if, say, the folks working on the Linux kernel are learning the lessons from Plan 9? kernfs seems promising, at a glance, but I haven't really looked into it.


Outside of build and test systems and extreme legacy software support, virtualization itself is in the category of things that should not exist.

I should be able to put up a server and sell accounts on that server to the general public. People should be able to log in, install and run services, etc. The box should not get instantly pwned, and people should be able to use it without interfering with each other. All resources -- disk, memory, IP addresses, bandwidth, etc. -- should be able to be assigned to users and/or user groups and managed by the box's true super-user(s).

You can't do that because Linux/Unix and every other OS has a woefully incomplete and outdated model around things like security, user and group management, software installation and library organization, quotas, and privilege isolation.

The popularity of virtualization and containerization is an admission that OSes are broken. While they are "multi-user" in a 1970s computer lab sense, they are all fundamentally single-user OSes from a modern perspective. There are no true multi-tenant OSes on the market; virtualization is an ugly hack to make single-tenant OSes host multiple users.

Nevertheless I think it's a situation we're stuck with due to the massive legacy software investment we have in these platforms. Plan 9 is full of cool ideas but nobody uses it because nobody uses it, and nobody will use it because nobody already uses it.

The only way I can see this situation getting better is if someone were to put serious money behind a really well-engineered alternative. But I don't see that happening because there'd be no profit in it. OSes are now in the category of things everyone expects to be free, so there is no longer any incentive to invest in them.


Nothing as complicated as an OS capable of running modern software will ever be "well-engineered". Unix/Linux systems are the culmination of over 40 years of work by thousands of people on tens or hundreds of millions of lines of code. You can't replace all of that no matter how much money you pour into it. Most of the lessons about what's good and bad in the history of engineering Unix are baked into the existing code, and no human actually knows all of the reasons why it works as well as it does, if anyone ever did. A new platform may fix some issues, but it will encounter new ones, and it will inevitably encounter many of the old ones as well.

These systems appeared hacked together because that's the only way real-world systems work. The only way to build a complex working system of any kind is to start with a simpler system and add to it. But systems have minds of their own that are beyond the control of any engineer contributing to the project, and anything big and effective enough to compete with Unix/Linux will have just as many problems, if not more.


That is the standard conservative incrementalist position about systems. I think it's a denial of the efficacy of conceptual thought.

It often stems from an analogy to biological evolution, but evolution is a geological-timescale process that occurs over aeons. To use it as a guide to cultural, social, and engineering progress is the naturalistic fallacy, not to mention a bit of a category error.

That being said -- I do consider a successful challenge to the "crappy old OS + virtualization" paradigm unlikely, due to the lack of a strong financial incentive to do the work. The amount of work required is waaaaay beyond amateur open source hacker thresholds.

It's possible that this lack of a financial incentive betrays a lack of overall value incentive. Maybe containerization + virtualization, while ugly and ham-fisted, is "good enough" and a more elegant approach just wouldn't have enough "win" to it. A similar situation exists with languages like D, Go, and Rust vs C++. They're better, but they're probably not better enough to displace the incumbent. Peter Thiel's rule on competition (from the incredible book Zero to One) is that an upstart alternative usually has to be 10X better to "disrupt" an established market. I can't imagine a polished-up Plan9-ish OS being 10X better than Linux+Docker+KVM on important metrics. A new OS would have to be 10X as productive to program, 10X less time consuming to admin, 10X more efficient at the use of hardware, 10X more secure, or some combination thereof that amounts to a 10X win.


Does Plan 9 meet your requirements for a "true multi-tenant OS"? If not, I'd be interested in hearing your ideas about how such a system might be implemented and how it would look to end users.


I haven't looked deeply enough into Plan 9, but from what I have seen it's a step in the right direction.

A major weakness of all existing OSes in this area is the network subsystem. The outdated "privileged ports" restriction means that most services must run as, or at least be started by, root, making the OS effectively single-tenant by default. Simply removing that legacy cruft would improve the situation a lot. There's also no way to assign IPs or network interfaces to users. Network interfaces should have uid/gid and permissions like files, and a user should have a default interface. (Sharing of course would be possible.) Firewall rules (e.g. iptables) should also be per-user as well as global.
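
The privileged-ports thing is easy to demonstrate from userspace. A minimal sketch in Python on Linux, assuming the process has neither root nor CAP_NET_BIND_SERVICE:

    import socket

    def try_bind(port):
        # Attempt to bind a TCP socket; ports below 1024 require privilege.
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.bind(("0.0.0.0", port))
            print("bound port", port)
        except PermissionError:
            print("port", port, "denied: privileged, needs root or CAP_NET_BIND_SERVICE")
        finally:
            s.close()

    try_bind(8080)  # high port: works for any user
    try_bind(80)    # privileged port: fails for ordinary users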

Another major weakness is libraries. DLLs should be resolved either per-user or, even better, through cryptographically secure content-addressable lookup of binary objects. This would allow the OS to cache symbols globally but in a secure way. The whole binary/DLL paradigm is outdated and needs a modern rethink.
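
To make that concrete, here's a hypothetical sketch (the names store_library/resolve and the ~/.libstore path are made up for illustration) of resolving shared objects by the hash of their contents rather than by mutable file names:

    import hashlib, os, shutil

    STORE = os.path.expanduser("~/.libstore")  # hypothetical object store

    def store_library(path):
        # Content-address the object: identical bytes always map to the same
        # key, so a global cache can share them without trusting file names.
        digest = hashlib.sha256(open(path, "rb").read()).hexdigest()
        os.makedirs(STORE, exist_ok=True)
        dest = os.path.join(STORE, digest)
        if not os.path.exists(dest):
            shutil.copy2(path, dest)
        return digest

    def resolve(digest):
        # Look a library up by hash; tampering changes the hash and misses.
        dest = os.path.join(STORE, digest)
        if not os.path.exists(dest):
            raise FileNotFoundError("no object %s in store" % digest)
        return dest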

Software installation is broken. There should be no such thing, except perhaps in the case of hardware or system drivers. The idea of "installing" software beyond just unpacking an archive needs to die, period. Apple's .app bundle system is pretty close to the right thing.
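
In that model "install" really is just unpacking. A rough sketch, where the ~/Applications path and bundle layout are assumptions loosely in the spirit of .app bundles:

    import os, shutil, tarfile

    APPS = os.path.expanduser("~/Applications")  # assumed per-user apps dir

    def install(bundle_tarball):
        # "Installation" is nothing more than dropping a self-contained
        # directory into the per-user apps folder.
        os.makedirs(APPS, exist_ok=True)
        with tarfile.open(bundle_tarball) as tar:
            tar.extractall(APPS)

    def uninstall(bundle_name):
        # "Uninstallation" is just deleting that directory again.
        shutil.rmtree(os.path.join(APPS, bundle_name))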

I'm not knowledgeable enough about Plan 9 to know if it addresses those two issues.

Containerization sort of accomplishes those things, but in a stupid ham-fisted way that involves a ton of duplication and resource wastage. Virtualization is even more wasteful and ham-fisted.


To the best of my knowledge, Plan 9 eschews dynamic linking entirely and does everything statically.

As for package management... the entire model of Plan 9 is such that most conventional things about Unix package management that we take for granted are simply irrelevant. You don't typically deal with things on the package level, but on the file server level - which is inherently versioned, archivable and introspectable through simple tools. You can do things like immediately swap in a library from the file system cache just to test a program to see if it runs on it, then replace it back, all trivially.

There have been some bolted-on approaches to package management more recently [1] [2]. Conceptually, they're no more advanced than Slackware's shell script-based pkgtools, largely because they don't need to be.

A Plan 9-ish way to do package management would probably involve something like mounting a networked file system to a local share and then maintaining a replica on it. Actual management could then be done through simple mkfiles. A layer on top of that which precompiles and simply union mounts the contents of an archive is certainly possible, too. It's still pretty similar to ports, though, just with fewer headaches.

Ironically, your ideal form of package management is pretty similar to Slax's use of union mounting compressed file system archives, or even Slackware's tarballs that hold directory contents which are unpacked to install. The Linux community doesn't seem to want that. Dynamic linking complicates things.

[1] http://man2.aiju.de/1/pkg [2] http://www.9atom.org/magic/man2html/1/contrib


> All resources -- disk, memory, IP addresses, bandwidth, etc. -- should be able to be assigned to users and/or user groups and managed by the box's true super-user(s).

Can't you pretty much do that with z/OS and maybe some other mainframe OSes?

Seems what people are shooting for with all this virtualization stuff is almost a redo of the mainframe ideas. Unix is for minicomputers.


Sure, but those old mainframe OSes are utterly archaic in every other way.


Linux will never even be anything close to Plan 9 for the simple reason that the Research Unix model is incompatible with that of Plan 9. It's become fashionable among certain segments of Linux userspace developers to deny the fundamental grounding Linux has in Unix and try to go "beyond the antiquated Unix model"... into OS X, another Unix. There's no other way it could be, really.

That said, the Linux kernel developers have had some ideas trickle down. They use virtual file systems quite extensively for both internal and external interfaces. Without per-process namespaces, their take on procfs simply isn't the same, but it is an approximation. A proper mechanism for union mounts called overlayfs was merged recently, after a rather protracted ordeal.
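
For the curious, with overlayfs in the mainline kernel a union mount is now a one-liner. A minimal sketch (needs root; the directory names here are placeholders):

    import subprocess

    # Merge a read-only lower tree with a writable upper tree at /merged.
    subprocess.check_call([
        "mount", "-t", "overlay", "overlay",
        "-o", "lowerdir=/base,upperdir=/changes,workdir=/work",
        "/merged",
    ])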

The kernel has also had a 9P client called v9fs for a while now, but I'm not sure how much it's maintained these days.
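
If v9fs is compiled in, mounting a remote 9P file server looks roughly like this (options from memory, so treat them as an assumption; SERVER is a placeholder):

    import subprocess

    SERVER = "192.168.1.10"  # hypothetical 9P file server

    # Mount the server's namespace over TCP at /mnt/9 via the kernel 9P client.
    subprocess.check_call([
        "mount", "-t", "9p", SERVER, "/mnt/9",
        "-o", "trans=tcp",
    ])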

On the other hand, there's no auth server similar to Factotum (which deprecates the superuser) because it contradicts the Unix permission model, there is nothing like the plumber(4) inter-application communication system (D-Bus is a completely different thing altogether), the Plan 9 approach to cross-compilation isn't present (though this is not specifically a Linux issue at all), it's still conventional to use POSIX shells instead of cleaner designs like rc(1), and so forth.

The Fossil file system and Venti archival file server are also pretty unique to Plan 9, though their general functions have been emulated in more complicated manners by the myriad of Linux file systems.


I am not sure if "(Free) BSD Jails" should be in here, because it's not a Linux container; it's something else entirely, especially when you look at the details. But having said that, this article does read as a nice overview, especially for those who are still undecided among the various options for resource separation.


What are the advantages/disadvantages of using a BSD jail instead of these other options?


Jails are fine for the most part, I think. One problem is resources... FreeBSD lacks something like cgroups. But I think, besides disk I/O throttling, every other feature of cgroups is already there in FreeBSD, just implemented in a different way.


A side point observed while reading this article:

> Google + Containers

> Not (google+) but rather google using linux containers.

This is the kind of confusion that results when you name a product with a dangling binary grammatical operator. Marketers really ought to stop doing that kind of thing.



