In theory, though, containers are simply an optimization of virtual machines, and are best understood that way. If you can use virtual machines to solve a problem--and your virtual machines are all generally based on a recent Linux--then you can achieve the same thing, using far fewer resources, by using containers.
Sharing pagecache means that one user thrashing the disk can make the system unusable for everyone; well, unusable to the extent that a system without any pagecache is unusable.
Xen and KVM mitigate this by giving each guest its own RAM and thus its own pagecache.
Personally? I think this makes a way bigger difference than any cgroup limiting of IOPS. Pagecache is not free RAM, and should not be treated as such.
I think containers are one of those tradeoffs; they are more efficient when it comes to hardware, but they are less efficient when it comes to sysadmin time spent tracking down users who are using more resources than they ought.
On the other hand, some find managing multiple containers easier than managing multiple complete virtual servers.
I still think right now the best way forward is:
- Hosting provider: Xen/etc
- Internal clusters: containers
Containers merely bring commodity computing closer to HPC operational techniques. Containers are not useful for you at prgmr. Using every drop of RAM on the machine for page cache is a positive for my use case, not a negative.
Virt overhead makes no sense in internal clusters, in my opinion. I've worked in an all virt environment and an all container environment. The latter is way better, and these are scales where Puppet burning a core for several seconds across the fleet has a demonstrable impact on Opex.
Oh yeah. More fair is almost always preferable to more efficient when you are in a multi-user situation, and certainly, containers are more efficient.
I... have always had a problem, though, with this idea that you should virtualize clustered production applications; virtualization takes a big server and cuts it up into little servers (at a cost), whereas clustering takes a bunch of little servers and combines them into a big server, again at a cost. It seems to me that once you are out of testing and actually need a full server's worth of capacity, you don't want to use virtualization anymore. Multi-tenancy is a feature I need, sure, but for the cluster sysadmin? It's all downside.
Containers/chroots do make a lot more sense than Xen or KVM if you must virtualize your production clusters; I guess that's the idea behind Docker: a clean way to pass around directories full of binaries.
I mean, to be fair, making packages /is/ a pain in the ass, so I can understand wanting a clean way to pass around your production code without using packages.
Really, the more I think about it, the more I see containers as an easy way to roll back all the benefits and problems of using shared libraries, without going through the incredible pain of statically linking everything. Distribute your "binary" along with all the libraries and the environment it requires. Hah. And actually? I can understand why you would want to do that; before I was a service provider I was a cluster sysadmin, and I've put in my time resolving shared-library conflicts.
And as someone who uses virtualisation like this extensively: it is about separating functional concerns. I have images of our "standard" web frontends, and our "standard" database backends and their slaves. We can deploy one of those with a few commands to any of our VM hosts (currently using openvz, but looking at both LXC and KVM and combinations). I don't want them co-mingled because it complicates migrations as needs change.
E.g. some customers could easily fit on a single server when we start deploying our apps for them, but a year down the line may require a complicated multi-server setup. When they do, and it's all in containers, we can migrate the web container to another machine (or duplicate it and load balance), move the database container somewhere else, and it's all generic tools. We don't need to start messing around untangling dependencies on a single server.
Having them neatly isolated in small manageable chunks also means that we are free to change hardware specs, and move containers around depending on what is most cost-effective rather than having to make sure everything fits.
And there's the dependency issue. In my experience, I don't trust for a second that you will end up with a working setup if you ship packages and install them on a pre-existing OS install. It takes about five seconds from the moment someone other than a very experienced sysadmin gets an ssh login to a server before it deviates from the servers it is supposed to be identical to, at which point all bets are off as to whether your app will run once deployed to your production servers. VMs and containers make it easy to guarantee that the images you deploy are unchanged from what you tested on.
Once you are at scale, each node, doing one thing, should be able to fill a server (at scale, you will need a lot more than one server for each service you've got, assuming the app has enough scale to support a team of sysadmins). The exception I can think of is 'lopsided' services that require, say, more CPU and less RAM than your servers were bought with; in that case, if you can find another service that is lopsided in the other direction, you are in good shape.
Until then, sure, you need less than a server, at which point some sort of virtualization makes sense. (And if you own and sysadmin all apps, containers could make a lot of sense, as they are rather lighter weight.) And hey, sometimes even if just one app doesn't have the scale to support a team of sysadmins, the whole lot of them do.
If you are saying that virtualization and containers make a lot of sense for small-scale stuff, or any use case where you need servers that are smaller than the physical servers you buy, we are in violent agreement.
>E.g. some customers could easily fit on a single server when we start deploying our apps for them, but a year down the line may require a complicated multi-server setup. When they do, and it's all in containers, we can migrate the web container to another machine (or duplicate it and load balance), move the database container somewhere else, and it's all generic tools. We don't need to start messing around untangling dependencies on a single server.
This makes it sound like you are as much in the 'service provider' realm as I am. That is an interesting way to think of the cluster administrator role; thinking of the different development groups you are running code for as customers, helping them scale up and down. I actually don't have as much experience with that style of cluster administration. At all my cluster administration jobs, the products I was sysadmining were already many-server affairs by the time babysitting them became my job. In those cases, I (and/or the team) pretty much owned the application, with a little bit of help/support from the devs when we couldn't figure something out, which is different from the customer/provider setup.
>And there's the dependency issue. In my experience, I don't trust for a second that you will end up with a working setup if you ship packages and install them on a pre-existing OS install. It takes about five seconds from the moment someone other than a very experienced sysadmin gets an ssh login to a server before it deviates from the servers it is supposed to be identical to, at which point all bets are off as to whether your app will run once deployed to your production servers. VMs and containers make it easy to guarantee that the images you deploy are unchanged from what you tested on.
Sure, sure... but generally speaking, if you are using whole-server images? You wipe and re-install the whole goddamn thing every time you get half a chance, because of this. With PXE this isn't much more difficult than copying over a new tarball. Most large clusters have some software that lets you burn in and re-install thousands of servers by typing one or two lines.
Now, you can argue that copying a new chroot to a thousand hosts is generally easier, or at least faster to recover from if you screw it up, than PXE-installing a thousand hosts, say, if you accidentally target the wrong servers.
>Having them neatly isolated in small manageable chunks also means that we are free to change hardware specs, and move containers around depending on what is most cost-effective rather than having to make sure everything fits.
Sure, different hardware needs different drivers, but I think the difficulty of dealing with that is overstated.
The hard part of having different hardware specs isn't the drivers; it's the different performance characteristics, and virtualization doesn't always help you with that. You've got some servers with 15K SAS disks and others with 'IntelliPower' drives; no amount of virtualization is going to make the two servers perform the same. But then, sometimes virtualization does help with that, for instance when the problem is RAM-to-CPU (or RAM-to-disk) ratio differences: if you have one app that requires a lot of RAM and disk but little CPU, and another app that requires a lot of CPU and nothing else, and your new boxes with the best CPUs happen to have a lot of RAM, you could benefit from running both on the same servers.
The author of the article explicitly says, "This isn't to say that they're going to replace virtual machines." and I agree - there's no point in pretending like they're solving all of the same problems. I guess you could say that containers are an optimization of some use cases of VMs.
Really, the only reason to use virtual machines is if you truly need different OS kernels.
LXC is mainly a way to manage multifunction servers in a clean way, and Docker is a nice tool that helps you leverage LXC without having to learn all the details. The nicest thing about Docker is that a developer on Windows can set up a VirtualBox VM with Linux, run all the same Docker scripts as the production server, and have a reasonable facsimile of production to test against.
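That workflow can be sketched with a single image definition shared by the developer's VM and production; everything here (base image, paths, script name) is illustrative, not from the comment above:

```dockerfile
# Hypothetical Dockerfile; the base image and app layout are made up.
FROM ubuntu
# Copy in the application directory ("directories full of binaries")
COPY ./app /opt/app
# Run the same entry point in dev and prod
CMD ["/opt/app/run.sh"]
```

The developer builds and runs it the same way on the VirtualBox Linux VM as ops does on the production host, e.g. `docker build -t myapp . && docker run myapp`.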
I still do not understand the use case of live migration though...
E.g. You may have multiple VMs running applications like sharepoint, exchange, zimbra, or whatever, all on a host. You want to perform some planned maintenance on the host, e.g. update the firmware, add memory, upgrade the hypervisor, etc. You place the host in maintenance mode and the VMs are migrated off the host onto another one within the cluster without having to resort to planned downtime.
Each container gets its own network namespace (along with other namespaces like hostname, pids, users, ipc, filesystem mounts). Anything not handled by one of the 6 namespaces is the same across all containers. That includes things like what kernel modules are loaded, the system clock, etc.
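A quick way to see those namespaces from a shell (Linux-only sketch; `unshare -r` relies on unprivileged user namespaces, which most modern kernels allow):

```shell
# Each process's namespace memberships are symlinks under /proc/<pid>/ns;
# processes in the same container share these inode identities.
readlink /proc/self/ns/uts /proc/self/ns/net /proc/self/ns/pid

# unshare(1) enters a fresh UTS (hostname) namespace; -r maps us to root
# inside a new user namespace, so no real root is required. The hostname
# change is invisible outside the namespace.
unshare -r --uts sh -c 'hostname demo-container; hostname'
hostname   # the host's own name is unchanged
```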
Because a user with root can manipulate the kernel in many ways, I wouldn't give root to an untrusted user and assume containers were enough to contain them. Certainly if they can load a custom kernel module it's game over, but I'd bet there's plenty of other ways to break out too.
To answer your time question: AFAIK there is no namespace for system time in Linux. If you don't want processes within a container to be able to set the system clock, don't launch them with the CAP_SYS_TIME capability.
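One way to inspect and drop that capability (a sketch; `capsh` comes from the libcap tools and may not be installed, and you need root to have anything to drop in the first place):

```shell
# Effective capabilities are a bitmask in /proc/<pid>/status;
# CAP_SYS_TIME is bit 25 of CapEff.
grep CapEff /proc/self/status

# capsh can launch a shell with CAP_SYS_TIME explicitly dropped, so
# settimeofday()/clock_settime() will fail inside it:
capsh --drop=cap_sys_time -- -c 'date --set="2020-01-01"'
```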
openvz, lxc - containers
kvm, xen - virtual machines
A better analogy is LXC = Solaris Zones
KVM and Xen are hypervisors for virtualization.
VirtualBox is a hosted (type 2) virtual machine monitor, not a bare-metal hypervisor.
edit: yes derefr, I agree. My point was also that if there were a distribution- or kernel-specific Ruby or Ruby-library bug, for example, then you might have to account for it in your code if you ran in containers (if you don't control the host on which your code is running), whereas you control the whole stack in a virtual machine.
I suppose what you mean is that containers aren't applicable in every instance that VMs are--like, say, providing VPS instances, where some of the instances might want to do funny fiddling things with their virtualized "hardware." This is true.
But for the situations where the applicability of VMs and containers intersect--using them to run Heroku-like isolated "app slugs", for example--then you can think of containers as lighter-weight VMs, and you won't really go wrong.
Now how do your users' service accounts share resources? Someone will need a web server running under one account and a job processor running under a different account (different accounts in order to allow for limited privileges). Are they restricted to communicating through named pipes (hope no one else binds the pipe)? What if they want to share some disk-based resources? Shove them all in a group and set someone's home directory to 770?
How do you handle it when one of your users needs a SQL server? Do they need to install this in their home directory, too? And lock down their install so that hopefully other users on the box can't call it?
What about port binding? Just luck of the draw, and everyone binds on what they want? Then they file a request for you to forward to the port they happened to grab? And they hope that they never lose the port due to a server reboot, or a service crash?
People could work in the environment you describe, but it would be miserable. And you (the server owner) would have to sink a massive amount of resources into maintenance. This isn't a shortcoming of the OS. It's a fundamental mismatch between the goal of sharing a server among many users and giving those users sufficient control to build what they need.
NixOS can handle situations like this. It enables per-user package management with deterministic, immutable installs. If multiple users install the same package, it only puts one copy on the system. Each user gets their own environment where they can customize the package without affecting the other users.
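A sketch of what that looks like in practice (user and package names are illustrative; assumes a multi-user Nix install):

```shell
# Two users install the same package; the store keeps one copy,
# addressed by the hash of its build inputs.
sudo -u alice nix-env -iA nixpkgs.postgresql
sudo -u bob   nix-env -iA nixpkgs.postgresql   # reuses the same store path

# Each user's profile is just a symlink tree into /nix/store, so
# customizing one profile never touches the other user's environment.
ls -l /home/alice/.nix-profile/bin
```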
This sounds like a nice system. I'm just wondering how much of the core issue is really addressed.
ZFS also has real-time, block-level deduplication. This will dramatically cut down on the space usage when users install multiple different versions of the same package.
By combining all this, you can give every user their own nix store with its own set of packages and a quota as well as a minimum reservation of space. Deduplication will take care of all the wasted space from multiple users having the same/similar packages installed.
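As a sketch of that setup (pool and dataset names are made up; note that dedup keeps its table in RAM, so it is not free):

```shell
# Dataset holding all the per-user package stores, with dedup on:
zfs create -o dedup=on tank/stores

# Per-user child datasets: a hard ceiling (quota) plus a guaranteed
# minimum amount of space (reservation):
zfs create -o quota=20G -o reservation=5G tank/stores/alice
zfs create -o quota=20G -o reservation=5G tank/stores/bob

# How much space dedup is actually saving, pool-wide:
zpool get dedupratio tank
```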
It would be trivial to do this with LXC too.
You can make process groups without making the container look like a VM... BUT. That's hard and time-consuming, for little gain.
Closer to six decades; this is old hat for IBM mainframes.
For me, though, the question is: why virtualize? And the answer I keep coming back to is that a) it is too hard to retrofit good manageability onto processes as the basic unit of applications, and b) people are used to the idea of having one machine per app, even when it really isn't cost-effective to do that.
Desktop virtualization using the NX protocol and Linux containers is also really friendly. Our company uses old Pentium D boxes for desktops. The performance gain from using a thin client + lxc + lubuntu is insane! I can actually watch youtube videos and draw in OpenOffice Writer. (Also great for disaster recovery scenarios).
I use KVM/Qemu VMs at home and at work (dev box); I have a dozen always-running VMs. However, I have started to read more and more about LXC containers, and still cannot grasp their importance... for example, I manage a few VPSs too, each with its own purpose (web server, database); how could LXC help me?
Thank you in advance.
> to change the text of this page send 0.1 btc to 16JMNc3B5vCkuuPxNcNj388gmhP8UDKBuW
Well done; it didn't take more than an hour for someone to find a loophole.