
Containers, Not Virtual Machines, Are the Future Cloud - dave1010uk
http://www.linuxjournal.com/content/containers%E2%80%94not-virtual-machines%E2%80%94are-future-cloud
======
lsc
Having sold both containers and VMs to untrusted users, I disagree. Now, my
container experience is old, and containerization technology is improving all
the time, so over time what I say here becomes less true, but that's the
thing; I'm willing to pay a fairly high cost in (cheap) RAM for relatively
small gains in compartmentalization. (Which is to say, I'm willing to spend a
fair bit of money to make my pager go off less.)

See, the expensive part? It's not the RAM, it's sysadmin time. That's why
VPSes are killing shared hosting; it's actually easier to manage a bunch of
VPSes than a bunch of people with shell accounts on the same box. Same goes
for all the containerization technologies I've tried; it's easier to just
throw Xen/KVM on there and pay for the extra RAM than to manage one user
stepping on resources required by another user.

For me? That was the beautiful part of switching from jails to Xen; suddenly,
I could just ignore customers who tried to burn all the CPU and RAM they
could. Now, if I can figure out how to do the same with disk? I'll be a very
happy man.

That, at least, is where I'm putting my money: in more RAM.

~~~
davidstrauss
> Now, if I can figure out how to do the same with disk?

Hi, I'm the author of the article.

You should check out cgroups. You can isolate disk access between services
and containers using block I/O shares.

The way shares work in cgroups is that there's unlimited access until the
resource comes under contention. Then, the kernel schedules activity based on
which cgroups have shares remaining. This allows scheduling to be "fair" in
the sense that no cgroup always wins; access just becomes proportional.

We use block I/O shares (and other cgroup methods) for isolating containerized
MySQL instances on Pantheon, the platform I work on as CTO.
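
A rough sketch of what that looks like against the v1 cgroup filesystem (the
group names, PIDs and weights here are only illustrative, and this assumes the
blkio controller is mounted at /sys/fs/cgroup/blkio):

    import os

    BLKIO_ROOT = "/sys/fs/cgroup/blkio"   # typical cgroup v1 mount point

    def put_in_blkio_group(group, pid, weight):
        """Create a blkio cgroup, give it a share weight (100-1000),
        and move the given process into it."""
        path = os.path.join(BLKIO_ROOT, group)
        os.makedirs(path, exist_ok=True)
        with open(os.path.join(path, "blkio.weight"), "w") as f:
            f.write(str(weight))       # a relative share, not a hard cap
        with open(os.path.join(path, "tasks"), "w") as f:
            f.write(str(pid))          # move the process into the group

    # e.g. give a "mysql" group a bigger share than a "batch" group
    put_in_blkio_group("mysql", 1234, 800)
    put_in_blkio_group("batch", 5678, 200)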

~~~
boryas
Isn't the blkio cgroup unaware of buffered writes? Or do the shares work
differently, and that limitation only applies to accounting?

~~~
davidstrauss
Yes: "All the buffered writes are still system wide and not per group. Hence
we will not see service differentiation between buffered writes between
groups." [1]

[1] [https://www.kernel.org/doc/Documentation/cgroups/blkio-contr...](https://www.kernel.org/doc/Documentation/cgroups/blkio-controller.txt)

~~~
lsc
Huh, interesting. This was the primary reason jails sucked so hard for me.
The heavy users would push the light users' pagecache out of RAM.

However, using Xen or KVM solves that problem, by giving each guest their own
RAM that nobody else can fuck with. I wonder if using cgroups for the
remaining writes would then work well? Seems like the 'shares' approach might
work better than the 'priority' approach that ionice uses.

~~~
davidstrauss
> However, using xen or kvm solves that problem, by giving each guest their
> own ram that nobody else can fuck with.

It gets hard to say whether a shared page cache is a _good thing_ or not, even
though it may be unfair. I say this because I/O bandwidth getting exhausted is
a huge issue.

For example, using cgroups to limit memory allocations for groups or processes
seemed like a great way to fairly distribute memory. But, doing so forced such
cgroups into swapping when they tried to exceed their limits, even when there
was available memory on the host system. The swapping was so bad in terms of
saturating disk I/O that we had two choices to maintain quality of service:
(1) set the hard limit and OOM kill (or equivalent) within the cgroup when it
gets exceeded or (2) not treat it as a hard limit and monitor usage
separately. We chose the latter.
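
To make that concrete, the two options correspond roughly to these v1
memory-controller knobs (the group name and sizes are just illustrative):

    import os

    MEM_ROOT = "/sys/fs/cgroup/memory"     # typical cgroup v1 mount point
    group = os.path.join(MEM_ROOT, "customer-app")
    os.makedirs(group, exist_ok=True)

    # Option 1: a hard limit -- exceeding it means swapping or an OOM kill
    # confined to this cgroup.
    with open(os.path.join(group, "memory.limit_in_bytes"), "w") as f:
        f.write(str(512 * 1024 * 1024))

    # Option 2 (what we chose): only set a soft target and monitor actual
    # usage out of band instead of enforcing a hard cap.
    with open(os.path.join(group, "memory.soft_limit_in_bytes"), "w") as f:
        f.write(str(512 * 1024 * 1024))
    with open(os.path.join(group, "memory.usage_in_bytes")) as f:
        print("current usage:", int(f.read()))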

So, I honestly wonder: is it better to enforce separate page caches in the
name of fairness, even if it results in less efficient disk I/O? Or is it
better to have a unified page cache and dedicate system-wide resources to
increasing effective disk I/O bandwidth? (Do we focus on slicing the pie more
fairly or on increasing the size of the pie while cutting sloppily?)

~~~
lsc
> Do we focus on slicing the pie more fairly or on increasing the size of the
> pie while cutting sloppily?

In my case (well, really in the flat-rate multi-tenant case in general),
fairness is what I care about, more than overall efficiency. From a business
perspective, it's okay to not be all things to all people. If you are a
cheapish flat-rate service, it's okay if your heavy users don't get as much
performance as they'd like, as long as the reasonable-usage customers still
feel like they are getting what they are paying for.

My core customer, the hobbyist who wants something like a shell for IRC
idling, a personal mail server, DNS for personal domains and a place to
experiment or run a development project... generally buys enough RAM to cache
all the disk they normally access. I mean, 512MiB goes a long way for your
average Unix sysadmin. So even if the disk is getting thrashed by heavy users,
light users? Once their pagecache is warm, they have a fairly responsive
system for everyday sorts of things. Under Xen or something else that doesn't
share pagecache? Once the pagecache is warm, it stays warm. They can log in a
week later (assuming they aren't running a bunch of background stuff that's
reading/writing disk and churning pagecache) and their /etc/shadow is still in
pagecache, and the login is pretty fast.

Back when I was using FreeBSD jails? If the user hadn't logged in for a few
hours, the /etc/shadow in their jail had been flushed from cache and had to be
read from disk again. They were getting terrible service.

From an economic standpoint? In general, if you have a flat-rate service, your
light users are where the money is. They pay just as much as your heavy users
(as it is flat rate) and they use fewer resources. And really, I think it's
fair that if you are a light user and you are paying just as much as the heavy
user, you should promptly get your resources on the occasions that you do need
them. Heavy users on flat-rate services are going to have less than perfect
experiences, and this is fair too, I think, so long as expectations of service
level were set ahead of time. Heavy users use flat-rate services because it's
cheaper than pay per use services, usually... but the downside of that is that
they then only get as much resources as they can get without impacting the
light users.

Now, in a pay per use model? the opposite is true. In a pay per use model
(like, say, google app engine) the heavy users are the real customers. The
light users are sales prospects. So yeah; in that case, you focus on the heavy
users (and charge accordingly.)

Personally, I believe this is why shared hosting is seen as so far inferior to
platform as a service; shared hosting is usually billed flat-rate, so the
service provider really only has incentive to look after the light users; they
are better off if the heavy users go elsewhere. (and really, for five bucks a
month, what do you want?) - whereas platform as a service usually means you
get billed per use; in that case, it makes a lot of sense for the provider to
focus almost all its energy on making things better for the heavy users. Of
course, they also charge those heavy users accordingly.

------
api
Where I work, we use Parallels Virtuozzo for a private cloud. I can see why
more companies don't use it for public clouds -- you do run into a few weird
edge cases with containers vs. full VMs, and these would be a tech-support
issue for a large-scale host. But the density you achieve with it is at least
one order of magnitude better than KVM, possibly two, and the performance is
also better. There is zero I/O overhead since you are in fact running on bare
metal, so it's very good for high-performance computing (one of our use
cases).

Most of the edge cases you do encounter are due to poorly written software.
The most common thing is server software that depends on Linux's default
overcommit behavior and malloc() never failing, and then uses that to badly
implement a memory-sensitive cache in competition with the kernel. Containers
change overcommit behavior and malloc() CAN fail, so such software will eat
all the memory in a container and barf. But there are hacks to get around
this, and it's mostly an issue with POS Rube Goldberg machine web stacks like
Apache/mod_php.

I've been following LXC for a while and see that some people are now using it
in production. Once Linux has a main-line kernel container solution I think
you'll see it used quite a bit more. Eventually the developers of crappy
server code will be forced to fix their code to behave properly.

~~~
sedachv
> But there are hacks to get around this, and it's mostly an issue with POS
> rube goldberg machine web stacks like Apache/mod_php.

It's a common technique used by a lot of virtual machines and memory
management code to get a large contiguous region of address space to work
with (most JVMs work this way; SBCL does too). I'm not exactly sure why Xen
can handle overcommit fine and Virtuozzo can't; the latter does need a custom
kernel, so in principle they have access to all the low-level virtual memory
stuff.

~~~
api
JVMs have always worked fine for us in Virtuozzo. PHP is what we have issues
with, but we've found workarounds.

------
susi22
I'm surprised the article doesn't mention SmartOS. It's a superb OS which
lets you spin up zones and KVM instances.

[http://smartos.blueprint.org/home/why-smartos-in-my-lab](http://smartos.blueprint.org/home/why-smartos-in-my-lab)

[http://dtrace.org/blogs/brendan/2013/01/11/virtualization-pe...](http://dtrace.org/blogs/brendan/2013/01/11/virtualization-performance-zones-kvm-xen/)

[http://www.cloudcomp.ch/2013/04/openstack-on-smartos/](http://www.cloudcomp.ch/2013/04/openstack-on-smartos/)

~~~
wmf
I wouldn't expect much Solaris[1] coverage in Linux Journal.

[1] Illumos/SmartOS proponents seem to go out of their way to avoid saying the
S-word.

~~~
helper
Illumos has KVM support, Solaris does not. The distinction is made because
Solaris and Illumos have diverged into different operating systems.

------
csense
What's ironic is that the entire _point_ of an OS that supports multiple
processes is to provide them each with an illusion that they're totally in
control of their own machine.

This abstraction was kind of leaky -- different processes typically share
various namespaces, like filesystems, process tables and user accounts. As a
result, virtualizing an entire OS became a way to make a tight abstraction of
having an independent machine at your disposal.

Virtualization happened because emulating the system hardware was easier than
changing every OS to adequately separate its namespaces. Now that
virtualization has the attention of OS developers, you have developments like
the LXC/cgroup stuff in Linux, bringing isolation into the OS, where it should
have existed all along.

AFAIK the mainframe world -- which, by the way, still exists -- has never
gotten out of the early-computing mindset of having an expensive machine which
must be shared between applications, so isolation and containerization have
always been features of mainframe OSes.

------
ebbv
Container virtualization certainly has its uses, and we will see hosting
products based on it; no question.

But to say that it is "the future" and virtual machines are on their way out
is a naive view.

There are plenty of reasons why someone might want a virtualized OS;
customization, for one.

What if my application is built to run on Debian but yours is built to run on
CentOS? With container-based virtualization, we have to be put on different
nodes. With VMs, we can share a node.

This is why VMs exist. This is why they will continue to exist.

Disclaimer: I work in the hosting industry.

~~~
vidarh
> With container-based virtualization, we have to be put on different nodes.

That's only true when the OSes in question require different kernels. With
OpenVZ/Virtuozzo, for example, their RHEL-based kernel has long been the
preferred one for a lot of people, even for non-Red Hat Linux versions.

I agree with you that there are still lots of reasons people will want VMs,
though.

We've picked containers at work, but only because while we host customer
sites, we run and manage all of them, and so have a lot of flexibility a
typical hosting provider doesn't have.

~~~
dedward
I don't see VMs and Containers at odds.

Virtual machines let us abstract away our hardware resources. They also made
deploying new instances much easier than it used to be, so we started using
more instances than we would have without them, just to provide configuration
and resource separation.

The first part is still necessary - and we still need multiple instances to
take advantage of multiple VM hosts when it comes to scaling and redundancy.
The second part is what containers address.

I have a feeling we will end up mixing and matching the two as appropriate.

~~~
vidarh
I don't get what you're saying here. You don't need VMs to abstract away
resources - containers do that just fine.

Containers are - depending somewhat on the container technology used - from a
user perspective just like a VM that happens to be restricted to running the
same kernel as the host and uses fewer resources. Some container systems may
leak more in terms of abstractions and/or might not be designed to be as
secure, so there are certainly scenarios where certain types of containers
are not appropriate, but that's an issue of specific implementations.

In the systems I deploy, containers take care of everything you describe by
themselves.

------
amirmc
Folks interested in this should also take note of the projects that target
Xen [1, 2 are ones I'm aware of]. Startup times can be tuned to be on the
order of tens of milliseconds (see the comment in the Mirage paper).

[1] Mirage - [http://openmirage.org](http://openmirage.org) (paper at [http://anil.recoil.org/papers/2013-asplos-mirage.pdf](http://anil.recoil.org/papers/2013-asplos-mirage.pdf))

[2] Zerg - [http://zerg.erlangonxen.org](http://zerg.erlangonxen.org)

~~~
contingencies
I believe your two examples are actually two entirely different paradigms.
The first one is based on the _exokernel_ notion. A bit muddy, but it seems
that nobody has ever used a pure exokernel successfully at scale in industry,
presumably because it calls for the kernel exposing hardware directly to
userland code whilst simultaneously managing multiple tenants. Basically, the
idea seems like a sort of early and poorly articulated call for modern
OS-level resource management (e.g. _cgroups_).

Your second example, the Zerg system, seems very similar to LXC. Other than
the exoticism of _Erlang_, their only big feature vs. normal
paravirtualization seems to be startup time, which they see as a
groundbreaking change. Well, I regularly start whole systems in LXC in
fractions of a second. See
[https://github.com/globalcitizen/lxc-gentoo](https://github.com/globalcitizen/lxc-gentoo)

In summary, these two projects are _not really any different_ from the topic
under discussion, i.e. LXC (~ _cgroups/namespaces_). Well, except that unlike
cgroups/namespaces they're not necessarily able to run everywhere Linux runs!

~~~
vidarh
The Mirage link is confusing matters. Mirage appears to be an exo/micro kernel
for running in a para-virtualized system.

That's very different from an exo-kernel running on bare hardware. With
Mirage, what the apps run on seems to be "para-virtualized bare metal", so
rather than using the POSIX syscall interface as its ABI, it's using the
paravirtualized "hardware" as its ABI.

That makes Mirage very similar to Zerg - it's the same concept, only OCaml vs.
Erlang.

As for LXC, I agree LXC offers a lot of this, especially since you don't need
to boot full Linux environments - you can spawn individual applications and
fine-tune what isolation you need. Or you can even build applications that
explicitly use cgroups themselves to isolate sub-processes. So their "spawn
one Xen Dom-U per request" demo could similarly have been "spawn one new
process isolated in its own cgroups".

But these systems _are_ appealing to me in that they bring VM-level isolation
with startup performance and resource utilization more in line with container
systems.

That's particularly appealing if we start seeing cloud systems that can handle
it. Imagine being able to create new EC2 instances at sub-second times...

------
j45
Three thoughts:

1) I'm not sure this article has accounted for the one massive Achilles heel
of paravirtualization (containers):

Corrupting the core OS by way of upgrades or patches can take all the
containers out.

2) I have used VMware since the early betas in 98. The motivation was simple:

Replacing servers sucks.

The day you buy or lease a server, in essence, you purchase its death and the
hassle of a hasty or painful migration.

3) From my 7th or so generation of server migrations in doing complex hosting
since 95, one thing is clear:

The single most valuable thing is a hard separation between the bare metal and
the OS.

So much so that you can move the VMs at will anywhere, because you just
bought the death of your server the day it was brand new. I ran a CentOS box
with a BSD VM as a firewall, with Windows and Linux VMs behind it, to serve
one project, because each did its unique and highly valuable part.

Ultimately, for me: paravirtualization not only uses the host OS, but shares
it between multiple configurations. This, in my mind, still introduces risks
that should be minimized, if not avoided.

I'm sure PV is improving all the time. It's simply newer to the game than
what VMware started. If VMware isn't for you, suites like Proxmox are serious
options to consider. Hosting isn't somewhere to have preferences and opinions.
If I'm wrong, I'll change at any moment.

Issues with PV might be decreasing, but I've been ultimately burnt by PV every
time I used it in production for a prolonged time.

An example of where sysadmin time cost way more than a little more RAM for
virtualization: Parallels Virtuozzo corrupted itself so badly during an
update to the core OS that all the containers on the server were rendered
inoperable. Parallels support had one of their high-level techs log in from
Russia to manually replace corrupted DLL files so the server would at least
boot and we could get the heck off it. At the end of the repair we were told
the VM was no longer upgradeable and all updates would have to be done
manually.

The customer had taken on PV at the behest of the hosting company, who
insisted it was amazing despite my concerns. Said hosting company has also now
switched over entirely to VMware.

The unforgivable kicker of PV for my customer: tons of customization and the
reusable containers they had built were all lost when leaving, because moving
to something like VMware meant having to set them up from scratch again.

~~~
alinspired
Your comment must be entirely about Containers for Windows (by Parallels),
which is not what the article and most of the discussion are about.
Containers for Linux, by contrast, are not prone to host "corruption".

~~~
j45
Definitely not just speaking about Windows; it was one example.

The main thing that is non-negotiable for my uses is that one container
should not be able to affect another.

In this way, Linux is less prone to "corruption" as you may see it, but given
your implied familiarity with PV, I think you know what I'm referring to:
there are plenty of gotchas floating about, especially in regard to sharing or
using multiple kernels.

Anyone who's run multiple systems originating from different sources and
moved between hosts over time knows the reality of running different kernels.

Linux PV is not 100% issue- and risk-free as you implied; it certainly gives
you a nice boost for some trade-offs. They are just trade-offs I am not
prepared for, or interested in handling, for a small amount of RAM.

------
amscanne
Containers have been mature for longer than the article implies (FreeBSD
jails, Solaris zones).

Technology just doesn't work this way. Things don't get simpler over time (and
you don't shed support for different kernels, OSes, better isolation, etc.).

Not to mention the fact that virtualization has gotten dramatically better
since EC2 was launched, since that seems to be the benchmark here (they are
still running Xen 3.x, first released in 2005). Modern virt
hardware+hypervisors are blazing fast and (I believe) will eventually outpace
bare metal for specific workloads on big boxes due to the improved isolation
for many performance-critical operations (e.g. separate TLBs to avoid flushes,
smarter cache invalidation). Even if they don't reach parity, the gap
certainly wouldn't justify giving up all the benefits of full virtualization.

~~~
davidstrauss
Hi, I'm the author of the article.

> Containers have been mature for longer than the article implies (FreeBSD
> jails, Solaris zones).

Mature containers have been around since the days of mainframes. The
recent (say, last decade) disparity has been lack of mature containers on the
platform of choice, which is Linux for most cloud-based projects. I don't know
many people who would switch their projects to run on FreeBSD or Solaris just
to use containers instead of VMs.

Also, the container types you mention don't support independent, fine-grained
choices of isolate-or-not for networking, the file system, user IDs, process
IDs, and scheduling of system resources.
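
As a rough illustration of that granularity: on Linux, a process can opt into
exactly the namespaces it wants, one at a time. This sketch (via ctypes, run
as root; the flag values are the standard clone(2) constants) unshares only
the UTS namespace, so the hostname becomes private while networking, mounts,
users and PIDs all stay shared with the host:

    import ctypes, socket

    libc = ctypes.CDLL("libc.so.6", use_errno=True)

    # Standard clone(2) flags; pick exactly the isolation you want.
    CLONE_NEWNS  = 0x00020000   # mount namespace
    CLONE_NEWUTS = 0x04000000   # hostname/domainname
    CLONE_NEWNET = 0x40000000   # network stack

    # Unshare only the UTS namespace; everything else stays shared.
    if libc.unshare(CLONE_NEWUTS) != 0:
        raise OSError(ctypes.get_errno(), "unshare failed (need root?)")

    name = b"private-hostname"
    libc.sethostname(name, len(name))
    print(socket.gethostname())     # changed here, unchanged on the host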

> Modern virt hardware+hypervisors is blazing fast and (I believe) will
> eventually outpace bare metal for specific workloads on big boxes due to the
> improved isolation for many performance critical operations (I.e. separate
> TLBs for flushes, smarter cache invalidation).

It's unclear why you think such optimization would appear in hypervisors but
not kernels, especially given how much more insight a kernel has into the
workloads it directly runs compared to a hypervisor running a kernel running
workloads.

Even if hypervisors achieve CPU performance parity for running systems, you
still haven't addressed the memory and storage overhead (in terms of resident
footprint) of running full OS images instead of containers.

> Even if they don't reach parity, the gap certainty wouldn't justify giving
> up all the benefits of full virtualization.

You're portraying virtualization as having benefits versus containerization
while ignoring that containerization also has many benefits over
virtualization, especially w.r.t. cost efficiency.

~~~
aliguori
Hi David,

> Mature containers have been around since the days of mainframes.

Citation needed.

I'd go as far as to say that there is no such thing as a mature container
technology. The fundamental problem of containers is that mainstream kernels
are not designed to be multi-tenant. They support multiple users quite well,
but when you try to have multiple root users, badness ensues.

Even the best container systems today for Linux still have fundamental gaps.

> Even if hypervisors achieve CPU performance parity for running systems, you
> still haven't addressed the memory and storage overhead (in terms of
> resident footprint) of running full OS images instead of containers.

Re: memory, same-page merging can eliminate a lot of that overhead when
running mostly homogeneous workloads. Re: storage, using CoW not only makes
provisioning instant (not 5-10 minutes) but also addresses the storage
concern.

If you mean the overhead of running two kernels, well, actually using
namespaces has a fair amount of practical overhead too.

Of course, benchmarks speak louder than words here. No one has published a
container-based result for SPECvirt, because I'm quite sure it's not faster
than virtualization.

~~~
justincormack
Linux containers are pretty immature; they only started being implemented
recently. Mainframes, Solaris and FreeBSD are much older. The Linux
implementation is interesting, though, in terms of granularity.

~~~
makomk
If you ignore stuff like OpenVZ, maybe, but there's probably a reason why that
was mostly displaced by full virtualization in the first place.

------
kleiba
A short (5min) talk by Solomon Hykes of dotCloud about how their "docker" tool
makes use of containers:

[https://www.youtube.com/watch?v=9xciauwbsuo](https://www.youtube.com/watch?v=9xciauwbsuo)

------
aetimmes
Solaris/illumos have been running containers for almost ten years now. They're
just a layer of abstraction on the other side of the OS layer - a different
tool, not necessarily a better one.

------
j_s
Is it possible to have the same degree of security with a container as is
provided by a virtual machine?

~~~
susi22
Yes, the security is there. Resource fairness, though, is not (since you have
basically bare-metal access). SmartOS is trying to fix this:

[http://wiki.smartos.org/display/DOC/Tuning+the+IO+Throttle](http://wiki.smartos.org/display/DOC/Tuning+the+IO+Throttle)

[http://dtrace.org/blogs/brendan/2012/12/19/the-use-method-sm...](http://dtrace.org/blogs/brendan/2012/12/19/the-use-method-smartos-performance-checklist/)

~~~
chubot
That's not true -- Linux has had local root exploits pretty much at ALL times.
If you're in a container, you can break out with one of these exploits. You'll
be root and have access to all other containers on the machine.

In a VM, you don't share a kernel with other users, so the local root exploit
doesn't buy you anything.

I prefer containers and think VMs are a huge hack for most problems. But as
it stands now, the number of local root exploits vs., say, remote exploits on
Linux is probably greater by a factor of 100 or so, so you're taking on much
more risk.

Security is one of the problems with Linux containers that still needs to be
resolved. (see [http://mattoncloud.org/2012/07/16/are-lxc-containers-enough/](http://mattoncloud.org/2012/07/16/are-lxc-containers-enough/), Google "lxc security")

~~~
davidstrauss
> That's not true -- Linux has had local root exploits pretty much at ALL
> times. If you're in a container, you can break out with one of these
> exploits. You'll be root and have access to all other containers on the
> machine.

Isolation strategies involving syscall filtering and mandatory access control
(MAC) tools like SELinux dramatically reduce the attack surface. For example,
I know several recent root exploits were not possible to run on Fedora with
SELinux enabled.

People discover hypervisor exploits at a reliable clip, too. Virtualization
isn't a panacea of security isolation.

~~~
chubot
Well, that is what I hoped the article would cover in more detail :) That is,
what actually works out of the box with LXC and what doesn't. As of 6-12
months ago, I don't think LXC on any distro was particularly secure.

AFAIK you have to add SELinux yourself, and all the PaaS providers are
probably doing something custom (or not). The newer thing seems to be seccomp
filters ([http://lwn.net/Articles/494252/](http://lwn.net/Articles/494252/)),
which are motivated by ChromeOS. I would like to see a comparison; from what I
can tell seccomp is a lot simpler conceptually, although there are fewer user
space tools for it.

The article didn't actually say "LXC" but it seems to be what most PaaS
providers are using. When I tried LXC, while lighter than VMs, it also seemed
too heavy, because you end up with 5 or 6 processes (starting with an "init")
for every process you want to run. Using just raw namespaces and cgroups seems
to be feasible although again there are few tools to do that. Apparently
systemd has support, although I don't use any distros with it.

Docker is also a new thing, but I think it is just on top of LXC, so it is
only as secure as your distro's LXC is.

~~~
susi22
Thanks for your answers so far. Q: Why is it that Linux is >10 years behind
with containers/zones?

From what I understand, ZFS was slow to be ported to Linux due to the license
problems. But why were zones not quickly adapted?

~~~
chubot
What's interesting to me is that it seems history is repeating itself. As
mentioned, OpenVZ and Linux VServer existed in the '90s, but they never made
it into the mainline. So really this is the second try for Linux.

Basically I think it is a consequence of containers/sandboxing being a very
"commercial" technology, even though they are open source. The main users are
hosting providers, and there's a significant amount of money in that business.

In the '90s there was a hosting "land rush", with all of these companies like
1and1 and dreamhost selling shared hosting on Linux. They were the ones that
developed Linux VServer and OpenVZ apparently, and I think the pace was too
great to get it into the mainline. Interested in any first-hand knowledge
people have.

And in the 2010s there is a PaaS "land rush", with all of these companies
building on AWS and other IaaS while needing containerization like LXC. The
OP's article is calling for increased support in distros -- I think the same
lack of time for cooperation is happening. Heroku, Cloud Foundry, dotCloud,
ActiveState, etc. are all using the same thing essentially, but there's a big
land grab, so they are all maintaining proprietary and complex user-space
configuration.

The kernel features like the various namespaces are just about finished
trickling in I think; that doesn't mean they're secure though.

------
jpollock
Sun/Oracle have been championing containers (Zones) for a long time now. They
have their places. However, businesses still want fully virtualised systems
(LDOMs).

Businesses like treating applications like appliances. They want to start up a
system and not touch it again. Migrating an application to a new OS release
costs them a lot of money. A zone forces an upgrade - everything on the
machine needs to be rolled forward to the new release.

Since businesses typically have a patchwork of software in various levels of
support, continuous upgrades can be an expensive (and possibly impossible)
task. For example, the application vendor may require a license purchase in
order to get support for the new OS release. The vendor may even have gone out
of business, and the replacement is still 6-12 months away.

On a fully virtualised platform, you're not forced to continually pay that
maintenance cost. You are free to store it as technical debt and pay it in a
more managed fashion.

~~~
wmf
But both Solaris and Linux let you run old userspace in a container on a newer
kernel.

~~~
jpollock
Actually, there are limits, as I was informed at the Solaris 11 launch. I
didn't pay too much attention because it didn't affect me, but the guys it did
affect weren't that impressed!

------
ewams
Good article that discusses something I have not touched in years, simply
because of a lot of the drawbacks he talks about. It is also a wonderful sales
pitch; someone has been practicing.

In the virtualization world we argue often about how to do things “right” and
what the “industry standards” are. I started a series a while back to help
define and explain virtualization and various related topics. Specifically I
even talk about this subject (
[http://ewams.net/?date=2012/01/04&view=The_4_Types_of_Virtua...](http://ewams.net/?date=2012/01/04&view=The_4_Types_of_Virtualization#b)
).

The main problem with saying VMs are “no longer the future” is that they are
mainstream and have been for a while, so they are probably not cool enough for
hackers, but they are not going anywhere fast. Containers have a place and
full-blown virtual machines have a place. The trick is finding which one will
benefit your organization and for which purpose. Most likely both can work,
depending on the requirements.

The other topic I notice that is usually overlooked when talking about
virtualization is the actual OS. This article discusses mostly Red Hat-based
distros. Some organizations strictly use OS X, others Ubuntu, others RHEL,
while others use some sort of mix. Each is different, and even for closely
related ones like Fedora and RHEL, what works on one may not work on another
without hacking it or some other black magic. (Or it may work just fine; you
know how it goes.) The other point of the OS talk is: where is Windows? We
can pretend to be ostriches, but Windows Server is out there and in large
numbers. Lots of people make lots of money writing applications that work
strictly on Windows (same for *nix). Just do not forget about it.

If we move forward with containers, and since the topic has come up at least
twice in the last 6 months, let’s start to see recommendations on how to run
containers. How to pick workloads. How to optimize applications and kernels to
run them. What kind of hardware to run them on. How to code for them. Etc.

------
alinspired
Containers are efficient (and most likely the future) for PaaS/SaaS types of
applications, where an individual instance (VM/container) is not treated as a
"computer" or infrastructure.

Another benefit for PaaS/SaaS is that you don't necessarily sell RAM or disk;
you sell an application/service and thus can take full advantage of
containers, like:

- more efficient resource sharing (on the host)
- dynamic resource re-balancing
- quicker start/stop/etc.

Containers can be used to offer IaaS, but as was already mentioned in this
thread, a few edge cases (custom kernels or kernel modules, arbitrary
filesystem support, arbitrary firewall support) prevent it due to
support/operations overhead.

~~~
gundy
They are not really efficient. All they do is add an extra layer of security
so you can encapsulate a specific process in a chroot with its own networking
stack. Other than that, the CPU is still fair-shared and the memory still has
the same limits as if you ran it without the container. It adds nothing but
security.

~~~
wmf
In the cloud, KVM/Xen overhead is considered the baseline so containers are
definitely more efficient by comparison.

------
cornet
Containers and zones are great if you're building your own platform that you
control.

However, I'm not sure building a multi-tenant hosting environment based on
containers or zones is the best idea.

Suppose the tun driver has a bug in it which can cause a kernel panic under
some circumstances, and that someone running OpenVPN in a container or zone
hits this condition. You can now wave bye-bye to all the zones running on that
host.

[https://github.com/joyent/illumos-extra/commit/9412039a18f2f...](https://github.com/joyent/illumos-extra/commit/9412039a18f2f52b24b18ad8c8642a55d3b50d93)

------
gnel
I currently have an OpenVZ VPS on the Internet to host my projects, and I
chose it for its better performance in comparison to a real VM (Xen, KVM,
etc.), but what I have yet to find is a provider that offers OpenVZ containers
(or Virtuozzo containers) that can scale programmatically, like the VMs on
Amazon EC2.

I think it should be possible to program something like: if my container's
CPU load is over 70% for more than 40 seconds, launch a new copy of my
container on the same or another machine.
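
The rule itself is only a few lines once a provider exposes an API for it;
the clone_my_container() call below is hypothetical, standing in for whatever
such a provider would actually offer:

    import os, time

    def clone_my_container():
        # Hypothetical provider API call -- no such service exists, which
        # is exactly the complaint above.
        raise NotImplementedError

    THRESHOLD, WINDOW = 0.70, 40        # 70% load for 40 seconds
    over_since = None

    while True:
        load = os.getloadavg()[0] / os.cpu_count()   # rough CPU utilisation
        if load > THRESHOLD:
            over_since = over_since or time.time()
            if time.time() - over_since >= WINDOW:
                clone_my_container()
                over_since = None
        else:
            over_since = None
        time.sleep(5)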

This lack of a service to scale containers is the first thing that needs to
be addressed to compete with the common VM clouds.

------
grogers
Maybe, but the Linux kernel has had its share of local privilege escalation
bugs. If I were a cloud provider (or user) I'd be hesitant to trust containers
for hard isolation.

------
fridder
Also see FreeBSD Jails:

~~~
davidstrauss
Hi, I'm the author of the article.

FreeBSD Jails lack the same fine-grained isolation choices versus the base
system that the Linux kernel exposes through namespaces and cgroups.

That's not to say that Jails don't capture most of the value I argue
containers have, but they're a different generation of design.

Also, I wrote the article for the Linux Journal, which obviously affects the
solutions explored.

~~~
oijaf888
What can cgroups and namespaces do that rctl and capsicum can't? I assume with
namespaces you can have multiple processes running with the same PID?

~~~
davidstrauss
RCTL can enforce specific limits, which is good if you want to divide
resources such that there can't be (or is unlikely to be) contention.

cgroups offers hard limits for some things, like memory, but it mostly opts
for a model using "shares" that determine the fractional access to resources
versus other cgroups holding shares against the same resource.

For example, assume there's CPU contention. cgroup A has 10 CPU shares and
cgroup B has 90. Processes in cgroup B will get 90% of the CPU time, but that
will not starve cgroup A, because cgroup A will still get 10%.

This shares-based model also has a major effect when there isn't contention.
Shares-based resources are burstable. Even cgroup A (with 10 CPU shares) can
use 100% of CPU if nothing else needs it.

This "burstable" nature can be good or bad. It's good in the sense that most
users will probably get more CPU than their shares guarantee most of the time.
It's bad because users can start expecting more than their shares guarantee
and get a nasty surprise when resources get under contention.
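
In raw cgroup v1 terms, that example is just a couple of writes (the group
names are illustrative):

    import os

    CPU_ROOT = "/sys/fs/cgroup/cpu"     # typical cgroup v1 mount point

    def set_cpu_shares(group, shares):
        """Give a cgroup a relative CPU weight; only matters under contention."""
        path = os.path.join(CPU_ROOT, group)
        os.makedirs(path, exist_ok=True)
        with open(os.path.join(path, "cpu.shares"), "w") as f:
            f.write(str(shares))

    set_cpu_shares("a", 10)   # ~10% of the CPU when both groups are busy
    set_cpu_shares("b", 90)   # ~90% under contention; either can burst to 100%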

It's time to drop some analogies.

cgroups are very much like a highway with an HOV lane (or more): anyone can go
very fast when there's no contention. But, during rush hour, lanes get
distributed as "shares" of the road to the HOV and non-HOV groups. Neither the
HOV nor the non-HOV drivers get starved for road access (though responsiveness
may not be equivalent, by design).

Traditional "nice" is like emergency vehicle traffic. An ambulance every now
and then works fine as "-20 nice" traffic. But, if you filled the road with
ambulances, it would starve normal traffic of roadway access.

RCTL is sort of like a person riding reserved right-of-way public transit.
From the time the person hops on the train at point A to when they get off at
point B, it will be the same duration any time of day. They don't get to go
faster during low-traffic times, but they also don't have to worry about a
significantly worse experience during rush hour.

Capsicum seems focused on intra-application isolation; I'm not sure how to
compare it to other OS-level containers.

------
AndrewGaspar
It's pretty bad they don't even mention Windows Azure - their Web Sites
service is already using containers, though only on the free tier right now.

~~~
davidstrauss
I'm the author of the article. It doesn't mention Windows Azure because I
wrote it for the Linux Journal, and a full discussion of containerization
outside of Linux wasn't possible in the allotted article space.

------
thelarry
Linux containers have been awesome for helping me sandbox user-generated code.
Has anyone actually had experience with using Mesos in the real world?

~~~
gundy
That's all that they are good for: adding security that would otherwise not
exist. Outside of that, under a PaaS, I don't see what they add.

------
CptCodeMonkey
Just got a client off a container-based provider and it wasn't too great an
experience (dramatically slow networking, unreliable file I/O). Granted, I
could, and can continue to, say the same thing about virtualization, but
virtualization is here and reliable enough that it seems like a hard sell to
want to try something that is even less performant.

------
bmullan
One future for LXC is with ARM-based servers such as Calxeda's. Approximately
800 cores with a gigabit Ethernet fabric and I/O, and SATA built in for each
core. About the size of a Cisco Catalyst 5000 box but less than 1000 watts.

Runs Ubuntu or Red Hat and uses LXC for virtualization. Last I saw, it was
<$100k.

------
Nux
Containers certainly have their uses, but what I think will happen is hardware
will become better and better at virtualisation just as the (K)VM software
will become better.

LXC is good for stuff like OpenShift (PaaS), but for "VPS" offerings the VM
will continue to reign supreme.

~~~
zb
Actually, OpenShift doesn't use LXC (it uses SELinux to separate gears)
because containers are not secure in the way that VMs are.

~~~
lil_cain
In the near future, it'll use kernel namespacing (like LXC) _and_ SELinux.

------
mark_l_watson
A question: which hosting companies sell container hosting services for
specific platforms like Sinatra+Postgres, Java+Hibernate+MySQL, etc.?

If I understand this correctly, you would want to be able to rent container
slices with the specific stack you are going to use, already set up.

~~~
wmf
If you want a container with a specific stack, that's usually called PaaS.
Personally I hope that tools like Docker allow some companies to focus on
hosting containers (hopefully not inside Xen VMs...) and others to focus on
building great stacks and container templates.

------
ausjke
LXC will never replace Xen/KVM/whatever. LXC has its places, but saying it is
the future for all VMs is just naive. Since it's from Linux Journal, I wanted
to check who dares to make such bold statements, and then realized this is
probably the real purpose of the article: a twisted ad for his Drupal
business. In short, the article is pointless and shallow, and you can't help
checking who is making such claims, and then you find yet another Drupal
hosting site; this sucks.

------
davidgerard
This is totally the case. Back before it got Oracled, Solaris 10 zones were
one of the finest things ever - and completely controllable in resource usage,
with a few percent CPU overhead at most (for comparison, running ZFS takes
more). Zones for Linux would win, comprehensively.

------
toolslive
IMNSHO: the future should be both VM-less and container-less.

Meanwhile, containers are more efficient.

~~~
davidstrauss
How would you suggest we efficiently partition and sell computing resources,
then? It's not cost-effective or even power-efficient to run separate hardware
for each scale of computing.

For example, many projects only need a container or VM with 512MB of memory
and one CPU core.

~~~
VLM
Can't speak for the original poster but my guess is the obvious drop in
power/cost required for computing.

"Just wanna email, word process, and some web and some gaming" has relatively
recently gone from a kilobuck P4 couple hundred watt space heater to a
hundredbuck tablet that needs charging every couple days.

Unless you all can find a way to make silicon more expensive or dramatically
increase processing demand, all these elaborate schemes to dole out processor
are doomed to something like the per-minute long-distance business model where
eventually more money was being spent on detailed billing than on actually
providing the service... When the TCO of an elaborate virtualization system
and its provisioning and billing and firewalling system exceeds the TCO of
just spinning up a ubiquitous cheap low power system on a chip...

------
gundy
In other news, the OP's startup is not the future of PaaS, but OpenShift,
Red Hat's open source alternative, is.

