
Sorry, but the phusion images are unnecessarily bloated. Their existence has been defended by 'fixing' many so-called problems that are actually no problem at all - or at least shouldn't be a problem if you know what the hell you're doing. No, well-written software won't spawn zombie processes - sorry. Reaping dead child processes is something pretty basic if you're using fork.
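
If you want to check whether a given container actually accumulates zombies, something like this works (the container name "app" is just a placeholder, and it assumes a procps-style ps inside the image):

    # list zombie (defunct) processes inside a running container named "app"
    docker exec app ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/'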

And then there's the logger daemon. I guess mounting /dev/log into the container is too complex if you care about this?
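
For example (the image name is a placeholder; this assumes the host runs a syslog daemon listening on /dev/log):

    # bind-mount the host's syslog socket so the app in the container logs via the host's syslogd
    docker run -d -v /dev/log:/dev/log my-app-image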

Logrotate - sure, useful - but if you care about logs and aren't sending them to your logger daemon or /dev/null, you probably want to store them externally - in a volume or mounted host directory - and have a separate container taking care of that.

The SSH server... Containers are not VMs. If you have to log in to a container running in production, you're doing something wrong - unless that container's only job is running SSH (which can be useful, for example, for Jenkins build slaves).

Cron - again - same thing: run in a separate container and give access to the exact things your cronjob needs.
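
For example, a single host crontab entry can start a throwaway container with exactly the access the job needs (the image name, command and paths here are made up):

    # host crontab: run the nightly report job in its own short-lived container
    0 1 * * *  docker run --rm -v /srv/reports:/out report-image /usr/local/bin/send-report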

That is for me the essential thing about containers: separate everything. But sure, you could treat containers as a special VM only for one service - nobody is going to stop you. I however prefer isolating every single process and explicitly telling it how to communicate with other processes. It's sane from many perspectives: security, maintainability, flexibility and speed.




> Containers are not VMs

A container is whatever you want it to be. Single process? Sure. Full OS? Sure. Somewhere in between? Sure.

Containers are not new technology, and they were not invented by Docker or Linux. An artificially-constrained view of what a container is (or should be) that's driven by one tool's marketing (Docker) isn't helpful.


Sorry, but it's not only Docker using 'containers' that way. I'm no fan of systemd for various other reasons - but that is one thing it does correctly: use namespaces aka 'containers' to separate processes.

It simply makes no sense to add unnecessary overhead and complexity to something that is essentially very lightweight. If you want a full-blown OS, a VM is much better suited to that, and modern hypervisors come with a ton of bells and whistles to help you manage full-OS environments.


LXC uses containers in the same manner as VMs. There are still reasons to use a container over a VM - a big one is application density. There's a Canonical page about it I can dig up if you want, which claims you can get ~14 times the OS density with LXC containers compared with KVM VMs. That allows you to provide a high degree of separation while still letting you use more traditional tools to manage it.

Not everyone is of the caliber that tends to browse HN. Not everyone adapts to new technology as quickly as people around here tend to, especially if that new technology requires a huge upheaval in the way that things have been done for the last 10 or 15 years. Using containers the same way we do VMs provides a lot of the benefits of containers without requiring a drastic change from other departments.


Scalability of LXC vs. a hardware VM was written up by a Canonical engineer here:

https://insights.ubuntu.com/2015/06/11/how-many-containers-c...

I've had up to 512 nested LXC containers running quagga for BGP & OSPF to simulate "the internet". My machine is an i7 laptop and this used less than 8-10 GB of RAM to run.

FYI, the GitHub repo for "The Internet" setup is from the 2014 NSEC conference, where it was used so that participants had a large internet-routing simulation available for security testing.

The GitHub repo for "The Internet" simulation is here:

https://github.com/nsec/the-internet

"The Internet" creates 1 single LXC parent/master container and then 500+ Nested LXC containers each running quagga & setup for the simulation used.


Containers also have a massive attack surface in comparison with VMs. Modern KVM has a comparable density to containers (except for memory).

I agree on the advantages of LXC though. Many hosting companies use it. Why fix it if it ain't broken?


They're supposedly coming along quite nicely with the security of containers. Can you run docker containers in userspace? It's been a while since I did much with it; I know LXC can with a fair bit of customization. That would do a lot to help with security, and if you're following good containerization principles you should be able to set up a really finicky IDS that shuts down containers on even the slightest hint of a breach.

> Modern KVM has a comparable density to containers (except for memory)

It does, but the memory can make a big difference if you're running microservices. If I'm guesstimating, I'd say there's probably about a 200MB difference in memory usage between a good container image and a VM. With microservices that can grow quite a bit. Say 4 microservices, needing at least 2 instances of each for redundancy: you're already looking at a difference of 1.6GB of memory. If you need to massively scale those, that's .8GB of memory for every host you add, not including any efficiency gains from applications running on containers rather than VMs (which are going to be largely negligible unless we're talking massive scale).


You can create either privileged or unprivileged LXC containers. Creating unprivileged containers only requires a very simple configuration that takes about 60 seconds to do.

Here's Stephane Graber's blog on it: https://www.stgraber.org/2014/01/17/lxc-1-0-unprivileged-con...
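
Roughly, from memory of that post (the username, ID ranges and release here are examples, not exact values):

    # subordinate UID/GID ranges for your user (often already present on Ubuntu)
    echo "$USER:100000:65536" | sudo tee -a /etc/subuid /etc/subgid

    # allow the user to attach a few veth devices to the lxcbr0 bridge
    echo "$USER veth lxcbr0 10" | sudo tee -a /etc/lxc/lxc-usernet

    # map container root onto the unprivileged range in the per-user default config
    mkdir -p ~/.config/lxc
    echo "lxc.id_map = u 0 100000 65536" >> ~/.config/lxc/default.conf
    echo "lxc.id_map = g 0 100000 65536" >> ~/.config/lxc/default.conf

    # create and start an unprivileged container
    lxc-create -t download -n test1 -- -d ubuntu -r trusty -a amd64
    lxc-start -n test1 -d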

Also, note that with LXD/LXC the default container is now unprivileged. With LXD the command syntax is also simplified even more than it was with traditional LXC, with the added power of being able to orchestrate and manage LXC containers either remotely or locally.

https://linuxcontainers.org/lxd/getting-started-cli/


> Can you run docker containers in userspace?

Yes, and it increases the attack surface even more in some scenarios. Now, an unprivileged user can create new namespaces and do all sorts of things which were previously limited to root.

With "clear containers" (very minimal KVM VMs), you get the overhead down to <20MB:

https://lwn.net/Articles/644675/

Also, RAM is cheap.


Today you can run Docker in LXC and you can run KVM in an LXC container.

LXC also supports Nested LXC.

LXC 2.0 and LXD 1.0 are scheduled for release sometime around mid to late January.

This will also include support for live migration/CRIU.


LXC (www.linuxcontainers.org) supports AppArmor, SELinux and Seccomp, and - what's probably the only way of making a container actually safe - it has supported user namespaces since the LXC 1.0 release in 2014.


Yeah, that's cool, but my main point is that images which use the stable Debian package system and are actively maintained are a better approach than an image that relies on more obscure technology that could be abandoned - or, worse, maintaining your own container infrastructure.

> No, well-written software won't spawn zombie processes - sorry.

And yet it happens.

> The SSH server... Containers are not VMs. If you have to log in to a container running in production, you're doing something wrong

The SSH server is incredibly useful for diagnosing problems in production, so I for one applaud it (although it's not really necessary anymore with docker exec).
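
That is, instead of shipping sshd in every image, you can open a shell in a running container on demand ("webapp" is just a placeholder name):

    # one-off interactive shell inside a running container, no sshd required
    docker exec -it webapp /bin/bash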

> Cron - again - same thing: run in a separate container and give access to the exact things your cronjob needs.

Or just run it in-container to keep your service clusters together.

> That is for me the essential thing about containers: separate everything.

It's a question of degree. Where you draw the line is almost always a personal, aesthetic choice.


>And yet it happens.

I can understand that argument. It's an edge case, and for me the way to go is building a sane Dockerfile on top of Alpine that runs applications through s6 (or runit), which developers then use as a base for their applications. Isn't this what phusion baked in?

>The SSH server is incredibly useful [...] (although it's not really necessary anymore with docker exec).

It's an additional attack vector and, by your own admission, it's useless. docker exec has been baked into docker for over a year.

>Or just run [cron] in-container to keep your service clusters together.

Per-container cron sounds painful. Then you have to deal with keeping every container's system time in sync with the host (yes, they can deviate). Not only that, if you have a periodic cron job that runs an app to update some database value, scaling becomes bottlenecked and race conditions (and data races) can get introduced. You are prevented from running multiple instances of one application to alleviate load because the container has the side-effect of running some scheduled job. Cron should be separate.

One can also choose the degree to which they want to throw out good practices that prevent them from repeating others' mistakes.


Have you ever seen a container's system time deviate from a host? This makes sense with boot2docker since it runs in a VM but I can't think of a reason this would happen in a container.


Yes - timekeeping is up to the host kernel. The time can't deviate in the container.


>> No, well-written software won't spawn zombie processes - sorry.
> And yet it happens.

Strange - I have been running software in Docker in production for almost 2 years, on 6 Docker hosts running a ton of containers these days, and yes, a lot of this software spawns child processes.

In all this time I have never seen zombie processes, with one major exception: Phusion Passenger running our Redmine instance. If you run this under supervisord as the 'init' process, you indeed notice the init process cleaning up "zombie processes" at startup, like this:

    2015-12-24 01:00:32,273 CRIT reaped unknown pid 600)
    2015-12-24 01:00:34,774 CRIT reaped unknown pid 594)
    2015-12-24 01:00:35,802 CRIT reaped unknown pid 610)

So that case for me is the exception, and I do use an init process (supervisord) to run only Apache with Passenger. Note that using Apache with PHP, or plain Apache, does not leak zombie processes.


Some things you really can't split into one-process-per-container. Like how WAL-E needs to run alongside the Postgres daemon (or at least, I was unable to get it to run otherwise). You might argue you shouldn't run Postgres in a Docker container, but that's just one example of IPC you can't delegate to shared files / TCP ports.

The real problem with splitting things into a bunch of containers is that the story around container orchestration is still poor. Kubernetes is the leader here, but running a production-ready cluster takes some work (besides Google Container Engine, there are some nice turn-key solutions for spinning up a cluster on AWS, but they come with short-lived certificates and rigid CloudFormation scripts which create separate VPCs, so you have to set up your own PKI and tweak the CloudFormation scripts).


I see no reason why it couldn't run in a separate container. You'd probably have to mount the Postgres socket directory and the WAL archive dir into it, which could be tricky - true. But containers are just a tool. Some things are not suitable to run in containers - don't try to shoehorn everything into them.

Other than that, there's no problem running Postgres itself in a container - as long as your data is stored in a volume that ends up bind-mounted on the local disk, and not on the layered filesystem - otherwise performance will suffer badly.
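
A rough sketch of both points (the paths, image names and the WAL-E container are made up; named volumes need Docker 1.9+):

    # data directory bind-mounted from the host; socket directory shared via a named volume
    docker run -d --name pg \
      -v /srv/pgdata:/var/lib/postgresql/data \
      -v pgsock:/var/run/postgresql \
      postgres:9.4

    # a hypothetical WAL-E sidecar talking to Postgres over the shared socket directory
    docker run -d --name wale \
      -v pgsock:/var/run/postgresql \
      -v walarchive:/wal \
      wal-e-image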

And yes - orchestration, especially at small scale, is still a sore point. All the tools like Kubernetes seem to focus on large scale and on scaling multiple instances of the same containers - which is not what I and many people need. Something like docker-compose, but in daemon form, would be nice.


Personally, I've run into weird issues sharing sockets and other files that need to be read+write in both containers. One thing is that you have to set up the users very carefully/similarly in both containers, due to file ownership issues with bind mounts (UIDs have to align in both containers).

Agreed about not shoehorning things into containers. Redis, for instance, should be run with custom kernel parameters (transparent huge pages disabled), so it doesn't fit well in the container paradigm, since containers share the same kernel.


Agree in general, but you can overdo it with splitting services up. E.g. would you really run an extra container just for a cronjob that runs once a night to e-mail some data from a database? Especially if you run on a platform where you essentially pay per container, that seems like a waste.


Most of the things I described assume you have full control over your host's OS.

For stuff like you mention - maybe you should reconsider whether to use containers at all if you're on a pay-per-container platform? They are just a tool, and certainly don't fit every single use case. Also - paying per container seems like a silly thing to do, since containers can be very short-lived. Resource-based billing would be a better fit, although that could be tricky to measure, I guess.


I'm currently toying with IBM Bluemix (mostly because they have a relatively big free tier) and they have resource-based billing, but since you can't make containers arbitrarily small and you pay for the RAM reserved for a container, it is effectively per container. So even if you only need 1 GB for 30 min every night, you either build something that starts a worker container on schedule or you pay for resources you don't use 98% of the time. I guess other platforms are similar.

But of course, if you can afford to use that in production it probably doesn't matter very much, and you might choose a different platform if it bugs you. It just came to mind because I was wondering how to split stuff up.



