It used to be you ran everything on a real server. Then it moved to VMs in the cloud, so spawning and managing VMs became the new thing to do, with a large IaaS, PaaS, ...aaS industry around it.
The product would consist of one or more VMs, each possibly running multiple applications as separate processes.
Then I guess inter-dependencies between applications and OS versions got complicated, so the idea became that each application should run in its own lightweight VM (with LXC, at least as long as they share the same base OS kernel).
Isn't this just pushing the problem into managing dependencies between more little VMs while also constraining the architecture? It increases the difficulty of synchronizing the start, upgrade, and failure of 10 or 20 separate isolated applications, possibly all living on yet another guest VM (so conceptually two levels of virtualization). It means handling a more complicated network setup (bridging, firewall rules at multiple levels), and handling the effects of disk and other subsystems interacting with each other in strange, sometimes sub-optimal ways.
One idea was that, OK, this is good for security: one can build secure containers. But doesn't SELinux do that better? It even has a multi-level security mode. It sure is complicated, but it is used.
The maintainers of Docker appear to strongly believe that 1 application per container is the way to go.
At sufficient scale, they are pretty much correct. You grab Y bare metal hosts and spin up X docker containers for X processes.
> Isn't this just pushing the problems into managing dependencies between more little VMs while also constraining the architecture? It increases the difficulty of synchronizing 10 or 20 separate isolated applications start,upgrade,fail. Maybe all that living on yet another VM guest machine (so conceptually having 2 virtualization levels). Handling more complicated network setup (bridging, firewall rules at multiple levels). Handling effects of disk and other subsystem interacting with each other in strange, sometimes sub-optimal ways.
A handful of people advocate for immutable role-based containers [e.g. https://devopsu.com/blog/docker-misconceptions/] for that reason. In that use case, it's really replacing running something like Xen + Chef, KVM + Ansible, or whatever.
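A role-based container in that sense is just an image that bakes in everything one role needs, rebuilt and redeployed as a unit rather than mutated in place. A minimal sketch (the base image, package list, and paths here are made up for illustration):

```dockerfile
# Hypothetical "web" role: everything the role needs is baked into the
# image at build time, so the running container is immutable.
FROM debian:wheezy
RUN apt-get update && apt-get install -y nginx php5-fpm
COPY nginx.conf /etc/nginx/nginx.conf
COPY app/ /srv/app/
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
```

Upgrading then means building a new image and replacing containers, which is exactly the Chef/Ansible convergence work it displaces.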
You grab a host machine, you stand up your X containers with Z processes per container on Y hosts.
I think this is really equivalent to the Microservices vs. Normal SOA vs. Monolithic argument. Given sufficient scale/requirements, each makes sense. However, none of them are optimal for all situations.
Can you elaborate why they are correct with this view? What is inherently superior about 1 process per container? What is wrong with having two cooperating processes per container? And why should Docker containers only run on bare metal?
The "point" of Docker is to be a lightweight replacement for virtualization in some cases. Layering Docker on top of virtualization, while fine, adds layers rather than removing/replacing them.
As for running on bare metal, I don't think the implication is that they're always better, but that if your goal is huge scale, minimizing the number of layers is better for efficiency (though not necessarily better for other things; e.g. you may want to group/sandbox some containers in VMs for increased security).
At sufficient scale, horizontally scaling is complicated and involves implementing the things I avoid until I have no choice [Dynamic Service Discovery on a per-process basis, etc.]. I'm mostly deferring to stuff I've read about it since I haven't really worked at companies that are larger than $XX million a year.
It generally is things like the principle of least privilege, the fact that you already have to bite the bullet on managing your service discovery through yet another tool, etc.
Personally, I've never dealt with anything larger than 2-3 datacenter locations with a rack of equipment in each.
> And why should Docker containers only run on bare metal?
That you can just ignore.
I really just meant don't do:
bare metal -> Spin up Xen VM -> Spin up docker containers
Any of these are valid imo:
AWS -> Docker Containers
Bare metal -> Docker Containers
[Insert hosting option X] -> Docker Containers
To me, for my use, Docker is really a lightweight role-based VM I can put on a provisioned host of some kind [another VM, dedicated server]. In other words, it's a really simple way to deploy X identical instances of an entire service/application. If any component in that instance of the application is non-functional, the container is dead and you route requests to a different container.
The problem with the "one process per Docker instance" logic is that you need a great deal more service discovery logic as a result. There are a bunch of projects/methods to simplify this, but at the end of the day it adds complexity to operations. Instead of a single health check to monitor, you have X containers to monitor. You have X containers to discover, etc. Sure, this lets you get out of running Supervisor and SSH on a Docker container... but I think it adds a lot of application complexity that you can avoid 90% of the time.
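For reference, the multi-process container being described can be as small as one supervisord config run as the container's entrypoint (a sketch; program paths are illustrative):

```ini
; Hypothetical supervisord.conf, started via e.g. CMD ["supervisord", "-n"].
; Both processes live and die with the one container, so a single LB
; health check covers the whole instance.
[supervisord]
nodaemon=true

[program:php-fpm]
command=/usr/sbin/php5-fpm --nodaemonize
autorestart=true

[program:nginx]
command=/usr/sbin/nginx -g "daemon off;"
autorestart=true
```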
SOA makes sense when you have a large deployment, but if you are deploying to a cluster of 5 machines... it's overkill. Many [likely the majority] of projects are at the scale of a basic cluster for redundancy.
Every instance of the app has static locations for every service it talks to, so it doesn't need to care where they live. Every docker container has a health check for the load balancer and registers itself with the load balancer on startup.
I don't run something like Cassandra or MySQL in docker.
Yes, but managing multiple containers is more effort than 1.
The main practical difference vs. running them all in one vm is explicitly documenting dependencies - both which service relies on which package sets etc. (assuming you're strict about what you put in the Dockerfiles) and which part of your system needs access to which other component or needs to share access to which directories.
You can also easily start them e.g. via systemd unit files or via a tool like fleet, and pull the ip/port of dependencies on startup of the container and pass them as arguments. This also makes deploying them as a unit easy, but still decouples it:
If you need to move one component to a different server, or suddenly realize your foobar server needs lots of resources and want to load-balance it across multiple machines, you can easily do so by forwarding a port via stunnel or load-balancing via haproxy or what have you without affecting the actual containers.
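A systemd unit along those lines might look like this (the unit name, container name, and database endpoint are made up; the dependency's address is passed in at start):

```ini
# Hypothetical /etc/systemd/system/foobar.service: start the container
# after its database dependency, passing the DB endpoint as environment.
[Unit]
Description=foobar app container
Requires=docker.service postgres.service
After=docker.service postgres.service

[Service]
# "-" prefix: ignore failure if no stale container exists
ExecStartPre=-/usr/bin/docker rm -f foobar
ExecStart=/usr/bin/docker run --rm --name foobar \
    -e DB_HOST=10.0.0.5 -e DB_PORT=5432 example/foobar
ExecStop=/usr/bin/docker stop foobar
Restart=always

[Install]
WantedBy=multi-user.target
```

If the database later moves, only the environment values change; the container image is untouched.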
As for your health checks, you can still have a single all-encompassing health check if you want - whether it's in one "vm" or a set of separate containers does not really change that in any way, though the bigger the setup, the more additional checks you'll probably want to add. You have X containers to monitor, but you had X server processes to monitor previously. If your use case made it ok to depend on just e.g. hitting a page on a website that depends on all X services being up, then your single old monitor still does the job.
The overhead can be made really minimal and beneficial even for a single server setup, yet help you if/when you suddenly want to scale part of it. Personally, I run about a dozen docker containers on my home server so far, and I expect that to increase several times over as e.g. every web app I experiment with ends up running in its own container, with the docker setup in a git repo as documentation of exactly which project caused me to pull in which additional packages etc., and keeping the host itself as pristine as possible. Makes me a lot less nervous about distribution upgrades etc.
Load Balancer w/ Builtin HTTP Health Checks <-> Application Containers [nginx+php-fpm]
Load Balancer w/ Builtin HTTP Health Checks <-> Container [nginx] <-> Container [php-fpm] <-> Additional health check you have to build out to check that php-fpm is working correctly <-> Additional Tooling to make your process work
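The built-in HTTP health check in those diagrams is the kind of thing haproxy gives you almost for free (a sketch; the backend name, path, and addresses are made up):

```haproxy
# Hypothetical haproxy backend: one HTTP check per app container;
# dead containers drop out of rotation automatically.
backend app
    option httpchk GET /health
    server app1 10.0.0.11:80 check
    server app2 10.0.0.12:80 check
```

The extra checks in the second diagram are whatever you bolt on beyond this, which is the added tooling being objected to.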
For every single service in your architecture you are adding an additional level of redundant complexity because it "helps you scale".
I can spin up as many app containers as I want, as often as I want without a second thought or a moment's consideration. I don't care if I have 1000 of them or 1, as long as the load balancer can handle the connections.
The scale you are talking about is when you need more than just a couple load balancers...and you aren't going to need that for 90% of projects.
As someone constantly wearing my "ops hat", I much prefer to see them treated separately for that reason.
> There are two tricky parts though: getting both underlying processes to log to the container's stdout/stderr, and getting them both to shut down cleanly in response to a SIGTERM to the supervisor.
I was simplifying it for purposes of example. The reality is I'd have a log-shipper shipping logs as well. I don't need it to dump to the container's stdout/stderr because it's all in [insert log management application here].
They shut down cleanly enough as-is without worrying about it too much.
If you have a health check that checks that php-fpm works, it should need no or minimal adaptation to handle the new scenario. The only "new" thing is that the processes have a different parent process than usual and are running in different namespaces. If you didn't care about health checks for php-fpm before, there's no reason to care about them now.
> For every single service in your architecture you are adding an additional level of redundant complexity because it "helps you scale".
You see it as adding complexity, I see it as reducing complexity: By wrapping each bit in a Docker container, I gain:
- Explicit documentation of the dependencies necessary for each component. I know it covers at least what is needed, because otherwise the component will fail; as opposed to prose documentation, which easily becomes obsolete.
- Each container is limited in what it can access, ensuring that interdependencies between the containers are documented and explicit.
- Versioning of e.g. system packages can evolve separately for each component, without needing separate (heavier) full vms.
> I can spin up as many app containers as I want, as often as I want without a second thought or a moment's consideration
That's great, as long as all components of the app use sufficiently few resources, and are easy enough to run in parallel. E.g. have fun if one of the components of this container is a Postgres install and all the instances need a consistent view of the database. Or if one of the components needs tons of memory and you end up running out of memory even when there was no need to run more of that component in parallel.
For some apps that's not an issue. For others it is, and even starting a second instance of the full stack becomes a huge waste of resources or a huge added complexity vs. starting more of a specific component.
> The scale you are talking about is when you need more than just a couple load balancers...and you aren't going to need that for 90% of projects.
The scale I'm talking about is anything from a single, small home server, where you may want to spin up more web servers for experimentation but not need more than a single database server, to large clusters. I use Docker for 100% of my small home projects because it makes life easier and saves me time.
opendais-www1 -> fails -> shoot in head, replace.
vidarh-www1 -> fails -> shoot all of the containers in the head, replace.
At this point, it's the same atomic unit I originally started with. Now I have multiple executions to perform that I have to build out. This is the very definition of wasteful added complexity: the high-level process is effectively identical, but you have to do more things to make it work.
> - Explicit documentation of the dependencies necessary for each components. I know it covers at least what is needed, because otherwise the component will fail; as opposed to documentation which easily becomes obsolete.
My explicit documentation also exists in the form of a Dockerfile and my configuration files. It's a self-documenting process.
> - Each container is limited in what it can access, ensuring that interdependencies between the containers are documented and explicit.
How is my container not limited exactly? Because nginx and php-fpm can access the same files? o.O I don't think this is really an argument of any kind beyond the theoretical.
> Versioning of e.g. system packages can evolve separately for each component, without needing separate (heavier) full vms.
You've never seen this go wrong in production, I take it, when Person X updates a component that breaks Person Y's application? I have, and it is not pretty. This is added complexity to manage.
> For some apps that's not an issue. For others it is, and even starting a second instance of the full stack becomes a huge waste of resources or a huge added complexity vs. starting more of a specific component.
The reason I said "To me, for my use," in the original parent comment is this is for my usage. I don't claim to speak for everyone who might use Docker. I'm sure there are people with different requirements. My requirements involve not building the sort of products you mention in the second group.
> E.g. have fun if one of the components of this container is a Postgres install and all the instances needs a consistent view of the database.
If you are storing state in an application container, it isn't a role-based container. I'm going to assume this is because I said something that was unclear.
Please explain to me how you concluded I stored state in this container:
> The scale I'm talking about is anything from a single, small home server, where you may want to spin up more web servers for experimentation but not need more than a single database server, to large clusters. I use Docker for 100% of my small home projects because it makes life easier and saves me time.
It doesn't save me time and adds complexity. Maybe everyone who is arguing with me is just a better developer than me. Maybe they are just people who have never worked in an environment where every added layer of complexity leads to a higher frequency of breakage because Random Team Member X forgot something.
You have a handful of lines of dependencies to document, for example using systemd unit files. You need to do the basics once, and then reuse. What you end up with are simpler containers (no separate process monitor per container etc.) at the cost of a couple of lines of extra configuration that also happens to document a dependency. I don't see that as a net increase in complexity.
> My explicit documentation also exists in the form of a Dockerfile and my configuration files. Its a self documenting process.
It documents the dependencies of the whole container as a unit. It does not without a lot of extra effort enforce capturing the interdependencies between multiple components in the same container. This is why I like to keep the containers as small as possible.
As long as you're talking about a couple of components with few interdependencies, that's not a big deal.
But once you start piling on additional components, the complexity explodes. I once worked on a project where we were considering running round-the-clock network capture for a few days to reverse engineer the interdependencies between various components, because nobody was quite sure exactly what they were any more. That's the kind of scenario that drives me to want to encapsulate everything as tightly as possible even if I don't need it right now; complexity has an ugly tendency to creep up on us.
> How is my container not limited exactly? Because nginx and php-fpm can access the same files? o.O I don't think this is really an argument of any kind beyond the theoretical.
As long as nginx and php-fpm are the only things installed, things remain fairly simple (see below, it seems we've had a misunderstanding). Based on that specific example, I agree it's a value judgement and not a big deal either way.
> > Versioning of e.g. system packages can evolve separately for each component, without needing separate (heavier) full vms.
> You've never seen this go wrong before in production I take it when Person X updates a component that breaks Person Y's application I take it? I have and it is not pretty. This is added complexity to manage.
I've seen that many, many times. This is complexity that largely goes away when you enforce a fine-grained split into containers that are built and tested as a full unit. It's exactly because I've managed systems with 100+ VMs where people have had conflicting needs for upgrades that I love being able to encapsulate each component separately, so the developers get the freedom to make decisions about even core system packages without stepping on each other's toes (as long as the developers working on a specific component can agree, anyway...)
> If you are storing state in an application container, it isn't a role-based container. I'm going to assume this is because I said something that was unclear.
I store the state in separate volumes, but the database server process still needs to run somewhere. But it does seem that we've been talking past each other. My impression was that you were advocating keeping the full stack in a container:
> Please explain to me how you concluded I stored state in this container
I saw your nginx+php-fpm as just one example since you earlier had written "In other words, its a really simple way to deploy X identical instances of an entire service/application." To me it's not "an entire service/application" unless you also include all the auxiliary processes, including database server and anything else the application depends on.
I was making a general point about putting an application stack in a single container, not specifically addressing your nginx+php-fpm example. Since I clearly misunderstood, we disagree much less than I thought. I'd still opt for separating php-fpm and nginx into separate containers too, but as I said above, that's a value judgement.
To me that is what a role based container is. Admittedly, sometimes I package an entire one-off service inside a container to make it easy to do dev work...but not in production.
> I store the state in separate volumes, but the database server process still needs to run somewhere. But it does seem that we've been talking past each other. My impression was that you were advocating keeping the full stack in a container:
Okay, so we'll chalk this one up to my shitty communication skills. :)
> I've seen that many, many times. This is complexity that largely goes away when you enforce a fine grained split into containers that are built and tested as a full unit. It's exactly because I've managed systems with 100+ VM's where people have had conflicting needs for upgrades that I love being able to encapsulate each component separately, so the developers gets the freedom to make decisions about even core system packages without stepping on each others toes (as long as the developers working on a specific component can agree, anyway...)
I think we'll have to chalk it up to different experiences and meaning different things when we write stuff down.
I see that problem happening when you do what you are doing and enable people to make those special snowflake VMs because of conflicting requirements. They technically integrate correctly until you hit an edge-case bug that was fixed in one and not the other.
It makes perfect sense to use industry standard, encrypted communications with proven cryptography when creating vast numbers of systems. Yes, you don't have to. No, that doesn't mean it's a bad idea.
I'm not sure why things would be any different with docker, and after reading the article I'm unconvinced. If you want to be locked in to docker's APIs, so be it. If you want to be free of them and integrate in other ways, such as proven, portable, secure methods like SSH, that's fine too.
If a management task as simple as deploying a key is so hard in docker (couldn't you just bind-mount a read-only .ssh dir?), maybe you should consider alternative methods of container instantiation.
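For what it's worth, the read-only bind mount is a one-flag affair. A sketch, wrapped in a function so the arguments are visible (the function name, key path, and image name are all made up for illustration):

```shell
# Run a container with SSH keys mounted read-only at run time,
# instead of baking the keys into the image.
run_with_keys() {
  # $1 = host directory holding the keys, $2 = image to run
  docker run -d -v "$1:/root/.ssh:ro" "$2"
}
```

Rotating a key is then a matter of changing the host directory and restarting the container; no image rebuild needed.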
I don't see how granting access to the host is a cleaner architecture... from a security standpoint, it seems the opposite.
If you are concerned about granting access to the host, you should be concerned about granting access to a Docker container, especially as root, until it's far more battle tested.
And if you are concerned about this, there's a simple solution: group your Docker containers in KVM VMs.
If you are not concerned about this, you can easily do as he suggested, and force "nsenter" on login. Or you can do both: group your containers in a KVM VM and force ssh into the KVM guest to use nsenter into the appropriate containers.
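Forcing nsenter on login is standard authorized_keys machinery; something like the following, where the container name "web1" and the truncated key are placeholders:

```text
# Hypothetical ~/.ssh/authorized_keys entry on the host (or KVM guest):
# every login with this key lands inside the container's namespaces.
command="nsenter --target $(docker inspect --format '{{.State.Pid}}' web1) --mount --uts --ipc --net --pid" ssh-rsa AAAA... admin@example
```

The command substitution runs at login time, so it picks up the container's current pid even after restarts.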
Part of the point is that it makes little sense to add to the complexity and attack surface by adding a process monitor per container (to run both sshd and the app) and an sshd per container, even if what you add provides "industry standard, encrypted communications with proven cryptography", when it is not actually needed inside the containers.
You're also not getting "locked into" Docker's API much, if at all. What he is proposing depends on two things: a way of executing a command that can access a certain part of the container's filesystem, and a way of entering the appropriate namespaces.
Both are easily abstracted behind tiny (1-2 line) scripts where you can replace "docker run" and "nsenter" with ssh if your needs change. And one (nsenter) is entirely independent of Docker - it will work with e.g. lxc and anything else that makes use of cgroups as well.
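Such a wrapper really is tiny. A sketch (the function name denter is made up; it assumes docker and util-linux's nsenter exist on the host):

```shell
# Enter a running container's namespaces by container name.
# Swap the body for e.g. `ssh "$1" "${2:-/bin/sh}"` if needs change.
denter() {
  pid=$(docker inspect --format '{{.State.Pid}}' "$1") || return 1
  nsenter --target "$pid" --mount --uts --ipc --net --pid -- "${2:-/bin/sh}"
}
```

Usage would be e.g. `denter web1` for a shell, or `denter web1 /bin/bash` to pick the command.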
Everyone should be.
There's a simple solution: Group your Docker containers in KVM VMs.
I consider that an ugly hack that should be avoided at all costs. Why? It's not viable without up-front automation investment, ongoing maintenance overheads, and additional latency and inefficiency, for starters. It's also probably less portable, particularly to embedded systems.
it makes little sense to add to the complexity and attack surface to add a process monitor per container and an sshd per container
Process monitor? What are you talking about? At least with LXC, each container has PID 1 (master application process, init system, whatever) and can be stopped/started easily with lxc-stop -n nameofcontainer.
As for sshd, my point was that it can make sense because it's secure, remote-accessible, proven. The article premise was that it's a bad idea, but none of its arguments hold, to my mind.
Both are easily abstracted behind tiny (1-2 line) scripts...
That aren't remotely accessible, without breaking the abstraction and creating some shared access scenario on the host. The entire point of an abstraction is to be clean. It's not clean if you have to adopt hacky workarounds and dump all of your existing tools in order to play with it.
And in that case, running sshd in the Docker container is not a good idea. There has not been sufficient work on hardening Docker to ensure that there is no way of breaking out of Docker containers.
> I consider that an ugly hack that should be avoided at all costs. Why? It's not viable without up-front automation investment,
There are multiple automation tools available that can handle KVM just fine, including OpenStack. And if you're going to be deploying enough Docker containers for you to be concerned about KVM automation, then you need to invest in automation anyway, for Docker.
But what is your alternative means of sandboxing the apps if you are concerned about users gaining access to the host? Outside of other VM solutions like Xen etc.? OpenVz? No matter which sandboxing method you choose, you're either trading off security or picking complexity + additional latency and inefficiency.
> Process monitor? What are you talking about? At least with LXC, each container has PID 1 (master application process, init system, whatever) and can be stopped/started easily with lxc-stop -n nameofcontainer.
Yes, and if you want to stuff an sshd inside an lxc or docker container alongside your application process, that means you will typically need to use init, daemontools, mon, or a similar tool as pid 1 in the container, responsible for spawning the sshd and the app, instead of just letting the application itself be the "local" pid 1. Which means you don't typically end up just adding sshd, but dragging in additional dependencies as well.
> As for sshd, my point was that it can make sense because it's secure, remote-accessible, proven
That doesn't change if it's run on the host, and you force "nsenter". But it also does not protect you against the potential for escaping the containers. If the "secure" part here matters for you, you should be considering a full VM (optionally with Docker inside, but not just Docker) until Docker has substantially more testing behind it.
> That aren't remotely accessible, without breaking the abstraction and creating some shared access scenario on the host.
First of all, restricting remote access is a good thing. Part of the point is that with a properly developed container image coupled with the proper tools, you should not need to access individual containers remotely.
Opening up for remote access is not just a security problem, but a massive configuration management hassle.
The number of problems that gets created, or not properly fixed, because people get sloppy when they have direct access to poke around production servers / vms / containers is one of the biggest ops headaches I've had the misfortune of dealing with over the last 20 years. Yes, you may occasionally need to poke around to trouble-shoot, and so having some means of gaining access is still necessary, but as the article points out: You still do.
(And again, if you have problems with granting access to the host, you should not trust Docker without containing it in a VM)
So in other words: They are just as remotely accessible as the same apps would be outside of the containers. And if your proposed alternative is to replace this:
- sshd + a single binary that'll soon be in pretty much all distros.
... then we don't have at all the same view of what complexity is.
what is your alternative means of sandboxing the apps
Well, I don't have a finished solution, but I could already write a fairly hefty book on the approaches evaluated here. Basically, combining:
- the normal kernel tools (aggressive capabilities restrictions, connectivity restrictions, read-only root, device restrictions, mount restrictions (no /sys for example), subsystem-specific resource limits)
- a formal release process (testing, versioning, multi-party signoff)
- additional layers of protection (host-level firewalling, VLAN segregation, network infrastructure firewalling, IO bandwidth limitation, diskless nodes, unique filesystem per guest, unique UIDs/GIDs per guest, aggressive service responsiveness monitoring with STONITH and failover, mandating the use of hardened toolchains)
- kernel hardening and use of security toolkit features (syscall whitelist policy development automation via learning modes + test suite execution, etc.)
Fail-safe? No. Better than most? Probably. Security is a process, after all...
That doesn't change if it's run on the host
It does, because you're now contending for host resources, and you have to comprehend an abstract concept of hosts and guests and their identities living remotely, i.e. normal tools, which assume node-per-address, won't work out of the box.
neither a "free" solution in terms of complexity
You have highlighted a tiny difference in the process space, which is basically free. But in doing so, you have ignored the other aspects. A single read-only bind mount per guest is very cheap in complexity terms.
From the Docker Host itself, if you need to manage the state of a container, the intuition is that you need to go into the container (with SSH) in order to do so. But by externalizing your state, you can manage it without the need to enter the container. Assuming your Docker Host is secure, this doesn't make anything less secure just because you're no longer abusing SSHd in order to manage your application's state.
In the case you need to gdb, or strace the process, you can do that from the Docker Host with nsenter. Assuming your Docker Host is secure, you no longer need to abuse SSHd to carry out a debugging task that has nothing to do with needing a secure shell.
Neither of these use-cases have anything to do with the security of SSH.
In the case that you need to do these things from a remote host, the prescribed answer is indeed SSHd to access the Docker Host, at which point you switch to the previously suggested methods for managing state.
"I don't see how granting access to the host is a cleaner architecture... from a security standpoint, it seems the opposite."
Because now you only have to worry about one security layer instead of N security layers for each container you run. The security layer is now actually coupled to the act of granting access to the host, its intended purpose, versus granting access to a container just so you can manage its state or debug it or whatever.
As far as being locked into Docker's APIs, I totally miss the aim of this remark. Volumes are just paths on the filesystem. If you're talking about the interoperability of standard tools to manage your state, I don't think they will have problems in this case.
Yes, you missed the point. Please read the other response to comprehend the difference.
Tbh I'd pay more attention to host key regeneration! Easily overlooked.
Until there's better investigation tooling available, sshd seems a fairly sensible approach.
MAC / SELinux prevents many other poke-inside techniques on enterprise platforms
There is an image by Phusion called baseimage-docker which adds SSHd, init, syslog, and cron in an attempt to make Docker containers more like lightweight VMs. But in the #docker channel, I've seen people have issues with it. For example, one person had some /etc/init.d scripts that wouldn't start up (other ones started up fine). Turns out that one of the signals that init was waiting on to start that script was never getting sent (I think it was networking coming online?), and that was just a side effect of how Docker works that couldn't easily be fixed. The Docker maintainers in the channel discouraged using this image for these reasons.
See also my other comment which clarifies baseimage-docker's position: https://news.ycombinator.com/item?id=7951042
But about that pets vs. cattle analogy: there are plenty of SSH-based tools for large-scale management of servers. And it's still the best file-copy tool available on any system (rsync is not a separate option here, because rsync is only good when it runs over ssh).
But I think using ssh for large-scale management of servers largely misses the main benefits of Docker. One of the nice uses of Docker is to replace tools that use ssh to try to replicate VM states in ways that are often hard to make 100% reproducible, with something that is entirely reproducible because it replicates the state exactly.
When you build a docker image, push it to an index, and deploy it on your test system, and then later on the production system, you know the container remains identical.
I'm fairly sure every cloud provider worth talking about has an image-based deployment system. Even an ESXi box, the sort of which exists in many small offices, does.
But most of these solutions are far more heavy-duty. I've shipped enough VM images all over the place to learn to hate the overheads compared to the lean-ness of a typical Docker deploy.
It is a good/valid focus for them. However, it is not for everyone.
Single process is a single process. [e.g. Nginx]
I think you've misunderstood what I meant.
Stateless single-purpose containers are good.
One process per container is bad because it forces you into service discovery before you actually need it.
Furthermore, you can easily set all of the containers to share the same networking namespace so they can all just listen locally if you wanted a turn key solution. The pretty trivial issue of single-host service discovery is not a very strong argument against the benefits of single process containers in my experience.
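For example (image and container names here are hypothetical), joining a second container to the first one's network namespace looks like:

```shell
# Run the main app, then attach a helper container to its network namespace.
# Both containers then share 127.0.0.1, so they can talk over localhost
# without any cross-container service discovery. Names are hypothetical.
docker run -d --name app myapp-image
docker run -d --name worker --net=container:app worker-image
```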
1) Multiple docker containers to spin up and be managed.
2) Multiple health checks.
3) Set additional flags/do additional config for Docker.
The fact people consistently say "Eh, this is a non-issue" is great. It means you are much luckier and more skilled than I since you can manage all of that with 0 additional effort.
For me, all of this is effort I don't need to expend.
If you are logging in to debug, you can log into the host.
Then when I start playing with a new application, I set up a new docker image for it, and have that docker image bind to a suitable subdirectory that is then shared with the "dev" image I ssh into.
This means I always have 100% up to date documentation of the requirements for running the app, in the form of the docker file.
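Sketched with hypothetical paths and image names, that setup is just two bind mounts pointing at the same host directory:

```shell
# The long-lived "dev" container I ssh into sees all the app directories;
# each app container is bound only to its own subdirectory.
# Paths and image names are hypothetical.
docker run -d --name dev -v /srv/apps:/apps dev-image
docker run -d --name myapp -v /srv/apps/myapp:/app myapp-image
```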
I don't see a problem with running sshd in a Docker container. I do have a bit of an issue with "throwing a bunch of stuff" in the containers. If your purpose for having ssh there is to be able to investigate things, then I'd argue you don't really need it in most cases:
- You can copy files out of the containers with "docker cp"
- All the processes are visible and traceable from the host
- You can choose to export the important bits of the container's filesystem, so that you can start a second container to inspect those bits more thoroughly (this can also be used to great effect to allow apps that need to work on the same file sets to be isolated in separate containers).
- During dev/testing you can easily override the entrypoint and start a shell.
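For a container named "myapp" (the name and paths are hypothetical), those options look roughly like:

```shell
# Copy files out of the container to the host:
docker cp myapp:/var/log/app.log .

# Start a second container that sees the first one's exported volumes:
docker run -t -i --volumes-from myapp ubuntu /bin/bash

# During dev/testing, override the entrypoint and get a shell instead:
docker run -t -i --entrypoint /bin/bash myapp-image
```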
In practice I've found that while introspection of various forms is important during testing and development, there's also a certain benefit in not giving yourself direct access to easily modify stuff: It makes it easier to maintain discipline and update the Dockerfile etc. and rebuild containers.
If you also use e.g. systemd to run the containers, and log to the console, you get your logs concentrated via journald and optionally shipped on to rsyslog etc., and combined with the docker tools and careful use of exported volumes, there's very little state left where there's much need to ssh in to a specific container. There's also a tool (forget the name; EDIT: D'oh, it's "nsenter" that's presented in the article I'm thinking about) that lets you run a process inside the same namespaces and cgroups as your container.
If anything, I'm leaning towards trimming the containers further, down to the bare essentials for each specific app.
I had a pleasant conversation with Jerome quite a while ago about SSH and what the "right" way is to login to a Docker container. We were not able to find consensus, but Jerome is a brilliant guy and his reasons were sound. For some time, I considered using lxc-attach to replace the role of SSH. Unfortunately, a few weeks later, Docker 0.9 came out and no longer used LXC as the default backend, and so suddenly lxc-attach stopped working. We decided to stick with SSH until there's a better way. Solomon Hykes told us that they have plans to introduce an lxc-attach-like tool in Docker core. Unfortunately, as of Docker 1.0.1, this feature still hasn't arrived.
Now, Jerome is advocating nsenter. There is currently an ongoing discussion on the baseimage-docker bug tracker about replacing SSH with nsenter: https://github.com/phusion/baseimage-docker/issues/102
But leaving all of that aside, we regularly get told by people that Baseimage-docker "misses the point" of Docker. But what is the point of Docker? Some people, including Jerome, believe it's all about microservices and running one process in a container.
We take a more balanced, nuanced view. We believe that Docker should be regarded as a flexible tool that can be molded into whatever you want. You can make single-process microservices, if you want to and if you believe that's the right choice for you. Or you can choose to make multi-process microservices, if that makes sense. Or you can choose to treat Docker like a lightweight VM. We believe that all of those choices are correct. We don't believe that one should ONLY use Docker to build microservices, especially because Microservices Are Not A Free Lunch (http://highscalability.com/blog/2014/4/8/microservices-not-a...).
Baseimage-docker is about enabling users to do whatever they want to. It's about choice. It's not about cargo-culting everything into a single philosophy. This is why Baseimage-docker is extremely small and minimalist (only 6 MB of memory overhead), flexible and thoroughly documented. Baseimage-docker is not about advocating treating Docker as a heavyweight VM.
I do have criticism for your communication around that base image, starting with the link-bait blog post "you're using Docker wrong". Your message is that anybody not using Docker your way (full-blown init process, sshd, embedded syslog) is doing it wrong. That is not only incorrect, it contradicts Docker's philosophy of allowing and supporting more than one usage pattern.
My other criticism is that you point out a known Docker bug (the pid1 issue) and use it as a selling point for your image, without concerning yourself with reporting the bug let alone contributing to a fix. Meanwhile many people hit the same pid1 bug and have reported, suggested possible fixes, or contributed code to help implement that fix. If you want to be taken seriously in the Docker community, my recommendation is that you consider doing the same.
Far be it from me to tell you how you should run your own project, but it seems to me that if Docker is going to live up to the shipping container metaphor, then it needs to be at least somewhat opinionated. In particular, you've previously explained that Docker is supposed to provide a standard way of separating concerns between development and operations. If this is going to work in practice, then it seems to me that there needs to be agreement on conventions like:
* Logs go to stdout/stderr, not to the container filesystem or even a volume.
* Configuration settings are provided on container startup through environment variables.
* Related to the above, occasional configuration changes are made by starting a new container with new variables, not by editing a config file inside the existing container.
* The container's main process needs to cleanly shut down the main service in response to SIGTERM.
* No SSH in the container, unless the container is providing an SSH-based service, e.g. a gitolite container.
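The environment-variable conventions in particular can be sketched concretely (the image name and variables here are hypothetical):

```shell
# Configuration comes in as environment variables at container startup...
docker run -d --name app -e DB_HOST=10.0.0.5 -e LOG_LEVEL=info myapp-image

# ...and a "configuration change" means replacing the container with a new
# one carrying new variables, not editing files inside the running one:
docker stop app
docker rm app
docker run -d --name app -e DB_HOST=10.0.0.5 -e LOG_LEVEL=debug myapp-image
```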
So if I'm right about what the conventions are or should be, then Phusion's base image is indeed misguided.
- In Baseimage-docker, Runit is configured to have all services log to stdout/stderr. In Passenger-docker, the Nginx error logs are redirected to stdout/stderr. We actively encourage services to log to stdout/stderr.
- Baseimage-docker provides easy mechanisms for allowing multiple processes to access the environment variables that were passed to the container.
- Baseimage-docker's custom init process was designed precisely to allow graceful termination through SIGTERM. It even terminates all processes in the container upon receiving SIGTERM.
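A minimal init along those lines can be sketched in a few lines of shell; this is not Baseimage-docker's actual init, just an illustration (with `sleep 30` standing in for the real service):

```shell
#!/bin/sh
# Sketch of a minimal container init: forward SIGTERM to the service so it
# can shut down cleanly, and let `wait` reap any children.
# "sleep 30" is a stand-in for the real service process.
trap 'kill -TERM "$child" 2>/dev/null' TERM INT

sleep 30 &
child=$!

wait "$child"
```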
Baseimage-docker does not mean that the Docker conventions are thrown out of the door.
1. Your Unix system is wrong unless it conforms to certain technical requirements.
2. Explanation of the requirements.
3. One possible solution that satisfies these requirements: Baseimage-docker.
4. Does your image already satisfy the requirements? Great. If not, you can implement these requirements yourself, but why bother when you can grab Baseimage-docker? And oh, it happens to contain some extra stuff that is not strictly necessary, but that lots of people find useful.
As you can see, such a complicated message becomes waaay too long and hard to explain to most people. It probably only makes sense if you've contributed to the Linux kernel, or read an operating systems book. If I explained it in a way that's too technical and nuanced, 99% of the people would fall asleep after reading 1 paragraph. So the message was simplified. I apologize if the simplified message has offended you, and I am continuing to fine-tune the message.
As for the PID 1 issue: I genuinely thought you guys didn't include a PID 1 on purpose, because running one isn't that hard. Last time I talked to Jerome, he had the opinion that, if software couldn't deal with zombie processes existing on the system, it's a bug in the software. With that response in mind, I thought that the Docker team did not recognize the PID 1 issue as really an issue. So please do not mistake the lack of a bug report for malice.
Later on, you told me that you guys are working on this, and I was glad to hear that.
I get the feeling that you feel bitter about the fact that I chose to write Baseimage-docker instead of contributing a PID 1 to Docker. Please understand that I did not do this out of any adversarial intentions. My Go skills are minimal and I am busy enough with other stuff. This, combined with the fact that at the time I thought the PID 1 issue was simply not recognized, led to me write Baseimage-docker. I would like to stress that I look forward to friendly relationships with you, with the Docker team and with the community.
The only occurrence of the word wrong in the whole post is in the sentence "What might be wrong with it?". That sounds more like healthy criticism than 'incorrect' contradiction of Docker's philosophy to me.
About the pid1 thing, I do not think Foobarwidget saw that as a Docker bug, but as a bug of Docker containers. Doesn't it make sense to release a Docker container with a proper init process then?
EDIT: I do really like the separation of concerns and modularity that are brought about by the approach advanced in the article. But I would argue that the arguments against SSH apply many times more strongly to its alternatives: security upgrades? You're going to have to do those much more often with whatever you use to replace SSH. SSH has a proven track record for security and authentication, it's well known, lightweight, and generally doesn't break on its own.
I've now introduced and put docker-based infrastructure projects into production environments at 2 different companies, and IMO having sshd in the containers has made it much easier and familiar for techops/devops teams to get started with docker.
Docker-attach is a much more limited solution, and I think that introducing another tool like nsenter is a non-starter since it just adds more complexity with additional tooling and dependencies. Another tool when ssh works? The additional cpu/ram use isn't a big deal, and for security as long as I secure sshd and my keys/password properly (not storing them in my image, for example...), no worries.
Docker logging is also limited compared to tried and true linux logging utils.
Docker process supervision is still a bit immature and unreliable. I'll keep trying the built-in solutions, but I have everything working fine now without needing to wait for subsequent Docker releases.
Docker is a really convenient wrapper around a bunch of standard Linux tools, and IMO that has been its power. The weaknesses in Docker have been where it tries to build its own replacement for existing and mature solutions (logging, supervision, networking, etc.).
A lot of the functionality of libcontainer, libchan, libswarm seem to be done by existing tools. Why reinvent the wheel? Are the existing project maintainers unwilling to take pull requests?
I don't buy that article at all. There are a lot of strawmen there, where things are suddenly needed for microservices while they apparently aren't when running the exact same things inside a single VM.
You can architect a microservices based system in ways that add operational complexity, but if you do it should be because there are substantial benefits to be had that way.
But you can equally well take that monolithic VM (which seems to be the alternative the article assumes), run Docker inside it, and run each service in a Docker container inside it, and still start to realise substantial benefits; not least because it makes dependencies much more explicit, allows each service's software dependencies to evolve separately, and makes it easier to grow out of the single VM as and when needed.
I agree you don't have to use Docker only that way, but the more I've played with Docker the finer-grained I end up making things...
I'm not sure I understand your point. My article concludes that MicroServices do indeed bring substantial benefits on a longer time horizon. However, they undeniably add operational overhead, because your monolithic app explodes in terms of number of processes.
This is true whether you deploy to 1 virtual machine or 100. It's still 100 distinct processes that tend to communicate asynchronously.
The article doesn't mention the word Docker, but we actually subsequently found that Docker was the missing link and the thing that made MicroServices viable. When your abstraction layer becomes the container then the operational complexity is tamed.
Edit - Here are two articles I also wrote which describe how Docker enables MicroServices if you didn't catch them. These came after the No Free Lunch article. MicroServices with Docker, particularly polyglot MicroServices, would be painful beyond belief without Docker.
There's no reason it has to, is my point.
You may have a point in instances where you start with an application that is actually already wrapping a ton of unrelated functionality together in a single process - I didn't really think about it in terms of that scenario. But then I'd argue that an increase in the number of processes will be an operational godsend over having to try to track down problems in a monolithic mess. And even then you're not forced into splitting things up into tons of little pieces in one go.
The scenario I was thinking of, on the basis of the discussion here regarding baseimage-docker, is splitting up services that consist of a bunch of interrelated processes. E.g. Nginx + Apache + Memcached + the actual application + various cron jobs + Postgres is a typical example. Starting by splitting that up into separate Docker containers for each existing process group doesn't introduce any new application processes.
You can then gradually break up the actual application if justified/needed.
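That first split can be sketched with plain docker run and container links; the image names and link aliases here are hypothetical:

```shell
# One container per existing process group: no new application processes,
# just explicit boundaries between the ones that were already there.
docker run -d --name postgres postgres-image
docker run -d --name memcached memcached-image
docker run -d --name app --link postgres:db --link memcached:cache app-image
docker run -d --name nginx --link app:app -p 80:80 nginx-image
```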
MicroServices are generally about an application architecture where you break your application into very finely grained interacting services.
Your eCommerce store web application for instance becomes a shopping cart service, a stock service, a category service, a login service, a user profile service etc. You can take it even more fine grained so your user profile service becomes a user profile update service and a user profile rendering service.
The point is, one application becomes tens or hundreds of distinct and distributed processes. This is a massive rise in development and operational complexity before you even separate out your web server, database etc.
FYI We are heavy production users of MicroServices within Docker and would put tools such as NGINX, MemCached, Postgres into Docker images as a matter of course so they can be built, deployed and versioned in the same way as our services.
Now you are nitpicking to the extreme.
> The point is, one application becomes tens or hundreds of distinct and distributed processes.
And this is a process that starts from the moment you start splitting up the major services that typically already live in separate processes. It's a false dichotomy to treat your custom code and the full stack separately in this respect.
I explained that I agree that you have a point when you get to a level where the application has been split up to a great extent. I also pointed out why that was not the case I was addressing, which was rather the pointing to your article as justification for baseimage-docker. In fact, if you go "the whole hog", I'd argue the argument for baseimage-docker becomes substantially worse.
It's a false dichotomy to contrast "fully monolithic app" and "true microservices". In real-life it is a sliding scale where most larger systems will already consist of multiple cooperating services, whether or not you wrote them yourself. For every person who thinks they are doing micro-services, I bet I can find someone who would argue they should have split it up more (or less).
The more pragmatic point is to split up to the extent your operational setup handles well. From my point of view, 10 services or 200 per application makes pretty much no difference in the setup we're deploying at work, for example.
I'd be happy to discuss microservices with you in more detail (I'm in London too), but that's an entirely different discussion from the comment I was making earlier.
> This is a massive rise in development and operational complexity before you separate out your web server, database etc.
It's a massive rise in development and operational complexity if you're set up to do monolithic or near-monolithic apps. Once you're set up to handle reasonably distributed/service oriented apps, additional increases in numbers of services quickly cease to be an issue. On the contrary, I'd argue that for many setups, splitting up your application further reduces complexity because it forces making dependencies more explicit and allows for more easy probing and monitoring. I know I much prefer doing ops for apps that lean towards micro-services than monolithic apps (part of my responsibility is ops for a private multi-tenant cloud for various relatively large CRM applications we do).
Is chef my favorite too? No. Could I be using docker in a more optimal manner? Sure. But the reality is that I wanted the simplest possible path to integrate docker within our workflow, and it already saves a ton of resources. I run linux on a machine with 4G of ram, so believe me, utilizing containers for testing infrastructure changes is a huge improvement.
So docker as a lightweight VM is most definitely a valid use case, IMHO
If it's a problem to update SSH, then it should also be a problem to update whatever else you have in your Docker container. I guess there's some argument to be made that if you're running a single-purpose Docker container, the updates to whatever service it's running won't sync up with the updates to SSH, so you may drastically increase the number of times you'll have to package the image, but that's just a general argument in favor of single-purpose containers, not anything specific to SSH like the key management issue.
Once it's in there, a "jexec" equivalent would be a couple of lines of shell script.
- Your key management is safe.
- The process manager you now need to introduce, to keep both sshd and the app running, is safe.
- That the ssh daemon is sufficiently protected against abuse.
- That your configuration of it is safe.
If you don't need ssh in every container to achieve what you need to achieve, why deal with each of those concerns and waste the extra resources of having a bunch of extra sshd's and process monitors running?
(To the last point: Yesterday we suffered an attempt at brute-forcing ssh on a public-facing server. We're used to people trying to brute-force passwords. But as it happens, it is "easy" to make openssh consume all of your server's resources if you don't block access at the network level in the event of an apparent attack; so if any of those ssh servers are reachable in any way from the outside, you have just increased your attack surface, even if your key management and everything else is perfect and they have no way of actually getting in.)
SSH was purely chosen because until recently there wasn't a better alternative. lxc-attach stopped working out of the box since Docker 0.9. See https://news.ycombinator.com/item?id=7951042
Other points in the blog are true, but it's not what people want. People want nsenter, but they don't know it, so they use SSH.