I had a pleasant conversation with Jerome quite a while ago about SSH and what the "right" way is to log in to a Docker container. We were not able to find consensus, but Jerome is a brilliant guy and his reasons were sound. For some time, I considered using lxc-attach to replace the role of SSH. Unfortunately, a few weeks later, Docker 0.9 came out and no longer used LXC as the default backend, and so suddenly lxc-attach stopped working. We decided to stick with SSH until there's a better way. Solomon Hykes told us that they have plans to introduce an lxc-attach-like tool in Docker core. Unfortunately, as of Docker 1.0.1, this feature still hasn't arrived.
Now, Jerome is advocating nsenter. There is currently an ongoing discussion on the baseimage-docker bug tracker about replacing SSH with nsenter: https://github.com/phusion/baseimage-docker/issues/102
But leaving all of that aside, we regularly get told by people that Baseimage-docker "misses the point" of Docker. But what is the point of Docker? Some people, including Jerome, believe it's all about microservices and running one process in a container.
We take a more balanced, nuanced view. We believe that Docker should be regarded as a flexible tool that can be molded into whatever you want. You can make single-process microservices, if you want to and if you believe that's the right choice for you. Or you can choose to make multi-process microservices, if that makes sense. Or you can choose to treat Docker like a lightweight VM. We believe that all of those choices are correct. We don't believe that one should ONLY use Docker to build microservices, especially because Microservices Are Not A Free Lunch (http://highscalability.com/blog/2014/4/8/microservices-not-a...).
Baseimage-docker is about enabling users to do whatever they want to. It's about choice. It's not about cargo-culting everything into a single philosophy. This is why Baseimage-docker is extremely small and minimalist (only 6 MB of memory overhead), flexible and thoroughly documented. Baseimage-docker is not about advocating treating Docker as heavyweight VMs.
I do have criticism for your communication around that base image, starting with the link-bait blog post "you're using Docker wrong". Your message is that anybody not using Docker your way (full-blown init process, sshd, embedded syslog) is doing it wrong. That is not only incorrect, it contradicts Docker's philosophy of allowing and supporting more than one usage pattern.
My other criticism is that you point out a known Docker bug (the pid1 issue) and use it as a selling point for your image, without concerning yourself with reporting the bug let alone contributing to a fix. Meanwhile many people hit the same pid1 bug and have reported, suggested possible fixes, or contributed code to help implement that fix. If you want to be taken seriously in the Docker community, my recommendation is that you consider doing the same.
Far be it from me to tell you how you should run your own project, but it seems to me that if Docker is going to live up to the shipping container metaphor, then it needs to be at least somewhat opinionated. In particular, you've previously explained that Docker is supposed to provide a standard way of separating concerns between development and operations. If this is going to work in practice, then it seems to me that there needs to be agreement on conventions like:
* Logs go to stdout/stderr, not to the container filesystem or even a volume.
* Configuration settings are provided on container startup through environment variables.
* Related to the above, occasional configuration changes are made by starting a new container with new variables, not by editing a config file inside the existing container.
* The container's main process needs to cleanly shut down the main service in response to SIGTERM.
* No SSH in the container, unless the container is providing an SSH-based service, e.g. a gitolite container.
So if I'm right about what the conventions are or should be, then Phusion's base image is indeed misguided.
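To make those conventions concrete, here is a minimal sketch of a service entry point that follows three of them: it logs to stdout, takes its settings from environment variables, and shuts down cleanly on SIGTERM (the signal `docker stop` sends first). The variable names `APP_PORT` and `APP_LOG_LEVEL` and the function names are hypothetical, chosen only for illustration; this is not taken from Docker or from Baseimage-docker.

```python
import logging
import os
import signal
import sys
import threading


def load_config(environ):
    """Read settings from the environment (variable names are hypothetical)."""
    return {
        "port": int(environ.get("APP_PORT", "8080")),
        "log_level": environ.get("APP_LOG_LEVEL", "INFO").upper(),
    }


def make_logger(level, stream):
    """Log to a stream (stdout in a container), not to a file inside it."""
    logger = logging.getLogger("app")
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(level)
    logger.propagate = False
    return logger


def run(environ=None, stream=None):
    """Run until SIGTERM arrives, then return exit code 0."""
    config = load_config(os.environ if environ is None else environ)
    logger = make_logger(config["log_level"],
                         sys.stdout if stream is None else stream)

    stop = threading.Event()
    # The handler only sets a flag; the main loop notices it and exits.
    signal.signal(signal.SIGTERM, lambda signum, frame: stop.set())

    logger.info("listening on port %d", config["port"])
    while not stop.wait(timeout=0.1):
        pass  # a real service would do its work here
    logger.info("shutting down cleanly")
    return 0
```

In an image, the script would end with `sys.exit(run())`, and a configuration change would mean starting a new container with a different `APP_PORT` rather than editing anything inside the running one.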
- In Baseimage-docker, Runit is configured to have all services log to stdout/stderr. In Passenger-docker, the Nginx error logs are redirected to stdout/stderr. We actively encourage services to log to stdout/stderr.
- Baseimage-docker provides easy mechanisms for allowing multiple processes to access the environment variables that were passed to the container.
- Baseimage-docker's custom init process was designed precisely to allow graceful termination through SIGTERM. It even terminates all processes in the container upon receiving SIGTERM.
Baseimage-docker does not mean that the Docker conventions are thrown out of the door.
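For readers wondering what "terminates all processes upon receiving SIGTERM" looks like in practice, here is a rough sketch of the idea. It is not Baseimage-docker's actual init code: `supervise` is a hypothetical helper that starts a few commands and, once it receives SIGTERM, passes the signal on to every child that is still running. A real init would also escalate to SIGKILL after a timeout, which is left out here.

```python
import signal
import subprocess
import time


def supervise(commands, poll_interval=0.05):
    """Start each command; on SIGTERM, forward it to all live children.

    Returns the children's exit codes once all of them have exited.
    """
    children = [subprocess.Popen(cmd) for cmd in commands]
    terminating = False

    def on_sigterm(signum, frame):
        # Keep the handler trivial: just record that SIGTERM arrived.
        nonlocal terminating
        terminating = True

    signal.signal(signal.SIGTERM, on_sigterm)

    forwarded = False
    while any(child.poll() is None for child in children):
        if terminating and not forwarded:
            for child in children:
                if child.poll() is None:
                    child.terminate()  # sends SIGTERM on POSIX
            forwarded = True
        time.sleep(poll_interval)

    return [child.returncode for child in children]
```

Called as, say, `supervise([["nginx", "-g", "daemon off;"], ["cron", "-f"]])`, the whole group goes down together when the container is stopped, instead of only the first process seeing the signal.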
1. Your Unix system is wrong unless it conforms to certain technical requirements.
2. Explanation of the requirements.
3. One possible solution that satisfies these requirements: Baseimage-docker.
4. Does your image already satisfy the requirements? Great. If not, you can implement these requirements yourself, but why bother when you can grab Baseimage-docker? And oh, it happens to contain some useful stuff that are not strictly necessary but that lots of.
As you can see, such a complicated message becomes waaay too long and hard to explain to most people. It probably only makes sense if you've contributed to the Linux kernel, or read an operating systems book. If I explained it in a way that's too technical and nuanced, 99% of the people will fall asleep after reading 1 paragraph. So the message was simplified. I apologize if the simplified message has offended you, and I am continuing to finetune the message.
As for the PID 1 issue: I genuinely thought you guys didn't include a PID 1 on purpose, because running one isn't that hard. Last time I talked to Jerome, he had the opinion that, if software couldn't deal with zombie processes existing on the system, it's a bug in the software. With that response in mind, I thought that the Docker team did not recognize the PID 1 issue as really an issue. So please do not mistake the lack of a bug report as malice.
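To illustrate why I say running a PID 1 isn't that hard: the core duty is to call `waitpid` on children that have exited, including orphans that get re-parented to PID 1, so they don't linger as zombies. A minimal sketch of that reaping step, assuming a POSIX system (`reap_zombies` is a name made up for this example, and this is not the code Docker or Baseimage-docker ships):

```python
import os


def reap_zombies():
    """Collect every child that has already exited; return their PIDs."""
    reaped = []
    while True:
        try:
            # -1 means "any child"; WNOHANG means "do not block".
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped
```

An init process would typically call something like this whenever it receives SIGCHLD, or periodically from its main loop.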
Later on, you told me that you guys are working on this, and I was glad to hear that.
I get the feeling that you feel bitter about the fact that I chose to write Baseimage-docker instead of contributing a PID 1 to Docker. Please understand that I did not do this out of any adversarial intentions. My Go skills are minimal and I am busy enough with other stuff. This, combined with the fact that at the time I thought the PID 1 issue was simply not recognized, led me to write Baseimage-docker. I would like to stress that I look forward to friendly relationships with you, with the Docker team and with the community.
The only occurrence of the word wrong in the whole post is in the sentence "What might be wrong with it?". That sounds more like healthy criticism than 'incorrect' contradiction of Docker's philosophy to me.
About the pid1 thing, I do not think Foobarwidget saw that as a Docker bug, but as a bug of Docker containers. Doesn't it make sense to release a Docker container with a proper init process then?
EDIT: I do really like the separation of concerns and modularity that are brought about by the approach advanced in the article. But I would argue that the arguments against SSH apply many times more strongly against alternatives to it: security upgrades? You're going to have to do that much more often with whatever you use to replace SSH. SSH has a proven track record for security and authentication, it's well known, lightweight, and generally doesn't break on its own.
I've now introduced and put docker-based infrastructure projects into production environments at 2 different companies, and IMO having sshd in the containers has made it much easier and familiar for techops/devops teams to get started with docker.
Docker-attach is a much more limited solution, and I think that introducing another tool like nsenter is a non-starter since it just adds more complexity with additional tooling and dependencies. Another tool when ssh works? The additional cpu/ram use isn't a big deal, and for security as long as I secure sshd and my keys/password properly (not storing them in my image, for example...), no worries.
Docker logging is also limited compared to tried and true linux logging utils.
Docker process supervision is still a bit immature and unreliable. I'll keep trying the built-in solutions, but I have everything working fine now without needing to wait for subsequent Docker releases.
Docker is a really convenient wrapper around a bunch of standard Linux tools, and IMO that has been its power. The weaknesses in Docker have been where it tries to build its own replacement for existing and mature solutions (logging, supervision, networking, etc.).
A lot of the functionality of libcontainer, libchan, libswarm seems to be done by existing tools. Why reinvent the wheel? Are the existing project maintainers unwilling to take pull requests?
I don't buy that article at all. There's a lot of strawmen there where things are suddenly needed for micro-services while they apparently aren't when running the exact same things inside a single VM.
You can architect a microservices based system in ways that add operational complexity, but if you do it should be because there are substantial benefits to be had that way.
But you can equally well take that monolithic VM that seems to be what that article is assuming the alternative is, run Docker inside it, and run each service in a Docker container inside it, and still start to realise substantial benefits; not least because it makes it easier to grow out of the single VM as/when needed by making dependencies much more explicit and allowing each service's software dependencies to evolve separately.
I agree you don't have to use Docker only that way, but the more I've played with Docker the finer-grained I end up making things...
I'm not sure I understand your point. My article concludes that MicroServices do indeed bring substantial benefits on a longer time horizon. However, they undeniably add operational overhead because your monolithic app explodes in terms of number of processes.
This is true whether you deploy to 1 virtual machine or 100. It's still 100 distinct processes that tend to communicate asynchronously.
The article doesn't mention the word Docker, but we actually subsequently found that Docker was the missing link and the thing that made MicroServices viable. When your abstraction layer becomes the container then the operational complexity is tamed.
Edit - Here are two articles I also wrote which describe how Docker enables MicroServices if you didn't catch them. These came after the No Free Lunch article. MicroServices with Docker, particularly polyglot MicroServices, would be painful beyond belief without Docker.
There's no reason it has to, is my point.
You may have a point in instances where you start with an application that is actually already wrapping a ton of unrelated functionality together in a single process - I didn't really think about it in terms of that scenario. But then I'd argue that an increase in the number of processes will be an operational godsend over having to try to track down problems in a monolithic mess. And even then you're not forced into splitting things up into tons of little pieces in one go.
The scenario I was thinking of, on the basis of the discussion here regarding baseimage-docker, is splitting up services that consist of a bunch of interrelated processes. E.g. Nginx + Apache + Memcached + the actual application + various cron jobs + Postgres is a typical example. Starting by splitting that up into separate Docker containers for each existing process group doesn't introduce any new application processes.
You can then gradually break up the actual application if justified/needed.
MicroServices are generally about an application architecture where you break your application into very finely grained interacting services.
Your eCommerce store web application for instance becomes a shopping cart service, a stock service, a category service, a login service, a user profile service etc. You can take it even more fine grained so your user profile service becomes a user profile update service and a user profile rendering service.
The point is, one application becomes tens or hundreds of distinct and distributed processes. This is a massive rise in development and operational complexity before you separate out your web server, database etc.
FYI We are heavy production users of MicroServices within Docker and would put tools such as NGINX, MemCached, Postgres into Docker images as a matter of course so they can be built, deployed and versioned in the same way as our services.
Now you are nitpicking to the extreme.
> The point is, one application becomes tens or hundreds of distinct and distributed processes.
And this is a process that starts from the moment you start splitting up the major services that typically already live in separate processes. It's a false dichotomy to treat your custom code and the full stack separately in this respect.
I explained that I agree that you had a point when you get to a level where the application has been split up to a great extent. I also pointed out why that was not the case I was addressing - rather pointing to your article as justification for baseimage-docker. In fact, if you go "the whole hog", I'd argue the argument for baseimage-docker becomes substantially worse.
It's a false dichotomy to contrast "fully monolithic app" and "true microservices". In real-life it is a sliding scale where most larger systems will already consist of multiple cooperating services, whether or not you wrote them yourself. For every person who thinks they are doing micro-services, I bet I can find someone who would argue they should have split it up more (or less).
The more pragmatic point is to split up to the extent your operational setup handles well. From my point of view, 10 services or 200 per application makes pretty much no difference in the setup we're deploying at work, for example.
I'd be happy to discuss microservices with you in more detail (I'm in London too), but that's an entirely different discussion from the comment I was making earlier.
> This is a massive rise in development and operational complexity before you seperate out your web server, database etc.
It's a massive rise in development and operational complexity if you're set up to do monolithic or near monolithic apps. Once you're set up to handle reasonably distributed/service oriented apps, additional increases in numbers of services quickly cease to be an issue. On the contrary, I'd argue that for many setups, splitting up your application further reduces complexity because it forces making dependencies more explicit and allows for easier probing and monitoring. I know I much prefer doing ops for apps that lean towards micro-services than monolithic apps (part of my responsibility is ops for a private multi-tenant cloud for various relatively large CRM applications we do).
Is chef my favorite too? No. Could I be using docker in a more optimal manner? Sure. But the reality is that I wanted the simplest possible path to integrate docker within our workflow, and it already saves a ton of resources. I run linux on a machine with 4G of ram, so believe me, utilizing containers for testing infrastructure changes is a huge improvement.
So docker as a lightweight VM is most definitely a valid use case, IMHO.