
This is a really great step forward - thanks Mitchell!

I've recently spent a couple of weeks doing a deep dive into Docker, so I'll share some insights from what I've learned.

First, it's important to understand that Docker is an advanced optimization. Yes, it's extremely cool, but it is not a replacement for learning basic systems first. That might change someday, but currently, in order to use Docker in a production environment, you need to be a pro system administrator.

A common misconception I see is this: "I can learn Docker and then I can run my own systems without having to learn the other stuff!" Again, that may be the case sometime in the future, but it will be months or years until that's a reality.

So what do you need to know before using Docker in production? Well, basic systems stuff. How to manage Linux. How to manage networking, logs, monitoring, deployment, backups, security, etc.

If you truly want to bypass learning the basics, then use Heroku or another similar service that handles much of that for you. Docker is not the answer.

If you already have a good grasp on systems administration, then your current systems should have:

    - secured least-privilege access (key based logins, firewalls, fail2ban, etc)
    - restorable secure off-site database backups
    - automated system setup (using Ansible, Puppet, etc)
    - automated deploys
    - automated provisioning
    - monitoring of all critical services
    - and more (I'm writing this on the fly...)

If you have critical holes in your infrastructure, you have no business looking at Docker (or any other new hot cool tools). It'd be like parking a Ferrari on the edge of an unstable cliff.

Docker is amazing - but it needs a firm foundation to be on.

Whenever I make this point, there are always a few engineers who are very, very sad and their lips quiver and their eyes fill with tears because I'm talking about taking away their toys. This advice isn't for them. If you're an engineer who just wants to play with things, then please go ahead.

However, if you are running a business with mission-critical systems, then please please please get your own systems in order before you start trying to park Ferraris on them.

So, if you have your systems in order, then how should you approach Docker? Well, first decide if the added complexity is worth the benefits of Docker. You are adding another layer to your systems and that adds complexity. Sure, Docker takes care of some of the complexity by packaging some of it beautifully away, but you still have to manage it and there's a cost to that.

You can accomplish many of the benefits of Docker without the added complexity by using standardized systems, Ansible, version pinning, packaged deploys, etc. Those can be simpler and might be a better option for your business.

If the benefits of Docker outrank the costs and make more sense than the simpler cheaper alternatives, then embrace it! (remember, I'm talking about Docker in production - for development environments, it's a simpler scenario)

So, now that you've chosen Docker, what's the simplest way to use it in production?

Well, first, it's important to understand that it is far simpler to manage Docker if you view it as a role-based virtual machine rather than as deployable single-purpose processes. For example, build an 'app' container that is very similar to an 'app' VM you would create, along with the init, cron, ssh, etc. processes within it. Don't try to capture every process in its own container with a separate container for ssh, cron, app, web server, etc.

There are great theoretical arguments for having a process per container, but in practice, it's a bit of a nightmare to actually manage. Perhaps at extremely large scales that approach makes more sense, but for most systems, you'll want role-based containers (app, db, redis, etc).

If you're still not convinced on that point, read this on microservices which points out many of the management problems: http://highscalability.com/blog/2014/4/8/microservices-not-a...

You probably already have your servers set up by role, so this should be a pretty straight-forward transition. Particularly since you already have each system scripted in Ansible (or similar) right?

To run Docker in a safe robust way for a typical multi-host production environment requires very careful management of many variables:

    - secured private image repo (index)
    - orchestrating container deploys with zero downtime
    - orchestrating container deploy roll-backs
    - networking between containers on multiple hosts
    - managing container logs
    - managing container data (db, etc)
    - creating images that properly handle init, logs, etc
    - much much more...

None of this is impossible, and several large companies are already using Docker in production, but it's definitely non-trivial. This will change as the ecosystem around Docker matures (Flynn, Docker container hosting, etc), but currently if you're going to attempt using Docker seriously in production, you need to be pretty skilled at systems management and orchestration.
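
To give a feel for the "zero downtime" and "roll-back" items in the list above, here is a minimal sketch of a rolling deploy loop. Everything in it is an assumption for illustration: the host names, the tags, and the `deploy`/`healthy` callbacks are hypothetical stand-ins for whatever your real tooling does (pulling an image, restarting a container, draining a load balancer, and so on). It only shows the control flow, not a real orchestrator.

```python
def rolling_deploy(hosts, new_tag, old_tag, deploy, healthy):
    """Update hosts one at a time; on the first unhealthy host,
    put every host touched so far back on old_tag.

    Returns True if all hosts ended up on new_tag, False if rolled back.
    """
    updated = []
    for host in hosts:
        deploy(host, new_tag)
        updated.append(host)
        if not healthy(host):
            # Roll back in reverse order, including the host that failed.
            for done in reversed(updated):
                deploy(done, old_tag)
            return False
    return True


# Fake in-memory "cluster" so the sketch runs without any real hosts.
state = {"app1": "v1", "app2": "v1", "app3": "v1"}

def fake_deploy(host, tag):
    state[host] = tag

def fake_healthy(host):
    # Pretend that v2 is broken on app2 only.
    return not (host == "app2" and state[host] == "v2")

ok = rolling_deploy(list(state), "v2", "v1", fake_deploy, fake_healthy)
print(ok, state)  # False {'app1': 'v1', 'app2': 'v1', 'app3': 'v1'}
```

Even this toy version leaves out most of the hard parts (taking a host out of rotation first, handling a failed roll-back, migrating data), which is roughly the point being made here.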

There's a misconception that using Docker in production is nearly as simple as the trivial examples shown for sample development environments. In real life, it's pretty complex to get it right. For a sense of what I mean, see these articles that get the closest to production reality that I've found so far, but still miss many critical elements you'd need:





(If you know of better ones, please share!)

To recap, if you want to use Docker in production:

    1. Learn systems administration
    2. Ensure your current production systems are solid
    3. Determine whether Docker's benefits justify the cost
    4. Use role-based containers

Shameless plug: I'll be covering how to build and audit your own systems in more depth over the next couple months (as well as more Docker stuff in the future) on my blog. If you'd like to be notified of updates, sign up on my mailing list: https://devopsu.com/newsletters/devopsu.html

I forgot to mention, the Phusion guys (who Mitchell mentions in the post and who create the excellent Passenger web server) have created some great assets for Vagrant and Docker:


https://github.com/phusion/passenger-docker (built on the baseimage above)


And regarding role-based containers, Phusion's Hongli Lai says:

"Wait, I thought Docker is about running a single process in a container?

Absolutely not true. Docker runs fine with multiple processes in a container. In fact, there is no technical reason why you should limit yourself to one process - it only makes things harder for you and breaks all kinds of essential system functionality, e.g. syslog.

Baseimage-docker encourages multiple processes through the use of runit."
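
For anyone wondering what "multiple processes through a supervisor" means in practice, here is a toy sketch of the idea: one parent process starts several services and restarts any that exit. This is not runit and not how baseimage-docker is implemented; the service names and commands are hypothetical placeholders, and a real supervisor also has to handle signals, logging, and zombie reaping.

```python
import subprocess
import sys
import time


def supervise(services, max_restarts=2, poll_interval=0.05):
    """Start every service and restart any that exits.

    services maps a name to an argv list. Each service is restarted at
    most max_restarts times, then dropped. Returns the restart counts.
    """
    procs = {name: subprocess.Popen(cmd) for name, cmd in services.items()}
    restarts = {name: 0 for name in services}
    while procs:
        for name, proc in list(procs.items()):
            if proc.poll() is None:
                continue  # still running, nothing to do
            if restarts[name] < max_restarts:
                restarts[name] += 1
                procs[name] = subprocess.Popen(services[name])
            else:
                del procs[name]  # give up on this service
        time.sleep(poll_interval)
    return restarts


if __name__ == "__main__":
    # Hypothetical stand-ins for cron, sshd, the app, etc. Each one exits
    # immediately, so the sketch terminates instead of running forever.
    demo = {
        "cron": [sys.executable, "-c", "pass"],
        "app": [sys.executable, "-c", "pass"],
    }
    print(supervise(demo))
```

In a role-based container, a real init such as runit plays this part, which is why cron, syslog, and the app can live side by side in one image.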

> Docker runs fine with multiple processes in a container.

I think that, while most people realize this, it's important to highlight this fact again. Generally, the examples you see are "look, we can run this would-be-backgrounded-elsewhere daemon in the foreground in its own container!", when IMO this sets the bar a bit low for how people initially approach Docker. Containers have simplified my life significantly, and (again, IMO) their real power becomes evident when you treat them as logical units vs individual components.

That's a great post, thanks for that. I'm currently using basically Amazon * for everything and was looking into Docker a little bit yesterday. For a two-man team I think I will continue to stick with Amazon since my strength is not in system ops.

Just wanted to say that I wrote the 2nd article you mentioned, and I fall more into the category of "engineer that just wants to play with things" rather than "run a mission-critical business", so take my article with a grain of salt. Thanks for the good summary!

As someone just now learning how docker works, I absolutely agree.

Personally, I think there are some new-ish interesting immutability ideas that can be explored with regards to static files. It's not clear to me whether static assets (or even static sites) belong inside the container. I would be really interested in experienced folks' opinions on the immutability of the image. Where do you draw the line on what goes inside it and what's mounted in?

This is my general rule of thumb, but bear in mind it's only my approach. I'm not even going to claim that it's good. Just that it works for me...

Is it source or configuration? Then it's external to the container, either mounted when the container runs (typically always true for source code) or injected into the image at build time through the Dockerfile. Application source is almost always mounted; things like nginx/apache configuration files are nearly always injected during the build process.
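
A rough sketch of that split, under assumptions of my own: the image name `example/web`, the host path `/home/dev/site`, and the nginx paths are all hypothetical. The config file is copied into the image at build time, and the source directory is bind-mounted when the container starts. The snippet only builds the command lines; it doesn't run Docker.

```python
import shlex

# Hypothetical Dockerfile: configuration is baked into the image at build time.
DOCKERFILE = """\
FROM nginx
COPY nginx.conf /etc/nginx/nginx.conf
"""


def dev_run_command(image, host_src, container_src):
    """Build a `docker run` argv that bind-mounts source from the host."""
    return ["docker", "run", "-d", "-v", f"{host_src}:{container_src}", image]


build = ["docker", "build", "-t", "example/web", "."]
run = dev_run_command("example/web", "/home/dev/site", "/usr/share/nginx/html")

print(shlex.join(build))  # docker build -t example/web .
print(shlex.join(run))
```

Changing the source then needs no rebuild, while changing nginx.conf means rebuilding the image, which matches the rule of thumb above.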

I prefer this approach because my source is decoupled from the image, and if another developer wants a dev setup free from Docker, he/she can do so.

I prefer the approach I use for config files because it allows me to keep the Dockerfile, the configuration files for various services, and a readme all beside one another in source control. This allows other developers to get a basic idea of the container's configuration (if they so desire), and also modify that configuration if they want to tweak/alter a setup.

I see the configuration file approach being potentially a bad idea, but with the small group I currently work with, we're all fairly comfortable making configuration changes as necessary and communicate those changes to others effectively. I don't know how well that approach would hold up at scale.

I would do the same thing with static sites and files, I think. Why? The static site isn't part of the system; it's the product that runs on the system. Therefore, at least in my opinion, it should be as decoupled from that system as possible.

But, like I said, this is just my philosophy. I'm sure someone else will have an equally valid but totally opposite approach.

You don't have to choose between what's inside and what's mounted in. You can mark persistent directories (directories that should live longer than any given instance) as volumes with "docker run -v". Docker will arrange for them to live separately from the rest of the container filesystem. Then you can share volumes between containers with "docker run --volumes-from".
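
As a small illustration of those two flags, here is a sketch that assembles the command lines without executing them. The container names (`pgdata`, `backup`), image names, and the volume path are made up for the example; only `-d`, `--name`, `-v`, and `--volumes-from` are actual `docker run` options.

```python
import shlex


def docker_run(image, name=None, volumes=(), volumes_from=()):
    """Build (not execute) a `docker run` argv list."""
    argv = ["docker", "run", "-d"]
    if name:
        argv += ["--name", name]
    for path in volumes:
        argv += ["-v", path]  # mark this directory as a volume
    for source in volumes_from:
        argv += ["--volumes-from", source]  # reuse another container's volumes
    argv.append(image)
    return argv


# One container owns the persistent directory...
data = docker_run("example/db", name="pgdata", volumes=["/var/lib/postgresql"])
# ...and a second one gets access to the same volume.
backup = docker_run("example/backup", name="backup", volumes_from=["pgdata"])

print(shlex.join(data))    # docker run -d --name pgdata -v /var/lib/postgresql example/db
print(shlex.join(backup))  # docker run -d --name backup --volumes-from pgdata example/backup
```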

> If you truly want to bypass learning the basics, then use Heroku or another similar service that handles much of that for you. Docker is not the answer.

IMO Docker is a no-ops thing in the same way Heroku is. Not because Docker is especially good at taking care of all this but because Heroku isn't especially good either. Some things are taken care of by Heroku for free that aren't taken care of by Docker and vice versa. One example is that when you set up a Heroku app, the amount of log history it keeps is very small. This could be a disaster. At least when Docker is set up it keeps a reasonable amount of logs (with most of the setups that are on GitHub). The developer may not know how to get them but can learn it on the fly.

The other no-ops solutions often suck in some ways because of the difference between the needs of the companies that sell them and those of the developer. So even though Docker might have some problems, I'm not convinced that a naive Docker setup is worse than a naive PaaS setup.

Either way the developer who doesn't get the basics right (not your exhaustive list) is likely to be embarrassed at some point.
