I've recently spent a couple of weeks doing a deep dive into Docker, so I'll share some insights from what I've learned.
First, it's important to understand that Docker is an advanced optimization. Yes, it's extremely cool, but it is not a replacement for learning basic systems first. That might change someday, but currently, in order to use Docker in a production environment, you need to be a pro system administrator.
A common misconception I see is this: "I can learn Docker and then I can run my own systems without having to learn the other stuff!" Again, that may be the case sometime in the future, but it will be months or years until that's a reality.
So what do you need to know before using Docker in production? Well, basic systems stuff. How to manage Linux. How to manage networking, logs, monitoring, deployment, backups, security, etc.
If you truly want to bypass learning the basics, then use Heroku or another similar service that handles much of that for you. Docker is not the answer.
If you already have a good grasp on systems administration, then your current systems should have:
- secured least-privilege access (key-based logins, firewalls, fail2ban, etc)
- restorable secure off-site database backups
- automated system setup (using Ansible, Puppet, etc)
- automated deploys
- automated provisioning
- monitoring of all critical services
- and more (I'm writing this on the fly...)
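To make the first item on that list concrete, here's a minimal sketch of key-only SSH access plus a default-deny firewall. The file is written locally for illustration; on a real host it would go under /etc/ssh/sshd_config.d/ followed by an sshd reload, and this is nowhere near a complete hardening guide:

```shell
# Illustrative sshd hardening drop-in: keys only, no root logins.
cat > hardening-sketch.conf <<'EOF'
PasswordAuthentication no
PermitRootLogin no
PubkeyAuthentication yes
EOF

# A default-deny firewall posture (ufw shown; iptables/nftables work too):
#   ufw default deny incoming
#   ufw allow ssh
#   ufw enable
```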
Docker is amazing - but it needs a firm foundation to stand on.
Whenever I make this point, there are always a few engineers that are very very sad, their lips quiver, and their eyes fill with tears because I'm talking about taking away their toys. This advice isn't for them; if you're an engineer who just wants to play with things, then please go ahead.
However, if you are running a business with mission-critical systems, then please please please get your own systems in order before you start trying to park Ferraris on them.
So, if you have your systems in order, then how should you approach Docker? Well, first decide whether Docker's benefits are worth the added complexity. You are adding another layer to your systems, and that adds complexity. Sure, Docker packages some of that complexity beautifully away, but you still have to manage it, and there's a cost to that.
You can accomplish many of the benefits of Docker without the added complexity by using standardized systems, Ansible, version pinning, packaged deploys, etc. Those can be simpler and might be a better option for your business.
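Version pinning, for instance, needs no extra layer at all. A hedged sketch (the package names and version numbers are made up for illustration):

```shell
# Pin application dependencies to exact versions so every deploy installs
# identical code (versions here are hypothetical examples):
cat > requirements.txt <<'EOF'
flask==2.3.2
gunicorn==21.2.0
EOF

# System packages can be pinned similarly, e.g. with apt:
#   apt-get install nginx=1.24.*
```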
If the benefits of Docker outweigh the costs and make more sense than the simpler, cheaper alternatives, then embrace it! (Remember, I'm talking about Docker in production - for development environments, it's a simpler scenario.)
So, now that you've chosen Docker, what's the simplest way to use it in production?
Well, first, it's important to understand that it is far simpler to manage Docker if you view containers as role-based virtual machines rather than as single-purpose deployable processes. For example, build an 'app' container that is very similar to the 'app' VM you would otherwise create, with init, cron, ssh, etc. running inside it. Don't try to capture every process in its own container, with a separate container for ssh, cron, app, web server, etc.
There are great theoretical arguments for having a process per container, but in practice, it's a bit of a nightmare to actually manage. Perhaps at extremely large scales that approach makes more sense, but for most systems, you'll want role-based containers (app, db, redis, etc).
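As a concrete sketch of what a role-based 'app' image might look like, here's an illustrative Dockerfile written to a local file (the 'myapp' service, its run script, and the choice of base image are assumptions for illustration, not a specific recommendation):

```shell
# A role-based 'app' image runs an init plus several supervised processes
# (the app, cron, sshd), much like the 'app' VM it replaces.
cat > Dockerfile.app <<'EOF'
FROM phusion/baseimage:latest
# baseimage-docker ships a proper init plus runit; register our app with it.
RUN mkdir -p /etc/service/myapp
COPY myapp-run.sh /etc/service/myapp/run
# cron and sshd are already handled by the base image - no separate
# containers for each process.
CMD ["/sbin/my_init"]
EOF
```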
If you're still not convinced on that point, read this on microservices which points out many of the management problems: http://highscalability.com/blog/2014/4/8/microservices-not-a...
You probably already have your servers set up by role, so this should be a pretty straightforward transition. Particularly since you already have each system scripted in Ansible (or similar), right?
To run Docker in a safe robust way for a typical multi-host production environment requires very careful management of many variables:
- secured private image repo (index)
- orchestrating container deploys with zero downtime
- orchestrating container deploy roll-backs
- networking between containers on multiple hosts
- managing container logs
- managing container data (db, etc)
- creating images that properly handle init, logs, etc
- much much more...
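To give a flavor of just one item on that list, here's a heavily simplified zero-downtime deploy sketch. The image name, health-check path, and container names are all hypothetical, and the load-balancer swap is hand-waved; real orchestration needs far more (timeouts, health polling, draining, multi-host coordination):

```shell
# Start the new container alongside the old one, check health, then retire
# the old one; roll back if the health check fails.
deploy_new_version() {
  image="$1"   # e.g. registry.example.com/app:v42 (hypothetical)
  docker pull "$image"
  docker run -d --name app_new "$image"
  # In reality: poll a health endpoint with a timeout, not a single check.
  if docker exec app_new /app/healthcheck; then
    # ...repoint the load balancer at app_new here...
    docker stop app_old && docker rm app_old
    docker rename app_new app_old
    echo "deployed $image"
  else
    docker stop app_new && docker rm app_new
    echo "rollback: $image failed health check" >&2
    return 1
  fi
}
```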
There's a misconception that using Docker in production is nearly as simple as the trivial examples shown for sample development environments. In real life, it's pretty complex to get it right. For a sense of what I mean, see these articles, which get the closest to production reality that I've found so far but still miss many critical elements you'd need:
(If you know of better ones, please share!)
To recap, if you want to use Docker in production:
1. Learn systems administration
2. Ensure your current production systems are solid
3. Determine whether Docker's benefits justify the costs
4. Use role-based containers
https://github.com/phusion/passenger-docker (built on Phusion's baseimage-docker)
And regarding role-based containers, Phusion's Hongli Lai says:
"Wait, I thought Docker is about running a single process in a container?
Absolutely not true. Docker runs fine with multiple processes in a container. In fact, there is no technical reason why you should limit yourself to one process - it only makes things harder for you and breaks all kinds of essential system functionality, e.g. syslog.
Baseimage-docker encourages multiple processes through the use of runit."
I think that, while most people realize this, it's important to highlight this fact again. Generally, the examples you see are "look, we can run this would-be-backgrounded-elsewhere daemon in the foreground in its own container!", which IMO sets the bar a bit low for how people initially approach Docker. Containers have simplified my life significantly, and (again, IMO) their real power becomes evident when you treat them as logical units vs. individual components.
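For reference, a runit service under baseimage-docker is just a directory with an executable run script. A minimal hypothetical one, written to a local path here for illustration ('myapp' and its command are made up):

```shell
# runit supervises any /etc/service/<name>/run script and restarts it if
# it dies; exec keeps the app as the supervised foreground process.
mkdir -p etc/service/myapp
cat > etc/service/myapp/run <<'EOF'
#!/bin/sh
exec /usr/local/bin/myapp 2>&1
EOF
chmod +x etc/service/myapp/run
```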
Personally, I think there are some new-ish interesting immutability ideas that can be explored with regards to static files. It's not clear to me whether static assets (or even static sites) belong inside the container. I would be really interested in experienced folks' opinions on the immutability of the image. Where do you draw the line on what goes inside it and what's mounted in?
Is it source or configuration? Then it's external to the container: either mounted at container runtime (typically always true for source code) or injected into the image at build time through the Dockerfile. Application source is almost always mounted; things like nginx/apache configuration files are nearly always injected during the build.
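In shell terms, that split might look like this (the paths, image names, and config file are all hypothetical):

```shell
# Configuration is injected at build time via the Dockerfile...
cat > Dockerfile.web <<'EOF'
FROM nginx:stable
# config baked into the image when it is built
COPY nginx.conf /etc/nginx/nginx.conf
EOF

# ...while application source is mounted at run time, keeping the image
# decoupled from the code:
run_cmd='docker run -d -v /srv/myapp/src:/app myorg/app-image'
echo "$run_cmd"
```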
I prefer this approach because my source is decoupled from the image, and if another developer wants a dev setup free from Docker, he/she can have one.
I prefer the build-time injection approach for config files because it allows me to keep the Dockerfile, the configuration files for various services, and a README all beside one another in source control. This allows other developers to get a basic idea of the container's configuration (if they so desire), and also to modify that configuration if they want to tweak/alter a setup.
I can see the configuration-file approach potentially being a bad idea, but with the small group I currently work with, we're all fairly comfortable making configuration changes as necessary and communicating those changes to others effectively. I don't know how well that approach would hold up at scale.
I would do the same thing with static sites and files, I think. Why? The static site isn't part of the system; it's the product that should be executed on the system. Therefore, at least in my opinion, it should be as decoupled from that system as possible.
But, like I said, this is just my philosophy. I'm sure someone else will have an equally valid but totally opposite approach.
IMO Docker is a no-ops thing in the same way Heroku is. Not because Docker is especially good at taking care of all this, but because Heroku isn't especially good either. Some things are taken care of by Heroku for free that aren't taken care of by Docker, and vice versa. One example is that when you set up a Heroku app, the log history it keeps is very short. This could be a disaster. At least when Docker is set up, it keeps a reasonable amount of logs (in most of the setups you'll find on GitHub, anyway). The developer may not know how to get at them, but can learn that on the fly.
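On the Docker side, how much log history you keep is configurable. A hedged daemon.json sketch (the sizes are arbitrary examples, and the file is written locally here rather than to /etc/docker/):

```shell
# Docker's json-file log driver can cap per-container log size and
# rotation count; these values are illustrative only.
cat > daemon.json <<'EOF'
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "50m",
    "max-file": "5"
  }
}
EOF
```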
The other no-ops solutions often suck in some ways because of the difference between the needs of the companies that sell them and the needs of the developer. So even though Docker might have some problems, I'm not convinced that a naive Docker setup is worse than a naive PaaS setup.
Either way, the developer who doesn't get the basics right (never mind your exhaustive list) is likely to be embarrassed at some point.