1) I built two Dockerfiles on my laptop (one for nginx, one for my postfix setup), tested them locally, then scp'd the Dockerfiles over to the server, built the images there, and ran them. I didn't really want to pollute the registry with my stuff. Is this reasonable? For bigger projects, should I use a private registry? Should I be deploying images instead of Dockerfiles?
2) The nginx setup I deployed exports the static html as a VOLUME, which the run command binds to a dir in my home dir, which I simply rsync when I want to update (i.e. the deployed site is outside the container). Should I have the content inside the container really?
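For concreteness, that setup might look roughly like this (image name and paths are illustrative, not my exact files):

```dockerfile
# Sketch of the nginx setup described above: the content directory is a
# volume, so the deployed site lives outside the container.
FROM nginx
VOLUME /usr/share/nginx/html
# Run it binding the volume to a dir in your home dir:
#   docker run -d -p 80:80 -v ~/site:/usr/share/nginx/html my-nginx
# Update the live site without touching the container:
#   rsync -av ./public/ server:~/site/
```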
3) I'm still using the 'default' site in nginx (currently sufficient). It would be kind of nice to have a Dockerfile in each site I wanted to deploy to the same host. But only one can get the port. I sort of want to have a 'foo.com' repo and a 'bar.org' repo and ship them both to the server as docker containers. Don't really see how to make that work.
What I think I want is:
- a repo has a Dockerfile and represents a service
- I can push these things around (git clone, scp a tgz, whatever) and have the containers "just run"
Not sure how to make that fit with "someone has to own port 80"
3) As far as I understand, your problem is that both containers would be running their own nginx and would both want port 80. If that's what you mean, you could just EXPOSE port 80 from within each container and run them with `-P`, and Docker will map each one to a random host port like 43152. The two containers would get different random ports (for example 43152 and 43153). You could then install Hipache and route different domain names/sites to different containers, essentially putting a Hipache proxy in front of your Docker container setup.
EDIT: There is also a project called Shipyard, which is Docker management... what I described above is called "Applications" inside Shipyard.
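A rough sketch of that flow (image names, ports, and the domain are examples; the redis key layout follows the Hipache README):

```dockerfile
# In each site's Dockerfile, expose 80 without claiming it on the host:
EXPOSE 80
# Then, illustratively:
#   docker run -d -P foo-image    # host maps it to e.g. 43152
#   docker run -d -P bar-image    # ... and e.g. 43153
# Hipache routes by domain via redis:
#   redis-cli rpush frontend:foo.com foo
#   redis-cli rpush frontend:foo.com http://127.0.0.1:43152
```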
2) Again, this is a tradeoff. If the content is sufficiently large or changes often, it might be better to rsync the data from an external source every time the image is built (or even when the container first starts, if the data is extremely large). On the flip side, having the data inside the image means you don't need to worry about networking and security issues around your data source. One point to note: the cache Docker uses when building from a Dockerfile is sensitive to file changes in the build context; if your files change a lot, make sure they are either on an external volume OR added late in the Dockerfile, so the earlier layers aren't rebuilt all the time.
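To illustrate the cache point (the package and paths are examples): keep stable steps early and volatile files late.

```dockerfile
FROM ubuntu
# Stable steps first: cached for as long as these lines don't change.
RUN apt-get update && apt-get install -y nginx
# Frequently changing content last, so edits here don't invalidate
# the apt-get layer above.
ADD ./site /var/www
```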
3) HAProxy might be a solution for this; you could route requests to different ports based on the incoming domain name. We are also working on a (warning: experimental) project called gantry(d) that makes handling container-based components a whole lot easier.
Then you can reverse-proxy requests to the right location/port from your Nginx webserver.
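For example, the host-level Nginx config might look something like this (the ports are whatever Docker happened to assign; domains are placeholders):

```nginx
# Route each domain to its container's mapped host port.
server {
    listen 80;
    server_name foo.com;
    location / {
        proxy_pass http://127.0.0.1:43152;  # foo.com container
    }
}
server {
    listen 80;
    server_name bar.org;
    location / {
        proxy_pass http://127.0.0.1:43153;  # bar.org container
    }
}
```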
This could be provisioned pretty easily, but not from within a Dockerfile. I use a post-receive hook on a remote Git repository, for one of my own websites.
1) over SSH, or
2) by Vagrant when provisioning a VM.
Docker, on the other hand, provisions its containers from a rather simplistic Dockerfile, which is just a list of commands. The current approach to provisioning a container through Ansible is rather messy, and shows that Docker's configuration doesn't have the same separation of responsibilities that Vagrant's does.
Luckily, this lets me use Docker as another provider through the Vagrant API. Woooo!
I still think Ansible inside Docker is feasible, by
* using it to generate the initial base image
* receiving some sort of signal inside the container to update its playbook and run.
So when you want to update the container, you're not tearing it down but instead telling the container to perform some sort of "soft reset".
What do you gain by using Ansible inside of your Dockerfile? I find Ansible pretty useful for setting up a bunch of Docker images on a server, but I haven't found it very useful for actually building the images.
We already have provisioning tools which attempt to solve this problem, so I'd much rather write one definition which can be applied everywhere, whether to virtual machines, Linux containers, or hosts running on physical machines.
Maybe I'm just trying to use Docker incorrectly? My particular use-case is setting up software (CKAN) in an Ubuntu environment, but the server I have access to is Arch Linux, and the software is I/O-heavy, so I imagine that a container would probably be better than a VM.
A provisioning script is essentially just a list of idempotent commands to run on the machine. But given the way Docker works, idempotence isn't required: if you change a command, Docker rolls back to a known cached state and re-runs from there.
A provisioning script might be slightly "higher-level" than a Dockerfile or shell script, but I find the difference minimal, and the number of lines of code required is similar. Many provisioning tools provide libraries of pre-built recipes you can use; Docker provides a registry of pre-built images.
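For instance, where a provisioning tool would pull in a cookbook or recipe, a Dockerfile just builds on a pre-built image (the package here is illustrative):

```dockerfile
# Start from a pre-built image instead of a recipe library,
# then add the handful of commands your service actually needs.
FROM ubuntu
RUN apt-get update && apt-get install -y redis-server
CMD ["redis-server"]
```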
A Dockerfile is not a replacement for your favorite build script: it's a reliable foundation for defining, unambiguously, the context in which to run your script. The Dockerfile's defining feature is that it has no implicit dependencies: it only needs a working Docker install. Unlike your favorite build script, which may require python (but which version exactly?), ssh (but which build exactly?), gcc (but...), openssl (but...) and so on.
I'm curious why you feel Docker doesn't offer separation of responsibilities? In my experience the opposite is true, and it's a recurring theme in why people switch from Vagrant to Docker.
One of the great things about Docker is that once you've played with it for an hour or so you've already picked up most of it. It's not like Chef or Puppet; even compared to configuring environments by hand with VirtualBox and a VM, it's really simple. I wonder how much faster this will make things.
Curious: how is the default "proxy" VM on Macs sized?
I've recently spent a couple of weeks doing a deep dive into Docker, so I'll share some insights from what I've learned.
First, it's important to understand that Docker is an advanced optimization. Yes, it's extremely cool, but it is not a replacement for learning basic systems first. That might change someday, but currently, in order to use Docker in a production environment, you need to be a pro system administrator.
A common misconception I see is this: "I can learn Docker and then I can run my own systems without having to learn the other stuff!" Again, that may be the case sometime in the future, but it will be months or years until that's a reality.
So what do you need to know before using Docker in production? Well, basic systems stuff: how to manage Linux, how to manage networking, logs, monitoring, deployment, backups, security, etc.
If you truly want to bypass learning the basics, then use Heroku or another similar service that handles much of that for you. Docker is not the answer.
If you already have a good grasp on systems administration, then your current systems should have:
- secured least-privilege access (key based logins, firewalls, fail2ban, etc)
- restorable secure off-site database backups
- automated system setup (using Ansible, Puppet, etc)
- automated deploys
- automated provisioning
- monitoring of all critical services
- and more (I'm writing this on the fly...)
Docker is amazing - but it needs a firm foundation to be on.
Whenever I make this point, there are always a few engineers that are very very sad and their lips quiver and their eyes fill with tears because I'm talking about taking away their toys. This advice isn't for them, if you're an engineer that just wants to play with things, then please go ahead.
However, if you are running a business with mission-critical systems, then please please please get your own systems in order before you start trying to park Ferraris on them.
So, if you have your systems in order, then how should you approach Docker? Well, first decide if the added complexity is worth the benefits of Docker. You are adding another layer to your systems and that adds complexity. Sure, Docker takes care of some of the complexity by packaging some of it beautifully away, but you still have to manage it and there's a cost to that.
You can accomplish many of the benefits of Docker without the added complexity by using standardized systems, Ansible, version pinning, packaged deploys, etc. Those can be simpler and might be a better option for your business.
If the benefits of Docker outweigh the costs and make more sense than the simpler, cheaper alternatives, then embrace it! (Remember, I'm talking about Docker in production; for development environments, it's a simpler scenario.)
So, now that you've chosen Docker, what's the simplest way to use it in production?
Well, first, it's important to understand that it is far simpler to manage Docker if you view each container as a role-based virtual machine rather than as a set of deployable single-purpose processes. For example, build an 'app' container that is very similar to the 'app' VM you would otherwise create, along with the init, cron, ssh, etc. processes within it. Don't try to give every process its own container, with separate containers for ssh, cron, the app, the web server, etc.
There are great theoretical arguments for having a process per container, but in practice, it's a bit of a nightmare to actually manage. Perhaps at extremely large scales that approach makes more sense, but for most systems, you'll want role-based containers (app, db, redis, etc).
If you're still not convinced on that point, read this on microservices which points out many of the management problems: http://highscalability.com/blog/2014/4/8/microservices-not-a...
You probably already have your servers set up by role, so this should be a pretty straight-forward transition. Particularly since you already have each system scripted in Ansible (or similar) right?
To run Docker in a safe robust way for a typical multi-host production environment requires very careful management of many variables:
- secured private image repo (index)
- orchestrating container deploys with zero downtime
- orchestrating container deploy roll-backs
- networking between containers on multiple hosts
- managing container logs
- managing container data (db, etc)
- creating images that properly handle init, logs, etc
- much much more...
There's a misconception that using Docker in production is nearly as simple as the trivial examples shown for sample development environments. In real life, it's pretty complex to get right. For a sense of what I mean, see these articles, which get the closest to production reality of anything I've found so far but still miss many critical elements you'd need:
(If you know of better ones, please share!)
To recap, if you want to use Docker in production:
1. Learn systems administration
2. Ensure your current production systems are solid
3. Determine whether Docker's benefits justify the costs
4. Use role-based containers
https://github.com/phusion/passenger-docker (built on the baseimage above)
And regarding role-based containers, Phusion's Hong Lai says:
"Wait, I thought Docker is about running a single process in a container?
Absolutely not true. Docker runs fine with multiple processes in a container. In fact, there is no technical reason why you should limit yourself to one process - it only makes things harder for you and breaks all kinds of essential system functionality, e.g. syslog.
Baseimage-docker encourages multiple processes through the use of runit."
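As a sketch of what that looks like in practice (the service name and `app.sh` script are made up; the `/etc/service` layout and `my_init` entry point are how baseimage-docker documents it):

```dockerfile
# Hypothetical role-based 'app' container on baseimage-docker:
# runit supervises every service dropped into /etc/service.
FROM phusion/baseimage
# Register the app as a runit service (app.sh is an illustrative
# startup script that runs the process in the foreground):
RUN mkdir -p /etc/service/app
ADD app.sh /etc/service/app/run
RUN chmod +x /etc/service/app/run
# my_init boots runit, cron, syslog, etc. inside the container.
CMD ["/sbin/my_init"]
```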
I think that, while most people realize this, it's important to highlight this fact again. Generally, the examples you see are "look, we can run this would-be-backgrounded-elsewhere daemon in the foreground in its own container!", when IMO this sets the bar a bit low for how people initially approach Docker. Containers have simplified my life significantly, and (again, IMO) their real power becomes evident when you treat them as logical units vs individual components.
Personally, I think there are some new-ish interesting immutability ideas that can be explored with regards to static files. It's not clear to me whether static assets (or even static sites) belong inside the container. I would be really interested in experienced folks' opinions on the immutability of the image. Where do you draw the line on what goes inside it and what's mounted in?
Is it source or configuration? Then it's external to the container: either mounted when the container runs (typically always true for source code) or injected into the image at build time through the Dockerfile. Application source is almost always mounted; things like nginx/apache configuration files are nearly always injected during the build.
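As a rough illustration of that split (the paths and image name are examples):

```dockerfile
# Injected at build time: config is baked into the image.
ADD nginx.conf /etc/nginx/nginx.conf
# Source is mounted when the container runs, e.g.:
#   docker run -d -v ~/myapp/src:/var/www/app my-image
```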
I prefer this approach because my source is decoupled from the image, and if another developer wants a dev setup free from Docker, he/she can do so.
I prefer this approach for config files because it allows me to keep the Dockerfile, the configuration files for the various services, and a readme all beside one another in source control. This lets other developers get a basic idea of the container's configuration (if they so desire), and also modify that configuration if they want to tweak or alter a setup.
I see the configuration file approach being potentially a bad idea, but with the small group I currently work with, we're all fairly comfortable making configuration changes as necessary and communicate those changes to others effectively. I don't know how well that approach would hold up at scale.
I would do the same thing with static sites and files, I think. Why? The static site isn't part of the system; it's the product that the system serves. Therefore, at least in my opinion, it should be as decoupled from that system as possible.
But, like I said, this is just my philosophy. I'm sure someone else will have an equally valid but totally opposite approach.
IMO Docker is a no-ops thing in the same way Heroku is. Not because Docker is especially good at taking care of all this, but because Heroku isn't especially good either. Some things are taken care of by Heroku for free that aren't taken care of by Docker, and vice versa. One example: when you set up a Heroku app, the amount of log history it keeps is very short. This could be a disaster. A Docker setup at least keeps a reasonable amount of logs (at least in most of the setups you find on GitHub). The developer may not know how to get at them, but can learn that on the fly.
The other no-ops solutions often suck in some ways because of the difference between the needs of the companies that sell them and the needs of the developer. So even though Docker might have some problems, I'm not convinced that a naive Docker setup is worse than a naive PaaS setup.
Either way the developer who doesn't get the basics right (not your exhaustive list) is likely to be embarrassed at some point.
For instance, the idea of an SSH provisioner doesn't jibe with Docker. The better approach is to run the container with a shared volume, then run another bash container to access that shared volume. If you are just starting to look at Docker, I would recommend using Vagrant to provision the base image and leaving the heavy lifting to Docker itself.
To be clear: see the first example Vagrantfile that is in the blog post. Then read down further and see `docker-run`. You can use that to launch another container to get a bash prompt. This is _exactly_ the workflow you describe.
We built exactly for this. :)
We also support SSH-based containers, but you can see that it is explicitly opt-in (you must set "has_ssh" to "true"), also showing that it really isn't the normal way things are done.
Vagrant is about flexibility, and we surface both use-cases.
I didn't want to be negative -- it'd be great to be able to have an environment set up in 10 seconds with vagrantfiles -- so until then I'm trying to see the positives.
Given that it transparently wraps boot2docker and handles proxying SSH connections, I had hoped that it would also transparently manage the port forwarding to the host VM, as that's a much more common use case with Docker than SSH access.
Nonetheless, it looks nice!
I imagine that Docker containers have to be provisioned in some way, and if you're provisioning with `apt-get` then it's not going to work when deploying to a Red Hat-based OS.
Essentially, I understand Docker containers to be lightweight virtual machines rather than applications that can be deployed to anything running the docker service. Am I on the right track?
`FROM busybox` or `FROM ubuntu:latest`
Containers contain processes, and Docker base images let you use yum/dpkg/apt inside different containers; it doesn't matter what host OS you use, as long as it runs a supported Linux kernel.
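For example (a minimal sketch), an Ubuntu-based container builds with apt even when the host runs Arch or Red Hat, because the image supplies its own userland:

```dockerfile
# The host distro is irrelevant; only the kernel is shared.
FROM ubuntu:latest
RUN apt-get update && apt-get install -y nginx
```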
1. Set up a VM using CentOS to mimic my deployment environment
2. Distribute that to several people, including some running Windows and OSX, and have it automatically set up, with all parties reliably in exactly the same environment.
I've personally used it for creating and iterating quickly on Puppet scripts. I've seen it recommended for devops with Chef also. See also: http://www.packer.io/ (for making your own gold master vm images; written in Go by the same guys)
However, if you don't use a Dockerfile, you won't be able to use the Trusted Build feature, and other users won't be able to verify which source the image was built from. So your image will remain a second-class citizen.
It's nice, but it still doesn't fix the issue for people with mid-range machines who develop in a VM full time and want to be able to run a VM within a VM.
For example, with fairly old but still reasonable hardware, you cannot run VirtualBox inside an existing VirtualBox instance.
If you have a Windows box and develop full time in a Linux VM, you cannot run Vagrant inside that Linux VM, because unless you have a modern CPU it lacks the instruction set support required for nested virtualization.
Using Docker instead of a VM would work, but Docker only supports 64-bit operating systems, so anyone stuck with a 32-bit host OS still can't use Vagrant this way and has to resort to raw Linux containers without Docker, which is really cumbersome.
If you have a great dev box with a ton of ram then using docker or a VM is irrelevant. You can just set the whole thing up in a ram drive and things are nearly instant, assuming you're using virtualization for short lived tests on some provisioned server that matches your prod env.
With an older machine and a 32-bit OS (i.e. 2 GB of RAM), you can't do anything except run two 512 MB VMs side by side or a single ~1 GB VM on its own, so it's a real letdown to see they decided to use Docker instead of plain LXC, which does work on a 32-bit OS.