* Sanity in our environments. We know exactly what goes into each and every environment, which are specialized based on the one-app-per-container principle. No more asking "why does software X build/execute on machine A and not machines B-C?"
* Declarative deployments. Using Docker, CoreOS, and fleet[1], this is the closest solution I've found to the dream of specifying what I want running across a cluster of machines, rather than procedurally specifying the steps to deploy something (e.g. Chef, Ansible, and the lot). There have been other attempts at declarative deployments (Pallet comes to mind), but I think Docker and fleet provide even better composability. This is my favorite gain.
* Managing Cabal dependency hell. Most of our application development is in Haskell, and we've found we prefer specifying a Docker image to working with Cabal sandboxes. This is equally a gain on other programming platforms: you can replace virtualenv for Python and rvm for Ruby with Docker containers.
* Bridging a gap with less-technical coworkers. We work with some statisticians. Smart folks, but getting them to install and configure ODBC & FreeTDS properly was a nightmare. Training them in an hour on Docker and boot2docker has saved so much frustration. Not only are they able to run software that the devs provide, but they can contribute and be (mostly) guaranteed that it'll work on our side, too.
I was skeptical about Docker for a long time, but after working with it for the greater part of the year, I've been greatly satisfied. It's not a solution to everything—I'm careful to avoid hammer syndrome—but I think it's a huge step forward for development and operations.
Addendum: Yes, some of these gains can equally be solved with VMs, but I can run through /dozens/ of iterations of building Docker images by the time you've spun up one VM.
[1]: https://coreos.com/using-coreos/clustering/
If you take the steps for, say, AWS, of building a new AMI for every role you have, then it's pretty much the same.
(but in my experience with building AMIs, that process is way too slow compared to Docker)
Docker becomes closer to declarative when you build static Docker images for every role and rebuild and re-deploy for every change. Even more so when your deployment is based on a tool like fleet that declaratively specifies your cluster layout.
The point is to avoid ever having situations where you say "install package foo on all webservers". Instead you say "replace all webservers with a bit-by-bit identical copy of image x".
The benefit is that you can have already tested a container that is 100% identical, and know that the deployed containers will be 100% identical, rather than hoping the commands you pass to the config tool handle every failure scenario well enough.
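For illustration, a "webserver" role under fleet ends up as a unit file along these lines (image name, tag, and ports are made up, so treat this as a sketch rather than a working config):

    # web@.service -- start three copies with: fleetctl start web@{1..3}.service
    [Unit]
    Description=webserver container
    After=docker.service
    Requires=docker.service

    [Service]
    ExecStartPre=-/usr/bin/docker rm -f web
    ExecStartPre=/usr/bin/docker pull example/web:1.4.2
    ExecStart=/usr/bin/docker run --name web -p 80:80 example/web:1.4.2
    ExecStop=/usr/bin/docker stop web

    [X-Fleet]
    Conflicts=web@*.service

Rolling out a change then means building a new image tag, updating the unit, and restarting it, rather than mutating running machines.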
For configuration, we separate the application and the config files into two separate containers. The config files are provided through a shared volume to the application. This model is definitely odd. However, it's allowed us to decouple our application from our configuration and to swap out configurations. With this in mind, it's more declarative because we specify "run this application with this configuration unit" rather than "here's how you get yourself started". See the Radial project for our inspiration.[1]
We've found that this approach has generalized well so far. For example, setting up a Cassandra cluster is often a real PITA to configure since you need the seed IPs up front. Our configuration container manages the dance by registering and pulling the IPs from Consul (etcd would work fine too). Perhaps a bit of smoke and mirrors, but it lets us spin up a properly-configured Cassandra cluster without needing to manually specify who's in the cluster.
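A rough sketch of the application/config split, with hypothetical image names (the config image does nothing but carry files in a volume):

    # config image: FROM busybox; ADD ./conf /etc/myapp; VOLUME /etc/myapp
    docker run --name myapp-config example/myapp-config true
    # the application container picks up /etc/myapp via --volumes-from
    docker run -d --name myapp --volumes-from myapp-config example/myapp

Swapping configurations is then just a matter of pointing --volumes-from at a different config container.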
Great explanation. If possible, I'd love to see/know more about how the statistician training step was accomplished. I also work with many nontechnical folks and haven't found success getting training on docker to 'stick'.
The Docker folks get a point for bootstrapping a familiar user interface: git. Our non-dev coworkers are competent enough with git, and they felt comfortable drawing analogies between the two. They pull the image (from our private Docker registry), run the container, make some changes, build, run, repeat. Very similar to pull, check out, commit, etc in git.
The only pain I've had is the silly flags for 'docker run'. Ugh. Before I told them to make aliases, there were all sorts of complaints when they forgot '-it' and '--rm'. I think '-it' should be the default, and possibly '--rm' as well, with switches to toggle them off. Oh well.
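The aliases are nothing fancy; something like this covers most interactive use:

    # interactive, throwaway containers by default
    alias dkr='docker run -it --rm'
    # e.g.: dkr ubuntu:14.04 bash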
Yes, very much so. In fact, our goal is to have NixOS based containers. Right now, we're using Debian as the base image, and there's /no/ guarantee that the versions of software installed are consistent (since Docker caches based on the line in a Dockerfile, rather than what's actually installed).
With Nix, we can have version guarantees in all of our Docker images--including the cached images.
That depends on how the package is specified, doesn't it? You can snapshot the full chain of versions with dpkg and explicitly specify them all. It shouldn't be too hard to wrap this up into something like a Gemfile.lock.
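A rough, untested sketch of that idea: snapshot the exact versions from a known-good container, then pin them when building the next image:

    # inside a known-good container: record exactly what's installed
    dpkg-query -W -f '${Package}=${Version}\n' > versions.lock
    # in the next build: install exactly those versions
    xargs -a versions.lock apt-get install -y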
Great. I have a major complaint about Nix. Their packages are built with all sorts of dependencies included. So you install mutt and you end up getting python. Or you install git and you also get subversion.
I understand their philosophy, but they should allow for flexible runtime dependencies without the need to rebuild packages. Perhaps with a second hash or something, to sign dependencies.
I used Docker to solve a somewhat unconventional problem for a client last week. They have a Rails application that needs to be deployed in two vastly different situations:
* a Windows server, disconnected from the internet
* about 10 laptops, intermittently connected to the internet
Docker let us build the application once and deploy it in both scenarios with much less pain than the current situation, which basically consists of a script to git-pull and over-the-phone instructions when dependencies like Ruby or ImageMagick need to be upgraded.
We run VirtualBox with a stock Ubuntu 14.04 image with docker installed from the docker-hosted deb repo. We use the Phusion passenger Ruby image[1], which bundles almost every dependency we needed along with a useful init system so we can run things like cron inside a single container along with the application. This makes container management trivial to do with simple scripts launched by non-technical end users.
The laptops are MacBook Airs and were not running a VM at all. Instead, users had a script they could double-click to launch the application in a terminal window.
Now the laptops run the VirtualBox setup and always have the application running in the background. Docker adds value by letting us distribute a much smaller amount of data vs sending out an entire VM image.
For the Windows server, we used to distribute upgrades by sending out an entire VirtualBox appliance image, which was usually around 3GB. Additionally, the operator would have to manually shuffle data between the old and new images. Now, we can ship out a saved Docker image (built with `docker save`), which cuts down on the amount of data transferred, and the final VM we shipped him knows how to upgrade the Docker container and shuffle the data automatically.
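For the curious, the offline hand-off is about as simple as it sounds; something like this, with a made-up image name:

    # on the connected build machine
    docker save example/railsapp:2014.10 | gzip > railsapp-2014.10.tgz
    # on the disconnected server, after copying the file over
    gunzip -c railsapp-2014.10.tgz | docker load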
- Are you/have you tried using "FROM" and layering things in multiple images to reduce what you need to keep shipping?
- Anything stopping you from using a private registry? I'm running one and it seems to work quite well (with the caveat that it's annoying to have to specify the registry all the time).
- You talk about "shuffling the data". Does that mean you're not using volumes to keep the data separate from the container? If so, any particular reason?
1) We've talked about it but it's not a blocker so we haven't done it yet. Right now we're trying to reduce the number of layers we ship, since they seem to get big for no good reason.
2) We're using a private Docker Hub account to transfer images to the laptops. They have a script that the users can invoke that shuts down the container, updates, runs some initialization tasks (`db:migrate` + some other stuff) and then brings the container back up.
3) Yep, we're using mounted shared directories in both cases. Previously the laptops of course just stored everything on the local filesystem since they were running the app directly. I'm not 100% sure what the Windows server was doing, but I believe the operator had to move data from one share to another and run initialization tasks by hand.
What value does adding a user add if you're already running a VM? By default, all that a user adds is directory access controls. Docker provides isolation between processes on the system it is running on. Executing a root exploit from inside a Docker container is not impossible, but it's also harder than "simply" being a user. Application-level security can also be improved significantly if an application requires multiple processes to run on a host: Docker can be used to restrict processes from accessing the network, etc. Nothing that couldn't be done without Docker by a sufficiently dedicated ops team, I'll admit, but Docker greatly simplifies and standardizes these mechanisms. That's especially true if you've adopted a DevOps culture where developers have come to own more of the system's security.
Docker also enforces immutability. With a VM there's always the temptation to manually fix any issue that arises, and if you don't have some bulletproof way to document that then you'll have issues when you go to recreate the environment on a new machine. Docker kind of forces you to solve the original problem via the dockerfile, which is what will spawn images for any future installs anyway.
Could you explain this more? I think my confusion stems from where the config comes from. Regardless of whether I have a bit-for-bit image or a VM created from a bunch of script commands, the immutability disappears when I apply the config.
So, for my example: if I have a role that specifies one instance of a Galera server, I have to configure each one with the other servers in the pool. And each config will be dependent on the other servers' configs. So is Docker the first part (get the Galera server instance running), and then there is some second part that does the config so the instances in the cluster work together?
To your first question: for me it's an on-paper vs. reality difference. On paper you're exactly right that a "VM created from a bunch of script commands" will end up in the same state.
The reality is that once the VM is built there is the temptation/opportunity to make ad-hoc changes for any variety of reasons. Those ad-hoc changes sometimes make it back into the official build process, but sometimes they get forgotten in the heat of the moment. With Docker you can't do this: to make the necessary change you are also changing the official build process. No opportunity for the two to deviate.
Second question: Yes, that is my understanding (though not a use case I have atm).
Very good call-out. For most use cases this actually turns out to be OK, but to reduce the surface area of this being a potential issue you could (a minimal sketch of the base-image option follows this list):
- Vendor dependencies (works to replace stuff like `go get` but probably not for apt packages etc.)
- Create a base image which handles the stuff you need to reach out to the network for (`apt-get install openjdk-6-jre` etc.) and is infrequently updated. Then the Dockerfile for the final application is `FROM me/myjava` and just does a few things that don't use the network like `ADD . /code`.
- Use `docker commit` instead of Dockerfiles for those steps (pretty gross IMO)
- Use CM in your docker build to install a very specific version of a package if you need (I'm not 100% sure this exists but it seems probable). This isn't perfect but tightens things up if you're worried about upstream breaking apt packages etc.
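As a minimal sketch of the base-image option (reusing the `me/myjava` placeholder from above; the CMD is invented):

    # Dockerfile for the rarely-rebuilt base image, pushed as me/myjava
    FROM ubuntu:14.04
    RUN apt-get update && apt-get install -y openjdk-6-jre

    # Dockerfile for the application, rebuilt on every change; no network access needed
    FROM me/myjava
    ADD . /code
    CMD ["java", "-jar", "/code/app.jar"]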
One of the goals of a new image format for Docker (this is 2.0 stuff) is to make the layers content-addressable by ID. That way, you will have a reasonable assurance that two Docker images constructed with the same Dockerfile in two different places will have the same IDs if they result in the exact same layers, and you will be able to see the point of divergence otherwise.
"- Create a base image which handles the stuff you need to reach out to the network for (`apt-get install openjdk-6-jre` etc.) and is infrequently updated. Then the Dockerfile for the final application is `FROM me/myjava` and just does a few things that don't use the network like `ADD . /code`.
- Use `docker commit` instead of Dockerfiles for those steps (pretty gross IMO)
- Use CM in your docker build to install a very specific version of a package if you need (I'm not 100% sure this exists but it seems probable). This isn't perfect but tightens things up if you're worried about upstream breaking apt packages etc."
These are some of the goals of ShutIt.
We had complex development needs due to technical debt, and Dockerfiles simply didn't cut it, and I got frustrated with the indirection of chef/puppet/ansible. I also needed the several hundred devs in my company to get productive quickly, so transferring all the little bash scripts and storing them in Docker was the path of least resistance.
It's out of date and heavily edited, but I talk about this here:
It simplifies the build and deployment process for the application. You can install the new version of the application in a Docker container and just push the updated container to all your clients. You can do the same thing for changes to the infrastructure needed to run your application: just deploy the changes.
At Shopify, we have moved to Docker for deploying our main product. The primary advantages for us are, first, faster deploys, because we can do part of the old deploy process as part of the container build, and second, easier scalability, because we can add additional containers to have more app servers or job workers. More info at http://www.shopify.com/technology/15563928-building-an-inter...
wvanbergen, forgive me for veering off topic. I'm planning on applying to Shopify (Toronto) as a software developer before the end of the weekend. Any advice you're willing to share?
MattyMc: sure. Primarily: be yourself, show what you are passionate about, and be willing to adopt change. When applying: cover letter > resumé, and try to stand out because we get many applications. Email me at willem at shopify dot com if you have any specific questions.
We've been using Docker for YippieMove (www.yippiemove.com) for a few months now, and it works great.
Getting your head around the Docker philosophy is the biggest hurdle IMHO, but once you're there it is a delight to work with. The tl;dr is to not think of Docker as VMs, but rather as fancy `chroots`.
In any case, to answer your question: for us it significantly decreased deployment time and complexity. We used to run our VMs and provision them with Puppet (it's a Django/Python app), but it took a fair amount of time to provision a new box. Moreover, there were frequent issues with dependencies (such as `pip install` failing).
With Docker, we can more or less just issue a `docker pull my/image` and be up and running (plus some basic provisioning of course that we use Ansible for).
How do you do restarts when you update the app? I assume you have to take the app server out of the server pool (remove it from the load balancer or nginx) and shut it down, then docker pull your image.
I'm doing deploys with Ansible and it's just too slow.
Actually, we have Nginx configured with health checks (http://nginx.org/en/docs/http/load_balancing.html#nginx_load...). Hence, it will automatically take a given appserver out of the pool when it stops responding. Once the node is back again, Nginx will automatically bring it back into rotation.
Also, we actually use a volume/bind-mount to store the source code on the host machine (mounted read-only). That way we can roll out changes with `rsync` and just restart the container if needed.
The only time we need to pull a new update is if the dependencies/configuration of the actual container change.
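Roughly, that setup looks like this (paths and the image name are illustrative rather than our exact config):

    # source lives on the host and is mounted read-only into the container
    docker run -d --name web -p 8000:8000 -v /srv/app/src:/app:ro example/webapp
    # a deploy is then a sync plus, only when needed, a restart
    rsync -az ./ deploy@apphost:/srv/app/src/
    ssh deploy@apphost docker restart web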
How do you deal with connections that are in progress to the app server? If you just take it down, you're potentially throwing away active connections.
Yes, that's absolutely true and something we're aware of. It would of course be possible to solve, but would increase the complexity by a fair amount.
It is also worth mentioning that it is a more back-end-heavy service than front-end-heavy. Since each email migration runs isolated in its own Docker container, a given customer can generate hundreds of Docker containers.
Hence, given the relatively low volume of users on the web app, and the fast restart time, the chance of throwing away an active connection is relatively low.
OK, thanks. I have several app servers; I take them out of the nginx server list, stop each one gracefully, git pull and configure (slow, I want to get rid of this step), put it back in the nginx pool, and move on to the next one.
tedious, although my whole deploy-to-all-servers is a single command.
Yeah, that sounds pretty tedious, but I guess it could still be automated (but somewhat tricky).
Once CoreOS becomes more stable, we're looking to move to it. The idea is then to use `etcd` to feed the load balancer (probably Nginx) with the appserver pool. That way you can easily add new servers and decommission old ones.
We automated this pretty trivially at my last job using Fabric[0]. All we had to do was cycle through a list of servers and apply the remove from LB, update, add to LB steps. Removing from the LB should simply block until connections drain (or some reasonable timeout). It makes deploys take longer for sure, but avoiding the inevitable killing of user connections was worth it.
I will add that if you're using Docker (which we weren't) it might be easier to deploy a new set of Docker containers with updated code and just throw away the old ones.
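In pseudo-shell it amounts to something like this, where remove_from_lb/add_to_lb/wait_for_health stand in for whatever your load balancer and health checks provide:

    for h in web1 web2 web3; do
        remove_from_lb "$h"    # should block until connections have drained
        ssh "$h" "docker pull example/app && docker rm -f app && docker run -d --name app -p 8000:8000 example/app"
        wait_for_health "$h"   # poll the app's health-check URL before continuing
        add_to_lb "$h"
    done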
I have learnt enough about Docker to know it's not something which solves any problem I have, but finding out concrete facts about what others are actually doing with it was one of the hardest parts of the learning process.
The official Use Cases page is so heavily laden with meaningless buzzwords and so thin on actual detail that I still feel dirty just from reading it. https://docker.com/resources/usecases/
Some of the answers are interesting. I agree with you that most of the companies they mention on their page are so big that it makes you think that Docker is a thing for big companies, and not relevant to what I do.
For my part, I use Docker for my personal home server to keep dependencies documented and keep everything cleanly isolated.
Currently I have 8 images running, and more sitting around that I spin up as needed, running everything from a Minecraft server, to AROS (an AmigaOS re-implementation), haproxy, a dev instance of my blog, a personal wiki, and my dev environment (which I use to run screen and ssh into when I want to edit stuff; it bind mounts a shared home directory, but means I don't mess up the host server with all kinds of packages I decide to install).
It's gotten to the point where the moment I find myself typing "apt-get install" I pause to consider whether I should just put it in a Dockerfile instead. After all, if this is a package I'll need once, for a single program, the Dockerfile does not take much longer to write, and it saves on the clutter.
I have a draft blog post about how I'm using it I keep meaning to post - probably after the weekend.
We're moving all of production in EC2 from an old CentOS 5 image managed by capistrano to CoreOS, with fleet deploying images built by the docker.io build service and private repo. I love it.
Every week, we rebuild our base image starting with the latest debian:stable image, apply updates, and then our apps are built off of the latest base image. So distro security updates are automatically included with our next deploy.
We had been deploying multiple apps to the same EC2 instances. Having each app's dependencies be separate from other apps has made upgrading them easier already.
This also means all containers are ephemeral and are guaranteed to be exactly the same, which is a pretty big change from our use of capistrano in practice. I'm hoping this saves us a lot of debugging hassle.
Instead of using ELBs internally, I'm using registrator to register the dynamic ports of all of my running services across the cluster in etcd, with confd rendering a new Nginx config from a template and updating it within 5 seconds when a service comes up or drops out. Apps only need to talk to their local Nginx (running everywhere) to find a load-balanced pool of whichever service they are looking for. Nginx is better than ELB at logging and retrying failed requests, which provides a better user experience during things like deploys.
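As a rough sketch of the confd side (the key layout depends on how registrator is configured, so the paths here are illustrative), the Nginx upstream template is just a Go text/template over etcd keys:

    # /etc/confd/templates/myservice.conf.tmpl
    upstream myservice {
    {{range getvs "/services/myservice/*"}}
        server {{.}};
    {{end}}
    }
    server {
        listen 8080;
        location / { proxy_pass http://myservice; }
    }

confd watches those keys and rewrites/reloads the Nginx config whenever a container registers or drops out.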
Some of these things could be solved by spinning up more EC2 instances. However, that usually takes minutes, whereas Docker containers take seconds, which changes the experience dramatically.
And I'm actually reducing my spend by being able to consolidate more. I can say things like "I want one instance of this unit running somewhere in the cluster" rather than having a standalone EC2 instance for it.
The biggest problem I have overall is pushing new code. When you push new code to git, do you then stop a container and restart it to get a new container working? (Assuming you do something like git clone in the Dockerfile)
I grant the docker.io build and private repository service access to my github repo, drop a Dockerfile at the root of my git repo, and the build server does the checkout outside of the Dockerfile and then executes the Dockerfile. I then use a github webhook to trigger a build when there's a new checkin to the master or qa branches. If the dockerfile completes successfully (based on exit status codes), it then spits out new docker images tagged with either "master" or "qa".
My fleet unit does a docker pull before it starts the container. So I just stop and start an individual unit to get it to run a new version.
Though fleet has a concept of global units (that run on all available machines), there's no way to do a rolling restart with them yet. Instead, I use unit templates to launch multiple instances of the same unit, and then stop and start each instance individually, and wait for it to respond before continuing to the next one. I intend to catch a webhook from the build server and do this automatically, but haven't written this yet.
Docker, CoreOS, fleet, and etcd have completely changed how I build projects. It's made me much more productive.
I'm working on Strata, which is a building management & commissioning system for property owners of high-rise smart buildings. It's currently deployed in a single building in downtown Toronto, and it's pulling in data from thousands of devices, and presenting it in real-time via an API and a dashboard.
So in this building, I have a massive server. 2 CPUs, 10 cores each, 128GB of RAM, the works. It came with VMWare ESXi.
I have 10 instances of CoreOS running, each identical, but with multiple NICs provisioned for each so that they can communicate with the building subsystems.
I built every "app" in its own Docker container. That means PostgreSQL, Redis, RabbitMQ, my Django app, my websocket server, even Nginx, all run in their own containers. They advertise themselves into etcd, and any dependencies are pulled from etcd. That means that the Django app gets the addresses for the PostgreSQL and Redis servers from etcd, and connects that way. If these values change, each container restarts itself as needed.
I also have a number of workers to crawl the network and pull in data. Deployment is just a matter of running 'fleetctl start overlord@{1..9}.service', and it's deployed across every machine in my cluster.
With this setup, adding machines, or adding containers is straightforward and flexible.
Furthermore, for development, I run the same Docker containers locally via Vagrant, building and pushing as needed. And when I applied for YC, I spun up 3 CoreOS instances on DigitalOcean and ran the fleet files there.
As I said, I've been able to streamline development and make it super agile with Docker & CoreOS. Oh, and I'm the only one working on this. I figure if I can do it on my own, imagine what a team of engineers can do.
> They advertise themselves into etcd, and any dependencies are pulled from etcd. That means that the Django app gets the addresses for the PostgreSQL and Redis servers from etcd, and connects that way. If these values change, each container restarts itself as needed.
Take Postgres for example. I use the official Docker image, and run it via a fleet .service file, having it store its data in a Docker data-only container, also launched from a fleet unit file. Finally, there's a third unit file that runs a slightly customized version of the docker-register image from CoreOS's polvi. This Dockerized app polls the Docker socket every 10s or so to get the IP and port information for the Docker container it's monitoring, in this case Postgres. It PUTs this value into etcd.
Afterwards, when my Django containers start, they pull their configuration from etcd using python-etcd. Each one gets the IPs and ports for the services, including Postgres, from etcd, and configures itself using those. Finally, it keeps these values in a dictionary. If these values ever differ from what's currently in etcd, then my Django app will send a signal to cause uwsgi to gracefully restart the Django app, so that it can reload itself with the new configuration.
For other Dockerized apps like nginx, I use kelseyhightower's confd-watch to monitor the etcd keys I'm interested in. confd-watch is great because it will use golang's text/template package to allow me to generate configuration files, like nginx.conf, with the values from etcd. So as these values get updated, so will nginx as confd-watch will force a graceful reload.
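Stripped of the surrounding tooling, the pattern is just reads and writes against etcd; roughly the following, with made-up key names (docker-register and python-etcd do this programmatically rather than via etcdctl):

    # the register sidekick publishes the container's address with a TTL
    etcdctl set /services/postgres/host 10.10.1.5 --ttl 60
    etcdctl set /services/postgres/port 49153 --ttl 60
    # consumers (the Django entrypoint, confd, etc.) read the same keys
    etcdctl get /services/postgres/host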
"If these values ever differ from what's currently in etcd, then my Django app will send a signal to cause uwsgi to gracefully restart the Django app, so that it can reload itself with the new configuration"
You could probably avoid the restart/reload by using skydns (https://github.com/skynetservices/skydns), a local DNS service built on top of values in etcd. You can configure your Django app to access the other service containers by DNS name and it will be translated to the correct IP all the time.
I'd love to get in touch and have a chat with you on what you're doing, I'm working on similar stuff but don't see much HVAC/energy around HN. Could you drop me a line (contact details in my profile)? Thanks!
I've said it many times, but I inherited the hardware. My current plan is to virtualize a couple of other servers, and then create a VMWare cluster using the extra machines to allow a sort of "auto-failover" setup.
We're using Docker to solve these kinds of problems:
- Running Jenkins slaves for some acceptance/integration tests that run in the browser. Previously we had to configure multiple chromedrivers to spin up on different ports or be stuck running one test per machine. Now we have 6 machines (down from 9) which run 6 slaves each, so we can parallelize our tests: 36 tests run concurrently. That has significantly improved our deployment time (as these tests are required for a deployment) while reducing costs.
- Migrating our infrastructure (around 70 instances) to AWS VPC; we had our machines running on EC2-Classic. While I had previously done some work automating most applications using Chef, we have really managed to fully automate our machines with Docker; it was way easier than solving cookbook dependency and customization issues. We have a couple dozen Dockerfiles that fully explain how our systems run and what the dependencies for each application are.
And that is only from the last month and a half, since I began using Docker. I was pretty skeptical before, as it was touted almost as a silver bullet. And it comes close to that in many scenarios.
I honestly find it really depressing to see all these folks taking code and applications that would otherwise be entirely portable, and rebuilding their entire deployment and development environment around a hard dependency on Linux.
If Docker becomes sufficiently popular, it's going to put HUGE nails in the coffin of portability and the vibrancy of the UNIX ecosystem.
It is not a code portability issue - it is a configuration, environment, and dependency volatility issue. Everyone can use cross-compilable C and still have deployment and lifecycle headaches.
- running graphite (can't say it was less painful to launch, since the Dockerfile was a bit outdated and I also had to figure out persistence issues, but overall I'm happy it's all virtualized and not living on the server itself)
- building our Haskell projects per feature (you run a container per feature; this way you avoid the pain of switching between features when you need to build one)
- running tests (for each feature we start a container with the whole infrastructure inside: all databases, projects, etc.)
- running staging, also container per feature
Very useful stuff compared to the alternatives, I should say. And quite easy to work with once you've played a bit with Docker's API.
At VMware, we use Docker for automated build and test of several of our open source efforts, as well as for a production IT business management mobile cost-analysis SaaS offering we provide.
Docker has demonstrated value in ensuring the app remains consistent across environments.
I'm very interested in feedback from any of you on what VMware can do to make Docker CI and Docker production use easier in general and on vSphere, fusion and vCloud Air.
We are engaged with Docker and the open source projects and would love to hear your feedback.
Please email me at carterm at vmware dot com with any feedback, comments or ideas.
Thanks,
Mark
So you're using containers in only staging? I can see why it would be appealing, but have you run into complications with staging being structured so differently from production?
I had no problems with staging being different from production (which is kind of multi-machine, multi-environment rather than "everything in one Docker container"), because it all uses the same fabric scripts and supervisor configs for deployment and maintenance.
We use the Google Compute Engine container optimised VMs, which make deployment a breeze. Most of our docker containers are static, apart from our application containers (Node.js) that are automatically built from github commits. Declaring the processes that should run on a node via a manifest makes things really easy; servers hold no state, so they can be replaced fresh with every new deployment and it's impossible to end up with manual configuration, which means that there is never a risk of losing some critical server and not being able to replicate the environment.
Docker revolutionized our server deployment. My company has 50 nodejs services deployed on VPS providers around the world. It allows us to completely automate the deployment of these servers regardless of the provider's APIs. When we roll out updates, we never patch a running box, we just bring the new container up and remove the old one. Super easy, super reliable, and best of all, totally scriptable.
We also have a pretty sophisticated testing environment using Docker which creates a simulation of our servers on any developer's laptop. It's really remarkable actually.
We have a stateless worker process that previously required a separate EC2 instance for each worker. Even using small instances, this meant a pretty cumbersome fleet in AWS, with lag for spin-up and excess costs due to workers that were brought online and finished their jobs well before the billed hour was complete.
Using Docker, we can have several of these workers on a single box with near-instantaneous spin-up. This means that we are able to use fewer, larger instances instead of several small ones. In turn, this makes the fleet easier to manage, quicker to scale, and less costly because we aren't overpaying for large portions of AWS hours that go underutilized.
I am not entirely sure that Docker was a necessity in building this as I sort of inherited the technology. I originally was pushing for a switch to pure LXC, which would have fit the build system that was in place better. However, given the fervour over Docker there is a lot of information out on the web and so changing the build and deployment systems has been relatively easy and quick. I bring this up because I think some tasks are better suited to pure LXC, but people seem to be defaulting to Docker due to its popularity.
We actually started using Docker a few months ago and it really sped up our deployment process. It's not only incredibly faster than using virtual machines for testing; it allows you to host multiple apps on one server and to have all versions of your app ready to download and run. More info at http://www.syncano.com/reasons-use-docker/
We use docker extensively to ship a large and complex legacy platform that was designed to run as a hosted service, but was transformed into an on-premise product.
The system is composed of several components originally designed to run on separate VMs for security reasons. Luckily, we were able to translate VM <-> docker container, so now each component has its own Dockerfile + shell script for booting up and providing runtime configuration.
Docker helps us solve several problems:
* A canonical build. It provides a way to configure the build system, fetch all dependencies, and execute a reproducible build on different machines/environments. It's also used as documentation when engineers have no clue where settings/parameters come from.
* A super fast build pipeline and release repository. We use maven -> nexus, docker -> docker-registry, vagrant -> local export for a completely automated way to bootstrap an ovf-file that can be deployed at customer site. Releases for the old platform were not automated and took the previous teams weeks (!) on a single platform.
* A way to restrict resources. Given some security constraints from the product, lxc + docker helps us restrict memory and networking.
* Shipping updates. We deliver automated updates through a hosted Docker registry for customers who open up the appliance to the internet. Previous teams were not able to deliver updates in time for a single hosted platform. We can now ship new releases and have them deployed at several customers' data centers in a matter of hours.
We have been using docker in production for almost a year now and despite headaches in the beginning it's been absolutely worth it.
One thing I'd like to point out is OS upgrades, security patches, or package updates in general. With Docker I just rebuild a new image using the latest Ubuntu image (they are updated very frequently), deploy the app, test, and then push the new image to production. Upgrading the host OS is also much less of a problem because far fewer packages are installed (i.e. it's just Docker and the base install).
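The rebuild itself is nothing fancy; roughly this, with invented tags:

    docker pull ubuntu:14.04     # picks up the freshly patched base image
    docker build --no-cache -t example/app:2014-10-20 .
    # run the test suite against the new image, then push and restart production
    docker push example/app:2014-10-20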
We have many images, and we build them in a CI setup using Jenkins. I used to run Jenkins inside Docker and build images within that container, but this turned out to be a problem. (Mainly, more and more resources were used up until the disk was full.) Now it's just Jenkins installed on the host, building images, starting them, and running integration tests.
This still doesn't say what you are doing. You update the base image, which is presumably something every Docker user does, then you "deploy the app".
What are you deploying? How much heavy lifting is your dockerfile doing? How much of the environment do you have to setup manually? How do you supply the app its static and dynamic data? How do you make the app accessible to users? How are you handling availability if your app crashes? Is the app distributed or load balanced in any way?
At Lime Technology, we have integrated Docker into our NAS offering along with virtual machines (KVM and Xen). Docker provides a way to eliminate the "installation" part of software and skip straight to running proven and tested images in any Docker environment. With containers, our users can choose from a library of over 14,000 Linux-based apps with ease. Docker just makes life easier.
I'm solving the most obvious issues that Docker was meant to solve. I'm currently working alone in a company that's starting up, and yesterday I needed to spin up a server and create a REST API service to integrate with a telco's system. They asked me how long it would take and I said an hour. I just spun up a DigitalOcean instance, cloned my API git project, built from the Dockerfile (it's incredibly fast on SSD), and in about 30 minutes I was running nginx->uwsgi->flask with Python 3.4, bcrypt, and a few other packages.
Now all this can be done with a simple bash script too but then that affects my main server environment. In this case when I want to stop the service or change something I simply edit my docker image.
My dev environment is a windows laptop and I use vagrant to spin up a server with nginx configured and docker installed. And I use docker to get my environment running for working on my apps. It's pretty awesome.
Vagrant and Docker are two of the best things that have happened to me as a developer.
We're using Docker for a few internal projects at Stack Exchange. I've found it to be simple and easy, and it just works.
We have a diverse development team but a relatively limited production stack - many of our devs are on Macs (I'm on Ubuntu), but our servers are all Windows. Docker makes it painless to develop and test locally in exactly the same environment as production in spite of this platform discrepancy. It makes it a breeze to deploy a Node.js app to a Windows server without ever actually dealing with the pain of Node.js on Windows.
Also, it makes the build process more transparent. Our build server is Team City, which keeps various parts of the configuration in many different hidden corners of a web interface. By checking our Dockerfile into version control much of this configuration can be managed by devs well ahead of deployment, and it's all right there in the same place as the application code.
Since we are still waiting for CoreOS + (flynn.io || deis.io) to mature, I modified our existing VMware VM-based approach to set up Ubuntu boxes with Docker installed. I then use fig to manage an application cluster, and supervisor to watch fig.
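For reference, fig drives this from a single fig.yml per box; a minimal, made-up example:

    web:
      image: registry.example.com/myapp:latest
      ports:
        - "8000:8000"
      links:
        - redis
      environment:
        - APP_ENV=production
    redis:
      image: redis

`fig up -d` then brings the whole set up with the right links and environment, so nobody has to remember individual docker run flags.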
When it's time to update a box, Jenkins sshes in and calls docker pull to get the latest image, then restarts via supervisor. Any one-off docker run commands require us to ssh in, but fig provides all the env settings so I don't have to worry about remembering them. The downtime between upgrades is normally a second or less.
The biggest thing I ran into is that each Jenkins build server can only build and test one container at a time. After each one, we delete all images. The issue is that if you already have an image, it won't check for a new one; this applies to all underlying images. We cut down on bandwidth by having our own Docker registry that acts as our main image source and storage.
We use Docker to deploy on Aptible, and this makes our projects entirely self-contained. With a Dockerfile in the project directory, the entire build and runtime environment is now explicitly declared.
With "git push aptible", we push the code to the production server, rebuild the project, and run it in one command.
Can you elaborate on the middle bits? How do you go from 'git push aptible' to the actual execution of the docker commands on the production server? (Setting up something similar myself and would love some direction).
Docker explicitly violates the principles of the Twelve-Factor App. Docker apps don’t rely on any external environment. In fact, Docker demands that you store all config values, dependencies, everything inside of the container itself. Apps communicate with the rest of the world via ports and via Docker itself. The trade-off is that apps become a little bit bulkier (though not significantly), but the benefit is apps become maximally portable.
In essence, Docker makes almost no assumptions about the app’s next home. Docker apps care about where they are even less than twelve-factor apps. They can be passed to and fro across servers—and, more importantly, across virtualization platforms—and everything needed to run them (besides the OS) comes along for the ride.
> I should state up front that I disagree with quite a few bits of the 12 Factor model. It’s important to remember that, imho, the 12 Factor model was designed as a business strategy for Heroku. The steps follow exactly what makes an app work best on Heroku, not what is best for an application.
> In fact, Docker demands that you store all config values, dependencies, everything inside of the container itself.
No it doesn't. In fact, parameterised containers are a thing that many people encourage. This is where you specify configuration when you run the container instead of when you build it. This can be done using arguments to the CMD, environment variables, config volumes, or something like etcd.
This is the first time I've heard the phrase "twelve-factor app". Although I'm ignorant of this term and docker, it wasn't obvious to me how it violates the concepts. Which factor does it violate?
One of the main things I'm using it for are reproducible development environments for a rather complex project comprising nearly ten web services.
We have a script that builds a few different docker images that the devs can then pull down and get using straight away. This is also done through a dev repo that they clone that provides scripts to perform dev tasks across all services (set up databases, run test servers, pull code, run pip etc.).
It used to take a day to set up a new dev environment; now it takes around 30 mins, can be done with almost no input from the user, and boils down to: install Docker, fetch the database restores, clone the dev repo, run the dev wrapper script.
This is approximately what I'm using it for too. I'm working on my MSc, and I'm using it to make reproducible experimental environments. Packages locked to specific versions, all required libraries installed, isolated from the rest of the system. Working pretty well in that capacity!
We've been using Docker and CoreOS + fleet for our production environment at GaiaGPS for a few months now, and have been very impressed. We use quay.io for building our repositories, triggered by a GitHub commit.
I agree with what others have said, and for us, the biggest benefit we see is keeping our production environment up to date, and stable. We're a small shop, and want to waste as little time as possible maintaining our production environment. We were able to go from 1 host (that occasionally went down -- and downtime for every deploy) to a 3-node coreos cluster fairly easily. We can also scale up, or even recreate the cluster, very easily.
At Shippable (shippable.com) we've been using Docker for over a year now for the following use cases:
1. Deploying all our internal components (db, message queue, middleware, and frontend) using containers and a custom service discovery manager. The containerization has helped us easily deploy components separately, quickly set up dev environments, test production bugs more realistically, and, obviously, scale up very quickly.
2. running all the builds in custom user containers. This helps us ensure security and data isolation.
We did run into a bunch of issues until Docker was "production-ready", but the use case was strong enough for us to go ahead with it.
I've used docker for process isolation at two companies now. In both cases, we were executing things on the server based on customer input values, and desired the isolation to help ensure safety.
In the first company, these were one-off import jobs that would import customer information from a URL they provided.
In the other, these are long-running daemons for a multi-tenant service, and I need to reduce the risk that one customer could exploit the system and disrupt the other customers or gain access to their data.
I have some other experiments in play right now in which I am packaging up various services as docker containers, but this is currently non-production.
Deployment. We have a legacy application that would take about a day of configuration to deploy properly. With Docker (and some microservices goodness) we've reduced the deploy down to an hour, and are continually improving it.
We are running our app[1] in instances of Google Compute Engine. We installed Docker in those instances.
Our app is a bunch of microservices: some Rails apps, each running with Puma as the web server, HAProxy, and some other Rack apps (for WebSockets). We also use RabbitMQ and Redis.
All the components are running in their own containers (we have dozens of containers running to support this app).
We chose this path because, in case of failure, just one service goes down while the whole system stays nearly fully functional. Re-launching a container is very straightforward and is done quickly.
We use Docker at Stylight for deploying our frontend app (WildFly). We use the Docker Hub for storing images. When we do a release we basically push to the hub, pull on all upstreams, and restart the containers with the new image. We have a base application image (containing Java, WildFly, etc.) which basically never changes, so builds and distribution are super fast. We really like the fact that the containers are isolated! We ran into an issue the other day where we wanted to dump the heap of the JVM to debug a memory leak; this should be easier with 1.3!
We are a small startup and host on Softlayer (we are part of their startup program).
I would postulate this - if you are using AWS, you will not need a lot of what Docker provides. But if you are hosting your own servers, then Docker provides close-to-metal performance with stateless behavior.
For example, when Heartbleed or Shellshock or POODLE hit the ecosystem, it took us 1 hour to recreate all our servers and be compliant.
My biggest complaint/wishlist item is for Docker to roll Fig into itself. The flexibility to compose services/stacks is very useful, but Fig seems too closely tied to Orchard.
Not technically in production yet, but I use Docker for the following scenarios:
- Build agents for TeamCity, this was one of the first scenarios and it's been amazingly helpful so far.
- Building third-party binaries in a reproducible environment
- Running bioinformatics pipelines in consistent environments (using the above tools)
- Circumventing the painfully inept IT department to give people in my group easy access to various tools
I've also been contemplating building a Docker-based HPC cluster for a while now, though unfortunately I'm currently lacking support to make that happen.
> Building third-party binaries in a reproducible environment
Awesome! Not enough people talk about this use case IMO. For instance, Docker itself and the docker-cli are both compiled inside a Docker container. This allows you to circumvent the "OK, install the right version of Go, all the libraries I need, etc." song and dance if you're developing in a new place (or when new developers join the project).
It's also a great way to circumvent the all-too-familiar "OK, which lib* do I need to get this to compile? It isn't documented" problem.
The image basically contains (on top of the TeamCity agent itself of course) all the required dependencies. These dependencies are built in exactly the same way (and in the same context) as the ones used for running the code itself (i.e. after building/testing). This basically ensures that the code consistently runs in the same conditions.
Using Docker for this also means I can roll out the agents rather painlessly across a variety of machines. For instance, we have some pretty serious hardware used for HPC and similar stuff, so whenever possible I like running builds on that, since it takes a fraction of the time it would on the more common hardware. Sometimes however those machines aren't available for a number of different reasons, so I can very quickly move my build agents to another machine, or better even, multiple machines.
I've developed a little command-line tool (https://github.com/zimbatm/cide) that can run the same environment on the developer machine and Jenkins. It also makes the Jenkins configuration much easier since build dependencies are all sandboxed in different docker boxes.
The tool is mainly for legacy apps and is able to export artefacts back to Jenkins instead of publishing Docker images.
Cool thing about this, you don't need to change anything in config when deploying.
+ it follows docker's best practices like one process per container.
I already can't imagine deploying something without it.
It's open source (MIT, contributions are welcome) and the current functionality will stay free forever.
Hope it will be useful for somebody, not only for our team.
At UltimateFanLive we use docker on Elastic Beanstalk to speed up the scaling process. Our load goes from 0 to 60 in minutes, as we are connected with live sports data. Packages like numpy and lxml take way too long to install with yum and pip alone. So we pre-build images with the dependencies but we are still using the rest of the goodies on Elastic Beanstalk. Deploy times have plummeted and we keep t2 cpu credits.
We use Docker to set up our testing environments with Jenkins and install the application in them. Every build is installed in a Docker container automatically. The container is used for acceptance tests. The Docker containers are set up automatically with a Dockerfile. It's an awesome tool for automation and deployment, and is used to implement the concepts of "Continuous Delivery".
I'm considering Docker for a small side project I have where I want to deploy a Runnable Java Jar as a daemon. Getting Java paths right across different Linux distributions can be a hassle, hoping Docker will help me solve this. For that matter, getting a daemon (service) running correctly on different Linux'es is one more thorn I'd rather not have to deal with.
>Docker is an open-source project that automates the deployment of applications inside software containers, by providing an additional layer of abstraction and automation of operating system–level virtualization on Linux.
Docker is based on LinuX Containers (LXC), which are much more lightweight than a full OS VM. Containers are basically namespaces for processes and networks, so I would expect far lower overhead than a full-fledged VM.
It is a virtualization layer, but not nearly as heavyweight (and perhaps not as well isolated) as a full VM. As to the isolation - I don't know how "good" it is, but I expect it will only get better. I think this is what PG means by "do things that don't scale".
But for the case of deploying Java applications, it seems particularly redundant. The JVM already has a very extensive security manager framework, and since it's Java it's already write once run everywhere. You can even easily bundle your entire project into a single executable .jar for deployment.
As a Java developer, I can't fathom why someone would want to add a redundant layer of cruft on top of their project. Docker may be the latest trendy thing, but it seems like many of its users aren't entirely sure why they need it.
Edit: I should clarify that I have nothing against hypervisors or other virtualization and sandboxing schemes. However, a lot of people seem to be running Docker, under an outside hypervisor, in order to run their also virtualized language runtime. That strikes me as a bit pointless.
It's the daemonizing part that complicates things. Running java -jar MyApp.jar is easy; running it under the control of a system init script, ensuring it's restartable like other daemons, logging, log rotation, and figuring out where Java is actually located on the system are all variables that affect any daemonized application; not just Java apps, but ones written in any language: Ruby, Python, C, or C++.
Docker seems like a good way to "contain" the problem domain challenges I outlined, hoping anyway.
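Something like this is what I have in mind (base image and paths are just a guess, untested):

    FROM debian:wheezy
    RUN apt-get update && apt-get install -y openjdk-7-jre-headless
    COPY myapp.jar /opt/myapp/myapp.jar
    CMD ["java", "-jar", "/opt/myapp/myapp.jar"]

Running it with `docker run -d --restart=always ...` should cover most of the daemon behaviour (restart on failure, come back up when the Docker daemon does) without writing a distro-specific init script.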
I don't know if docker would be the right service for my use case, but considering the user experience on this thread I thought I'd ask...
I'm looking to deploy a python based analytics service which runs for about 12 hours per job, uploads the results to a separate server then shuts down. At any given time there could be up to 100 jobs running concurrently.
Most people seem to be using Docker with distro-based images, ie. start with ubuntu and then add their own app on top etc. Is anyone using more application-oriented images, ie. start with empty image and add just your application and its dependencies?
I'm doing this, but starting with the minimal 80MB debian:stable image and then using apt-get to grab my dependencies. The 80MB covers the dependencies for apt-get.
I believe the centos images are also minimal, but the ubuntu image starts out much larger.
I've used it to make an Ubuntu-packaged app available on CentOS machines. The compile-from-source route was a bit of a headache (lots of dependencies which also had to be compiled from source), so being able to deploy like this saved a lot of hassle.
We use Docker to run tests. Honestly, we could quite happily deploy the resulting images to our production infrastructure now; we just haven't got round to it yet.
1. You run the db and its files inside a container and regularly export a DB dump like so (example for postgres):
docker run --volumes-from db -v $(pwd):/backup ubuntu:trusty sh -c "cd /var/lib && tar zcvf /backup/backup.tgz postgresql"
(I don't recommend this)
2. You run the db inside a Docker container and mount a volume for /var/lib/postgresql into it. This I find is the best way: the DB files remain on your host and you can easily maintain the DB (runtime) and its files separately.
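Concretely, option 2 looks something like this (the exact data path depends on the image; the official postgres image keeps its data under /var/lib/postgresql/data):

    # DB files live on the host; the container can be recreated or upgraded freely
    docker run -d --name db \
        -v /srv/pgdata:/var/lib/postgresql/data \
        postgres:9.3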
Docker is not safe for untrusted code. If the client can get local root privileges (e.g. CVE-2014-4699, CVE-2014-4014, CVE-2014-0196, unix-privesc-check, many more) they can then escape the docker container.
But it's definitely better than running as your own user!