Docker: Git for deployment (scoutapp.com)
318 points by itsderek23 on Aug 28, 2013 | 115 comments

Disclaimer: Vagrant creator/maintainer guy.

It is unfortunate that so many people compare Vagrant and Docker. While there is overlap, Docker is mostly not viable as a dev environment tool alone, so it isn't a fair comparison. The main reason is that you have to be using Linux (and a recent Linux) as your main dev system, and in practice this is very rare. Move beyond indie developers and for all intents and purposes Linux desktops are non-existent (Vagrant is in use by companies like BBC, Expedia, Intuit, etc. and I can tell you most devs don't know how to use Linux let alone run it as their primary dev platform).

BUT, I agree that putting your dev environment in a Docker container is absolutely _amazing_, and there is a KILLER Vagrant/Docker combo.

The killer combo is actually running Vagrant to spin up Docker-ready VMs, then using Docker inside that to develop. This lets people use Docker on Windows, Mac, and Linux. You get the fast iteration time because all your state is actually in a container, so you just docker kill and run as usual.

In fact, an upcoming version of Vagrant is adding Docker as a provisioner, so you can `docker pull` containers down as part of `vagrant up`.

And I published Packer[1] templates to build Docker-ready VirtualBox/VMware/AWS images that are Vagrant-ready: https://github.com/mitchellh/packer-ubuntu-12.04-docker

[1]: www.packer.io

I'm really curious what these "don't know linux" devs are doing with Vagrant in their day to day jobs. Do they get one team member who 'knows linux' to set up a Vagrantfile and then force them all to use it?

First off, it's not so black and white. Many developers know some Linux but wouldn't trust themselves setting up a secure production server. I consider myself part of this group.

Using Vagrant, we can still:

* Write software that depends on POSIX-only applications, such as Redis and, yes, Docker
* Share development environments-that-look-like-production with other developers, thus avoiding "works on my machine" bugs.

You need very little Linux knowledge to do this. Just apt-get, a text editor and the occasional HOWTO/blog post gets you very far.

Additionally, with Docker-on-Vagrant, we can easily:

* Simulate a multi-server environment locally without hogging resources
* Do effective versioning on dev environment configuration before sharing stuff with colleagues
* As a result, learn Linux administration with easy rollbacks after fuckups.

All this without an on-team Linux guru.

Of course, once you go live, you'll need a decent sysadmin/devop type to un-suck the installations. And backport that to the dev setups. Or, just go to some PaaS and have the security/efficiency part done for you. But that's not my point: my point is that even without running your own hardcore-linux-guru-devopsed production environment, and without anything more than basic Linux skills, you can get a lot of value from Vagrant+Docker.

Yes, the ops side of things primarily sets up the Vagrantfile. Developers just know the Vagrant workflow (up, destroy, ssh). Usually developers know "just enough Linux" to `vagrant ssh` and do useful things, but definitely not enough Linux to run it as a primary dev platform.

And it is growing more and more common that most developers in larger organizations don't even know Vagrant is being used under the covers because it is being masked by higher level scripts ("click button to start dev environment"). Under NDA I can't say any names here, but it is more common than you think.

It's quite common. I've worked at several companies, including BSkyB where the developer team is supported by a devops team - the devops write Vagrant scripts which bootstrap a VM that closely mimics the production environment and include all the necessary services ready to run such as nginx and php or ruby, meanwhile the devs all work on Macs and have enough command-line-fu to do development work, but certainly not enough to provision a Linux virtual machine and set up said services.

Basically, yes. Used in this manner it's a lower friction version of slapping vmdk's on a web server (or 'how we used to do it')

One thing I keep seeing, and feeling uncomfortable about, is this throwing layers on top of layers on top of layers until you are so far removed from what is actually happening that, if one piece breaks down, it's impossible to fix, and you have a ton of overhead. I want fast, simple, easy to hack, and something that just goes.

Isn't there a way to do this super slimmed down and light weight?

I've been keeping an eye on the ShipBuilder PAAS project - the minimalist Heroku clone written in Go. It is starting to look quite promising.

Quoting jaytaylor's comment [1] from below:

    > You may be interested in checking out ShipBuilder - it
    > is an open-source Heroku-clone PaaS.
    > ShipBuilder is written in Go, and uses Git, LXC and
    > HAProxy to deliver a complete solution for
    > an application platform-as-a-service.
    > http://shipbuilder.io [2]
    > https://github.com/sendhub/shipbuilder [3]
[1] https://news.ycombinator.com/item?id=6292463

[2] http://shipbuilder.io

[3] https://github.com/sendhub/shipbuilder

Oh, Thanks! I'll check it out!

I just, (seriously, just) read an interview with Ken Thompson arguing that this over-layering is killing him.

Edit: It was in the book Clean Code.

Could you provide page number or chapter name or (preferably) a quote? I searched for "Thompson" in the book and couldn't find anything.

Hey, sorry, I had two books in front of me and misreported the title. The book is Coders at Work.

I just saw that book on a coworker's desk, now I've GOT to check it out, thanks!

Hey, it was Coders at Work not Clean Code. :) I have both and I guess I mixed the two.

Layers are completely optional (though thoroughly encouraged due to aggressive caching.) If it doesn't fit your model and you're not interested in using the features, that's completely fine.

Fixed, thanks for catching my herp derp.

As a dev whose 2010 MBP doesn't do so hot spinning up multiple Vagrants, I've been strongly contemplating Linux for my next laptop just so I can run containers natively.

Docker-on-linux-on-vagrant is a nice compromise though.

If you're just using Vagrant as a harness to run Docker-ready VMs, you only need to spin up one VM to host multiple Docker containers, so it works decently on earlier (definitely 2010!) laptops.

Yes, just to confirm I run loads of stuff on a 2010 MBP... not docker but similar and heavier workloads. With caching as advertised in the linked article, you should see no issues. Just be sure to fork over adequate memory to your VM to stop it swapping over a seriously slow, paravirtualized block device with parallel applications causing host OS contention.

Get a system76 laptop with dual SSD drives and 16gb ram for 1,400!

Best machine I've ever had

This is my primary use of Vagrant. I think it's amazing to be able to set up a Docker-ready host.

Combining these technologies to have the Docker builds released in ready to go AMIs/VBox/VMWare images for easy deployment is on my dream list. Does anyone want to help?

Sure thing. I am working on something similar for a different LXC-based deployment layer. One of the biggest challenges is keeping the payload (ie. container) naive of the actual hardware (ie. virtualized interface to the outside world), both in terms of kernel driver requirements and in terms of network topologies. Then, for any nontrivial deployment, you also have to hand over information about services to which it may wish to connect, which may not be appropriate to hardcode. If you would like to thrash this out with me I'd be keen.

I agree that the Vagrant/Docker combo is awesome. I see them as solving completely different problems. To me, Vagrant is for running complete systems while Docker is for running isolated processes (quite commonly within a system managed by vagrant).

Docker source itself ships with a Vagrantfile so that you can get an example development environment up and running should you be running on an old Linux kernel that doesn't sport Docker's current runtime requirements. Just 'git clone git@github.com:dotcloud/docker.git docker && cd docker && vagrant up'.

I'm still a bit confused as to how Vagrant, Docker and Packer all fit in together. I think using Vagrant should be better than interfacing with the virtualization/container layer directly.

Thanks for this post. It's very difficult to keep up with all the stuff going on in the DevOps world right now and figure out what goes together, what competes with each other, and when there's overlap.

I feel like we need some sort of standard nomenclature or diagrams for explaining different pieces of complex software systems. Reading through pages and pages of documentation to make decisions about what to use is too time-consuming.

Anybody have thoughts on this?

I would really love to see this too. A diagram with layman description of all the different packer/vms/dockers/vagrants/deployment projects and how they work together.

Glad to see a Vagrant maintainer hop into the fray. Also, really excited to see Docker being added as a provisioner. I'm a senior internal dev at a marketing firm and I'm using Docker to prototype / develop testing environments / workflows and translating them to Vagrant / puppet setups when it's time to hand things off to the junior devs who are running god knows what ( Win 7 / 8, OS X, etc).

For anyone following this thread, a good breakdown about the role of Vagrant and Packer in your dev environment is available here:


> using Linux (and a recent Linux) as your main dev system, and in practice this is very rare.

There is the problem right there.

Make it not rare, hire people who work on Linux and prefer it, this is a way for any company to ensure a good hire and not having to find new interview questions or other tricks to weed out the copy-pasters.

Unfortunately declaring "let it be so" isn't enough to make it so.

If you hire only people proficient with Linux, you won't create a sudden thirst for developers to grok Linux. Instead you'll have just limited your talent pool to the subset of developers who enjoy tinkering with OSes. If that's your goal, then great - but most companies are looking for people who can develop well, regardless of their OS preferences.

I prefer OS X. Guess you wouldn't hire me.

I only know devs who work on OSX or Linux machines. OSX isn't as far from Linux as it is from Windows (my feeling)

I used OSX a few times with the same software I used with Linux. Same browser, same IDE, etc.

I would hire you, especially if you also work close to UI.

I still find the association of mac users with graphics design/video/audio editing interesting.

I've used a mac since 10.4 as my main workstation system and I still get people asking me what they should use for a video editor. I spend most of my time in Emacs, or the Terminal ssh'd into something, or looking at docs in Dash, or in chrome looking at webpages.

I couldn't edit an image to save my life, I leave that to professionals.

I met a few Apple fans in my life, and those who were programmers had a thing for fancy UI - they were not always right, for example the damned switch toggle, but at least they have an opinion and interest. So my small sample size led me to conflate UI and mac users.

So emacs ey, why Mac then? Why not a Linux flavor?

I've used all flavors of Linux (from Gentoo, when there was no installer and I had to use dd, to Ubuntu). I still love Linux, but I don't use it anymore as my main system. I have 2 problems with it:

(1) I want the computer to work for me, not the other way. I used to like to hack everything on my computer, but after years and years of doing that, I've lost my patience. I want things to just work and not break after every damn update. I want to be able to do my real job and use the computer as just a tool.

(2) It looks ugly, the fonts are really bad, nobody cares about UX when designing apps, etc. You can say that I can change that, but as I've said: I want the damn thing to just work out of the box. I understand that maybe you don't care about that, but that's why it's me who's using a Mac. Don't get me wrong, there's nothing wrong with using Linux, it's just not for me anymore.

I'm still using the command line more than the GUI, though. But that is a choice. It combines the best parts of both worlds.

Well, I prefer FreeBSD. Did you mean "*nix"?

I don't see the parallels to Git at all. If you just looked at the command line options without understanding what's going on at all it might look similar.

> a tool like Puppet or Chef is needed when you have long-running VMs that could become inconsistent over time.

Uh, no, Puppet and Chef are designed for configuration management. To manage your configs. They are not designed to replace customization and they don't address package management or service maintenance. (They have options to munge these things, but you still need a human to make them useful and coordinate changes with your environment) Neither does Docker. Docker also doesn't do configuration management. All different things.

> You'll be 100% sure what runs locally will run on production.

Incorrect; you're using unions to fill in the missing dependency gaps, so there's no guarantee what was on your testing container's host is on your production container's host. Your devs also might be running with container A and B, but in production you're using containers A and C. Not to mention the kernels may be different, providing different functionality. All this assuming A, B and C don't need instances of different configuration. There are no guarantees.

You know what else is crazy fast and easy to manage? Packages. There's this new idea where you can have an operating system, run the command 'apt-get install foobar', and BAM all of a sudden foobar is installed. If you need a dependency, it just installs it. And it only downloads what it needs. Also does rollback, transactions, auditing, is multi-platform, extensible, does pre-and-post configuration, etc. Sound a lot like docker? That's because it's a simpler and more effective version of docker [without running everything as a virtualized service].

Deploy using your package manager. Except for slotted services which (AFAIK) no open-source package manager supports, it will do everything you need. And what it doesn't do, you can hack in.

Package managers are okay mechanisms for installing software, but really limited for configuration management. For example:

- I need to configure 50 virtual hosts, each containing a different set of web content - should I make 50 separate packages? or make one meta package?

- I need to install a java web application, and configure a database.properties file with a JDBC URL that might vary based on environment

- I have a set of cron jobs that I need to be configured based on the application sets that I have installed, but a different set that I need to be consistent across all systems. Now I need to build logic that figures out whether any of the cron jobs configured match jobs that already exist, or run the risk of them running more than once.

- I need to install tomcat 15 times with slightly varied configurations.
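To make the second bullet concrete, here is a minimal Python sketch of rendering a database.properties file from per-environment data, which is roughly the job a config management tool's templates do. Everything specific in it is an assumption made up for illustration: the environment names, the hostnames, the PostgreSQL-style JDBC URL, and the `render_database_properties` helper.

```python
from string import Template

# Hypothetical per-environment settings; a real setup would pull these
# from wherever the config management tool keeps its data.
ENVIRONMENTS = {
    "dev": {"db_host": "localhost", "db_port": 5432, "db_name": "app_dev"},
    "prod": {"db_host": "db.internal.example", "db_port": 5432, "db_name": "app"},
}

# Assumed file layout: a single jdbc.url key with a PostgreSQL-style URL.
PROPERTIES_TEMPLATE = Template(
    "jdbc.url=jdbc:postgresql://$db_host:$db_port/$db_name\n"
)


def render_database_properties(environment: str) -> str:
    """Return the database.properties content for one environment."""
    return PROPERTIES_TEMPLATE.substitute(ENVIRONMENTS[environment])


if __name__ == "__main__":
    print(render_database_properties("dev"), end="")
```

The package stays identical across environments; only the data fed to the template changes.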

Now, you can say that some of these things are not relevant in development environments, or that you can do some of them with packages, and so on. But there are real advantages to using a config management tool to build your dev environment, and then when you're ready to move to production, using the same config management model to build that environment.

It's not that you can't make packages do most or many of the things you want. It's about using the right tool for the right job.

1. If you believe in the paradigm of package manager as deployment tool, making 50 different packages would be really simple and effective. One meta package would get clunky to use, but require less package maintenance (and instead require more configuration/maintenance on the target)

2. Make one package for the web application. You can put the database.properties as a post-install configure section if you already know what it should be, or have it run a script on the target that loads the correct variables. Or make separate packages just for the database.properties. Again, it comes down to where you want to do maintenance.

3. This one isn't too difficult. Use /etc/cron.d/ and name your cron jobs uniquely based on what they do. Then make packages however you want. Even if multiple packages deliver the same cron job (more than once), you're just overwriting a file that already exists, so no duplicates. If that causes a conflict you can deliver to a temp location and have a %post section test if the target cron file already exists, and delete the temp file.

4. This is where slotted services (admittedly a term I made up) comes in handy. No package managers really deal with this properly, which is where a completely virtualized service becomes a much easier way to handle it. But you can still install a chroot directory and run the service from it using just package manager, no deploy tool required. Optionally, you can also build a set of packages and dependencies and install them to version-specific directories, and set up directories of symlinks to the versions you want, and target your application at the symlink-directory-tree that matches the versions you want to run. It can be a hassle if you don't have a tool to do it for you. I think there are some existing open source tools designed to do this, but I haven't used them.

(That was for running 15 different versions of tomcat, by the way. If you just want to run tomcat with different configurations, just make your configs and run tomcat for each instance! The scripts that ship with tomcat already support instance-specific configurations; iirc, HOME_DIR was the base path, and INSTANCE_DIR was the instance-specific configuration path, or something similar)
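The version-specific-directories-plus-symlinks idea from point 4 can be sketched in a few lines of Python. This is only an illustration under assumptions: the `opt/` and `slots/` layout, the slot names, the version numbers, and both helper functions are hypothetical, and it assumes a POSIX filesystem where symlinks and atomic renames are available.

```python
import os
import tempfile


def install_version(root: str, name: str, version: str) -> str:
    """Create a version-specific install directory, e.g. <root>/opt/tomcat-7.0.42."""
    path = os.path.join(root, "opt", f"{name}-{version}")
    os.makedirs(path, exist_ok=True)
    return path


def select_version(root: str, slot: str, name: str, version: str) -> str:
    """Point <root>/slots/<slot>/<name> at one installed version via a symlink."""
    slot_dir = os.path.join(root, "slots", slot)
    os.makedirs(slot_dir, exist_ok=True)
    link = os.path.join(slot_dir, name)
    target = os.path.join(root, "opt", f"{name}-{version}")
    tmp_link = link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(target, tmp_link)
    os.replace(tmp_link, link)  # rename over the old link (atomic on POSIX)
    return link


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        install_version(root, "tomcat", "6.0.37")
        install_version(root, "tomcat", "7.0.42")
        link_a = select_version(root, "app-a", "tomcat", "6.0.37")
        link_b = select_version(root, "app-b", "tomcat", "7.0.42")
        print(os.readlink(link_a))
        print(os.readlink(link_b))
```

Each application is then pointed at its own slot directory, and switching versions is a matter of re-running `select_version` for that slot.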

Everything you mentioned is relevant to development. And I'm not trying to discount real configuration management. If anything, it's critical to use a real configuration management tool to manage large, complex sets of configurations across large orgs. Using a package manager is just an easy deploy tool for the configuration, but how you manage it is left as an exercise for the engineer.

(That being said: why don't deploy tools incorporate change management, persistent locking, user authentication, pre/post install hooks, and audit trails? We need more open source solutions that fulfill enterprise requirements)

1. This is debatable, but probably too long a discussion to have over HN.

2. Once you're "running a script on the target that loads the correct variables", you're doing configuration management. There needs to be a mechanism for retrieving the metadata from somewhere, usually centralized. You'd be better off delivering this through a CM tool to build and maintain the file.

3. This doesn't work. Not only do you have the conflicts to deal with, and a post-configure script is a hacky way to deal with them, but now if you need to change a single cron job, you have two unsatisfying options: make a new meta-package for that one cron job and overwrite the versions delivered by the five other packages, which will break any config validation you're doing (as one would hope you're doing), or update all of the packages that provide that cron job (which works, but now you have to come up with a way to tell your packages that, just in this case, they shouldn't restart services and the like just because the package was deployed). On top of that, there's an even worse issue: how do you know when the last cron job is removed? If I have five packages that all create that file, either removing the first one removes it, which breaks the other four, or I have to come up with post-remove script logic that tries to programmatically determine if this is the last reference to that file and remove it only then. If I did the latter, my meta-cron job package update would break this model as well, and I'd have to remember to remove that specifically as part of uninstalling the other packages.

4. I guess this works, but now you're dealing with chroot'ed environments, which means deploying not just the specific stack you want, but all of the necessary libraries, and as you say, your original package manager idea doesn't really deal with this.
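The "is this the last reference to that file?" logic from point 3 is essentially reference counting. A rough Python sketch of what such post-remove logic would have to track, assuming a hypothetical `SharedFileRegistry` that lives in memory (a real package manager would need to persist this, and the package names and cron line are made up):

```python
import os
import tempfile


class SharedFileRegistry:
    """Track which packages own each shared file (a simple reference count)."""

    def __init__(self):
        self._owners = {}  # path -> set of package names that delivered it

    def install(self, package, path, content):
        """Write the file (overwriting any existing copy) and record the owner."""
        with open(path, "w") as handle:
            handle.write(content)
        self._owners.setdefault(path, set()).add(package)

    def remove(self, package, path):
        """Drop one owner; delete the file only when no owners remain.

        Returns True if the file was actually deleted.
        """
        owners = self._owners.get(path)
        if owners is None:
            return False  # never installed through this registry
        owners.discard(package)
        if owners:
            return False  # other packages still reference the file
        del self._owners[path]
        if os.path.exists(path):
            os.remove(path)
        return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cron_dir:
        job = os.path.join(cron_dir, "rotate-logs")
        line = "0 3 * * * root /usr/local/bin/rotate\n"
        registry = SharedFileRegistry()
        registry.install("pkg-a", job, line)
        registry.install("pkg-b", job, line)
        print(registry.remove("pkg-a", job))  # file kept, pkg-b still owns it
        print(registry.remove("pkg-b", job))  # last owner gone, file deleted
```

Note that this only covers the removal question; it does nothing for the case where a separate meta-package overwrites the file outside the registry.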

And tomcat gets a lot more complicated too, when you're trying to manage shared XML resource files. In fact, the whole package manager notion really requires the "X.d" style approaches to loading files.

But all of these challenges are why package managers are the worst solution for deploying configuration files. Package managers are great for deploying static software, shared libraries, and the like. I'll even concede that dropping code on a machine is fine with a package manager. But they're not designed to deal with dynamic objects like config files and system configurations.

In fairness, there's a whole other third class of objects that currently both package managers and configuration management tools do terribly, and that's things that represent internal data structures - database schema, kernel structures, object stores, etc.

I've built a couple of configuration management tools and work for a company that has a few more, so this is something I've spent a lot of years working with. Package management as configuration distribution is attractive for simplicity, but falls apart beyond the simple use cases. Model-driven consistency is vastly superior.

3. If you consider base OS packages to be hacky... That's how most packages deal with potentially overwriting hand-edited configs: 'deliver foo.conf.new; [ ! -e foo.conf ] && mv foo.conf.new foo.conf'

But you have a good point! Dupe files are hard to manage. Some package managers refuse to deal with it, others have complicated heuristics. The best solution would be to just deliver the files and let a configuration management tool sort out what needs to be done based on rulesets. This can still be accomplished with packages as a deploy tool, and a configuration management tool to do post-install work, instead of the %post section.

4. You already deal with chroot environments using lxc/docker/etc. They're just slightly more fancy. But even with docker's union fs you still have to install your deps if they don't match the host OS's. Unless, of course, you package all the deps custom so they can be installed along with the OS ones. Nothing is going to handle that for you, there is no magic pill. Both solutions suck.
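For readers less used to shell, the 'deliver foo.conf.new' one-liner from point 3 translates to roughly the following Python. The `deliver_config` name and the file contents are made up for illustration; the behavior is meant to mirror the shell snippet, nothing more.

```python
import os
import tempfile


def deliver_config(directory: str, name: str, packaged_content: str) -> bool:
    """Always write <name>.new; promote it to <name> only if <name> is absent.

    Returns True if the packaged config became the live config.
    """
    live = os.path.join(directory, name)
    candidate = live + ".new"
    with open(candidate, "w") as handle:
        handle.write(packaged_content)
    if os.path.exists(live):
        return False  # keep the existing (possibly hand-edited) config
    os.rename(candidate, live)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as etc:
        print(deliver_config(etc, "foo.conf", "setting=packaged\n"))  # first install
        print(deliver_config(etc, "foo.conf", "setting=upgraded\n"))  # upgrade
        print(sorted(os.listdir(etc)))
```

On an upgrade the live file is left alone and the new default sits next to it as foo.conf.new, so someone (or some tool) still has to reconcile the two.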

Most configuration management eventually becomes a clusterfuck as it grows and gets more dynamic and complex. In this sense, delivering a static config to a host in a package is simpler and more dependable. I can't tell you how much more annoying it is to have to test and re-deploy CM changes to a thousand hosts, all with different configurations, only to find out 15 of them all have individual unique failures and having to go to each one to debug it. On the other hand, you could break out those configs into specific changes and manage them in VCS. Or even pre-generate all the configs from a central host, verify they are as expected, and package and deliver them. I have done both and both have their own special (read: retarded) problems.

For reference, the sites that I worked at that delivered configuration via package management spanned several thousand hosts on different platforms, owned by different business units and with vastly different application requirements. But you have to adjust how you manage it all to deal with the specific issues. Edit: Much of it involves doing more manual configuration on the backend so you can 'just ship' a working config on the frontend. Sounds backwards but (along with a configuration management tool!) it works out.

> There's this new idea where you can have an operating system, run the command 'apt-get install foobar', and BAM all of a sudden foobar is installed.

Nah dude, that's never gonna fly. See, as mentioned in the top comment, no one uses and knows the OS they are developing for nowadays (though only windows users are looked down upon for some strange reason).

It is always more fun to not spend precious minutes reading about how to create an apt package, but reinvent the wheel for the hundredth time and implement another under-featured and buggy package manager in our current favorite language, and then add workaround after workaround to actually make it somewhat close feature wise to apt or rpm (i.e. usable).

Each docker RUN command is like a git commit. See those cache keys? Those are the commit sha1sums.

According to the article, you can roll back to a (any?) cache key and branch off into a new direction. I wonder about merges though.

Oh, I found it in their FAQ:

> Versioning.

> Docker includes git-like capabilities for tracking successive versions of a container, inspecting the diff between versions, committing new versions, rolling back etc. The history also includes how a container was assembled and by whom, so you get full traceability from the production server all the way back to the upstream developer. Docker also implements incremental uploads and downloads, similar to git pull, so new versions of a container can be transferred by only sending diffs.

I get it now. They support features which are comparable to features found in Git. Similar to how Apt and RPM are just like Git, because they also have the same features.

What I don't see is specifically git-like functionality, which would be incredibly useful for anyone who packages deployments of OSes, applications, configs, etc. For example, with Git you have a workspace that allows you to work with a tree of files, make changes, make specific commit logs for specific changes, merge, compare, search, etc. From what I see of docker, it's all just "how do I move my already-built containers around" functionality, and of course a shell-script (or specfile, or debian rules file) replacement called a Dockerfile.

There are lots of deployment solutions out there. What there isn't is a handy way to manage and track the assembling and customization of your various things-to-be-deployed, independent of the platform. A deployment tool that did that would become very popular.

> There are lots of deployment solutions out there. What there isn't is a handy way to manage and track the assembling and customization of your various things-to-be-deployed, independent of the platform.

Have you actually tried Docker? It does exactly what you describe.

Docker containers are versioned similarly to git repositories. You can commit them to record changes made by a running process; audit those changes with a diff. Unroll the history of any container to reconstitute how it was assembled, step by step. You don't get commit messages because typically changes are snapshotted automatically by a build tool - instead you get the exact unix command which caused the change, as well as date etc. This means you can point to any container, ask "what's in there?", and get a meaningful answer. In theory that would be true if 100% of all code deployed used rpms or debs. In practice that never happens because developers never package everything that way.

You can branch off of any intermediary image. This branching mechanism is used by the build tool as a caching mechanism: if you re-build an image which runs "apt-get install", it will default to re-using the result of the previous run. Uploading and downloading of containers takes advantage of versioning, so that you only transfer missing versions (similarly to git push and pull), and only store each version on disk once with copy-on-write.
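A toy model may help picture the branching-as-caching idea. The Python below is not how Docker computes or stores images; it is a sketch that assumes each step's key is a hash of its parent's key plus the command, with `layer_key`, `BuildCache`, and the example commands all invented for illustration.

```python
import hashlib


def layer_key(parent_key: str, command: str) -> str:
    """Derive a layer's key from its parent's key and the command that made it."""
    return hashlib.sha1(f"{parent_key}\n{command}".encode()).hexdigest()


class BuildCache:
    """Toy build cache: one cached 'layer' per (parent, command) pair."""

    def __init__(self):
        self.layers = {}    # key -> command that produced the layer
        self.executed = []  # commands that actually had to run

    def build(self, base: str, commands: list) -> list:
        """Return the chain of layer keys, running only uncached steps."""
        keys = []
        parent = base
        for command in commands:
            key = layer_key(parent, command)
            if key not in self.layers:
                self.executed.append(command)  # stand-in for really running it
                self.layers[key] = command
            keys.append(key)
            parent = key
        return keys


if __name__ == "__main__":
    cache = BuildCache()
    shared = ["apt-get update", "apt-get install -y nginx"]
    cache.build("ubuntu:12.04", shared + ["deploy v1"])
    cache.build("ubuntu:12.04", shared + ["deploy v2"])
    # The second build re-uses the two shared steps and runs only the new one.
    print(cache.executed)
```

Because a key depends on everything before it, changing an early step invalidates every later one, while two builds that share a prefix branch off the same intermediate layer.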

A Dockerfile is a convenience for developers to specify exactly how to assemble a container from their source, independently of the platform. Each step of the Dockerfile is committed, and gets the benefits described above.

Customization is a special case of assembly: just use a pre-existing container as a base, and assemble more stuff on top.

All of this can be tracked, managed and automated as described above.

> A deployment tool that did that would become very popular.

Right. :)

You may be interested in checking out ShipBuilder - it is an open-source Heroku-clone PaaS.

ShipBuilder is written in Go, and uses Git, LXC and HAProxy to deliver a complete solution for application platform-as-a-service.

http://shipbuilder.io

https://github.com/sendhub/shipbuilder



Your documentation seems limited to basic functions, with nothing explaining really what this does, or why I would need it. I've just spent like 20 minutes looking all over it and I have no idea what to do with it.

Thanks for your feedback, this kind in particular is very helpful and valuable.

We have a video explanation and walk-through of shipbuilder in the works which should help communicate more clearly about what ShipBuilder is and what it can do for you.

I have a few additional questions if you wouldn't mind helping me improve this aspect of Shipbuilder:

    1. Have you used Heroku before?

    2. Are you confused about the purpose of ShipBuilder?
    (i.e. "what does Shipbuilder do?")

    3. Are you confused about how to setup the ShipBuilder

    4. Are you confused about how the client works?
Finally, please feel free to contact me personally; I'd love the opportunity to answer questions or help you (or anyone) get started with ShipBuilder.

contact info: #shipbuilder on irc.freenode.net or jay [at) jaytaylor do.t com

Look at how easy heroku makes your first few steps[1]. I'd advise doing something similar but with, for example, instructions for setting up a quick shipbuilder server on amazon ec2 or the like. Bonus: it will show you where the pain points are for shipbuilder at the moment, because you'll have to write too many small gotchas&workarounds into the tutorial!

[1] https://devcenter.heroku.com/articles/git

Thanks for pointing this out minikomi, I will have to put together an EC2 quickstart guide. I am thinking of a more succinct form of what is currently the server install documentation [1].

[1] https://github.com/Sendhub/shipbuilder/blob/master/SERVER.md...

I think the main question that would help me would be "Why do I need Shipbuilder?" with different examples of when I would need it, how it compares to other tools, what it was designed to accomplish, all the system and network dependencies, and a breakdown of each individual component and how it works with other components. Mainly I want to be able to have an image in my head of how it fits into my network/system, what it might replace, or how I can take advantage of it.

How does ShipBuilder compare to other open-source PaaSes like http://www.cloudfoundry.com ? CloudFoundry's pretty far from production-ready at this point, but it does have the huge advantage of supporting existing Heroku buildpacks.

I haven't quite heard this perspective before. Thank you, I find it interesting!

I believe the use cases are orthogonal at best. If you want to distill it, just as packages are great for dependency management and application install (which, if you read many Dockerfiles, you'll see the common approach is to have the package manager do most of that work,) Docker is great at combining the technologies and providing 2 types of experiences.

1) Developer intent. It is up to the developer to specify that the application receives traffic on specific ports. That it is going to store persistent data in a specific location. That it should run as a specific user.

2) Fulfillment (sysops). This is a prod environment? Let's put that storage on a NBD instead of local storage. Need static port allocation? Map it at run time. Host based routing? Run time.

I've found that the duality of the roles here can be quite powerful. And I believe it can only get better.

There was a grandiose tool at a previous place I worked that was built in-house as a universal deployment tool. It was based on SVN and Rpm with a MySQL backend and an HTTP API, and friendly console and web tools to manage it all.

You could build something as a developer and install it on a machine, and it could run multiple versions of the same application at the same time, including with different ABIs. There were build servers and all the build scripts were automated and vcs-managed. You could package config changes or applications. You could go back and rebuild old crap nobody had looked at in 3 years, and have it actually work. Ops and devs could both use it independently, with ops having the ability to overwrite dev changes. It was slightly clunky, but the functionality was beautiful.

Decentralized, distributed, automated, auditable, and able to support maintenance of pretty much any kind without interrupting existing services. It was fucking sweet, and I've never seen another tool that could match it.

That indeed does sound sweet.

Yep, except quite clearly full of assumptions heavily tied to a single platform.

Not all software strives to be anything other than really awesome for a single platform, where those assumptions are actually helpful, not detrimental.

It all depends on the use case. Having everything is pretty much the same as having nothing.

> Having everything is pretty much the same as having nothing.

Ahh yes, grasshopper. But neither situation is the pure folly of being attached to the idea of such possession!

It was built on a single "stack", but all of the code and technology involved was multi-platform. It would probably take a month or two to make it fully portable. But the RPM database it used was independent of the OS, and it provided all its own dependencies across multiple architectures and versions of the OS. (You could also run it on Solaris...)

With the correct versioning, you can sort the guarantees out - there is some discussion on the docker forum at the moment on signing / hashing or otherwise verifying the images.

For slotted services, I suggest looking at nix and nixos, a package manager (and a distribution) which pinches some ideas from containers.

As for the main point of your comment:

Yes, native package management is lighter-weight than containers (which are lighter than VMs, which are lighter than separate physical machines). Perhaps unsurprisingly, that weight brings additional features. The main one that containers (and upwards) add is segregation. apt (lovely as it is) can only ensure packages don't conflict on the files that they install - you are on your own for ensuring there are no runtime conflicts. Yes, with proper user creation + management you can restrict their ability to tread on each other's toes (hope there are no setuid programs in there), but that is all more effort than the 'their filesystems are separate' that the heavier options give you.

There is also the question of tidying up / migrating. Let's say I install a number of packages for some thing I'm deploying on a box. After a while I realise the load is too high and decide to migrate one/some of the apps to another machine. apt, etc. can tell me what files a package has installed. It can't tell me what files a package has created while running. I'll have to go around and figure out the data (config, user config, log, etc.) file locations and probably miss a couple and end up just duplicating the original machine. Or I copy the container file and the half a dozen images that make it up.

It's true that docker (and to a lesser extent vagrant et al) are perhaps suffering from over-use as they are 'the new hotness', but that's because we have a new tool and haven't yet fully figured out how to use it - it's somewhat inevitable behaviour. And yes, for some applications package management is fine and containers are unneeded overhead. But for others it isn't.

I will add one more difference between Docker and traditional package managers: Docker is a tool developers enjoy using. I have yet to meet a developer who enjoys building his application as an rpm or deb. The shorter the development/deployment cycle, the worse it gets.

Thanks. I was starting to think I was the only one. Nice to get some confirmation I'm not crazy.

I'm very excited about Docker[1] as both a development environment and deployment solution. However, from my early experiments, it seems there's an important difference between LXC (which is what Docker manages for you) and a full VM, namely that the model revolves around running one process at a time: you can install MySQL on your docker image, but once it's up, it's running MySQL -- you can't then ssh into it as you would a VM to poke around, modify config files, etc.

There are trivial ways to solve this, obviously. You can stop the image, restart it running bash, use that to modify config files, and then restart it again. But it requires a change of mindset: these things are much more than background processes, but they are less than a full VM. As the piece mentions, configuration management for newly-started images seems to be a missing piece of the puzzle right now, and debugging running Docker images can be... strange. [2] Not necessarily difficult, but different from what you're used to, and learning curves are barriers to adoption.

As this tech matures I think these things will be quickly solved, and I'm looking forward to the results.

[1] Plus VirtualBox, started by Vagrant. See mitchellh's comment.

[2] Unless, of course, I'm missing something. Docker-people: how do you configure vanilla server images to work in your environment?

Well, the canonical way of running a container is as you mention.

However! There are a couple of options if you do not have a config you're completely happy with yet.

One is to run a process manager like supervisord as your container process, and start up any number of services you wish (like ssh). It's my understanding that in the future Docker will allow you to call `init` directly, so it becomes more VM-like.

The other, assuming a sufficiently modern kernel (I believe 3.8+, which is the minimum supported for Docker), is to use the lxc userland tools, specifically `lxc-attach -n <containerid>`. This will allow you to create a shell in the running container and poke about as needed.

My experiments with lxc-attach always failed; presumably my kernel was wrong in some way (I followed instructions to get to 3.8, but I am sufficiently clueless that I wasn't sure it had worked, or that it was the right flavor of 3.8).

But that's only the ad-hoc case: the bigger question is, if you have an image with instructions "RUN apt-get install mysql", you're not even halfway to having a copy of MySQL you can run in production: at a minimum you'll need to install a custom my.cnf to suit your application's operational parameters[1], but really you'll want it to be slightly different every time -- new bind addresses, potentially new master-slave relationship grants, etc. The way docker images interact with configuration management in a grown-up production environment is still really hazy to me.

[1] We are all agreed that running default my.cnf in production is laughable, yes? That information has filtered into the mainstream from the DBA crowd?
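
To make the "slightly different every time" point concrete, here is a minimal Python sketch (standard library only) of rendering a per-instance my.cnf when a container starts. The template, the `render_my_cnf` name, and the example values are all hypothetical; nothing here is something Docker provides.

```python
from string import Template

# Hypothetical template: only the settings that differ per instance are parameterised.
MY_CNF_TEMPLATE = Template("""\
[mysqld]
bind-address = $bind_address
server-id = $server_id
innodb_buffer_pool_size = $buffer_pool
""")


def render_my_cnf(bind_address: str, server_id: int, buffer_pool: str) -> str:
    """Return the text of a my.cnf for one MySQL instance."""
    return MY_CNF_TEMPLATE.substitute(
        bind_address=bind_address,
        server_id=server_id,
        buffer_pool=buffer_pool,
    )


if __name__ == "__main__":
    # Example values only; a real deployment would pull these from its own inventory.
    print(render_my_cnf("10.0.0.5", 2, "1G"))
```

The image stays generic and the rendered file is the only thing that changes per host, which is one way to keep the base image reusable.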

How I would personally tackle that specific problem is the following:

1) Create a Dockerfile which installs the dependencies of my image as a base (maybe in this case all it is is RUN apt-get install mysql)

2) Tag the image as mysql-base.

3) Shell in to mysql-base, and iterate over the changes as needed until it's 'production ready'.

4) Once it's suitably 'production ready', `docker diff` the version to see which files changed.

5) Here comes the fork in the road. Either go back and instrument my original Dockerfile to modify the files that were updated to make my image production ready, OR, `docker commit` that image. There are benefits to both sides, but ultimately it will be up to you in terms of maintainability. The definition of 'production ready' will differ from org to org.

6) Push the final image to a private registry.

With step #1 ... apt-get install mysql ... what happens when the network repos go down? Like when you have to rebuild the same system four years later? You might wind up with an epic fail. That's not very stable as a packaging format then, is it? But this is just an example challenge from a much larger set... all of which derive from the fact that state is being allowed to seep in from random places. It's not clean.

This is essentially one of the core complaints I have with some of these tools. In my own as-yet-unfinished tool's architecture that tackles similar domains, network access is disallowed at deployment time. If a package cannot be installed without network access, then it is not considered a valid package.

It all depends on your tolerance threshold and the trade-offs that are involved to make an acceptable decision.

If you expect apt-get install mysql to fail in the future, there are plenty of mitigations (storing the build/deps in your local repo, building from source...).

My point is, you can always find pathological cases. Discussing them is great as a straw man for improvement, but not really useful beyond it.

Right, my tooling generally builds everything from source (mostly gentoo is the target platform, though also ubuntu) and generates the deps automagically.

This is achieved by viewing 'build' and 'install' for the cloud-capable service package as two separate steps, ie. build is the 'gather all requisite goodies' step, and then a version is applied. 'Install' is where an instance is actually created on top of a target OS platform image (also versioned).

Apparently what I consider fundamental architectural issues you see as pathological cases. Your call! :)

Take for instance multiple cloud providers. Those guys are notorious for giving you a slightly different version of any OS as a stock image, and running slightly different configurations. Some of them even insert their own distro-specific repos/mirrors. In that case, you are going to see entire classes of weird and subtle bugs appear where you either:

(A) are not using the same cloud provider for test/staging/production environment. (People tend to lean on local virt for the former).

(B) try to migrate (eg. due to cloud provider failure, hacks, bandwidth or scaling issues, regulatory ingress, etc.) to another provider

That's not unrealistic, IMHO.

> This is achieved by viewing 'build' and 'install' for the cloud-capable service package as two separate steps, ie. build is the 'gather all requisite goodies' step, and then a version is applied. 'Install' is where an instance is actually created on top of a target OS platform image (also versioned).

This doesn't apply to Docker. You can use the exact same process.

> Take for instance multiple cloud providers. Those guys are notorious for giving you a slightly different version of any OS as a stock image, and running slightly different configurations. Some of them even insert their own distro-specific repos/mirrors. In that case, you are going to see entire classes of weird and subtle bugs appear where you either

These are not issues with Docker. The Dockerfile specifically states its environment, so it matters not what the cloud providers use on their host image.

> You can use the exact same process.

Yes, my point was that the state is iffy... the architecture isn't clean. The output itself isn't versioned, only the script being input. The product is assumed-equivalent (with inputs from the wider world suggesting it's not always going to be), and not known-same. That's a bug at the level of architecture, and it's real.

> The Dockerfile specifically states its environment

Well, I wasn't talking about docker. I was talking about the reality of different cloud providers. But in my direct experience if docker makes the assumption that, say, 'ubuntu-12.04' on 5 cloud providers is equivalent, then sooner or later it's going to encounter problems.

> if docker makes the assumption that, say, 'ubuntu-12.04' on 5 cloud providers is equivalent, then sooner or later it's going to encounter problems.

You misunderstand how docker works. 'ubuntu:12.04' refers to a very particular image on the docker registry (https://index.docker.io/_/ubuntu/). That image is in fact identical byte for byte on all servers which download it. So any application built from it will, in fact, yield reproducible results on all cloud providers.
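
For anyone who wants to check the "identical byte for byte" claim on their own machines, here is a minimal Python sketch (standard library only). It assumes you already have the same image exported to a file on two hosts; the `sha256_of` and `same_bytes` names are hypothetical, and this is not how Docker itself verifies anything.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large image files never sit fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bytes(path_a: str, path_b: str) -> bool:
    """True if both files have the same SHA-256 digest."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Comparing digests rather than the files themselves means only a short hex string has to travel between hosts.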

My bad. That sounds logical, though a bit SPOFfy. FYI on our system instead of providing an image (since the format is hard to fix if we want to support arbitrary OSs and arbitrary cloud providers) we first provide a script that can assemble (or acquire) an image (after which it is versioned), and also specify a linked test suite.

That way, a particular build of a platform (ubuntu-12.04-20130808) that we create on a cloud provider could be used, or alternatively a particular cloud provider's stock image (someprovider-ubuntu-12.04-xyz) or existing bare metal machine matching the general OS class in a conventional hosting environment could also be used.

The idea is that where bugs are found (defined as "application installs fine on our images, but not on <some other existing platform instead>") new tests can be added to the platform integrity test suite to detect such issues, and/or workarounds can be added to facilitate their support.

That way, when an application developer says "app-3.1 targets ubuntu" we can potentially test against many different Ubuntu official release versions on many different images on many different cloud providers or physical hosts. (Possibly determining that it only works on ubuntu above a certain version number.) Similarly, the app could target a particular version of ubuntu, or a particular range of build numbers of a particular version of ubuntu.

It's sort of a mid-way point offering a compromise of flexibility versus pain between the chef/puppet approach (which I intensely disagree with for deployment purposes in this era of virt) and the docker approach (which makes sense but could be viewed as a bit restrictive when attempting to target arbitrary platforms or use random existing or bare metal infrastructure).

Also, would you consider the architectural concern I outlined valid? I mean, in the case you are pulling down network-provided packages or doing other arbitrary network things when installing... it seems to me like there is a serious risk of configuration drift or outright failure.

If you use Ubuntu without upgrading it, apt-get install will fail for any package given a year or two when they (routinely) change all the source repo URLs.

If you are on Linux 3.8, lxc-attach should work.

I use it on a regular basis and never saw any problem (as long as I was on Linux 3.8).

The exact syntax is "lxc-attach -n $FULL_CONTAINER_ID", and you can get the full container ID with "docker inspect" (or "docker ps -notrunc").
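
Since the command takes the full ID rather than a truncated one, a small wrapper can catch that mistake before lxc-attach does. A minimal Python sketch, assuming full container IDs are 64 lowercase hex characters; `lxc_attach_argv` and `attach` are made-up names for illustration.

```python
import re
import subprocess

# Assumption: a full container ID is 64 lowercase hex characters.
FULL_ID = re.compile(r"[0-9a-f]{64}")


def lxc_attach_argv(container_id: str) -> list:
    """Build the argv for `lxc-attach -n <full id>`, rejecting truncated IDs."""
    if not FULL_ID.fullmatch(container_id):
        raise ValueError("expected a full 64-character container ID")
    return ["lxc-attach", "-n", container_id]


def attach(container_id: str) -> int:
    """Run lxc-attach for the container and return its exit status."""
    return subprocess.call(lxc_attach_argv(container_id))
```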

Agreed on the my.cnf points, indeed :-)

If you want to poke around like normal, use supervisord or OpenVZ. (You get more features with OpenVZ like checkpointing and live migration)

How would supervisord fit in this use case? Just wondering.

Instead of running the mysql process, you could have Docker run supervisord, which in turn would be configured to fire up the mysql and ssh daemons.
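
As a rough illustration of that setup, here is a Python sketch that writes out a supervisord config with one program section per daemon. The `supervisord_conf` helper and the binary paths in the example are assumptions that will vary by distro; `nodaemon=true` is there so supervisord stays in the foreground as the container's process.

```python
import configparser
import io


def supervisord_conf(programs: dict) -> str:
    """Build a supervisord config with one [program:NAME] section per entry.

    `programs` maps a program name to the command supervisord should run.
    """
    cfg = configparser.ConfigParser()
    # nodaemon=true keeps supervisord in the foreground.
    cfg["supervisord"] = {"nodaemon": "true"}
    for name, command in programs.items():
        cfg["program:" + name] = {"command": command}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Assumed paths; check where your distro installs these binaries.
    print(supervisord_conf({
        "mysqld": "/usr/bin/mysqld_safe",
        "sshd": "/usr/sbin/sshd -D",
    }))
```

The container's command then becomes supervisord pointed at this file, instead of mysqld itself.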

I spent the entire article wondering how deployment via git was at all relevant, until I read the very last heading.

(The title is supposed to be read as "Docker is as powerful for deployment as git is powerful for SCM!". There is no mention of git-based deployment strategies like Heroku's.)

If you're around Paris, we're doing a Docker meetup in October http://www.meetup.com/Docker-Paris/events/136924002/

Did you look at vagrant-lxc? If you already have a vagrant setup it's very easy to switch to and works very well.


Is there a recommended way to run this on platforms other than Ubuntu?

I would love to see it running smoothly on Debian and CentOS; that would immediately turn me into a converted user.

Nice work, looks impressive.

They have official images for Ubuntu, CentOS and Busybox, and unofficial ones for many more (Gentoo, Arch, Debian, OpenSUSE, and others).

See https://github.com/dotcloud/docker/wiki/Public-docker-images and https://index.docker.io/

Note that that is for containers (i.e. guests). For the host side, only Ubuntu is officially supported; many people are using Arch quite well, and a few have got it running under Fedora and SUSE that I know of (I'm going to assume Gentoo have it as well, knowing them). Both require custom kernels, however.

Thanks +1 for the direct pointer.

I am still not sure how useful Docker is, but we have created a simple example of how to run Docker on Vortex, which is a sort of Vagrant alternative. https://github.com/websecurify/node-vortex/tree/master/doc/e...

OS containers can be very cool. Want to see 1000 instances started on a small machine? http://ivoras.sharanet.org/blog/tree/2009-10-20.the-night-of...


(i would give credibility to the docker maintainer)

Get back in touch with me when Docker can help with Windows deployments. (Is there an issue I can 'star' or otherwise track?)

Edit: To say Docker is analogous to Git for deployment fails outside of the Linux server realm (lxc). I'm trying to think of an OS-specific source control system to fix the analogy but can't come up with anything only-Linux... 'TFS for deployment'? :)

I plan to check out http://ulteo.com/home/en/download/sourcecode as an alternative.

Docker is a pretty thin usability wrapper around LXC, which has Linux right in the name, so very unlikely to become useful on Windows (or even OS X, which is BSD) anytime soon.

Not sure I would call it thin, but today Docker uses LXC as a container provider. Stopping there misses the point of Docker, though. Keep in mind also, this is all related to the daemon. Full client capabilities are available on any platform today.

In a previous blogpost, the Docker team outlined how LXC will become a (albeit native) plugin, just like AUFS. Running Docker on BSD (using Jails as the container provider) is certainly a goal. If you're on OS X, you could use chroot instead of full namespacing capabilities.

To be fair, this is not available today. But I don't think it's fair to say that Docker will not be useful anytime soon on those environments.

The underlying technology for Docker is LXC (Linux Containers), which is very tied to an underlying linux system. I would expect Joyent to support enough for Docker to work in SmartOS before MS would/could in Windows.

The closest thing for Windows that I can think of is Sandboxie. http://www.sandboxie.com/

It's still quite different.

The title should read: "Docker is Git for deployment"

Currently, it implies that Docker is using Git for deployment.

How does Docker mesh with something like continuous deployment? How many layers until the AUFS falls down?

> How does Docker mesh with something like continuous deployment?

Very well. Because any source repository with a Dockerfile can be built into a container with no other out-of-band information, it is very easy to compose your ideal delivery pipeline with docker as the "lego" brick.

Docker+Jenkins and Docker+buildbot are popular combinations.

> How many layers until the AUFS falls down?

The hardcoded limit is 42 layers. But future versions of docker will store a container's history separately from the aufs layers, so you can have an arbitrary number of build steps.

I don't think there is a way to rollback an image to the previous commit. Is there?


Provided you haven't deleted the old container, it's just a matter of using `docker tag <previous commit id> amjith/imagetorollback`

Problem solved.

Docker is pretty awesome. At my company we share auxiliary servers around with Docker (a message queue system) and it's pretty rad. I can't wait until they make it rock-solid enough to run systems in production with it.

Would it be possible to deploy an email server inside a docker container?

Possible? Yes. The better question you should ask yourself would be "does the use case make sense?"

Linux containers are ephemeral, which means that you will lose all data from your email server on a container restart. If you're just setting up a postfix SMTP server in a container, and forwarding the guest's port 25 to the host, I don't see why not. You probably won't have any app scaling support or the like, though. I could be wrong about that. I've never attempted to set up a clustered SMTP server configuration before.

Docker containers are not ephemeral. Persistent storage can be easily shared between containers using a VOLUME or bind mount.

TIL. Thanks for the heads up.

My top wishlist for docker: Native windows support.

Unfortunately, that's where the "containers are like VMs, but better!" argument breaks down.

Linux Containers, which docker makes use of, are more like creating a "clone" of the currently running operating system with a boxed-in filesystem (and device-space?).

You can't run a Windows container on a Linux host; neither can you run a Solaris container on a Linux host; ad nauseam.

With VMs, you can run whatever you want on whatever you want, assuming the processor architecture is compatible.

> Linux Containers, which docker makes use of, are more like creating a "clone" of the currently running operating system with a boxed-in filesystem (and device-space?).

This is not accurate. I am currently running an Ubuntu Docker host, but a CentOS 6.4 based container, a Gentoo container, and a busybox container.

As stated elsewhere in this thread, which I encourage you to read, Docker aims to solve problems that exist in container technologies across kernels by taking advantage of their strengths. It does not aim to replace libvirt/lxc/jails/zones, but instead to build abstractions on top of them for better building, management, discovery, and scale.

It's 100% accurate.

In your "counter-example", you're running four different userspaces/distributions on the same operating system kernel.

You could have posted a useful clarification to my comment (e.g. "but you _can_ run different Linux distributions!") but instead have chosen to be condescending and inaccurate.

Sorry, just read this response. I fail to find the place that I was condescending and assure you that was not my intent.

My only comment was in response to 'creating a "clone" of the currently running operating system.' Which is not, in any sense, accurate.

As you correctly point out (and I do not dispute), the kernel and devices are shared rather than virtualized per container. So you're absolutely right, it's not like a VM. And it does break down, but none of that has to do with the part where your statement is incorrect, which I've addressed and expanded upon.

From my limited knowledge of Windows: that's highly unlikely. You're much more likely to see Docker start supporting FreeBSD jails with ZFS than you are Windows.

Then again I know nothing of Windows and any containerization technology it may have...

http://www.sandboxie.com/ ?

Or perhaps build something off the Chromium sandboxing.


Solaris Zones support would be awesome.

Still a longshot, but more likely than native Windows support, at least.

Maybe it can happen with this: https://github.com/websecurify/node-vxdocker
