
For me, it is the ultimate expression of the Continuous Delivery idea of "build once." I can be very confident that the Docker image I build in the first stage of my pipeline will operate correctly in production, because that identical image was used for unit tests, for integration and functional testing, in the staging environment, and finally in production. The only thing that differs is configuration.

This is the core problem Docker solves, and in such a way that developers can do most of the dependency wrangling for me. I don't even mind Java anymore, because the CLASSPATHs can be figured out once, documented in the Dockerfile in a repeatable, programmatic fashion, and then ignored.
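
For what it's worth, a minimal sketch of what I mean - the base image, jar paths, and main class here are all hypothetical:

    # Hypothetical Dockerfile: the CLASSPATH gets worked out once,
    # recorded here, and every environment runs the same image.
    FROM java:8-jre
    COPY build/libs/app.jar /opt/app/app.jar
    COPY build/libs/deps/ /opt/app/lib/
    ENV CLASSPATH=/opt/app/app.jar:/opt/app/lib/*
    CMD ["java", "com.example.Main"]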

In my opinion the rest of it is gravy. Nice tasty gravy, but I don't care so much about the rest at the moment.

Edit: As danesparz points out, nobody has mentioned immutable architecture. This is what we do at Clarify.io. See also: https://news.ycombinator.com/item?id=9845255




I get this by using vagrant + ansible rather than docker. Easy to spin up or destroy the environment in the same way in a VM, staging server or live environment.

I don't really see the point of lightweight virtualization. It provides an illusion of isolation which will likely come crashing down at some probably very inconvenient point (e.g. when you discover a bug caused by a different version of glibc or a different kernel).


Vagrant + Ansible is a developer-environment tool. Docker is a tool for making containers redistributable for production use.

Packer is not quite an apt comparison, but it would be a better one than Vagrant.

The advantage is that you do the steps that could possibly fail at build time rather than at deploy time. The downside is that you need to learn to get away from doing runtime configuration for upgrades.

http://michaeldehaan.net/post/118717252307/immutable-infrast...

I wrote Ansible, and I wouldn't even want to use it in a Docker context to build or deploy VMs if I could just write a Dockerfile - assuming, probably, that I don't need to template anything in it. I would still use Ansible to set up my "under cloud", as it were, and I might possibly use it to control upgrades (container version swapping) - until software grows to control this better (it's getting there).

However, if you were developing in an environment that also wanted to target containers, using a playbook might be a good way to have something portable between a Dockerfile and Vagrant, if a simple shell script and the Vagrant shell provisioner wouldn't do.

I'd venture in many cases it would.
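
Something like this is often all the portability you need - the script name and package below are made up:

    #!/bin/sh
    # provision.sh - hypothetical setup script shared by both tools
    set -e
    apt-get update
    apt-get install -y nginx

    # Dockerfile hook:   COPY provision.sh /tmp/
    #                    RUN sh /tmp/provision.sh
    # Vagrantfile hook:  config.vm.provision "shell", path: "provision.sh"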


The problem is Vagrant + Ansible violates the rule of "build once."

I don't care about the isolation for isolation sake, I care about it for the artifact sake.


How is building a Vagrant box via Ansible configuration any different than building a Docker container with a Dockerfile? You can use both tools to build an image once and then rebuild it for updates. I don't see how the tool in any way violates that constraint.

What is this rule to only build once? I can see not wanting to create multiple artifacts of your codebase, but with machines it is possible to continually update them and sometimes desirable as well. In the "cloud" world, you can arguably rebuild a server every time it needs updates, but at the physical level you don't always have capacity to absorb the hit of rebuilding multiple boxes at once. The physical servers need to get updated and managed post-install.


> How is building a Vagrant box via Ansible configuration any different than building a Docker container with a docker file?

Unless you're snapshotting that vagrant box and then deploying that to all your servers somehow, you are building multiple times.

> What is this rule to only build once?

I'd recommend reading the book Continuous Delivery. It is a fantastically helpful read.

I prefer not to update my machines, but that is because I follow immutable deployments. But even if I did update my machines, it is far cleaner (and easier to roll back!) to deploy an asset which has all its dependencies in the box than to push out code and maybe have to upgrade or install new packages. Gemfile.lock and friends make this a bit less of a problem, but you also get to lock things like the libxml version, or ffmpeg, or...
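
As a hypothetical sketch (base image, packages, and paths are illustrative, not our actual setup), the artifact carries the native dependencies right alongside the gems:

    # Native libs are locked into the image instead of being upgraded
    # on the target host at deploy time; rollback = run the old image.
    FROM ruby:2.2
    RUN apt-get update && apt-get install -y libxml2-dev zlib1g-dev \
        && rm -rf /var/lib/apt/lists/*
    WORKDIR /app
    COPY Gemfile Gemfile.lock ./
    RUN bundle install --deployment
    COPY . .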

> In the "cloud" world, you can arguably rebuild a server every time it needs updates, but at the physical level you don't always have capacity to absorb the hit of rebuilding multiple boxes at once.

Totally true, and we don't do this. We build a machine image and do a rolling-deploy replacing existing servers with the new machine image.

> The physical servers need to get updated and managed post-install.

One of the reasons I try not to work with hardware. Physical hardware is hard, and avoiding it makes my life much simpler. I love it.


>Unless you're snapshotting that vagrant box and then deploying that to all your servers somehow, you are building multiple times.

You're also configuring many things in many different potentially complex ways.

The docker method of using environment variables as a configuration hack to get around this is pretty horrible, IMHO. Especially compared to ansible's YAML/jinja2 configuration.


>Especially compared to ansible's YAML/jinja2 configuration.

YAML/jinja2 is just terrible. When you have to introduce a templating system to programmatically generate your YAML configuration files, what you have really needed the whole time is an actual programming language.


It's a non-Turing-complete programming language, much like Excel spreadsheets are non-Turing-complete programming languages.

Taking out Turing completeness restricts the mess you can make and lets non-programmers do more, while still being customizable.


Some smart people disagree with that:

http://12factor.net/config

I think the point is not to conflate configuration that is equivalent to code (which, sure, put it in version control) with configuration that is specific to how code is deployed (which your deployment tool should just tell you, via env vars).
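
A minimal sketch of that split, with made-up image and hostnames - the image is identical, and only the deploy-specific values change:

    # One image, per-deploy settings injected at run time.
    docker run -d -e DATABASE_URL=postgres://db.staging.example.com/app \
                  -e LOG_LEVEL=debug myorg/app:1.42
    docker run -d -e DATABASE_URL=postgres://db.prod.example.com/app \
                  -e LOG_LEVEL=warn myorg/app:1.42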


> I'd recommend reading the book Continuous Delivery. It is a fantastically helpful read.

Which one? The one by Humble and Farley (Addison-Wesley) is from 2010, is it still relevant?


Still? For me, when I hear book recommendations, the older the book is, the more likely it is still relevant. Books get forgotten over time.

The book is excellent. I was surprised that it wasn't older.


Some books cover tools in detail; that can be great, but those books go out of date. I'll put it on the list, thanks.


Yes, the one by Humble. 2010. Still fantastic. Timeless.


> What is this rule to only build once?

If your cycle is "build, test, build, deploy", then you are not deploying those artifacts that you tested.

Any number of factors (different dependency version, toolchain difference, environment differences, non-reproducible builds) could lead to the second build being different from the first one, and then you deploy an untested artifact.

Not to mention that rebuilding can be resource intensive.


First, not everything produces bitwise identical results from build to build (Websphere ear files, for example). Second, it's time consuming to rebuild from scratch every time. This is especially important if you're in a cloud environment and scaling horizontally for load. You want a way to bring resources online quickly.


All clouds I've used have snapshots - it's hard to get faster than creating a VM from one. You can just keep an up-to-date template using any config management tool.


I don't care so much for this rule. It sounds like a figleaf covering for broken build scripts.

I care about tracking down issues before they reach production. Meaning that I want an environment that mirrors production as closely as possible. Meaning heavyweight not lightweight virtualization.


Agreed about preventing issues getting to production, but that doesn't exclusively mean heavyweight virtualization. It also doesn't "cover up" broken build scripts.

Our build scripts get tested a dozen times a day; we cannot tolerate half-assed, broken build scripts.

Our deployment pipeline (after verifying the image is good enough to be deployed) packs the Docker image into a machine image along with several other containers. The machine image is then deployed to staging. If the machine image passes staging, it goes to production. If an issue hits production exclusively (it has happened only a handful of times), it is simply a matter of rolling back to the previous machine image.


It is far easier and less risky to run the build once and use that artifact than to make sure every environment it could possibly build in is identical. Trying to keep all your environments perfectly identical is more likely to either: 1. introduce subtle differences you aren't even aware of, or 2. introduce DLL hell at datacenter scale.


>Agreed about preventing issues getting to production, but that doesn't exclusively mean heavyweight virtualization.

Well, it gets you a step closer to accurately mimicking production.

>It also doesn't "cover up" broken build scripts.

That seems to be what that 'build once' rule is for. If your build process isn't risky, why the need to prohibit running it twice?


So that your build process is faster. If I can use the same container for local dev, continuous integration, staging, and production, that means I only had to build it once and the pipeline to get something from a developer's laptop to a production instance is much quicker.

Just because I can install the operating system doesn't mean I want to do this on every deploy of an application.
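
A hypothetical sketch of that flow - the registry name and tag are made up:

    # Build and test exactly one artifact...
    docker build -t registry.example.com/app:build-1842 .
    docker push registry.example.com/app:build-1842

    # ...then local dev, CI, staging, and production all pull that same tag.
    docker pull registry.example.com/app:build-1842
    docker run -d registry.example.com/app:build-1842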


The kernel is a concern, but I thought the libraries were container-specific.


Unix has had this since day 1: you put your app in a file and run it as a process. The modern-day fashion of making an app out of a thousand separate files is just a silly fashion, and Docker is just a way to make this stupid model barely usable.


Like OP, I've been wondering exactly what problem Docker is meant to solve -- thank you for this explanation, it makes total sense.


>> I don't even mind Java anymore because the CLASSPATHs

Agreed, and I'd add that Python's paths (I forget what they're called), as well as Java CLASSPATHs, have been a problem for me on occasion too, which means Docker would probably help with all these types of path issues.


This was already done at the development shop I last worked at with VM images. Docker didn't do anything in particular to help this except for reducing the image size.


Sounds like a lot of work rebuilding and redeploying images every time a security update is available?


It isn't. Our deployment process is very simple, we download the configuration, and we download the container. This is extremely scriptable and repeatable. We have push-button deploys and don't mind rebuilding. That said, we do have a more mature pipeline, so it may be an issue for other people.


How do you handle different configurations then? Especially if you need to provide N values (or structured data).

Also, how do you manage your containers in production?


Configuration files, made available to containers as a read-only mount via the volume flag. No external network or service dependencies that way.
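
Something along these lines - the paths and image name are hypothetical:

    # Config lives on the host and is mounted read-only into the
    # container; no env-var shim, no external config service.
    docker run -d \
      -v /etc/myapp/production.yml:/etc/myapp/config.yml:ro \
      myorg/myapp:1.42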

I'm not terribly fond of using environment variables for configuration, personally. That method requires either a startup shim or development work to make your program aware of the variables, and your container manager has to have access to all configuration values for the services it starts up.


I use environment variables, which can be passed in as arguments when you start the container. In cases where the configuration is complicated I'll use an environment variable to tell the container which redis key / db key to load config from.
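
Roughly like this, with made-up names - small settings go straight in as env vars, and anything complicated is looked up by key at startup:

    docker run -d -e REDIS_HOST=redis.internal.example.com \
                  -e CONFIG_KEY=myapp/production \
                  myorg/myapp:latest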


As others have written, you can either download your config from an external host, or pass in environment variables to your container and generate a configuration based on them.

Lots of people write their own scripts to do this; I wrote Tiller (http://github.com/markround/tiller) to standardise this usage pattern. While I have written a plugin to support storing values in a Zookeeper cluster, I tend to prefer to keep things in YAML files inside the container - at least, for small environments that don't change often. It just reduces a dependency on an external service, but I can see how useful it would be in larger, dynamic environments.
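
The home-grown version of that pattern is usually just a small entrypoint script along these lines (paths and variable names made up); Tiller standardises the same idea:

    #!/bin/sh
    # Hypothetical entrypoint: render a config file from env vars,
    # then hand off to the real process.
    set -e
    cat > /etc/myapp/config.yml <<EOF
    listen_port: ${PORT:-8080}
    redis_host: ${REDIS_HOST}
    EOF
    exec /usr/local/bin/myapp -c /etc/myapp/config.yml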


Configuration: We use environment variables for anything small-ish. For more complicated configurations, similar to omarforgotpwd, we keep the values (files AND environment variables) in S3 and download them at deploy time. For stage/prod differences we can literally diff the different S3 buckets.

Management: We create AMIs using Packer. Packer runs a provisioning tool which downloads the container and the configuration and sets up the process monitoring. It then builds a machine image, and we launch new servers from it.
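
The config side is roughly this (bucket names are made up, not our real ones):

    # Pull the per-environment config at deploy time...
    aws s3 sync s3://myorg-config-production /etc/myapp/

    # ...and comparing environments is just comparing buckets.
    aws s3 sync s3://myorg-config-staging    /tmp/staging-config
    aws s3 sync s3://myorg-config-production /tmp/production-config
    diff -r /tmp/staging-config /tmp/production-config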


Of course, you lose the "identical image" as soon as you build again or re-run the Dockerfile.



