Building Good Docker Images (bergknoff.com)
274 points by jaswilder on Oct 20, 2014 | 66 comments

Google takes this a step further and creates single-binary containers with the minimal OS bits needed [1, 2]. Personally, I think this is where we need to be headed, versus running a full-blown Ubuntu/Debian/CentOS OS inside the container. There are three benefits: 1) there is no OS to manage, e.g. no apt-get update or configuration management; 2) the container has less of an attack surface (think Shellshock: the container does not have bash, wget, curl, etc.); 3) the images are lightweight. The issue is: how do we (container creators) know the dependency tree for the app? Sure, this might be easier for Go binaries, but what about complex apps like Rails and MySQL? It is a major pain to figure this out, so we just use an OS, and it takes all the thinking out of it.

Kelsey Hightower actually published something on this topic called "Building Docker Images for Static Go Binaries" [3].

[1] https://registry.hub.docker.com/u/google/nodejs-hello/

[2] https://github.com/thockin/serve_hostname

[3] https://medium.com/@kelseyhightower/optimizing-docker-images...

> how do we (container creators) know the dependency tree for the app

Nix package manager [1] offers a potential means to know the complete dependency tree. If you're not familiar, a nix expression to build a package takes a set of inputs (specific binary packages of, e.g., make, gcc, bash, libc, libxml2) and produces a binary output (depending only on the inputs). The run-time dependencies can be a smaller set than the build-time dependencies and are deduced by observing shared library linking for example.

I've been using it (outside Docker) for various Ruby apps, and I can't say it's been easy, but a large part of the pain has been Rubygems' inability to encode dependencies on C-libraries (e.g. libxml-ruby depends on libxml2).

There have been attempts at provisioning Docker containers with Nix [2].

Of course, if you are using Nix, some part of Docker's isolation becomes redundant (Nix isolates multiple versions of things on the filesystem using plain-old-directories, so it's trivial to run ten different versions of Ruby side-by-side, for example).

[1] http://nixos.org/nix/

[2] http://zef.me/6049/nix-docker/

Perhaps I'm missing something, but I don't see how [1] can be used to create "single binary containers with the minimal OS bits needed"? It is from https://registry.hub.docker.com/u/google/nodejs/dockerfile/ and uses the full Debian stack that you discuss including apt-get etc.

I've heard whisperings on the wind of research being done on monitoring which files a Docker container uses, and then removing everything that the container doesn't need to run the app. I agree that this is the future (I shouldn't have apt-get, curl, etc. taking up space in my final image if I don't need them), but how do you tell a "good" file from a "bad" one? (Just thinking out loud here: what if my app depends on imagemagick, libffmpeg, etc.?) Nix looks pretty cool, I suppose.

I wrote https://github.com/jwilder/docker-squash to remove things that I know I don't need in the final image, such as curl, wget, temp files, various packages, etc.

I've managed to get most images to basically the size of the base image + my app.

This process is sort of the reverse of building a single binary and adding it to a minimal image. I like that approach but it's not always straightforward w/ some applications.

Ah yes, I've played with docker-squash and like it, I wish there was a built-in docker solution for squashing layers (perhaps any contiguous string of instructions starting with ~ would be squashed into one layer?).

Mostly the problem I've run into is figuring out what to remove without b0rking the containerized app.

Sorry, here's the one I was thinking of: https://registry.hub.docker.com/u/google/cadvisor/ However, I just broke apart that image, and it is using busybox. I swear this was standalone though. I'm going to dig through the layers to see if/when this changed.

If you are willing to do a bit of hacking, you can do it.

For instance, here is a gist I whacked together in a few minutes that will build you a runnable Ruby interpreter with NOTHING else installed but its required shared libraries. (Note that it would not be fun to get, say, Nokogiri working in it without knowing what you are doing.)
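
The gist itself is not reproduced here. As a rough sketch of the same idea, assuming a Linux host where `ldd` is available, the function below copies one binary plus the shared libraries `ldd` resolves for it into an empty staging directory. The name `stage_binary` and the choice of `sh` as the demo binary are made up for illustration. `ldd` only sees link-time dependencies, so an interpreter such as Ruby would also need its standard library files, which this sketch does not find.

```shell
#!/bin/sh
# stage_binary BIN DEST: copy BIN and every shared library that `ldd`
# resolves for it into DEST, keeping the original absolute paths.
# (Hypothetical helper; paths containing spaces are not handled.)
stage_binary() {
    bin=$1
    dest=$2
    # ldd lines look like "libc.so.6 => /lib/.../libc.so.6 (0x...)" or
    # "/lib64/ld-linux-x86-64.so.2 (0x...)"; keep only the absolute paths.
    libs=$(ldd "$bin" | awk '
        $2 == "=>" && $3 ~ /^\// { print $3 }
        NF == 2 && $1 ~ /^\//    { print $1 }')
    for path in "$bin" $libs; do
        mkdir -p "$dest$(dirname "$path")"
        cp -L "$path" "$dest$path"   # -L: copy the file a symlink points to
    done
}

# Demo: stage the system shell into a throwaway directory and list the result.
rootfs=$(mktemp -d)
stage_binary "$(command -v sh)" "$rootfs"
find "$rootfs" -type f
```

The staged directory could then be turned into an image with something like `tar -C "$rootfs" -c . | docker import - minimal-sh`, where `minimal-sh` is a placeholder image name.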


Interesting this approach of building single binary containers.

I think that would be like packr [1] for Java, already discussed here [2]. I wonder if there is something like this for other languages/platforms like python/ruby/node.

[1] https://github.com/libgdx/packr

[2] https://news.ycombinator.com/item?id=7696564

You might want to look into nix and nix-docker.

Nix is a package management system which knows the full (yes completely full) transitive dependency tree of every package it installs, so you can have an absolutely minimal set of software in your container if you use nix-docker.

Because it's already doing version isolation in the package manager, you can also mount the software from the host into a shim container, which is more the "nix way" of doing things.

For Python there is pex [1].

[1] http://pex.readthedocs.org/en/latest/

There's another gotcha in super-small images - things like "docker exec" will not work because there's nothing to exec. SSH-to-container becomes impossible.

I think there's got to be a middle-ground - small (maybe O(tens of MB) max) but full-featured enough to have a simple shell and the ability to get debug tools.

How small of a debian or fedora image could we get if we REALLY tried? 50 MB? 30 MB?

I thought there was a way to enter a namespace (googled this as I write the comment). Basically you have your shell and debug bits outside the container and make it appear as if it was inside it via nsenter [1]. I have not tested this, but will do that in a second. This might correct the situation you are thinking of.

    | docker container   |
    |  w/ static bin     |
    |                 <----- nsenter + bash/debug bits (on host machine)

[1] http://www.kevssite.com/2014/08/05/console-access-into-a-run...

As of Docker 1.3 you can simply 'docker exec' :)

He is actually talking about the case where the bash binary (or gdb, or whatever you're going to exec) does not exist in the container. Hence the hassle of loading bash from the parent and then moving the process into the container's namespace with nsenter.

So, nope, docker exec just wouldn't work.

You're right, we're still missing dynamic volume mounts to do the same thing in Docker. My bad.

I really think there are no good reasons to include build tools in a docker image. The author lists three possible reasons:

- you need a specific version (e.g. redis is pretty old in the Debian repositories).

- you need to compile with specific options.

- you will need to npm install (or equivalent) some modules which compile to binary.

But you can avoid all of these by building your own DEB/RPM packages and installing those into the container.

This might make the container less "whitebox", in that the Dockerfile no longer contains the full steps to reproduce the built image from public sources. But having an internal package repository makes a lot of sense, and not just for building small Docker images. Keeping your own package repository helps make your server builds more reproducible in general, and provides clean mechanisms for performing updates on your own software as well as third-party packages.

(edited for formatting)

I wish I could upvote this a few times.

I think this is a case where gentoo can really shine. Although tooling might not be there just yet, the linux meta distribution allows you to build a strict set of dependencies based on what you need and nothing else. There have already been pretty successful attempts at this, such as https://github.com/edannenberg/gentoo-bb (63 MB custom nginx sound ok to you?) or https://github.com/wking/dockerfile.

edit: To elaborate for people not very familiar with gentoo; it solves what a lot of the discussion in this thread seems to be about - having complete control of the dependency chain based on how you choose to build your software. Using nginx as an example, enabling mod_security would pull and build its own dependencies (which also can be limited based on its compiler options). Strip man pages? Done. Change libc library? No problem (if the packages support it).

The work that needs to be done is expanding the toolset to a point where you say "I want this in docker, plx" and anything else (dependencies disregarded) basically goes out the window. The current attempts build upon a small set of packages for convenience and then remove "safe" stuff. When time allows, I'd like to be much more aggressive in terms of what's considered safe. :edit

I'm personally also very interested in progrium's work with bundling busybox with opkg (https://github.com/progrium/busybox), but I still think that docker containers should not be built from within, which is why cross-compiling from gentoo to create a minimal docker image is the way to go.

Thanks for the pointer. I've been thinking about this kind of scheme:

- start from a vanilla gentoo

- install portage

- emerge my package

- at the end, take a diff of the filesystem to apply to the vanilla gentoo.

And using Docker's filesystem feature for that. I'm still quite new to Docker and don't know if it is easy to do.

But I'll have a look at gentoo-bb, I think it is exactly what I need!

This gave me a (probably non-novel) idea: "double-layered" Docker image creation. One thing that rubs me the wrong way is how Docker images contain stuff like apt (and all the related supporting stuff) when they don't really need it at runtime. On the other hand, you need to install/compile/set up the environment somewhere, and relying on the host system would break any hopes of reproducibility.

To reconcile these issues I propose two-phase building of Docker images. First you set up a regular Docker image, based on Debian or whatnot, which contains all the tools you need to build/set up the application. Then, inside that container, you build the final image based on an empty image (e.g. http://docs.docker.com/articles/baseimages/#creating-a-simpl... ), adding only the files that are really needed at runtime.

Yep cool idea. This is being addressed by the (not yet merged) docker nested build feature:


I propose a third layer: Integration with an application checkpoint library, like CRIU[1], to snapshot the running container. CRIU already supports snapshots of docker containers[2], just not correctly restoring them.

[1] http://criu.org/

[2] http://criu.org/Docker

The problem with this is when you introduce applications with arbitrary dependencies. If your language can spit out a binary which is truly statically compiled, like Go, a Makefile or simple shell script such as https://github.com/nathanleclaire/sfserver/blob/master/build... will do you just fine, but if your app is Ruby or Javascript, requires dynamic linking, etc. the task just got a lot harder.

I'd love to see more research in the area.

The Docker hub is - after such a short time - an even darker place than the wordpress plugin registry and is already a source of security problems and a useless waste of bandwidth, time and effort. Besides too many amateurs publishing BS, the real problem is that the company behind this is not taking responsibility to assure the quality of their containerized-app-store. This does not have to end in censorship, like our beloved Big Brother Apple does - some automated checks on each uploaded image could be a way to go, plus a team of reviewers that approves everything uploaded. Of course, the amount of information attached to any image currently is a joke; the whole hub is a one-day-of-work prototype that never should have been published in the first place in this premature state, but now it's too late, so there is no other way than burning it down and restarting it with some more thinking beforehand.

Much better concept would be: share layers, not images, based on verified base images with preinstalled saltstack. This effectively boils down to sharing good and up2date provisioning scripts.

There are some more conceptual problems with the whole docker idea that are rooted in a "need-to-productize-quick" infected thinking and do make everything seem immature and not really thought out - very basic problems that pop up with orchestration and networking should have been solved before releasing the product. Now millions of half-assed "products" step into that gap, and the result is a bizarre level of overcomplication of any infrastructure that was not possible before with virtualization alone, and still there are important things that "will be contributed in the future by somebody, hopefully".

Docker should not be a product itself with its own "market", but the basic docker ideas should be added to already existing concepts and inherit already existing infrastructure. The docker execution model should be a standard feature of any linux distribution with a standardized container model (with some security added!) and the existing packaging infrastructure should be extended to handle what is needed to support it, including userspace updates and provisioning or on-the-fly rebuilds, so people can concentrate on writing provisioning scripts and not fighting another layer of system config BS. Getting rid of the VM is great, but building even more complicated overhead is totally absurd. Meanwhile something like Vagrant is a great thing to learn from.

Is minimizing the size of a docker image really the top priority?

I would hold that making docker images easy to use, transparent as possible, reliable, versatile and easy to use (did I mention that already? oops) are far more important priorities.

Admittedly, I use docker primarily for development/testing purposes and my use-cases are a bit different than the average production use-case, however, having a large toolbox easily accessible for me to use (yes, including the ability to ssh into the docker container) is invaluable to me.

I may be missing something here, but racing to make docker images "as small as possible" feels like a bit of premature optimisation.

If we were talking about shaving a couple of bytes off a layer, I'd agree that attention might be better focused elsewhere.

However, I've seen (and am guilty of) a few of the pitfalls in the parent post. When you suddenly start adding 100mb+ unnecessarily to a Docker image, it can have some nasty ramifications in devspeed, and also deployments. A classic way to accidentally dramatically increase the weight of a Docker image is to apt-get install build-essential.

What kinds of ramifications you might ask? Well, I live in Australia. I don't know if Docker Registry has a CDN POP here, but that extra 100mb tends to take a solid minute or two longer for me to pull down.

It's not a question of size but a question of semantics.

If you dump an OS in a container you are treating it like a lightweight VM (and that might be fine in some/many cases).

If, however, you restrict it to exactly what you need and its runtime dependencies + absolutely nothing more, then suddenly it's something else entirely - it's process isolation; better yet, it's -portable- process isolation.


If it doesn't hinder runtime performance, +/- 100MB of disk space is fairly benign. I understand how smaller images would be useful, but for my use case it doesn't help much.

If you are making services small, having lots of them, and redeploying often, it matters more. If you just have one at a time, maybe you don't care.

Yes, exactly how I feel too.

"Pin package versions" -- yes. One of the things that has been bugging me about Docker is that if you begin every Dockerfile with an `apt-get -y update`, you never know what you're going to end up with.

On the other hand, pinning every package that you install would end up being pretty verbose.

If you need to pin all your packages then it probably makes more sense to just have your own software package "layer" that imports your pinned base with the packages that are required.

Many organizations are actually doing this.

Yeah, if you use the Dockerfile; but pre-built images have tags and IDs that you can use to make sure you always get the same image.

I see a few reasons to build your own from the Dockerfile:

  - 1) You don't trust the image and want to build your own.
  - 2) You want to build something slightly different
  - 3) You want an up-to-date version.

2) is often solved by building your own image with the changes, and I think 1) is solved by the Automated Builds (?), but I haven't used them yet.

Depends on which Debian I believe, but Ubuntu doesn't update their package versions beyond security fixes within a single distro version.

So you'll generally want those fixes, unless they really broke something, which I'd guess would be somewhat rare.

Ubuntu also issues stable release updates (particularly to the latest LTS version) to fix regressions, major bugs, or occasionally update minor release versions. Some applications known to have good release procedures are also allowed updates (eg Firefox).

Why do an update if you prefer your packages to be pinned?

Depending on what your base image is, the pinned versions you want to install may not be available.

I hadn't even thought of pinning packages before I read the article; I'm kind of a sucker for apt-get update. I've been thinking a lot about Docker lately, and something in the back of my mind was like, "but what if you don't want the latest version of everything?"

One additional tip for readability is to replace the && with set -e at the top of any RUN commands that combine more than one command.


  RUN curl -SLO "http://nodejs.org/dist/v$NODE_VERSION/node-v$NODE_VERSION-linux-x64.tar.gz" \
    && tar -xzf "node-v$NODE_VERSION-linux-x64.tar.gz" -C /usr/local --strip-components=1 \
    && rm "node-v$NODE_VERSION-linux-x64.tar.gz"

  RUN set -e; \
    curl -SLO "http://nodejs.org/dist/v$NODE_VERSION/node-v$NODE_VERSION-linux-x64.tar.gz"; \
    tar -xzf "node-v$NODE_VERSION-linux-x64.tar.gz" -C /usr/local --strip-components=1; \
    rm "node-v$NODE_VERSION-linux-x64.tar.gz"


Because it's more readable? It tells you at the top that any failure will fail the whole thing, and you can avoid the prefix `&&` which reduces the effective indentation level and noise.

You can use the same spacing for the ampersands as used for the semicolons (i.e. at the end of the line).

That said, obviously there's going to be different preferences...I was more wondering if there was some semantic difference.
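
On the question of a semantic difference: both forms stop the step at the first failing command, since Docker's shell-form RUN passes the line to `/bin/sh -c`. One difference worth knowing is that `set -e` is not triggered by a failing command on the left-hand side of `&&` or `||`, so mixing the two styles can hide a failure. The sketch below shows this in a plain POSIX shell, outside Docker; the variable names are only for illustration.

```shell
#!/bin/sh
# Each variant runs in its own child shell, so a failure there cannot
# abort this script. `false` stands in for a failing build command.

# Variant 1: '&&' chain. The chain stops at 'false'; echo never runs.
chain_status=0
chain_out=$(sh -c 'true && false && echo reached') || chain_status=$?

# Variant 2: 'set -e' with ';'. The shell exits at the failing 'false'.
strict_status=0
strict_out=$(sh -c 'set -e; true; false; echo reached') || strict_status=$?

# Variant 3: plain ';' without 'set -e'. The failure is silently ignored.
loose_status=0
loose_out=$(sh -c 'true; false; echo reached') || loose_status=$?

# Variant 4: 'set -e' plus a stray '&&'. 'false' is not the last command
# of the AND-OR list, so 'set -e' does not fire and the script carries on.
masked_status=0
masked_out=$(sh -c 'set -e; false && true; echo reached') || masked_status=$?

echo "&& chain:        status=$chain_status output='$chain_out'"
echo "set -e with ;:   status=$strict_status output='$strict_out'"
echo "; only:          status=$loose_status output='$loose_out'"
echo "set -e with &&:  status=$masked_status output='$masked_out'"
```

So for a straight list of commands the two styles behave the same, and the choice is mostly about readability.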

> You can use the same spacing for the ampersands as used for the semicolons (i.e. at the end of the line).

Then if the lines are long (as they are here) they get cut off and it's much harder to notice that they're there at all.

Hmm, needs additional semi-colons.


I was trying to use the example given... Here's another, smaller example:

  # install wget without artifacts
  RUN set -e; \
    apt-get update; \
    apt-get install -y wget; \
    apt-get clean; \
    rm -rf /var/lib/apt/lists/*

I think parent meant you trade two &&s for one ;.

Personally I also don't think the meaning is as clear, and you now need to maintain the top line if you cut'n'paste. I suppose it's a matter of taste.

Docker images are like if some high school kids decided to sell mass-produced beer. Instead of writing down a recipe with the ingredients, measurements, temperatures, elevation, times, and distributor-sourced quality-assured materials, the kids go to random stores and buy whatever they think they need to make beer in huuuuge quantities. They make huge batches of beer so that they won't need to make it again for months or years. Then the next time they brew a batch, the beer tastes completely different, they say, "Oh, we might need to write some of this stuff down, and make sure it was the same as last time."

This patch is essential for creating minimal, best images with Docker. https://github.com/docker/docker/pull/8021

I hope nested build functionality gets merged soon. From reading the last proposal minutes, I wasn't sure if it's been shelved for later?


It'll replace a bunch of unnecessary shell scripts that are currently required to get similar functionality working.

Great minds think alike? That sounds exactly like what I wanted in my comment at https://news.ycombinator.com/item?id=8483679

Something the article doesn't touch on in its pursuit of a smaller image - when you run "apt-get install" or "apt-get upgrade" you should do "&& apt-get clean" in the same RUN command.

This will remove the .debs apt just downloaded and installed that are being cached in /var/cache/apt/archives, saving you a little disk space.

Thanks for pointing this out. In the "debian:wheezy" docker image, "apt-get clean" didn't seem to have an effect. I dug around a bit and found that, in this image, apt is configured to not cache the downloaded packages (via /etc/apt/apt.conf.d/docker-clean).

On one hand, I consider this a good default behavior for building docker containers, so I'm glad it's there. On the other hand, I didn't know about it until I investigated. It's strong evidence for the great point @amouat made regarding base images being blackboxes in most (all?) cases.

This is a great topic. I hope this docker image thing gets a bit more mature, and quickly.

I wanted to take public docker images, spin them up on AWS, and quickly sanity-check/vet an app to see whether it is something I want to use internally or recommend to customers.

I realized there was nothing of the sort and started xdocker.io, an open source initiative.

Currently we support security monkey and ice (both from netflix).

Just love docker and learning quite a bit of tricks along the way.

This article just helps us to do our job better by following the best practices to build docker images.

I would also appreciate if experts on this can help us screen the docker files we have created and share the feedback with us. https://github.com/XDocker/Dockerfiles.

Great post. It shows there's a bit to do in Docker to make transient data a bit more user friendly.

There are lots of proposals sitting on GitHub; it'd be great to get more feedback.

Lately we've been using shell scripts to do a lot of boiler-plate container preparation. We copy that script in and run it at the beginning of the container build. The nice thing about doing things this way is that you keep your Dockerfile a little cleaner, you end up with fewer layers, and that layer will only rebuild if you change the source file.

The author mentions a good image is a "whitebox" if it publishes its Dockerfile on the Hub. Unfortunately this isn't really enough; many (even most) Dockerfiles depend on scripts and data files which aren't hosted on the Hub.

I would suggest the only truly whitebox images are the ones that can be recreated from github (or similar) repositories.

You're right that a published Dockerfile isn't enough to really know what went into the image, but I think it's the closest thing we have at the moment. I would love to see some tools for building truly minimal docker images from scratch.

Curious whether the sizes quoted compare Debian netinst vs Ubuntu server, or a standard Debian install vs Ubuntu server? We base all our installs (regular VMs, not Docker) off of Debian netinst because we get to choose exactly what goes on to the server.

> Pin package versions

This is true for any packages you use, not just Docker. For example, rails gems. If you don't pin them your app will break at some point on an update. Always manually update packages and test before deploying.

"Thus it seems that if you leave a file on disk between steps in your Dockerfile, the space will not be reclaimed when you delete the file."

This must be a bug? Why should it legitimately behave this way?

From what I understand, docker caches the state of its world during each step. So you add a file, it caches, remove the file, it can't free that space from the layered filesystem. But if you create & remove in the same `RUN` command, it never gets persisted as a step, so doesn't take up space.

I don't think it's a bug. It's due to the layered filesystem architecture. Deleting a file in a later build step is not going to remove it from the previous layers. It just adds a new layer with metadata saying the file has been removed.
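
To make the size effect concrete, here is a small simulation that needs no Docker at all. It is only a sketch: plain tar archives stand in for image layers, and the empty `.wh.<name>` file imitates the whiteout marker that layered image formats use to record a deletion. The file name `big.tmp` and the 1 MiB size are arbitrary.

```shell
#!/bin/sh
# Simulation only: two tar archives play the role of two image layers.
work=$(mktemp -d)
mkdir "$work/layer1" "$work/layer2"

# "Layer 1": a build step that leaves a 1 MiB temporary file behind.
dd if=/dev/zero of="$work/layer1/big.tmp" bs=1024 count=1024 2>/dev/null

# "Layer 2": a later step that deletes the file. The new layer holds only
# a marker saying "big.tmp is gone"; layer 1 itself is never rewritten.
: > "$work/layer2/.wh.big.tmp"

tar -cf "$work/layer1.tar" -C "$work/layer1" .
tar -cf "$work/layer2.tar" -C "$work/layer2" .

layer1_bytes=$(wc -c < "$work/layer1.tar")
layer2_bytes=$(wc -c < "$work/layer2.tar")
total_bytes=$((layer1_bytes + layer2_bytes))

echo "layer 1 (adds file):    $layer1_bytes bytes"
echo "layer 2 (deletes file): $layer2_bytes bytes"
echo "shipped in total:       $total_bytes bytes"

rm -rf "$work"
```

The total stays above 1 MiB even though the file is "gone" in the final view, which is why creating and removing a file within a single RUN command is the way to keep it out of the image.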

OP makes a good point about buildpack-deps being quite huge. My only complaint about the official images is that they take a LONG time to pull.

Thanks for writing such stuff. Too many images available today are just a pain in the ass to use!

This is Gold, if only for the info about temp/transitory files and resultant image size.
