#1: I'm still hopeful that we'll see folks stop treating containers as lightweight VMs that need a full OS and start treating them as processes. The fact that the recommended steps are to bundle a distribution's userspace to run a single process adds a lot of bulk for minimal gains.
#3: The ugliness of the RUN command shown should be a clear indicator that Something Isn't Right. I highly recommend building things externally and then pulling them into the container either manually or via a package format. You can build and package the software using a build container, and then your service containers can consume the package without needing to install build tools themselves. This saves you from long chained RUN commands and allows more logical separation of build tasks from the end product.
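For illustration, a rough sketch of that split, assuming a hypothetical "myapp" project that already carries debian/ packaging metadata (image tags, paths and file names are just placeholders):

    # build.Dockerfile -- the image with compilers and packaging tools
    FROM debian:stable
    RUN apt-get update && apt-get install -y build-essential devscripts && apt-get clean
    COPY . /src/myapp
    WORKDIR /src/myapp
    # dpkg-buildpackage drops the .deb one directory up, so copy it into
    # the /out directory mounted at run time
    CMD dpkg-buildpackage -us -uc -b && cp ../myapp_*.deb /out/

    # run the build, collecting the package on the host
    docker build -t myapp-build -f build.Dockerfile .
    docker run --rm -v "$PWD/out:/out" myapp-build

    # Dockerfile for the service image -- consumes the package, no build tools
    FROM debian:stable
    # assumes the runtime deps are already in the base image, or resolve them with apt
    COPY out/myapp_1.0_amd64.deb /tmp/
    RUN dpkg -i /tmp/myapp_1.0_amd64.deb && rm /tmp/myapp_1.0_amd64.deb
    CMD ["myapp"]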
I agree with the other listed points wholeheartedly: do ensure you're checking the authenticity of downloads via appropriate means, do use the right utilities and source images for the task, and definitely make use of labels: they're a very powerful resource for managing images in the field.
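On labels, a quick sketch of what I mean by managing images in the field (the key names are just examples, and the filter syntax depends on your Docker version):

    # in the Dockerfile, bake in whatever metadata you care about
    LABEL com.example.vcs-ref="abc1234" \
          com.example.owner="platform-team"

    # later, out in the fleet: find and inspect images by label
    docker images --filter "label=com.example.owner=platform-team"
    docker inspect --format '{{ index .Config.Labels "com.example.vcs-ref" }}' myapp:latest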
#3: You could very well build a deb with Docker (compile from source and package it) and then use that deb in other containers. Docker fits into CI/CD processes very well. If you go that route, I like the way that Dockerfiles give me complete transparency over the build process, which manual building perhaps does not.
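For instance, a CI job might look roughly like this (names and paths are hypothetical), with the Dockerfile itself serving as the build recipe of record:

    # build the build-image for this commit, run it, and pull the artifact out
    mkdir -p artifacts
    docker build -t myapp-build:candidate .
    docker run --name myapp-build-run myapp-build:candidate
    docker cp myapp-build-run:/src/myapp_1.0_amd64.deb ./artifacts/
    docker rm myapp-build-run
    # ./artifacts/*.deb can then be pushed to an internal apt repo for the
    # service Dockerfiles to install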
This point clearly states that if you compile code from source during your build, then you should clean up the sources.
My apologies for being unclear: I'm saying that in the case where you're building from source during the build of your service container, you're almost always better off building from source in a separate container and then turning that into a deb or some other consumable format.
Maybe. But you do lose some transparency and repeatability - it becomes harder for others to understand and use your Dockerfile. They suddenly have to either download or recreate whatever debs/tarballs etc you've used.
I think the real problem is that Dockerfiles aren't expressive enough yet. At the very least we need a more user-friendly way to run several commands in a single layer.
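In the meantime the usual workaround is chaining with && and backslash continuations so the intermediate files never land in a layer; something like this, where the package names and URL are just placeholders:

    RUN apt-get update && \
        apt-get install -y --no-install-recommends curl ca-certificates && \
        curl -fsSL https://example.com/app.tar.gz -o /tmp/app.tar.gz && \
        tar -xzf /tmp/app.tar.gz -C /opt && \
        rm -rf /tmp/app.tar.gz /var/lib/apt/lists/*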
I agree with the philosophy on #1, but what image would you rather see? You still need a big enough image that provides the dependencies for most common processes.
I'm a fan of statically compiled binaries for things like this. Given that I'm already treating containers as immutable once deployed, and containers (processes) aren't sharing libraries, the normal issues with static binaries are already issues: if I need to patch a library, I'm going to need to rebuild my containers and I'm going to need to replace old containers with new ones.
That lets your container build be "add this binary, add some configs". If we really want shell access via docker exec or the like, we could base the image off busybox or similar to get that.
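A minimal sketch of that, assuming a statically linked binary called myservice has already been built elsewhere (the -config flag is made up):

    FROM busybox
    # the binary is self-contained; busybox only exists so `docker exec` has a shell
    COPY myservice /usr/local/bin/myservice
    COPY myservice.conf /etc/myservice.conf
    CMD ["/usr/local/bin/myservice", "-config", "/etc/myservice.conf"]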
For dynamic things that resist compiling down (I'm looking at you, PAR::Packer), I can see more reason for building in a minimal distro userspace, but ideally even there you're using something trimmed down to your use case. For example, if I run a fleet of Rails apps, I might build out an image with busybox + a static Ruby, with my codebase added alongside it.
The minimal distro method has very powerful up-front value, which I suspect is why Docker is focusing on it: it allows users to dive in with a familiar environment and learn to do Docker things without having to discard a lot of their previous understanding. For that use case, it is magnificent. Users can use their existing understanding of packaging, of OS tools, and of service management and apply it when dockerizing their services.
The methods I'm describing would be the next step, roughly akin to writing out a new codebase and then identifying where to optimize it. Once a user is comfortable with Docker and is using it to run their services, they can make a determination as to whether or not pushing around distro userspaces is helping or hurting them, and react appropriately.
For example: a use case where each host is running service containers in a variety of languages or where the admin knows they're going to want to interact with userspace tools on the container. In that case, basing on Debian or such makes great sense.
Another case: each host is running a collection of containers with golang binaries to crunch the last digit of pi or design the cutest protein or whatever. There's no reason to base on a full OS: build your protein builder statically, drop it in an image, and you have an extremely lightweight image to throw into your fleet.
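For the Go case that can be as small as it gets; a sketch, with the binary name borrowed from the example above:

    # on the build host (or in a build container): a fully static binary
    CGO_ENABLED=0 GOOS=linux go build -o proteinbuilder .

    # Dockerfile: no userspace at all, just the binary
    FROM scratch
    COPY proteinbuilder /proteinbuilder
    ENTRYPOINT ["/proteinbuilder"]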
I think in the majority of cases it's just going to be simpler to use a minimal base image like Debian. And the overhead is pretty minimal - once you've got the base image, you only need to download the changes for the child images. Admittedly, you do incur the base image overhead at least once per Docker host.
A problem I've had with some Dockerfiles is that they rely on the debian/ubuntu upstream but do so through their own image as an intermediary. I understand why: it makes things easier to maintain if you have a lot of Docker containers.
But for someone like me who is 'just pulling from the registry', I then have to check out that intermediary Dockerfile to make sure nothing unexpected happens and that the correct version of debian/ubuntu is referenced.
What I end up doing is forking their containers and replacing FROM 'bla/ubuntu' with the official image to regain control. But that sticks me with the job of maintaining the container myself.
In short, I wish images on the registry intended for general "public use" would, as a guideline, build directly on the official ubuntu/debian images.
Edit: Off-topic, but I'd love some kind of graph view or dependency-depth indicator when browsing registry containers, or to have it set as a requirement.
Basically just a reiteration of your #1 point. I have seen projects create their own 'parent' containers that point to the ubuntu/debian base image and then use those for all their generic app containers. The extra layer irks me; I'd very much prefer all app containers to depend on the official containers directly. E.g. a project maintainer might install wget in their upstream container and then rely on that image for all of their app containers.
I think the issue there is that once you start building a number of containers, it's easy to notice patterns that repeat across them, and so it's very tempting to do what you mention to avoid repeating yourself.
I sort of agree with you, but I think it reflects a tooling problem in expanding dependencies more than anything.
After all, Docker introduces temporary images for nearly every step of the Dockerfile anyway, so if these "parent containers" are genuine dependencies it's beneficial to collapse the steps for individual app containers down to as small a set of shared ancestors as possible.