Intro Guide to Dockerfile Best Practices (docker.com)
436 points by rubinelli 10 days ago | 85 comments

As with Docker's best practices for Dockerfiles in the official documentation, they're leaving out some really important details.

Specifically, they don't really express how Docker packaging is a process integrating how you build and where you build, not just the Dockerfile.

1. Caching is great... but it can also lead to insecure images because you don't get system package updates if you're only ever building off a cached image. Solution: rebuild once a week from scratch. (https://pythonspeed.com/articles/docker-cache-insecure-image...)

2. Multi-stage builds give you smaller images, but if you don't use them right they result in breaking caching completely, destroying all the speed and size benefits you get from layer caching. Solution: you need to tag and push the build-stage images too, and then pull them before the build, if you want caching to work. (Long version, this is a bit tricky to get right: https://pythonspeed.com/articles/faster-multi-stage-builds/)

3. Don't run processes as root. Most of the time you don't have to; it's an easy win and protects against many Docker security CVEs.


To be fair, the official Dockerfile best practices documentation does mention this (although it also gives contradictory advice).

The documentation may, but many of the official images run as root and provide little to no documentation on how to change that. The Bitnami images, in comparison, are light years ahead.

A simple multistage example for Golang apps would be:

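  # Stage "build": compile a static binary
  FROM golang:1.12.4 AS build
  WORKDIR /opt/src/github.com/project1/myprog/
  COPY . .
  RUN go get -d -v ./...
  RUN CGO_ENABLED=0 GOOS=linux go build -a -mod=vendor -o myprog .

  # Stage "users": only used to generate an /etc/passwd that contains user1
  FROM ubuntu:latest AS users
  RUN useradd -u 5002 user1

  # Final image: just the binary and the passwd file, running as non-root
  FROM scratch
  COPY --from=build /opt/src/github.com/project1/myprog/myprog .
  COPY --from=users /etc/passwd /etc/passwd
  USER user1
  ENTRYPOINT ["./myprog"]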

Indeed - and there's no mention of any of this at https://hub.docker.com/_/golang - compare with https://hub.docker.com/r/bitnami/nginx, which discusses non-root users and how to use them.

I ended up expanding this into a blog post covering not the linked article (though it shares some of the same flaws), but the official docs: https://pythonspeed.com/articles/official-docker-best-practi...

I honestly don't understand this sentiment (other than shamelessly plugging your posts, which is ok).

The tradeoff in any packaging mechanism is always "pin everything to the byte" for maximum security and minimum updateability vs. "blindly update to latest without thinking". We normally develop with the second and deploy using the first and docker is no different.

This isn't really about "getting the latest version", it's about "getting the latest security patches for a stable version."

The presumption here is that you're running on a stable base image, Debian Stable or Ubuntu LTS or something. Updates to system packages are therefore basically focused on security fixes and severe bug fixes (data corruption in a database, say).

Even if you're pinning everything very specifically (and you probably should), you still want to get these (severe-bugfix-only) updates on a regular basis for security and operational reasons.

Blind reliance on caching prevents this from happening.

> This isn't really about "getting the latest version", it's about "getting the latest security patches for a stable version."

Typically security patches trigger new releases with minor/patch version number bumps, which are then installed by getting the latest version of the package. That's pretty much the SOP of any linux distribution.

> Solution: rebuild once a week from scratch.

I’ve noticed the official docker images don’t seem to do this. E.g. the official “java” images seem to be uploaded and then are never changed, the only way to get a newer underlying base system is to upgrade to a newer version tag release. Is this true of all the official images, I wonder?

Using tagged upstreams is a good idea as it puts you in control of forcing an upgrade.

Best combo is to pin to a specific tag, that you periodically update to the latest stable release, and also allow overriding via a build arg. Anyone who wants the bleeding edge, say for a CI server, can run a build with “latest” as the tag arg.
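
A minimal sketch of that combination, assuming an Ubuntu LTS base. The `UBUNTU_TAG` argument name and the image names in the comments are made up for illustration:

```dockerfile
# Pinned default; bump it by hand when a new stable release comes out.
ARG UBUNTU_TAG=18.04
FROM ubuntu:${UBUNTU_TAG}

RUN apt-get update \
 && apt-get install -y --no-install-recommends ca-certificates \
 && rm -rf /var/lib/apt/lists/*

# Normal build uses the pinned tag:
#   docker build -t myimage .
# A CI job that wants the bleeding edge overrides it:
#   docker build --build-arg UBUNTU_TAG=latest -t myimage:edge .
```

An `ARG` declared before `FROM` is only visible to `FROM` lines, so it would have to be redeclared after `FROM` if later instructions need the value.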

The Python ones seem to be rebuilt much more frequently (last update was 10 days ago). This is perhaps because it depends on pip which has frequent releases.

We check once a day to see if the upstream repo has been updated and build our base images. I have used versions of this with clients. https://github.com/boxboat/auto-dockerfile

This is not true. The images are rebuilt automatically when base images are updated.

What is the source for that? When I looked into this before, I wasn't able to find anything in the documentation stating this would happen.

Here's the official Node.JS image from a couple years ago, for example...

  $ sudo docker inspect node:6.1 | grep 'Z"'
         "Created": "2016-05-06T21:57:54.091444751Z",
             "LastTagTime": "0001-01-01T00:00:00Z"

Node 6.1.0 was released on May 6, 2016; it looks to me like the image was never changed after that. And if I run `ls -lah /var/lib/dpkg/info/*.list` inside the image, I get a modification time of May 3, 2016 on all the files... I tried the "node:10.0" image as well and I see similar behavior.

This is how the official images work.


Each image has a manifest with all the source repos, tags, what commit to pull from, what arches are supported, etc.

As long as the tag is listed in that manifest, it is automatically rebuilt when the base image changes.

Perhaps we’re talking past each other? When I go to https://github.com/docker-library/official-images/blob/maste...

It is only showing the newest version of node 8.16 listed in the manifest file. In other words, if I had an image based off node 8.15, it isn’t going to be updated ever.

So it’s not a matter of just rebuilding regularly: if you aren’t updating your Dockerfiles to use newer language versions, you also aren’t going to get system updates.

Edit: I think I do see your point, which is that if you are completely up to date on language versions, clearing the build cache every once in a while may still help get a system update if an upstream image is changed between releases of new language tags.

Yes, and it is mostly up to the maintainer of the image on how to handle tags. Typically minor patch releases are not kept around once the new patch is out. May be worth filing an issue if this is problematic?

> Solution: you need to tag and push the build-stage images too

With BuildKit, the cache for all the intermediate stages is tracked. You can push the whole cache with buildx, or inline the intermediate cache metadata with buildx or v19.03. `--cache-from` will pull in matched layers automatically on build. You can also export the cache to a shared volume if that suits you better than a registry.
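
To make the two approaches concrete, here is a rough sketch, not a tested pipeline. The Go program, the stage name `build`, and every `registry.example.com/...` image name are placeholders; the commands in the trailing comments show the classic "tag and push the build stage" workflow next to the buildx registry cache.

```dockerfile
# Build stage: with the classic builder its layers are only reusable on
# another machine if this stage is tagged and pushed on its own.
FROM golang:1.12.4 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /out/myprog .

# Runtime stage: only the binary is copied over.
FROM scratch
COPY --from=build /out/myprog /myprog
ENTRYPOINT ["/myprog"]

# Classic builder: pull, reuse, and push both images.
#   docker pull registry.example.com/myprog:build || true
#   docker pull registry.example.com/myprog:latest || true
#   docker build --target build \
#     --cache-from registry.example.com/myprog:build \
#     -t registry.example.com/myprog:build .
#   docker build \
#     --cache-from registry.example.com/myprog:build \
#     --cache-from registry.example.com/myprog:latest \
#     -t registry.example.com/myprog:latest .
#   docker push registry.example.com/myprog:build
#   docker push registry.example.com/myprog:latest
#
# buildx: export the cache for every stage to the registry in one go.
#   docker buildx build \
#     --cache-from type=registry,ref=registry.example.com/myprog:buildcache \
#     --cache-to type=registry,ref=registry.example.com/myprog:buildcache,mode=max \
#     -t registry.example.com/myprog:latest --push .
```

`mode=max` is what asks buildx to export layers from intermediate stages too, rather than only those of the final image.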

I find it odd that docker pipelines don't cache all the layers by default. That is to say, that they don't push all cached layers to a central repository. I'm not even sure if it's possible.

Is there a reason why this would be bad? Clearly you'd have to clean the old cache layers regularly, but I'm more concerned with some layers taking very long times - the caching seems required.

Though I suppose if you set up your CI system to store the caches locally then you get caching, and it's more efficient as you're not downloading layers. So maybe that's just the "right" way to do it, regardless. /shrug

Sometimes multi-stage builds are used to get build secrets into the build image. The final image just copies the resulting binaries over, and so you don't want to push the build image since you'd be leaking secrets. (I talk about alternatives to that here, since caching the build image is something you do want: https://pythonspeed.com/articles/docker-build-secrets/).

Eventually this will be unnecessary given (currently experimental) secret-sharing features Docker is adding. But for now pushing everything by default would be a security risk for some.

Just to add onto your second point for others who might not be aware, the experimental secret and SSH agent forwarding features have greatly simplified a lot of Dockerfiles I work on. SSH forwarding in particular has been really helpful for dealing with private dependencies.

There's a good summary here: https://medium.com/@tonistiigi/build-secrets-and-ssh-forward.... The tl;dr is you can now write lines like "RUN --mount=type=ssh git clone ssh://git@domain/project.git" (or any command that uses SSH) in a Dockerfile to use your host machine's SSH agent during "docker build". You do currently need to specify experimental syntax in the Dockerfile, and set the DOCKER_BUILDKIT environment variable to 1, and pass "--ssh default" to "docker build", but it's a great workflow improvement IMO.
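
Put together, that might look roughly like the following. This is a sketch rather than a tested recipe: `myorg/myproject` is a placeholder repository, and it assumes the key is already loaded into the host's SSH agent.

```dockerfile
# syntax=docker/dockerfile:1.0-experimental
FROM alpine:3.9

RUN apk add --no-cache git openssh-client

# Record the server's host key so the clone does not stop at a prompt.
RUN mkdir -p -m 0700 ~/.ssh && ssh-keyscan github.com >> ~/.ssh/known_hosts

# The host's SSH agent socket is forwarded for this one RUN only;
# no private key is written into any layer.
RUN --mount=type=ssh git clone git@github.com:myorg/myproject.git /src/myproject

# Build with:
#   DOCKER_BUILDKIT=1 docker build --ssh default .
```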

Buildkit has some interesting things around storing build cache information in an image.

The flipside of 1. though is that cached images can't be hijacked and have vulnerabilities injected, or from a non security standpoint, have breaking changes introduced which mean the image no longer builds and/or runs how you're expecting.

Use the experimental BuildKit Dockerfile frontend for much improved build time mounting: https://github.com/moby/buildkit/blob/master/frontend/docker...

* You can mount build-time secrets in safely with `RUN --mount=type=secret`, instead of passing them in. (Multistage builds do alleviate the problems with passing secrets in, but not completely.)

* BuildKit automatically parallelizes build stages. (Of course!)

* Mount apt-cache dirs in at build time with `RUN --mount=type=bind` so that you don't have to apt-get update every single time, and you don't have to clear apt-caches either.

And lots more.
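
A hedged sketch of the secret and apt-cache mounts with the experimental frontend. Note that it uses `type=cache` mounts for the apt directories rather than a bind mount, and that the secret id `api_token`, the file `api_token.txt`, and the `example.com` URL are all invented for illustration.

```dockerfile
# syntax=docker/dockerfile:1.0-experimental
FROM ubuntu:18.04

# The stock image deletes downloaded .deb files after each install; turn that
# off so the cache mounts below have something to keep.
RUN rm -f /etc/apt/apt.conf.d/docker-clean

# apt's package cache and index live in cache mounts that persist between
# builds but never end up in an image layer.
RUN --mount=type=cache,target=/var/cache/apt \
    --mount=type=cache,target=/var/lib/apt \
    apt-get update && apt-get install -y --no-install-recommends curl ca-certificates

# The secret is readable at /run/secrets/<id> during this RUN only.
RUN --mount=type=secret,id=api_token \
    curl -fsS -H "Authorization: Bearer $(cat /run/secrets/api_token)" \
         -o /tmp/private.tar.gz https://example.com/private.tar.gz

# Build with:
#   DOCKER_BUILDKIT=1 docker build --secret id=api_token,src=./api_token.txt .
```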

Notice that this mostly involves bringing capabilities that Docker already has at run time to build time.

But in order to build with BuildKit you MUST BE connected to the internet during the build... So not ready to go yet...

I assume you mean the helper for COPY/ADD is pulled from the registry. That is not the case since BuildKit v0.5 / Docker v19.03. If you have the images you use locally, network connectivity is not required; otherwise, BuildKit will verify that the mutable tags have not changed in the registry (e.g. it doesn't use the string "latest" to verify the validity of cache).

The message without internet connection is: ERROR resolve image config for docker.io/docker/dockerfile:1.0-experimental (the most current version of Docker Desktop for Windows - with Docker Engine 18.09.2)

As I mentioned you need to have the images you use available locally (eg. `docker images | grep docker/dockerfile`). In your Dockerfile you explicitly say that you want to use docker/dockerfile:1.0-experimental from the hub as a build frontend. If there is no such name/tag locally it needs to check the state in the registry as `1.0-experimental` tag is updated on new releases.

Which is indeed not Docker 19.03 :)

Why in the world would you need an internet connection to do this? Never used nor heard of build kit but that seems crazy.

BuildKit is an experimental replacement for docker build from Docker Ltd. When building it fetches some metafiles from the internet. So you can not use it in isolated environments...

Exactly. It's experimental, so this WIP compromise in implementation is understandable. I'm very grateful for the work on BuildKit.

Yeah, but why would you build in that kind of tech debt in the first place? It seems like more effort to host meta files and keep them available than some other solution that doesn't require internet access.

I take it as a temporary measure that will be undone when it's past being experimental.

If I were to propose to my team that we install a temporary measure in the form of an additional service that needs availability they would smack me down, because that's a pretty stupid temporary measure.

Software is generally published online. Docker image repositories are generally online. The very act of building a Docker image pulling in dependencies is generally done online.

It was a deep hack into the entirety of the Docker build chain. This way it was probably possible to publish experimental work-in-progress build features to the world at a faster pace than the official Docker release cycle.

And as pointed out above, the online-only requirement has been lifted already.

The situation is perfectly understandable, and to be commended that people offered to do this work for the betterment of all.

Thank you! I didn't know that!

If you use multi-stage builds, be aware that COPY --from can also do a chown and save image size. Doing a RUN chown -R, which is sometimes necessary to run stuff as a regular user, can double the image size because, for Docker, changed metadata equals a copy.

Also, if you dare to enable user namespaces for Docker (because, well, security), multi-stage builds fail (https://github.com/moby/moby/issues/34645).

COPY --chown

COPY --chown is a big improvement, but the way that COPY works differently than a Unix cp can be an impediment. (It flattens directory structures.)
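
For anyone who hasn't used it, a small sketch of both points. It assumes a single-package Go program and a `conf/` directory in the build context; the paths and the UID 5002 are arbitrary choices for the example.

```dockerfile
FROM golang:1.12.4 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /out/myprog .

FROM alpine:3.9
RUN adduser -D -u 5002 user1

# Ownership is set while copying, so there is no extra layer holding a
# second copy of the files as a RUN chown -R would create.
COPY --from=build --chown=5002:5002 /out/myprog /app/myprog

# Unlike cp -r, COPY copies the *contents* of a source directory, so the
# destination directory has to be spelled out to keep the structure.
COPY --chown=5002:5002 conf/ /app/conf/

USER 5002
ENTRYPOINT ["/app/myprog"]
```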

Tip #6 (Use official images when possible) is certainly convenient when you're just spinning up something (I use them in local docker-composes all the time), but it's surely opening yet another security hole when it comes to prod. We're not lacking examples where packages are hijacked (feels like it happens constantly on npm, rubygems had it just the other day...), and docker hub has already had one security breach.

Perhaps worth a mention in this blogpost?

You can use the official images and tag them with the SHA image ID - that should give cryptographically enforced security and reproducibility.

Actually, if you use the SHA256 of the image as a reference for the FROM (ex: tomcat@sha256:c34ce3c1fcc0c7431e1392cc3abd0dfe2192ffea1898d5250f199d3ac8d8720f), and if there is no tag associated with that SHA, there is a chance that the Docker Registry will garbage-collect it. Docker Inc's garbage collection frequency is not well documented.

Hello, I am one of the engineers on Docker Hub. If the image was ever pushed via a tag, which must be the case if it was done via the docker CLI, then that image is never deleted (unless the tag is deleted from the Hub UI and no other tag refers to it). This means that if sha256:c34ce3c1fcc0c7431e1392cc3abd0dfe2192ffea1898d5250f199d3ac8d8720f was referred to by the latest tag, which was later pushed to another image, then `FROM tomcat@sha256:c34ce3c1fcc0c7431e1392cc3abd0dfe2192ffea1898d5250f199d3ac8d8720f` will continue to work. Apologies for not having this documented. I'll work on getting this documented on https://docs.docker.com/docker-hub/.

there is no other behaviour that you’d want for that situation though... garbage collection is necessary, and if your cryptographically pinned content is deleted, you want something to fail rather than change to different content

This counts for absurdly little in the Docker ecosystem

Huh? Why wouldn’t it protect you from malicious modifications to the image in the future? Past mods might be a problem, but these are the official images we’re talking about.

Can you explain?

For Java it’s better to use Jib Gradle/Maven plugin from Google. It produces docker image directly and creates layers with dependencies and class files.

I've read several "Dockerfile best practices" would-be tutorials, and this one stands out as correct, concise, well explained, and ordered from simple and important to more nuanced. To the author - great job.

Could someone explain why tip#9 is a good idea? To me it makes more sense to build the application in the CI pipeline and use Dockerfile only to package the app.

The post is focused on Java apps but, for example, there is a distinction on runtime and SDK images in .NET Core. If you want to build in Docker, you have to pull the heavier SDK image. If you copy the built binaries to image, you can use the runtime image. I guess there could be similar situations in other platforms too.

Other than that, it looks like a decent guide. Thanks to the author.

For me the big advantage of doing more in Docker and less in the CI environment is that I have less lock-in/dependency on whatever my CI provider does. I try to reduce my CI scripts to something like

    docker build
    docker run image test

All complexities of building and collecting dependencies go in dockerfiles, so I can reproduce it locally, or anywhere else. And importantly, without messing with any system settings/packages. No more makefiles or shell scripts that make a ton of assumptions about your laptop that need to be set just right to build something from source; just docker build and off you go. Such a hassle when you need to follow pages of readme just to build something from source (plus a lot of installed dependencies that you have to clean up afterwards)

The same problems that apply to production environments also apply to CI systems - you need to make sure those build agents are project-aware, up to date, and if you decide to move to a new JDK on one project you'll need to update your build servers, and good luck to you if you want to update only some of your projects.

The appeal of docker is completely & reproducibly owning production (what runs on your laptop runs on prod), and that also applies to the build (what builds on your laptop builds on prod). Not to mention the add on benefits that you can now use standard build agents across every tech stack and project, no need to customize them or keep them up to date, etc.

With multi-stage builds you get a bunch of benefits. You can pull the heavy SDK when you start building the app, and that gets cached. Then when you package the image, you copy the jar that was built, but not the heavy SDK. When you run this again, the heavy/expensive steps are skipped because they're cached. Now you have a single set of operations to build your app and its production image, so there are no chances for inconsistencies.

In addition, you can build a separate container from a specific part of your multi-stage build (for example, if you want to build more apps based on the SDK step, or run tests which require debugging). So from one Dockerfile you can have multiple images or tags to use for different parts of your pipeline. The resulting production image is still based on the same origin code, so you have more confidence that what's going to production was what was tested in the pipeline.
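
As an illustration of the last two paragraphs, a minimal sketch for a Maven project. The image tags, the stage name `build`, the `myapp` image name, and the `app.jar` artifact name are assumptions, not something taken from the article.

```dockerfile
# Build stage: the heavy image with Maven and a full JDK.
FROM maven:3.6-jdk-8 AS build
WORKDIR /app
COPY . .
RUN mvn -B package -DskipTests

# Production stage: JRE only; the SDK stays behind in the build stage.
FROM openjdk:8-jre-alpine
# The jar name depends on the project's pom.xml; app.jar is a placeholder.
COPY --from=build /app/target/app.jar /app.jar
ENTRYPOINT ["java", "-jar", "/app.jar"]

# Production image:
#   docker build -t myapp .
# Image that stops at the build stage, e.g. to run the tests in it:
#   docker build --target build -t myapp:build .
#   docker run --rm myapp:build mvn -B test
```

Both images come out of the same Dockerfile and the same source tree, which is the consistency argument made above.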

Furthermore, devs can iterate on this Dockerfile locally, rather than trying to replicate the CI pipeline in an ad-hoc way. The more of your pipeline you stuff into a Dockerfile, the less you have to focus on "building your pipeline".

As I read it, the tip is to always build in a consistent environment. I think a CI pipeline counts in that regard.

The way I read it, they're more talking about when you're developing locally everyone should be building the application inside of a container rather than on their personal machines with differing set ups.

I thought about using Docker for a reproducible build environment but, in that context, found it problematic that every time a Dockerfile is built you may end up with new base images and different package versions. That's hardly reproducible.

Perhaps I'm coming at this from a wrong angle.

Build a Java application with RHEL-base-jdk:10 and from commit id:[HASH]; that is as reproducible as it gets. As stated, you can save all the build dependencies in a Docker image, so you can go back and look at it later. Usually it is enough to have the build logs though.

We set system time as well.

If you use the latest tag of your base image, this is definitely true. It can vary by official image, but I’ve noticed a lot of them will tag a version and leave it untouched until a new version is released. But, it is technically possible they could push a change to an older version image.

Dockerfiles are a mostly adequate prototyping tool but are not great for generating production builds. Lack of modularity, cascading versioning / dependency management, reproducible builds, ... every time I've used Dockerfiles in anger I've cobbled together another 60% of a build system out of bash scripts.

I wish Dockerfiles would just fade away into the background, and be replaced by something more similar to an archiver but with better integration with repositories and versioning metadata.

There are some people trying to build alternative toolchains for Docker builds that remove Dockerfiles altogether, e.g. https://buildpacks.io/ and (in a very different approach) Nix-based builds.

My personal approach for Python applications' Docker packaging (https://pythonspeed.com/products/pythoncontainer/) was similar to yours: wrap the build process in a script. I wrote it in Python rather than bash, so it's more maintainable, but a Ruby shop might want to write theirs in Ruby, etc.

Dockerfile = docker run + docker commit + MANY unneeded limitations

Please don't downvote parent. This is true.

It is a fact that everything that happens in a Dockerfile execution via `docker build` can be done with a `docker run` and executing commands and ending with `docker commit`. Even the caching mechanism can be replicated. (Except with more control in my opinion.) It is also a fact that `docker run` has more capabilities than `docker build`, such as being able to mount things in.

Thanks as an occasional docker user this wouldn’t have occurred to me.

Could you elaborate?

One example: With Dockerfile you almost can not create your own slim Docker image in an isolated environment (without an internet connection) if you need to install some packages from a mounted volume where you have downloaded cached dependencies, because docker build does not allow you to temporarily mount the cached dependencies. With the docker run + docker commit combination you can do that without limitations. The latest experimental BuildKit has the option to make a temporary mount, BUT it can not work without an internet connection...

You can do that with a multi-stage build with the cached packages in your context, or you can spin up an http server with the files and ADD the URLs, or set an http proxy environment variable for apt in the container, or add your mirror to the image's apt repositories list. You could do all that with a docker compose file, too.

If you do it with a multi-stage build (where you would copy cached packages in the intermediate stage and remove them after being done) and copy over the whole root (/) from the intermediate to the final stage, then Docker will make a new layer with exactly the same content and your image size will double -> this is a no go.

And yes, you can spin up an http server and serve the cached packages and set all this up before using docker build, but hey I can do it with docker run + docker commit much quicker... (I wrote "you almost can not create" and not "you can not create" - so it is possible with Dockerfile, but is not trivial.)

> and copy over the whole root (/)

Why on earth would anyone do that? That's simply wrong on so many levels, and is far from being standard practice.

> because docker build does not allow you to temporarily mount the cached dependencies.

Why do you believe this point is relevant? If you can mount a drive then you can access its contents on the host, and if you can access files on the host then those contents are also accessible in a Docker build.

You CAN NOT mount a drive with "docker build". You CAN however mount a drive with "docker run". So therefore: Dockerfile = docker run + docker commit + MANY unneeded limitations

> You CAN NOT mount a drive with "docker build".

No one said that. I stated the fact that you can access files from the host during a docker build, thus it's irrelevant if you can mount drives or not. Just copy the files into your build stage and that's it.

docker = bash script + unshare + MANY unneeded limitations

But the nicer packaging and simpler model make that new layer of abstraction useful.

COPY caching doesn't work between computers.

This was surprising to me. I thought I could `docker pull` the layers from the registry and only re-build what had changed on my machine. But no, this doesn't work.

The reason is that the docker client archives the source files, including all the file attributes like uid,gid,mtime, ... Between two computers those are bound to be different.

I recommend upgrading to BuildKit. Also, the above isn't really true for the Docker CLI: mtime was never taken into account in the old builder and the Docker CLI never sends uname/gname. With the API it was possible in the old builder though (hence the incompatibility with compose, for example). In more practical cases I've seen it caused by the .git directory instead, which isn't stable. In that case, you can put .git into .dockerignore or, if you don't use it, BuildKit will discard it automatically.

Order of copying is especially important with Java (Maven/Gradle) builds and NPM, it saves time.

It's also good to speed up builds by configuring them to cache artifacts in a separate layer before the build happens, or you can get them to use the host machine's cached .m2/.npm folders as a volume; however, that might not work with pipelines etc. that build the Docker containers.
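
For the NPM case, the ordering trick could look like this. It is a sketch that assumes the project has a `package-lock.json` and an `index.js` entry point:

```dockerfile
FROM node:10
WORKDIR /app

# Copy only the dependency manifests first. This layer and the install below
# are reused from cache until one of these two files changes.
COPY package.json package-lock.json ./
RUN npm ci

# Source changes invalidate the cache only from this point on.
COPY . .
CMD ["node", "index.js"]
```

The same idea applies to Maven: copy `pom.xml`, resolve dependencies, and only then copy `src`.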

I highly recommend checking out https://github.com/docker/buildx . It’s still in tech preview but it’s an exciting look at the future of docker-build.

What about changing the default USER in the dockerfile?

I think you're not the only one that mentioned this omission, I'll throw in since it seems to be pretty poorly highlighted and has some surprising failure modes:

When you use the USER directive to set up a working user in Dockerfile, be explicit about UID and use the numeric UID.

What works 99% of the time is to use alpha username in the "USER" directive. You can get some surprising artifacts if the container runtime hits this rare bug[1]. There are likely several other great ways this can go wrong, as well. Somewhere deep in the manual, it is suggested that you should only use a numeric UID, even though USER accepts an alphanumeric username most of the time, but if you look at container images built by OpenShift and other pro docker-ers, you will see they always do this with a numeric UID.
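
In Dockerfile terms, something like the following. The UID 5002 and the name `user1` are arbitrary here.

```dockerfile
FROM ubuntu:18.04

# Create the account with a fixed UID so the number below is known up front.
RUN useradd --uid 5002 --create-home user1

# Numeric UID: the runtime does not have to resolve a name via /etc/passwd.
USER 5002
```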

[1]: https://forums.docker.com/t/unable-to-find-user-root-no-matc...

so if you want a vi in your image (the article has a vim line) you can get one (because they are occasionally useful) for a low cost by installing nvi instead of vim. It's great and only a few K.

You should not include binaries not needed for production for security reasons, not because of size alone.

Can you elaborate on this? If someone pwned my container and is able to execute commands, what does it change whether they get to use a text editor? Otherwise, what does it change whether unused programs are around?

I believe the whole point is that adding unnecessary software to a container also increases the attack surface unnecessarily.

My personal best practice is to never use docker.
