A common mistake that's not covered in this article is performing your add and remove operations in separate RUN commands. Doing them separately creates two separate layers, which inflates the image size.

This creates two image layers - the first layer has all the added foo, including any intermediate artifacts. Then the second layer removes the intermediate artifacts, but that's saved as a diff against the previous layer:

    RUN ./install-foo
    RUN ./cleanup-foo
Instead, you need to do them in the same RUN command:

    RUN ./install-foo && ./cleanup-foo
This creates a single layer which has only the foo artifacts you need.

This is why the official Dockerfile best practices show[1] the apt cache being cleaned up in the same RUN command:

    RUN apt-get update && apt-get install -y \
        package-bar \
        package-baz \
        package-foo \
        && rm -rf /var/lib/apt/lists/*
[1] https://docs.docker.com/develop/develop-images/dockerfile_be...



You can use "--squash" to remove all intermediate layers

https://docs.docker.com/engine/reference/commandline/build/#...

The downside of jamming all of your commands into one gigantic RUN invocation is that if it isn't correct or you need to troubleshoot it, you can wind up waiting 10-20 minutes for the build to finish after every single-line change.

You lose all the layer caching benefits and it has to re-do the entire build.

Just a heads up for anyone that's not suffered through this before.


That’s useful, thanks.

I’m confused why they haven’t implemented a COMMIT instruction.

It’s so common to have people chain “command && command && command && command” to group things into a single layer. Surely it would be better to put something like “AUTOCOMMIT off” at the start of the Dockerfile and then “COMMIT” whenever you want to explicitly close the current layer. It seems much simpler than everybody hacking around it with shell side-effects.


There's an issue on GitHub about that, and it's been open for about as long as Docker has existed. Looks like they just don't care.


...or at least some syntax that represents layers like (block) scopes do in a programming language so it is visually easier to see what is going on.


This is huge, thanks for the lead. Others should note it's still experimental and your build command may fail with

> "--squash" is only supported on a Docker daemon with experimental features enabled

Up until now, our biggest improvement was with "FROM scratch".


No problem.

  > Others should note it's still experimental and your build command may fail with
You might try "docker buildx build", to use the BuildKit client -- squash isn't experimental in that one I believe =)

https://docs.docker.com/engine/reference/commandline/buildx_...


Good to know. `FROM scratch` is such a breath of fresh air for compiled apps. No need for Alpine if I just need to run a binary!


Do keep in mind that you might want a set of trusted TLS certificates and the timezone database. Both will be annoying runtime errors when you don't trust https://api.example.com or try to return a time to a user in their preferred time zone. Distroless includes these.
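
For anyone staying on `FROM scratch`, one way to get both is to copy them out of a helper stage. This is only a rough sketch: the Debian tag, the `certs` stage name, and the `app` binary are assumptions, and the paths are the ones Debian uses, so check where your runtime actually looks for the CA bundle and zoneinfo.

```dockerfile
# Sketch only: image tag, stage name, and the ./app binary are illustrative assumptions.
FROM debian:bookworm-slim AS certs
RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates tzdata \
    && rm -rf /var/lib/apt/lists/*

FROM scratch
# Trusted CA bundle so outbound TLS connections can be verified.
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
# Timezone database for converting times to named zones.
COPY --from=certs /usr/share/zoneinfo /usr/share/zoneinfo
# Hypothetical statically linked binary taken from the build context.
COPY app /app
ENTRYPOINT ["/app"]
```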


Yea CA certs are the first pain point I hit. Worth the hurdle. Noted on the timezone. Never really thought about that one. Thanks!


Downside - squash makes for longer pulls from the image repository, which can matter for large images or slow connections (you keep build layers but now have no caching for consumers of the image). There are various tricks that don't use squash - I've had the most luck putting multiple commands into a buildkit stage, then mounting the results of that stage and copying the output in (either by manually getting the list of files, or using rsync to figure it out for me).
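
Roughly what the stage-plus-mount trick might look like, as a sketch rather than anyone's exact setup. The `install-foo` and `cleanup-foo` scripts, the `/opt/foo` output path, and the base image are all hypothetical, and plain `cp -a` stands in for rsync here.

```dockerfile
# syntax=docker/dockerfile:1
# Sketch only: install-foo, cleanup-foo, and /opt/foo are hypothetical.
FROM debian:bookworm-slim AS build
COPY install-foo cleanup-foo ./
# Separate RUN steps keep per-step caching while iterating.
RUN ./install-foo
RUN ./cleanup-foo

FROM debian:bookworm-slim
# Bind-mount the build stage's output and copy it in with a single RUN,
# so the final image gains one layer holding /opt/foo instead of the
# intermediate layers from the build stage.
RUN --mount=type=bind,from=build,source=/opt/foo,target=/mnt/foo \
    cp -a /mnt/foo /opt/foo
```

The build stage's layers stay in the local build cache but are not part of the final image.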


But then you end up with just one layer, so you lose out on any caching and sharing you might have gotten. Whether this matters is of course very context dependent, but there are times when it'll cost you space.


Had no idea about squash. Using cached layers can really save time, especially when you already have OS deps/project deps installed. Thanks!


Doesn't that squash all your layers though? That defeats the whole purpose of there being layers. Now instead of a huge total size, but pushing only a fraction of it, your total is lower but you're pushing all of it every time. Same goes for disk space if you're running multiple instances or many images with shared lineage.


Or build using the extra layers, and remove them by squishing the commands together once you've got the Dockerfile right.


You don't have to do this anymore. The BuildKit frontend for Docker has a new feature that supports multiline heredoc strings for commands: https://www.docker.com/blog/introduction-to-heredocs-in-dock... It's a game changer but unfortunately barely mentioned anywhere.
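
A minimal sketch of the heredoc form, assuming a BuildKit-enabled build and a Debian base image (both assumptions), with the placeholder package names from the apt example in this thread:

```dockerfile
# syntax=docker/dockerfile:1
# Sketch only: base image is an assumption; package names are placeholders.
FROM debian:bookworm-slim
RUN <<EOF
# Without && chaining, set -e is what stops the step on the first failure.
set -e
apt-get update
apt-get install -y package-bar package-baz package-foo
rm -rf /var/lib/apt/lists/*
EOF
```

The whole heredoc still runs as one RUN step, so it produces a single layer.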


Yeah, been using this myself for a whole bunch of sanity. Until buildkit is the default though, I wouldn't expect it to gain too much traction.


Wow, that deserves its own Tell HN ;-)

Thanks for the tip!


Multistage builds are a better solution for this. Write as many steps as required in the build image and copy only what’s needed into the runtime image in a single COPY command.
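
Something like the following, as a sketch only. The Go toolchain image, the module layout, the `/out/app` path, and the distroless runtime image are assumptions picked for illustration; any build/runtime pair works the same way.

```dockerfile
# Sketch only: toolchain image, project layout, and binary name are assumptions.
FROM golang:1.22 AS build
WORKDIR /src
# Dependencies first, so this layer stays cached until go.mod/go.sum change.
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /out/app .

FROM gcr.io/distroless/static-debian12
# Only the finished binary crosses into the runtime image.
COPY --from=build /out/app /app
ENTRYPOINT ["/app"]
```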


Is it an option to put all the setup and configuration into a script? So the Dockerfile becomes effectively just:

  RUN ./setup.sh
I have seen that in some cases as a way to reduce layer count while avoiding complex, hard-to-read RUN commands. I've also seen it as a way to share common setup across multiple Docker images:

  RUN ./common_setup_for_all_images.sh
  RUN ./custom_setup_for_this_image.sh
However this approach of doing most of the work in scripts does not seem common, so I'm wondering if there is a downside to doing that.


The downside of this is the same as the upside: it stuffs all that logic into one layer. If the result of your setup script changes at all, then the cache for that entire layer and all later layers is busted. This may or may not be what you want.

As a concrete example... if your setup.sh were:

  #!/bin/bash
  ./update_static_assets.sh
  ./install_libraries.sh
  ./clone_application_repo.sh
then any time a static asset is updated, a library is changed, or your application code changes, the digest of the Docker layer for `RUN ./setup.sh` will change. Your team will then have to re-download the result of all three of those sub-scripts next time they `docker pull`.

However, if you found that static assets changed less often than libraries, which changed less often than your application code, then splitting setup.sh into three correspondingly-ordered `RUN` statements would put the result of each sub-script in its own layer. Then, if just your application code changed, you and your team wouldn't need to re-download the library and static asset layers.
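
A sketch of that split, reusing the three script names from the example. The base image, the `/setup` directory, and copying each script in just before it runs are assumptions.

```dockerfile
# Sketch only: base image and script locations are assumptions.
FROM debian:bookworm-slim
WORKDIR /setup
# Least frequently changing step first, so its layer is reused the longest.
COPY update_static_assets.sh ./
RUN ./update_static_assets.sh
COPY install_libraries.sh ./
RUN ./install_libraries.sh
# Most frequently changing step last, so only this layer is rebuilt and re-pulled.
COPY clone_application_repo.sh ./
RUN ./clone_application_repo.sh
```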


I do this for all of the CI images I maintain. Additionally, it leaves evidence of the setup in the container itself. Usually I have a couple of these scripts (installing distro-provided deps, building each group of other deps, etc.).


During development, or with any image you need to update often, you usually don't want to lose all of Docker's caching by putting everything into one giant RUN directive. This is a case where premature optimization strikes hard. Don't merge RUN directives from the start. First build your image in a non-optimized way, which saves a lot of build time by making use of the Docker build cache.

Personally, I would not merge steps that have nothing to do with each other unless I am sure they are basically set in stone forever.

With public and widely popular base images, which do not change once they are released, the choices might be weighed differently, since everyone who uses your image will want fast downloads and small resulting images when building on top of it.

Simply put: don't make your development more annoying than necessary by introducing long wait times for building Docker images.



