> Everything after that removes the manifest files and any temporary files downloaded during this command. It's necessary to remove all these files in this command to keep the size of the Docker image to a minimum. Smaller Dockerfiles mean faster deployments.
It isn't explicitly explained, but the reason it must be in this command and not separated out is that each command in a Dockerfile creates a new "layer". Removing the files in a later command will work, but it does nothing to decrease the overall image size: as far as whatever filesystem driver you're using is concerned, deleting files from earlier layers just masks them, whereas deleting them before the layer is created prevents them from ever actually being stored.
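For example (an illustrative sketch, reusing the article's packages), these two produce the same final filesystem, but only the first keeps the image small:

    # cleanup in the same RUN: the lists never make it into any layer
    RUN apt-get update && \
        apt-get install -y --no-install-recommends build-essential libvips && \
        rm -rf /var/lib/apt/lists/*

    # cleanup in a separate RUN: the lists are stored in the first layer
    # and merely masked by the second, so the image stays just as big
    RUN apt-get update && \
        apt-get install -y --no-install-recommends build-essential libvips
    RUN rm -rf /var/lib/apt/lists/*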
I mean, you can do both. Or, technically with that link you mentioned, all three.
You can use heredocs to combine commands that make sense in a layer, ensure your layers are ordered such that the more frequently changing ones come later in your Dockerfile when possible (this will also speed up your builds, by keeping as many cached layers as possible valid), and use multi-stage builds on top of that to really pare things down to the bare necessities.
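For the ordering part, a sketch (assuming a Rails-style app, as in the article):

    COPY Gemfile Gemfile.lock ./
    RUN bundle install     # cached until the Gemfiles change
    COPY . .               # app code changes constantly; keep it after the slow steps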
It may be due to my ninja-level ability to dodge learning more advanced shell mastery for decades, but to me it looks haphazard and error-prone. Are the line breaks semantic, or is it all a multiline string? Is EOF a special end-of-file token, or a variable, and if so what's its type? Where is it documented? Is the first EOF sent to stdin, and if so why is that needed? What is the second EOF doing? I can usually pick up a new imperative language quickly, but I still feel like an idiot when looking at shell.
The syntax for multi-line strings is worth learning, since it's used in shell, Ruby, PHP, and others. See https://en.m.wikipedia.org/wiki/Here_document . You get to pick the "EOF" delimiter.
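A minimal shell sketch, with an arbitrary delimiter:

    cat <<MY_MARKER > /tmp/example.txt
    line one
    line two
    MY_MARKER

Everything between `<<MY_MARKER` and the line consisting of just MY_MARKER is fed verbatim to the command's stdin (variables are still expanded, unless you quote the delimiter as <<'MY_MARKER').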
I know those questions are rhetorical, but to answer them anyway:
> > Nice syntax
> Is it though?
Before the heredoc syntax was added, the usual approach was to use a backslash at the end of each line, creating a line continuation. This has several issues: The backslash swallows the newline, so one must also insert a semicolon* to mark the end of each command. Forgetting the semicolon leads to weird errors. Also, while Docker supports line continuations interspersed with comments, sh doesn't, so if such a command contains comments it can't be copied into sh.
The new heredoc syntax doesn't have any of these issues. I think it is infinitely better :)
(There is also JSON-style syntax, but it requires all backslashes to be doubled, and is less popular.)
*In practice "&&" is normally used rather than ";" in order to stop the build if any command fails (otherwise sh only propagates the exit status of the last command). This actually leads to a small footgun with the heredoc syntax: it allows the programmer to use just a newline, which is equivalent to a semicolon and means the exit status will be ignored for all but the last command. The programmer must remember to insert "&&" after each command, or use `set -e` at the start of the RUN command, or use `SHELL ["/bin/sh", "-e", "-c"]` at the top of the Dockerfile. But this footgun is due to sh's error handling quirks, not the heredoc syntax itself.
> Are the line breaks semantic, or is it all a multiline string?
The line breaks are preserved ("what you see is what you get").
> Is EOF a special end-of-file token
You can choose which token to use (EOF is a common convention, but any token can be used). The text right after the "<<" indicates which token you've chosen, and the heredoc is terminated by the first line that contains just that token.
This allows you to easily create a heredoc containing other heredocs. Can you think of any other quoting syntax that allows that? (Lisp's quote form comes to mind.)
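A contrived shell sketch of the nesting:

    # the outer heredoc writes a script that itself contains a heredoc
    cat <<OUTER > generate.sh
    cat <<INNER > inner.txt
    nested content
    INNER
    OUTER

No escaping needed anywhere; the two delimiters just have to differ.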
Thanks, this is the helpful reply I didn't deserve!
> This actually leads to a small footgun with the heredoc syntax: it allows the programmer to use just a newline, which is equivalent to a semicolon and means the exit status will be ignored for all but the last command.
This sounds like a medium-large caliber footgun to me, and while I don’t expect Docker to fix sh, it could perhaps either set sane defaults or decouple commands from creating layers? Or why not simply support decent lists of commands if this is such a common use case?
> This allows you to easily create a heredoc containing other heredocs.
Hmm, what’s the use-case for that? The only effect for the programmer would be to change the escape sequence, no?
> This sounds like a medium-large caliber footgun to me, and while I don’t expect Docker to fix sh, it could perhaps either set sane defaults or decouple commands from creating layers? Or why not simply support decent lists of commands if this is such a common use case?
Ha ha, I guess footgun sizes are all relative. The quirky error handling of sh is "well-known" (usually one of the first pieces of advice given to improve safety is to insert `set -e` at the top of every shell script, which mostly fixes this issue). So I don't think of Dockerfile heredocs themselves as a large footgun, but rather as a small footgun arising from the interaction between heredocs and the large but well-known error-handling footgun.
I don't know why Docker doesn't use `set -e` by default. I suppose one reason is for consistency -- if you have shell commands spread across both a Dockerfile and standalone scripts, it could be very confusing if they behaved differently because the Dockerfile uses different defaults.
I also don't know why the commands are coupled to the layers. Maybe because in the simple cases, that is the best mapping; and in the very complex cases, the commands would be moved to a standalone script; so there are fewer cases where a complex command needs to be inlined into the Dockerfile in a way that produces a single layer.
It would be really nice if the Dockerfile gave more control over layers. For example, currently if you use `COPY` to import files into the image and then use `RUN` to modify them (e.g. to change the ownership / permissions / timestamps), it needlessly increases the image size; the only way to avoid this is to perform those changes during the COPY, for example using `COPY --chown`; but COPY has very limited options (namely chown, and also chmod, but that is relatively recent).
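For example (a sketch; the paths and uid/gid are made up):

    # stores /app twice: once from the COPY, once as the chown'd copy in the RUN layer
    COPY app/ /app/
    RUN chown -R 1000:1000 /app

    # stores it once, with the right owner from the start
    COPY --chown=1000:1000 app/ /app/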
Regarding native support for lists of commands, I don't really see much value since sh already supports lists (you "just" need to correctly choose between "&&" and ";"/newline).
> > This allows you to easily create a heredoc containing other heredocs.
> Hmm, what’s the use-case for that? The only effect for the programmer would be to change the escape sequence, no?
It can be useful to embed entire files within a script (e.g. when writing a script that pre-populates a directory with some small files). With most quoting schemes, you'd have to escape special characters that appear in those files. But with heredocs, you just have to pick a unique token and then you can include the files verbatim.
(Picking a token that doesn't appear as a line within the files can be a little tricky, but in many cases it's not a problem; for example if the files to be included are trustworthy, it should be enough to use a token that includes the script's name. On the other hand if the data is untrusted, you'd have to generate an unguessable nonce using a CSPRNG. But at that point it's easier to base64-encode the data first, in which case the token can be any string which never appears in the output of base64, for example ".".)
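A sketch of the file-embedding case (paths and token made up), using a quoted delimiter so nothing in the payload gets expanded:

    #!/bin/sh
    mkdir -p /etc/myapp
    cat <<'EOF_populate_sh' > /etc/myapp/myapp.conf
    # $HOME and `backticks` here pass through completely untouched
    greeting=hello
    EOF_populate_sh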
I like here-docs, but frequently I think just making small shell scripts to be invoked by RUN is better, e.g. putting apt invocations in something like buildscripts/install_deps and simply RUNning that from the Dockerfile.
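E.g., a sketch (paths made up):

    COPY buildscripts/ /tmp/buildscripts/
    RUN /tmp/buildscripts/install_deps

with install_deps being a plain `set -e` shell script you can lint and test outside of Docker.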
Unfortunately this syntax is not generally supported yet - it's only supported with the BuildKit backend, and it only landed in the 1.3 "labs" release. It was moved to stable in early 2022 (see https://github.com/moby/buildkit/issues/2574), so that situation is improving, but I think it may still require a syntax directive to enable.
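For reference, the directive is the first line of the Dockerfile; if I recall correctly it was something like

    # syntax=docker/dockerfile:1.3-labs

while experimental, and docker/dockerfile:1.4 (or later) once it stabilized.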
This has been a serious pain in my side for a while, both for my own debugging and for telling people I try to help "you're gonna have to start over and do X or this will take hours longer".
The older “Docker without Docker” blogpost linked from there goes into that, it’s one of the best deep dives into containers (and honestly one of the best pieces of technical writing full stop) I’ve come across. https://fly.io/blog/docker-without-docker/
An alternative to removing files or going through contortions to stuff things into a single layer is to use a builder image and copy the generated artefacts into a clean image:
FROM foo AS builder
# ... build steps ...
FROM foo
COPY --from=builder generated-file target
(I hope I got that right; on a phone and been a while since I did this from scratch, but you get the overall point)
Unfortunately this messes with caching and causes the builder step to always rebuild if you’re using the default inline cache, until registries start supporting cache manifests.
You did this on the same machine, right? In a CI setting with no shared cache you need to rely on an OCI cache. The last build image is cached with the inline cache, but prior images are not.
How would you do this in a generic, reusable way company-wide for any Dockerfile? Given that you don't know the targets beforehand, the names, or even the number of stages.
It is of course possible to do for a single project with a bit of effort: build each stage with a remote OCI cache source, and push the cache there afterwards. But... that sucks.
What you want is the `max` cache type in buildkit[1]. Except... not much supports that yet. The native S3 cache would also be good once it stabilizes.
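A sketch of what mode=max looks like with buildx (the registry ref is a placeholder):

    docker buildx build \
      --cache-to type=registry,ref=registry.example.com/app:buildcache,mode=max \
      --cache-from type=registry,ref=registry.example.com/app:buildcache \
      -t registry.example.com/app:latest --push .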
Ah, sorry I misunderstood you. Yes, I don't tend to care about whether or not the steps are cached in my CI setups as most of the Docker containers I work on build fast enough that it doesn't really matter to me, but that will of course matter for some.
I never got around to implementing it but I wonder how this plays with cross-runner caches in e.g. Gitlab, where the cache goes to S3; there's a cost to pulling the cache, so it'll never be as fast as same-machine, but should be way faster for most builds, right?
The cache is small, but if you have a `docker buildx build --cache-from --push` type command, it will always pull the image at the end and try to push it again (although it'll get "layer already exists" responses). For ~250 MB images on GitLab I find this do-nothing job takes about 2.5 minutes in total (vs a 10 minute build if the entire cache were to be invalidated by a new base image version). I'd very much like it if I could say "if the entire build was cached, don't bother pulling it at the end"; maybe buildkit is the tool for that job.
This isn’t really useful for frameworks like Rails, since there’s nothing to “compile” there. Most Rails docker images will just include the runtime and a few C dependencies, which you need to run the app.
Pulling down gems is a bit of a compilation, which could benefit, unless you're already installing gems into a volume you include in the Docker container via docker compose etc. Additionally, what it does compile can be fairly slow (like nokogiri).
There are however temporary files being downloaded for the apt installation, and while in this case it's simple enough to remove them in one step that's by no means always the case. Depending on which gems you decide to rely on you may e.g. also end up with a full toolchain to build extensions and the like, so knowing the mechanism is worthwhile.
How would you go about copying something you installed from apt in a build container?
Say `apt install build-essential libvips` from the OP: it's not obvious to me what files libvips is adding. I suppose there's probably an incantation for that? What about something that installs a binary? Seems like a pain to chase down everything that's arbitrarily touched by an apt install; am I missing some tooling that would tame that pain?
It's a pain, hence for apt, as long as it's the same packages, just cleaning is probably fine. But e.g. build-essential is there to handle building extensions pulled in by gems, and it isn't necessary in the actual container if you bring over the files built and/or installed by rubygems, so the set of packages can be quite different.
Run "dpkg -L libvips" to find the files belonging to that package. This doesn't cover what's changed in post install hooks, but for most docker-relevant things, it's good enough.
The new best practice is to use the RUN --mount cache options, which makes removing intermediate files unnecessary and speeds up builds too. Surprised to see so few mentions of it.
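For apt, a sketch (requires BuildKit):

    # syntax=docker/dockerfile:1
    # note: Debian/Ubuntu base images ship an apt config that auto-cleans the
    # package cache, which you may need to disable for the cache mount to pay off
    RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
        --mount=type=cache,target=/var/lib/apt,sharing=locked \
        apt-get update && apt-get install -y --no-install-recommends build-essential

The downloads land in the cache mounts rather than in the layer, so there's nothing to rm, and repeat builds skip the downloads entirely.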
As someone whose week was ruined by an overwhelming proliferation of `--mount type=cache` options throughout a complex build system, I'm not so sure.
Externally managed caches don't have a lifecycle controlled or invalidated by changes in Dockerfiles, or by docker cache-purging commands like "system prune".
That means you have to keep track of those caches yourself, which can be a pain in complex, multi-contributor environments with many layers and many builds using the same caches (intentionally or by mistake).
Something roughly equivalent is the medium-term solve, yes. Complicated by the fact that this pattern was used by rather a lot of different folks' code.
It's hard enough to stay synced with all the latest Docker changes, let alone at an organizational level.
Example: in the last few years docker compose files have gone from version 2 to version 3 (missing tons of great v2 features in the name of simplification) to the newest, unnamed and unnumbered version, which is a merging of versions 2 and 3.
Yeah, this is a very widely misunderstood or unknown thing about Dockerfiles. After the nth time explaining it to somebody, I finally threw it into a short video with a demo to explain how it works: https://youtu.be/RP-z4dqRTZA
Self hoisting here, I put this together to make it easier to generate single (extra) layer docker images without needing a docker daemon, capabilities, chroot, etc: https://github.com/andrewbaxter/dinker
> create a Docker image, also known as an OCI image
I don't think this is quite right. From my investigation, Docker and OCI images are basically content-addressed trees, starting with a root manifest that points to other files and their hashes (root -> images -> layers -> layer configs + files). The OCI manifests and configs are separate from Docker manifests and configs, and basically Docker supports both side by side.
I wanted to explain layers really badly, and Sam Ruby even put comments in there about it, but I stayed away from it for the sake of more clearly explaining how Linux was configured for Rails apps.
It is really weird looking at Dockerfiles for the first time and seeing all of the `&& \` bash commands chained together.
'apt-get clean' doesn't clear out /var/lib/apt/lists. It removes cached downloaded debs from /var/cache/apt, but you'll still have hundreds of MiB of package lists on your system after running it.
Yes, but apt-get clean is still redundant [ed: because the upstream Debian/Ubuntu images automatically run apt-get clean via how dpkg/apt is configured - and your image should inherit this behavior]. Personally I'm not a fan of deleting random files like man pages and documentation.
As I mentioned above, I recommend avoiding Ubuntu because it violates the principle of dev-prod parity. At any significant scale, Ubuntu will be left in the dust. Don't rely on system packages; build your own stream of minimal vendored (/opt/company) dependencies and keep them current, because defaults are always old and don't consistently apply necessary patches for bugfixes and functionality.
Every command in a Dockerfile creates a layer, and most Dockerfile builders cache each layer if that line hasn't changed, so dynamic callouts that run updates or check things on the web won't rerun.
Definitely, although it's worth noting that while the image size will be smaller, it will get rid of the benefits of sharing base layers. Having fewer redundant layers lets you save most of the space without losing any of the benefits of sharing. I think that is the main reason why this is not usually done.
Typically I wind up using a different source image for the builder that ideally has (most of) the toolchain bits needed, but the same runtime base as the final image. (For Go, golang:alpine and alpine work well. I'm aware alpine/musl is not technically supported by Go, but I have yet to hit issues in prod with it, so I guess I'll keep taking that gamble.)
I take advantage of multi-stage builds, however I still think that the layer system could have some nice improvements done to it.
For example, say I have my own Ubuntu image that is based on one of the official ones, but adds a bit of common configuration or tools and so on, on which I then build my own Java image using the package manager (not unlike what Bitnami do with their minideb, on which they then base their PostgreSQL and most other container images).
So I might have something like the following in the Ubuntu image Dockerfile:
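(Illustratively; the exact packages don't matter, and the caches are deliberately left in place for reuse:)

    FROM ubuntu:22.04
    # shared config and tooling; apt caches intentionally NOT cleaned here
    RUN apt-get update && apt-get install -y curl ca-certificates

and then, when building a "final" image, some hypothetical flag along the lines of:

    docker build --purge-masked-files -t my-java-app .   # hypothetical, not a real docker flag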
That would then work backwards from the last layer and create copies of all of the layers where files have been removed/masked in the later layers, to use instead of the originals. Thus if I had 10 different images that need to use apt to install stuff while building them, I could leave the cache in my own Ubuntu image and then just remove it for whatever I want to consider the "final" images that I'll ship, which would then alter the contents of the included layers to purge deleted files.
There's little reason why these optimized layers couldn't be shared across all 10 of those "final" images either: "Hey, there's these optimized Ubuntu image layers without the package caches, so we'll use it for our .NET, Java, Node and other images" as opposed to --squash which would put everything in a single large layer, thus removing the benefits from the shared layers of the base Ubuntu image and so on.
Who knows, maybe someone will write a tool like that some day.