For image and layer manipulation, crane is awesome - as is the underlying go-containerregistry library.
It lets you add new layers, or edit any metadata (env vars, labels, entrypoint, etc) in existing images. You can also "flatten" an image with multiple layers into a single layer. Additionally you can "rebase" an image (re-apply your changes onto a new/updated base image). It does all this directly in the registry, so no docker needed (though it's still useful for creating the original image).
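To give a sense of what that looks like with the underlying library, here's a minimal sketch using go-containerregistry to edit an image's config (env var + entrypoint) and push it back, all registry-side. The image refs are placeholders:

    package main

    import (
        "log"

        "github.com/google/go-containerregistry/pkg/crane"
        "github.com/google/go-containerregistry/pkg/v1/mutate"
    )

    func main() {
        // Pull the manifest/config straight from the registry; no docker daemon involved.
        img, err := crane.Pull("registry.example.com/app:latest") // placeholder ref
        if err != nil {
            log.Fatal(err)
        }

        // Edit metadata in the image config.
        cfg, err := img.ConfigFile()
        if err != nil {
            log.Fatal(err)
        }
        cfg.Config.Env = append(cfg.Config.Env, "APP_MODE=prod")
        cfg.Config.Entrypoint = []string{"/app/server"}

        img, err = mutate.ConfigFile(img, cfg)
        if err != nil {
            log.Fatal(err)
        }

        // Push the edited image back; the unchanged layers are never re-uploaded.
        if err := crane.Push(img, "registry.example.com/app:patched"); err != nil {
            log.Fatal(err)
        }
    }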
This is a great recommendation. It is worth noting that unlike Docker, crane is root- and daemonless, which makes it work great with Nix (it's packaged as 'crane' in the Nix repository). This lets Nix manage dependencies for both building (e.g. Go) and packaging and deploying (e.g. GNU tar, crane).
Is there any performance benefit to having fewer layers? My understanding is that there's no gain by merging layers as the size of the image remains constant.
There are some useful cases — for example, if you're taking a rather bloated image as a base and trimming it down with `rm` commands, those will be saved as differential layers, which will not reduce the size of the final image in the slightest. Only merging will actually "register" these deletions.
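As a rough sketch of what that flattening step looks like with go-containerregistry (approximately what `crane flatten` does; the real command also carries the image config over, which is elided here):

    import (
        v1 "github.com/google/go-containerregistry/pkg/v1"
        "github.com/google/go-containerregistry/pkg/v1/empty"
        "github.com/google/go-containerregistry/pkg/v1/mutate"
        "github.com/google/go-containerregistry/pkg/v1/stream"
    )

    // flatten squashes img into a single layer.
    func flatten(img v1.Image) (v1.Image, error) {
        // Extract applies whiteouts, so files rm'd in later layers are truly gone.
        merged := mutate.Extract(img)
        // Repack the merged rootfs as one layer on an empty base.
        return mutate.AppendLayers(empty.Image, stream.NewLayer(merged))
    }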
It's less about performance and more about security. Lots of amateur images use a secret file, or inadvertently store a secret in a layer, without realizing that an rm or other process in a later layer doesn't actually eliminate it. If the final step of your build squashes the filesystem flat again, you can remove a lot of potentially exposed metadata and secrets stored in intermediate layers.
Eventually, once zstd is fully supported and tiny gzip compression windows are no longer a limitation, compressing one full layer would almost certainly get a better ratio than several smaller layers.
If you've got a 50 layer image then each time you open a file, I believe the kernel has to look for that file in all 50 layers before it can fail with ENOENT.
It depends on your OCI engine, but this generally isn't the case with containers. Each layer is successively "unpacked" onto a "snapshot", from which containers are created.
A container runtime could optimize for speed by unpacking all those layers one by one into a single lower directory for the container to use; but at the cost of using lots of disk space, since those layers would no longer be shared between different containers.
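For the overlayfs case specifically, each layer directory becomes a lowerdir in the mount, and lookups walk them in order. A hedged sketch of how a runtime might assemble that mount (paths are hypothetical; uses golang.org/x/sys/unix):

    // lowerdir entries are searched in order, which is why a lookup miss has
    // to walk every layer before returning ENOENT.
    opts := "lowerdir=/layers/l3:/layers/l2:/layers/l1," +
        "upperdir=/ctr/upper,workdir=/ctr/work"
    if err := unix.Mount("overlay", "/ctr/rootfs", "overlay", 0, opts); err != nil {
        log.Fatal(err)
    }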
In practice I've found the performance savings often go the other way: for large (multi-GB) images it's faster to split them into more layers that can be downloaded in parallel from the registry. The download of a single layer won't be parallelized, and on EC2+ECR you won't get particularly good throughput with a single layer.
Depends. If you'd have to fetch a big layer often because of updates, that's not good. But if what changes frequently lives in a smaller layer, it works out more favorably.
I found dive super useful for understanding how docker images work, and how to write efficient dockerfiles. Reading the docs is one thing, but making a change to the dockerfile and then seeing how it has affected the resulting layer structure is what really made me get it.
It really does sound amazing. Would have needed this when you guys (hn) and reddit helped me figure out what a rogue Raspberry Pi was doing in our server closet
I think I can answer for Docker. The first prototype was written in Python; the company was a Python shop. The main reason for the rewrite in Go was to ride Go's growing popularity at the time (2012).
In hindsight, docker is probably much better off with Go, considering the use case. And I say that as someone who loves python and isn't too much into go!
> In hindsight, docker is probably much better off with Go, considering the use case. And I say that as someone who loves python and isn't too much into go!
Same. I use docker to escape the versioning hell that is modern python.
When you're trying to build a tool, the more self-contained the better.
Easiest language to (cross-)compile and distribute, stellar productivity to performance ratio, native (uncolored) concurrency, great networking capabilities in the stdlib.
Imagine if you will Docker and Kubernetes written in any of the other popular languages.
I have a question for big companies building software in Rust: how are they able to find the talent? Unlike most other common languages, it's orders of magnitude more difficult to ramp up entry-level new hires in Rust, for the simple reason that Rust demands a deeper-than-usual understanding of computer science. Is it just C++ devs who want to transition, or have already transitioned, to Rust that those companies can recruit?
Kubernetes specifically is in Go because Google invented Go and also invented Kubernetes. Their internal teams have a lot of Go engineers due to the whole inventing-it thing.
K8s was a generational iteration of Borg, but it was a full re-write with an emphasis on making it more universally usable and pluggable.
> We've incorporated the best ideas from Borg in Kubernetes, and have tried to address some pain points that users identified with Borg over the years.
Borg was written in C++, but only contained container scheduling, resource allocation and some service discovery. Many other features of what is now Kubernetes were built later and essentially "shimmed" onto Borg.
Kubernetes was a rewrite of Borg, rebuilding many of its original features from the ground up using the lessons learned since Borg was first built. By this time, Go had been developed and was being actively used for many of the shims and supporting services surrounding Borg. Since the teams rebuilding Borg were the same ones that had developed and maintained those services, and since many of those shims and supporting services (already in Go) were being incorporated into the new system, they decided to build it (which became Kubernetes) in Go.
I love dive, and it's something that I use in my toolkit multiple times a month.
I am curious if anyone knows how to get the contents of the file you have highlighted. A lot of the time I use dive to validate that a file exists in a layer, and then I want to peek at it. Currently I normally resort to running the container and using cat, or extracting the contents and then wandering into the folders.
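Not dive itself, but one way to peek at a file without running the container is to walk the image tar with the go-containerregistry library mentioned upthread. A rough sketch (image ref and path are placeholders):

    package main

    import (
        "archive/tar"
        "io"
        "log"
        "os"

        "github.com/google/go-containerregistry/pkg/crane"
        "github.com/google/go-containerregistry/pkg/v1/mutate"
    )

    func main() {
        img, err := crane.Pull("alpine:3.19") // placeholder image
        if err != nil {
            log.Fatal(err)
        }
        // Walk the flattened filesystem; to inspect a single layer instead,
        // swap mutate.Extract for that layer's Uncompressed() stream.
        tr := tar.NewReader(mutate.Extract(img))
        for {
            hdr, err := tr.Next()
            if err == io.EOF {
                log.Fatal("file not found in image")
            }
            if err != nil {
                log.Fatal(err)
            }
            if hdr.Name == "etc/os-release" { // placeholder path
                io.Copy(os.Stdout, tr)
                return
            }
        }
    }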
Dive is an amazing tool in the container/Docker space. It makes life so much easier to debug what is actually in your container. When we were first getting started with Depot [0], we often got asked how to reduce image size as well as make builds faster. So we wrote up a quick blog post that shows how to use Dive to help with that problem [1]. It might be a bit dated now, but in case it helps a future person.
Dive also inspired us to make it easier to surface what is actually in your build context, on every build. So we shipped that as a feature in Depot a few weeks back.
This is less related to general container utilities but I’m an avid user of GoogleContainerTools/container-structure-test. It’s a handy way to run integration tests on container apps or images.
These Google open source projects seem to be in need of some TLC as a lot of the original maintainers have moved on, which is a shame. I try to throw a PR their way and close out the odd issue when I can. The testing tool in particular is invaluable to keep my sanity with a large amount of base images I have to maintain internally.
Dive is a gem. It's helped me find a lot of cruft ...
- unneeded build dependencies. Used a scratch image and/or removed build deps in the same step
- node_modules for dev-deps. Used prod-only installs
- Embedded Chromium builds (with puppeteer). Removed Chromium and pointed at an external build
Docker desktop now has this feature built in, but I've been using dive for years to find wasted space & potential security issues.
What's the reason docker uses tar archives instead of ordinary directories for layer contents? This tool is great but it fixes something that should not exist in the first place.
https://github.com/google/go-containerregistry/blob/main/cmd...