
Docker-slim: Minify your Docker container image without changing anything - LinuxBender
https://github.com/docker-slim/docker-slim
======
yegle
Most people would benefit from the distroless base image:
[https://github.com/GoogleContainerTools/distroless](https://github.com/GoogleContainerTools/distroless)

It's a base image with binaries from Debian packages, plus necessary stuff
like ca-certificates, and absolutely nothing else, while still being glibc-
based (unlike Alpine base images).

Example images I built with the base image:

\- C binary, <10MB: [https://hub.docker.com/r/yegle/stubby-dns](https://hub.docker.com/r/yegle/stubby-dns)

\- Python binary, <50MB: [https://hub.docker.com/r/yegle/fava](https://hub.docker.com/r/yegle/fava)

\- Go binary, 5MB: [https://hub.docker.com/r/yegle/dns-over-https](https://hub.docker.com/r/yegle/dns-over-https)

Another trick I use is
[https://github.com/wagoodman/dive](https://github.com/wagoodman/dive) to find
the deltas between layers and manually remove them in my Dockerfile.
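
For reference, the usual distroless pattern is a multi-stage build; a minimal
sketch for a Go binary (the app paths and names are illustrative, the image
names are real):

```dockerfile
# build stage: full toolchain
FROM golang:1.13 AS build
WORKDIR /src
COPY . .
# static build so the binary carries no libc dependency into the final image
RUN CGO_ENABLED=0 go build -o /app .

# final stage: distroless static base (ca-certificates, tzdata, nothing else)
FROM gcr.io/distroless/static
COPY --from=build /app /app
ENTRYPOINT ["/app"]
```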

~~~
baroffoos
What do you do when you need to debug an issue and the container contains no
utils?

I expect someone will leave a comment saying "But you shouldn't be entering
containers, you should be using Ansible/Kubernetes". Yes, that is how I manage
changes, but sometimes you just have to log in and see what is going on with
htop/etc.

~~~
londons_explore
My dream solution to this issue is a one-liner command like:

    docker exec -it --augment=ubuntu my_container bash

It would start bash in the container, but also layer into the filesystem all
the rest of a standard ubuntu image _only for my tools, without affecting the
application in the running container_.

I'm pretty sure that's possible with current linux kernel mount
namespace/overlayfs infrastructure used by docker - all that's needed is the
command line tool to support it.
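
Something close can already be hand-rolled today; a rough, untested sketch
(needs root, `$PID` and the rootfs path are placeholders, and a real tool
would also have to join the target's mount namespace):

```shell
# lower layers: the debug image's rootfs on top of the running container's
# rootfs; upper/work are scratch dirs so writes touch neither lower layer
mkdir -p /tmp/debug/{upper,work,merged}
mount -t overlay overlay \
  -o lowerdir=/var/lib/ubuntu-rootfs:/proc/$PID/root,upperdir=/tmp/debug/upper,workdir=/tmp/debug/work \
  /tmp/debug/merged
chroot /tmp/debug/merged bash
```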

~~~
dlor
The new ephemeral container support in kubernetes lets you do essentially
that. You bring the filesystem from another container image into the
PID/network namespace of a running container in a pod.
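
In kubectl terms the one-liner looks something like this (pod and container
names are hypothetical; requires the EphemeralContainers feature to be
enabled on the cluster):

```shell
# attach a throwaway busybox container to the pod, sharing the
# process namespace of the target container
kubectl debug -it my-pod --image=busybox --target=my-container
```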

~~~
knodi123
lol, it's fun to imagine going back in time to explain what you just said to
my 2004 sysadmin self, back when I used to build servers, and colo them, and
physically maintain them.

------
itamarst
Alternatives if you don't want to risk missing some file that only gets loaded
10 minutes in:

1\. Start with a small base image, e.g. for Python there's "python:3.7-slim".
For Python I'm not a fan of Alpine, but for Go that gives you an extra small
base image (see [https://pythonspeed.com/articles/base-image-python-docker-images/](https://pythonspeed.com/articles/base-image-python-docker-images/)).

2\. Don't install unnecessary system packages
([https://pythonspeed.com/articles/system-packages-docker/](https://pythonspeed.com/articles/system-packages-docker/)).

3\. Multi-stage builds (in Python context,
[https://pythonspeed.com/articles/smaller-python-docker-images/](https://pythonspeed.com/articles/smaller-python-docker-images/)).

You can find similar guides for non-Python as well. Basic idea being "don't
install unnecessary stuff, and in final image only include the final build
artifacts".
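
To illustrate point 3, a multi-stage sketch for Python (requirements and app
paths are placeholders): dependencies are built in a full image, and only the
installed result is copied into the slim final image:

```dockerfile
# build stage: has compilers for packages with C extensions
FROM python:3.7 AS build
COPY requirements.txt .
RUN pip install --user -r requirements.txt

# final stage: slim base, no build toolchain
FROM python:3.7-slim
COPY --from=build /root/.local /root/.local
COPY . /app
ENV PATH=/root/.local/bin:$PATH
CMD ["python", "/app/main.py"]
```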

------
dcolkitt
I think the importance of small Docker images is generally oversold. I
regularly deploy multi-GB images on Google Cloud, and startup even on a fresh
node only takes 60 seconds or so. If the node's already hot (i.e. has cached
the image), starting a container takes no more than a few seconds.

I think what's more important is layering the Dockerfile in the right order.
You should put your large, infrequently changing assets in the lowest layers,
and smaller, more frequently changing assets in the top layers. If you have a
4GB image but only change the top 10MB layer, then updating the container
only requires pushing and pulling 10MB of new data. But if you change a lower
layer, then everything above it has to be rebuilt and re-distributed.
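
In Dockerfile terms that ordering looks something like this (package and path
names are purely illustrative):

```dockerfile
FROM ubuntu:18.04

# large, rarely-changing stuff first: cached across most builds
RUN apt-get update && apt-get install -y build-essential
COPY big-static-assets/ /assets/

# small, frequently-changing stuff last: only these layers get
# rebuilt and re-pulled on a typical code change
COPY src/ /app/
CMD ["/app/run"]
```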

~~~
cosmotic
Adding a minute to a deployment process can be fairly significant. The docker
push and pull is probably 70% of my deployment process, and my images are
already ~100MB or less.

I don't think the concern is 'how long does deployment take' but 'how fast can
we iterate?'. Building and loading the images on a local dev machine to test a
2-second change would take much longer with larger images. Getting feedback on
a PR merge from a CI build agent would take minutes longer.

I don't think the importance has been oversold.

~~~
dcolkitt
> I don't think the concern is 'how long does deployment take' but 'how fast
> can we iterate?'

And for iteration, the only thing that matters is the size of the top layers
that are being iterated on. Not the overall image size itself.

You can put `RUN apt-get [kitchen sink]` at the beginning of the Dockerfile
and it pretty much won't matter. When you change anything in the project
repository, that bigass giant base layer doesn't get re-pulled because it
doesn't change.

To validate a layer, the Docker daemon just compares hashes. So, for
unmodified layers, docker pull takes constant time regardless of image size.
The Docker daemon only downloads from the registry starting at the bottom-most
modified layer relative to its cache.
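
As a toy illustration of that content-addressed cache (real digests are
computed over the layer tarballs, not raw strings like these):

```python
import hashlib

def layer_digest(content: bytes) -> str:
    # Docker identifies each layer by a sha256 content digest
    return "sha256:" + hashlib.sha256(content).hexdigest()

# pretend the node has already pulled the big base layer
cache = {layer_digest(b"RUN apt-get install [kitchen sink]")}

def needs_pull(content: bytes) -> bool:
    # unchanged content -> same digest -> cache hit, nothing to download
    return layer_digest(content) not in cache

print(needs_pull(b"RUN apt-get install [kitchen sink]"))  # False
print(needs_pull(b"COPY new-app-code /app"))              # True
```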

If anything, throwing the kitchen sink into the base layer is better for fast
iteration. When you're being parsimonious about third-party libraries and
packages, you'll frequently have to rebuild the base image. If you `RUN
apt-get [everything]`, then you'll hardly ever need to rebuild/re-push/re-pull
that layer, because you'll always have whatever you need already available.

~~~
echevil
It depends on the size of the project and what you want to optimize. The time
spent on push and pull of docker images was a real deal breaker once upon a
time, when I tried to improve the deployment speed of a project by switching
from heroku build to a docker-based solution.

------
gtirloni
It seems this is basically analyzing the running application inside the
container and only packaging what's needed to make it work, at a more granular
level than OS-level packages.

Interesting concept. I wonder how it's expected to cover 100% of the app's
usage if certain things aren't triggered during the analysis phase.

~~~
Twirrim
I guess the burden is on the app dev to ensure 100% functional coverage? That
seems a little "yikes".

~~~
meesles
Is that any different than anything else? Compilers, asset pipelines, and
build tools all work the same way: they make assumptions about how a system
works and try to optimize on those assumptions. Test your app, run QA, etc.
Most licenses make no promises that the software will work, so this tool
doesn't seem any different.

~~~
Twirrim
Wow, that's an oddly defensive response.

My point is merely that this is quite a significant risk. If you fail to
exercise 100% of your code paths via functional testing (so you've got to have
comprehensive positive and negative functional testing, which is pretty rare
in my experience), you risk producing an image with docker-slim that will
break. You've got to think about exercising every single possible interaction
with every other component running on an OS. That's no small feat.

Think about it. That's not just 100% of _your_ code paths, that's 100% of the
code paths that you could possibly ever trigger in any library that you
consume, and you have to think about what might influence those circumstances.
There's all sorts of angles to consider, e.g. does latency of DNS response
matter? Does time of day matter? Does IPv4 vs IPv6 matter? (The answer is
likely yes in this case, so you might need to think about running the
functional tests from both address stacks.)

docker-slim is a neat idea, but it seems to come with _significant_ risk.

------
alpb
It seems like this tool is built upon the assumption that the containerized
program will load libraries and read files while this tool is tracing it.

It seems like the biggest FAQ item is missing from the readme: what happens if
my container reads a file only every so often and this tool doesn't capture
it?

Also, do I need to keep the container running for a while for this tool to
minimize the files in the rootfs? That seems impractical, especially in
headless environments like CI/CD.
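
A contrived Python sketch of the failure mode being asked about: a dependency
that only loads on a rarely-taken branch, which a short trace would never see:

```python
def handle(request: dict) -> bytes:
    if request.get("format") == "xml":
        # loaded only when an XML request actually arrives; a tracing
        # run that never sends one would see no use of this module's
        # files, so a minifier could strip them from the image
        import xml.etree.ElementTree as ET
        return ET.tostring(ET.Element(request["root"]))
    return repr(request).encode()

# a trace that only exercises this path misses the xml dependency entirely
print(handle({"format": "json"}))
```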

~~~
kylequest
The temporary containers usually don't run for too long. The probes (http, for
now) are there to ensure that the app/service gets to do something useful,
exercising different code paths. Still, it is possible that something could be
missing or you might want to keep something extra. For those cases you can
tell docker-slim what else you want to keep in your container image (it has a
few flags for that).

------
hrdwdmrbl
Their FAQ doesn't answer what it is that they are removing. Can someone shed
light on that? As others have said, it seems to watch your application running
and then remove anything it doesn't see your application using. Seems like a
very high-risk + high-reward method.

~~~
dkarras
Yes, this needs a "what's the catch?" section in the readme.

~~~
kylequest
The main catch is that it targets application container images and not generic
base images. This is the most common gotcha many people encounter. There has
to be an app/service in the container that does something specific.

And, of course, it is possible that not all artifacts will be identified.
There are a couple of ways to mitigate this. First, you can create custom
probes for your app/service so that the app container can be analyzed much
better. Second, you can explicitly tell docker-slim what you want to keep in
your container image (you can specify files or executables).
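
Concretely, those escape hatches look something like this (the image name is
hypothetical; the flag names are docker-slim build flags as I understand
them):

```shell
docker-slim build \
  --http-probe \
  --include-path /etc/ssl/certs \
  --include-bin /usr/bin/curl \
  my-app:latest
```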

------
slimsag
Very interesting, but I worry this will just break a lot of applications that
are run through it in subtle ways. For example, removing system packages can
have negative effects not noticed except in subtle edge cases like DNS
resolution.

------
tzickel
I've done a similar yet simpler hobby-scoped project:

[https://github.com/tzickel/docker-trim](https://github.com/tzickel/docker-trim)

Last time I checked, some of the cases from their open issues where their tool
doesn't work did work on mine.

Also, mine is just a few lines of Python, if you want to learn how to trim a
docker image.

------
rudolph9
Is there a good walkthrough somewhere of the constructs that docker containers
default to? Not these slim ones, the default docker ones.

As someone who works with docker containers on a somewhat daily basis, I only
have a vague idea of what they do under the hood and don't have much of a
reference point when comparing this slim impl to the default one.

~~~
jbotz
I'm not completely sure what it is you want to know, but basically there are
two pieces to linux container tech (Docker and others). The first is a set of
Linux kernel features that let us isolate various aspects of processes in
separate namespaces. The second is layered images, using filesystem features
like overlayfs and copy-on-write to avoid having to duplicate everything.
These two features of the Linux kernel are the real "container technology";
docker and the others are basically just user interfaces to them.

A link about namespaces:

[http://ifeanyi.co/posts/linux-namespaces-part-1/](http://ifeanyi.co/posts/linux-namespaces-part-1/)

A nice little introduction to overlays/etc:

[https://jvns.ca/blog/2019/11/18/how-containers-work--overlayfs/](https://jvns.ca/blog/2019/11/18/how-containers-work--overlayfs/)

And if you _really_ want to learn how it all works, write your own
"rubber-docker" in Python:

[https://github.com/Fewbytes/rubber-docker](https://github.com/Fewbytes/rubber-docker)

------
zzyzxd
Instead of starting with a bulky image and guessing at what unnecessary stuff
to remove at the end, I would prefer the opposite: only include the necessary
packages for the executable in the first place.

I am happy with my current distroless + docker multi-stage build.

------
aktuel
I wrote something similar but way simpler a while ago:
[https://github.com/ak-1/sackman](https://github.com/ak-1/sackman)

------
kylequest
i'll be happy to answer any questions about the tool (i'm the main author) :-)

~~~
mbu
This looks pretty amazing, thanks for sharing. I've just been playing with
this on a couple containers. The first went from 197MB to 71.5MB - not bad! I
did some testing and it didn't work without including a few extra pieces but
pretty painless.

The second container went from 176MB to 4.7MB!! I've not really tested that so
there's a pretty good chance things aren't going to work too well in practice
(but we'll see).

If it all continues as well as it's started then we'll definitely be using it.

~~~
kylequest
Do you mind sharing a bit of information about your container images? What's
the app language for both? What kind of application is it? How do you init
your apps in the containers? docker-slim is definitely not perfect and there's
lots of room for improvement. Can you ping me offline (on github/gitter,
twitter or email, my email is kcq.public@gmail.com)?

------
alex-ant
Would be nice to add support for native docker flags (-t for --tag, -f for
--from-dockerfile, etc.)

~~~
kylequest
This is a great idea, thank you for the feedback! Using native docker flags
as-is will definitely reduce friction and simplify use. Do you mind creating a
GitHub issue if you have a specific list of flags you'd like to see supported
first?

------
antpls
The results are interesting. How are Go containers only 1.5MB while Rust ones
are 15MB? I would have expected Rust to be on par with Go (both are compiled
languages).

