
Reducing Docker Image Size - aberoham
https://www.ardanlabs.com/blog/2020/02/docker-images-part1-reducing-image-size.html
======
yjftsjthsd-h
Seems to be a pretty decent overview; covers the usual suspects (multi-stage
builds, FROM scratch, non-scratch minimal images, ldd to check libraries), with
some nice bits that I'd not seen before (busybox:glibc). I would be curious to
see how these base images stack up against Google's "distroless" base images
([https://github.com/GoogleContainerTools/distroless](https://github.com/GoogleContainerTools/distroless)).
I also appreciate that they call out Alpine's compatibility issues (on account
of musl) but still leave it as a thing that can be good if you use it right.
(Personally I'm quite fond of Alpine, but I don't bother when using binaries
that expect glibc.)
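
For anyone who wants to script the ldd check, here is a rough sketch of pulling the library paths out of `ldd` output so you know what has to be copied into a minimal image. The function name and the sample text are my own illustration (the addresses are made up), and it assumes the usual glibc `ldd` line layout.

```python
def shared_library_paths(ldd_output):
    """Return the absolute file paths that appear in `ldd` output text."""
    paths = []
    for line in ldd_output.splitlines():
        fields = line.split()
        if "=>" in fields:
            # Typical line: "libc.so.6 => /lib/.../libc.so.6 (0x...)"
            idx = fields.index("=>")
            # "libfoo.so => not found" has no path after the arrow; skip it.
            if idx + 1 < len(fields) and fields[idx + 1].startswith("/"):
                paths.append(fields[idx + 1])
        elif fields and fields[0].startswith("/"):
            # The dynamic loader is printed as a bare absolute path.
            paths.append(fields[0])
    return paths


# Illustrative sample only; real output depends on the binary and the distro.
SAMPLE = (
    "\tlinux-vdso.so.1 (0x00007ffd5a5f2000)\n"
    "\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2c4e000000)\n"
    "\t/lib64/ld-linux-x86-64.so.2 (0x00007f2c4e400000)\n"
)

for path in shared_library_paths(SAMPLE):
    print(path)
```

A statically linked binary makes `ldd` print "not a dynamic executable", for which this returns an empty list, i.e. nothing extra to copy.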

~~~
yegle
One important thing missing from BusyBox:glibc is SSL certificates, which are
almost always needed if you are writing things like crawlers.

As for Alpine, at least for packaging Python code, avoid it as much as possible.
It cannot reuse the manylinux packages on PyPI and has to recompile C modules.
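
To make the manylinux point concrete: a wheel's platform tag is the last dash-separated field of its filename, and manylinux tags mean the binary was built against glibc. A minimal sketch, where the helper names and the example filenames are hypothetical:

```python
def wheel_platform_tag(filename):
    """Return the platform tag, the last dash-separated field of a wheel name."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: " + filename)
    stem = filename[: -len(".whl")]
    return stem.split("-")[-1]


def needs_glibc(filename):
    """True when the wheel carries a manylinux (glibc-based) platform tag."""
    return wheel_platform_tag(filename).startswith("manylinux")


# Hypothetical filenames, for illustration only.
for name in (
    "example_pkg-1.0-cp38-cp38-manylinux1_x86_64.whl",
    "example_pkg-1.0-py3-none-any.whl",
):
    print(name, "-> needs glibc:", needs_glibc(name))
```

Wheels tagged `any` are pure Python and are unaffected; the recompilation cost only applies to packages with C extensions.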

~~~
gopalv
With Alpine, the size advantages of musl sometimes come with hidden overheads
as well.

Musl's malloc() seems to be a fair bit slower than glibc's when you start to
throw more threads at it (fragments faster?).

And then if you try to replace that with jemalloc, you'll quickly find out
that jemalloc does not actually get used.

At some level, the difference between a 1 MB Docker image and a 9 MB one isn't
that significant (and I asked for some things like strace, ping and nslookup to
be in it, because it gets impossible to debug the container when it obviously
caches a DNS record, or at least looks like it does).

------
paxys
While such articles are usually helpful, I'd caution that making individual
image sizes as small as possible shouldn't really be your goal.

As a simple example - if you have 100 unique images in your system, having an
image size of 1 GB each where 99% of it is derived from a common layer is
going to be a lot smaller overall than "optimizing" the size down to 100 MB
each but taking away the base layers.
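
A quick back-of-the-envelope version of that example, assuming layers are deduplicated perfectly, taking 1 GB as 1000 MB, and counting only storage on a single host or registry (the function name is just for illustration):

```python
def total_storage_mb(image_count, image_size_mb, shared_mb):
    """Total MB stored when `shared_mb` of every image is one common layer."""
    unique_mb = image_size_mb - shared_mb  # stored once per image
    return shared_mb + image_count * unique_mb  # shared layer stored once


# 100 images of 1000 MB each, 990 MB (99%) of which is a common base layer.
with_shared_base = total_storage_mb(100, 1000, 990)
# 100 images of 100 MB each with nothing shared between them.
without_shared_base = total_storage_mb(100, 100, 0)

print(with_shared_base, "MB vs", without_shared_base, "MB")
# prints "1990 MB vs 10000 MB"
```

So under those assumptions the "big" images need about 2 GB in total, versus 10 GB for the "optimized" ones.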

~~~
leipert
In my experience your 100 unique images might still have different base images,
so you end up with too many layers. Especially with interpreted languages,
those Python, Ruby or Node runtimes add up quickly.

Also, depending on the use case (e.g. running a CI system), you might end up
with high storage costs (caching) or high bandwidth costs (no caching).

Depending on where your datacenter is, coworkers with slow connections might
suffer as well. Hey, let’s just download 12GB of my
everylanguage-and-Tex:latest every workday.

But I agree, docker golfing to get minimal images is not always needed, but
don’t let your images grow like wildfire. They are part of your
architecture.

~~~
paxys
Standardization of base images, OS, etc. is just something that has to be
enforced at the organization level.

~~~
wmf
But if you do that you're giving up all public images. It makes sense in a
large organization but there are plenty of other use cases for Docker.

~~~
cpitman
If you are using public images, then you are already giving up on optimizing
image sizes anyways.

~~~
wmf
But if you are making a public image you are doing your users a favor by
minimizing the size.

------
thinkingemote
Why is image size important? Should we instead optimise for speed of build?

If storage is cheap and CPU time costs CO2, does it make sense to spend more
time and more energy to save disk space?

~~~
dcherman
It's not just the raw size of the image, but also about what the image
includes; a smaller image often reduces the potential attack surface because
vulnerable things _just aren't there_.

That's one of the major rationales behind the distroless images. Being space
optimized is just a really nice side effect.

~~~
carterehsmith
>> a smaller image often reduces the potential attack surface because
vulnerable things just aren't there

By the way, the article proposes a blind download of artifacts from someplace
on the internet, on every build. Not only can that cripple your builds when the
source is down (which happens all the time), it can (and that has happened)
send you arbitrary infected crap instead of what you wanted.

------
danielhlockard
It's worth noting that golang builds can be smaller than that with
`GOOS=linux go build -ldflags="-s -w" .` (assuming a build on macOS for Linux).
From there I usually run `upx --ultra-brute -9 program` before dropping it into
a `scratch` docker container (plus whatever other deps it needs).

~~~
zxcmx
Does the UPX help much on top of Docker's layer compression? I can imagine it
might (special purpose vs general purpose) but haven't tried it.

~~~
danielhlockard
Just saw this reply, yes it does.

------
avip
This is indeed a good overview.

It would have been _a great_ overview had it started by briefing readers on
why (or when) image size should bother us at all.

------
snicker7
Are containers really the best abstraction for providing application isolation
/ portability? What about snaps/nix/guix instead?

~~~
yjftsjthsd-h
They're probably not _best_ at any of portability, isolation, or security
(they probably lose to full VMs on all counts). But they're good enough _and_
way more convenient / easier to use.

------
jiofih
Strange to advise against Alpine but then recommend an empty image or
busybox...

~~~
yjftsjthsd-h
I'm pretty sure that's 100% a glibc/musl thing.

