
Optimising the disk footprint of GNU/Linux distributions for the Cloud (2013) - edwintorok
http://www.dicosmo.org/MyOpinions/index.php/2013/02/08/131-optimising-the-disk-footprint-of-gnu-linux-distributions
======
Hello71
paper:
[http://hal.inria.fr/docs/00/67/43/79/PDF/paper.pdf](http://hal.inria.fr/docs/00/67/43/79/PDF/paper.pdf)

Basically, the idea here is that the solver can optimise for attributes of
the resulting package set, such as "fewest packages installed", "smallest
installed size", or "fewest changes to the existing system", rather than just
picking the first solution that satisfies the dependency tree.

I don't know how useful this is, given that (from the perspective of a package
maintainer) most code is written to use one specific library; in other words,
there are few || dependencies in the tree, even for Gentoo, which supports
almost every configuration that upstream does. Other distros will have
significantly fewer, given that their binaries will, by the nature of Linux
linking, be linked to a single set of libraries (assuming no dlopen).
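
To make the "pick by criterion" idea concrete, here is a minimal brute-force
sketch. It is not the method from the paper (real solvers do not enumerate
every solution); the package names, sizes, and dependency graph are all made
up for illustration, and the graph is assumed to be acyclic. Each dependency
is a list of alternatives, i.e. an || group.

```python
from itertools import product

# Hypothetical package database: name -> (installed size in KiB, dependencies).
# Each dependency is a list of alternatives (an "||" group); one must be chosen.
PACKAGES = {
    "webapp":   (500,  [["libssl-a", "libssl-b"], ["libc"]]),
    "libssl-a": (3000, [["libc"]]),
    "libssl-b": (1200, [["libc"], ["libz"]]),
    "libc":     (8000, []),
    "libz":     (300,  []),
}


def solutions(root):
    """Yield every install set (a frozenset of names) that satisfies `root`.

    Assumes the dependency graph has no cycles.
    """
    _, deps = PACKAGES[root]
    # Pick one alternative from each dependency group.
    for choice in product(*deps):
        # Enumerate the ways each chosen dependency can itself be satisfied.
        sub_solutions = [list(solutions(dep)) for dep in choice]
        for combo in product(*sub_solutions):
            result = {root}
            for install_set in combo:
                result |= install_set
            yield frozenset(result)


def installed_size(solution):
    """Total installed size of a solution, in KiB."""
    return sum(PACKAGES[name][0] for name in solution)


def best(root, criterion):
    """Return the solution for `root` that minimises `criterion`."""
    return min(solutions(root), key=criterion)


smallest = best("webapp", installed_size)  # optimise for disk footprint
fewest = best("webapp", len)               # optimise for package count
```

With this toy data the two criteria disagree: the smallest footprint needs
four packages, while the fewest-packages solution takes more disk space. With
no || groups in the tree there is only one solution and nothing to choose.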

------
derefr
Note that not all packages are created equal. The vast majority of base
packages are a few kilobytes each. The space on a minimal system (e.g. Ubuntu
core) is by far taken up by just a few packages: locales, charmaps, and
tzdata. (You might also count the kernel, initramfs, etc. if you’re trying to
create something like a Docker base-image.)
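
One way to check which packages dominate is to rank them by the
Installed-Size field (in KiB) that dpkg records for each package. The sketch
below parses text in dpkg's status format; the function name and the sample
sizes are invented for illustration, not measurements.

```python
def largest_packages(status_text, top=3):
    """Return the `top` (name, size_kib) pairs with the biggest Installed-Size."""
    sizes = []
    # Stanzas in dpkg's status format are separated by blank lines.
    for stanza in status_text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            # Continuation lines start with whitespace; skip them.
            if line[:1].isspace() or ":" not in line:
                continue
            key, _, value = line.partition(":")
            fields[key] = value.strip()
        if "Package" in fields and "Installed-Size" in fields:
            sizes.append((fields["Package"], int(fields["Installed-Size"])))
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]


# Made-up sample data; the sizes are placeholders, not real package sizes.
SAMPLE = """\
Package: locales
Status: install ok installed
Installed-Size: 16000

Package: tzdata
Status: install ok installed
Installed-Size: 3000

Package: hostname
Status: install ok installed
Installed-Size: 50
"""

biggest = largest_packages(SAMPLE, top=2)
```

On a Debian-based system the same function could be fed the contents of
/var/lib/dpkg/status.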

Speaking of Docker base-images, I really wish Emdebian Baked
([http://www.emdebian.org/baked/index.html](http://www.emdebian.org/baked/index.html))
was still an active project. There’s no reason “finalized” read-only cloud
images should have apt installed at all—and therefore no need for any of apt’s
dependencies (of which there are shockingly many), or any of the metadata each
package comes with to appease apt.

~~~
justincormack
To avoid all the locales and stuff, use musl libc. Alpine Linux is an option
that supports it.

Red Hat's Project Atomic can bake in RPMs using ostree.

------
asiekierka
Alpine Linux might be helpful for minimizing both RAM and disk footprint.

------
justincormack
The idea that you need 154 packages to run Apache is pretty sad. Part of it is
base system bloat, the rest is the idea that you had better link packages with
all their possible options (unless you use Gentoo).

~~~
_wmd
The multistrap script at
[https://gist.github.com/anonymous/50dfdfe99438e53c8e58](https://gist.github.com/anonymous/50dfdfe99438e53c8e58)
produces a 425 MB Debian system including vim, the Linux kernel, Apache, Perl
and Ruby. After bzip2 that's 115 MB, which is about a second's worth of GigE.
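
The "about a second" figure can be checked with quick arithmetic. This sketch
assumes decimal megabytes and an ideal 1000 Mbit/s link with no protocol
overhead, so real transfers would be somewhat slower.

```python
def transfer_seconds(size_megabytes, link_megabits_per_second=1000):
    """Ideal transfer time: 8 bits per byte, no protocol overhead assumed."""
    megabits = size_megabytes * 8
    return megabits / link_megabits_per_second


compressed = transfer_seconds(115)    # the bzip2'd image
uncompressed = transfer_seconds(425)  # the raw image
print(f"compressed: {compressed:.2f} s, uncompressed: {uncompressed:.2f} s")
```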

But that assumes you even need to transfer or copy the base image, which a
healthy setup should be able to avoid: most Linux distributions bundle very
similar content, and among virtual machines of the same distribution the base
systems are almost identical. It wouldn't be difficult to create a
differential transfer protocol and/or virtualized block device for this
specific use case that minimizes copies and transfers.

There is plenty of open source tech for this already. For example, with LVM2
thin provisioning it's possible to clone a VM image and resize it from 2 GB to
40 GB, ready for boot, in under 2 seconds:
[https://gist.github.com/anonymous/fdcbcc27278c287ce5e0](https://gist.github.com/anonymous/fdcbcc27278c287ce5e0)

------
infocollector
Can anyone convert this article into an "apt-get install thinner" command?

