

History of largest strongly connected component in Debian bootstrap - chei0aiV
http://bootstrap.debian.net/history.html

======
tyho
Can somebody explain this to somebody without a background in graph theory?

~~~
josch1337
On top of what somerandomone said (which is already correct), imagine you have
source packages A, B, C and D which depend on each other, through the binary
packages they build and the binary packages they build-depend on, like this:
A->B->C->D->A

This would then form a simple cycle where a dependency has to be broken to
make it acyclic. A strongly connected component is a part of the graph in
which every vertex can reach every other vertex by following dependencies, so
any two vertices in it sit on some cycle together.

So for example if we add a package E to the above simple cycle where B depends
on E and E depends on D, then we would not have a simple cycle anymore. Now
imagine scaling this up to a couple of hundred vertices and you see how this
quickly becomes a hideous mess.
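
As a rough illustration of the example above, here is a minimal Python sketch
that finds the strongly connected components of that small graph using
Tarjan's algorithm. The `deps` dictionary, the function name and the extra
package F are made up for this example; this is not how botch or the tooling
behind the linked page computes it.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm; graph maps a vertex to the vertices it depends on."""
    index = {}        # discovery order of each visited vertex
    lowlink = {}      # lowest index reachable from the vertex's subtree
    stack = []        # vertices of components that are not finished yet
    on_stack = set()
    components = []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = lowlink[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:
            # v is the root of a component: pop everything above it.
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(component)

    for v in graph:
        if v not in index:
            visit(v)
    return components


# A->B->C->D->A, plus B->E and E->D as described above.
# F is a hypothetical package that depends on A but that nothing depends on.
deps = {
    "A": ["B"],
    "B": ["C", "E"],
    "C": ["D"],
    "D": ["A"],
    "E": ["D"],
    "F": ["A"],
}

sccs = strongly_connected_components(deps)
largest = max(sccs, key=len)
print(sorted(largest))  # ['A', 'B', 'C', 'D', 'E']
```

F ends up in a component of its own because nothing leads back to it. The
recursive version is only meant for toy graphs; with hundreds of vertices an
iterative variant or a higher recursion limit would be safer.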

I once tried to visualize such a strongly connected component and it looks
like this:

[http://blog.mister-muffin.de/images/hideous_mess.png](http://blog.mister-muffin.de/images/hideous_mess.png)

------
derefr
I arrived at similar data from another perspective: it's really hard to
generate a _truly minimal_ Debian (or Ubuntu) Docker base-image. Everything in
the "Essential" package set needs everything else in "Essential" in weird
cyclic ways, even though a lot of those dependencies are effectively
"maintenance-time" or "configuration-time" rather than runtime dependencies.

Even then, from the perspective of building a Docker image in particular, a
lot of "Essential" is truly inessential: a Docker app doesn't run any OS
daemons, interact directly with the hardware, or perform startup functions.
You don't need an hwdb or a charmap database, you don't need grub or
mkinitramfs, etc.

Most interestingly, for a Docker _production_ image that you're constructing
with debootstrap, _you don't need apt itself_... but just try convincing
debootstrap to not install apt.

~~~
josch1337
here is another thing:

Instead of using debootstrap to create your chroot environment, try using
multistrap. Multistrap uses apt to do package selection and not the "stupid"
dependency resolver that debootstrap uses. Thus, you have much more freedom in
selecting which packages you want. For example with multistrap it is possible
to create a chroot that only contains Essential:yes packages (that means no
apt).

------
the_kingsley
Johannes' fine graph seems to me to show that, over time, Debian tends to
depend on more packages.

I suppose it may suggest increasing code complexity and effort to maintain.

For what it's worth, I also tried to quantify how robust Debian is by
measuring the ages of packages installed on my main computer.

My results are at

[http://morse.kiwi.nz/kingsley/lib/exe/detail.php?id=technolo...](http://morse.kiwi.nz/kingsley/lib/exe/detail.php?id=technology%3Astart&media=technology:package_ages_on_a_debian_computer.png)

------
mkesper
So, in effect, it has become nearly impossible to build Debian as a whole from
scratch?

~~~
josch1337
I don't know how big the strongly connected component has to become so that
you might call bootstrapping Debian "nearly impossible" but last year there
were just two new Debian ports: arm64 and ppc64el. So apparently it is still
within human reach to build Debian as a whole from scratch. I think what the
graph shows is just that it's getting harder and that we need automated tools
to do the whole thing for us. This is what the new "build profile" syntax,
introduced with Jessie, is for.

------
andrewvc
This is interesting but not exactly surprising data. Of course, as the distro
has grown, the core components have been linked to more often. I'm not sure
what the purported utility of this is.

~~~
josch1337
The purpose is to have a reasonably quantifiable measure of how hard it has
become to bootstrap Debian: the larger the central strongly connected
component, the more software is involved in the bootstrap process and has to
be either cross-compiled or compiled with fewer build dependencies to break
cycles.

Ultimately, I meant this graph to support the introduction of build profiles
into Debian (which has now happened with the Jessie release) and to make the
case that a tool like "botch" (you can `apt-get install` it) is necessary to
make this graph acyclic automatically, instead of through years of manual
work.

------
audleman
I see that the largest strongly connected component is growing with each
release, and is at an all-time high. What's the implication here?

Is this bad? Is it neutral?

~~~
josch1337
The larger that component is, the more work is needed to make it acyclic (to
derive a linear build order). Ideally this "work" is done automatically, but
for that, more metadata has to be added. In Debian, source package
dependencies can be annotated with "build profiles" to mark dependencies which
are optional and can thus be used to break cycles. But more source packages
need this information. The graph is supposed to be a visualization to convince
people that it becomes increasingly hard to break cycles manually and that an
automated way (using build profiles) is needed.
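
The idea of dropping optional dependencies to get a linear build order can be
sketched in a few lines of Python. This is only a toy model under stated
assumptions: the `build_deps` data, the "optional" flags and the `build_order`
helper are hypothetical, and it uses the standard library's `graphlib`
(Python 3.9+) rather than anything Debian-specific.

```python
from graphlib import CycleError, TopologicalSorter

# package -> {build dependency: is it marked optional?}
# Hypothetical data: pretend D only needs A for something droppable.
build_deps = {
    "A": {"B": False},
    "B": {"C": False},
    "C": {"D": False},
    "D": {"A": True},
}


def build_order(deps, drop_optional):
    """Return a build order (dependencies first), optionally ignoring
    the dependencies that are marked optional."""
    graph = {
        pkg: {dep for dep, optional in needed.items()
              if not (optional and drop_optional)}
        for pkg, needed in deps.items()
    }
    return list(TopologicalSorter(graph).static_order())


try:
    build_order(build_deps, drop_optional=False)
except CycleError:
    print("no linear build order: the graph still has a cycle")

order = build_order(build_deps, drop_optional=True)
print(order)  # ['D', 'C', 'B', 'A']
```

With the optional edge kept, the four packages form a cycle and no order
exists; with it dropped, D can be built first in a reduced form and the rest
follows. In a real bootstrap, D would presumably be rebuilt with its full
dependencies once A is available, which this sketch does not model.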

