
Show HN: distri: a Linux distribution to research fast package management - secure
https://michael.stapelberg.ch/posts/2019-08-17-introducing-distri/
======
jchw
There’s been some related work lately, too.

\- Nix packages are still archives, but they do have separate roots, and use
runpath to hardcode shared library paths pointing into the Nix store.

\- rpm-ostree provides something of a hybrid between images and packaging,
allowing for more seamless OS upgrades while still supporting more or less
traditional packages on top.

I hope that in the near future, Linux package management can jump into the
next phase. Package management on UNIX-likes has long offered some neat
advantages along with its complications, but it’s interesting to watch this
evolution occur while Windows and macOS largely maintain the same model
they’ve had basically forever, for better or worse. (I guess on Windows and
macOS there’s an increased focus on an app-store-based distribution model,
but it’s not really full system package management. The most interesting bit
is probably the sandboxing.)

~~~
techntoke
The advantage of Linux packaging for distros like Arch becomes apparent when
you actually want to create a package and realize how easy it is.

~~~
jchw
Arch is definitely nice, with PKGBUILDs that are basically exactly what you
want.

NixOS has its Nix language and the Nixpkgs system, which definitely takes
some time to learn, and I'd even say is more cumbersome than PKGBUILDs, but
it has its advantages.

But going back, building packages for RPM- or dpkg-based distros does not
really feel that simple or easy. I think the easy packaging we have today
comes by virtue of simplifying things.

~~~
techntoke
Agreed 100%. I'm really impressed at how much is available in the AUR and how
quickly Arch gets updated, and I have to attribute it primarily to how easy
it is to maintain packages.

~~~
paulcarroty
> and how quickly Arch gets updated

> and how often Arch will be broken after updates

If You Have Said A, You Must Also Say B (c)

~~~
jchw
I find this extremely untrue, actually. Distros like Debian make you think
that a rolling release must be unstable by virtue of the fact that the
underlying software is unstable, and it just isn’t true. Most of the software
just isn’t breaking; the downstream is just doing too much magic, and _that_
is what breaks.

Arch is easily one of the most stable distros I’ve used. Period. No hyperbole
at all. It is more stable than NixOS unstable, and much more stable than
Debian testing, and you don’t have to deal with the periodic massive shift
that breaks the world when you upgrade Debian or Ubuntu versions. It’s because
it’s simple. When it was a newer distro it had some issues, but for at least
the past few years it’s been very stable, and most packages require few
patches from upstream to work well.

Even using a bunch of AUR packages doesn’t usually lead to instability in
Arch.

Upgrades that require manual intervention are documented on the front page:

[https://www.archlinux.org/](https://www.archlinux.org/)

I’ve hit two during my multi-year span of using Arch. It’s worth noting that
few bugs will impact everyone, since they are often scoped to packages that
are not installed by default.

------
return_0e
This is very interesting, as it is almost identical to how the Haiku
operating system does package management, using its own packaging format
(hpkg) together with packagefs. [0] [1]

This format is used for more than just packaging applications: it is also
used to update the whole OS in a consistent manner [2], with shared libraries
versioned in as well, and this was implemented in 2013.

[0] - [https://www.haiku-
os.org/blog/zooey/2011-01-08_package_manag...](https://www.haiku-
os.org/blog/zooey/2011-01-08_package_management_first_draft/)

[1] - [https://www.haiku-os.org/guides/daily-tasks/install-
applicat...](https://www.haiku-os.org/guides/daily-tasks/install-
applications/)

[2] - [https://www.haiku-
os.org/blog/bonefish/2011-06-20_package_ma...](https://www.haiku-
os.org/blog/bonefish/2011-06-20_package_management_system_package/)

~~~
secure
Thanks for sharing! Repeating my comment from the other thread for visibility:

I learnt about the Haiku package management system when I had already been
developing distri for several months.

I think it’s very telling that the two approaches look so similar, so I was
really happy to learn about the similarities with Haiku!

------
siscia
At work we avoid package managers altogether, following the mantra that the
best package manager is no package manager at all.

All the software is installed under a global directory, /cvmfs, and
distributed to all the clients using FUSE and simple HTTP servers.

Different departments then have ownership of different subpaths. General
software that can be useful to everybody is installed in /cvmfs/sft.cern.ch;
software that is useful only to a specific collaboration is installed in, for
example, /cvmfs/lhcb.cern.ch.

~~~
secure
Thanks for sharing, this sounds cool! I hadn’t heard of cvmfs, but we use a
similar system at work, and it works really well.

~~~
siscia
Indeed, it is one of the, I would say, key technologies developed at CERN
that we were unable to push enough into the industry.

What system do you use at your place?

~~~
secure
Learn more at
[https://landing.google.com/sre/workbook/chapters/eliminating...](https://landing.google.com/sre/workbook/chapters/eliminating-
toil/#case-study-2-decommissioning-filer-backed-home)

~~~
oneplane
Slightly off-topic, but do you happen to know what Google uses to draw those
flow charts? I've seen that format all over but I never figured what they use
to draw them.

~~~
Mathnerd314
The PNG metadata says "Adobe Illustrator CC 22.1 (Macintosh)", so I would say
that's what they used. Although they could have used Illustrator to convert
from another tool, the use of a Macintosh suggests a non-technical role.

~~~
oneplane
Yeah, I figured it was used to post-process it or something, but perhaps they
simply have a diagramming department that uses that.

------
codedokode
I don't think that a possibly slow package manager is Debian's main problem.
There are more serious issues:

\- the bug tracker is email-based, not web-based, very old, and lacks keyword
search.

\- old versions of software

\- unnecessary duplication: the repositories contain Python packages that can
already be installed with Python's own package managers.

\- Debian is not friendly to third-party and closed-source software. For
example, to install Slack or VS Code you have to add a third-party repository
and give permanent root access to your system to Slack Inc. or Microsoft. You
have a choice between backdooring your system or not using third-party apps.
Compare this to Android, which is also Linux-based but provides a reliable
and comfortable jail for every application except Google's.

\- Debian doesn't protect your information from malicious software. For
example, if you install a third-party app, it will be able to read your
browser's history and cookies in your home directory. Also, any unprivileged
program can read hardware identifiers like the MAC address, hardware list,
disk serial number, BIOS data, etc. This is perfect for tracking users even
if they reinstall or change their operating system. Again, compare this to
Android.

\- the package manager doesn't allow installing several versions of PHP or
Node.js if you are working on several projects. Yet for a language like
Python it does allow installing two different versions; why Python gets such
privileged treatment while PHP and Node don't, I don't understand.

There are projects aiming to solve some of these problems like Snapcraft, but
as far as I am aware, they are not integrated into Debian yet.

Also, it is a bad idea to hardcode paths (like /usr/share) within an
application, so that you have to recompile it to change the path. All
applications should be portable by default, to make installation into a home
directory easier.

~~~
johnmarcus
how does slack have permanent root access to my system again?

~~~
CameronNemo
The repo can theoretically hijack a package like util-linux, unless you
creatively set up apt pinning.
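
For reference, a hedged sketch of what such pinning could look like as an
APT preferences file; the file name, package name, and origin hostname here
are placeholders, not Slack's actual repository details:

```
# /etc/apt/preferences.d/slack  (hypothetical; "slack.example.com"
# stands in for the third-party repository's real hostname)

# Allow only the one package you actually want from that origin.
Package: slack-desktop
Pin: origin "slack.example.com"
Pin-Priority: 500

# Give everything else from that origin a negative priority,
# which prevents those versions from ever being installed.
Package: *
Pin: origin "slack.example.com"
Pin-Priority: -1
```

This narrows the blast radius: the repository can still ship a malicious
update of the one pinned package, but it can no longer replace arbitrary
system packages like util-linux.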

~~~
MayeulC
That, plus I guess the package could easily contain suid binaries.

------
oneplane
While I see the advantages and really enjoy this type of work (and the ones
like Nix and OS-as-an-image efforts), I don't think it would be a one-true-
solution in a one-size-fits-all construction.

Sometimes you want a single filesystem, sometimes you want multiple
filesystems, sometimes you want an image copied into RAM and nothing more,
sometimes you don't have the resources to even load something like FUSE.

I believe that if we are to make anything better we have to be able to either
switch or combine the many ways one can make a modular or composable system.

We can already swap bootloaders, kernels, and desktop environments, or
transplant storage from one system to another and continue working. It should
be just as feasible to switch filesystem/package models.

~~~
heavenlyhash
Super agree.

What would it take to make that a reality, though -- without, like secure's
sibling comment says, blowing up the test matrix?

Which things would you make standardized (or conventional) to make switching
easier?

What things would you strip _out_ of consideration to make switching easier?

One of the things I like about Distri is that by questioning whether a bunch
of inter-package interactions are necessary (and finding out that, often, the
answer is "no"), it gets closer to a model of packaging where things are less
prone to create either conflicts or sprawling dependency trees. That feels
like a step in a good direction, at least, to me.

~~~
secure
The most significant difference in this context is the separate hierarchies:
programs need to be recompiled to work when made available under the /ro
mount point.

I don’t see a way to avoid having two versions of a package when you want to
support distri-style package management and traditional-style package
management.

------
Boulth
That's super interesting! I've been pondering better package management tools
for a while, as the current landscape seems stagnant.

I hope some ideas from distri will make their way into mainstream distros.
Maybe the more agile ones (Gentoo, Arch) would be interested?

~~~
secure
Thanks! Yeah, I certainly hope many distros can pick it up. I have heard from
someone at SUSE:
[https://twitter.com/fleming_matt/status/1162819502050070528](https://twitter.com/fleming_matt/status/1162819502050070528)

------
rurban
A package-specific /ro prefix is annoying and will not persuade many.

Void Linux uses standard paths and still has a blazingly fast package
manager, without the need for images being mounted read-only.
[https://wiki.voidlinux.org/XBPS](https://wiki.voidlinux.org/XBPS)

pacman is also pretty fast. apt, Nix and rpm/yum are by far the slowest
package managers. SUSE's has the fastest undo, via Btrfs snapshots.

~~~
secure
What’s annoying about the prefix?

Thanks for mentioning void linux. I gave it a shot in Docker just now, but it
seems rather traditional in that it does dependency resolution, does not allow
for co-installability, uses transactions, unpacks archives, and runs
hooks/triggers.

~~~
CameronNemo
I think his point was that XBPS is fast enough for normal operation while
still offering a traditional package management workflow. Alpine's apk is
even faster, according to one of the XBPS developers and another Void
maintainer who uses Alpine at work.

(paragraph 2)
[https://www.reddit.com/r/voidlinux/comments/ccppu0/opkg_vs_a...](https://www.reddit.com/r/voidlinux/comments/ccppu0/opkg_vs_apk_vs_xbps/etqq3tv/)

~~~
hawski
I tried Alpine once and it was amazing how fast it was. Many package
installations were almost instantaneous; I remember checking after
installation whether the package had really been installed, because I didn't
believe it could be that fast.

I would like to hijack this thread and ask: how does Alpine's package build
system work? I use Void's, because package builds are thinly isolated thanks
to user namespaces. This makes sure that the configure scripts only detect
what you want. Also, what is important for me, you can do almost everything
without root privileges. I use it to generate images of my own Linux
distribution, and easy hacking around it is also a big plus for me.

------
solarkraft
This is very nice, and it's rather surprising to me that something like it
doesn't exist yet in a "serious" Linux distribution.

While reading about the file system structure I was reminded of GoboLinux
concepts - have you heard of it?

I would really like to use something like this on my computer. Are there
disadvantages serious enough to make pursuing this not worthwhile?

~~~
secure
> While reading about the file system structure I was reminded of GoboLinux
> concepts - have you heard of it?

Thanks for the pointer! Yeah, on Twitter a few people have pointed to
GoboLinux, too. I had read about it a few years ago, and indeed there are a
number of similarities.

> I would really like to use something like this on my computer. Are there
> disadvantages serious enough to make pursuing this not worthwhile?

I don’t think there are inherent disadvantages, other than that the project
goals I want to spend time on are experimental/exploratory in nature, rather
than building a community around it. I’m not looking to start a new Linux
distribution with a user base, I’m trying to show established distributions
how much room they have for optimizations, so that we all profit :)

------
elFarto
Interesting, this mirrors a lot of thoughts I've had on package management on
Linux.

I'm not sure I like the idea of using file system images as packages;
wouldn't that have overhead in terms of the sheer number of mount points
you'd need, plus the permissions needed to actually mount them?

Keeping each package separate is a good idea; I certainly think the current
'soup' method of package management is not ideal. By 'soup' I mean adding
files all over the place and then giving it a good stir by running some
scripts; it's very difficult to get back to the state the system was in
before the package was installed without doing extra work to keep track of
those changes. Looking through the Alpine package management system, I found
that a lot of packages' post-install scripts would add a user, but the remove
script rarely removed that user.

Ideally, packages shouldn't care where they're installed. If a binary in the
package needs to know where it's installed, it's possible to work that out at
run time (even for libraries, although it's not as straightforward).
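
That run-time discovery can be sketched in a few lines; everything here (the
helper names and the bin/ + share/ layout) is a hypothetical illustration,
not any particular package manager's convention:

```python
import os
import sys

def install_prefix() -> str:
    """Derive the install prefix from the program's own location.

    A program installed as <prefix>/bin/<name> can strip two path
    components off its own resolved path instead of baking the prefix
    in at compile time.  A compiled binary would typically use
    readlink("/proc/self/exe") on Linux for the same purpose.
    """
    exe = os.path.realpath(sys.argv[0])
    return os.path.dirname(os.path.dirname(exe))  # strip bin/<name>

def data_file(name: str) -> str:
    # Locate bundled data relative to the prefix, e.g. <prefix>/share/.
    return os.path.join(install_prefix(), "share", name)
```

With this, moving the whole <prefix> tree (e.g. into a home directory) needs
no recompilation, which is exactly the portability property argued for above.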

In my idea for a package manager, each package would keep different types of
files in different directories, e.g. all command line binaries in /bin, all
libraries in /lib, all fonts in /share/fonts, etc. In addition, it would be
possible to run scripts when another package providing a certain set of files
is installed; for instance, when a package that contains fonts is installed,
the package responsible for fonts on the system would add the relevant
directory to its config and refresh the cache.

If a package shipped a library in /lib, the package responsible for libraries
could update /etc/ld.so.conf and refresh the cache. Linking files into shared
directories would be avoided as much as possible (you probably need to do it
for /bin). Using an exchange directory is somewhat of a crutch: patching the
system to behave the way applications expect a traditional Linux system to
work.
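
The ld.so.conf part of this already exists today as drop-in files under
/etc/ld.so.conf.d/; a sketch with a made-up package name and path:

```
# /etc/ld.so.conf.d/libfoo.conf  ("libfoo" and its location are
# hypothetical; the file lists one directory per line)
/packages/libfoo/lib
```

After writing the file, the installing package would run ldconfig to rebuild
/etc/ld.so.cache so the dynamic linker picks up the new directory.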

Hmm, this post seems to have just turned into a bit of a brain dump.

------
lone_haxx0r
The only thing I want is a distro that lets me download a package from the web
and install it locally without connecting to any central repo.

I recently learned that Slackware does that, so as soon as I have a little
free time, I'll switch to it.

~~~
oneplane
Pretty much every distro ever already does this. At the same time, practically
every repository is already a web page with packages.

Take Ubuntu and Debian for example; if you download a .deb file you can
install it from the GUI and CLI as-is.

An example with pictures from a random Google result:
[https://itsfoss.com/install-deb-files-ubuntu/](https://itsfoss.com/install-
deb-files-ubuntu/)

~~~
lmz
And the problem with this approach is, of course, dependency management. Just
because you have the file for package X doesn't mean it will work if you're
missing some of its dependencies. And those dependencies have dependencies,
etc. This is why we now use e.g. yum and apt to install stuff, and not rpm
and dpkg directly.

------
viraptor
I'm curious about the no-triggers claim. There's stuff that happens after
installing packages sometimes, which you can't put into an image. For example
refreshing the font cache. How does that work here?

~~~
secure
An ideal cache implementation would transparently recognize when it needs to
update itself, and do so efficiently.

In the particular case of the font cache, I’d say that the library which uses
the cache should recognize that an update is needed. I.e., the update happens
at next use, not at package installation time. On server systems, where fonts
might never be loaded, this saves some compute :)
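
A minimal sketch of such a check-at-use cache, assuming a shallow heuristic
that compares directory mtimes (the function name and the heuristic itself
are illustrative assumptions, not how fontconfig or distri actually work):

```python
import os

def cache_is_fresh(cache_path: str, source_dirs: list) -> bool:
    """Return True if the cache is at least as new as every source dir.

    Instead of a package trigger rebuilding the cache at install time,
    the consuming library calls this at first use and rebuilds only
    when something actually changed.  Comparing directory mtimes is a
    simplification; a real implementation would also stat the files
    inside each directory.
    """
    if not os.path.exists(cache_path):
        return False
    cache_mtime = os.stat(cache_path).st_mtime
    return all(
        os.stat(d).st_mtime <= cache_mtime
        for d in source_dirs
        if os.path.exists(d)
    )
```

On a machine where the sources never change (e.g. a server that never loads
fonts), the rebuild simply never happens, which is the saving described above.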

~~~
aasasd
You might find this approach problematic:

\- You make packages track more state ("stuff has changed"), and if there are
no triggers, you can't even set a flag for the package's runtime: instead,
you have to paranoidly check for changes at run time and compute the
difference. Meanwhile, an installer/updater knows exactly that something was
changed, and precisely what it was, because the installer just did that.

\- You're shifting work from install time to run time, which, e.g. in backend
web programming, is exactly the opposite of the right thing to do, because
modifications in most cases occur much more rarely than usage, and because
making the user wait is a no-no. So an admin who wants to prepare installed
packages for use would have to patch the ‘trigger’ stage back in via their
own scripts or something, with the caveat that they can't pause the
installation in the meantime to make sure all changes are processed before
the updated programs are run.

~~~
secure
> Meanwhile, an installer/updater knows exactly that something was changed,
> and precisely what it was, because the installer just did that.

Except the distri installer does not modify files, it only ever adds images to
the package store (which, to be fair, might result in changes to the contents
of exchange directories, which are derived from the package store contents).

I agree with the larger point that state tracking might be hard, but it’s not
clear to me that it would be easier in distri’s architecture if the installer
was responsible for it.

> You're shifting work from install time to run time, which e.g. in backend
> web programming is exactly the opposite of the right thing to do

Absolutely. My observation here is that I always want to shift that work.
Frequently, I had to wait for extra work to finish that was entirely unrelated
to what I wanted to accomplish. E.g., if you don’t update your Debian machine
for a few weeks and want to install a new package, maybe that requires a libc
update, which requires service restarts, etc.

I wanted to explore whether shifting the work improves my experience, and so
far it does.

~~~
aasasd
> _the distri installer does not modify files, it only ever adds images to the
> package store_

The set of installed packages also comprises part of the system's state. It's
the same as in OOP: modification of data in objects is equivalent to a
function accepting a prior state and outputting an updated state; only it's
more difficult to track the changes when they're done from the inside and/or
in the far-reaching manner of OOP.

So, in this example, known changes at the installer's level would encompass
the installed/updated packages, leaving the determination of file-level
changes to each package's setup scripts. I guess different strategies may
exist, but I'm pretty sure this is approximately what other package managers
do.

> _if you don’t update your Debian machine for a few weeks and want to install
> a new package, maybe that requires a libc update, which requires service
> restarts, etc._

Afaik, if you don't restart services, you can run into incompatibilities
between old and updated libraries _at the runtime of a service_. Suppose I'm
running a web server written in e.g. Python, and I happen to call a script
for the first time after such an update. The script loads an extension which
relies on the freshly updated libc (for example), while the server still has
the old one loaded.

In short, the system also has state in memory, which after an update differs
from the state on the disk—and using mismatching parts of those two is not
advisable.

I'm not even sure what would happen with C-level libs in this case (“symbol
missing”, probably?), but I guess everyone has lived through mismatched parts
of code at the level of scripting languages, and the result is usually an
exception.

If you catch this situation at update time, you can gracefully restart the
web server while handing new requests to the newly started instance. If you
bump into it at run time, that's an error for at least one client.

------
compsciphd
see
[https://www.usenix.org/legacy/event/atc10/tech/full_papers/P...](https://www.usenix.org/legacy/event/atc10/tech/full_papers/Potter.pdf)

and

[https://www.usenix.org/legacy/events/lisa11/tech/full_papers...](https://www.usenix.org/legacy/events/lisa11/tech/full_papers/Potter.pdf)

------
waddlesplash
So ... this is more or less exactly how Haiku's package management system
works. (It was designed in ~2008-2009, merged into the nightlies in 2013, and
of course has been used by default since then.)

We use different terminology for a number of these things, but all the
concepts are almost identical. Is this really a case of two completely
separate creations of the same idea? Or is this another instance of the Linux
world NIH'ing something a different project has had for a while, without even
mentioning it?

~~~
secure
I learnt about the Haiku package management system when I had already been
developing distri for several months.

I do actually reference Haiku, but in the referenced post
[https://michael.stapelberg.ch/posts/2019-08-17-linux-
package...](https://michael.stapelberg.ch/posts/2019-08-17-linux-package-
managers-are-slow/#appendix-a-related-work), not in the distri introduction
post itself.

I think it’s very telling that the two approaches look so similar, so I was
really happy to learn about the similarities with Haiku!

~~~
waddlesplash
"HaikuDepot" is just the name for the GUI interface. The package manager
itself has no real "name." With the number of similarities being so high, I
am still skeptical that you really did not come across Haiku before, but
well...

On a different note, some key things we have discovered in working with this
system for almost a decade now are:

(1) One really needs a dedicated kernel module for this. Managing the
abstraction purely in userland gets too tedious after a while, especially
around updates and the like. Moving mount management into a kernel
"packagefs" makes things so much simpler (and of course more performant). Not
to mention that you can then write a dedicated file format which supports
random access better.

(2) Perhaps the most powerful feature of this system, the ability to boot into
a previous state, is virtually impossible without kernel support.

(3) You will be surprised at how many users want to change things inside
packages; and how many Linux users refuse to switch to a system that does not
allow this. On Haiku we have mechanisms for overriding packaged files and
"blacklisting" files; but on Linux you may find people are far too wired into
the "old way" to get a radical change like this off the ground.

(4) The second most powerful feature, the ability to install packages in ~
(or, eventually, anywhere else for that matter) sounds great, but it requires
a massive amount of software patching, and for applications to use APIs (that
Linux does not have) to iterate through paths rather than hard-coding them.

So if you are going to make such a big, systemic change, why not just ditch
Linux for Haiku instead of re-creating what we have spent so long making work
already?

~~~
secure
> Managing the abstraction in userland only gets too tedious after a while

Can you elaborate on what you found tedious? Thus far, I feel like that part
works reasonably well.

> Perhaps the most powerful feature of this system, the ability to boot into a
> previous state, is virtually impossible without kernel support

Interesting. Why is resetting the package store contents to an earlier state
not sufficient? What does the kernel need to do? Or do you mean to avoid a
reboot?

> You will be surprised at how many users want to change things inside
> packages

Hah, yeah. I’m one of these users myself. In distri, you can just rebuild any
package and forcefully install the result onto your system, if you really want
to and can live with the consequences. Often, this means you need a reboot
right after. I usually iterate on a package by starting its programs from the
outside, or by using qemu which has pretty quick boot times.

> The second most powerful feature, the ability to install packages in ~

Yeah, distri doesn’t attempt to do this, for the reasons you outline :)

> why not just ditch Linux for Haiku

Copy&pasting my reply to you on twitter for the others:

I tried Haiku multiple times, but it’s just too different from what I’m used
to.

Honestly, Linux is niche enough for my preference. If I was switching OS, I
would probably look for more mainstream, not less :)

~~~
waddlesplash
> Can you elaborate on what you found tedious? Thus far, I feel like that part
> works reasonably well.

Most of it revolves around update states: if I update 100 packages at once,
the number of kernel calls to deactivate the previous 100 and activate the
new 100 is probably in the tens of thousands, and there is a lot of
inefficiency here because the kernel does not really know "what" you are
doing (just unmounting, moving files, remounting, adding links, etc.).
Whereas on Haiku with packagefs, the package_daemon gives packagefs the new
state description, and it can update all of its internal state and re-bind
everything internally with a lot less overhead, because both ends of the
system know exactly what is going on.

> Why is resetting the package store contents to an earlier state not
> sufficient? What does the kernel need to do? Or do you mean to avoid a
> reboot?

I mean that if you e.g. take a bad update and now your system does not boot,
how are you going to revert to an older one? All distros today store older
kernels, so if that was the problem it's easy to revert; but if the problem
was some package, you have to pull up a recovery shell and mess around in it.

If you manage all of the package mounting in userland, how are you going to
handle booting from an older "state" if the package management tools
themselves, which must be outside the packaged area in your model, got broken?
So then you still have to drop into a recovery shell.

On Haiku, you just get into the bootloader menu, and then you choose to boot
from the previous "state" (which includes the kernel, init system, etc. etc.)
because the bootloader itself speaks "packagefs", and can load the kernel out
of a package.

> In distri, you can just rebuild any package and forcefully install the
> result onto your system, if you really want to and can live with the
> consequences.

But you miss the point: of course you can do the same on Haiku, or use the
"non-packaged" directories, etc. We have convinced the Haiku "power users" to
go this route. But the traditional way is very ingrained in a lot of people,
and getting them to switch is a large cultural problem. Especially if the
package manager itself feels "bolted on" and not part of the system, which it
virtually always will in Linux.

> Yeah, distri doesn’t attempt to do this, for the reasons you outline :)

Well, then the "Linux desktop usability" people who complain that Linux is
unsuitable for $average_user will remain correct...

~~~
secure
Thanks for elaborating!

> Whereas on Haiku with packagefs, the package_daemon gives the packagefs the
> new state description, and it can update all of its internal states and re-
> bind everything internally with a lot less overhead, because both ends of
> the system know exactly what is going on.

That sounds reasonable. Thus far, my implementation is quick enough for the
number of packages I’m dealing with, but perhaps I’ll run into similar
limitations down the road :)

> because the bootloader itself speaks "packagefs", and can load the kernel
> out of a package.

Understood, cool. I haven’t actually explored how to revert to older systems
in the most user-friendly way in distri.

Currently, in the rare cases where I actually brick my system, I just boot
distri from a USB stick (takes 20s to write), mount my disk and run “distri
reset /var/log/distri/update/<latest>/before.txt”.

I’m thinking that retaining old kernel/initrd combinations and mounting an
older set of packages from the initrd might be a viable path.

> people who complain that Linux is unsuitable for $average_user will remain
> correct

Possibly, but that’s not what my project is trying to achieve :)

------
drudru11
So is there any performance hit or gain when using SquashFS? I wonder why all
the container folks aren’t using this.

~~~
secure
I haven’t noticed a gain, but also no substantial hit for day-to-day use.

I can only speculate as to why containers are not using SquashFS, but my first
guess would be that they wanted to use something more broadly supported.
Manipulating SquashFS images is not as commonly available in different
programming languages as dealing with e.g. tar archives.

~~~
drudru11
If there isn’t a performance hit, and you get all the benefits (especially
atomicity), this seems like a nice, easy win. I will tinker with SquashFS
today.

Thanks for writing this up.

