A path out of bloat: A Linux built for VMs (theregister.com)
57 points by transpute on April 2, 2024 | 40 comments


VMware and Microsoft ship Linux images built for VMs

https://github.com/vmware/photon

https://github.com/microsoft/cbl-mariner


Ish. Photon is optimised for virtualisation, but it also runs on bare metal. (I don’t know anything about CBL Mariner).

The article is talking about stripping down to the bare minimum, down to the level of even removing filesystem drivers, because they’re not necessarily needed!


I was thinking about this yesterday; POSIX compatibility aside, why do operating systems choose a filesystem hierarchy as the datastore they provide? Why not a structured document store, a blob store, a key-value store, or some mixture of all of these?

I'm very bitter that computing went into turtle stacking mode and took its 50 years of baggage with it.


Microsoft tried this with Windows Future Storage:

https://en.m.wikipedia.org/wiki/WinFS

OS/400 is quite different too:

https://www.quora.com/On-the-AS-400-how-are-objects-stored-o...

The BeOS filesystem has many DB like features:

https://en.m.wikipedia.org/wiki/Be_File_System


Two separate things:

- I don't think that the suggestion was to remove the filesystem outright, just that if you run everything over NFS or 9P then you don't need drivers for ext/xfs/fat/... (rough sketch after this list)

- To the broader "why do we even use filesystems" - because those other options are less powerful while offering marginal benefit. Also because backwards-compatibility and interoperability with other systems is useful.
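
On the first point, here's a minimal sketch of how that could look, assuming QEMU with virtio-9p (the paths, mount tag, and flag details are from memory, so treat them as assumptions): the guest's root arrives over 9p, so the guest kernel carries no ext4/xfs/fat drivers at all.

    import subprocess

    # Sketch: boot a guest whose root filesystem is a host directory shared
    # over virtio-9p, so the guest kernel needs no block-filesystem drivers.
    # Paths and the kernel image name are hypothetical; flags are from memory.
    ROOTFS_DIR = "/srv/guest-rootfs"   # host directory holding the guest root
    KERNEL = "vmlinuz-guest"           # guest kernel image (hypothetical)

    qemu_cmd = [
        "qemu-system-x86_64", "-m", "256M", "-nographic",
        "-kernel", KERNEL,
        # export the host directory as a 9p share tagged "rootfs"...
        "-fsdev", f"local,id=fs0,path={ROOTFS_DIR},security_model=none",
        "-device", "virtio-9p-pci,fsdev=fs0,mount_tag=rootfs",
        # ...and tell the guest kernel to mount that share as / over virtio
        "-append", "root=rootfs rootfstype=9p rootflags=trans=virtio console=ttyS0",
    ]
    subprocess.run(qemu_cmd, check=True)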


There have been several.

IBM i is one, formerly known as OS/400.

The Pick OS is another.

Here's a modern one, DBOS:

https://dbos-project.github.io/


CBL (not sure about Mariner specifically) had builds targeting hardware, AFAIK.

I recall that around 2016 Azure was running its own Linux images on various parts of the gear, like the "SmartNICs" in VM hosts that cooperated with the SDN driver in Hyper-V.


[Article author here]

Not the same at all. These are not really cut down in any important way; they are full-fat, complete, standalone OSes.

That people think it's a good idea to run full bare-metal OSes in VMs is the reason why I wrote the article.


I spent a few hours building the smallest kernel that'd boot in Firecracker (https://blog.davidv.dev/minimizing-linux-boot-times.html) and got it booting in 6 ms with network support, which opens up some interesting use cases.
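
For anyone who wants to poke at this: Firecracker is driven entirely over a small HTTP API on a Unix socket, so a minimal boot can be scripted with nothing but the Python stdlib. A rough sketch, assuming firecracker was started with --api-sock /tmp/fc.sock and that the kernel and rootfs paths exist (field names are from memory; check the Firecracker API docs):

    import json, socket

    SOCK = "/tmp/fc.sock"   # assumed: firecracker --api-sock /tmp/fc.sock

    def fc_put(path, body):
        # Send one HTTP PUT with a JSON body over the API Unix socket.
        payload = json.dumps(body).encode()
        req = (f"PUT {path} HTTP/1.1\r\nHost: localhost\r\n"
               f"Content-Type: application/json\r\n"
               f"Content-Length: {len(payload)}\r\n\r\n").encode() + payload
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
            s.connect(SOCK)
            s.sendall(req)
            print(path, s.recv(4096).decode().splitlines()[0])  # status line

    fc_put("/machine-config", {"vcpu_count": 1, "mem_size_mib": 128})
    fc_put("/boot-source", {"kernel_image_path": "vmlinux-minimal",
                            "boot_args": "console=ttyS0 reboot=k panic=1"})
    fc_put("/drives/rootfs", {"drive_id": "rootfs",
                              "path_on_host": "rootfs.ext4",
                              "is_root_device": True,
                              "is_read_only": False})
    fc_put("/actions", {"action_type": "InstanceStart"})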


That’s super impressive! I have spent many hours optimizing embedded boot times, but it’s amazing how much faster it can be when the hardware is virtualized!


What would you say is prerequisite knowledge for understanding your blog post?


>"Although CMS was originally designed to run on bare metal, the version shipped as part of IBM CP/CMS was dedicated to running inside a VM.

As a thought experiment, now let's think about what a Linux system would look like if it was designed with this in mind. It will only ever be a guest, running under a parent OS. (To make life easier, we can restrict any specific edition to one individual host hypervisor.)

A lot of issues ordinary distros face just… disappear. It doesn't need an installer, because a VM image is just a file. It doesn't need an initrd, because we know the host hardware in advance: it's virtual, so it's always identical. It doesn't need to boot from disk, because it won't have disks: it will never drive any real hardware, meaning no real disks of its own. That also means no disk filesystem is needed."

It also wouldn't need most drivers -- and possibly might not need any drivers, depending on how things were configured!

Anyway, an interesting set of ideas...


I wonder why regular distros even need installers.

I mean yes, it's friendly to new users. Yes, some steps must be run that cannot be expressed as one-size-fits-all files in a tarball.

But couldn't a power user just untar a filesystem, run a script, and be good to go? I know that's a de facto installer - I just wish the tarball were easier to get at, I guess. It seems like a lot of data (deb packages) tied up inside code (one big 'install' function).


It depends how you want the "install" to go; Gentoo stage3 tarballs are exactly like that (and, unsurprisingly, the BSDs at least used to favor it). On the other hand, I'm personally quite fond of the method where you point the package manager at a root directory and it creates the system there by (more or less) just downloading and unpacking packages into that directory; off the top of my head, NixOS defaults to this approach, Arch at least used to, yum/dnf distros were quite happy to do it with a little prodding, and Debian does it with debootstrap. Anyway, the point is that this is in some ways more flexible; rather than starting with a root tarball and then adding packages, you just... install packages (almost incidentally including core/base packages).
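
For the "point the package manager at a directory" style, a minimal sketch assuming a Debian host with debootstrap installed and run as root (the suite, mirror, target path, and extra package are placeholders):

    import subprocess

    # Sketch: build a Debian root filesystem by unpacking packages straight
    # into an empty directory, then chroot in to add whatever else you need.
    target = "/mnt/newroot"   # placeholder target directory
    subprocess.run(["debootstrap", "stable", target,
                    "https://deb.debian.org/debian"], check=True)
    subprocess.run(["chroot", target, "apt-get", "install", "-y",
                    "openssh-server"], check=True)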


That's almost how you install Arch, btw.


Yup, all doable. 15 years ago, I'd build custom Xen PV images for Debian and Ubuntu VMs that would start off as running debootstrap into a chroot. It was basically just that: a directory you'd customise files in and turn into a tarball.


Shared read-only core volumes combined with write overlay volumes at the hypervisor level are the way out of this.

Containers are pseudo-isolation, lacking most or all of the isolation, capacity-allocation, and rate-limiting guarantees that type-1 hypervisors offer.

There is no substitute for better technology (like partially paravirtualized, memory-deduplicated VMs) used correctly, similar to Kata Containers.
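
A minimal sketch of the shared-base-plus-overlay idea at the image level, using qcow2 backing files (image names are made up; a hypervisor management layer would normally do this for you):

    import subprocess

    # Sketch: one shared, read-only base image plus a small copy-on-write
    # overlay per VM, via qcow2 backing files.
    BASE = "debian-core-ro.qcow2"   # placeholder shared base image

    for vm in ("web01", "db01", "build01"):
        subprocess.run(["qemu-img", "create",
                        "-f", "qcow2",              # overlay format
                        "-b", BASE, "-F", "qcow2",  # read-only backing image
                        f"{vm}-overlay.qcow2"], check=True)
    # Each VM boots from its own overlay; writes land there, while the base
    # image stays pristine and is shared (and page-cached) across all guests.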


Containers are dependency isolation, not security isolation %)


It doesn't seem like the author is aware of the irony of talking about minimizing bloat while advocating for running things in virtual machines instead of natively on a single OS. I know there's still a point to minimizing bloat where VMs *do* make sense, in business/institutional environments, but it's still funny in most use cases.


I'm reading this as using the VM for software compatibility (Linux software on Plan 9). You absolutely can do that without a VM (see: WSL1, FreeBSD/NetBSD/illumos Linux ABI compat layers, WINE, darling, vx9, ...) but it's hard - there's a reason MS eventually created WSL2. Using a VM doesn't have very much overhead these days and gives you ~perfect compatibility with minimal effort.


Hi. I'm the author.

It seems to me that you are unaware that this is part 4 of a series adapted from my 2024 main programme talk at the FOSDEM conference.

https://fosdem.org/2024/schedule/event/fosdem-2024-3095-one-...

This article is just the epilogue and you are misunderstanding it because you lack its context.

I turned the talk into 4 articles.

This one states the problem:

https://www.theregister.com/2024/02/12/drowning_in_code/

This one tries to explain the history:

https://www.theregister.com/2024/02/16/what_is_unix/

This one offers a path forwards:

https://www.theregister.com/2024/02/21/successor_to_unix_pla...

You just read the epilogue, which suggested some ways that people who aren't kernel developers -- as I am not -- could help.


> There is an urgent need for smaller, simpler software. When something is too big to understand, then you can't take it apart and make something smaller out of it.

And the first step is stopping the practice of adding six extra layers of container/VM/etc. abstraction on top of everything, making it infeasible to debug or understand.


I completely agree.

Which is why I propose 9front as an alternative point to start from.

But it has few apps, and is too different to readily port to.

So, I'm suggesting a way to efficiently run Linux apps on 9front without emulating Linux.

So we can use existing tools while working on replacements.


What's your recommendation for secure workload separation?


Yeah nothing like having databases running on the print server. Gotta avoid that bloat. I smash all my applications into one server, it's so much leaner.

Seriously though, I have no idea what your point is.


I wonder how much of an emulator it takes to run a Linux that's made to run in a VM.

I might be more enticed to build a hobby OS if I knew I could port qemu's core (it's got to be just SDL, file I/O, and keyboard/mouse, plus the other 90% of the owl, right?) to host Linux apps somehow.


It's not that hard to emulate the basic Linux system calls. If you have the user mode compiler and libc running, then you can get simple console applications up and running in a day or two.

Having looked at qemu a bit, it feels like a bigger undertaking to understand the qemu internals enough to start modifying it.

But if you can run qemu as-is, the core kernel is not that big if you have a very limited set of HW support and don't implement the more complicated system calls.
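
To make the "basic system calls" point concrete, the heart of such a shim is just a dispatch table from Linux syscall numbers to host implementations. A toy sketch (x86-64 numbers; a real shim would be driven from the guest's trap handler rather than called directly like this):

    import os, sys

    # Toy sketch: forward a couple of Linux (x86-64) syscalls to host
    # equivalents. Real code would be invoked from a trap/SIGSYS handler.
    SYS_WRITE, SYS_EXIT = 1, 60

    def emulate_syscall(nr, *args):
        if nr == SYS_WRITE:            # write(fd, buf, count)
            fd, buf, count = args
            return os.write(fd, buf[:count])
        if nr == SYS_EXIT:             # exit(status)
            sys.exit(args[0])
        return -38                     # -ENOSYS for anything unhandled

    msg = b"hello from an emulated write()\n"
    emulate_syscall(SYS_WRITE, 1, msg, len(msg))
    emulate_syscall(SYS_EXIT, 0)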


I'm not quite clear: Is this a writeup of ideas that the author thinks should be implemented, or description of something they did implement?


I'm the author.

It's neither.

It's the closing 2 minutes of my FOSDEM talk this year, turned into an article.

I've already linked to the talk and the other 3 articles I adapted from it in this comment thread, here:

https://news.ycombinator.com/item?id=39928177

But you know what, in fairness, it's closer to "what the author thinks should be implemented".

I reckon I could have a pretty good stab at building a Linux-for-VMs like this, but it's nothing to do with my actual day job and it's not the sort of thing I do for fun. I am unlikely to try unless someone paid me to.

(My actual "vanishingly little time outside of $DAYJOB" project at the moment concerns DOS and running DOS on modern-ish hardware. It's nothing to do with Linux at all.)

I'm much more interested in trying to turn 9front into something that can run modern Linux apps, transparently, without making 9front much bigger or more complicated than it is and without the considerable maintenance overhead of emulating the moving target that is Linux.

But that is something I am not even thinking about, for two reasons:

[1] I definitely can't do it.

[2] I am more interested in non-Unix-like OSes such as Oberon or Smalltalk.

I gave a FOSDEM talk on that, too.

https://archive.fosdem.org/2021/schedule/event/new_type_of_c...

Here's an article version:

https://www.theregister.com/2024/02/26/starting_over_rebooti...


Makes me wonder, what happened to unikernels? Are people still working on them?

Between containers and serverless functions-as-a-service taking over in the cloud, it seems they ran out of what little momentum they had.

http://unikernel.org/


The key enabler of market share is running unmodified software. Few can be bothered to recompile their software for a unikernel, or switch to a unikernel-specific runtime provided by a third party (compatibility, CVE reaction time, etc).

I suspect large cloud providers may be running their custom internal stuff using unikernels and reaping resource economy and security benefits, without us knowing much about that.


People bothered to containerize their software because the benefits offset the work. Unikernels shouldn't be much harder but they haven't successfully made that argument.


The management interfaces for VMs suck compared to all the advances made for containers, so unikernels have to be quite compelling for people to suffer the worse UX to adopt them.


If your programs are all unikernels running on a hypervisor, haven't you just reinvented a regular operating system with more and stronger abstractions?


Folks at https://unikraft.io just released their (closed) beta. Seems like there's still interest.


Thanks for posting this!

I posted it myself at the time, but it made little impact:

https://news.ycombinator.com/item?id=39483019

It's important to know that this article is just part 4 in a series.

Here is the rest:

https://www.theregister.com/Tag/One%20Way%20Forward/

It is based on this FOSDEM 2024 talk I gave in February:

https://fosdem.org/2024/schedule/event/fosdem-2024-3095-one-...


Lots of interesting ideas, but I'd like to know how much faster a purpose-built Linux-for-VMs would be in practice, and how much less RAM it would use. I'm guessing not as much as one would initially think.


Why not use containers? They even share the kernel.


Because they can't share the kernel; their goal is to run Linux programs on a Plan 9 host.


[Article author here]

Because this article is part #4 of a series. It is not intended to run on a Linux host machine. The real reason for doing it is to confer Linux binary compatibility to a non-Linux OS.

There are links to the rest of the series here in the comments: https://news.ycombinator.com/item?id=39928177



