Krunvm is extremely easy to use and is packed with some interesting ideas.
One of the biggest advantages of this VMM is that programs have access to the network inside the VM without the admin having to set up complex virtual bridges and the like on the host in advance, or use something like slirp. This is accomplished via TSI (Transparent Socket Impersonation).
Basically, sockets in the guest are bridged to AF_VSOCK by a patched Linux kernel (applied when you build libkrunfw.so) whenever they communicate outside the VM. See https://www.youtube.com/watch?v=EGV03THGrrw for more info on TSI.
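To make that concrete, here is a hypothetical guest-side sketch (not libkrun's actual code; the port number is made up) of the vsock primitive the intercepted traffic ends up riding on: the guest connects to the host's well-known CID, and the host side then makes the real TCP connection on its behalf.

    /* vsock_client.c - hypothetical sketch of the AF_VSOCK primitive that
     * TSI forwards guest traffic onto. The port (1234) is made up for
     * illustration; this is not libkrun's actual wire protocol.
     * Build with: cc vsock_client.c -o vsock_client */
    #include <stdio.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <linux/vm_sockets.h>

    int main(void)
    {
        struct sockaddr_vm addr = {
            .svm_family = AF_VSOCK,
            .svm_cid    = VMADDR_CID_HOST, /* CID 2: the host/hypervisor side */
            .svm_port   = 1234,            /* made-up port for the example   */
        };
        int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

        if (fd < 0) {
            perror("socket(AF_VSOCK)");
            return 1;
        }
        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("connect");
            close(fd);
            return 1;
        }
        printf("connected to the host over vsock\n");
        close(fd);
        return 0;
    }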
My only concern is that TSI is not currently a feature available in mainline Linux. When do the authors plan to upstream it? My understanding is that this was planned for 2021, but it is now 2022...
Maybe instead of patching the kernel, the "init" process of the VM could set up a seccomp-notify sandbox to handle the socket syscalls in userspace, backing the TCP/UDP sockets with a vsock (I think read/write and sendmsg/recvmsg would work without userspace handling because they already operate on the vsock fd).
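A rough sketch of that idea (my own illustration, assuming libseccomp >= 2.5 and a kernel with seccomp user notification): trap only socket(), log it in a supervisor thread, and let it continue. A real shim would instead open an AF_VSOCK socket in the supervisor and inject it into the caller with SECCOMP_IOCTL_NOTIF_ADDFD.

    /* tsi_sketch.c - intercepting socket() with seccomp user notification
     * instead of a patched kernel. Only a sketch: the intercepted call is
     * logged and allowed to continue unchanged.
     * Build with: cc tsi_sketch.c -lseccomp -lpthread */
    #include <pthread.h>
    #include <seccomp.h>
    #include <stdio.h>
    #include <sys/socket.h>
    #include <unistd.h>

    #ifndef SECCOMP_USER_NOTIF_FLAG_CONTINUE
    #define SECCOMP_USER_NOTIF_FLAG_CONTINUE (1UL << 0)
    #endif

    /* Supervisor thread: receives one notification for socket() and lets
     * it continue. A TSI-style shim would open an AF_VSOCK socket here
     * and inject it with the SECCOMP_IOCTL_NOTIF_ADDFD ioctl instead. */
    static void *supervise(void *arg)
    {
        int nfd = *(int *)arg;
        struct seccomp_notif *req;
        struct seccomp_notif_resp *resp;

        seccomp_notify_alloc(&req, &resp);
        if (seccomp_notify_receive(nfd, req) == 0) {
            printf("socket(domain=%llu, type=%llu) intercepted\n",
                   (unsigned long long)req->data.args[0],
                   (unsigned long long)req->data.args[1]);
            resp->id = req->id;
            resp->error = 0;
            resp->val = 0;
            resp->flags = SECCOMP_USER_NOTIF_FLAG_CONTINUE; /* run it normally */
            seccomp_notify_respond(nfd, resp);
        }
        seccomp_notify_free(req, resp);
        return NULL;
    }

    int main(void)
    {
        scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_ALLOW);
        pthread_t tid;
        int nfd, sock;

        /* Trap only socket(); everything else is allowed through. */
        seccomp_rule_add(ctx, SCMP_ACT_NOTIFY, SCMP_SYS(socket), 0);
        seccomp_load(ctx);
        nfd = seccomp_notify_fd(ctx);

        pthread_create(&tid, NULL, supervise, &nfd);

        /* This call blocks until the supervisor thread responds. */
        sock = socket(AF_INET, SOCK_STREAM, 0);
        printf("got fd %d back from the intercepted socket()\n", sock);

        pthread_join(tid, NULL);
        close(sock);
        return 0;
    }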
As far as I understand, any protocol that runs over IP (TCP, UDP, ICMP, ...) should be supported by slirp (e.g. slirp4netns), since it sets up a TAP device.
Mmmmmm, Rust and Go, working together. Shout out to the buildah team and the krunvm folks. This is very cool. The icing on the cake is the aarch64 support for Apple silicon. Kudos to you all. I can now run experimental Linux images as VMs and see if they'll blow up.
Since it's a VM, it's ideal for workloads with a fixed amount of resource use that need strong isolation guarantees. Regular containers are better for sharing a pool of resources whose usage varies widely, and when you don't need strong isolation. Depending on how I/O is handled, container I/O can be very slow, whereas a dedicated disk snapshot without CoW/overlays would be much faster. Since this uses TSI for networking, you need a patched guest kernel to get any networking in the guest, and raw sockets don't work at all.
Container file I/O is very slow. The runtime unpacks the OCI image layers onto the regular host filesystem, then adds overlay filesystems, does copy-on-write, and references files between each layer. For example, doing 10 containerized nodejs app builds simultaneously will swamp the host with iowait. A common hack is to put the OCI file tree / overlays on a dedicated disk with much higher iops than the boot disk.
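For reference, the layering described above boils down to an overlayfs mount roughly like the sketch below (all paths made up; in practice the container runtime issues the equivalent mount(2) with the read-only image layers as lowerdirs and a per-container writable upperdir):

    /* overlay_sketch.c - rough illustration of the overlay mount a container
     * runtime sets up. Paths are made up; needs CAP_SYS_ADMIN and existing
     * directories to actually run. */
    #include <stdio.h>
    #include <sys/mount.h>

    int main(void)
    {
        /* Image layers are read-only lowerdirs; writes go to upperdir (CoW). */
        const char *opts =
            "lowerdir=/var/lib/layers/app:/var/lib/layers/base,"
            "upperdir=/var/lib/containers/c1/upper,"
            "workdir=/var/lib/containers/c1/work";

        if (mount("overlay", "/var/lib/containers/c1/merged",
                  "overlay", 0, opts) < 0) {
            perror("mount(overlay)");
            return 1;
        }
        printf("overlay mounted\n");
        return 0;
    }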
> It unpacks the OCI image layers onto the regular host filesystem, then adds overlay filesystems, does copy-on-write, and references files between each layer.
That's just Docker though, right? Does LXC or systemd-nspawn do that?
Thank you, I'll have to look into this. I was thinking from a file namespacing perspective there shouldn't be overhead, but it makes sense that adding the overlay filesystems and mounts would impact performance.
That depends on the microVM. Firecracker has no support for devices like GPUs, which is part of what makes it suitable for multitenant workloads. Something like QEMU has far more device support but is also significantly easier to escape from.
Very cool looking project. I love their approach to networking, which patches the Linux kernel to intercept socket operations and defer them to the host.
I’ve been working in a similar area recently and networking is an unfortunate stumbling block.
If it is OCI compatible, it technically means you could use Kubernetes or another container orchestrator to orchestrate these microVMs. I wonder if krunvm already works with Kubernetes.
If I'm understanding correctly, it's OCI compatible in the other direction - it consumes OCI compatible images, but it doesn't expose an OCI compatible layer on top for orchestration.
kube-virt[1] is a thing, though, that provides k8s orchestration for VMs. I don't see why you couldn't use krunvm microVMs with that.