Krunvm – Create MicroVMs from OCI Images (github.com/containers)
127 points by tsujp on Aug 16, 2022 | 21 comments



Krunvm is extremely easy to use and is packed with some interesting ideas.

One of the biggest advantages of this VMM is that programs inside the VM have access to the network without the admin having to set up complex virtual bridges and so forth on the host in advance, or use something like slirp. This is accomplished via TSI (Transparent Socket Impersonation).

Basically, when sockets in the guest communicate outside the VM, they are bridged to AF_VSOCK by a patched Linux kernel (the one you get when you build libkrunfw.so). See https://www.youtube.com/watch?v=EGV03THGrrw for more info on TSI.
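To make the vsock part concrete, here is a minimal Python sketch of what a guest program would have to do by hand to reach the host over AF_VSOCK if nothing impersonated its sockets for it. This is not how TSI is implemented (that happens in the patched kernel); the function names `vsock_address` and `connect_to_host` and the port number are made up for illustration.

```python
import socket
from typing import Tuple

# The host side of a vsock link has the well-known context ID 2.
# Fall back to the literal when the constant is missing (non-Linux builds).
HOST_CID = getattr(socket, "VMADDR_CID_HOST", 2)


def vsock_address(port: int, cid: int = HOST_CID) -> Tuple[int, int]:
    """AF_VSOCK addresses are (context ID, port) pairs, not (IP, port)."""
    if not 0 <= port < 2**32:
        raise ValueError("vsock ports are unsigned 32-bit integers")
    return (cid, port)


def connect_to_host(port: int) -> socket.socket:
    """Open a stream connection from the guest to a listener on the host."""
    if not hasattr(socket, "AF_VSOCK"):
        raise OSError("AF_VSOCK is not available on this platform")
    sock = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
    sock.connect(vsock_address(port))
    return sock


if __name__ == "__main__":
    # Hypothetical port; something on the host must be listening on it.
    print(vsock_address(1234))
```

The point of TSI, as I understand it, is that unmodified programs keep calling `socket(AF_INET, ...)` and never have to know about any of this.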

My only concern is that TSI is currently not a feature available in Linux. When do the authors plan to upstream this into Linux proper? My understanding is that this was planned in 2021 but it is now 2022...


Maybe instead of patching the kernel, the "init" process of the VM could set up a seccomp-notify sandbox to handle the socket syscalls in userspace and back the TCP/UDP sockets with a vsock (I think that read/write or sendmsg/recvmsg would work without userspace handling because they get the vsock fd).


> ...or use something like slirp.

Does slirp support protos other than UDP and TCP? Apparently, TSI doesn't: https://github.com/containers/libkrun/blob/1af2e7236d1/READM...


As far as I understand, any protocol that runs over IP (TCP, UDP, ICMP, ...) should be supported by slirp (i.e. slirp4netns) as it sets up a TAP device.


P.S. Actually slirp4netns also needs to implement a network stack for the protocol in user space so it also depends on what protocols it understands.

I wonder if slirp4netns understands anything other than TCP/UDP/ICMP.

OTOH I don't think ping (ICMP) is possible with TSI (but maybe I needed to do some other config to make it work).
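A rough Python sketch of why a user-space stack only "supports" what it implements: it has to look at the protocol field of each IPv4 packet coming off the TAP device and hand it to a handler it actually has. The handler table below (ICMP, TCP, UDP) is an assumption for illustration, not a statement about what slirp4netns really implements, and all function names are hypothetical.

```python
import struct

# IANA protocol numbers for the protocols this toy stack knows about.
# Which protocols a real user-space stack handles is an assumption here.
HANDLED = {1: "icmp", 6: "tcp", 17: "udp"}


def ipv4_protocol(packet: bytes) -> int:
    """Return the protocol number stored in byte 9 of an IPv4 header."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    return packet[9]


def dispatch(packet: bytes) -> str:
    """Name the handler for a packet, or 'dropped' if none exists."""
    return HANDLED.get(ipv4_protocol(packet), "dropped")


def make_header(proto: int) -> bytes:
    """Build a bare 20-byte IPv4 header (checksum left at zero) for testing."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20, 0, 0, 64, proto, 0,
        bytes([10, 0, 2, 15]), bytes([10, 0, 2, 2]),
    )


if __name__ == "__main__":
    for proto in (1, 6, 17, 47):  # 47 is GRE, not in the table
        print(proto, dispatch(make_header(proto)))
```

Anything whose protocol number is not in the table simply never reaches the outside world, regardless of what the TAP device itself can carry.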


Mmmmmm, Rust and Golang, working together. Shout out to the buildah team and the krunvm folks. This is very cool. The icing on the cake was the aarch64 support for Apple. Kudos to you all. I can now run experimental Linux images as VMs and see if they'll blow up.


Ask HN: what workload is not suitable for microVMs? Can I use them like a regular VM?


Since it's a VM, it's ideal for workloads that have a set amount of resource use and need strong isolation guarantees. Regular containers are better for sharing a pool of resources whose usage varies widely, and when you don't need strong isolation guarantees. Depending on how I/O is handled, container I/O can be very slow, whereas a dedicated disk snapshot without CoW/overlays would be much faster. Since this also uses TSI for networking, you need a patched Linux kernel to use networking in the guest at all, and raw sockets don't work.


> Depending on how I/O is handled, container I/O can be very slow, whereas a dedicated disk snapshot without CoW/overlays would be much faster.

Do you mean VM I/O can be very slow? I don't think containers should have any overhead, please correct me if I'm wrong though.


Container file I/O is very slow. It unpacks the OCI image layers onto the regular host filesystem, then adds overlay filesystems, does copy-on-write, and references files between each layer. For example, doing 10 containerized Node.js app builds simultaneously will swamp the host with iowait. A common hack is to put the OCI file tree / overlays on a dedicated disk with much higher IOPS than the boot disk.
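As a rough illustration of where the extra work comes from, here is a toy in-memory model of layered lookup plus copy-on-write, written in Python. It is not overlayfs or any container runtime's code; the class `OverlaySketch` and its counters are invented for this example.

```python
class OverlaySketch:
    """Toy model: read-only lower layers plus one writable upper layer."""

    def __init__(self, lower_layers):
        # lower_layers is a list of dicts (path -> bytes), bottom layer first.
        self.lowers = lower_layers
        self.upper = {}
        self.copy_ups = 0  # how many whole files were copied before a write

    def read(self, path):
        # The upper layer wins; otherwise search lower layers from the top down.
        if path in self.upper:
            return self.upper[path]
        for layer in reversed(self.lowers):
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

    def append(self, path, data):
        # The first write to a file that only exists in a lower layer
        # copies the whole file into the upper layer ("copy-up").
        if path not in self.upper:
            try:
                self.upper[path] = self.read(path)
                self.copy_ups += 1
            except FileNotFoundError:
                self.upper[path] = b""
        self.upper[path] += data


if __name__ == "__main__":
    fs = OverlaySketch([{"/app/package.json": b"{}"}, {"/app/index.js": b"//"}])
    fs.append("/app/index.js", b" changed")
    print(fs.read("/app/index.js"), fs.copy_ups)
```

In this model every first write to an image file pays for a full copy, and every read of an untouched file walks the layer stack, which is the kind of extra work that adds up when many builds hit the same disk at once.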


> It unpacks the OCI image layers onto the regular host filesystem, then adds overlay filesystems, does copy-on-write, and references files between each layer.

That's just Docker though, right? Does LXC or systemd-nspawn do that?


Thank you, I'll have to look into this. I was thinking from a file namespacing perspective there shouldn't be overhead, but it makes sense that adding the overlay filesystems and mounts would impact performance.


These specific microVMs are managed by: https://github.com/containers/libkrun#goals-and-non-goals (linked directly to project scopes).

In summary though (others omitted):

    # Goals
      - Be compatible with a reasonable amount of workloads.

    # Non-goals
      - Be compatible with all kinds of workloads.


That depends on the microVM. Firecracker doesn't support devices like GPUs, and that small device surface is also what makes it suitable for multitenant workloads. Something like QEMU has far more device support but is also significantly easier to escape out of.


Very cool looking project. I love their approach to networking, which patches the Linux kernel to intercept operations on sockets and defers them to the host.

I’ve been working in a similar area recently and networking is an unfortunate stumbling block.


It looks like crun already supports krun: https://github.com/containers/crun/blob/main/crun.1.md


Is this something like Firecracker?


It runs on top of https://github.com/containers/libkrun. Similar to Firecracker, but it seems specifically targeted at making microVMs out of OCI containers (via CLI), as opposed to Firecracker, which uses a downloaded kernel and rootfs and is managed via an API, à la https://github.com/firecracker-microvm/firecracker/blob/main....

But I'm no expert, just my armchair take on things.


If it is OCI compatible, it technically means you could use Kubernetes or another container orchestrator to orchestrate these microVMs. I wonder whether krunvm already works with Kubernetes.


If I'm understanding correctly, it's OCI compatible in the other direction - it consumes OCI compatible images, but it doesn't expose an OCI compatible layer on top for orchestration.

KubeVirt[1] is a thing, though, that provides k8s orchestration for VMs. I don't see why you couldn't use krunvm microVMs with that.

[1] https://github.com/kubevirt/kubevirt


Probably more like `weaveworks/ignite`, which creates Firecracker microVMs from OCI images.



