Hi HN,
We are Cesar Talledo and Rodny Molina, co-founders of Nestybox (www.nestybox.com).
Nestybox has developed a new container runtime that sits under Docker/containerd (it's a new type of runc). It enables containers to act as virtual servers that can run system software such as systemd, Docker, and Kubernetes easily and with proper isolation.
The motivation came from noticing that containers are great at running microservices but struggle to run system-level software such as the above. To run such software in a container, we needed insecure privileged containers with complex images, custom entrypoints, volume mounts, etc., or alternatively a heavier virtual machine. This did not seem right.
We studied the problem and found that the container abstraction was incomplete: inside the container, a root process lacked the capabilities to perform certain low-level operations, the namespacing of procfs and sysfs had a few holes, running overlayfs-on-overlayfs had limitations, and more.
To solve this, we decided to create a new container runtime that sets up the container so that it can run system software easily and without resorting to privileged containers. That is, a user should be able to run "docker run -it some-image" and get a container inside of which she can run systemd, dockerd, or even K8s without problems (much as if it were a virtual machine).
After lots of long days, we came up with Sysbox. It's a new type of "runc" and sits below OCI-based container managers (e.g., Docker/containerd). You typically don't interact with Sysbox directly, but rather use Docker (or similar) to launch the containers. Sysbox was forked from the excellent OCI runc in early 2019 and has undergone significant changes since then. It uses OS virtualization techniques such as always enabling the Linux user namespace, uid shifting via shiftfs, partial virtualization of procfs and sysfs, selective syscall trapping in user-space, setting up special mounts into the container, and more. It's written in Go.
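To make the user-namespace point above a bit more concrete: with a user namespace, root inside the container is mapped to an unprivileged UID on the host, using a table in the format of /proc/<pid>/uid_map ("ID-inside ID-outside length"). The Go sketch below only illustrates that translation; it is not Sysbox code, and the names (idMapEntry, toHost) and the 165536-based range are hypothetical values chosen for the example.

```go
package main

import "fmt"

// idMapEntry mirrors one line of /proc/<pid>/uid_map:
// "<ID inside the namespace> <ID outside the namespace> <length>".
// Hypothetical type, for illustration only.
type idMapEntry struct {
	Inside  uint32
	Outside uint32
	Length  uint32
}

// toHost translates a UID as seen inside the container's user namespace
// into the host UID it corresponds to. ok is false if the UID is unmapped.
func toHost(m []idMapEntry, uid uint32) (host uint32, ok bool) {
	for _, e := range m {
		// uid >= e.Inside guarantees the subtraction cannot underflow.
		if uid >= e.Inside && uid-e.Inside < e.Length {
			return e.Outside + (uid - e.Inside), true
		}
	}
	return 0, false
}

func main() {
	// Hypothetical mapping: container UIDs 0..65535 -> host UIDs 165536..231071.
	m := []idMapEntry{{Inside: 0, Outside: 165536, Length: 65536}}
	for _, uid := range []uint32{0, 1000, 70000} {
		if h, ok := toHost(m, uid); ok {
			fmt.Printf("container uid %d -> host uid %d\n", uid, h)
		} else {
			fmt.Printf("container uid %d is unmapped\n", uid)
		}
	}
}
```

Under this mapping, "root" in the container (UID 0) is host UID 165536, which holds no special privileges on the host. Files created in the container would then be owned by shifted UIDs on the host, which is the problem that uid shifting via shiftfs is meant to address.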
Here is a video: https://asciinema.org/a/kkTmOxl8DhEZiM2fLZNFlYzbo?speed=1.75
Today we are happy to announce that we are open-sourcing Sysbox (Apache 2.0). You can find it at https://github.com/nestybox/sysbox . We welcome users and contributors, as it has plenty of room to grow and improve. The repo has extensive docs describing how to use Sysbox and how it works.
We think Sysbox is a very useful tool that expands the use cases for containers and provides an alternative to virtual machines in many scenarios, particularly dev environments, testing, CI/CD, and even running legacy apps in containers.
To pay the bills, Nestybox (the company we founded) will sell a version of Sysbox called Sysbox Enterprise Edition (Sysbox-EE). We are using an open-core model: Sysbox-EE is based on the open-source Sysbox and adds a layer of proprietary enterprise-level features. We think this model will help us strike a healthy balance between creating useful technology that all can benefit from and keeping the lights on.
Thanks for reading and we welcome your feedback.
Best,
-Cesar & Rodny
Docker is missing a bunch of features that some software needs to work, which is why you can't run Docker inside Docker by default. Instead of dropping from containers all the way down to hardware virtualization, Sysbox "augments" containers with the missing features by simulating them in userland. That gives you all the power of a VM without any of the downsides: slow start-up, provisioning blocks of memory, not being able to run them on EC2, etc.
It reminds me a bit of user-mode Linux [0], weirdly. There's something kinda interesting about simulating a bunch of the kernel in userland.
[0] https://en.wikipedia.org/wiki/User-mode_Linux