"Zone" is our jargon for a virtual machine guest environment (an homage to Solaris Zones). Styrolite and Edera run containers inside virtual machine guests for improved isolation and resource management.
We run unmodified containers in a VM guest environment, so you get the developer ergonomics of containers with the security and hardware controls of a VMM.
gVisor runs a userspace kernel that proxies syscalls to a shared host kernel. Running an "application kernel" in userspace hurts performance because work goes through two schedulers. Virtual machine isolation is stricter because the guest doesn't share any kernel state with other containers. We have a whitepaper that compares the performance of gVisor and Styrolite/Edera if you want to see the differences: http://arxiv.org/abs/2501.04580
gVisor emulates a kernel in userspace, providing some isolation but still relying on a shared host kernel. The recent Nvidia GPU container toolkit vulnerability allowed privilege escalation and container escape to the host because of a shared inode.
Styrolite runs containers in a fully isolated virtual machine guest with its own kernel, which is not shared with the host. Styrolite doesn't run a userspace kernel that traps syscalls; it runs a type 1 hypervisor for better performance and security. You can read more in our whitepaper: http://arxiv.org/abs/2501.04580
Thanks for the explanation. So you are using virtualisation-based techniques. I had incorrectly inferred from other comments that you were not.
I skimmed the paper and it suggests your hypervisor can work without CPU-based virtualisation support - that's pretty neat.
Many cloud environments do not have support for nested virtualisation extensions available (and also it tends to suck, so you shouldn't use it for production even if it is available). So there aren't many good options for running containers from different security domains on the same cloud instance. gVisor has been my go-to for that up until now. I will be sure to give this a shot!
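For anyone who wants to check whether their instance exposes hardware virtualisation extensions at all, here is a minimal sketch. It assumes an x86 Linux guest, where /proc/cpuinfo lists `vmx` (Intel VT-x) or `svm` (AMD-V) in the `flags` line when the extensions are visible. The function name `has_hw_virt` is made up for this example and is not part of any tool mentioned in the thread.

```python
def has_hw_virt(cpuinfo_text: str) -> bool:
    """Return True if any CPU entry advertises vmx (Intel) or svm (AMD).

    Assumption: the input is the text of /proc/cpuinfo on x86 Linux.
    Other architectures use different field names and flags.
    """
    for line in cpuinfo_text.splitlines():
        # Each line looks like "flags\t\t: fpu vme de ..."
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            if {"vmx", "svm"} & set(value.split()):
                return True
    return False
```

Calling it as `has_hw_virt(open("/proc/cpuinfo").read())` on a typical cloud instance without nested virtualisation would be expected to return False, which is the situation described above.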
Yes, precisely. This also gives container operators the benefits of a hypervisor, like memory ballooning and dynamic allocation of CPU and memory to workloads, which improves resource utilization and reduces the node overprovisioning that is common today.
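To make the ballooning idea concrete, here is a toy model of the bookkeeping. It is not how Styrolite or any real hypervisor implements it: the `Guest` class and its fields are invented for illustration, and a real balloon driver works on pages inside the guest kernel rather than on integers.

```python
from dataclasses import dataclass


@dataclass
class Guest:
    """Toy guest: all names here are hypothetical, for illustration only."""
    name: str
    allocated_mib: int      # memory assigned to the guest
    balloon_mib: int = 0    # memory held by the balloon, i.e. handed back to the host

    @property
    def usable_mib(self) -> int:
        # What the guest workload can actually use right now.
        return self.allocated_mib - self.balloon_mib

    def inflate(self, mib: int) -> int:
        """Grow the balloon; returns the MiB actually reclaimed for the host."""
        reclaimed = min(mib, self.usable_mib)
        self.balloon_mib += reclaimed
        return reclaimed

    def deflate(self, mib: int) -> int:
        """Shrink the balloon; returns the MiB given back to the guest."""
        released = min(mib, self.balloon_mib)
        self.balloon_mib -= released
        return released
```

Inflating the balloon in an idle guest frees memory the host can hand to a busier one, and deflating gives it back, which is why you can size nodes closer to actual usage instead of peak usage.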
Non-root containers still operate under a shared kernel. If that kernel is vulnerable, a non-root container can still be used for privilege escalation and container escape.
Styrolite is a container runtime engine that runs containers in a virtual machine guest environment with no shared kernel state. It uses a type 1 hypervisor to fully isolate a running container from the node and from other containers. It's similar to Firecracker or Kata Containers, but it doesn't require bare metal instances (it runs on standard EC2, etc.) and it uses paravirtualization.