> I think in the future there will be a big shift off of runc (docker) as the k8s default runtime now that CRIO has made them pluggable.
I don't think there's going to be a big shift away from runc (though I'm biased, I'm one of the runc maintainers -- and runc is quite separate from Docker) for a few reasons:
1. Containers are still more than decent, and will always handle certain use cases and setups better than VMs can (due to the pliability of namespaces and shared kernels -- VMs have the fixed DRAM problem just like they always have).
2. At the moment, plain containers are arguably more secure than Kata containers (though this can be fixed "fairly" easily with some minor memory penalties) because Kata disables a bunch of security features in its VM kernels -- so you don't get seccomp or AppArmor protections for your containers. Now, you do get hypervisor security, but there was a study some time ago which claimed that a well-tuned seccomp profile is about as secure as a hypervisor.
3. Hooks (like the NVIDIA ones) will always work better with plain containers because the whole idea behind hooking into a container runtime is that you can attach things to the containers' namespaces (with NVIDIA this would be vGPUs). Kata is trying (and succeeding in most cases) to emulate these sorts of pluggable components with their agent, but fundamentally they're trying to pretend to be a container (which is going to cause problems).
I think Kata is a really good project (and I'm happy that Intel and Hyper.sh joined forces), but I don't think it will replace ordinary containers entirely (even under Kubernetes). But hey, I could be proven wrong -- at which point I'll switch to working on LXC. :P
I noticed that Kata has started integrating Firecracker, which will be interesting and should help their performance and security stories going forward.
I'd agree though that Kata (or other VM-based containerization solutions) won't completely replace runc-based solutions.
One of the things I like about standard Linux containers is the ease with which you can remove part of the isolation without turning it all off or on. Being able to easily do `--net=host` or add a capability is very handy in some circumstances.
Also the security story definitely isn't as clear as "VMs > containers". Every isolation layer has had breakouts in the last year: VMs, gVisor, Linux containers.
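To make the "remove part of the isolation" point concrete: with `--net=host` the container's processes simply stay in the host's network namespace while the other namespaces remain separate. A rough way to check that from the host is to compare the namespace inodes under /proc. This is only a sketch, assuming Linux and enough permission to read the other process's /proc entry; `shares_namespace` is a name I made up for illustration.

```python
import os

def shares_namespace(pid_a, pid_b, ns="net"):
    """Return True if both PIDs are in the same namespace of the given type.

    Hypothetical helper: /proc/<pid>/ns/<type> resolves to an nsfs inode,
    and two processes share a namespace exactly when device and inode match.
    """
    a = os.stat(f"/proc/{pid_a}/ns/{ns}")
    b = os.stat(f"/proc/{pid_b}/ns/{ns}")
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)

if __name__ == "__main__":
    # A process trivially shares every namespace with itself; in practice
    # you would pass the container's init PID and a host PID instead.
    print(shares_namespace(os.getpid(), os.getpid(), "net"))
```

With a VM-based runtime there is no equivalent per-namespace knob to inspect from the host, since the workload's namespaces live inside the guest kernel.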
> What problems do you see/think arise from kata pretending to be a container?
There are a few.
One of the most obvious is that anything that requires fd passing simply cannot work, because file descriptors can't be transferred through the hypervisor (obviously). This means that certain kinds of console handling are completely off the table (in fact this was a pretty big argument between us and the Hyper.sh folks about 2 years ago now -- in runc we added this whole --console-socket semantic to allow for container-originated PTYs to be passed around, and you cannot do that with VM-based runtimes without some pretty awful emulation). But it turns out that most layers above us now just have high-level PTY operations like resizing (which I think is uglier and less flexible, but that's just my personal opinion).
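For anyone who hasn't seen fd passing before, here is roughly what the mechanism looks like at the syscall level. This is not runc's code, just a minimal sketch in Python (3.9+, Unix only) where a socketpair stands in for the console socket and a PTY master is sent across it with SCM_RIGHTS; `pass_pty_master` is a made-up name.

```python
import os
import socket

def pass_pty_master():
    """Send a PTY master fd over a Unix socket and use the received copy."""
    # The socketpair is a stand-in for a console socket between two processes.
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    master, slave = os.openpty()
    received = None
    try:
        # SCM_RIGHTS transfer: the kernel installs a new fd on the receiving
        # side that refers to the same open file description.
        socket.send_fds(sender, [b"pty"], [master])
        _msg, fds, _flags, _addr = socket.recv_fds(receiver, 1024, 1)
        received = fds[0]

        # Whatever is written to the slave side is readable through the
        # received master fd, just as it would be through the original.
        os.write(slave, b"hello\n")
        return os.read(received, 1024).strip()
    finally:
        for fd in (master, slave, received):
            if fd is not None:
                os.close(fd)
        sender.close()
        receiver.close()

if __name__ == "__main__":
    print(pass_pty_master())
```

The key part is that the kernel does the transfer, which only works when both ends are processes on the same kernel. With a guest kernel behind a hypervisor there is no shared file table to hand the descriptor across, so all you can do is proxy the bytes.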
Another one is that runtime hooks (such as LXC or OCI hooks) are now a bit more difficult to use. There's nothing stopping you from doing CNI with Kata, but it's one of those things where either the hook knows that it's working with a VM (which requires hook work) or the hook is tricked into thinking it's dealing with a container (which requires lots of forwarding work, or running the hook in the VM). I'm really not sure how Kata handles this problem -- but the last time I spoke to the Kata folks the answer was "well, we're OCI compliant" which isn't really an answer IMHO (they also cannot be OCI compliant, because OCI compliance testing still doesn't exist -- but that's a different topic). I imagine their point was "we copy runc", which is unfortunately what most people mean when they say "OCI compliance".
There was a recent issue a colleague of mine (who works on Kata) mentioned, which is that currently "docker top" operates by getting the list of PIDs from the runtime and then fetching /proc information about them. Obviously this won't work with Kata and will require some pretty big changes to containerd and Docker to handle this (though I would argue this would be a good thing overall -- the current way people handle getting host PIDs for container processes is quite dodgy). There is currently some kernel work being done by Christian Brauner to add a new concept called procfds, and all of this work will be completely useless for Kata (even though it'll fix many PID races that exist).
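The "docker top" pattern described above boils down to something like the following. This is not Docker's or containerd's actual code, just a sketch of the approach (Linux only, and `top` is a name I picked): take the PIDs the runtime reports and look each one up in the host's /proc.

```python
import os

def top(pids):
    """Return (pid, command name) for each PID still visible in the host's /proc."""
    rows = []
    for pid in pids:
        try:
            with open(f"/proc/{pid}/comm") as f:
                rows.append((pid, f.read().strip()))
        except (FileNotFoundError, ProcessLookupError):
            # The process exited between listing and lookup. If the PID had
            # been recycled instead, we would silently report the wrong
            # process, and nothing in this code can detect that.
            continue
    return rows

if __name__ == "__main__":
    # Our own PID stands in for the list a runtime would hand back.
    print(top([os.getpid()]))
```

With Kata the container's processes live behind the guest kernel, so the host's /proc has no entries for them and a lookup like this comes back empty (or, worse, matches unrelated host processes).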
But as I said, Kata is quite an interesting project (the work done for the agent is quite interesting) and it fulfills a very important need -- people are still worried about container security, and adding a lightweight hypervisor will allay those fears.