Hacker News

I feel like containers and Kubernetes are microkernels' revenge.

They are for all practical purposes fulfilling the same role.




Not at all? If anything they’re filling the opposite role. Microkernels are about building interfaces which sandbox parts of the kernel. Namespaces are about giving sandboxed userlands full access to kernel interfaces.


Namespaces are one form of capabilities.

Additionally, a Linux kernel that exists for the sole purpose of keeping KVM running, while everything that powers a cloud workload is a Kubernetes pod, is nothing more than a very fat microkernel in terms of usefulness.


The dose makes the poison; we're still a long way from fully embracing microkernels and capabilities. Security is a holistic property and encompasses finer details too. I want a small TCB. I want capabilities pervasively. And in pursuit of modularity and abstraction, I want to be able to choose the components I want and take those burdens myself. It's a bit silly seeing the nth SIGOPS-SOSP paper on how Linux can be improved by integrating userspace scheduling.


It is the same with safer systems programming languages: we have had the concept since 1961, but apparently making the industry take the right decisions is a tenuous path until something finally makes good ideas stick and gain adoption.


Microkernel does not mean it uses capabilities. And "very fat microkernel" is an oxymoron. The definition of a microkernel is that it does as little as possible in the kernel.


Of course it doesn't.

The point is how Linux is being tamed to provide some of the concepts, in spite of its monolithic design.

But naturally we can discuss minutiae instead.


It’s not “some of the concepts” nor minutiae, though, it’s literally the difference between a microkernel and a monolithic kernel.

Presenting capabilities / namespaces to userland is a completely different and, in the case of Linux, orthogonal thing to presenting capabilities/namespaces to kernel services. I guess you could argue that the concept of capabilities came from microkernels, but when it’s applied to only user space, it’s just not really related to a microkernel anymore at all.

That’s basically the whole problem with capabilities and especially their application in namespaces from a security standpoint in Linux: they try to firewall these little boxes from each other but the kernel they’re all talking to is still one big blob. And this difference is meaningful in a security sense, not just some theory hand waving. https://www.crowdstrike.com/en-us/blog/cve-2022-0185-kuberne... is just one good example, but entire classes of mitigations are rendered meaningless by the ability to unshare into a box that lets an attacker touch exploitable kernel surface area which is not further isolated.


A microkernel, by the book, is just a bunch of processes doing stuff that would otherwise be in the kernel, if we are then discussing minutiae.

Nothing else: a critical set of OS services is no longer hosted in a single process space, but rather requires a distributed system of processes running on a single computer node.

The moment these processes can be hosted across a set of computing nodes, we enter the realm of distributed OSes, which is another discussion.

The number of services that remain in the microkernel, versus what is hosted in each OS process running outside of the kernel, depends on each microkernel; there is no rule for this division of labour, and each one that was ever designed has chosen a different approach.

If you cannot see the parallel between this and a swarm of containers running on a node, doing the actual workload and only requiring the kernel services due to the way KVM is implemented, and would rather focus on a security mechanism that is a detail of how Linux works, well, it isn't on me to further break it down.


OK, I get what you are saying now, and I agree if we are talking about KVM.

That's not how Kubernetes works by default, so I thought we were talking about container runtimes. I still disagree strongly from both a practical security and theory standpoint if we are talking about container runtimes implemented using namespaces and cgroups.


I don't think containers, namespaces, and the like failing to provide the same benefits as a true microkernel negates the OP's point. They are ways of segmenting userspace in a more finely grained manner and they do make attacks harder.

Linux security being a shit show and undermining these efforts is kinda beside the point: they are still attempts to provide a runtime closer to what microkernels would naturally provide, in a backwards-compatible way. Indeed, these containers could be turned into fully fledged VMs if there were the resources to make it happen.


I don't really get this argument: "you're saying that one thing, namespaces, isn't implemented in any way resembling a microkernel, but what if we replaced it with another completely different thing, a hypervisor? Then it would be similar!" Yes? Sure?

To me the word "microkernel" expresses how the kernel is structured, not what userspace interface it presents. A microkernel is built by separating kernel services into discrete processes which communicate using a defined IPC mechanism. Ideally, a microkernel offers memory boundary guarantees for each service, either by using hardware memory protection/MMU and running each service as a true "process" with its own address space, or by proving the memory safety of each kernel service using some form of ahead-of-time guarantee.
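As a rough illustration of that structure (a toy sketch only, not how any real microkernel is implemented): the "service" below runs as its own OS process with its own address space, and the only way to reach it is a small message protocol. The names `SERVICE_SOURCE`, `start_service`, `call`, and `demo`, the JSON-lines protocol, and the "filesystem service" itself are all hypothetical, invented for this example.

```python
import json
import subprocess
import sys

# Source of a hypothetical "filesystem service". It runs as a separate
# process (separate address space) and only speaks JSON lines on stdin/stdout.
SERVICE_SOURCE = r'''
import json, sys
files = {}
for line in sys.stdin:
    msg = json.loads(line)
    if msg["op"] == "write":
        files[msg["path"]] = msg["data"]
        reply = {"ok": True}
    elif msg["op"] == "read":
        if msg["path"] in files:
            reply = {"ok": True, "data": files[msg["path"]]}
        else:
            reply = {"ok": False, "error": "ENOENT"}
    else:
        reply = {"ok": False, "error": "EINVAL"}
    print(json.dumps(reply), flush=True)
'''

def start_service():
    """Launch the service in its own process, connected by two pipes."""
    return subprocess.Popen(
        [sys.executable, "-c", SERVICE_SOURCE],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
    )

def call(service, **msg):
    """Send one request over the IPC channel and wait for the reply."""
    service.stdin.write(json.dumps(msg) + "\n")
    service.stdin.flush()
    return json.loads(service.stdout.readline())

def demo():
    service = start_service()
    try:
        return [
            call(service, op="write", path="/etc/motd", data="hello"),
            call(service, op="read", path="/etc/motd"),
            call(service, op="read", path="/missing"),
        ]
    finally:
        service.stdin.close()   # EOF ends the service's read loop
        service.stdout.close()
        service.wait()
```

The caller never touches the service's `files` dict directly; a crash or memory bug inside the service stays on its side of the pipe, which is the property the paragraph above is describing.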

Of course, doing this lends itself to also segmenting user-space processes by offering a unique set of kernel service processes for each user-space segment (jail, namespace, etc.), but there's no reason this needs to be the case, and it's by and large orthogonal.

I do agree with what I eventually understood the grandparent poster was trying to express, which is that running a bunch of KVMs looks like a microkernel. Because then, you've moved the kernel services into a protected boundary and made them communicate across a common interface (hypercalls).

But that's not how Kubernetes works by default and in the case of containers and namespaces, I think this is entirely false and a dangerous thing to believe from a security standpoint.

> They are ways of segmenting userspace in a more finely grained manner and they do make attacks harder.

From a _kernel_ security standpoint (because we are talking about micro_kernels_ here), I actually think namespaces make attacks much easier and the surface area much greater. Which is basically the entire point I was trying to make: rather than exposing fragile kernel interfaces to exclusively system services with CAP_SYS_ADMIN, you now have provided an ability (unshare) for less-trusted runtimes to touch parts of the host kernel (firewall, filesystem drivers, etc.) which they would normally not have access to, and you have to go back and use fiddly rules engines (seccomp, apparmor, selinux) to fix the problem you created with namespaces.
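For anyone who wants to poke at the CAP_SYS_ADMIN part of this: Linux reports a process's effective capabilities as a hex bitmask on the `CapEff:` line of `/proc/<pid>/status`. Below is a minimal sketch of decoding that mask, assuming the bit numbers from `<linux/capability.h>` (CAP_NET_ADMIN = 12, CAP_SYS_ADMIN = 21). The helper name `effective_caps` is made up for illustration, and it only checks the two capabilities listed.

```python
# Capability bit numbers as defined in <linux/capability.h>.
CAPS = {"CAP_NET_ADMIN": 12, "CAP_SYS_ADMIN": 21}

def effective_caps(status_text):
    """Return which of CAPS are set in the CapEff line of a /proc/<pid>/status dump."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            mask = int(line.split()[1], 16)  # the mask is printed as hex
            return {name for name, bit in CAPS.items() if mask & (1 << bit)}
    return set()  # no CapEff line found
```

On a kernel that allows unprivileged user namespaces, feeding this the contents of `/proc/self/status` before and after something like `unshare --user --map-root-user` should show the difference: the process gains a full effective set, but only relative to the new user namespace. That is exactly why it can then reach kernel code paths that previously required real root.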

To be clear, I think from a big picture standpoint, it's a tradeoff, and I'm nowhere near as anti-container/anti-namespace as it may seem. I just get annoyed when I see people express namespaces as a kernel security boundary when they are basically the exact opposite: they are a kernel security un-boundary, and Linux's monolithic nature makes this a problem.


That's fair!


Imo the microness is not about size but about the architecture of running drivers/services in fault-resistant separation from the kernel



