Microkernel = move all the code OUT OF the kernel.
These slides are about moving all the code INTO the kernel.
Putting your application logic into the kernel would be more like a unikernel I guess?
Ten to thirty years later, what happens is that both of them turn out to be correct in ways that neither of them could have foreseen at the beginning. It is true that what the practicals were doing was messy and it did blow up, and they did end up taking academic ideas into their code, but not quite as the academics would have thought at the beginning either.
Yeah, this is not a "real" microkernel. But if this process continues as described, you'll have a small hardcoded inner kernel, a whole bunch of sandboxed kernel code running at the kernel level of privilege but still essentially "microkernel" components, suitably modified for the modern world, and a sane and safe sandbox to add more. And it'll be better than the original concept of "microkernels" was, and might as well co-opt the name, because we won't be going back to those ideas. The difference is that if we're getting to the point where we can write sandboxes that actually work, the microkernel idea should be updated to account for that. The original microkernel ideas didn't expect that to be a reasonable possibility, because at the time it wasn't. (Whether it is yet, we'll have to see, but we're at least getting closer.)
No receiver mailboxes. Either the receiver is ready to pick up the message, or it is lost. Just like with signals. Huge mistake.
Though I do know other permissive real-time kernels with mailboxes, which work fine in industry, such as in automobiles and aircraft/rockets, or even Darwin. So maybe there's a second big problem in the Hurd design I'm not aware of.
No need to cherry pick Hurd for your example when there are shipping microkernels.
Move it out of the kernel to isolate and protect. If you can isolate and protect code within the kernel's memory protection boundary, I'm not sure that that should disqualify it as a microkernel.
In other words, I'm not sure that the microkernel design depends on memory protection boundaries specifically, it's a more general philosophy, akin to, "a microkernel is an operating system design which runs the minimum amount of code needed for an OS with full trust".
All of your drivers and user programs live in user space and are secured via the kernel. E.g. a USB host controller is a process which talks to the hardware and provides some sort of interface which USB device drivers can talk to in order to implement their respective device interfaces. A file system on a disk is handled by a file system process speaking e.g. ext4 to a disk, and then a socket is provided to mount and access the files.
It's a very nice setup because if your IPC mechanism is how everything talks then you can think of the kernel as a microservice host and IPC router which your processes talk through. They can provide or consume resources. Now if you can push that IPC mechanism over the network transparently then you have a distributed system. Then you eliminate a lot of code and protocol nonsense talking over networks.
What's the difference besides technicalities of the implementation? Everything else you are saying about using IPC interfaces and isolating the kernel code still applies in both cases.
Although I think the SPIN base system was a lot smaller than the Linux kernel. For example:
> The Web server application, as well as the file system interface, entire network protocol stack, and device infrastructure are all linked into the system after it boots.
If the kernel, at its core, is just an execution environment for sandboxed programs that are sent into it, the source-level decomposition is solid, and the kernel is micro, relatively speaking, to the total amount of functionality. Drivers etc. can still be developed outside the kernel and need only interact with the sandbox API. Drivers may be upgraded live if the sandboxed programs can be replaced. The point of sandboxing is to ensure (ideally prove) that there can be no crashes due to violations of memory safety, or busy loops.
Is this project moving things from user space to kernel space? If so it's the opposite of a microkernel.
E.g. if the Linux kernel becomes aware of the application layer, that really does sound like stuff that used to run in user space is now running in kernel space.
For example, nebulet https://github.com/nebulet/nebulet wanted to run everything in ring 0, but "userspace" was WebAssembly code that had been compiled by "kernelspace" to run sandboxed in ring 0
If a CPU architecture was implemented to only offer ring 0, would a microkernel be impossible? Or would we accept this concept of kernelspace/userspace being implemented in software?
The analogy is like microservices vs monolith, but with kernels, not big applications.
Also, why use JIT rather than offline verification and ahead-of-time compilation?
Aside: the idea that the web delivers on the requirement that "Programmability must be provided with minimal overhead" is pretty laughable. Think Microsoft Teams (a chat application) would consume 600MB of memory if it were built with C++ rather than Electron? I realise not every JIT-powered technology needs to be as bloated as the web, but it seems a poor example.
I'm not entirely certain, but my impression is that eBPF has more limited capabilities, and so the API can be kept stable more easily. Of course that also means that you cannot do everything in eBPF that you can in ordinary C modules.
Hence my questions elsewhere in the discussion about whether you could write device drivers in eBPF. If yes, that might make things much easier once the toolchain matures. If not, much is explained.
How do you dynamically instrument things? How do you write programs which decide, at run time, to move compute closer to the hardware?
My thinking was that I would trust the offline verification, and this would be enough for me (as superuser) to load the precompiled module into the kernel. I believe LLVM does something vaguely comparable, where it can verify that bitcode modules are well-formed, to protect against certain classes of compiler bugs. (Java of course does its class-verification at runtime.)
I don't think this idea is all that different though. If the JIT implements caching of its generated native-code (assuming it can do this securely) then we'd get the best of both worlds: I don't need to be a superuser, and we avoid needless recompilation.
> How do you write programs which decide, at run time, to move compute closer to the hardware?
When would this make sense? If you've got a working kernel implementation, which is robust and trusted, why would you not use it?
The point of a safely programmable kernel is that the user gets to inject third party code into their kernel without needing to know if it's safe, because the kernel will take care of it.
Anyway, if I'm understanding things correctly, the point of using JIT is to handle the compilation in a trusted context rather than having it run as the user.
I think you're fundamentally not understanding why we have virtual machines in language implementations, or process boundaries in operating systems, and why these things are good and useful. Because if you did, you'd see that a sandbox in the kernel is a hybrid of the two ideas.
No. My understanding is fine.
> You wouldn't write the code in the kernel because writing safe code is almost impossible for humans without a lot of tooling help, and that tooling looks a lot like a sandbox.
Right, but with EBPF, you have exactly that.
My question was in response to your "How do you write programs which decide, at run time, to move compute closer to the hardware?"
To rephrase, my question was this: If you have the ability to move code into the kernel without concerns of stability or security, why would you wait until runtime to decide whether to do it? Why wouldn't you just do it unconditionally?
Of course. My question there was about the use of JIT rather than ahead-of-time compilation. As we've now both said, the answer is that EBPF is able to move the compilation out of the hands of the user, avoiding having to trust the user. It may also be helpful that the input to the JIT can be built up at runtime, as with the routing example you mentioned, but this could be done even if we trusted the user to handle the compilation.
This doesn't mean my suggestion is unworkable. You could entrust the user with the compilation process, and you'd still get the robustness guarantees, but, well, you'd have to trust the user. Better to have the kernel handle the compilation (and ideally caching).
You can use proof-carrying code. There is a residual "online" verification of course, but it ought to be quick and efficient.
You mean to upgrade EBPF itself? Well of course. Same as any kernel upgrade.
> Of course this might open you up to security problems if you're relying on incorrect assumptions while doing that.
I don't follow. It's giving the system a full proof of safety. What assumptions are there? It seems very similar to Java's class verification, which doesn't suffer from issues with ungrounded assumptions.
> this project also has trusted components of its own, such as the JIT. A proof verifier can be a lot simpler than a JIT.
Interesting point. It might reduce the total amount of highly-trusted kernel code to approach things that way.
It just provided a new _additional_ extension mechanism which is sandboxed and much nicer to use.
But to make the Linux kernel into a microkernel, eBPF would need the capability to replace _all_ existing kernel modules, including file system drivers and graphics drivers. That's not something it's capable of, and at least currently it's only meant for adding new kernel functionality on top of the "core" we already have.
This maybe could change at some point in the (not very near) future. But for now it doesn't turn Linux into a microkernel.
Dathinab, maybe do an "edit" pass? :)
EDIT: fixed my own mess, thanks :)
I think you meant "so many"?
Currently it links to the 2nd-to-last slide and not the beginning.
(I tried to localise this for a predominantly US audience.)
To be fair, you probably aren't any better or worse than Dilfer with or without the Gatorade.
> (I tried to localise this for a predominantly US audience.)
Well, and HN is not the internet. What is your point?
We built a Rust toolchain that can output eBPF ELFs :). https://github.com/solana-labs/rust-bpf-builder
If you had problems working with kernel modules before, you should probably expect to struggle with writing correct code for eBPF too. It's not for everyone.
For both traditional kernel modules and eBPF programs, you compile the code ahead of time. For kernel modules, if you have a bug, you load it into the kernel and the kernel hard crashes at runtime. For eBPF programs, the kernel will reject the program before you inject it.
In practice to deploy eBPF programs, you end up adding the kernel verification step into part of your CI/dev workflow so that by the time you ship your programs, you know that they will safely load and safely run in real environments.
With that logic, could we argue loadable kernel modules (perhaps with proper memory separation) are a sign of a microkernel architecture?
Normally, a microkernel means the minimum set of primitives needed to implement the OS, with everything else built on top of that, not pluggable modules.
For all intents and purposes the Linux kernel is a monolithic one, and the eBPF capability makes it more extensible / less of a pain to do certain things, but it definitely does not turn it into a microkernel.
Sure, the minimum amount of full trust code. In this case, the full trust code is the eBPF VM which enforces protection boundaries instead of the MMU as in a classic microkernel. I'm not sure a microkernel classification ought to depend on the MMU specifically, it's a general system design philosophy.
The eBPF VM uses the capabilities of the kernel; it is not the kernel. No kernel, no nothing.
Also, following your train of thought, I could say that containers make this a microkernel. That would be a claim that would get you laughed out of the room.
A microkernel provides a minimal set of trusted runtime services for an operating system, and relies on some protection mechanism for isolating subsystems to avoid corrupting the trusted core. Preemptive scheduling is not necessarily part of it; depends whether your system requires "time" to be a protected resource.
eBPF is a kernel service, just like processes, scheduling, IPC. If eBPF can isolate subsystems and supports safe collaboration of eBPF programs despite all running at ring 0, then the eBPF VM in the Linux kernel could qualify as a microkernel once you remove everything else.
> also, following your train og thought I could say that containers make this a microkernel.
If you could run all of the device drivers in containers such that they couldn't corrupt the kernel's data, then sure, you could run it as a microkernel because you wouldn't have anything left in the kernel except essential services like threading, IPC and containers.
Check out xok; it had three in-kernel virtual machines.
Original BPF is in most Unix kernels, it was just a way of writing simple packet filtering programs that run in-kernel. For example, tcpdump is effectively just a frontend that emits BPF bytecode.
eBPF expands the capabilities of the VM, but it still has tight restrictions on what can run: no unbounded loops, no arbitrary memory access, etc. I would recommend trying out bpftrace as a first step:
"While eBPF was originally used for network packet filtering, it turns out that running user-space code inside a sanity-checking virtual machine is a powerful tool for kernel developers and production engineers."
"The eBPF virtual machine more closely resembles contemporary processors, allowing eBPF instructions to be mapped more closely to the hardware ISA for improved performance."
"Originally, eBPF was only used internally by the kernel and cBPF programs were translated seamlessly under the hood. But with commit daedfb22451d in 2014, the eBPF virtual machine was exposed directly to user space."
"What can you do with eBPF?
An eBPF program is "attached" to a designated code path in the kernel. When the code path is traversed, any attached eBPF programs are executed. Given its origin, eBPF is especially suited to writing network programs and it's possible to write programs that attach to a network socket to filter traffic, to classify traffic, and to run network classifier actions. It's even possible to modify the settings of an established network socket with an eBPF program. The XDP project, in particular, uses eBPF to do high-performance packet processing by running eBPF programs at the lowest level of the network stack, immediately after a packet is received.
Another type of filtering performed by the kernel is restricting which system calls a process can use. This is done with seccomp BPF.
eBPF is also useful for debugging the kernel and carrying out performance analysis; programs can be attached to tracepoints, kprobes, and perf events. Because eBPF programs can access kernel data structures, developers can write and test new debugging code without having to recompile the kernel. The implications are obvious for busy engineers debugging issues on live, running systems. It's even possible to use eBPF to debug user-space programs by using Userland Statically Defined Tracepoints."
There, now you understand eBPF.
It is not a Microkernel.
It is an in-kernel Virtual Machine, with access to all of the kernel, whose programs can register for, receive, filter, and optionally act upon or act to moderate, kernel events.
Quite the powerful tool indeed -- but not a Microkernel...
With eBPF, hot-patching servers will take a very short time, instead of extensive downtime plus the consequent reboot of 20,000 servers.