- There is the argument of reduced attack surface/typechecked safety. I get that, but I don't see how it differs from compiling a custom, stripped-down Linux/NetBSD kernel per VM.
- There is the argument for increased performance. I don't get that at all -- I've run experiments comparing MirageOS, IncludeOS and a stock kernel on a web server and a memcache server, and the results were 15% worse for MirageOS and only 0.4% better for IncludeOS, for a tonne more effort and really painful debuggability.
- There is the argument for isolation, but I don't see how it's different, since the hypervisor is still the biggest source of churn. My results with Xen 4.x didn't show any difference in isolation performance between IncludeOS and Mirage.
- There is the argument that OSes don't map well to existing hardware, e.g. in the InfoQ talk in the comments the speaker mentions PCIe VFs. All of these VFs map to physical resources, and you end up doing a much worse job when you can't centrally manage them -- that's one of the main reasons SR-IOV hasn't been adopted (that, and the fact that migration is non-existent). Similarly with CPU hardware threads or NVMe device drivers: they are abstractions provided for convenience and to reduce the burden of porting existing drivers to hypervisor environments, but surely the same issues that appear with VM contexts (hardware scheduling and isolation) still exist in unikernel environments? I feel I'm missing something about this point, as I see it made often and it doesn't make sense to me.
- How do unikernels work in a world of containers? This has always puzzled me: I thought people moved away from VMs due to their heavyweight nature, but the flip side is that the underlying isolation abstractions are more lightweight. So, in essence, are unikernels in containers user-level processes, or do they replace the kernel bits too?
1. I can build a bootable binary without full TCP and UDP support, shaving off 60% of the whole IP stack.
LTO removes a large amount of code as well; it makes all the difference (see the dead-code-elimination sketch after this list).
2. Performance comes from building a fixed binary (all memory addresses are constant), as well as from LTO and C++.
IncludeOS is not at its most performant yet because we haven't been working on performance -- the project is still young. We should be faster overall than just about anything that isn't compiled to a fixed binary layout.
3. Isolation can come in many forms. On traditional hypervisors you have the same old same old, while on a newer type of hypervisor you can do just about anything, from avoiding VMEXITs altogether to removing the ability to even talk to the hypervisor after the configuration stage.
If the hypervisor is just a thin configuration program...
4. Yes, doubly so if your OS uses threads as well, since the hypervisor's and the guest's schedulers almost always beat differently, which causes even more thrashing than normal. In any case, IncludeOS is not using threads at all. It is fully async and matches the hardware really well (see the event-loop sketch after this list). The bottleneck is the multiplexing ability of the cloud platform itself. We can live with that.
5. I don't know much about containers other than that I avoid them. Containers share a kernel. Sorry, I can't answer this.
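On points 1 and 2, the mechanism is ordinary link-time dead code elimination, which is easy to demo with a stock toolchain. A toy sketch using generic GCC/Clang flags, nothing IncludeOS-specific:

```cpp
// Toy illustration of link-time dead code elimination.
// Build with, e.g.:
//   g++ -O2 -flto -ffunction-sections -fdata-sections dce.cpp -Wl,--gc-sections
// Anything not reachable from main() is dropped from the final image.
#include <cstdio>

void used()   { std::puts("reachable from main: kept in the image"); }
void unused() { std::puts("no surviving caller: dropped from the image"); }

int main() {
  used();  // only this call graph survives; unused() is stripped
  return 0;
}
```

Scale the same effect up to a whole network stack and unused protocols simply never make it into the image.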
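And on point 4, the no-threads model boils down to a single run-to-completion event loop. A minimal generic sketch of that style (plain C++, not IncludeOS's actual API):

```cpp
// All work is expressed as callbacks drained by one loop on one CPU:
// no preemption, no context switches, no locks.
#include <cstdio>
#include <functional>
#include <queue>

std::queue<std::function<void()>> events;

void on_packet(int id) {
  std::printf("handled packet %d\n", id);
  if (id < 3)  // a handler re-arms itself instead of blocking
    events.push([id] { on_packet(id + 1); });
}

int main() {
  events.push([] { on_packet(0); });
  while (!events.empty()) {  // the only "scheduler": run each task to completion
    auto fn = std::move(events.front());
    events.pop();
    fn();
  }
}
```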
So, no (K)ASLR?
1. So, in production, we do this all the time with the kernel: we have specific .configs that we use on a per-app basis, so I'm not sure why this point is made against the kernel. Is it because, in general, people run whatever the distro provides them? I don't really understand the LTO argument either -- is the point that the kernel ends up smaller? When I compared IncludeOS against the production kernels we run for memcache, the IncludeOS image was 1.5x bigger. In general, how much of an issue is kernel code size? A 50M kernel, while big, pales in comparison to the amount of memory required for runtime operation. Have you done any measurements on runtime costs?
2. Do you have any benchmarks/evidence you could point me to? My benchmarks showed the exact opposite, at a huge development cost (debugging unikernels is very painful -- at least for me).
3. What do you mean by "same old same old"? The primitives for isolation are provided by the hardware with VT-x and VT-d, and your point about "removing the ability to even talk to the hypervisor after configuration stage" is referring to VT-x, from what I gather. All modern hypervisors use these primitives, and in combination with VT-d and PCIe VFs they provide the illusion of complete access to the hardware, so I think I'm missing your point.
4. I'm sorry, but I find the argument that "threads automatically mean worse" hard to understand. The cost of a context switch is the cost of a context switch, and a switch between threads/processes on the same CPU is still much cheaper than a switch between VMs on the same CPU. If a program requires threading, then with a unikernel either I need an abstraction for threading (e.g. co-operative scheduling) that bypasses the cost of the switch, _or_ I need to write the program differently. In your answer you make the point that you don't support threads, which means that either I move to an event-driven model (which I can do in Linux too) until I run out of CPU, or I switch between VM contexts, which in my understanding is much more costly. Do you mind elaborating further?
5. Do you have an official line on this? I would assume the people you're selling against will be deploying containers, so it would be good to get a feel for unikernels on containers.
Thanks again for your response. I found it most helpful.
1 and 2: An IncludeOS service is compiled and inlined as a whole to a greater degree than a normal kernel. The whole thing is baked into one binary, in many cases inlined, which gives the compiler a chance to simplify more things and optimize better. While the size doesn't have to be smaller, we do have less code overall, especially after the linker removes unused objects. IncludeOS is not being built with LTO at all presently (although it does have that ability on the Linux userspace platform), nor has it had any major performance work done other than making sure there are no big performance bugs around. I am working on this, but it's not a trivial thing. In a perfect world I would just enable LTO and it would just work, like it already does on the userspace Linux platform of IncludeOS.
I have written a beginners-kernel project where I show that LTO cuts the image size by about two thirds, from 32kb to 11kb: https://github.com/fwsGonzo/barebones
2. There is a paper showing this linked in the repo itself. You can at least build the unikernel on Linux userspace, so it shouldn't be too hard to debug while you are creating it. Other than that, debugging in production is not easy, just like you say.
3. Traditional hypervisors manage access to hardware all the time, while a new type of unikernel hypervisor wouldn't: it would just sit there after the configuration/startup phase and never be exited into. The number of VMEXITs something has to do per packet is a good estimator of how performant it is; if the number is zero, it must be performant. See Bareflank and others. I can't elaborate much further because it's an area of research and I'm not an expert. AFAIK there is nothing like this running in production yet, even if we can show that it's possible to run exitless and with no kernel privileges! Maybe next year we will know more.
4. We absolutely support threads, it's just not working right now. We also have an SMP API that is quite good, and the virtual CPUs are scheduled on the host side. We also have fibres. While we do have tests, I don't know the state of things; we have so many other things going on trying to make the OS stable in a production environment. I couldn't find a single thing proving my assertion when I made a quick search, so I will just retract it until I can find it again. As I'm sure you know, unikernels tend to have a single address space, which removes the need to do a full context switch.
5. I don't think our customers will be deploying containers. Rather, I think they will be running hundreds of instances per physical host in their private clouds. Instances will act as Internet gateways, firewalls and other network functions. That is the stage we are at now. I really don't see containers being used after the binaries are built, but I could be wrong.
It looks like this: https://github.com/hioa-cs/IncludeOS/blob/dev/api/smp
With that you can schedule work to CPUs directly.
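Going from memory of that header, usage is roughly the following -- treat the exact names (`SMP::add_task`, `SMP::signal`) as assumptions and check the link above:

```cpp
// Hedged sketch of cross-CPU work scheduling with an SMP API of this
// shape; names approximated from the linked api/smp header.
#include <smp>

void Service::start()
{
  SMP::add_task(
    [] { /* heavy work, runs on a secondary vCPU */ },
    [] { /* completion callback, runs back on the boot CPU */ });
  SMP::signal();  // wake the secondary CPUs so they drain their task queues
}
```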
I could write about this in much more detail, but then I would need help writing it, so I won't.
I should also add that unikernels traditionally run with kernel privileges, but that is not a requirement. They can run just fine under some kind of modern hypervisor that sets everything up beforehand. In that case, the unikernel runs like a normal userspace program, just inside a VM. All communication would be through MMIO and MSRs that don't exit.
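To make that concrete, here is a hypothetical sketch of exitless communication through a pre-mapped shared-memory mailbox (the struct, the fixed address and all names are my own assumptions, not IncludeOS code): both sides just poll agreed-upon memory, so no access ever traps to the hypervisor.

```cpp
// Guest-side view of a shared-memory mailbox, polled virtio-ring style.
#include <atomic>
#include <cstdint>

struct Mailbox {
  std::atomic<uint32_t> head;  // written by the producer (host)
  std::atomic<uint32_t> tail;  // written by the consumer (guest)
  uint8_t data[4096];
};

// Assumed: the hypervisor maps this page into the guest at a fixed
// address during the configuration stage.
auto* box = reinterpret_cast<Mailbox*>(0x1000'0000);

void poll_once() {
  uint32_t t = box->tail.load(std::memory_order_acquire);
  uint32_t h = box->head.load(std::memory_order_acquire);
  while (t != h) {
    // consume box->data[t % sizeof(box->data)] ... no VMEXIT involved
    ++t;
  }
  box->tail.store(t, std::memory_order_release);  // publish progress
}
```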
The Modern Operating System In 2018
I know not a lot of people are using FreeBSD, but combined with OpenBSD I am pretty sure they add up to more than 1%.
Edit: Turns out it wasn't a lie... at least in terms of web server usage, FreeBSD has less than 1%.
All Unix and Windows-derived systems are meant to have the ability to reconfigure themselves. For a lot of systems there isn't really a good reason why the system itself should have this capability.
My WiFi access point is a good example. The fact that it runs Linux means that a security flaw gives the attacker the tools to fundamentally modify the device's behavior. The attacker can install sniffers, blockchain miners or spambots, without vendor approval. This makes any RCE bug critical and in general means the device is dangerous to use if the vendor stops patching it.
How do you do that if there is no facility to load and run code? That's what he's getting at: the installed binary has fixed functionality and has no facilities to do anything but what it's programmed to do. It may have no file system, nor most other features an OS has. It likely supports network ports, but they are unlikely to allow arbitrary code execution because, again, they have a fixed function.
The only way to do what you're talking about is to replace or modify the binary, which means you have physical access or there has been a breakdown in security so serious that nothing could withstand it.
In my understanding, even if I have a piece of code that just loops printing "hello world" and nothing else, if it has a vulnerability that allows me to execute code, all bets are off.
FYI, if anybody @ IncludeOS is here: the page at http://www.includeos.org/get-started.html automatically made links out of the file paths `./seed/service` and `your_service/service.cpp`, which 404'd on me and was confusing for a moment. Putting them in code blocks would prevent it.
Currently, if you want to use containers and serverless, you'll need to spin up 200+MB of Linux distro plus tons of dependencies that have nothing to do with what you originally planned -- that is to say, running a container inside a full virtual machine.
This kernel would allow replacing Unix with a much more lightweight (1MB) and efficient kernel, running programs under a hypervisor.
This is promising.
So yeah, it would be interesting to see this on a serverless platform. Hit me up if you have a serverless platform. :-)
The main examples I know of are Mirage (OCaml compiler) and IncludeOS (C++ compiler).
Though I think there are experimental/research systems that try to understand a program composed of multiple languages that each compile to LLVM IR, or anything that compiles to JVM bytecode. Those might not be limited to a single language.
I have nothing against VMs either. But even when they're tiny (as in the case of unikernels), do VMs scale as well as processes? I'm skeptical. Maybe someone has data.
Containers kinda suck at those 3 things, but no other tool does all 3. It's a jack-of-all-trades, master-of-none situation.
I hate the trend of "download this container cause packaging a .deb is hard" etc.
It could be, and historically has been, solved even without linking everything statically. It's just that Linux people have some shared delusion that a stable and consistent base system is bad, that there should be no distinction between system and application, and that having two copies of a library is an unforgivable sin. Consequently we get the nightmare complexity and inflexibility of the package management scheme to do things we did in DOS on a 286 with no special software at all.
You also get a network namespace, process namespace, user namespace and ability to set cpu/memory limits.
Since you would have a container for each application, you have an easier time setting and testing restrictive AppArmor/SELinux policies, and you even get some hardening out of the box.
Sure you could get them without containers, but the whole benefit is doing it in a standard, easy way.
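For a sense of what the kernel is actually providing underneath, here is a minimal sketch of creating those namespaces directly with clone() (Linux-specific; unprivileged user namespaces need a reasonably modern kernel). Tools like Docker add the standard, easy packaging on top of this primitive:

```cpp
// Minimal "container": new user, PID, network and UTS namespaces.
// g++ defines _GNU_SOURCE by default, which clone() requires.
#include <sched.h>
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>

static char child_stack[1 << 20];  // stack for the cloned child

static int child_main(void*) {
  // New PID namespace: this process sees itself as PID 1.
  std::printf("inside: pid=%d\n", getpid());
  execlp("/bin/sh", "/bin/sh", nullptr);
  return 1;
}

int main() {
  int flags = CLONE_NEWUSER | CLONE_NEWPID | CLONE_NEWNET |
              CLONE_NEWUTS | SIGCHLD;
  pid_t pid = clone(child_main, child_stack + sizeof(child_stack),
                    flags, nullptr);
  if (pid < 0) { std::perror("clone"); return 1; }
  waitpid(pid, nullptr, 0);  // cpu/memory limits would come from cgroups
  return 0;
}
```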
"You Could Have Invented Container Runtimes: An Explanatory Fantasy"
Containers on their own aren't enough to deploy a reliable production system, even on a small scale. To use them for real you need a lot of domain knowledge and the support of a large ecosystem like Docker. And even then it's easy to get stuck in a rat's nest of brittle tooling.
I don't have anything against containers. I'm just tired of the exaggerated "ease of use" claims (e.g. I talk about container management being a non-trivial problem and people tell me "just use Kubernetes!").
Docker became popular because of a ubiquitous CLI. Having good UX is only part of the battle: the massive network effect (and associated tooling boost/community support) is significant and should not be discounted.
I'm sure things similar to Docker's all-in-one UX have emerged on the BSDs. I'm also sure there are programs that replicate Dropbox using an SCM and FTP in a fast, seamless way on par with the Dropbox setup flow, but they aren't going to take off, because the incumbent is not only convenient but also pre-existing with massive popularity.
The same is true for Docker and containers. Some competitor may replace them in time, but whoever they are, "as good as" is not going to be their value prop.
1. Can I "run" and "manage" unikernels on my development machine like I would a Docker image? (Use case: keeping build and deploy environments identical)
2. What's the workflow for inspecting running containers?
3. Is it possible to work in languages other than C++?
I'm wondering how this performs against the Raspberry Pi SoC. I'm not sure which is the most expensive part, the licensing or fabricating the chip.
Here's a nice list of unikernel projects, some with more portability in mind than others: https://github.com/cetic/unikernels#existing-projects
Obviously things like fork() won't work.
And that's a pretty tough constraint in non-GC'd environments, as now you can't rely on a host OS's process resource management. Manual memory management in C and C++ apps, combined with long-running services, makes all the difference in development complexity because of memory fragmentation, async code (where memory can be acquired/released at any time), etc.
Also, I should point out that relying on fork/exec to clean up long-running processes sounds like something of a hack. Would you really want a system that is required to restart itself every so often as a strategy for managing memory?
We've just spent quite a bit of time implementing a buddy allocator to combat memory fragmentation on long-running systems. It seems to do a good job, and I believe it is more or less a solved problem.
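For the curious, the core split-and-merge idea is small enough to sketch. This is a generic illustration of the technique, not the IncludeOS allocator:

```cpp
// Minimal buddy allocator: power-of-two blocks, one free list per order.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

class Buddy {
  static constexpr int MIN_ORDER = 5;   // 32 B minimum block
  static constexpr int MAX_ORDER = 20;  // 1 MiB pool
  uint8_t* base_;
  std::vector<size_t> free_[MAX_ORDER + 1];  // free block offsets per order

  static int order_for(size_t size) {
    int order = MIN_ORDER;
    while ((size_t{1} << order) < size) ++order;
    return order;
  }

public:
  explicit Buddy(uint8_t* pool) : base_(pool) {
    free_[MAX_ORDER].push_back(0);  // the whole pool starts as one free block
  }

  void* alloc(size_t size) {
    int order = order_for(size);
    int o = order;
    while (o <= MAX_ORDER && free_[o].empty()) ++o;  // smallest block that fits
    if (o > MAX_ORDER) return nullptr;               // out of memory
    size_t off = free_[o].back();
    free_[o].pop_back();
    while (o > order) {  // split: keep the low half, free the high half
      --o;
      free_[o].push_back(off + (size_t{1} << o));
    }
    return base_ + off;
  }

  void free_block(void* p, size_t size) {
    int order = order_for(size);
    size_t off = size_t(static_cast<uint8_t*>(p) - base_);
    while (order < MAX_ORDER) {  // merge with the buddy while it is also free
      size_t buddy = off ^ (size_t{1} << order);
      auto& fl = free_[order];
      auto it = std::find(fl.begin(), fl.end(), buddy);
      if (it == fl.end()) break;
      fl.erase(it);
      off = std::min(off, buddy);
      ++order;
    }
    free_[order].push_back(off);
  }
};
```

Fragmentation stays bounded because a freed block coalesces with its buddy as soon as both halves are free, rebuilding the large blocks that long-running services eventually need.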
Other than that I think you are right. Unless IncludeOS can prove that it can offer something akin to the same level of convenience as current Linux-based systems have I think it'll be hard for the OS to get traction.
The upside would be stronger isolation and a system that is a lot harder to break into. The absence of system calls makes most attacks very, very inconvenient, and the absence of self-modification functionality strengthens it further.