IncludeOS – A minimal, resource efficient unikernel for cloud services (includeos.org)
231 points by unixhero 5 months ago | 84 comments



I have never really got unikernels, maybe someone could educate me?

- There is the argument of reduced attack surface/typechecked safety. I get that, but I don't see how this is different from compiling custom Linux/NetBSD kernels per VM?

- There is the argument for increased performance. I don't get that at all -- I've done experiments comparing MirageOS, IncludeOS and a stock kernel on a web server and a memcache server, and the performance results were 15% worse for MirageOS and only 0.4% better for IncludeOS, for a tonne more effort and really painful debuggability.

- There is the argument for isolation, but I don't see how it's different, because the biggest cause of churn is the hypervisor? My results with Xen 4.x didn't show any difference in isolation performance with either IncludeOS or MirageOS.

- There is the argument that OSes don't map to existing hardware as well, e.g. in the InfoQ talk in the comments the speaker talks about PCIe VFs for example. All of these VFs map to physical resources, and you end up doing a much worse job when you can't centrally manage these -- that's one of the main reasons SR-IOV hasn't been adopted (including the fact that migration is non-existent). Similarly with CPU hardware threads or NVMe device drivers -- they are abstractions provided for convenience and reducing the burden of porting existing drivers to work in hypervisor environments but surely the same issues that appear with VM contexts (hardware scheduling and isolation) still exist in unikernel environments? I feel I'm missing something in particular about this point as I see it made often and it doesn't make sense to me.

- How do unikernels work in a world of containers? This has always puzzled me as I thought people moved away from VMs due to the heavyweight nature but of course the flip side is the underlying isolation abstractions are more lightweight -- so in essence are unikernels in containers user-level processes or are they replacing the kernel bits too?


I am a kernel developer working on IncludeOS. Take everything I say with a grain of salt as this is just my thoughts at 9 in the morning in written form.

1. I can build a bootable binary without the full TCP and UDP, shaving off 60% of the whole IP stack. LTO removes a large amount of code as well. It makes all the difference.

2. Performance comes from building a fixed binary (all memory addresses are constant), as well as using LTO and C++. IncludeOS is not at its most performant because we haven't been working on performance. Project is still young. We should be faster overall compared to just about anything that isn't compiling to a fixed binary layout.

3. Isolation can come in many forms. On traditional hypervisors you have the same old same old. While on a newer type of hypervisor you can do just about anything from avoiding VMEXITs altogether to removing the ability to even talk to the hypervisor after configuration stage. If the hypervisor is just a thin configuration program...

4. Yes, doubly so if your OS uses threads as well, and the schedulers on the hypervisor and the guest almost always beat to different rhythms. This causes even more thrashing than normal. In any case, IncludeOS is not using threads at all. It is fully async and matches the hardware really well. The bottleneck is the multiplexing ability of the cloud platform itself. We can live with that.

5. I don't know much about containers other than that I avoid them. Containers share kernel. Sorry, I can't answer this.


> Performance comes from building a fixed binary (all memory addresses are constant),

So, no (K)ASLR?


Yes and no; if you really want it, there is no reason you can't have it. I know that some OSes now do some link-stage procedures at boot, but we don't. Unikernels seldom need to reboot, and if they do, you might as well update them while preserving all the important bits (including TCP connections) by doing a live update. It's not exactly simple, as you suddenly need a management solution. We do have the management part, but it's not part of the open-source OS.


Thank you very much for your response. Some follow-up questions, in the same order as your responses:

1. So, in production, we do this all the time with the kernel. We have specific .configs that we use on a per-app basis, so I'm not sure why this point is made against the kernel? Is it because in general people run what they're provided by the distro? I don't really understand the LTO argument either: is it that the kernel is smaller? When I compared IncludeOS against the production kernels we run for memcache, they were 1.5x bigger. In general, how much of an issue is kernel code size? A 50M kernel, while big, pales in comparison to the amount of memory required for runtime operation. Have you done any measurements on runtime costs?

2. Do you have any benchmarks/evidence you could point me to? My benchmarks showed the exact opposite at a huge development cost (debugging unikernels is very painful - at least for me)

3. What do you mean by "same old same old"? The primitives for isolation are provided by the hardware with VT-x and VT-d and your point about "removing the ability to even talk to the hypervisor after configuration stage" is referring to VT-x from what I gather. All modern hypervisors use these primitives and in combination with VT-x and PCIe VFs provide the illusion of complete access to hardware so I think I'm missing your point.

4. I'm sorry but I find the argument that "threads automatically mean worse" hard to understand. The cost of a context switch is the cost of a context switch. The cost of a context switch between threads/processes on the same CPU is still much cheaper than the cost of a context switch between VMs on the same CPU. If a program requires threading then, with the Unikernel either I need an abstraction for threading (e.g. co-operative scheduling) that bypasses the cost of the switch _or_ I need to write the program differently. In your answer, you make the point you don't support threads which means that either I move to an event-driven model (which I can do in Linux too) until I run out of CPU or I switch between VM contexts, which in my understanding is much more costly. Do you mind elaborating further?

5. Do you have an official line on this? I would assume the people you're selling against will be deploying containers so it would be good to get a feel for unikernels on containers.

Thanks again for your response. I found it most helpful.


Hey, IncludeOS is mostly a work in progress, and so the answers will reflect that.

1 and 2: With IncludeOS, the service and the OS are compiled and inlined as a whole, compared to a normal kernel. The whole thing gets baked into one binary, in many cases inlined. This gives the compiler a chance to simplify more things and optimize better, and while the size doesn't have to be smaller, we do have less code overall, especially after the linker removes unused objects. IncludeOS is not being built with LTO at all presently (although it does have that ability on the Linux userspace platform), nor has it had any major performance work done other than making sure there are no big performance bugs around. I am working on this, but it's not a trivial thing. In a perfect world I would just enable LTO and it would just work, like it already does on the userspace Linux platform of IncludeOS.

I have written a beginners-kernel project where I show that LTO reduces image size by 2/3, from 32kb to 11kb: https://github.com/fwsGonzo/barebones

2. There is a paper showing this linked to in the repo itself. You can at least build the unikernel on linux userspace, so it shouldn't be too hard to debug it while you are creating it. Other than that, debugging is not easy in production just like you say.

3. Traditional hypervisors manage access to hardware all the time, while a new type of unikernel hypervisor wouldn't. It would just sit there after the configuration/startup phase and never be exited into. The number of VMEXITs something has to do for each packet is a good estimator for how performant something is. If the number is zero, then it must be performant. See bareflank and others. I can't elaborate further because it's an area of research, and I'm not an expert. Afaik there is nothing like this running in production yet, even if we can prove that it's possible to run exitless and with no kernel privileges! Maybe next year we will know more.

4. We absolutely support threads, it's just not working right now. We also have an SMP API that is quite good, and the virtual CPUs are scheduled on the host side. We also have fibres. While we do have tests, I don't know the state of things. We have so many other things going on trying to make the OS stable in a production environment. I couldn't find a single thing proving my assertion when I made a quick search, so I will just retract it until I can find it again. As I'm sure you know, unikernels tend to have a single address space, which removes the need to do a full context switch.

5. I don't think our customers will be deploying containers. Rather, I think they will be running hundreds of instances per physical host in their private clouds. Instances will act as Internet gateways, firewalls etc. Some network function. That is the stage we are at now. I really don't see containers being used after the binaries are built, but I could be wrong.


So how can you build a parallel application on top of this that can share all cores of the CPU?


I wrote an SMP API once, and while it's still fully implemented in the OS, it won't work right because we changed the C library to musl, which enables/disables locks internally by counting the number of threads up and down. So, that means you can't use it at the moment.

It looks like this: https://github.com/hioa-cs/IncludeOS/blob/dev/api/smp

With that you can schedule work to CPUs directly.


The idea is that things you would normally get from an OS are now libraries that are compiled into your application. If you want a scheduler, you compile it in. If you want a TCP stack, you compile it in.


I was reading this thinking that security would be mentioned at some point in your response! Presumably a big benefit of smaller codebases is a more realistic possibility of auditing the security?


Less code means less attack surface and easier auditability. VMs are traditionally very isolated, and guests running in special hypervisors like uKVM (and others) are even more so. The simpler hypercall API simplifies things. No shared kernel, like with containers. Immutability.

I could write about this in much more detail, but then I would have to have help writing it. So I won't.

I should also add that unikernels traditionally run with kernel privileges, but that is not a requirement. They can run just fine with some kind of modern hypervisor that sets everything up beforehand. In that case, the unikernel runs like a normal userspace program, just inside a VM. All communication would be with MMIO and MSRs that don't exit.


While the code base of a non-unikernel can be larger, a mainstream project like Linux also attracts more reviewers, who spend more time reviewing the code (especially in the critical areas; some hardly-used device driver might see less attention).


Any reasons you are avoiding containers?


I am simply unfamiliar with them. Created one container once, and that's it. If I knew them better I would probably use them more.


Worth watching is Justin Cormack's recent presentation on unikernels, LinuxKit, eBPF and frameworks such as Seastar:

The Modern Operating System In 2018

https://www.youtube.com/watch?v=dR2FH8z7L04


Yeah, that's a really good overview. I think it's interesting how progress in the DevOps field is characterized by removing things more than adding them, and I think it's great. In a way I think that's the highest and hardest aim of computing technology, not deciding what to build, but deciding what to leave out.


How could Windows + Linux be 99.9% of the marketshare?

I know not a lot of people are using FreeBSD, but combined with OpenBSD I am pretty sure there are more than 1%.

Edit: Turns out it wasn't a lie.... at least in terms of Web Server usage... FreeBSD has less than 1%.

Sigh.


FreeBSD may have less than 1% of server market share, but it drives a large amount of network traffic. For example, FreeBSD runs the Open Connect CDN that Netflix uses, which accounts for almost 32% of North American downstream traffic [1]. Limelight's edge CDN nodes run FreeBSD, and Limelight is one of the bigger CDN companies out there [2]. Juniper routers are FreeBSD-based [3] and are the second most used routers after Cisco. There are plenty of other examples: FreeBSD may not be widely deployed, but it serves a significant amount of the world's internet traffic.

[1]: http://testinternetspeed.org/blog/half-of-all-internet-traff...

[2]: https://www.freebsdfoundation.org/testimonial/limelight-netw...

[3]: https://en.wikipedia.org/wiki/Junos_OS


Really enjoying this talk. Thanks for sharing!


Great talk! Thanks for sharing it.


FWIW the most interesting tidbit I've learned after working with IncludeOS for a couple of years is this: the operating system cannot reconfigure itself.

All Unix and Windows-derived systems are meant to have the ability to reconfigure themselves. For a lot of systems there isn't really a good reason why the system itself should have this capability.

My WiFi access point is a good example. The fact that it runs Linux means that a security flaw gives the attacker the tools to fundamentally modify the device's behavior. The attacker can install sniffers, blockchain miners or spambots, without vendor approval. This makes any RCE bug critical, and in general means the device is dangerous to use if the vendor stops patching it.


Your response confuses me so I must be misunderstanding it. Code is code, all I need is an indirect jump to a pointer somewhere in memory to run the arbitrary code I've installed to compromise the system, so why are unikernels not susceptible to this attack?


"arbitrary code I've installed"

How do you do that if there is no facility to load and run code? That's what he's getting at: the binary installed has fixed functionality and has no facilities to do anything but what it's programmed to do. It may have no file system nor almost any other feature that an OS has. It likely supports network ports, but they are unlikely to allow arbitrary code execution because, again, they have a fixed function.

The only way to do what you're talking about is replacing or modifying the binary, which means you have physical access or there is a serious breakdown in security which nothing could withstand.


Surely the moment I have a vulnerability that allows me to execute code (via a buffer overflow or similar) I have the ability to run the code that adds that feature?

In my understanding, even if I have a piece of code that just loops printing "hello world" and nothing else, if it's got a vulnerability that allows me to execute code, all bets are off.


I guess he means that the OS cannot do more than it was intended to do. A piece of software with a very limited set of features and functionality that you can't expand even if you wanted to. Which would mean an attacker would have to get by with only that, instead of the "endless possibilities" an OS usually provides.


Would love to dig into this and try some small C servers.

FYI if anybody @ IncludeOS is here, the page at http://www.includeos.org/get-started.html automatically made a link out of the file path `./seed/service` as well as `your_service/service.cpp` and 404'd on me, which was confusing for a moment. Looks like putting them in code blocks would prevent it.


OSv is another one, compatible with some Unix apps: http://osv.io


It seemed interesting back then but it's kind of dead since the core developers (Avi Kivity, Nadav Har'El and others) have moved on to another startup called ScyllaDB that is a re-implementation of Apache Cassandra in C++.


The OSv project is indeed now moving much slower than it used to when we had an entire startup company devoted to it, but it is not dead; it still has three committers from three companies: myself (Nadav Har'El), Waldek Kozaczuk and Timmons Player. It still works, and you are welcome to try it. OSv is not as minimal as IncludeOS, which, depending on your viewpoint and use case, is either a pro or a con. OSv supports more types of hypervisors, supports a larger subset of the Linux ABI, can run pre-compiled, unmodified Linux shared libraries, and supports multi-core VMs.


OSv is a much friendlier entrée to unikernels, which makes it much more productive for most existing code. Glad to hear development is continuing.


This is potentially a big step forward for the cloud, this is also the missing component to push Serverless.

Currently if you want to use containers and serverless you'll need to spin up a 200+MB Linux distro plus tons of dependencies that have nothing to do with what you originally planned; that is to say, running a container on top of a full virtual machine.

This kernel would allow you to replace Unix with a much more lightweight (1MB) and efficient kernel to run programs on under a hypervisor.

This is promising.


You'd need a small and efficient hypervisor. Ukvm and Solo5 provide such a system, and if I remember correctly IncludeOS can boot and shut down in less than 10ms on those systems.

So yeah, it would be interesting to see this on a serverless platform. Hit me up if you have a serverless platform. :-)


Had the same impression. I was surprised for a long time there hasn't been any such project under the k8s umbrella.


Do these unikernel OSs tend to be language specific? Like if this app is written in x language, use y OS?


Yes, because they rely on the compiler (for a specific language) to build and deploy only what the application depends on.

The main examples I know of are Mirage (OCaml compiler) and IncludeOS (C++ compiler).

Though, I think there are experimental / research systems that try to understand a program composed of multiple languages that each compile to LLVM, or anything that compiles to JVM bytecode. So those might not be limited to a single language.


Here is a list of Unikernel projects, with their associated languages: http://unikernel.org/projects/


Some do, and some don't. OSv (https://github.com/cloudius-systems/osv), for example, is also a unikernel in the sense that it can only run a single application at a time (it has no kernel-user boundary, nor support for multiple isolated processes). But is designed to run any Linux executables (using a dynamic linker which is part of the kernel) so it is not limited to a particular programming language.


Can I use gdb to debug my applications?


Yes, build for userspace Linux and run it as a normal program. Then use all your favorite tools. You can also attach GDB to a qemu process that is waiting for a connection (a bit harder). In a production environment on a public cloud it gets even harder. For that I can only recommend UBSan (undefined behavior sanitizer), logging and memory dumps.


Finally, the end of LX* containers is on the way. Not the actual end of course; who knows, the Docker guys could buy this one as they did some other unikernel companies.


I don't understand all the hatred towards containers. It's simply an OS abstraction to facilitate isolation. What's wrong with that? In fact I'd love to see more containers, especially on the Linux desktop: for example a container aware compositor that could render windows from container processes and color code their borders, like Qubes does.

I have nothing against VMs either. But even when they're tiny (as in the case of unikernels), do VMs scale as much as processes? I'm skeptical. Maybe someone has data.


People have turned containers in to a packaging, deployment, and orchestration tool rather than the original intent of lightweight virtualization.

Containers kinda suck for those 3 things but no other tool does all 3. It's a jack of all trades master of none situation.

I hate the trend of "download this container cause packaging a .deb is hard" etc.


Docker-style containers only shield you from mixed-library situations, e.g. where one "service" needs version A and another needs version B. It's painful because this situation is entirely accidental and could be trivially solved by linking everything statically. Then once you have two or more containers up and running, these libraries still need security updates and other fixes, which is the entire point of shared libs in the first place. Basically, Docker is not a solution but part of the problem.

And with Docker comes a ton of constraints; for example, basic file access and anything related to permissions is a PITA. And you need overreaching orchestration software such as Mesos/Marathon, k8s, or OpenShift. It's not that these tools are bad, but rather that they're nuclear weapons for relatively little benefit, requiring expensive devops experts. All so you can pack multiple services on a single physical host (which would be much easier using classical service runtimes), and so your HTTP services (because everything has to be HTTP because of the firewalls) can all listen on virtual port 80/8080.


> It's painful because this situation is entirely accidental and could be trivially solved by linking everything statically.

It could and historically has been solved even without linking everything statically, it's just that Linux people have some shared delusion that a stable and consistent base system is bad, there should be no distinction between system and application, and having two copies of a library is an unforgivable sin. Consequently we get the nightmare complexity and inflexibility of the package management scheme to do things we did in DOS on a 286 with no special software at all.


Containers have more than just filesystem namespacing.

You also get a network namespace, process namespace, user namespace and ability to set cpu/memory limits.

Since you would have a container for each application, you have an easier time setting and testing restrictive apparmor/selinux capabilities and even get some hardening out of the box.

Sure you could get them without containers, but the whole benefit is doing it in a standard, easy way.


I think this is a very helpful hypothetical journey to explain why containers are so useful:

"You Could Have Invented Container Runtimes: An Explanatory Fantasy" https://medium.com/@gtrevorjay/you-could-have-invented-conta...


> I don't understand all the hatred towards containers. It's simply an OS abstraction to facilitate isolation.

Containers on their own aren't enough to deploy a reliable production system, even on a small scale. To use them for real you need a lot of domain knowledge and the support of a large ecosystem like Docker. And even then it's easy to get stuck in a rat's nest of brittle tooling.

I don't have anything against containers. I'm just tired of the exaggerated "ease of use" claims (e.g. I talk about container management being a non-trivial problem and people tell me "just use Kubernetes!").


Docker ruined containers. I've been using LXC exclusively on my server and it's much better than Docker, especially since I don't have to retool and it behaves like any other server while still being a container. If all else fails, I can easily bootstrap it into a virtual machine (basically, install the same distro on a virtual machine, tar up the container and dump the contents at the root, adjust networking configuration, done)


Jails and Zones did their jobs before LXC was a thing. The better technology doesn’t always win out — see Windows history.


Jails and Zones are inferior to Docker overall, the advantage of Docker is not the isolation technology but the whole workflow of using those containers.


You can get a similar workflow with Jails. Docker became popular because of the CLI but there are many similar options now on FreeBSD.


> Docker became popular because of the CLI

Docker became popular because of a ubiquitous CLI. Having good UX is only part of the battle: the massive network effect (and associated tooling boost/community support) is significant and should not be discounted.

I'm sure there are similar things to Docker's all-in-one UX that have emerged on the BSDs. I'm also sure there are programs to configure Dropbox using an SCM and FTP links in a fast, seamless way on par with the Dropbox setup flow, but they aren't going to take off, because the predecessor is not only convenient, but is also pre-existing with massive popularity.

The same is true for Docker:containers. Some competitor may replace them in time, but whoever they are, "as good as" is not going to be their value prop.


Jails, maybe. Zones, definitely not.


Zones are much more developed than jails, but not quite as flexible as lxc.


I did not know that, I thought lxc was behind zones in terms of features/flexibility.


From what I understand, Linux lets you configure independent namespaces for network, disk, CPU, etc. separately, though I'm not entirely sure what lxc allows you to configure. Whereas zones have a more one-to-one mapping of namespaces, so it's easier to secure. Just what I picked up from using Joyent's Triton system. I really enjoyed the zones interface on SmartOS: decent JSON API and CLI tools, built from the ground up on ZFS datasets. Jails are terrible in comparison by not really having a good standard API interface, imho.


What other unikernel companies did they buy?


Unikernel Systems, which is cool because it is OCaml based. Their stuff is open source too:

https://mirage.io


What I would love more than really any other tech improvement would be a "trusted" MirageOS hardware/software environment which would remotely attest to its integrity and the degree of hardware protection it gave running programs, i.e. whether it is a regular dedicated server, a VM, or an HSM, attested to by some authority.


It sounds like you'd like Intel SGX. However, it seems Intel botched the implementation and their SGX VMs are leaking data through a number of attack vectors.


Haha I did a startup based on SGX. I will never trust Intel again.


A few silly questions from someone who comes from containers (Docker, specifically).

1. Can I "run" and "manage" unikernels on my development machine like I would a Docker image? (Use case: keeping build and deploy environments identical)

2. What's the workflow for inspecting running unikernels?

3. Is it possible to work in languages other than C++?


I just found out about OpenRISC https://fr.wikipedia.org/wiki/OpenRISC

I'm wondering how it performs against the Raspberry Pi's SoC. I'm not sure what the most expensive part is, the licensing or fabricating the chip.


Every time I see this project pop up I think about how awesome an idea it is. Not sure about the practical side... but the project is very cool indeed.


Cool! Can this be used with languages other than C++?


Lua and MicroPython have been ported, mostly as PoCs. I expect us to do some proper ports next year, likely Node and Python. C support is OK, but it should be fairly simple to expand it, as musl provides IncludeOS with the relevant bits of POSIX as long as IncludeOS can provide the underlying syscalls.


Appears to be C++ only based on my reading.

Here's a nice list of unikernel projects, some with more portability in mind than others: https://github.com/cetic/unikernels#existing-projects


Looks nice, thanks. There is github.com/solo-io/unik as well. I haven't had time to check it out yet.


Unik is pretty nice, but it's basically a wrapper for other unikernels. I use it for microservice deployments every now and then. It does make packaging Java for OSv a lot simpler, however. For a more language-agnostic unikernel that can run on bare hardware, rump/rumprun has been the de facto baseline for me.


I have to say that unikernels need a proper toolchain to be used widely.


But, in theory, you can embed V8 and run Javascript on top


runtime.js - javascript library operating system for the cloud

It's built on V8 JavaScript engine and uses event-driven and non-blocking I/O model inspired by Node.js. At the moment KVM is the only supported hypervisor.

http://runtimejs.org/


Was also curious whether you can write your application in C.


And how about Rust?


It seems that hypervisors have developed a whole new level of OS API/ABI, sort of like the POSIX specification.


Except there is no common specification, and there's not going to be one either, because "standards are for losers". No meaningful standard has been developed in this decade. It's almost as if the situation we had, with two or more super-stable, compatible POSIX O/Ss, compiler suites, databases, etc., was too good to be true.


IncludeOS can support most of POSIX. It uses Musl, which provides a full POSIX implementation for the system calls that are implemented.

Obviously things like fork() won't work.


> Obviously things like fork() won't work.

And that's a pretty tough constraint in non-GCed environments, as now you can't rely on a host O/S's process resource management. Manual memory management in C and C++ apps combined with long-running services makes all the difference in terms of development complexity, because of memory fragmentation, async code (where memory can be acquired and released at any time), etc.


Good riddance, I say. The cloud, and programming in general, could stand to take some pages from embedded programmers' cookbooks; security in general will improve. Plenty of programs are written such that they allocate all the memory they will ever need up front, and that's it.


fork() requires support for multiple processes. IncludeOS only has one process, so a fork() doesn't make sense there.

Also, I should point out that relying on fork/exec to clean up long-running processes sounds like something of a hack. Would you really like to have a system that is required to restart itself ever so often as a strategy to manage memory?

We've just spent quite a bit of time implementing a buddy allocator to combat memory fragmentation on long running systems. It seems to do a good job and I believe it is more or less a solved problem.


I don't agree that multiple processes are hackish at all. CPUs/MMUs have dedicated hardware for managing memory, O/Ss have tools to limit and monitor memory usage, and separate memory regions/segments can be a very effective and simple means of security isolation (save for meltdown/rowhammer-style attacks). I know this could be used as a general argument against any innovation, but single-process-per-request designs have worked well since inetd (4.3BSD, 1986). Considering hardware has improved by orders of magnitude since, what is the advantage of doing the same with very restricted programming models, especially when developer costs dominate new service development?


I'm not saying multiple processes are a hack, just relying on fork/exec to clean up memory. Safeguarding against memory fragmentation is imho a solved problem - or at least solvable if you apply known methods.

Other than that I think you are right. Unless IncludeOS can prove that it can offer something akin to the same level of convenience as current Linux-based systems have I think it'll be hard for the OS to get traction.

The upside would be stronger isolation and a system that would be a lot harder to break into. The absence of system calls makes most attacks very, very inconvenient and the absence of self-modification functionality is strengthening it further.


You present good points, but with IncludeOS you can simply liveupdate it to itself in a few milliseconds to get a pristine heap, and perhaps a few fixes as well. Won't count as downtime.


Well, type 1 hypervisors are the OS.



