Gone Full Unikernel (deferpanic.com)
179 points by deferpanic on June 16, 2016 | 92 comments



It seems this article gleefully admits many of the downsides of unikernels mentioned in https://www.joyent.com/blog/unikernels-are-unfit-for-product..., while being very brief and naive about the upsides (mainly the very contested security argument).

I admittedly haven't studied the whole unikernel space yet, but intuitively they do seem unfit for production unless we spend a decade rebuilding tooling (debuggers, process diagnostics tools, etc.). And even then, other downsides apply, as laid out in the Joyent article.

Happy to change my mind over time if it proves to be the other way around, but for now I'm very skeptical.


I wouldn't say that unikernels were entirely undebuggable. I spent a few hours hacking and came up with a proof-of-concept dom0 profiler, and learned about some debugging benefits: one symbol table for the entire binary, one place to turn on frame pointers for everything, etc.

http://www.brendangregg.com/blog/2016-01-27/unikernel-profil...


That requires having access to dom0, which is not available on Amazon EC2 (for good reason). Running in UNIX binary mode in a container on bare metal seems like the way to go here: it does not have the overhead of hardware virtualization, which means no internal memory fragmentation from partitioning RAM, no duplicated caches, and hypervisor upcalls become normal system calls. At that point, the unikernel is just a normal UNIX process that can be debugged with conventional tools.


We already get hypervisor statistics from dom0, via CloudWatch. Could that include on-demand profiling as well? I don't see why not. And that's not the only way to solve this.

So work needs to be done to make unikernels profilable & debuggable. I wouldn't claim that this is impossible.


I do not dispute that it is possible to profile/debug unikernels on future cloud infrastructure. However, I am skeptical that unikernels offer any benefit to merit the work of enabling that when the whole system is considered.

System calls might be additional overhead when there is a hypervisor, but hypervisors are unnecessary when we have containers. You stand to eliminate much more overhead by eliminating the hypervisor than by eliminating the syscalls. Some of that overhead is internal fragmentation from memory partitioning, duplication of driver code, potential double caching, etcetera.

The industry is in the early stages of a transition from hardware virtualization to containers because containers are a better abstraction than hardware virtualization. Joyent offers Illumos zones, Swisscom offers Docker containers with Flocker (full disclosure: my employer is the author of Flocker), Microsoft has deployed Drawbridge on its Azure cloud, etcetera. We will only see more of this in the future.

Once the transition is complete, I see no advantage to unikernels. You could use them in UNIX binary mode, but that makes them little more than a standard process on a traditional system. That is a very different role than the one that their creators intended for them.


Price/performance always wins.

Can an application unikernel on a hypervisor (which is really a lightweight OS that nowadays supports many pass-through features) beat the performance of an application on a regular OS? (I didn't mention containers, since they should ultimately be irrelevant to hot path performance). So can it? With a lot of work, I bet it can.

So who has the better price/performance? That's going to depend on how much engineering work it is to adopt, fix, and use unikernels, when they are competing with an established ecosystem around Linux and containers. And that may be where unikernels actually lose on price/performance, where price includes total cost of ownership. We'll see!


I suspect that the performance of a unikernel on a hypervisor vs an application on a POSIX system is somewhat analogous to the performance of hardware RAID vs ZFS. The performance of the abstraction used in the former is inherently worse than the performance of the abstraction used in the latter. The example of memory partitioning gives the latter a price/performance advantage and it is not the only one. I would be happy to be proven wrong though.


Strawman. Let's talk about Xen and Unikernels. So you said:

> You stand to eliminate much more overhead by eliminating the hypervisor than by eliminating the syscalls.

So my hypervisor application talks directly to devices, thanks to pass-through. What's that about syscalls again?


It is not just syscalls. You either have inefficiency from double caching or inefficiency from a lack of a global page replacement algorithm. You also have internal fragmentation from memory partitioning, which prevents you from running as many applications and/or reduces memory available for cache. I consider these to be fundamental disadvantages.


> You either have inefficiency from double caching

No double caching: most of our applications have an in-memory working set, and no disk state. Some do (eg, Cassandra databases).

> from a lack of a global page replacement algorithm

Oh, so it's more efficient to be running where paging (aka swapping) is allowed? Sure, for memory footprint, but for runtime performance you're banking on it reaching a state where paging is minimal. The amount of memory saved depends on the working set, maybe a lot, maybe a little. One downside is you're paying a small CPU tax to manage this (maintaining kswapd lists, and scanning them).

I think this would sometimes be a benefit, and sometimes not. And if not, is there anything stopping a Unikernel -- which must manage its own memory anyway -- from implementing its own pager?

The inefficiency isn't technical resources, but human resources: having Unikernel engineers reinvent what modern kernels already do.

> You also have internal fragmentation from memory partitioning, which prevents you from running as many applications and/or reduces memory available for cache.

Again, usually no file system cache in use. And most apps are started with a fixed heap size that consumes all of memory. There's no left-over/wasted memory that could be used by other apps.

If you want to page out cold memory to make room, uh, sure, but see previous comment. I bet that sometimes works, and sometimes doesn't.


> No double caching: most of our applications have an in-memory working set, and no disk state. Some do (eg, Cassandra databases).

I see no disadvantage for unikernel-on-hypervisor setups versus application-on-container-host setups in applications where there is no disk state. However, I see no advantage either. The techniques used to talk to hardware directly work in userland too. netmap is a fantastic example of this.

I had expected unikernels on hypervisors to have a disadvantage against a container on a traditional kernel, but after reading your remarks, I think that the two ought to perform identically (at least where there is no file system IO), with neither being theoretically better. However, the world is adopting containers in traditional kernels and unless a unikernel on a hypervisor can be better, I do not see much value in devoting resources to unikernels too.

> Oh, so it's more efficient to be running where paging (aka swapping) is allowed? Sure, for memory footprint, but for runtime performance you're banking on it reaching a state where paging is minimal. The amount of memory saved is depending on the working set, maybe a lot, maybe a little. One downside is you're paying a small CPU tax to manage this (maintaining kswapd lists, and scanning them).

I was referencing cache efficiency when I talked about page replacement algorithms rather than paging to disk. Imagine a global ARC algorithm in a traditional system versus each unikernel having its own. The global hit rate would be better with a global algorithm than it would be with a local algorithm in each unikernel.

Even if your application does its own caching, the principle of a global algorithm being best ought to apply to filesystem metadata.
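To make that concrete, here is a toy simulation I put together (my own construction in Go, not anyone's real page-replacement code): workload A cycles over a working set slightly larger than its fixed partition, so plain LRU gets almost no hits there, while one pooled cache of the same total size holds both working sets at once. Real ARC behaviour is subtler than LRU; this only illustrates the fragmentation effect.

    package main

    import (
        "container/list"
        "fmt"
    )

    // lru is a minimal fixed-capacity LRU cache keyed by int.
    type lru struct {
        cap   int
        order *list.List
        items map[int]*list.Element
    }

    func newLRU(capacity int) *lru {
        return &lru{cap: capacity, order: list.New(), items: make(map[int]*list.Element)}
    }

    // touch returns true on a hit; on a miss it inserts the key, evicting if needed.
    func (c *lru) touch(key int) bool {
        if e, ok := c.items[key]; ok {
            c.order.MoveToFront(e)
            return true
        }
        if c.order.Len() >= c.cap {
            old := c.order.Back()
            c.order.Remove(old)
            delete(c.items, old.Value.(int))
        }
        c.items[key] = c.order.PushFront(key)
        return false
    }

    func run(cache *lru, keys []int, rounds int) (hits, total int) {
        for r := 0; r < rounds; r++ {
            for _, k := range keys {
                if cache.touch(k) {
                    hits++
                }
                total++
            }
        }
        return
    }

    func main() {
        workA := make([]int, 150) // cycles over 150 pages
        for i := range workA {
            workA[i] = i
        }
        workB := make([]int, 50) // cycles over 50 pages
        for i := range workB {
            workB[i] = 1000 + i
        }

        // Partitioned: two VMs, each with a fixed 100-page cache.
        ha, ta := run(newLRU(100), workA, 20)
        hb, tb := run(newLRU(100), workB, 20)
        fmt.Printf("partitioned hit rate: %.0f%%\n", 100*float64(ha+hb)/float64(ta+tb))

        // Global: one shared 200-page cache sees the interleaved workload.
        shared := newLRU(200)
        var hg, tg int
        for r := 0; r < 20; r++ {
            for _, k := range append(append([]int{}, workA...), workB...) {
                if shared.touch(k) {
                    hg++
                }
                tg++
            }
        }
        fmt.Printf("global hit rate:      %.0f%%\n", 100*float64(hg)/float64(tg))
    }

With these made-up working sets, the partitioned caches sit around a quarter of the hit rate of the pooled one, purely because of where the fixed boundary falls.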

> Again, usually no file system cache in use. And most apps are started with a fixed heap size that consumes all of memory. There's no left-over/wasted memory that could be used by other apps.

This is not the sort of application that I had in mind. I am still skeptical that unikernels are better, but I agree that they are not worse here. In this case, it seems to me that they are (theoretically) just a different way of doing things and are not better or worse.


> However, the world is adopting containers in traditional kernels and unless a unikernel on a hypervisor can be better, I do not see much value in devoting resources to unikernels too.

Unikernels allow more experimentation. The interface that the hypervisor provides is generally lower level (especially with hardware passthrough) than the traditional operating system's interface.

Unikernel 'programs' would normally use a library as an abstraction layer to bridge the gap. These libraries are easier to swap and change and experiment with than traditional OSs. (At least that was the whole justification for exokernels in the 90s, the approach our current hypervisors grew out of.)


That is nice, but it is very different from the claims I have heard lately that unikernels are more secure because they are smaller, and that they are more performant. The more I learn about unikernels, the more I think these are false promises.

By the way, I am a fan of rumpkernels, which also offer the ability to do experimentation. Rumprun is apparently a unikernel design, while rump kernels are building blocks. Rump kernels need not be used in unikernels. They can be used in whatever you want them to be, with unikernels being one place that they can go.


Hypervisors offer decent security and performance guarantees, which means they are good for sharing resources among potentially hostile customers. Their simple resource semantics and small ABI makes for a fairly secure abstraction.


Kernels do as well. Both have had security exploits that lead to privilege escalation. As containerization matures, I expect the security of a container host to become similar to that of a hypervisor. They are essentially doing the same thing. The only place on which they differ is the kind of abstraction that they use.

LPARs/LDOMs are a much more secure abstraction for "sharing resources among potentially hostile customers". Those physically partition at the hardware level. LPARs are used on the IBM mainframes and are "EAL5 Certified". LDOMs are the SPARC equivalent, but I do not know their EAL. Both traditional kernels and various hypervisors are EAL 4 (some are called EAL4+), which is not as secure.


I don't think kernels are inherently less secure than hypervisors, but as they stand, current hypervisor implementations have a better security track record than kernels. The basic point that I am trying to make is that both hypervisors and kernels are just pieces of software meant for partitioning and sharing hardware. Software that has simpler and smaller interfaces also has a lower probability of having bugs that lead to vulnerabilities. I agree that there are better hardware partitioning implementations out there but unfortunately they are not so popular. I am looking forward to having formally verified kernels like seL4 become more popular.


Kernels usually provide quite a lot of abstraction in addition to secure partitioning and sharing. And that's arguably wrong: providing abstractions is complicated (thus inherently less secure), and one size does not fit all.

In a unikernel setup abstractions can live much more comfortably in libraries.


IBM terminology might be confusing me - but looking at published security targets it appears LPARs themselves have only ever been evaluated at EAL4 with flaw remediation (ALC 2), and PR/SM has been evaluated at EAL 5, but neither to any specific protection profile. This means that IBM created their own evaluations and gave themselves a "certification".

CC evaluations without a protection profile are worthless in the eyes of most governments and CC schemes, but kudos to IBM product management and marketing for creating competitive FUD.

As of a year ago LDOMs (Oracle VM for SPARC) hadn't had a CC evaluation and I'm not seeing anything currently in evaluation. Solaris Zones have been evaluated under the Solaris OSPP EAL4 + extensions evaluation.

The biggest reason that virtualization technologies haven't had a CC evaluation with a protection profile is that no US NIAP approved protection profile existed and the draft ones that were circulated were crap.

Assurance levels (EAL) are deprecated for the newest NIAP protection profiles, as the higher assurance levels (EAL4) were cost- and time-prohibitive for vendors to complete before the product was outdated. Many people wrongly think Common Criteria is a security evaluation (free of bugs) - it's not - it's a security architecture evaluation (is the documented behavior working correctly).

There is a schism in CC - everything is changing - anything we know today is wrong and will change.

TL;DR: Common Criteria is a joke and doesn't actually mean what you think it does.


There is nothing stopping people from creating a unikernel for a dynamic language that also includes the development tools.

A Lisp Machine on Xen would be one model.


I feel like Erlang-based unikernels are an extremely compelling alternative to traditional UNIX deployments. Immutable systems with safe hot swap and excellent debugging tools like `observer` and `debugger`.



Erlang on Xen is already that way. You can use the full Erlang profiling/tracing/debugging/observing toolkit on an EoX node.


That's great, but with processes (containerized or not), I can use the full variety of standard diagnostic tools built over many decades. Of course you can start building similar tools for unikernels, but it's gonna take you a long time before you get to a similar state.


To counter the OP, you'd have to use an existing profiler.


See the discussion about that post over at https://news.ycombinator.com/item?id=10953766


Yep, I did. It still doesn't really change my view, especially on the front of operability/debuggability. If you have ever operated really large services, you know how big of a problem the lack of that is.


The good news is, for the average user, unikernels are pretty much guaranteed to be mainstream and streamlined at some stage in the future, thanks to Docker acquiring Unikernel Systems, and the awesome work that the likes of deferpanic are doing. :D

[1]: https://blog.docker.com/2016/01/unikernel/

[2]: http://www.linuxjournal.com/content/unikernels-docker-and-wh...


I'm with you - Docker has done an excellent job of showing application developers the minimum they need for a runtime.

Side-thought: Can Android be dockerized?


> Can Android be dockerized?

Why? Aren't Android apps already sufficiently sandboxed?


Well you'd think it's not really necessary to run Android in a VM. The Android x86 project should make it easy to run atop an existing Linux in a container.

Just as an aside: Docker doesn't seem very security-focused, I would not [yet] count on its containers being properly sandboxed. :>


They're pretty sandboxed, but every Android app gets a JVM, no exceptions allowed (even Android's pure C++ API is just a wrapper around JNI calls). And in the case of the old Dalvik VM, it's a terrible JVM.


> They're pretty sandboxed, but every Android app gets a JVM, no exceptions allowed (even Android's pure C++ API is just a wrapper around JNI calls). And in the case of the old Dalvik VM, it's a terrible JVM.

Wasn't Dalvik deprecated & replaced by ART (Android Runtime)? ART compiles apps AoT - IIRC, upon installation pre-Marshmallow, and while charging/idle from Marshmallow onward.


Yep, hence me specifying the old one. ART cleans up a lot of things: no more 8-16KB main thread stacks, better code gen via AoT, fully precise collection, moving GC. It has some really ingenious features as well, like switching to a compacting GC with better throughput and space efficiency when an app goes in the background and latency is irrelevant. The switch to ART as default was actually in 5.0 (Lollipop).

It's almost enough to make me stop cursing Android developers and their children's children. Unfortunately version updates for non-Google devices are rare and everybody is still stuck supporting the majority of devices that are pre-Lollipop. Also it didn't make the APIs any better >:(


I can't help but think this is just a severe reaction to the tire fire that most Linux distros are, especially RedHat/CentOS and Ubuntu. BSD or Alpine Linux get in the way a lot less, are much more customizable and compact, and have a smaller attack surface while still catering to production operations where you can run shells, profiling, logging, etc inside the execution environment.


There's more to unikernels than just being another virtualization technology. Most of the conversation on HN (as well as the content of this article) seems centered around unikernels vs. containers vs. a traditional OS in a VM, etc. But that conversation sort of misses the point.

Rather than just being a competing virtualization solution, "Unikernels" are really about eschewing the existing OS paradigm altogether. For example, the Mirage folks seem to have asked themselves about how they could create a "safe" OS and landed on the solution that they could achieve that by trusting the OCaml compiler and runtime for "safety" and so wrote a brand new OS from scratch in OCaml. That is a very different thing than a reaction to the "tire fire" that you are describing!

Similarly, for rump kernels Antti Kantee (with the help of others I presume) took several years to re-architect the NetBSD kernel to minimize the inter-dependency of different components of the kernel through the creation of a "hypercall" interface[0] and a carefully thought out separation of concerns.[1] One of the end results of this architecture is that you can run NetBSD drivers outside of the NetBSD kernel "just" by implementing the rumpkernel hypercall interface. Want to write your own OS (in a "safe" language like OCaml, for instance) but don't want to write a tcp stack or a filesystem implementation or USB driver from scratch? Rump kernels could be a solution to that problem. Again, that is a very different problem space than the "tire fire".

[0]: http://netbsd.gw.com/cgi-bin/man-cgi?rumpuser++NetBSD-curren...

[1]: http://lib.tkk.fi/Diss/2012/isbn9789526049175/isbn9789526049...


The Mirage folks didn't discover anything new in that regard.

It is how the safe OS from Burroughs, DEC, Xerox Parc, ETHZ and many others used to work.

Those OSes were written in strongly typed systems programming languages, the whole stack.

Part of their security was based on the language type system.


Fair enough! I definitely wasn't trying to suggest that is a completely new feature, nor that it is the only feature of Mirage OS ... really was just trying to make the point that there is more to the unikernel story than just figuring out whether it is better or worse for running my buggy crud application than some other virtualization technique. Thank you for the info though, I will have to read up on those things you mentioned.


You can find some links to those systems here

https://news.ycombinator.com/item?id=11856479


I'd rather say it's a response to technical advancements in virtualisation. You want an app that can talk to other things, but is otherwise completely isolated as far as crashes and exploitation goes. We wanted that before protected memory was a thing. We wanted that when networks happened. We wanted that when selinux was created. etc. etc.

This is just the next step. I've got an app which needs communication channels and possibly persistent storage - isolate everything else. This is what unikernels provide. If it gets rid of some of the redundant system parts, that's just a cherry on top.


The problem is you reinvent kernels when doing this before too long. Containerization is a reaction to virtualization being too expensive - and unikernels are still pulling in huge amounts of redundant code and runtime compared to containers, where the kernel at least is shared.


I'm not sure that's a bad thing. If you're running only one app, there's a lot of things you don't need. No process groups, no scheduling hierarchies, no user privilege checks, likely no filesystem caching (maybe even no filesystem?), no legacy device handling, no terminals. We're kind of going towards replacing the big kernel with a posix-to-virtio layer already, and it may not be a terrible idea.


Same arguments as those in favour of exokernels in the 90s.


> Try 5, 10, 20 megabyte small.

OpenWRT/LEDE will happily work on a system with 4MB of storage:

https://www.lede-project.org

QNX had a graphical environment, a web browser, a web server, a text editor, image viewer, various games, a package manager, etcetera on a 1.44MB floppy:

http://m.youtube.com/watch?v=K_VlI6IBEJ0

Less is definitely more, but you do not need a Unikernel to achieve such sizes and you lose observability by going with a Unikernel. If something goes wrong with your application, such as it becoming non-responsive, you need to attach gdb or get a core dump like a kernel developer would to understand what happened. Your production systems are likely EC2 instances that lack such functionality, which means debugging is much harder with a unikernel than it would have been with a monolithic, hybrid or micro kernel. Furthermore, disk space is cheap, which is why few opt for OpenWRT/LEDE over more full featured Linux distributions in datacenters.

If you want the experience of a single address space and little more code than your application, you could run FreeDOS, which also fits on a floppy and has a code base that is mature. There are guides for doing this online. Here is one for doing a web server:

http://www.instructables.com/id/Retro-dos-web-server/?ALLSTE...

The world moved away from such designs because the observability and stability were awful. We might have "safe" languages now that improve stability of the application, but those could just run as a process in an environment where proper debugging can be done when something goes wrong. The few percentage points of performance that you get from eliminating the mechanisms that enable you to understand what went wrong do not justify discarding them.

Also, you lose the advantage of a shared memory pool with unikernels, which are generally intended to run in VMs. Partitioning memory in VMs causes internal fragmentation, which artificially lowers the density of applications per machine. It also can lower block IO efficiency from double caching between the host and guest. Hardware virtualization is a useful technology, but it is an inefficiency that we need to eliminate with containers, rather than one that we should embrace with unikernels.


> The world moved away from such designs because the observability and stability were awful. We might have "safe" languages now that improve stability of the application, but those could just run as a process in an environment where proper debugging can be done when something goes wrong. The few percentage points of performance that you get from eliminating the mechanisms that enable you to understand what went wrong do not justify discarding them.

I think there is a lot of design space here that is unexplored, so I'm not so sure it is as clear cut as you say. You might like this talk given earlier this year at Compose Conference, entitled "Composing Network Operating Systems" (I was a speaker at Compose and I <3'd this talk a lot.)

https://www.youtube.com/watch?v=uXt4a_46qZ0

It is not just about performance in all cases. Mirage is the particular case in question here - but with OCaml functors, it becomes possible to compose components of a kernel in truly modular ways. I was continuously surprised by this talk.

Something that needs to write to a block device only needs an abstract functor describing the interface to the device and some primitives to read or write to it. There are many implementations of this interface.

This seems quite obvious but it allows powerful ideas. For example, in the talk, you can see examples similar to this. But what if you want to test your kernel? You can simply substitute in a new implementation that has failure modes. You can write a block device that randomly ignores every 100th write; one that has unexpectedly high latencies, one that outright hangs on all I/O requests... Doing this kind of fault injection today is possible, but it's conceptually a lot nicer if it's just a "Mock" at the "block device" level that you can easily control and extend. You can do all kinds of other things; like have your system timer freak out, skew in random ways, run in reverse.
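Mirage expresses this with OCaml functors, but just to sketch the shape of the idea in something more people here read, here is a rough Go analogue (the BlockDevice interface and both implementations are made up for illustration; this is not Mirage's API):

    package main

    import (
        "errors"
        "fmt"
    )

    // BlockDevice is the abstract interface a storage consumer programs against.
    type BlockDevice interface {
        WriteSector(sector int, data []byte) error
    }

    // MemDevice is a trivial in-memory implementation standing in for the "real" device.
    type MemDevice struct {
        sectors map[int][]byte
    }

    func NewMemDevice() *MemDevice {
        return &MemDevice{sectors: make(map[int][]byte)}
    }

    func (d *MemDevice) WriteSector(sector int, data []byte) error {
        d.sectors[sector] = append([]byte(nil), data...)
        return nil
    }

    // FlakyDevice wraps any BlockDevice and drops every nth write,
    // mimicking the "ignore every 100th write" fault model from the talk.
    type FlakyDevice struct {
        inner BlockDevice
        n     int
        count int
    }

    func (f *FlakyDevice) WriteSector(sector int, data []byte) error {
        f.count++
        if f.count%f.n == 0 {
            return errors.New("flaky device: write dropped")
        }
        return f.inner.WriteSector(sector, data)
    }

    func main() {
        // The consumer only ever sees a BlockDevice, so swapping the real
        // device for the faulty one requires no changes to it.
        var dev BlockDevice = &FlakyDevice{inner: NewMemDevice(), n: 3}
        for i := 0; i < 5; i++ {
            if err := dev.WriteSector(i, []byte("payload")); err != nil {
                fmt.Println("write", i, "failed:", err)
            }
        }
    }

The functor version buys you this substitution at build time rather than run time, but the composition story is the same.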

You mention observability, but when your systems are truly modular, this is nothing more than an obvious follow up. An example in the talk is interposing "Irmin", which is a distributed, Git-esque storage system, into the network subsystem of your kernel driver. Any time interface properties of the device change, you write entries into the append-only Irmin log which are distributed. Irmin also has a git interface for read-only analysis.

The short story is that, in the talk, there is a live example where you can query a git repository to get a read-only changelog of all the networking state in your application. In the particular example, I believe it was interposed into the ARP implementation; every ARP packet and ARP response was logged into Irmin, and every system change propagated as a result was logged too. This gives you really amazing levels of persistent analysis and introspection with very low developer cost. It's true you could do something similar in a system today; but this is truly modular, works for any application built to use a particular Functorised-API, etc. It's a programming interface! And in theory there's also nothing stopping conventional tools like `ocamldebug` from working either.

Mirage also abstracts over the true underlying runtime. So that same device API can be switched with one that just talks to a POSIX-compliant filesystem, you get an ELF executable, etc. This all works on normal systems too; Unikernels are merely a different deployment target (for the most part).

This is not to say that Unikernels are the future or we should abandon our stable systems we have now (I definitely won't be doing so anytime in the future). But I found myself very surprised at what was quite easily possible, and I wouldn't so quickly write it all off as a fad. Maybe for Huge Enterprise, yeah... Operations experience separate from development is very useful, and a lot easier to find. But there's definitely some really cool uses for these things, especially in helping rethink and improve on some previous ideas.

> Also, you lose the advantage of a shared memory pool with unikernels, which run in VMs. Partitioning memory in VMs causes internal fragmentation, which lowers densities. It also can cause double caching between the host and guest, which lowers block IO efficiency.

This is a good point that's often overlooked. But I don't look to Unikernels for outright performance, either; to me, they are more interesting for researching newer operating system designs with a much better ROI than previous methods. I'm glad to see that happening, personally. And I might even take a performance loss if it meant winning some other guarantees in return.


Unikernel proponents seem to assume that hardware virtualization will forever be the abstraction of cloud computing. However, hardware virtualization is the wrong abstraction, which is why the industry is beginning to adopt containers. There is no reason why you cannot run a unikernel in UNIX binary mode inside a container, but then it is really just a different way of developing a userland process rather than a unikernel. You definitely could still call it a unikernel. You would get the advantages of modularity that you specified and you would have all of the debuggability and observability that regular applications have today with the tools that we have today. However, that is rather different than the role in which they are intended to operate.

I guess my point is that the unikernel is always going to be the equivalent of a userland process. The question is whether your bare-metal kernel is going to be a traditional one or a hypervisor. They have definite performance advantages over a traditional kernel when your bare metal kernel is a hypervisor, but I believe that is the wrong abstraction when I consider overhead.


>> Try 5, 10, 20 megabyte small.

> OpenWRT/LEDE will happily work on a system with 4MB of storage:

> https://www.lede-project.org

> QNX had a graphical environment, a web browser, a web server, a text editor, image viewer, various games, a package manager, etcetera on a 1.44MB floppy

This is all true, but you skipped the sentence following the one you quoted: "Depending on your needs, you can go down into the kilobyte range - and that's not just the app - that's everything".


A microkernel could go into the kilobyte range too. seL4 is definitely in that range. The smallest Linux kernel bzImage that I ever compiled was something like 570KB, so embedded Linux might be able to reach that range too. That would of course include an application. The QNX demo likely could reach such sizes too if most of the things in it were removed. For those that are unaware, QNX is a microkernel based system.


seL4 doesn't provide very much to the programs, does it? I think it is closer in spirit to Xen than to the Linux kernel?


The Xen hypervisor uses a microkernel architecture. A kernel could implement a userland (e.g. traditional UNIX), a VM interface (e.g. KVM) or nothing at all (e.g. a unikernel). My point is that unikernels do not have a monopoly on small sizes and they are not worth mentioning as an advantage.


No, when your service that's running on 10k machines does not behave as expected, you do not attach a debugger to it or dump a core.

UNIX is a great OS for sharing a machine amongst many users and many programs. It's not that great when your app is made of thousands of asynchronous programs. Last decade's tools are to be reinvented regardless of microkernels.


An oft-overlooked advantage is the ability to specify a holistic system with a fine grain of accuracy.

With a monolithic kernel in the way you have to make some black-box concessions.


Any kind of modularity will result in people treating modules as black boxes, even when they are not. If you use a monokernel/hybrid that is OSS, then there might be plenty of code and you might treat it as a black box, but it is not a black box. If you use a microkernel, then the amount of code is likely even less. Microkernels exist that are formally verified, while unikernels cannot be formally verified without formally verifying the application that is a part of them. Formally verifying an application would be awesome, but few would ever do that.


Tell me if I'm missing something, but the premise of Unikernels seems to be that a ring-0 x86 hardware environment is the perfect fit for a universal container/host interface.

Or to put it more charitably, since cloud compute services are based around booting VM images based on this model, we'll just go with it instead of trying to use an abstraction that is actually designed for this.

Correct me if I'm wrong, but it seems to me that the first thing any unikernel is going to do when it boots is switch the (virtualized) CPU out of x86 Real Mode (which all x86 machines boot into for legacy reasons, but virtually no one has needed since circa 1995) into protected mode.

Is it just me or does this seem a little bit crazy?


The premise of unikernels is that:

1. The job of an OS is to ensure that multiple programs can run on a single box without interfering with each other.

2. The job of a hypervisor is to ensure that multiple OSes can run on a physical box without interfering with each other.

3. In many cloud deployments, a single VM instance only runs a single user-defined program, which is programmed to a higher-level runtime than the OS (eg. Node.js, JVM, Rails/Django, SQL).

4. Why do we need #1 then?

IMHO, the real interesting stuff happens when you start re-implementing the APIs that we actually program to, without the OS. For example, what if:

1. You could take any command-line ELF executable and build an AMI out of it. This AMI would have an HTTP interface that only accepted connections from certain security groups. It would take in the command-line args via query params, and let you construct a virtual filesystem containing only the files you operate on via request body. Imagine say a compile server that runs Clang on user-defined code and serves the executable back, to be run on its own VM. And the crucial part is - there is no persistent storage on the box, nor any code that would be worth attacking. If there's a bug in the executable and an attacker pwns the box, the worst he can do is corrupt the request. There is no shell. There is no filesystem. There is no TCP stack to make outgoing connections with.

2. You could re-implement Node.js for stateless webservers. Again, you'd have no filesystem; once the initial program starts, it's guaranteed to never touch disk, since it has no disk access. Node does its own scheduling, and this way Node's scheduler doesn't need to fight the OS scheduler. You could store preformatted HTTP packets or response fragments in read-only memory pages and send them out directly via RDMA.

3. You could do a database or search engine that bypasses the filesystem entirely, instead writing directly to raw disk blocks. It can choose these disk blocks based on locality, since it knows the particular index structure and access pattern for the data, and doesn't have to fight the OS's attempts to hide the disk blocks under a file abstraction.

The point of unikernels is to take away stuff - it's not about which mode the CPU boots into, it's about removing all the code that is on a typical cloud computing image but has nothing to do with the job the instance is actually doing. All of this - shell, filesystem, DNS resolvers, etc. - is attack surface for a potential hacker, and it's often overhead when processing.
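To make scenario 1 above a bit more concrete, here is a rough sketch of what such an HTTP wrapper could look like, written as an ordinary Go program rather than an actual unikernel image (the tool path and the "arg" query parameter are invented for illustration; a real deployment would build this with a unikernel toolchain and strip out everything underneath):

    package main

    import (
        "io"
        "net/http"
        "os"
        "os/exec"
        "path/filepath"
    )

    func handle(w http.ResponseWriter, r *http.Request) {
        // Stage the request body as the tool's only input file.
        dir, err := os.MkdirTemp("", "job")
        if err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        defer os.RemoveAll(dir)

        in := filepath.Join(dir, "input")
        f, err := os.Create(in)
        if err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        if _, err := io.Copy(f, r.Body); err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return
        }
        f.Close()

        // Command-line arguments arrive as repeated "arg" query parameters.
        args := append(r.URL.Query()["arg"], in)
        out, err := exec.Command("/usr/bin/some-tool", args...).Output()
        if err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return
        }
        w.Write(out)
    }

    func main() {
        http.HandleFunc("/run", handle)
        http.ListenAndServe(":8080", nil)
    }

The interesting part is everything this sketch still leans on (a temp filesystem, a shell-less exec, a TCP stack); in the unikernel version those would either be in-memory library code or absent entirely.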


I tried to read up on this, but I'm not too familiar with the terminology. Are unikernels the formal name for the idea of running your application 'bare metal'?

In the parent post, does AMI mean Amazon Machine Image, or some Application M____ Interface?


Yeah, "running your application bare metal" is a useful first approximation. Technically, they consist of the toolchain and libraries necessary to replace OS functionality with userspace library calls, which then run on the bare metal. (Or technically, in any practical deployment they would run on a hypervisor, which presents an interface that looks like bare metal.) MirageOS, one of the first unikernel designs, works by statically analyzing an Ocaml program to identify OS calls and then only linking in the libraries required to support those particular calls, all of which have been re-implemented from the ground up for security.

Right now, much of the research on unikernels focuses on implementing a POSIX API. In other words, it replaces libc so that instead of eg. write() making a syscall into a kernel, write() inlines the code that the kernel would've run and talks directly to the hardware.

IMHO, the real wins for unikernels come when they start implementing higher-level interfaces, eg. Node or Rails or Django or HTTP or SQL or the JVM. Many programs are already written to these frameworks, with no knowledge of (or in some cases, access to) the underlying POSIX APIs, and the frameworks themselves often re-implement a large portion of the OS to create better domain-specific abstractions. Node or Python's asyncio, for example, implement their own schedulers that each run inside a single OS thread. Databases work in terms of pages, built on top of a filesystem; they effectively try to recreate the abstraction of a block device on top of a stream on top of a real block device. Websites often have large quantities of text that are sent back with every request (think of page layout in a templating engine, or JS bundles for a SPA). This data is usually copied and concatenated multiple times within a framework, while a bare-metal-aware web framework would store it in a buffer somewhere and write it out directly to the network card.
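As a toy illustration of the preformatted-response idea: render the entire HTTP response, headers and all, into one buffer up front and write it straight to each connection. This sketch still goes through a normal kernel socket; a bare-metal-aware framework would hand the same buffer to the NIC directly, which ordinary Go cannot do.

    package main

    import (
        "fmt"
        "net"
    )

    var canned []byte

    func init() {
        // Render the full response, headers included, exactly once.
        body := "<html><body>hello</body></html>"
        canned = []byte(fmt.Sprintf(
            "HTTP/1.1 200 OK\r\nContent-Length: %d\r\nContent-Type: text/html\r\nConnection: close\r\n\r\n%s",
            len(body), body))
    }

    func main() {
        ln, err := net.Listen("tcp", ":8080")
        if err != nil {
            panic(err)
        }
        for {
            conn, err := ln.Accept()
            if err != nil {
                continue
            }
            go func(c net.Conn) {
                defer c.Close()
                // No parsing, no templating, no per-request copies: one write
                // of the precomputed bytes per connection.
                c.Write(canned)
            }(conn)
        }
    }
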

And yes, I meant Amazon Machine Image. Doesn't have to be Amazon, but I'm focused on the pragmatics of how you might deploy a real unikernel to solve problems, and wanted to make the point that you're going to be loading it into Xen or some other cloud hypervisor at the end.


Some (relational) databases go to great lengths to recreate on top of a file system an interface that looks more like the block level storage that's underlying.

A unikernel can cut out the middle man here.


We've gone full circle. Originally there was shared hosting, which hosted your app in ring 3 alongside other users on the same physical machine, with the host OS running in ring 0. Then we got fancy virtualization hardware where the hypervisor ran in ring -1, your VM ran in ring 0 and your apps ran in ring 3. But that's a lot of indirection, so unikernels move your app into ring 0. So now we're basically back at shared hosting where your app runs one level higher than the host OS. Except now your app also bundles a partial OS and has weak debugging tools. It does have better isolation than shared hosting though, so that's a plus.

But containers are basically the same thing but with better debug support and a more familiar OS environment. Problem is containers need to be deployed on metal to be effective, not VMs. Unfortunately not many providers do this yet.

So yeah it is all kinda crazy.


> Unfortunately not many providers do this yet.

Samsung just acquired Joyent, which provides multi-tenant container hosting on bare metal via Illumos and LX-branded zones. So to me, the acquisition further validates that approach.


Based on this blog post, unikernels also need a special hypervisor (presumably with femto-sized VMs with second-granularity billing) so you might as well run containers.


> Tell me if I'm missing something, but the premise of Unikernels seems to be that a ring-0 x86 hardware environment is the perfect fit for a universal container/host interface.

> Or to put it more charitably, since cloud compute services are based around booting VM images based on this model, we'll just go with it instead of trying to use an abstraction that is actually designed for this.

That is my understanding of part of the premise of unikernels. Another is security from having less code, although nothing stops you from having less code with Linux. LEDE/OpenWRT are Linux distributions that are often smaller than the sizes that are advertised for unikernels.

I consider containers using syscalls on a kernel that operates on bare metal to be a better abstraction.

> Correct me if I'm wrong, but it seems to me that the first thing any unikernel is going to do when it boots is switch the (virtualized) CPU out of x86 Real Mode (which all x86 machines boot into for legacy reasons, but virtually no one has needed since circa 1995) into protected mode.

That is only on x86/amd64 systems. It is different on other architectures.

> Is it just me or does this seem a little bit crazy?

The more I learn about unikernels, the more skeptical I become of them.


Xen offers a couple of ways to load and start a kernel and you'd want to start a kernel in long or protected mode. Ideally, you also use the hypervisor's virtual device interfaces instead of scanning for emulated devices. I don't know if the sane startup protocol is final but here are some pieces of documentation: http://xenbits.xen.org/docs/4.7-testing/misc/hvmlite.html http://xenbits.xen.org/docs/4.7-testing/misc/pvh.html

The nice thing about ring-0 is that on modern hardware with SR-IOV, a VM can be associated with devices and the multiplexing that had to be done within a kernel or hypervisor can now be done in hardware.


Yeah. The right thing is hosting providers offering Java application servers or equivalent. Unfortunately there are political and commercial reasons that's not happening.

Still, there are worse container/host interfaces. It could be the full suite of POSIX system calls.


> Still, there are worse container/host interfaces. It could be the full suite of POSIX system calls.

Why would a Java application server provided by the host be better than the full suite of POSIX system calls?


My sense is the set of "system calls" (i.e. `native` functions) is smaller and more rigorously specified.


I think the premise of unikernels is that of a library or app as an OS.

I'm not following why you think turning on protected mode is crazy. Can you elaborate?


Yeah, but if you use containers you have to suffer the indignity of the creat() system call. These seem like the smallest possible objections.


You are assuming that our only options are "x86 hardware ring-0" or "Linux system call interface." Both are crufty in their own ways, but more importantly, neither of these was designed to be this. The right answer might be an interface that is designed with containerization in mind.


Can you explain why the creat() syscall lacks dignity and how that relates to containers?


That was just a joke. I think haberman is spot on that both x86 and Unix are crufty in their own ways and thus cruftiness isn't a good metric to judge these abstractions on.


Oh hah, that's funny now I feel silly :)


The EMC reference is probably to UniK[0], which provides a Docker-compatible API and can be used under the supervision of Kubernetes or Cloud Foundry.

I was happy to see UniK, because I've long seen unikernels as an "architecture-buster" for Cloud Foundry. Yet in practice the shift to Diego made it much less painful than expected.

[0] https://www.cloudfoundry.org/unik-build-run-unikernels-with-...

Disclaimer: I work for Pivotal, the majority contributor of engineers to Cloud Foundry. EMC is a major shareholder in Pivotal.


I never understood why these people didn't bother at all upstreaming their Go port. I even offered myself to be the person responsible for the review. I would gladly do that.

I'm very happy about more Go ports, I've done the arm64 and the Solaris ports, and now I am finishing the sparc64 port, but ports need to live upstream.


If you want to help out we would be more than grateful for this - there's a lot of work involved.


I look forward to a bare metal runtime becoming available, so that people see how close Go features are to Oberon and that systems programming is actually possible, even if not all goodies, such as structure packing or function entry points, can be controlled with the current //go:..... flags.


"Nowadays very few developers interact with actual real hardware - it’s been completely abstracted away."

And I think this is a very big problem: for them it's some magic pizza box, and they complain when things don't perform the way they expect.

Good read though thanks!


Can someone explain why these rump kernels can not be run on AWS if deferpanic has Xen as a target? AWS is Xen-based. I understand that there currently isn't a target for Docker so that takes Google Cloud out of the equation. The following two statement seem to be contradictory:

Can I use Google Cloud or AWS? You could - although you won’t write much more than a toy app - not until things are changed.

DeferPanic offers managed services for both public and private cloud environments and its platform targets KVM, Xen, bare metal, and ESX.

Perhaps that falls under the "unfit" statement about these cloud providers, but that seems pretty nebulous for such a technical discussion.


> I understand that there currently isn't a target for Docker so that takes Google Cloud out of the equation.

Google Compute Engine runs a lot more than just Docker images. It allows you to run arbitrary x86 VMs, just like EC2. It is not based on Xen, however (it is a combination of KVM and a non-QEMU VMM about which I wish I could say a whole lot more, but I don't think we're prepared to do that just now).


Right, I believe that GCE is Docker but it runs in a KVM container; I'm not sure why they do that, however. Maybe someone else can explain? My guess would be that it's a hedge on container security.

However, what they hand you is a Docker container, I believe, so provided there's a Docker target for whatever rump kernel, it should theoretically just work. No?

It sounds like you work on GCE?


GCE is just plain old VMs, no Docker involved.

There's also GKE which is managed Kubernetes complete with Docker containers.

(And yes, I work on the virtual machine monitor backing GCE)


Unikernels run fine on AWS.

It's just that it's a bit fiddly to make it happen. My guess is that it's the fiddliness that Ian is suggesting is impractical.


What languages do you support for unikernels? Is it just Go, or do you plan to support others in the future (e.g. OCaml for MirageOS)?


So this is kind of a two part question:

1) What languages right now - Go, PHP, JavaScript, and Ruby through the rumpkernel project - rumpkernel.org.

2) We are implementing support for user-supplied images, which will let you run mostly anything in the very near future. We plan to be completely agnostic.


Not from deferpanic, but there's also rumprun for rust: https://gandro.github.io/2015/09/27/rust-on-rumprun/ (it's even easier these days - integrated into cargo target)


This whole movement seems strange to me. It's like they are statically linking the entire OS to run a single app. Why not ditch the OS completely? I'd say this is taking the whole container concept a bit far, but who knows what will come next!


If I'm not mistaken, the whole idea _is_ to ditch the OS completely. To avoid a fully functioning kernel with lots of juicy device drivers to exploit, code intended to work on systems your app will never need to worry about running on, and layers upon layers of abstraction ready and waiting to be exploited (e.g. shells).

One reason MirageOS uses OCaml, for instance, is for its memory safety properties. A truly staggering amount of vulnerabilities (e.g. Heartbleed) are due to abusing unintended ways of accessing memory in programs which face the public Internet. Since we've proven over and over again at this point that we can't reliably write safe C code, there's a reason folks are interested in eliminating as much of it as possible, all the way down to the hypervisor level. Since so many devices will be Internet-connected soon, having a way to write apps without even a possibility of "Oops" bugs like this is even more critical.


Interesting. I'll have to do a bit more research on this.


Traditional VMs suck. Containers a la Joyent and Unikernels are really two points on the same spectrum of distribution of complexity between host and client. Eventually they will converge, because neither POSIX nor (virtual) hardware are interfaces designed for this purpose.

The one-language library-centric ideology of e.g. MirageOS especially is really orthogonal to questions of provisioning data centers. It is truly a huge step in the right direction, and before the unikernel-container convergence, could be applied to the host OS of a container rig.


Docker is investing a lot of time and energy in unikernels as well. Makes a lot of sense, but I agree with a lot of the comments here. It might be a year or so before we start to see faster adoption.


For debugging you can use qemu, and for "hypervisor level orchestration" you can use CloudFormation with AWS.

Maybe I'm missing something?


awesome sauce! why did you go (sorry :D) with Golang?

(garbage collection would seem to be an issue with a bare-metal language or?)

vs Rust...


Not OP, but deferpanic is pretty invested in the Go ecosystem according to their past HN posts.



