Harvey OS – A Fresh Take on Plan 9 (harvey-os.org)
226 points by antonkozlov on May 27, 2016 | 79 comments

So, a funny thing happened on the way to Linux dominance of the server: We started containerizing things and focusing on building one service per container (or VM). Suddenly the OS matters a lot less; each container only needs the libs and kernel services that it needs to do its job. It doesn't need the whole OS and doesn't benefit from the whole OS being present.

I suspect there's an opportunity for a lot of alternative systems to make inroads in that space. But, then again, if the OS doesn't matter...it may be that we all just end up using the path of least resistance, which is almost always a small Linux distro with all the stuff we're used to. But, for someone that loves Plan 9 or Solaris or OpenBSD and wants to deliver service containers, they can probably get away with deploying containers using those systems without people balking at the idea.

Containers and the direction they're evolving are the revenge of Tanenbaum:


My take on that debate: They were both right, but they were optimizing for different things.

Tanenbaum was right in a theoretical sense and an eventual sense. Microkernels are better. Separation of concerns is better.

Torvalds was right in the early-mid 1990s and right in a pragmatic sense. More monolithic (or vertically integrated) kernels are easier to build and ship and are faster on smaller machines. A microkernel made little pragmatic sense on an early-mid 90s box. But keep in mind that these machines were far less powerful than a Raspberry Pi. My first Linux machine was a 66 MHz 80586 clone with 32MB of RAM.

Once you have machines with dozens or even hundreds of cores, NUMA, and massively diverse multi-tenant workloads, microkernels start making a whole lot of sense. Once you have to containerize apps for security, they make sense too. They also make sense if we want to eliminate reboots, etc. They make much more pragmatic sense now than they did in the 90s.

Ultimately microkernels or microkernel-like systems will win.

What do containers have to do with microkernels?

If we're talking about BSD jails, Solaris/Illumos zones, and Linux cgroups + namespaces, there are basically two scenarios: Either multi-tenancy is handled through hardware virtualization (mainstream cloud providers), or all of the containers are run on a shared monolithic kernel (Joyent's Triton, running Linux binaries in containers via LX-branded zones on Illumos). I suppose the former scenario could be seen as a vindication of microkernels, if you consider the Xen hypervisor a microkernel. But that has nothing to do with containers.

>What do containers have to do with microkernels?

Xen is becoming a major OS, and it is a microkernel OS.

Containers are not VMs.

But, they can be (and are) used in similar ways. I think it's reasonable to talk about VMs and containers in the same conversation.

Sure, but the conversation brought up microkernels in relation to containers specifically - which is nonsensical.

Microkernels have already won.

That is what most cars, factory control units, avionics and many other embedded real-time OSes are.

Every phone also has a tiny OS running their radio band unit, which happens to be OKL4.

Just the Linux crowd seems to be unaware of them.

sounds like the mainframe has been reinvented...

Yeah, technology-wise, we're approaching what we had in the 1970s before microcomputers.

Except now it's all a lot cheaper.

To really be back to them, we need to replace C and C++ with the safer systems programming languages that those mainframes already enjoyed in the late 60's.

ESPOL, NEWP, Algol-68RS are a few examples.

That is going to take a while, maybe a few developer generations still.

Go, Rust, Swift

Swift is the only one being actively pushed onto developers by an OS vendor.

Apparently Genode OS has now integrated Rust, but they still aren't using it.

It would be nice if the Android team recognised the work being done by the Go team and integrated Go into its OS, at the very least in the NDK, but I don't see it happening. Especially after seeing their reaction about Java alternatives at this Google IO. The attitude "nothing but Java" is still the official one.

On Windows I would like to see .NET Native become more relevant and bring System C# goodies to mainstream Windows.

But...isn't Java a "safer systems language" in much the same way Swift is? So, it's a lot like Swift in that regard, in that Google is pushing it. And, there are other JVM languages, including some that are at least as nice as Swift. Kotlin looks really promising.

I'm not sure why Java should be excluded if we're talking about systems languages other than C/C++ that are safer.

Java is not a systems programming language due to the lack of value types, sun.misc.Unsafe not being an official package, and AOT compilation not being part of the standard toolchain (only third parties have JDKs with AOT compilers).

Also Google is pushing Android Java, not Java, with its partial support of standard Java features.

If you see their Google IO talks, they really mean Java and not other languages that happen to target the JVM.

Because it's an applications language, and not a systems language, Java uses garbage collection, which has unpredictable performance, while Swift doesn't.

For me as a developer, the OS stopped mattering when I moved away from C and C++ back into languages with richer runtimes and libraries.

I also started to see POSIX as the extended C runtime that didn't make it into the ANSI C standard.

Interestingly, this argument was also examined in a recent conference [1] and the same conclusions are drawn. Specifically: applications are using high-level frameworks more (and POSIX less) which use platform-specific extensions, ioctl dominates, and new (disjoint) abstractions are introduced by the major OS platforms.

[1] http://dx.doi.org/10.1145/2901318.2901350

Suppose we remove everything from an OS that the particular service doesn't need, then you tie the app+OS together as a unit. How is that different from just a regular application running on the host OS? If it is something better, then why don't we write applications to be that way to begin with? Serious question.

> If it is something better, then why don't we write applications to be that way to begin with?

Isn't that the whole idea of unikernels?

It is the idea of unikernels, but the question of "why isn't everybody doing that?" is still valid.

And the answer (IMO) is that in the end your unikernels are just fat applications and your hypervisor is just an emaciated OS. The abstractions are at the wrong places and you can't properly optimize resource usage because the hypervisor has to treat the guest OS as a black box. Otherwise it's eventually going to cease being a hypervisor and be a cruft-laden kernel.

"The abstractions are at the wrong places and you can't properly optimize resource usage because the hypervisor has to treat the guest OS as a black box."

If our only concern was to get the most performance out of a single piece of hardware, that would be the end of the discussion. But, that's not the problem this kind of architecture is setting out to solve. We've already reached the point where raw performance on a single box is less important than being able to virtualize. EC2 proved that a decade ago; we're willing to trade raw performance for the flexibility of virtualized infrastructure in some cases. Hell, EC2 is about twice as expensive as similar physical hardware (even today); but subtracting the cost of managing the hardware and adding the ability to spin them up and down makes them more cost-effective for some deployments.

So, sure, people are trying to make it efficient. And virtualized and container based hardware has improved remarkably in that time (virtual disk and network I/O is now very close to bare hardware performance, whereas it was very far from it in the early days). But, even when they fail to achieve near peak performance there's still a compelling benefit for many uses.

"How is that different from just a regular application running on the host OS?"

The idea is to have each app in a "ready-to-run" state, so that it can be popped into existence anywhere on a pile of infrastructure and shutdown when it's not needed anymore. A container+app is a self-contained unit. If you're running several apps on an OS, and you update the OS, you now have to test all of the apps to be sure the update doesn't break things. If a security vulnerability happens for any of those services, you may no longer be able to trust the entire system. Apps that misbehave might impact all of the other services on the system. Containers are not a panacea, mind you...they introduce new headaches of their own, and the way a lot of folks are building and deploying containers is a joke (it's been a great excuse for people to start pushing out big binary blobs of crap again, which was a thing I'd hoped was behind us years ago).

But, the principal benefit is that a container provides a level of abstraction at the service level rather than the application level. This "thing" (which is a container that has an app in it) can be spun up anywhere and do its job. It can be upgraded or modified (including libraries and other packages) without impacting any other thing in your stack. Part of a self-healing infrastructure is being able to think of any service as a cog in a greater machine, rather than some individual piece of complexity you have to interact with to keep it running. And, ideally, we'll be able to outsource maintenance of those cogs to someone else, the way we outsource packaging binaries to OS vendors, today. Right now, that's a mess, and most of the people building containers have no clue what they're doing; so it's a minefield. But, it won't always be so.

It's just another abstraction for us to work with, is what I'm trying to say. A container can be a black box that we only need to open up occasionally (or never, if we trust our vendor to build it right).

"If it is something better, then why don't we write applications to be that way to begin with? Serious question."

We probably will, eventually. At least, it'll be closer to that than big full OS + a bunch of apps running on it.

The benefits are many; the costs and tooling are still high and fragile. But, it'll get better, and we'll get more fine-grained in how we're deploying these pieces. Most people, including myself, still think in terms of "I have a server, now I put things on it, and they combine to make a website/mail server/whatever". That's probably the wrong way to approach big web scale problems (though the number of people actually managing web scale systems vs. server-scale systems is probably small; most web sites and mail servers will never need to expand beyond a single server, and it's probably a stupid waste of time and resources to do otherwise).

The future of big web service looks like Kubernetes or something along those lines (one app per container, very small and focused OS images, somewhat ephemeral in nature...they will come and go). It does not look like one big server with a bunch of apps running on it.

That's pretty much what people do with containers.

Right, the question is "how is that different from a statically-linked application?"

It is different because it is assembled from manageable pieces and still enjoys the benefits of memory separation.

For example, running a tiny jobs server:

* Scheduler in one process which spawns

* Application code running in many subprocesses

The scheduler might be in Ruby and the jobs might be in Ruby; or the scheduler might be Cron and the jobs in shell. In either case, if a job crashes the other jobs and the scheduler are very likely to carry on their work.
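A minimal sketch of that scheduler-plus-subprocesses shape, assuming Python for both sides purely for illustration (the comment above mentions Ruby or cron and shell). The function name `run_jobs` and the job snippets are hypothetical. It only shows the isolation property: one job aborting does not take down the scheduler or its sibling jobs.

```python
import subprocess
import sys


def run_jobs(job_sources):
    """Run each job in its own Python subprocess and return the exit codes."""
    # Start every job as a separate OS process with its own address space.
    procs = [
        subprocess.Popen([sys.executable, "-c", src])
        for src in job_sources
    ]
    # Wait for all of them; a crash in one child is reported as a non-zero
    # exit code and does not affect the other children or this scheduler.
    return [p.wait() for p in procs]


if __name__ == "__main__":
    jobs = [
        "print('job 0 done')",
        "import os; os.abort()",  # simulates a hard crash inside one job
        "print('job 2 done')",
    ]
    print(run_jobs(jobs))
```

A statically linked single process would not get this for free: the same abort inside a thread or an in-process library call would end the whole program.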

It is also nice that you can bring together tools from different communities into a single application. A "Rails" container might utilize Nginx (C), Rails/Unicorn (Ruby) and Node (JS/C++) together. A single statically linked application would, barring some really amazing innovations in something like Clang modules, have to be in one language therefore from one community.

> A single statically linked application would, barring some really amazing innovations in something like Clang modules, have to be in one language therefore from one community.

Not necessarily. If they all can compile to native machine code with the same C-like semantics, they can (in theory) be combined in that sort of manner. This is true of both Rust and Go, last I checked, as it is with pretty much any compiled language that has a "convert to C" step (Chicken Scheme is an example that I'm particularly familiar with) or perhaps even a "convert to LLVM IR" step.

Not that this is especially feasible for the majority of languages being used in web development (at least by those with a lick of sanity; Go and Scheme are the only exceptions that I'm aware of off the top of my head), but it's still worth considering.

Last I used Go (granted, 2010), it refused to use the platform ABI, and shipped with kencc so that you could compile your C code for its ABI. So, it had C interop, but not binary interop with C. For the kind of interoperability you're talking about here, if each language uses its own ABI, we get O(N*N) complexity. Have things changed with the Go ABI, or at least its FFI?

I'm sure the Plan9 ABI variant used by Go makes more sense than the platform ABI, but supporting the platform ABI for FFI doesn't add much complexity at all to the compiler.
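To put rough numbers on the O(N*N) remark: a small sketch that counts the bridges needed when every language has a private ABI versus when all of them target one shared platform ABI. The counting convention (one directed bridge per caller/callee pair, and one import plus one export adapter per language in the shared case) is an assumption made for illustration, not something stated in the thread.

```python
def bridges_with_private_abis(n):
    """Directed caller->callee bridges if each of n languages has its own ABI."""
    return n * (n - 1)


def bridges_with_shared_abi(n):
    """Adapters needed if every language imports from and exports to one ABI."""
    return 2 * n


if __name__ == "__main__":
    for n in (2, 5, 10):
        print(n, bridges_with_private_abis(n), bridges_with_shared_abi(n))
```

Under these assumptions the private-ABI count grows quadratically while the shared-ABI count grows linearly, which is the argument for supporting the platform ABI at the FFI boundary.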

In that case, I stand corrected with Go (I incorrectly assumed that - being supposedly a systems language - it would support the platform ABI, but I guess Go is even worse than I thought ;) ).

I'm 91% sure my point still stands with Rust, though, at the very least if everything's compiled through LLVM (and uses the same calling conventions).

Well that sounds like Erlang.

You're not wrong.

Erlang solved a lot of problems of scale in very effective ways. These new architectures are solving the same problems in ways that look similar from a distance. This new(ish) approach just means we can use existing software mostly unchanged, while still getting a lot of the resiliency benefits of something like Erlang; doing it with Erlang means everything has to be written in Erlang. That wouldn't be a crazy idea for some deployments. But, for others, it's not tenable.

Erlang's support for C modules was always good, though. PyPy, Swift and Rust do all seem to have a fairly good export-to-C story. With Clang modules something like Erlang could become the "cloud shell".

Interference from other running apps (memory, network, filesystem, etc).

Operating system upgrades.

Configuration management.

Startup/teardown of multiple instances.

Symbiotic or dependent processes/programs.

Programs not specifically built with isolation in mind.

Inter-application security.


In some regards that's currently a myth perpetrated by the folks making money on container-based deployments.

Containers don't actually contain very well on Linux (yet, though some nice things are happening on that front, and you can already build very solid things with SELinux, but I haven't seen anybody actually doing so; except probably the Project Atomic folks).

But, yes, in theory, a container-based deployment is hella tight on the security front. One service being compromised is no big deal: Kill it and instantly spin up a new one on a new IP (hopefully with a fix for the security issue). If your visibility into the system is high enough, you may even be able to architect detection of compromises; e.g. if your container for database service starts sending email or opens an IRC port, you know it's fucked, and you kill it with prejudice. In a system built to think of containers as black boxes with APIs, you don't need to keep any particular one around if you have reason to tear it down.

Note that I haven't actually seen people doing any of this with containers. But, it's a thing that one could do that cannot be done when all of your services are applications running on a single big machine. The narrower the purpose of your container, the more secure it can become. You still have lots of moving parts in your total infrastructure (more, actually, since the container management has a cost, too), but each moving part has very well-defined boundaries, and misbehavior is easier to spot and easier to shut down quickly and via automated means.
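A rough sketch of the "database container starts sending email or opens an IRC port" check described above, assuming you already have some feed of observed outbound connections per container. Everything here is hypothetical: the `Connection` record, the `ALLOWED_OUTBOUND` policy table, and the port numbers are illustrative choices, not the API of any real container runtime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    container: str    # container name (hypothetical)
    role: str         # narrow purpose of the container, e.g. "database"
    remote_port: int  # destination port of an observed outbound connection


# Hypothetical policy: the only outbound ports each role is expected to use.
ALLOWED_OUTBOUND = {
    "database": {5432},   # e.g. replication to a peer database
    "web": {443, 5432},   # upstream HTTPS and the database
}


def containers_to_kill(observed):
    """Return sorted names of containers seen making unexpected connections."""
    bad = set()
    for conn in observed:
        # Unknown roles get an empty allow-list, so anything they do is flagged.
        allowed = ALLOWED_OUTBOUND.get(conn.role, set())
        if conn.remote_port not in allowed:
            bad.add(conn.container)
    return sorted(bad)
```

The narrower the role, the shorter the allow-list, which is what makes this kind of automated teardown plausible for single-purpose containers and hard for a big multi-service box.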

Some details would be a little more helpful.

It allows developers to operate like they have a statically-linked application without having to undergo the cognitive dissonance of questioning the received "wisdom" that they are bad.

> each container only needs the libs and kernel services that it needs to do its job. It doesn't need the whole OS and doesn't benefit from the whole OS being present.

Yes, but the kernel services are the part that's incompatible between systems. Containers do not run their own kernel, but instead run on their host OS kernel.
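One way to see the shared-kernel point for yourself: the sketch below reports the kernel release (answered by the running kernel) next to the userland name (read from a file in the container image). Run inside a Linux container, the first value is the host's kernel while the second comes from whatever distribution the image was built from. The function name and the fallback value "unknown" are hypothetical choices for this example.

```python
import platform


def kernel_and_userland(os_release_path="/etc/os-release"):
    """Return (kernel_release, userland_name) for the current environment."""
    # The kernel release is reported by the running kernel itself, which in a
    # container is the host's kernel.
    kernel = platform.release()
    userland = "unknown"
    try:
        with open(os_release_path) as f:
            for line in f:
                if line.startswith("PRETTY_NAME="):
                    userland = line.split("=", 1)[1].strip().strip('"')
                    break
    except OSError:
        pass  # no os-release file, e.g. on a non-Linux system
    return kernel, userland


if __name__ == "__main__":
    print(kernel_and_userland())
```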

A Linux/UNIX ABI is probably the minimum cost of entry for an OS to participate in this idea, at least in the short-term, since everything for the server is currently built primarily for Linux.

Er. The OS minus the OS services?

This idea would mean you're not running the OS's kernel, and you are only running the userspace components that were written against Linux emulation, which are often just Linux binaries copied from some Linux distribution.

We have that. It's called Linux.

And NT and Solaris both support the Linux ABI now. OS X / iOS is basically the only holdout with nontrivial market share.

I don't know that Oracle Solaris supports the Linux ABI. The revival of Linux ABI support in zones (containers) is a SmartOS (and illumos) thing, and we've diverged significantly from Solaris at this point.

Aw, looks like it was Oracle Solaris 10 only.

(Is there a good generic word for Oracle Solaris + SmartOS + Illumos + etc.?)

SmartOS is really just a distribution of illumos; so are OmniOS, OpenIndiana, etc. We are all very similar as we push and pull from our common source base.

When Oracle took the source closed again, they essentially forked their own private product. They have diverged significantly enough from everybody else that it's not really helpful to include them in the same family anymore.

tl;dr: we just call it all "illumos" now.

Is anyone still using Solaris?

I knew it had fans but I assumed people moved on after Oracle closed the open source development.

I will bet lots of places in the enterprise, which didn't care how Solaris was developed in the first place, and were already Sun customers before they open sourced it.

I'm sure.

But where I was, it seemed like when it was open sourced there was a lot of excitement about its future, and after it was bought by Oracle and closed again, not so much.

I've had similar thoughts - that containerization might start opening the path for microkernel style OSes.

Microkernels have actually won in the embedded and real time space.

Not sure why you're being downvoted, because this is absolutely the case. Examples include L4, QNX, FreeRTOS, Minix, etc. The vast majority of kernels that have been designed specifically for embedded or real-time applications are microkernels.

That, of course, doesn't mean that only microkernels are used in embedded and real-time applications. It's not uncommon to see, for example, a stripped-down Linux or even Windows in some applications. In some cases, even MS-DOS and its kin are still used. But by and large, most of the kernels you're looking at in this space are indeed microkernels.

The microkernel architecture also offers some advantages in the high-assurance space. See seL4, for example. This doesn't necessarily have anything to do with any technological superiority of microkernels, per se, but they tend to be far smaller than monolithic kernels, which in turn makes them much easier to audit and verify. The Minix 3 kernel, for example, is something like 6,000 lines of C -- roughly 2/3 the size of GNU grep -- which makes it comparatively easy to audit, simply because there's less code to audit.

Your OS is now your hypervisor. If you don't see that, then your containers are so loosely coupled they barely have to interact with each other. Wait until you need to tightly couple several components and you'll be back to square one.

Hello. I'm the guy that funded the amd64 port of the Ken C compilers (later used by the Go team) and amd64 port for Plan 9, did the Xen 2 and lguest ports of Plan 9, wrote the first 9p file system implementation for Linux, and ran the project that ported Plan 9 to 3 generations of the Blue Gene supercomputer (to name a few Plan 9 things). I wanted to make a few comments about Harvey since it came up here.

C compiler: yes, years ago, Ken C was faster than alternatives like gcc. To my surprise, that's no longer true. A full build of all the plan 9 libraries, tools, and kernels takes about 2-3 minutes with ken c, gcc, and clang. It's why on every push to github we can do a full build with both gcc and clang using travis. There's no longer a good reason to stick with the Plan 9 C compiler and, once the Go team dropped it too, there were lots of good reasons to move away from it. So we did. As the original paper from Ken pointed out, it only ever produced medium quality code. It also had several bugs with 64-bit scalars that people had to work around in their C code. It was fast but in the end not a very good compiler.

Unix wrinkles: there are so many, but just consider the sockets interface. Sockets completely break the Unix model of named entities that you can open and close and read and write. Sockets don't have names and are not visible in the file system. It's a long list of what they got wrong and that's why, in 1978, other mechanisms were proposed (if you're willing to dig you can find the RFC and you'll see ideas like /dev/tcp/hostname and so on). But to our dismay (I was there at the time) the current BSD socket interface won. Many of us were disappointed by that decision but it was the fastest way to get network support into Unix.
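For anyone who hasn't felt this wrinkle directly, here is a user-space sketch of what a name-based interface could look like: a helper that takes a path in the style of the /dev/tcp/hostname idea mentioned above and hands back something you can read, write and close. The helper name `open_tcp` and the trailing `/<port>` path component are assumptions made for this example; nothing like it is part of the BSD socket API, which is the point.

```python
import socket


def open_tcp(path):
    """Open a path of the hypothetical form /dev/tcp/<host>/<port>.

    Returns an unbuffered file-like object supporting read/write/close,
    hiding the socket()/connect() steps the BSD API requires.
    """
    parts = path.split("/")
    if len(parts) != 5 or parts[:3] != ["", "dev", "tcp"]:
        raise ValueError("expected /dev/tcp/<host>/<port>: %r" % path)
    host, port = parts[3], int(parts[4])
    sock = socket.create_connection((host, port))
    f = sock.makefile("rwb", buffering=0)
    # The file object holds its own reference to the connection, so the
    # connection stays open until f.close() is called.
    sock.close()
    return f
```

With the real API, the caller has to know it is dealing with a socket from the first line; with a naming scheme, `open_tcp("/dev/tcp/example.org/80")` looks like opening any other file.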

On the acceptance of Harvey: Presotto and Hume showed up for our BOF at the 2015 Usenix in Santa Clara and were very supportive of what we are doing with Harvey. Other former members of the Bell Labs community have been similarly supportive.

On the logo: I'm glad people like it.

On the way forward: we welcome new contributors, there's lots of work to do.


I like the fact that this is happening, with a live Git repo and all.

9front is alive, sure, but bootstrapping things atop a modern compiler and an (at least partially) Linux compatible ABI makes a lot of sense.

I didn't get it, what could gcc/clang potentially bring to Plan9?

Plan9 uses a different dialect of C, with a compiler written by Ken Thompson, than most of today's software. Also, the standard library and programming model are quite substantially different from the ANSI C model. There is something on Plan9 called APE (ANSI/POSIX Environment), which is kind of like WINE (an emulation layer, not an emulator or compatible ABI), for porting ANSI C apps, but it's built with a rather old version of GCC and is far from complete.

Meaning, most software won't run on Plan9 without major modifications, so bringing a modern GCC/CLANG would give access to better compilation, but really there needs to be more work on APE to get more software over (lots of low hanging fruit there from what I've seen).

APE isn't anything like wine. It's just some libraries that give you a posix-ish api. It also doesn't use GCC at all, it uses kencc (the native plan 9 compiler) and cpp.

then they could have just updated ape, should be less work than trying to babysit gcc.

A modern optimization pipeline.

A lot of the fun of the Plan 9 compiler and source organization is that it doesn't optimize; instead, it compiles fast. Like the whole OS in about a minute (on a Raspberry Pi). Or maybe it's 2 minutes, I don't recall.

The Go folks have adopted some of the same ideas (minimal optimization and no nested include files), and they build on the experience of the Plan 9 compiler to better divide the work between the "compiler" and "linker", so we gain more from parallel make.

Whole system takes about 55 minutes on a Raspberry Pi. Just the kernel takes a bit over a minute.

The build speed doesn't matter much if you don't like the runtime speed.

Note that the pi is about as fast as a 233 MHz pentium 2.

Aha! That makes so much sense to me. My first PC, which I still have (runs OpenBSD nowadays), is a 350 MHz Pentium II. When I got my Raspberry Pi (a version 1 model B), running a 500 MHz ARM, I was a little taken aback by just how much slower it seemed than my old P2 box. It's got a faster clock and more RAM, how could that be? But it certainly feels slower!

Do you have any sources on this? Can you perhaps link me to something that explains the differences between the ARM and the P2 that lead to such a difference in performance despite the clock speeds?

Runtime speed isn't much of an issue for us. There's nothing in the codebase that's large or complicated enough to be slow.

I remember benchmarking plan9 C against gcc waaay back; 1999 or so. Gcc'd programs were a fair bit faster. And I was compiling and running the programs under cygwin under Windows95!

I did this partly because plan9 had published some source code and times on a system with exactly the same hardware as mine - AMD k6/350.

I suspect the plan9 compiler hasn't had nearly as many changes since then so the gap would be much wider.

I acknowledge compile time is important also - and this is where the plan9/golang people have made a reasonable trade-off.

What does an operating system gain from maintaining its own slightly special C compiler?

independence from complex, unmaintainable, ever moving code that is mostly controlled by other people who don't share our values. for example nobody in the loonix community seems to be able to ensure that cross-compiling works.

I don't see what you gain with either of these that's worth enough to put in that much work and add that much complexity. linuxemu is a better solution for running Linux programs, all of the added unix junk is contained. Using GCC or clang over kencc could be useful in some cases, but there, too, I don't see what fantastic gains you get. I do see major issues there like shipping GPL code with GCC and adding a massive new codebase to the tree (it wouldn't surprise me if GCC or clang's source trees were larger than 9front's). All I really see in Harvey is feature creep and trying to make Plan 9 into Linux.

linuxemu has been dead for years - it's pretty much impossible to run modern 64-bit stuff in Plan9. Like, you know, a 2010-era browser...

For anybody who was wondering, the name is a reference to the Jimmy Stewart movie "Harvey"[1], in which Jimmy Stewart plays a grown man who has an imaginary best friend, a 6 foot tall bunny rabbit.

[1]: http://www.imdb.com/title/tt0042546/

The Harvey character and shadow are one of the most creative things I have seen in a long time. Bravo!

I can't think of Harvey without thinking "Crichton!"


Hope this project makes real inroads!

w.r.t. the speed comment. The Plan 9 C compiler (which I've used for 20 years) no longer has any speed advantage over, e.g., clang. And the code it produces is not very good. Time to move on.

Aren't we all supposed to dream about unikernels moving forward anyway?

If I'm just using a library operating system that is linked directly into a single unikernel that merges the application code and system code, then I don't really care if I'm running on a hypervisor, on bare metal, or a host operating system. I'm only using a few system capabilities, and I'm not really taking advantage of other services running on the OS.

So I don't really care if the host OS is Linux, or Plan 9, or BSD. It just has to be UNIX-y enough to host a VM for my unikernel.

A unikernel isn't really much more than a weird system call ABI, though. There are some APIs that are better performing than standard (eg, virtio vs sockets) but that's a design issue, and operating systems can provide similarly performing apis. For example, netlink.

> operating system that does away with Unix's wrinkles

What would those be?

Ever taken a look at the sockets API?

What's a pooka?
