Rethinking the guest operating system (lwn.net)
120 points by justincormack on Sept 19, 2013 | 48 comments

A lot like: http://www.openmirage.org/

This is an operating system that runs directly on the Xen hypervisor, written in OCaml, which does away with the usual OS abstractions.

Or http://erlangonxen.org/ which does away with a "traditional" OS entirely, running an erlang runtime directly on top of Xen.

Right, they all sound similar.

What the article doesn't really emphasize enough is that Cloudius's aim is entirely about running the JVM. This will, of course, guarantee popularity amongst "enterprise" types.

However, it's still very similar to running a JVM in a process directly on the host. You could do something similar by running a JVM on the host and using cgroups to confine it.
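Setting up cgroups requires root, but the same idea of confining a process on the host can be sketched with POSIX resource limits. A minimal, hedged example (the memory-hungry child stands in for a JVM; the 256 MiB cap and the allocation size are arbitrary illustrative values):

```python
import resource
import subprocess
import sys

LIMIT = 256 * 1024 * 1024  # 256 MiB address-space cap (illustrative value)

def limit_memory():
    # Runs in the child between fork() and exec():
    # cap the child's virtual address space before it starts.
    resource.setrlimit(resource.RLIMIT_AS, (LIMIT, LIMIT))

# The child tries to allocate ~512 MiB, which should fail under the cap,
# giving a nonzero exit code.
proc = subprocess.run(
    [sys.executable, "-c", "x = bytearray(512 * 1024 * 1024)"],
    preexec_fn=limit_memory,
    capture_output=True,
)
print("child exit code:", proc.returncode)
```

cgroups add to this picture CPU shares, I/O limits, and hierarchical accounting across groups of processes, which plain rlimits don't give you.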

Cloudius's USP is that existing clouds are already running full guest operating systems, so their OSv running a JVM fits naturally into this landscape. Architecturally it's nothing new.

I wish Dor & Avi well though :-) (ex Red Hat associates)

BEA (now Oracle) had their JVM (JRockit) running on a hypervisor. I've never seen it in real life, but apparently it was a real thing[1] at one point.

[1] http://www.javaworld.com/community/node/4304

The current product page has download links:


Sun also tried this with the Maxine Virtual Edition (née Guest VM) JVM. I can't find any updates on it, so it looks like it was canned.

https://kenai.com/projects/guestvm
https://www.youtube.com/watch?v=iHIaH12f2Ek

While interesting for bespoke applications, the problem is that if you take this concept to the extreme, you end up with a traditional operating system... except it only runs Erlang.

Also https://github.com/GaloisInc/HaLVM (Haskell). Though I think it doesn't handle parallelism.

Stanford has another somewhat related project: http://dune.scs.stanford.edu

In their case, the application runs both as a process in the host and as a guest. It gives the application access to traditional OS APIs, and allows the use of processor extensions to directly access virtualized hardware. The benefits include the ability for an application to do low-level custom IPC, to use the page tables for garbage collection, to trace system-call usage much more efficiently, to hook into page faults, and so on. Very cool stuff.

So in the end, we're reinventing the OS with a hypervisor acting as the kernel? Are the current hypervisors better or worse than the Linux kernel at this task? What's the reason people don't just run these processes straight in a regular Linux install with properly configured users/chroots/quotas?

I, for one, really dislike this drive towards VMs for everything.

VMs are mostly about solving the same problems that operating systems were there to solve in the first place, only slower.

I personally think that part of the problem is the way software is installed and configured by modern package managers and distributions. It makes people see installed software almost as part of the operating system. If you want two webservers with different configurations, you therefore need two operating systems.

The concept of creating a couple of different users and running the software under each one seems foreign to modern system administration these days.
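The "two webservers, one OS" case can be sketched without any VM at all. As a stand-in for two differently configured webservers, this runs two `http.server` instances as plain processes on different ports (the ports are arbitrary; a real deployment would also give each its own config file and unprivileged user):

```python
import subprocess
import sys
import time
import urllib.request

# Two server instances, each an ordinary process with its own port.
ports = (8081, 8082)
procs = [
    subprocess.Popen(
        [sys.executable, "-m", "http.server", str(port)],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    )
    for port in ports
]

# Poll until each server answers, then record its HTTP status.
statuses = []
for port in ports:
    for _ in range(40):
        try:
            statuses.append(
                urllib.request.urlopen(f"http://localhost:{port}/").status)
            break
        except OSError:
            time.sleep(0.25)

print(statuses)

for p in procs:
    p.terminate()
```

Both instances share one kernel and one filesystem; isolation between them is whatever the user/permission model provides, which is exactly the trade-off the thread is arguing about.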

One additional reason for the popularity of VMs in enterprise environments is that software vendors will often refuse to support applications on servers that have other applications installed.

Supporting high-availability of guest servers by allowing "live" migration from one host to another is pretty cool as well - either for fail-over or to allow hardware maintenance.

The default position these days in enterprise environments seems to be that a server will be a VM unless you have a really strong case for having dedicated hardware (and that only ever seem to apply for database clusters).

Just creating a couple of different users does not give nearly the level of isolation a VM does.

The sole reason for Xen etc. is that people were too lazy to do microkernels in the first place. It is exactly the same thing we are seeing in the web browser with low-level primitives like asm.js and WebGL: the good ideas inevitably creep back in (even if in a crap, watered-down form at first) given enough time. In fact, we could easily have proper MMU sandboxing in the browser by now if we were running proper microkernels. Instead we'll pay big performance and complexity overheads to do it in userspace.

Just off the top of my head, some benefits of VMs vs. processes:

- you get encapsulation/isolation out of the box
- abstraction from the hardware; this may not seem like a big deal if you're thinking in terms of a single machine, but it is very useful for migration (whether live or cold) between different machines
- the ability to snapshot state
- arguably a smaller attack surface, since you deal with fewer drivers inside the virtual machine than in a general-purpose OS (VM escape exploits do exist though; no denying that)

Using a VM can also reduce copying of data, particularly in the network stack.

cgroups help solve the same problem, for tasks.

I suspect running properly isolated JVMs directly on the host kernel would be faster than going through the hypervisor. I would love to see benchmark numbers.

But there are two reasons why people keep reinventing the wheel:

- Current solution Isn't Sexy Enough, aka Not Invented Here.

- Current corporate culture drives acquisitions of tech startups based on a) business, i.e. number of users, and b) innovative technology that doesn't exist anywhere else, reinforcing previous point.

All this work to come up with yet another solution to the same old problems (IBM solved machine partitioning around the '70s?) is happening because engineers love to work on their specific pet projects rather than solve other people's problems.

Not that there's anything wrong with that.

Architecturally, there's no real difference.

However, in the real world there are self-service clouds which offer cheap, easy-to-consume VM containers. So having a JVM which runs directly in these containers makes some sense. (The alternative would be to persuade Amazon to let you run your JVM as a host process using LXC or something... good luck with that.)

OK, so run your JVM on something like Heroku or dotCloud that does use LXC. Too bad those are in turn running on EC2, not on bare metal.

I guess I'd have to see some compelling benchmarks of a JVM on OSv vs a JVM in a more traditional VM image.

Yes, processes are a long-forgotten form of virtualization. You are spot on there.

They are pretty good :)

Hardware has added a new form of isolation in addition to plain memory protection, though, one that a process alone can't use.

This isolation comes at the cost of dramatically decreased performance.


Hardware virtualization is best viewed as a hack for running multiple operating system kernels at the same time, where each kernel is designed to have a machine to itself. In any sanely designed system, this shouldn't be necessary; multiple processes under a single shared kernel should be good enough.

While that is a nice ideal, reality tends to start creeping in eventually, and a hack or two become necessary so work can get done.

Like others here, I am unconvinced that there is any benefit to basically using virtual machines as heavyweight processes. The existence of at least one multi-tenant IaaS provider (Joyent) and a few multi-tenant PaaS providers (Heroku, dotCloud) using OS-level virtualization suggests that a shared kernel running multiple processes provides enough isolation.

I am not so sure; I would say there is a market for both. There are things that LXC-based solutions cannot do yet. Solutions around VMs are very mature and offer features like live migration of apps; they even monitor the apps running in the data center.

This seems a bit ridiculous to me.

In the beginning you had

[OS] -> App

Then, people would put those Apps into a VM, the trend going to one VM per app.

[OS] -> [VM] -> [App]

Just to realize that the VM may be too much of an overhead, so now OSv comes along to cut that down, relying on the OS for memory management, task scheduling, etc, effectively ending up with

[OS] -> [translation layer] -> App

So that's just a glorified sandbox, why not just use LXC?

Actually just screw all of that and have:

[OS] -> App

Cheaper, less administrative overhead, less abstraction, less vendor tie in (if you go POSIX for example).

I think that might upset the virtualization proponents though...

This is my ideal solution as well. Processes are already fully isolated, just like virtual machines or containers. They only execute during time slices the kernel schedules for them, and can only communicate and access system resources through the system calls the kernel provides them access to.

If processes are insufficiently isolated, it's the system call interface that's broken, not the isolation model.

It seems to me that virtual machines and containers could be implemented on top of the existing process hierarchy by allowing a parent process to intercept and reinterpret its child process's system calls. Simple example: want to implement chroot? Intercept all open() calls and prepend the root path (taking care to prevent escapes via '../').
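The path-rewriting half of that chroot example can be sketched in a few lines. This is only the rewriting logic a supervising parent might apply to an intercepted open() path (the actual syscall interception, e.g. via ptrace, is not shown; the `confine` name and the `/jail` root are made up for the example):

```python
import os

def confine(root, path):
    """Reinterpret `path` as relative to `root`, refusing '../' escapes."""
    # Treat the child's path, absolute or relative, as inside the root.
    joined = os.path.normpath(os.path.join(root, path.lstrip("/")))
    # After normalisation, the result must still live under root.
    if joined != root and not joined.startswith(root + os.sep):
        raise PermissionError(f"escape attempt: {path}")
    return joined

print(confine("/jail", "/etc/passwd"))  # /jail/etc/passwd
print(confine("/jail", "a/../b"))       # /jail/b
try:
    confine("/jail", "../../etc/passwd")
except PermissionError as e:
    print("blocked:", e)
```

A real implementation would also have to handle symlinks inside the root, which can point back out; that is one reason kernel-enforced mechanisms (chroot, mount namespaces) exist rather than leaving it to userspace string rewriting.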

Wouldn't that be less secure? I understand that it's possible to use chroot and stuff, but isn't the whole point of virtualization/LXC that you have better isolation and more control over quotas and stuff?

I am just curious as to why people would use virtualization at all if it is possible to accomplish the same thing using regular processes.

Would more code be more secure? To quote Theo de Raadt, who sums my opinion up nicely as well:

"You are absolutely deluded, if not stupid, if you think that a worldwide collection of software engineers who can't write operating systems or applications without security holes, can then turn around and suddenly write virtualization layers without security holes."

Quotas are easy enough to enforce. Most UNIX derivatives (including Linux) have disk and process quotas, some for over three decades.

Virtualization seems to be best used for reselling (and overselling) hosts that are smaller than the physical machine, and not much else. Migration/failover is a non-issue if you know what you are doing, and if you need larger machines, it's just more overhead on top of a dedicated host. Plus it's increased administrative cost and more expense as a whole.

I am a little skeptical of VMs, but not so much that I don't see any benefit in them.

In theory, VMs should help reduce the attack surface by a lot. For example, all the system calls in the VM are handled by the guest OS. The actual system calls made to the host should be minimal and can be more easily audited.

With OS/360 you actually already had

[VM] -> [OS] -> [App]

which is the approach used by Windows Hyper-V and the VMware vSphere Hypervisor.

For example, you can improve performance by using copy-on-write via the MMU. However, Linux/Windows/OS X will not allow you to access the MMU directly; within a VM you can do that.

This sounds like a great opportunity for anyone wanting to get into OS development, with a few easy tasks still to do.

I'm curious how you would configure and manage your applications. Are you able to attach to the input and output streams from the host, or would you still get some basic form of bash to manage it?


I've played with CoreOS a bit, but this is a much more radical change.

I love how people are beginning to rethink many of the things that all successful operating systems have had in common so far.

The idea of using virtualization as an inherent layer in the application architecture (ht IBM OS/360) is great for flexibility.

So we have a guest operating system that doesn't implement multitasking/timesharing because you can safely leave that up to the hypervisor. Is it just me, or is this more or less the same thing that was done with VM/CMS on IBM 370s in 1972?

If the virtualization tax is high when you care about throughput, then isn't that a case for avoiding virtualization altogether?

Someone asked me to compare the two:


tl;dr amount of code rewritten vs. reused

Will this make it possible to use docker.io directly under OS X without having to use Vagrant/VirtualBox to run a Linux host OS for LXC containers?

Right now, only Xen, KVM and EC2 HVM are supported hypervisors. Hopefully VMware support, and with it OS X, might come later.

No, this is sort of the opposite of containers.

Instead of having a full OS-like isolated system within a single OS without virtualization, this uses virtualization but avoids having a full OS-like system inside the guests.

Yes, I understood that. But right now, if you want to use docker.io on your OS X dev box, you have to run Linux inside a VM to be able to leverage the docker.io features (building and testing apps in lightweight containers) for later deployment on the target cloud infrastructure.

If you have OSv working both under VirtualBox or VMware on your OS X dev box and under KVM or EC2 HVM in your cloud production environment, then it might be possible to have docker.io features directly on your OS X dev box.

Guess I'm confused: What features of Docker does this provide?

Also, if you're willing to virtualize something, and want Docker features, why wouldn't you virtualize Docker?

Docker is more like a frontend to pack, ship and deploy specific applications and their configuration as "containers" to be run in lightweight isolated environments (such as LXC, currently).

Currently, if you want to use Docker under OS X, you have to run it inside a Linux VM (typically using Vagrant/VirtualBox). But the Linux VM uses a bunch of memory in your dev environment. If Docker could run the app inside VirtualBox + OSv rather than having to use VirtualBox + full Linux + LXC, I assume you would get a more lightweight dev environment (faster boot times and less memory usage).

I'm confused: why can't Docker run on BSD?

I am not sure why this is the opposite of containers.

Wouldn't it be possible to run a minimal POSIX OS inside a container? (Genuine question; I don't know the answer.)

