Hacker News
DIOS: Data Intensive Operating System (cam.ac.uk)
17 points by mrry on Nov 25, 2014 | 7 comments


The APSYS 2013 paper mentioned and linked is also available from the author:

"New wine in old skins: the case for distributed operating systems in the data center"

http://www.cl.cam.ac.uk/~ms705/pub/papers/2013-apsys-dios.pd...


Alan Perlis Epigram #5: If a program manipulates a large amount of data, it does so in a small number of ways.

You're not going to do very much that is interesting with abstractions designed to pump a huge volume of data through a distributed system.

Maybe that's why the papers are vaguely written, avoiding any discussion of concrete use cases.

The paper hints at the cloud, but not everything in the cloud handles a large, monolithic chunk of data.

Nothing in the paper convinces me that you need a special operating system with new abstractions right in the kernel, rather than just middleware.


Thanks for the comments! :) [Full disclosure: I'm the first author of the APSYS paper and one of the researchers working on DIOS.]

Cautionary note: the APSYS paper was written in April 2013 -- at the time, there was little more than the idea of "maybe someone should look into distributed OSes for data centres". Much time has passed since, and there is now a DIOS prototype as well as far more design work. Things have concretized significantly since.

The target environment for DIOS is a shared, multi-user, multi-job cluster, such as Google's warehouse-scale data centres: an environment where many different tasks run, which selectively share data but which must also be isolated from each other. (Consider, for example, a front-end web server that obtains records from a back-end key-value store, and which produces logs that are later processed by batch jobs.)

"Data-intensiveness" comes into this by virtue of the scale of the overall system (thousands of tasks/machines), but also via the applicable optimizations. For example, one key optimization is caching of data in memory (cf. Spark, Tachyon). True, this can be done in middleware (as these examples show), but arguably the OS already does it (viz. the buffer cache). Unifying the OS notion of caching (usually at inode/block level) and the distributed system's notion (at object level) seems like a reasonable proposition. Likewise, the OS kernel scheduler's long-term load-balancer and the cluster scheduler likely have information that they would benefit from sharing (e.g. about the interaction of different tasks sharing a machine).

DIOS is a research project that aims to find out if there is gain to be had from changing the OS -- we set out to answer the question, rather than knowing what we wanted to find! (The APSYS paper was testing the waters in terms of what other people think, and we got some useful feedback from it.)

So far (bearing in mind that the work is ongoing), it looks like the compelling advantages from changing the kernel abstractions are:

1. Complete control over data-flow: being able to actually enforce policies like "the task may only use its inputs to deterministically generate its outputs" (as commonly assumed -- but not enforced -- in systems like MapReduce, Dryad, CIEL etc.). When running an application with DIOS system calls only, we know for sure that there is no way I/O can have happened other than via these system calls and on the objects exposed to the task. This is something only the kernel can ensure.

2. Opportunities for end-to-end performance optimization: by having clear semantics of abstractions that are valid across machines, we can move away from the conventional wisdom that the OS should treat all communication equally. Concretely, the buffer copy implied in using BSD sockets for network communication and the generality of a kernel network stack are two examples of one-size-fits-all OS design that we can side-step: in one use-case, we fast-track UDP packets destined for a coordination service in user-space through a network stack bypass that delivers them at far lower latency. DIOS can make this optimization because it has semantic information about the low-latency needs of the application available. (For sure, there are other ways of implementing this, but they either involve kernel changes or give up the ability to track network data-flow [e.g. kernel bypass solutions]).

3. Convenient removal of scalability-inhibiting abstractions: it's well known that some POSIX APIs induce poor scalability (see e.g. the Commuter work from MIT in SOSP 2013). One key example is the notion that FD numbers are allocated in a monotonically increasing sequence (thus requiring synchronization between threads). By redesigning the abstractions for data-intensive applications, we can fix these sorts of problems in passing.

Hope that helps!


It's always Linux, isn't it.

Everyone always says they're making a specialized, lightweight OS for whatever task and then the first thing they do is saddle it with a 20-year-old clone of a 40-year-old kernel.

I also disagree with the assertion that distributed desktop operating systems are a bad idea :)


(Full disclosure: I'm one of the researchers working on DIOS and first author on the linked paper above.)

Agreed -- the reliance on Linux is definitely something that we've found to be a mixed blessing.

We actually looked fairly seriously into other candidate OSes as starting points for building DIOS. We considered L4::Fiasco, xv6 and Barrelfish, and investigated Barrelfish in depth. (L4 and xv6, at least at the time, were available for 32-bit only, which clearly wasn't going to cut it for a warehouse-scale OS.) Ultimately, we went with Linux over Barrelfish due to the more comprehensive driver support, better documentation and the fact that it works on our test machines.

Barrelfish had great trouble booting on one of our machines with a lot of physical memory; after patching it, we managed to boot it, but did not have PCIe devices available (required for networking). To their credit, the Barrelfish team were very supportive, but progress was slow and working with Linux allowed us to get started right away.

However, using Linux has also locked us into a bunch of annoying non-scalable implementation choices. For example, DIOS relies on fast memory mapping, but Linux serializes all access to a process's mm_struct using a single semaphore.

That said, DIOS is (deliberately) written in such a way that it should be possible to port it to other host kernels. Specifically, the kernel code consists of three parts: 1. a small patch to route the new system calls to handlers; 2. a BSD-licensed core module that contains the DIOS logic, but which is implemented in standard ANSI C and does not rely on any Linux-specific kernel features; 3. a GPL-licensed "DIOS abstraction layer" (DAL), which offers access to the Linux kernel facilities for process and memory management, VFS calls, etc.

While our current prototype is for Linux only, we intend to revisit the Barrelfish port and will also look into porting DIOS to a BSD OS in the future. Barrelfish especially should be interesting -- it's a very good fit for DIOS's abstractions.


Thanks for the reply!

As someone involved in computing research (sometimes OS kernel research, even) I can appreciate the need to actually get something going.

Ideally, if I was starting off a project with the intent of focusing on something like distributed data management, I'd want to sit down and figure out what I actually need and write a kernel to do that, rather than pulling in Linux with its DECnet drivers and 300+ system calls and what-not. Maybe a Unix-like system with POSIX calls isn't the best way to approach the problem. The problem is that you end up spending a lot of time dicking around with the scheduler and figuring out why memory allocation sometimes screws up, and less time implementing the real deal.

I've had good experiences working with a 64-bit Plan 9 kernel, since I was already familiar with the code and it was quite minimal, but I still ended up fighting the compatibility/driver crap you mentioned. It's also pretty easy to get functionality, but getting functionality and performance is a pain.


DIOS means God in Spanish, and in all caps it almost always refers to the Christian God. I hope people aren't turned off by the name.



