

DIOS: Data Intensive Operating System - mrry
http://www.cl.cam.ac.uk/research/srg/netos/camsas/dios/

======
jcr
The APSYS 2013 paper mentioned and linked is also available from the author:

"New wine in old skins: the case for distributed operating systems in the data
center"

http://www.cl.cam.ac.uk/~ms705/pub/papers/2013-apsys-dios.pdf

------
kazinator
Alan Perlis Epigram #5: _If a program manipulates a large amount of data, it
does so in a small number of ways._

You're not going to do very much that is interesting with abstractions
designed to pump a huge volume of data around a distributed system.

Maybe that's why the papers are vaguely written, avoiding the topic of
describing concrete use cases.

The paper hints at the cloud, but not everything in the cloud handles a large,
monolithic chunk of data.

Nothing in the paper convinces me that you need a special operating system
with new abstractions right in the kernel, rather than just middleware.

~~~
ms705
Thanks for the comments! :) [Full disclosure: I'm the first author of the
APSYS paper and one of the researchers working on DIOS.]

Cautionary note: the APSYS paper was written in April 2013 -- at the time,
there was little more than the idea of "maybe someone should look into
distributed OSes for data centres". Much time has passed since, and there is
now a DIOS prototype as well as far more design work. Things have concretized
significantly since.

The target environment for DIOS is a shared, multi-user, multi-job cluster,
such as Google's warehouse-scale data centres: an environment where many
different tasks run, which selectively share data but which must also be
isolated from each other. (Consider, for example, a front-end web server that
obtains records from a back-end key-value store, and which produces logs that
are later processed by batch jobs.)

"Data intensity" comes into this by virtue of the scale of the overall
system (thousands of tasks/machines), but also via the applicable
optimizations. For example, one key optimization is caching of data in memory
(cf. Spark, Tachyon). True, this can be done in middleware (as these examples
show), but arguably the OS already does it (viz. the buffer cache). Unifying
the OS notion of caching (usually at inode/block level) and the distributed
system's notion (at object level) seems like a reasonable proposition.
Likewise, the OS kernel scheduler's long-term load-balancer and the cluster
scheduler likely have information that they would benefit from sharing (e.g.
about the interaction of different tasks sharing a machine).

DIOS is a research project that aims to find out _if_ there is gain to be had
from changing the OS -- we set out to answer the question, rather than knowing
what we wanted to find! (The APSYS paper was testing the waters in terms of
what other people think, and we got some useful feedback from it.)

So far (bearing in mind that the work is ongoing), it looks like the
compelling advantages from changing the kernel abstractions are:

1\. Complete control over data-flow: being able to actually _enforce_ policies
like "the task may only use its inputs to deterministically generate its
outputs" (as commonly assumed -- but not enforced -- in systems like
MapReduce, Dryad, CIEL etc.). When running an application with DIOS system
calls only, we know for sure that there is no way I/O can have happened other
than via these system calls and on the objects exposed to the task. This is
something only the kernel can ensure.

2\. Opportunities for end-to-end performance optimization: by having clear
semantics of abstractions that are valid across machines, we can move away
from the conventional wisdom that the OS should treat all communication
equally. Concretely, the buffer copy implied in using BSD sockets for network
communication and the generality of a kernel network stack are two examples of
one-size-fits-all OS design that we can side-step: in one use-case, we fast-
track UDP packets destined for a coordination service in user-space through a
network stack bypass that delivers them at far lower latency. DIOS can make
this optimization because it has semantic information about the low-latency
needs of the application available. (For sure, there are other ways of
implementing this, but they either involve kernel changes or give up the
ability to track network data-flow [e.g. kernel bypass solutions]).

3\. Convenient removal of scalability-inhibiting abstractions: it's well known
that some POSIX APIs induce poor scalability (see e.g. the Commuter work from
MIT in SOSP 2013). One key example is the notion that FD numbers are allocated
in a monotonically increasing sequence (thus requiring synchronization between
threads). By redesigning the abstractions for data-intensive applications, we
can fix these sorts of problems in passing.
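
To make the FD point concrete: POSIX specifies that open() returns the lowest-numbered descriptor not currently in use, so every allocation has to consult shared per-process state. Here is a minimal sketch in Python that shows the rule from user space. The function name `lowest_free_fd_demo` is made up for illustration, and the sketch assumes a single-threaded process on a POSIX system; it is not DIOS code.

```python
import os


def lowest_free_fd_demo():
    """Open three descriptors, close the middle one, then open a fourth.

    Returns the number of the closed descriptor and the number handed
    out by the next open(), so the caller can compare them.
    """
    a = os.open(os.devnull, os.O_RDONLY)
    b = os.open(os.devnull, os.O_RDONLY)
    c = os.open(os.devnull, os.O_RDONLY)

    os.close(b)  # leaves a hole in the descriptor table

    # POSIX requires open() to return the lowest-numbered unused
    # descriptor, so the kernel has to find the hole left by `b`
    # rather than hand out an arbitrary free number.
    d = os.open(os.devnull, os.O_RDONLY)

    for fd in (a, c, d):
        os.close(fd)
    return b, d


closed_fd, reused_fd = lowest_free_fd_demo()
print("closed:", closed_fd, "next open returned:", reused_fd)
```

Because the result of each open() depends on what every other thread has opened and closed, concurrent opens in one process cannot proceed independently. An interface that may return any unused handle does not have that ordering constraint.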

Hope that helps!

------
jff
It's always Linux, isn't it.

Everyone always says they're making a specialized, lightweight OS for whatever
task and then the first thing they do is saddle it with a 20-year-old clone of
a 40-year-old kernel.

I also disagree with the assertion that distributed desktop operating systems
are a bad idea :)

~~~
ms705
(Full disclosure: I'm one of the researchers working on DIOS and first author
on the linked paper above.)

Agreed -- the reliance on Linux is definitely something that we've found to be
a mixed blessing.

We actually looked fairly seriously into other candidate OSes as starting
points for building DIOS. We considered L4::Fiasco, xv6 and Barrelfish, and
investigated Barrelfish in depth. (L4 and xv6, at least at the time, were
available for 32-bit only, which clearly wasn't going to cut it for a
warehouse-scale OS.) Ultimately, we went with Linux over Barrelfish due to the
more comprehensive driver support, better documentation and the fact that it
works on our test machines.

Barrelfish had great trouble booting on one of our machines with a lot of
physical memory; after patching it, we managed to boot it, but did not have
PCIe devices available (required for networking). To their credit, the
Barrelfish team were very supportive, but progress was slow and working with
Linux allowed us to get started right away.

However, using Linux has also bought us into a bunch of annoying non-scalable
implementation choices. For example, DIOS relies on fast memory mapping, but
Linux serializes all access to the mm_struct for a process using a single
semaphore.

That said, DIOS is (deliberately) written in such a way that it should be
possible to port it to other host kernels. Specifically, the kernel code
consists of three parts: 1\. a small patch to route the new system calls to
handlers; 2\. a BSD-licensed core module that contains the DIOS logic, but
which is implemented in standard ANSI C and does not rely on any Linux-
specific kernel features; 3\. a GPL-licensed "DIOS abstraction layer" (DAL),
which offers access to the Linux kernel facilities for process and memory
management, VFS calls, etc.
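
The layering described above can be pictured roughly as follows. This is only a toy sketch in Python, not the actual kernel code (which is C), and every name in it (`HostKernelOps`, `FakeLinuxOps`, `Core`, `alloc_pages`, `spawn_task`, `run_task`) is hypothetical and chosen purely for illustration. The idea it shows is that the core logic calls the host only through a narrow interface, so a port replaces the host-specific layer and leaves the core untouched.

```python
from typing import Protocol, Tuple

PAGE_SIZE = 4096  # assumed page size for the toy host layer


class HostKernelOps(Protocol):
    """Hypothetical interface that an abstraction layer would expose."""

    def alloc_pages(self, n: int) -> bytearray: ...

    def spawn_task(self, name: str) -> int: ...


class FakeLinuxOps:
    """Toy host-specific layer; a port would supply a different class."""

    def __init__(self) -> None:
        self._next_pid = 1

    def alloc_pages(self, n: int) -> bytearray:
        # Stand-in for host memory management.
        return bytearray(PAGE_SIZE * n)

    def spawn_task(self, name: str) -> int:
        # Stand-in for host process management: hand out increasing ids.
        pid = self._next_pid
        self._next_pid += 1
        return pid


class Core:
    """Host-independent logic: reaches the host only via HostKernelOps."""

    def __init__(self, host: HostKernelOps) -> None:
        self.host = host

    def run_task(self, name: str, pages: int) -> Tuple[int, int]:
        buf = self.host.alloc_pages(pages)
        pid = self.host.spawn_task(name)
        return pid, len(buf)


core = Core(FakeLinuxOps())
result = core.run_task("wordcount", 2)
print(result)
```

Swapping `FakeLinuxOps` for another class with the same two methods is all a "port" needs in this toy model, which mirrors the intent of keeping host-specific code out of the core module.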

While our current prototype is for Linux only, we intend to revisit the
Barrelfish port and will also look into porting DIOS to a BSD OS in the
future. Barrelfish especially should be interesting -- it's a very good fit
for DIOS's abstractions.

~~~
jff
Thanks for the reply!

As someone involved in computing research (sometimes OS kernel research, even)
I can appreciate the need to actually get something going.

Ideally, if I was starting off a project with the intent of focusing on
something like distributed data management, I'd want to sit down and figure
out what I actually need and write a kernel to do that, rather than pulling in
Linux with its DECnet drivers and 300+ system calls and what-not. Maybe a
Unix-like system with POSIX calls isn't the best way to approach the problem.
The problem is that you end up spending a lot of time dicking around with the
scheduler and figuring out why memory allocation sometimes screws up, and less
time implementing the real deal.

I've had good experiences working with a 64-bit Plan 9 kernel, since I was
already familiar with the code and it was quite minimal, but I still ended up
fighting the compatibility/driver crap you mentioned. It's also pretty easy to
get functionality, but getting functionality and performance is a pain.

------
adl
DIOS means God in Spanish, and in all caps it almost always means the
Christian God. I hope some people aren't turned off by the name.

