
IncludeOS: C++ unikernel now free and open source - AlfredBratterud
https://github.com/hioa-cs/IncludeOS
======
meesterdude
> A minimal bootable image, including bootloader, operating system components
> and a complete C++ standard library is currently 693K when optimized for
> size.

WOW. That's just nuts! I mean I know Linux can be slim, but that's less than a
meg!

Really very exciting stuff. I don't know if I should play JC3 or tinker with
this.

~~~
creshal
I'm pretty sure you can find MIPS Linux images with BusyBox et al. that
weigh less than that. A lot of routers run Linux on 2M EPROMs including user
space.

~~~
david-given
I have a port of Alan Cox's Fuzix for the MSP430 where the entire bootable ROM
image is 27kB.

(That's actually quite big for a tiny-computer OS, although it's the smallest
Unix I know of.)

~~~
cbsmith
I seem to recall the QNX kernel is pretty slim as well.

~~~
david-given
That's just the microkernel, though --- it didn't contain any functionality
other than sending messages (and a few other things, microkernels vary).

Fuzix' 27kB contains the scheduler, drivers, tty, file system and POSIX
subsystem!

~~~
cbsmith
Fuzix definitely was end to end. On the other hand, the QNX kernel is a more
capable foundation to build on.

------
marssaxman
Oh, wow, this is exactly what I've been working on with a project called
"fleet", though they've gotten a lot further than I did. I guess I'll have to
consider ditching my code and using theirs instead.

------
DiabloD3
So anyone know how small the L4 kernel is compared to this?

~~~
nickpsecurity
5-10 times smaller for regular ones with it closer to 50 for separation
kernels. That's not a fair comparison, though, as you need extra components to
support the unikernel or just components on the kernel. A more fair comparison
might be L4Re, a stripped GenodeOS, OKL4 w/ necessary software, or NOVA with
necessary software.

L4Re was the first attempt I believe at an environment for apps on L4:

[https://os.inf.tu-dresden.de/L4Re/overview.html](https://os.inf.tu-dresden.de/L4Re/overview.html)

------
mtgx
Any chance it will support Xen, too, in the future? Also, unikernels should
preferably be written in memory safe languages like Rust.

~~~
easytiger
> unikernels should preferably be written in memory safe languages like Rust

There is a Rust project. How is that relevant to this?

~~~
yaantc
With a unikernel design, there is no isolation between the software components
of a single node, as an OS can provide. All the code shares the same memory
space.

Isolation can be useful. For example the Postfix email server (running on a
Unix, not unikernel) is decomposed into several processes with different
privileges. That allows running the most sensitive parts with limited access
rights, to protect against attacks, while still running those parts on the
same server for efficiency. This process isolation is typically not provided
with unikernels.

A lot of unikernels compensate for this by providing language-level
protection, by using a safe high-level language (see MirageOS using OCaml,
others based on Erlang, Haskell or Rust). Then it's not process isolation but
the language and its implementation that provide protection / isolation
between components. My understanding is that this is what GP refers to.

A unikernel based on C or C++ will have no process isolation, and no language
level isolation either. So sensitive components would have to be split into
different nodes, isolated by using either separate VMs or machines. That's
doable, but adds complexity to the orchestration and possibly some overhead.

~~~
cbsmith
> That allows running the most sensitive parts with limited access rights, to
> protect against attacks, while still running those parts on the same server
> for efficiency. This process isolation is typically not provided with
> unikernels.

Unikernels also don't have notions of access rights. To make that model work,
more than memory safety, you need declarative ways of asserting constraints,
which... C++ arguably can pull off pretty well.
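
A minimal sketch of what "declarative ways of asserting constraints" could look like in C++, assuming a hypothetical mail queue handle whose access right is a template parameter. `ReadOnly`, `ReadWrite`, `QueueHandle` and `can_push` are illustrative names, not part of IncludeOS or any library. The constraint is checked at compile time: a read-only handle has no usable `push` at all.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical access-right tags; the right is part of the handle's type.
struct ReadOnly {};
struct ReadWrite {};

template <class Access>
class QueueHandle {
public:
    explicit QueueHandle(std::vector<std::string>& q) : queue_(&q) {}

    // Every handle may inspect the queue.
    std::size_t size() const { return queue_->size(); }

    // push() only takes part in overload resolution for ReadWrite handles.
    // Requiring A == Access stops callers from writing push<ReadWrite>(...)
    // on a ReadOnly handle.
    template <class A = Access,
              class = typename std::enable_if<
                  std::is_same<A, Access>::value &&
                  std::is_same<A, ReadWrite>::value>::type>
    void push(std::string msg) { queue_->push_back(std::move(msg)); }

private:
    std::vector<std::string>* queue_;
};

// Compile-time probe: true only if h.push(std::string) is well-formed.
template <class H, class = void>
struct can_push : std::false_type {};

template <class H>
struct can_push<H, decltype(void(std::declval<H&>().push(std::string())))>
    : std::true_type {};
```

With this, `QueueHandle<ReadOnly>(q).push("x")` is rejected by the compiler, while the same call on a `QueueHandle<ReadWrite>` compiles. It only constrains code that goes through the handle; it says nothing about code that reaches the underlying vector some other way.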

> A unikernel based on C or C++ will have no process isolation, and no
> language level isolation either.

C++ has lots of support for language level isolation. Yes, due to C
compatibility, if you diddle with memory you can find ways to violate those
contracts, but it is entirely possible, particularly in a unikernel context,
to prove that you aren't doing that.

> That's doable, but adds complexity to the orchestration and possibly some
> overhead.

That'd add way more complexity and likely bugs to the code than just keeping
the codebase clean.

~~~
pjc50
Possible, but not necessarily easy; what tools do you use to sweep C++ code
for 'unsafe' constructs?

~~~
nickpsecurity
I'm interested in the answer to that, too. A member of the Chrome team asked
what static analysis or advanced verification tools I thought they could use
in a significant C++ project. Digging around, I think I found just one
limited tool plus two ways of doing Design-by-Contract (asserts & OOP). That
was it. Not inspiring lol.

Now, there has been work on type-safe or memory-safe versions of C++. They're
non-standard. They also get smashed when a memory error occurs and _that will
happen_. So, suggesting to rely on language-based isolation in C++ is more a
joke than something worth trying.

Good example of work on C++ safety:

[https://www.cis.upenn.edu/~eir/papers/2013/ironclad/paper.pdf](https://www.cis.upenn.edu/~eir/papers/2013/ironclad/paper.pdf)

~~~
cbsmith
> they could use in a significant C++ project.

The difference with Chromium is you've got all that integration with standard
C runtimes that are inherently dicey. Unikernels are a different story.

~~~
nickpsecurity
C runtimes do add extra issues but that's not relevant to my comment. I said I
largely came up dry on methods to prove correctness of _C++ code_. Quite
important if one is considering C++ vs another language for a robust application
and/or unikernel. C++ would be a bad choice if the language itself was supposed to
contribute to robustness.

~~~
cbsmith
> C runtimes do add extra issues but that's not relevant to my comment.

I really don't see how it is. If you're running on a desktop platform, you've
got a huge exposed surface that is working with raw pointers to proprietary
logic. That makes provable correctness a far, far more complex problem.

> I said I largely came up dry on methods to prove correctness of C++ code.

It is easy to implement a smart pointer that the compiler can _prove_ will
always do bounds checking before dereferencing. The hard part is proving that
all the code that uses raw pointers is doing the same thing.
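
A minimal sketch of such a bounds-checked accessor, assuming a hypothetical `CheckedSpan` wrapper (the name and interface are illustrative, not from any library). Because the raw pointer is private, every access has to pass through the check in `operator[]`.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical wrapper: the raw pointer is private, so operator[] is the
// only path to the underlying storage.
template <class T>
class CheckedSpan {
public:
    // Assumption: the caller passes a valid (pointer, size) pair. This is
    // the one place where raw-pointer code still has to be trusted.
    CheckedSpan(T* data, std::size_t size) : data_(data), size_(size) {}

    // Bounds check happens before every dereference.
    T& operator[](std::size_t i) const {
        if (i >= size_) throw std::out_of_range("CheckedSpan: index out of range");
        return data_[i];
    }

    std::size_t size() const { return size_; }

private:
    T* data_;
    std::size_t size_;
};

// Example client that can only touch the buffer through checked accesses.
inline int sum(CheckedSpan<const int> s) {
    int total = 0;
    for (std::size_t i = 0; i < s.size(); ++i) total += s[i];
    return total;
}
```

The check itself is trivial. The constructor is where the hard part shows up: whoever builds the span from a raw pointer must still get the size right.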

~~~
nickpsecurity
"I really don't see how it is. If you're running on a desktop platform, you've
got a huge exposed surface that is working with raw pointers to proprietary
logic. That makes provable correctness a far, far more complex problem."

The point is that C++ itself is damn-near impossible to analyze on the cheap
and without many false positives. That's before I even considered the C
interface. Then there are C-level problems that have been the reason I've
opposed it forever. At least there's tons of stuff to draw on in analysing,
transforming, etc that code. I'd go with a C subset or Java/Ada subset with
high-integrity runtime any day over C++.

"It is easy to implement a smart pointer that the compiler can prove will
always do bounds checking before dereferencing. The hard part is proving that
all the code that uses raw pointers is doing the same thing."

I'll take your word on the smart pointers doing bounds-checks as I'm not up-
to-date on all the techniques of C++ developers. Academics need to do a fresh
take on that with assessments vs particular risks & compared to current
languages. Meanwhile, most in safety-critical development that I know of don't
use C++ because it's too complex and unsafe per them. I know there's a MISRA
subset and some other stuff. There are people who use it with thorough testing
and source-to-object validation. Mostly not in use, though.

So, do you have any resources showing that C++ code is safe and analysable if
one just uses smart pointers? And what tools and subset do you use to do that? If
you have that and proof it works, then that could help a lot of developers
using C that aren't aware of it. I'm being serious as much as I am challenging
your claim. If you have it, I'll consider it.

~~~
cbsmith
> I'll take your word on the smart pointers doing bounds-checks as I'm not up-
> to-date on all the techniques of C++ developers. Academics need to do a
> fresh take on that with assessments vs particular risks & compared to
> current languages.

That's kind of already happened. Stroustrup has done a whole ton of work in
that area with Concepts.

> Meanwhile, most in safety-critical development that I know of don't use C++
> because it's too complex and unsafe per them.

It turns out that provable correctness invariably involves a fair bit of
complexity (you are basically compiling a mathematical proof). People use
Haskell and Coq to really do it right, and --surprise-- those turn out to be
hard for programmers to learn.

A lot of other, more popular, high level languages are actually terrible for
provable correctness, even if they are better for proving memory safety. C++
isn't as effective for the job, but it has the advantage of being great for
integrating in with the platform. It is a trade-off, but one that is well
worthwhile.

> So, do you have any resources showing that C++ code is safe and analysable
> if one just uses smart pointers?

This stuff goes back a way, but stemmed from Modern C++ Design. There is a
whole world of policy based design where you use the type system (much as with
Haskell and Coq) to enforce declarative policies.

The quick thought experiment would be something like this:

    
    
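        template <class T>
        struct SafeRef {
            explicit SafeRef(T* p) : x(p) {}
            // Validate x before any access; the actual test is left elided.
            void check() const { /* ... */ }
            operator T&() { check(); return *x; }
            operator const T&() const { check(); return *x; }
        private:
            T* x;
        };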
    

You can override operator-> to make it behave more like a proper pointer. CRTP
gives you some pretty powerful ways of getting the job done too.
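
The `operator->` and CRTP remark can be sketched roughly as follows. `CheckedAccess` and `NonNullRef` are hypothetical names, and the null test stands in for whatever policy `check()` would really enforce. The CRTP base supplies the pointer-like operators once, and each derived wrapper only provides its own `check()` and `raw()`.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical CRTP base: gives any Derived that has check() and raw()
// a checked operator-> and operator*.
template <class Derived, class T>
struct CheckedAccess {
    T* operator->() const { return checked(); }
    T& operator*() const { return *checked(); }

private:
    // Runs the derived class's policy before handing out the pointer.
    T* checked() const {
        const Derived& self = static_cast<const Derived&>(*this);
        self.check();
        return self.raw();
    }
};

// One possible policy: refuse to dereference a null pointer.
template <class T>
class NonNullRef : public CheckedAccess<NonNullRef<T>, T> {
public:
    explicit NonNullRef(T* p) : x(p) {}

private:
    // Only the CRTP base may call check() and raw(), so the raw pointer
    // never escapes unchecked.
    friend struct CheckedAccess<NonNullRef<T>, T>;

    void check() const {
        if (x == nullptr) throw std::logic_error("NonNullRef: null dereference");
    }
    T* raw() const { return x; }

    T* x;
};

// Illustrative payload type.
struct Packet { int length; };
```

With a `Packet p{42}`, `NonNullRef<Packet>(&p)->length` reads through the check, and the same expression on a wrapper built from `nullptr` throws instead of dereferencing.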

~~~
nickpsecurity
"That's kind of already happened. Stroustrup has done a whole ton of work in
that area with Concepts."

Wasn't aware of that. It was an interesting read. Thanks for mentioning it.

"It turns out that provable correctness invariably involves a fair bit of
complexity"

I wasn't even talking about that. I just looked for static analysis tools that
could reliably find common implementation flaws or interface issues with
little to no false positives. These already exist for C, Java, Ada, C#, and
academic languages. Similarly, some verification or foundation of standard
library like Modula-3's or the one for C. I found little to nothing of any of
this for C++. So, the C++ verifications would all be visual and manual unless
you pay big $$$ for one of few commercial tools.

Unacceptable. Formal methods would make C++ unacceptable for even more
reasons.

"This stuff goes back a way, but stemmed from Modern C++ Design."

Same book pjmp recommended. Guess the study should start with it.

"You can override operator-> to make it behave more like a proper pointer.
CRTP gives you some pretty powerful ways of getting the job done too."

Interesting example. I think one test of C++'s safety would be whether such
methods can provide the same protections that Ada provides where applicable to
both languages:

[http://www.adacore.com/knowledge/technical-papers/safe-secure/](http://www.adacore.com/knowledge/technical-papers/safe-secure/)

It would need to catch the problems, do it during compile phase, and do it
fast enough to be productive. I heard bad things about C++ compile phase in
the past, esp for template heavy code. Plus, needs design-by-contract as
Eiffel and SPARK have shown. I've seen it done with asserts and object
constructors/destructors so that's probably not a problem. The other stuff,
esp static analysis for memory & concurrency safety, is where C++ will be
judged most.
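
A minimal sketch of the design-by-contract style mentioned above, done with plain checks in the constructor and member functions. `require`, `ContractViolation` and `PacketBuffer` are hypothetical names chosen for illustration. Note that these checks run at run time, so this does not address the compile-phase requirement.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical contract-violation type. A real project might abort or use
// assert() instead of throwing.
struct ContractViolation : std::logic_error {
    using std::logic_error::logic_error;
};

// Throws if a precondition, postcondition or invariant does not hold.
inline void require(bool cond, const char* what) {
    if (!cond) throw ContractViolation(what);
}

class PacketBuffer {
public:
    // The constructor establishes the class invariant.
    explicit PacketBuffer(std::size_t capacity) : capacity_(capacity), used_(0) {
        require(capacity > 0, "precondition: capacity > 0");
        check_invariant();
    }

    // Precondition is checked before any state changes, so a violation
    // leaves the object untouched.
    void reserve_bytes(std::size_t n) {
        require(n <= capacity_ - used_, "precondition: n fits in remaining space");
        used_ += n;
        check_invariant();
    }

    std::size_t remaining() const { return capacity_ - used_; }

private:
    void check_invariant() const {
        require(used_ <= capacity_, "invariant: used <= capacity");
    }

    std::size_t capacity_;
    std::size_t used_;
};
```

Reserving 5 bytes from an 8-byte buffer leaves 3; a further request for 4 is rejected by the precondition and the buffer still reports 3 remaining.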

