
The Case for Writing a Kernel in Rust [pdf] - lainon
http://www.cs.virginia.edu/~bjc8c/papers/levy17rustkernel.pdf
======
steveklabnik
Note that one of the co-authors of this paper, Amit, is the author of
[https://www.tockos.org/](https://www.tockos.org/)

This paper is sort of a successor to
[https://sing.stanford.edu/site/publications/levy-
plos15-tock...](https://sing.stanford.edu/site/publications/levy-
plos15-tock.pdf) , a previous paper on this topic. After publication, Amit
solved many of the issues presented there. It was previously discussed on HN
[https://news.ycombinator.com/item?id=14105884](https://news.ycombinator.com/item?id=14105884)

Some more context
[https://www.reddit.com/r/rust/comments/655816/ownership_is_t...](https://www.reddit.com/r/rust/comments/655816/ownership_is_theft_experiences_building_an/dg7p2uf/)

Amit left even more context just now on /r/rust:
[https://www.reddit.com/r/rust/comments/75xwib/the_case_for_w...](https://www.reddit.com/r/rust/comments/75xwib/the_case_for_writing_an_os_in_rust/do9wk3g/)

~~~
frankmcsherry
Some minor pedantry that winds me up a bit, and that I thought I'd try to
correct:

> Note that one of the co-authors of this paper, Amit, is the author of
> [https://www.tockos.org/](https://www.tockos.org/)

The definite article "the" precludes the existence of other authors, who (from
personal experience) can get wound up by language like this.

You may have meant "author of the web site", in which case I have no opinion,
but if you meant "author of the project" I'd totally use the indefinite
article "an".

Also all of the work is very interesting and people should read it.

~~~
steveklabnik
Yes, good call, thank you! I sometimes forget this distinction. You're totally
correct that more people than Amit work on Tock, for sure, and I don't want to
exclude their hard work.

------
jp_rider
If you haven't already, check out Redox OS [1,2]. It's a microkernel written
in Rust.

Here's a good interview with the developer [3].

[1] [https://www.reddit.com/r/Redox/](https://www.reddit.com/r/Redox/)

[2] [https://redox-os.org/](https://redox-os.org/)

[3]
[https://www.youtube.com/watch?v=eH5JgMlNE8o](https://www.youtube.com/watch?v=eH5JgMlNE8o)

~~~
Animats
That's very nice. It's approaching the QNX level of microkernel.

One unusual feature of QNX is that the kernel doesn't parse strings anywhere.
There's a "resource manager", but that's a process. Programs register with the
resource manager for a piece of the pathname namespace ("/dev", "/fs", etc.)
and then get open requests sent to them when a pathname starts with their part
of the namespace. Parsing creates a large attack surface, and getting it out
of the kernel is a win.
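
As a rough illustration of the prefix registration described above, here is a minimal Rust sketch of a user-space resource manager that routes a pathname to whichever server registered the longest matching prefix. `ServerId`, `ResourceManager`, `register` and `route` are hypothetical names invented for this sketch; they are not QNX or Redox APIs, and real message passing is left out entirely.

```rust
use std::collections::BTreeMap;

/// Hypothetical identifier for a user-space server process.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ServerId(pub u32);

/// Maps pathname prefixes to the server that registered them.
#[derive(Default)]
pub struct ResourceManager {
    prefixes: BTreeMap<String, ServerId>,
}

impl ResourceManager {
    pub fn new() -> Self {
        Self::default()
    }

    /// A server claims a piece of the namespace, e.g. "/dev" (no trailing slash).
    pub fn register(&mut self, prefix: &str, server: ServerId) {
        self.prefixes.insert(prefix.to_string(), server);
    }

    /// Pick the server with the longest registered prefix covering `path`.
    /// The remainder of the path is returned for that server to interpret.
    pub fn route<'a>(&self, path: &'a str) -> Option<(ServerId, &'a str)> {
        self.prefixes
            .iter()
            .filter(|(prefix, _)| covers(prefix.as_str(), path))
            .max_by_key(|(prefix, _)| prefix.len())
            .map(|(prefix, server)| (*server, &path[prefix.len()..]))
    }
}

/// True when `prefix` matches `path` on a whole path-component boundary.
fn covers(prefix: &str, path: &str) -> bool {
    match path.strip_prefix(prefix) {
        Some(rest) => rest.is_empty() || rest.starts_with('/'),
        None => false,
    }
}
```

Registering "/dev" and "/dev/net" and then routing "/dev/net/eth0" selects the "/dev/net" server and hands it "/eth0". All of the string handling lives in this ordinary process, which is the point being made about keeping parsing out of the kernel.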

QNX tries to avoid variable-length objects in the kernel. Messages are
variable length and copied by the kernel, but from one user space to another,
not queued in kernel space. Most of the ways a kernel can run out of memory
are avoided in QNX. If the kernel is out of resources, some system calls
return errors, but the kernel doesn't crash.

If you're doing a kernel in Rust, it's helpful to think that way. Rust doesn't
handle out-of-memory conditions well.
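
For context, Rust releases later than this thread added fallible reservation methods on the standard collections (`Vec::try_reserve` and `Vec::try_reserve_exact`, stable since 1.57 as far as I know). The sketch below assumes such a toolchain; `try_buffer` and `copy_message` are hypothetical helper names, and this only addresses collection growth, not every way a program can run out of memory.

```rust
use std::collections::TryReserveError;

/// Allocate an empty buffer with room for `len` bytes, returning an error
/// instead of aborting the process when the allocation cannot be satisfied.
pub fn try_buffer(len: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut buf = Vec::new();
    buf.try_reserve_exact(len)?;
    Ok(buf)
}

/// Copy a message into a freshly allocated buffer, propagating allocation
/// failure to the caller in the style of a system call returning an error.
pub fn copy_message(src: &[u8]) -> Result<Vec<u8>, TryReserveError> {
    let mut buf = try_buffer(src.len())?;
    // Capacity was reserved above, so this does not need to reallocate.
    buf.extend_from_slice(src);
    Ok(buf)
}
```

A caller can then decide what to do on failure, which is roughly the "return an error, don't crash" behaviour described for QNX.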

~~~
kbenson
> Rust doesn't handle out-of-memory conditions well.

Do you happen to know languages that do handle out of memory conditions well?
That seems like an interesting topic. If I understand how it's done in C, I
wouldn't call that "well", but it does provide the mechanisms for doing it
(which is more than can be said for all languages). Language level features
(or coding styles in C that could be implemented at a language level
elsewhere) that provide for increased and intuitive control would be
interesting.

~~~
Animats
Ada is one of the few languages that takes out-of-memory conditions seriously.
The exception Storage_Error is raised.

Java, C# and Microsoft's common runtime have out-of-memory exceptions, but I'm
not sure how reliable they are in a limited-memory environment. They're more
like "GC isn't helping much" exceptions.

------
nickpsecurity
This is good work. I enjoyed reading it. I wonder, though, why the Muen
separation kernel wasn't mentioned in related work on kernels in safe
languages. It uses the SPARK language so that it's provably immune to a number
of defects. The Ada compilers can also give you as little or as much runtime
as you want with several "profiles" available. Muen uses the "minimal,
Zero-Runtime" profile. Rust improves on Ada and SPARK with affine types giving
temporal safety plus a more flexible model for safe concurrency than
Ravenscar. CompSci work shows one can probably do that and info-flow security
with contracts, but that would be extra, specialist work for Ada/SPARK versus
what's built in to Rust already. Verbose, too.

[https://muen.codelabs.ch/](https://muen.codelabs.ch/)

When I read about the Rust kernels, I assumed there would be some trusted
components in there which could screw them up on assurance side. Aside from
full formal verification, one thing someone might want to try is writing those
pieces in SPARK like Muen does. Make a direct port from any proven SPARK to
Rust source. Alternatively, compile the SPARK into executable code that Rust
calls with its FFI since C code can call SPARK code. Then, those untrusted
portions are shown safe with one set of rules while the rest is shown safe
with Rust's compiler. Optionally, convert those to an intermediate language of
CompCert or CakeML compilers for certified compilation. This is how my Brute-
Force Assurance model would attempt to handle each's limitations in a kernel
project.

Note: The only problem with using SPARK for that is that the person doing it
probably has to buy a license if the Rust kernel in question isn't GPL. SPARK
is only free for use with GPL code. Frama-C is another option one might use to
avoid that, but SPARK is the stronger offering.

~~~
pjmlp
If you see from my discussions, many in the community (not the Rust designers)
seem to be unaware, willing or not, of all the great work that came before
Rust.

Rust is making great progress infecting other languages with the idea that
having affine types for resource management and concurrency is possible; it is
just that the ergonomics still need some fine-tuning.

No need to glance over the ideas of other strongly typed systems programming
languages.

~~~
pjmlp
Oh well, not being a native speaker, only now I realized that the last
sentence is the complete opposite of what I intended to say, and the edit
button is gone now.

What I actually wanted to say is that it would be worthwhile for the
community to learn what has been done before, like the examples referred to by
nickpsecurity.

------
Sag0Sag0
I tried to write a kernel in Rust but reverted back to C. I just feel that it
is not a very nice systems language. The entire time I felt like I was
fighting to manipulate the language into doing something it wasn't supposed to
do.

~~~
gwerbin
Would you be able to go into more detail on this? An example maybe? It might
be useful for someone like me who's "interested in Rust" but hasn't tried it
(yet).

~~~
Sag0Sag0
The amount of unsafe code necessary for low-level work, the difficulty in
sending a stream of bytes to an I/O port, and the innate complexity of the
language. C's philosophy is that everything is a series of bits which you can
manipulate at will. Rust's is the same, except you have to jump through a huge
number of hoops first to make things "safe".
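
To make the trade-off concrete, here is a minimal sketch of how the unsafe part of device access is often confined behind a small safe wrapper. It assumes a memory-mapped, byte-wide register; `ByteRegister` is a hypothetical type for this sketch, and x86 port I/O proper (`in`/`out`) would need inline assembly or a crate instead of a volatile store.

```rust
use std::ptr;

/// Handle to one byte-wide, memory-mapped device register (hypothetical).
pub struct ByteRegister {
    addr: *mut u8,
}

impl ByteRegister {
    /// # Safety
    /// `addr` must stay valid for volatile writes for as long as the handle
    /// is used. This is the one place where the caller has to promise that.
    pub unsafe fn new(addr: *mut u8) -> Self {
        ByteRegister { addr }
    }

    /// Write one byte. The store is volatile so the compiler neither removes
    /// it nor merges it with neighbouring writes.
    pub fn write(&mut self, byte: u8) {
        // SAFETY: validity of `addr` was promised by the caller of `new`.
        unsafe { ptr::write_volatile(self.addr, byte) }
    }

    /// Send a stream of bytes to the register, one store per byte.
    pub fn write_all(&mut self, bytes: &[u8]) {
        for &byte in bytes {
            self.write(byte);
        }
    }
}
```

Once the handle exists, `reg.write_all(b"hello")` needs no `unsafe` at the call site. Whether that up-front ceremony is worth it is exactly the matter of taste under discussion here.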

Things like converting a String to an i64 are a pain. There is a large amount
of boilerplate code needed.

All in all this is a personal preference. I prefer some unsafeness in C to (in
my opinion) the constant inconvenience of programming in Rust.

~~~
comex
I know it's just one example, and you're reporting more of a general
impression, but since you mentioned converting a String to an i64… well, in
what way?

If you mean reading the bytes of the string as an i64, that's

    
    
        unsafe { *(s.as_ptr() as *const i64) }
    

...which is a bit more verbose than C's

    
    
        *(int64_t *) ptr
    

but not that verbose, especially given that there will often be more code in
the unsafe block. On the other hand, in C that's often technically undefined
behavior due to strict aliasing, so to get strictly conformant code you have
to use memcpy. Rust has no strict aliasing.

(I wouldn't actually recommend the above Rust code, because it doesn't verify
that the string's length matches i64 - but neither does C, so it's a fair
comparison. More idiomatic wrappers can be found in third-party crates like
byteorder or pod.)
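
For comparison, a length-checked version can be written with the standard library alone and no `unsafe`. This is only a sketch: `first_i64` and `parse_decimal` are hypothetical helper names, and the second function covers yet another reading of "String to i64", namely parsing decimal text.

```rust
use std::convert::TryInto;

/// Interpret the first eight bytes of `s` as a native-endian i64.
/// Returns None when the string holds fewer than eight bytes.
pub fn first_i64(s: &str) -> Option<i64> {
    let bytes: [u8; 8] = s.as_bytes().get(..8)?.try_into().ok()?;
    Some(i64::from_ne_bytes(bytes))
}

/// Parse the text of `s` as a decimal number, e.g. "-42".
pub fn parse_decimal(s: &str) -> Option<i64> {
    s.trim().parse().ok()
}
```

Because the bytes are copied into an array first, this version has no alignment requirement either.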

If you mean casting the pointer to the bytes to an i64, that's

    
    
        s.as_ptr() as i64
    

which doesn't seem verbose. (No unsafe needed in that case.)

~~~
makapuf
> to get strictly conformant [C] code you have to use memcpy

couldn't you use a union there?

> Rust has no strict aliasing.

does that mean that Rust cannot optimize for things C could? (real question,
I don't know Rust)

~~~
oconnor663
> does that mean that rust cannot optimize for things C could?

Here's my best understanding of the situation. Someone who actually
understands the compiler might have to correct me:

\- Pointers in _unsafe_ Rust don't do any strict aliasing optimizations, which
C compilers sometimes do. The Rust memory model isn't fully specified, though,
and the status quo seems to be related to not actually passing type
information to LLVM. Not clear whether this will change in the future. There's
some discussion of it here:
[http://smallcultfollowing.com/babysteps/blog/2016/05/27/the-...](http://smallcultfollowing.com/babysteps/blog/2016/05/27/the-
tootsie-pop-model-for-unsafe-code)

\- References in _safe_ Rust (the vast majority of code) have much stronger
aliasing information than pointers do in C. This is one of the core features
of Rust, that references that allow mutation are guaranteed not to be aliased.
I think the status quo is that this information isn't passed to LLVM because
of some LLVM bugs getting in the way, but that it should start working in the
near future. When all of this is working, I think it should produce code
that's faster than C, in the same way that Fortran sometimes does.

~~~
comex
You’re on target. Just to clarify: even references (“safe”) do not need type-
based alias analysis (aka “strict aliasing”, which is what GCC calls the flag
enabling/disabling it). All Rust references have semantics similar to C
“restrict”: there should never be any conflicting writes from other sources,
because immutable references imply the data shouldn’t change at all (nobody
has a mutable reference), and mutable references are exclusive. (Types with
interior mutability are an exception, but the compiler knows what types those
are and special-cases them.) So the compiler can assume “no alias” most of the
time with no need to care about types.
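
A small sketch of what this means in practice, using only standard-library types; the function names are made up for the illustration. In the first function the two parameters can never overlap, so an optimizer is in principle free to load `*src` once. In the second, `Cell` opts into interior mutability, both parameters may refer to the same value, and every read has to observe the previous write.

```rust
use std::cell::Cell;

/// `dst` is an exclusive reference and `src` a shared one, so they cannot
/// alias. Calling `add_twice(&mut a, &a)` is rejected by the borrow checker.
pub fn add_twice(dst: &mut i32, src: &i32) {
    *dst += *src;
    *dst += *src;
}

/// Shared references to a `Cell` may alias, and writes go through them, so
/// the second `src.get()` can see the result of the first `dst.set(..)`.
pub fn add_twice_cell(dst: &Cell<i32>, src: &Cell<i32>) {
    dst.set(dst.get() + src.get());
    dst.set(dst.get() + src.get());
}
```

With `a = 1` and `b = 10`, `add_twice(&mut a, &b)` leaves `a == 21`. Passing the same `Cell` holding 1 as both arguments to `add_twice_cell` leaves it at 4 rather than 3, which is the aliasing case the plain-reference version rules out at compile time.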

The formal specification of rules for unsafe code hasn’t been written yet,
because, well, it’s an ambitious goal! Even the C standard is sometimes not
really clear about what counts as undefined behavior; Rust wants to do better,
while being more permissive, and offering a ‘sanitizer’ tool to verify
correctness at runtime. And all of this has to be implemented on top of LLVM,
which was written by other people, is designed for C’s rules, and, like other
compilers, doesn’t even get those right in every case (even when the spec is
clear).

For now, the effort is still fairly tentative. But I’m pretty confident that
type-based aliasing analysis will never be a thing in Rust, so it will always
be legal to read data through ‘wrong-typed’ pointers, both raw pointers and
references (as long as it’s valid data, alignment is right, etc.).

Actually, I’m embarrassed: my code from earlier isn’t actually legal in all
cases. It requires the pointer to be correctly aligned, which in the case of
String it probably will be, but it’s not guaranteed. Meh.
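
One way to sidestep the alignment problem while keeping the raw-pointer style is `std::ptr::read_unaligned`. The sketch below also adds the length check that the earlier snippet skipped; `read_i64_unaligned` is a hypothetical helper name, not something from the thread.

```rust
use std::mem::size_of;
use std::ptr;

/// Read a native-endian i64 from the start of `bytes`, whatever its alignment.
/// Returns None when fewer than eight bytes are available.
pub fn read_i64_unaligned(bytes: &[u8]) -> Option<i64> {
    if bytes.len() < size_of::<i64>() {
        return None;
    }
    // SAFETY: the length check guarantees eight readable bytes,
    // `read_unaligned` has no alignment requirement, and every bit pattern
    // is a valid i64.
    Some(unsafe { ptr::read_unaligned(bytes.as_ptr() as *const i64) })
}
```

Calling it on `s.as_bytes()` gives the same result as the original one-liner whenever that one-liner was valid.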

------
naasking
There's another OS that used some aspects of safe code injected into the
kernel: Pebble [1]

[1] [https://www.usenix.org/conference/workshop-embedded-
systems/...](https://www.usenix.org/conference/workshop-embedded-
systems/pebble-component-based-operating-system-embedded-applications)

------
jpfr
The easiest way forward for a rust kernel is not to start from scratch, but to
port Minix3. Minix is an open source micro-kernel somewhat similar to QNX. The
kernel itself is on the order of 4000 lines of code. [1] Though small, Minix
has quite a long history, is super-stable and looks like a sweet spot in
kernel development.

The kernel could be ported piecemeal while always being able to boot during
the transition. Similar to how remacs is slowly porting emacs over to rust
[2].

And on top of that, you can use the rump kernel approach to run the device
drivers, TCP/IP, etc. from NetBSD. So you immediately get a fully functional
OS.

[1] [http://www.minix3.org/](http://www.minix3.org/)

[2] [https://github.com/Wilfred/remacs](https://github.com/Wilfred/remacs)

------
eggestad
Let me be the contrarian to make the case for NOT writing a new kernel at all
(no matter what language)

Many a year ago (1997-2000) I was working on banking and finance software. I
was part of that breed of programmers that loathe COBOL and wanted everything
to be in an object-oriented language. (C++ was the poison of choice at the
time.) There was a particularly evangelical group of us in this company.

With this group I found myself in a meeting with one of the old timers, a
business graduate. His job was to explain to us CS diaper babies how banking
and finance actually worked when we wrote the software. (Not surprisingly, we
were actually clueless about the bigger purpose of what a core banking system
is for, other than the obvious.)

After patiently listening to us, with a facial expression that said we were
all diaper babies, he asked a rather pointed question:

OK, suppose we did this: write a new system from scratch using OO programming
and all the other industry-standard components as of today. What _new features
to the end user can you build_? Translated: when he was going to walk over to
the bank to sell it, what would be his pitch?

I got the message: what new pitch would the bank get to win new customers? Or
simplified, when was the last time you went into a prospective new bank and
asked 'What programming language or paradigm is your core banking system
written in?'

I don't give a shit about what language a kernel is written in, only what the
system calls do. Please tell me what system call you can write in Rust that
you can't write in C.

The first time the problem arose was when it became necessary to translate one
HW assembly to a new CPU. The people at the time took a look at the problem
and asked the question: what is easier, to write an assembler-A-to-assembler-B
translator, or to write the kernel in the most minimal language possible,
which can then be compiled for any number of HW architectures? The answer was
simple: C.

The answer is equally easy now. There is no system call you can implement in
Rust that you can't implement in C.

Just like core banking, finance and insurance systems are still in COBOL, the
OS kernels will be for the foreseeable future in C.

The case for writing a kernel in Rust should be that you need a fun hobby.
The thing is that a proof-of-concept kernel is done when you've done 10% of
the work. The remaining 90% is just mind-numbingly boring shit that you'll
never complete. Even if you do, what's the end user's justification to switch?

On the issue of microkernel vs monolithic kernel, I'll most humbly refer to
Linus Torvalds's previous statements.

TJ

~~~
kbenson
I find it interesting that even though your story is about banking and finance
software, you did not once mention safety or security. That doesn't magically
make it a good idea to start from scratch, but unless you have a perfect
security record, it should definitely be part of the deciding criteria. For
something like the kernel, well they are still dealing with problems in C
today. [1]

1: [https://www.exploit-
db.com/search/?action=search&q=kernel&g-...](https://www.exploit-
db.com/search/?action=search&q=kernel&g-recaptcha-
response=&text=overflow&platform=16) (answer the captcha and search again)

------
mtgx
Someone send this to the Google Fuchsia team.

~~~
pjmlp
I guess Google is somehow aware of safety.

Regarding Fuchsia, the C code is being migrated to modern C++, the TCP/IP
stack and a few filesystem drivers are written in Go, and there are some other
small pieces in Rust.

Brillo was to be Android native layer + C++ Frameworks, it became Android
Things instead, with the ability of writing userspace drivers in Java.

ChromeOS requires disabling security to access the native layer.

On Android, the NDK's main goals are only performance and access from Java to
libraries written in C or C++.

~~~
Rusky
Unfortunately even modern C++ brings none of the benefits described in the
article, because it defaults to unsafe code. Most of the Rust code from the
Fuchsia team is to support Rust in applications, not in any code that needs to
be trusted.

~~~
pjmlp
When used together with another high-level language, C++ takes the role of
low-level _unsafe_ blocks. So there wouldn't be much difference from a pure
Rust application full of unsafe blocks.

Which is the approach taken by Google on Android and ChromeOS.

My understanding is that Fuchsia would follow the same approach, using only
Dart for applications.

Are there any examples of Rust applications in Fuchsia?

~~~
pcwalton
You can't generally inline unsafe C++ into other languages (like, literally in
the same file, or method) the way you can inline unsafe blocks into Rust.

This increased friction invariably leads to C++ folks making a lot more code
unsafe than needs to be.

~~~
pjmlp
True, but tooling is much more relevant than the ability of inlining code.

It took me more time to make something usable with Gtk-rs, than it usually
takes me to write JNI bindings on Android.

I used to complain about the way Google manages the NDK, quite easy to find if
you search for them.

However, in the last couple of months I came to realize that it actually makes
sense, it is their way to keep Android safe while keeping the door open to
write some performance critical sections of code.

~~~
pcwalton
Not if your goal is security. Android is not a particularly secure operating
system. The JVM is not treated as a security boundary.

~~~
pjmlp
Which is exactly how we got here.

If perfect security implies less productivity, security conscious people abide
by "good enough" to appeal to the masses.

~~~
pcwalton
We were talking about the kernel. The internals of the kernel don't need to
"appeal to the masses". Nor do the low-level system libraries.

You are trying to argue that C++ for system libraries plus a high-level
language for apps is as secure as an all Rust app. This is empirically not
true. Virtually the entire security community disagrees with you.

------
banachtarski
Be prepared for issues with ABI incompatibility and all that. Rust is a
low-level language, but IMO not a systems language (disclaimer: I tried Rust
extensively over a 2 week period, background is a blend of C, C++ and Haskell
from which it draws a lot of influence).

~~~
bluejekyll
I’m sorry. But claiming Rust is not a systems language is akin to claiming a
Volvo is not a car.

If you don’t consider Rust to be a systems language, then you’ve got a very
skewed perception of what a systems language is.

You seem to be alluding to Rust’s ABI potentially being an issue, and that’s a
fair criticism. Where a stable ABI needs to be exported, Rust supports the C
FFI, like almost all other languages.
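
As a sketch of what "supports the C FFI" looks like in source, the function below uses the platform's C calling convention instead of Rust's unspecified one. `add_checked` and `AddFn` are hypothetical names for this example. Exporting the symbol from a library under an unmangled name would additionally need the `no_mangle` attribute, which is left out here.

```rust
use std::os::raw::c_int;

/// Add two C ints, writing the sum through `out`.
/// Returns false on overflow or when `out` is null.
///
/// # Safety
/// `out` must be null or point to memory that is valid for a `c_int` write.
pub unsafe extern "C" fn add_checked(a: c_int, b: c_int, out: *mut c_int) -> bool {
    if out.is_null() {
        return false;
    }
    match a.checked_add(b) {
        Some(sum) => {
            // SAFETY: `out` is non-null and valid per the function contract.
            unsafe { *out = sum };
            true
        }
        None => false,
    }
}

/// The function-pointer type a C caller would see for `add_checked`,
/// roughly `bool (*)(int, int, int *)`.
pub type AddFn = unsafe extern "C" fn(c_int, c_int, *mut c_int) -> bool;
```

Only primitive types and raw pointers cross the boundary here, which is what keeps the interface independent of rustc's own layout choices.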

~~~
frankmcsherry
I've had a bunch of issues getting clean statements about the ABI that
actually result in usable guarantees.

For example, while you are advised to put `#[repr(C)]` on types you want to
have a known layout for FFI, to the best of my understanding this decoration
isn't included for generic types like tuples, slices, or `Vec`. Afaiu, Rust
reserves the right to change the layout of `(T1, T2)` for `#[repr(C)]` types
`T1` and `T2` on a build-by-build basis, which makes a lot of the core types
unsuitable for FFI.
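
One workaround, sketched here with a hypothetical `Pair` type, is to replace the tuple with a `#[repr(C)]` struct, which follows C layout rules for field order and padding. This only pins down the struct itself; the guarantee is only as good as the layouts of `T1` and `T2`.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical FFI-friendly stand-in for the tuple `(T1, T2)`.
/// With `repr(C)`, fields are laid out in declaration order.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Pair<T1, T2> {
    pub first: T1,
    pub second: T2,
}

impl<T1, T2> From<(T1, T2)> for Pair<T1, T2> {
    fn from((first, second): (T1, T2)) -> Self {
        Pair { first, second }
    }
}

/// Byte offset of `second` inside the struct, measured on a live value.
pub fn offset_of_second<T1, T2>(p: &Pair<T1, T2>) -> usize {
    (&p.second as *const T2 as usize) - (p as *const Pair<T1, T2> as usize)
}

/// Size and alignment of `Pair<T1, T2>` in bytes.
pub fn layout<T1, T2>() -> (usize, usize) {
    (size_of::<Pair<T1, T2>>(), align_of::<Pair<T1, T2>>())
}
```

For `Pair<u8, u32>` on common platforms this gives an offset of 4 for `second` and a size of 8, matching what a C compiler would produce for the equivalent struct. A plain `(u8, u32)` tuple carries no such promise.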

Anyhow, that's my issue with it as a "systems" language. I can't (yet) write
code that compiles down to the minimal operations I believe the computer can
do, without re-inventing a non-trivial hunk of the standard library. Or, if I
can, I can't find the language in the spec that ensures I won't have to fix
things.

~~~
pcwalton
I don't believe that Rust slices and tuples have unstable ABIs. I guess that
isn't documented as well as it could be, though. File an issue perhaps?

Isn't std::vector in C++ equally unspecified, though?

~~~
frankmcsherry
My understanding of the field-reordering work is that it applies to tuples.
From [http://camlorn.net/posts/April%202017/rust-struct-field-
reor...](http://camlorn.net/posts/April%202017/rust-struct-field-
reordering.html):

> The 2000-foot version: Rust structs, enums, and tuples are now automatically
> smaller in some cases. It's possible for the compiler to work with types
> whose in-memory field order doesn't match that of your source code.

This would mean (again, as I understand it) that you cannot rely on the order
of elements in a tuple, if for example you wanted to pass a `&[(T1,T2)]` to
some other code (even other Rust code, but built separately).

Edit: serious follow up: You've been working with Servo, and I'm assuming at
various moments you (pl) need to map in shared libraries that Servo wasn't
built with (e.g. codecs, whatever). Do you avoid Rust types in these FFI
interfaces, do you rewrap everything, or have pervasive repr(C) type analogues
instead?

It might be that I'm the only person in the world trying to do Rust-to-Rust
FFI, but it feels like (i) it isn't that far from being really tasteful, but
(ii) it has a bunch of issues that make it seem that no one has ever tested it
(e.g. default allocators being different, weak guarantees about layout).

