
Experiences Building an OS in Rust - jaxondu
https://mostlytyped.com/posts/experiences-building-an-os-in-ru
======
zokier
I haven't yet read the paper proper but this statement surprised me a little:

> In particular, ownership has forced us to avoid concurrency in the kernel
> and include more code than we would have liked in the TCB.

One of the strong points of Rust and its ownership system is that it allows
writing concurrent code with greater confidence. I would have hoped that it'd
encourage rather than discourage using concurrency.

~~~
Udo
I was surprised by this as well, so I checked their paper:

" _When multiple threads need to share a resource, they can pass ownership
through channels. However, as we discuss in the rest of this paper, it does
not work well in systems that use non-thread-based concurrency models and when
resources must be shared [...]

While hardware interrupts are asynchronous, and therefore run concurrently
with other kernel code, in our operating system interrupt handlers enqueue
tasks to be run in the main scheduler loop, which is single-threaded. As a
result, on_receive and send can never run concurrently and no data race is
possible. We overcome both issues by declaring most resources as static
variables and borrowing them with ’static lifetime. [...]

Being able to mutably borrow static variables multiple times means our
operating system can allow multiple drivers to reference hardware resources or
each other. However, doing so means we lose the ability to detect real data
races at compile time, for example, if the kernel passes a shared resource to
an interrupt handler that may run concurrently. Instead, we resort to an
overly conservative convention in which interrupt handlers perform no work
except posting to the main scheduler’s task queue. An ideal mechanism would,
instead, allow for shared state between components that will never run
concurrently, but disallow sharing when they may._"
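The convention the quote describes (interrupt handlers only enqueue; a single-threaded scheduler loop runs everything) can be sketched in hosted Rust. All names here are hypothetical, and a real kernel would be no_std with static allocation, but the shape is this:

```rust
use std::cell::RefCell;
use std::collections::VecDeque;

// Hypothetical task type: an interrupt handler records *what* happened;
// the single-threaded scheduler loop decides *when* to handle it.
#[derive(Debug, PartialEq)]
enum Task {
    PacketReceived(u8),
    TimerFired,
}

struct Scheduler {
    queue: RefCell<VecDeque<Task>>,
}

impl Scheduler {
    fn new() -> Self {
        Scheduler { queue: RefCell::new(VecDeque::new()) }
    }

    // Called from an interrupt handler: enqueue only, no driver work here.
    fn post(&self, t: Task) {
        self.queue.borrow_mut().push_back(t);
    }

    // The main loop: because every task runs here, on one thread,
    // driver callbacks can never race with each other.
    fn run_pending(&self) -> Vec<Task> {
        let mut ran = Vec::new();
        while let Some(t) = self.queue.borrow_mut().pop_front() {
            ran.push(t); // a real kernel would dispatch to a driver here
        }
        ran
    }
}
```

The cost, as the paper notes, is that nothing in the types stops you from doing real work in the handler; the "enqueue only" rule is a convention.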

~~~
twic
> Instead, we resort to an overly conservative convention in which interrupt
> handlers perform no work except posting to the main scheduler’s task queue

This doesn't seem so bad. OS X does exactly this in order to minimise the time
spent in interrupt mode, and so reduce scheduling jitter:

[https://developer.apple.com/library/mac/documentation/Darwin...](https://developer.apple.com/library/mac/documentation/Darwin/Conceptual/KernelProgramming/performance/performance.html)

[https://developer.apple.com/library/mac/documentation/Device...](https://developer.apple.com/library/mac/documentation/DeviceDrivers/Conceptual/WritingPCIDrivers/interrupts/interrupts.html)

~~~
cwzwarich
In reality, IOKit drivers do perform work in the interrupt filter function
that actually runs in an interrupt context.

------
steveklabnik
The Reddit thread and the users.r-l.o thread have insightful posts:
[https://www.reddit.com/r/rust/comments/3nauhy/experiences_bu...](https://www.reddit.com/r/rust/comments/3nauhy/experiences_building_an_os_in_rust_feedback/)
[https://users.rust-lang.org/t/rfc-and-paper-experiences-buil...](https://users.rust-lang.org/t/rfc-and-paper-experiences-building-an-os-in-rust/3110/2)

------
twic
The "new hardware protection mechanism called MPU available on some new ARM
Cortex-M microcontrollers" is a device which lets the operating system define
up to eight memory regions (sixteen on the M7), optionally with holes in them.
In each region, permissions can be set for read, write, and execute, and the
policy for caching and sharing between cores can be controlled (although this
is presumably not really relevant for single-core microcontrollers).

[http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc....](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0553a/BIHJJABA.html)

[https://blog.feabhas.com/2013/02/setting-up-the-cortex-m34-a...](https://blog.feabhas.com/2013/02/setting-up-the-cortex-m34-armv7-m-memory-protection-unit-mpu/)

It's a bit like a normal MMU, but without virtual memory.

~~~
nickpsecurity
They're used in lots of smartcards and embedded designs going way back. I
designed, but didn't implement, similar extensions because the virtual memory
model was a complex mess I couldn't secure. Segments were simple and had been
used for effective security before, combined with the isolation aspect of MMU's.
MPU's were the logical step from that. Easy to implement with highly assured
methods, too, due to simplicity. Not sure how far back they go, but here's an
analysis of issues done in 2004 that references MPU advantages:

[http://www.control.lth.se/documents/2004/5737.pdf](http://www.control.lth.se/documents/2004/5737.pdf)

The documents from Gemalto, Infineon, etc for their EAL5+ or EAL6+ smartcards
give details on how they do the high security variants of MPU's. Rockwell-
Collins ACL2-based method can also be used given they built a whole processor
(AAMP7G) with it.

------
kibwen
Quoting the great feedback from Gankro on Reddit:

Discussing this on Twitter/IRC:

* Wants const size_of closures (for statically allocating that many bytes). We just need RFC #1245 to get implemented (it has been accepted), and for someone to mark the size_of intrinsic as a const fn. However this might get into the weeds pretty bad since "size" is handled by trans (LLVM even). Dunno those details.

* Shared mutability of hardware registers should be able to be soundly handled by Cell.
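That Cell-based approach to registers might look like the following minimal sketch (the `Register` type and its methods are hypothetical illustrations, not the paper's code): get/set go through a shared reference, so no `&mut` aliasing rules are violated, and on a single thread there is no data race.

```rust
use std::cell::Cell;

// Hypothetical model of a memory-mapped register: Cell allows mutation
// through &self, so several drivers can hold shared references to it.
struct Register {
    bits: Cell<u32>,
}

impl Register {
    // Read-modify-write through a shared reference; safe on one thread.
    fn set_flag(&self, mask: u32) {
        self.bits.set(self.bits.get() | mask);
    }

    fn read(&self) -> u32 {
        self.bits.get()
    }
}
```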

This leaves the big issue:

The kernel has been architected as a single "main" thread, and a bunch of
interrupt handling threads. All the interrupt-handling threads do is enqueue a
message for the main thread to handle. The main thread then drives all the
drivers off of this queue, so they all run on one thread. However they want to
do some shared mutable stuff. In particular, I think they want closures that
close over the same value mutably? RefCell isn't particularly acceptable
because crashing is Super Bad, and they would like to avoid runtime checks
anyway. They just found out about Cell, so it might work, but they seem to
think this wouldn't be right.

You can make Cells pretty fine-grained. You can also consider /u/SimonSapin's
proposed extension to Cell which has a replace method for non-Copy types so
you could replace (say) an arbitrary Cell<Option<T>> with None temporarily
through a shared reference.
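As a sketch of that proposal (a `take`/`replace` on `Cell` did later land in std), a non-Copy value can be moved out through a shared reference, leaving `None` behind, with no RefCell and no runtime panic path. The `Slot` type here is hypothetical:

```rust
use std::cell::Cell;

// Hypothetical shared slot: two components can hold &Slot<T> to the same
// storage; take() moves the value out through &self, leaving None.
struct Slot<T> {
    value: Cell<Option<T>>,
}

impl<T> Slot<T> {
    fn new() -> Self {
        Slot { value: Cell::new(None) }
    }

    fn put(&self, v: T) {
        self.value.set(Some(v));
    }

    // Cell::take swaps in Option's Default (None) and returns the old value.
    fn take(&self) -> Option<T> {
        self.value.take()
    }
}
```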

Alternatively, it might be possible to manually thread the shared mutable
state since it's all being driven by some top-level event-loop (as far as I
can tell). This was actually an old std pattern: [https://doc.rust-
lang.org/0.11.0/std/collections/hashmap/str...](https://doc.rust-
lang.org/0.11.0/std/collections/hashmap/struct.HashMap.html#method.find_with_or_insert_with)
(the A is "shared" closed state).

Interested to hear more about the finer details of the shared mutable state!

------
f_
There was a recent HN topic here on redox-os[0], which is written in Rust
(and is heavy WIP). It did actually use a lot of unsafe, though.

[0] [https://github.com/redox-os/redox](https://github.com/redox-os/redox)

------
com2kid
As someone who works on the innards of a cooperative multitasking embedded OS
for a living, I am sick and tired of C and C++.

Closures are a huge one. Everything is async in HW land: an interrupt fires and
calls you when data is ready, so doing a read looks like

    
    
        void MyOpInitialFunction(void)
        {
            // Business logic
            AsyncReadFromFlash(sDataBuffer, sizeOfBuff, sizeToRead, MyOpFirstCallback, contextVar);
        }
    
        void MyOpFirstCallback(byte* dataBuf, size_t sizeRead, void* context)
        {
            // Business logic: use the data that was read somehow
            AsyncReadFromFlash(sDataBuffer, sizeOfBuff, sizeToRead, MyOpSecondCallback, contextVar);
        }
    
        void MyOpSecondCallback(byte* dataBuf, size_t sizeRead, void* context)
        // and so on and so forth
    

It gets boring really fast. The system has a nice DMA controller that we use a
lot, so a similar pattern exists for copying large chunks of data around.

You can actually implement a variant of closures in C using macros, and right
now the entire team wishes we had done that! (It isn't hard; in embedded land
you have complete memory control, so you can just memcpy the entire stack down
to a given known location and memcpy it back to resume!)

Rust has a lot of features that are really useful for embedded, the problem is
that it is going to be a long time, if ever, that the embedded world switches
over.

Embedded systems tend to have very little code space, so efficiency of
generated code is a must. LLVM and even GCC aren't as efficient as ARM's
(horribly overpriced, feature-lacking) compilers. The one thing ARM's
compilers do well, though, is minimize code space, and they do a very good job at it.

The other issue is that any sort of language run time is questionable. Even
the C Run Time is typically not included in embedded (teams opt out of using
it), if you want printf you typically write whatever subset of printf
functionality you want!

Oddly enough, memory safety is not a real issue, though slightly better buffer
overrun protection than what we get from static analysis would be nice!

Consumers hand over a destination buffer to Producers, who return it in a
callback. Because of the above async pattern, the consumer isn't even running
until the data is ready.

That's solved 99% of our issues. :) (If you want someone else's data, you ask
for it and get it passed in a buffer you provide!)

~~~
acconsta
Any thoughts on C++ coroutines? Microsoft has a draft implemented in their
compiler, and there's a second competing proposal.

[https://msdn.microsoft.com/en-us/magazine/mt573711.aspx](https://msdn.microsoft.com/en-us/magazine/mt573711.aspx)

[http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n445...](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4453.pdf)

Ideally we'd implement something similar in Rust.

~~~
com2kid
Embedded land is stuck with C++ 2003.

It sort of sucks.

(This is a world where v-tables are still a concern; they take up way too much
code space if inheritance is allowed to get out of hand.)

Allocation is basically verboten. This line of code is very real in embedded:

    
    
      #define malloc assert
    

Back to being on topic: current C++ template code, with futures et al., is
horrible and may cause eye cancer; Microsoft's final code is obviously cleaner.
(The counter-proposal is much better written though; they cover a lot more
information.)

Now, if I am in happy sappy land where I get my ideal languages, it'd look
something like this:

    
    
        CopyBuffers(//... assume stuff is passed in)
        {
            bytesRead = 0;
            while (bytesRead < sizeOfFile)
            {
                error, bytesThisLoop = AsyncRead(fileHandle, &buffer, bufferSize);
                if (error)
                { /* do stuff */ }
                if (bytesThisLoop == 0) // something has gone wrong; error should have caught it
                { /* Something is wrong */ }
                    
                bytesRead += bytesThisLoop;
                
                error, bytesWritten = AsyncWrite(destHandle, &buffer, bytesThisLoop);
                if (error)
                { /* do stuff */ }
                if (bytesWritten != bytesThisLoop)
                { /* Something may be wrong, depending on our write contract */ }
            }
        }
    

Someplace in AsyncWrite my entire state is captured and handed off to someone.
The counter-proposal suggests I handle the state myself, which is fine (but
not as cool looking!)

Now on to said competing paper:

 _" Memory usage is exactly that required to represent the resumable
function’s maximum stack depth,"_

For a cooperative multitasking system, that is pretty well defined: you
have a while(1) somewhere dispatching tasks! (Or an equivalent
thread/fiber/etc. manager.)

I agree that the "up and out" model leads to more boilerplate. I actually
like requiring use of "await" to indicate that calling a function is going to
pause the thread, but I'd like it used purely as a compiler-enforced FYI to
the programmer. (I'm not a fan of surprises.)

They have an idea of a resumable object, which I think they take too long
explaining; all coroutine systems have this, since you have to at least copy
the stack off to somewhere.

 _" Resumable objects are non-copyable and non-moveable"_

Sort of sad, I can think of some very fun abuses, but since 99% of them would
result in a crash... ;)

So overall (and this is the part I am not clear about), they want to restrict
this to resumable expressions. From what I grok, that seems to mesh with my
request for "await" being enforced (though here it seems required).

The final syntax at the end is a bit ugly; I am not a fan of out parameters.
Ignoring that, it may be less work overall on the programmer to get the base
case up and running. I don't want to have to wrap all my async functions in
adapter classes; it is nice if the compiler does it for me.

The more complex cases get ugly though. I'd like to see a really simple non-
template example of just how to read from a file from the consumer's
perspective.

IMHO that is 90% of the problem space for async. "IO is slow, get back to me
when you're done."

Thanks for leading me down this hole! :)

~~~
acconsta
Yeah, I'm not brave enough for embedded.

I'm still pretty excited about what's happening in this space though. Thanks
for your thoughts.

~~~
com2kid
In some ways embedded is easier, in others much harder. Doing it all on your
own is hard, but if you have a good team backing you I think it is actually
easier than trying to write a large desktop app in native.

No OS to get in your way. No decades of legacy APIs to clog things up. Because
the entire world is understandable, no need to be concerned about someone
else's thread ruining your day.

All your I/O channels are well measured and have known latency and throughput.
If you are lucky enough to work with SRAM, then you even get a nice
constant access speed to memory.

Interrupt routines are a fancy way to say "callback", and Direct Memory Access
is a nice way to say "memcpy without CPU overhead".

That said, the initial getting-up-and-running part is hard; that is why lots
of pre-existing systems with their own little runtime exist.

------
wyldfire
This is valuable feedback. It's great that Rust has been so receptive, making
changes where appropriate.

~~~
kibwen
Indeed, and while I think their specific proposal is quite unlikely to end up
in the language proper (it's a lot of machinery for a very specific purpose,
and still isn't quite "safe" by Rust's definition), it's great food for
thought for future enhancements to the language (such as making size_of a
const function so that it can be evaluated at compile time).
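For reference, size_of did eventually become usable in const contexts, which covers the statically-allocated-buffer use case the proposal wanted. A minimal sketch:

```rust
use std::mem::size_of;

// size_of evaluated at compile time: the buffer's length is derived
// from a type's size, with no runtime allocation involved.
const SLOT: usize = size_of::<u64>();
static SLOTS: [u8; SLOT * 4] = [0u8; SLOT * 4];

fn buffer_len() -> usize {
    SLOTS.len()
}
```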

------
nickpsecurity
There's been quite a bit of work on securing embedded systems with safer
languages (esp. Java) or MPU's (esp. smartcards). Gemalto's EAL7 JavaCard work,
MULTOS, IBM's Caernarvon... all made stuff to enforce strong protection with
similarly rough constraints. Usually no MPU, although some smartcards had them,
with some public data on design or proper usage. I recommend the authors dig
up stuff on systems like that to find strategies for countering various
issues.

Most that I saw handle resource and security decisions via the type system, safe
transformations to bytecode, and safe handling of that. Fast paths are
sometimes implemented in assembler, with safety put into the HLL or VM interface
to that... if the vendor felt like it. (shakes head) Where this didn't work, there
were coding rules or frameworks for carefully designing and decomposing the
system to manually prevent/catch issues. Most do all of this on the development
machine, with the target system just blindly accepting or trusted-booting a signed
binary representing the system or apps. Cuz what else can you do with such
constraints... (shrugs)

Far as specific hardware or Rust issues, I can't help as I don't work with
those.

------
acconsta
Basic question — when you're writing a kernel in Rust with its own threading
implementation, how does the Rust compiler even know what threads are? It has
to identify thread boundaries for Send/Sync, right?

~~~
dbaupp
The compiler has no concept of what a thread is anywhere, even in the standard
library. The safety is imposed by careful bounds on exactly the right
functions: those that create or interact with new threads. E.g. std::thread::spawn
has the signature:

    
    
      pub fn spawn<F, T>(f: F) -> JoinHandle<T>
          where F: Send + 'static + FnOnce() -> T, T: Send + 'static
    

Other than implementations of `Send`/`Sync` themselves, this is essentially
the only function in std that needs to be bounded by them.

The same approach can be used in a kernel context, e.g. place bounds on the
functions that create (or, at least, initialise interaction between)
concurrently executing pieces of code.
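A minimal sketch of that approach (`start_concurrent` is a hypothetical kernel entry point, not from the paper): the `Send + 'static` bounds, mirroring `std::thread::spawn`, are the only place concurrency enters the types, and the compiler rejects closures capturing non-Send data (e.g. an `Rc`) at the call site.

```rust
use std::sync::Arc;

// Hypothetical kernel-side "run this concurrently" entry point. The
// bounds are the entire thread-safety story: nothing else in the
// compiler knows what a thread is.
fn start_concurrent<F, T>(f: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    // A real kernel would hand `f` to another execution context;
    // here we just invoke it to keep the sketch runnable.
    f()
}
```

Calling it with an `Arc` compiles because `Arc<T>` is `Send`; swapping in `Rc` would be a compile-time error, which is exactly the "careful bounds" mechanism described above.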

~~~
acconsta
Ah, so Rust does represent thread semantics through types? It's curious that
the authors don't mention that in their proposal.

~~~
dbaupp
Yeah, everything is driven by implementing (or not, as appropriate) Send and
Sync for types.

