
Ownership is Theft: Experiences Building an Embedded OS in Rust (2015) [pdf] - ingve
https://sing.stanford.edu/site/publications/levy-plos15-tock.pdf
======
bascule
I'm an embedded Rust developer, and was a little confused by this paper until
I noticed it was written in 2015. There are now solutions to a number of the
problems they pose both at the language level and the library level:

Here's one in particular:

    
    
        In principle, this would allow us to statically allocate 
        a buffer sized for each particular closure. 
    
        Unfortunately, there is no size_of keyword in Rust,
        and static initialization cannot invoke functions—
        in this case the function size_of<T>() -> usize provided
        by the Rust core library.
    

This can be solved by using the lazy_static crate:

[https://github.com/rust-lang-nursery/lazy-static.rs](https://github.com/rust-lang-nursery/lazy-static.rs)

From the description in the README:

    
    
        Using this macro, it is possible to have statics that 
        require code to be executed at runtime in order to be 
        initialized. 
    

Hey look, it's exactly what they need! It's also "no_std" compatible and can
work just fine in a kernel-like context.

Take this paper with a grain of salt: it's a bit dated at this point, and
there are solutions for a number of the problems posed.

~~~
codys
Note that lazy-static does not "statically allocate a buffer" for all
definitions of "statically allocate": lazy-static computes its value on first
use, and must use dynamically allocated space (i.e. malloc/Box).

Also, lazy-static has been around for a long time. Releases first occurred in
2014! I'd be surprised if it had not been considered.

~~~
bascule
Well, their original stated problem was that size_of() can't be invoked in a
static initializer, and lazy-static works around that by deferring the call to
runtime.

That said, Rust supports custom allocators now:

[https://doc.rust-lang.org/book/custom-allocators.html](https://doc.rust-lang.org/book/custom-allocators.html)

Write your own allocator and use Box.

~~~
rkangel
I think you misunderstand what 'statically allocate' means. That means a size
known at compile time, so that the space for that variable can be assigned in
the memory map of the executable. Anything involving a 'custom allocator' is
doing runtime memory allocation.

------
jononor
I feel that some of these issues occur due to an over-reliance on object-
oriented practices, in particular the 'encapsulation' of state into objects
and the triggering of side-effects in callbacks. The use of Rust brings these
pain points to the fore, but I'll argue that the code is better written
otherwise, independently of the borrow checker.

For instance in the 'Callback through closures' example, there is code like:

    
    
        setTimeout(|| {
          activityLed.toggle();
        }, 2000);
    

First, in an embedded system, you cannot have multiple sources of truth. So
it's not just because of Rust's ownership reasoning that `activityLed` cannot
be shared between callbacks - if multiple events are to be considered for a
given output, they should be synchronized explicitly.

Instead I think the code should be more like the following, which models the
calculation of new state as a pure function. Side-effects are separated out by
storing the desired effect as data, with a dedicated function for realizing
it. Time is just data, so time-based logic works without depending on hidden
state.

This increases testability significantly. And it can be fully reasoned about
by Rust, no conflicts with the borrow checker.

    
    
        // for each tick: calculate new state
        if (now >= state.toggleLedTime) {
          state.activityLed = !state.activityLed;
          state.toggleLedTime = now + 2000;
        }
        // and realize it
        updatePin(config.ledPin, state.activityLed);

~~~
hexmiles
Shouldn't the updatePin call be inside the if body? Or am I missing
something?

~~~
jononor
Because it sets what is stored in `state`, it does not have to be. It _could_
be inside the if, but there are two major reasons why it should not be:

1) Testability. With it separated, we can write automated tests for our state
calculation logic that are hardware independent, like 'normal' code. We just
need to create `currentState` data, pass it into our `calculate()` function,
and verify the new `state` returned. This can run both on our host-system, and
on-target. We can even build simulations if we want:
[http://www.jonnor.com/2017/03/host-based-simulation-for-embedded-systems/](http://www.jonnor.com/2017/03/host-based-simulation-for-embedded-systems/)
(shameless plug).

2) Deterministic execution time. Many embedded systems have soft or hard real-
time constraints, action must _always_ be performed within X microseconds. The
more branches we have the harder it is to reason about whether we can keep
this guarantee. By eliminating branches we are executing our 'worst case'
always, giving it much more testing.

~~~
jononor
This assumes that the call to updatePin is idempotent (calling it multiple
times with the same inputs has no additional effect). An update of an I/O pin
usually is, but other hardware (or APIs...) might not be.

In this case the realization of the state should still be in a separate
function for testability. But it can take both `newState, currentState` as
arguments, and then do:

    
    
        if (newState.thingon != currentState.thingon) {
          nonIdempotentUpdate(newState.thingon);
        }
    

If the device is latency critical, then the functions should be tested in all
their permutations. You can use generative techniques like fuzzing to generate
the `state` inputs.

~~~
greenleafjacob
componentDidUpdate for embedded systems?

------
Manishearth
The whole execution contexts thing is sort of based off an assumption that the
unique-&mut stuff has to do with thread safety. It doesn't. In fact, it had no
bearing on thread safety in Rust for the many years before scoped threads were
made possible by removing the 'static bound on Send.

[http://manishearth.github.io/blog/2015/05/17/the-problem-with-shared-mutability/](http://manishearth.github.io/blog/2015/05/17/the-problem-with-shared-mutability/)
details why it's necessary.

The article does sort of mention this (and I have talked with the authors
about this before) but IMO it underrepresents the importance of it.

One thing I did discuss with one of the authors at one point was swapping
around the guarantees a bit -- allowing multiple &mut for cases not involving
any form of runtime typing (enums and vectors are both cases of runtime typing
-- in a vector the number of elements is runtime dependent). This would create
a significantly different language and be incompatible with the vast majority
of the ecosystem; however, it would still be safe, and has the potential to be
useful. I even started hacking on a fork of the compiler that does this, but
never got the chance to finish it. The idea in essence is not too hard to
implement.

~~~
mathw
I get a 404 on that link.

~~~
Manishearth
Try now.

Octopress does this stupid thing where it generates post URIs based on the
local timezone, and I published my last post whilst in Taipei so all the URLs
broke. I need to figure out how to patch the code to fix this :|

------
jononor
From the abstract:

"...embedded platforms are highly event-based, and Rust’s memory safety
mechanisms largely presume threads. In our experience developing an operating
system for embedded systems in Rust, we have found that Rust’s ownership model
prevents otherwise safe resource sharing common in the embedded domain,
conflicts with the reality of hardware resources, and hinders using closures
for programming asynchronously..."

~~~
alanning
"... In addition, we draw from our experience to propose a new language
extension to Rust that would enable it to provide better memory safety tools
for event-driven platforms."

I believe this part also warrants mention. Not only did they share their
findings as experts in the field, they also did the extra work of detailing
improvements.

My takeaway is that it's painful to make embedded systems in Rust now, but
it's promising enough to make it worth the effort for the embedded community
to help guide Rust's development.

~~~
BaronSamedi
Not my field but I don't understand the rationale behind choosing Rust. Given
the performance and security concerns wouldn't Ada be the natural choice here?
It may not be the most fashionable language but it seems I read about security
problems with embedded and IoT devices on a daily basis. Why not choose a
language designed for this problem?

~~~
vertex-four
Ada isn't actually particularly safe in any reasonable sense - or rather, it
provides enough escape hatches with little obvious marking that it's costly
(in terms of time spent in code review) to ensure safety. Safety of existing
Ada programs is primarily a result of the engineering practices that went into
them, more than the language itself.

~~~
nickpsecurity
Nonsense. The designer looked at almost every spot where coding mistakes were
leading to crashes or corruption, then made those safe-by-default wherever
possible. They're described in modern form in the book below by Barnes. The
effect was confirmed when defects dropped to around half of what Mitre et al
had been observing on C projects. There was only one case study I saw where
Ada made no difference, and its control group was C++ - an anomaly that might
have been the same system reimplemented by the same people, an easy-to-get-
right system, or simply two super-talented teams. So, it is a safe language,
with the only exception being temporal safety without a GC. Rust delivers on
that.

[http://www.adacore.com/uploads_gems/Ada_Safe_and_Secure_Booklet.pdf](http://www.adacore.com/uploads_gems/Ada_Safe_and_Secure_Booklet.pdf)

Also, SPARK Ada can straight-up prove common errors don't exist in that code
with an automated prover. Most engineers would spend quite a bit of time doing
that with pencil-and-paper examination or human-driven provers:

[https://en.wikipedia.org/wiki/SPARK_(programming_language)](https://en.wikipedia.org/wiki/SPARK_\(programming_language\))

------
patrickg_zill
Actually it is quite disappointing to read this paper.

The problems they mention and run into, have already been examined and dealt
with in embedded Schemes, such as BIT, PICBIT and PICOBIT.

PICBIT
[http://www.iro.umontreal.ca/~feeley/papers/FeeleyDubeSW03.pdf](http://www.iro.umontreal.ca/~feeley/papers/FeeleyDubeSW03.pdf)
(see the GC section)

PICOBIT
[http://users.eecs.northwestern.edu/~stamourv/papers/picobit.pdf](http://users.eecs.northwestern.edu/~stamourv/papers/picobit.pdf)
(not as relevant, in that the hard decisions/tradeoffs were examined in its
predecessor, PICBIT)

BIT (1995) [https://github.com/melvinzhang/bit-scheme](https://github.com/melvinzhang/bit-scheme)

~~~
nickpsecurity
I'm upvoting it only because the links might interest embedded developers.
Your comment might be missing the point, though: they're leveraging Rust for
its safety, which requires no GC with its CPU or memory overhead - and no
non-determinism either, unless you use a real-time GC.

------
Animats
It seems that, for this very tiny OS, they're trying to do real work in
interrupts. Most OSs do that, including UNIX/Linux. This has the problem that
you can't block in an interrupt handler while waiting for some thread to
clear a lock.

QNX, though, usually sends a message to a thread in a process when an interrupt
comes in. The interrupt is processed slightly later. QNX has no interrupt
lockouts longer than a few microseconds, because it's not doing much from
interrupt code except sending messages. QNX also has a strict real-time
priority system, so this doesn't hurt interrupt processing latency. So
everything running in a thread uses the regular locking mechanisms, rather
than depending on interrupt lockout.

Something like that might be a better fit to Rust's locking model. Either
that, or the language has to know about interrupts, as in Modula I.

~~~
jacquesm
And that's the way to do it if you value latency rather than simply maximizing
throughput. In real time situations latency is usually the more important
factor.

One thing about those interrupt messages ('signals') in QNX parlance, they get
processed before any other messages.

~~~
Animats
QNX has bounded latency. A standard test in the real time world is to hook a
square wave generator up to an input that cause an interrupt, and run a
program which gets activated by the interrupt and turns on an output. You
watch the input and output signals on an oscilloscope. If there are any
outliers, it's not real time.

(A QNX rep once told me they'd had problems with x86 boards that did things in
"system management mode", stealing CPU cycles from the real work and messing
up the response time. They had a quiet blacklist of unacceptable embedded
boards.)

Linux has a long history of long interrupt lockouts. It's gotten better.[1]
The phrase "95% real time" is used. Kernel drivers can lock out interrupts, so
you have to watch for drivers that are not real-time qualified.

That's the trouble with doing real work in an interrupt handler. Usually you
have to activate a thread after an interrupt anyway, so there's no overall
performance gain in doing work at interrupt level.

[1]
[http://events.linuxfoundation.org/sites/events/files/slides/Real-Time-Linux-Comparison-Bridgers-ELC2015.pdf](http://events.linuxfoundation.org/sites/events/files/slides/Real-Time-Linux-Comparison-Bridgers-ELC2015.pdf)

~~~
jacquesm
> A QNX rep once told me they'd had problems with x86 boards that did things
> in "system management mode", stealing CPU cycles from the real work and
> messing up the response time. They had a quiet blacklist of unacceptable
> embedded boards.

Ah, so that was the source of QNX informally suggesting we stick to certain
brands of motherboards. I always wondered about that, since it _seemed_ to
work ok. I guess they didn't want to alienate those vendors outright so they
chose to informally promote the ones they were sure were good.

95% real time is funny. It's real time or don't bother. I worked with and for
the QNX dealership in NL long ago.

~~~
Animats
Yes. Some boards emulated a hard drive using flash, with BIOS code stealing
cycles from the main CPU with code in system management mode. This caused what
looked like random CPU stalls at the OS level.

------
steveklabnik
This paper is pretty old at this point; today the project has evolved into
[https://www.tockos.org/](https://www.tockos.org/)

The authors have been meaning to write a follow-up paper describing how they
overcame the problems here, but haven't quite gotten to it yet. That slightly
different design is what became Tock.

~~~
dom96
Was the "Execution contexts" feature outlined in the paper implemented?

~~~
steveklabnik
The language was not changed; there were some misunderstandings about the
finer points involved here, and their design was changed instead. So overall
the idea was implemented, but in already-existing Rust rather than changing
the language.

------
rdtsc
> First, a linear sequence of asynchronous operations does not appear in a
> linear piece of code. Instead, it is spread across a series of many small
> functions.

Largely unrelated to Rust, but I want to point out that that is a powerful
insight. It took me a while to internalize it. Once you see it, you'll
understand why asynchronous, callback-based programming is usually inferior to
other models (green threads, channels, lightweight processes and yes,
threads). There are some exceptions of course, say when callback chains are
short, as in a web proxy, in the embedded world, or in short demo code someone
pasted on SO.

Another way to think about it is that "a sequence of callbacks forms an ad-
hoc execution context, almost like a thread, except it is spread through
various functions, not very well specified, and not in one single place".

Now perhaps in this style you can avoid CPU locks, but if callbacks are used
for concurrent IO, you are back to needing to protect global data from
concurrent modification. For example, suppose a callback chain starts - say
cb1, cb2, cb3 - but stops at cb2 while it was modifying some global data.
Then another user request (or other such event) comes in and starts running
cb1, cb2, cb3, which may begin modifying the same global data before it is
back in a consistent state. Next thing you know, you need something like a
lock or semaphore. And sure enough such a thing exists; here is an example in
Twisted:
[http://twistedmatrix.com/documents/8.1.0/api/twisted.internet.defer.DeferredSemaphore.html](http://twistedmatrix.com/documents/8.1.0/api/twisted.internet.defer.DeferredSemaphore.html)

------
rwmj
_" Garbage collection, for example, introduces nondeterministic delays.
Automatic memory allocators complicate common kernel optimizations such as
slab allocation"_

No no no. Badly designed GC, for sure. But for any modern, thought-through GC
this is not true.

And in any case, the Linux kernel uses ref-counting all over the place, which
is the worst form of GC - slow, hard to debug, full of potential places to
trip you up and cause a leak or a security bug, ...

~~~
learc83
Can you give me an example of a language with GC that doesn't introduce non-
deterministic delays?

~~~
dom96
The Nim programming language's default GC[1] allows you to specify an upper
bound delay time that it should not exceed.

1: [https://nim-lang.org/docs/gc.html#realtime-support](https://nim-lang.org/docs/gc.html#realtime-support)

~~~
learc83
According to the docs "These procs provide a 'best effort' realtime
guarantee...Tests show that a 2ms max pause time will be met in almost all
cases on modern CPUs"

So suitable for soft realtime like games, but not actually deterministic.

------
shepmaster
I've been curious to see how something like tokio[1] and futures[2] would work
on a microcontroller. These both seem like good fits for event-based work.

[1]: [https://tokio.rs/](https://tokio.rs/)

[2]: [https://github.com/alexcrichton/futures-rs](https://github.com/alexcrichton/futures-rs)

~~~
awelkie
There is some discussion on that here: [https://github.com/rust-embedded/rfcs/issues/23](https://github.com/rust-embedded/rfcs/issues/23)

------
openasocket
This is a really good article, and I admire the authors for the work they put
in. But I can't shake the feeling that what they want to accomplish could be
done with a library rather than a language extension. Can anyone explain why
their issues couldn't be resolved with a particularly clever mix of traits and
macros?

~~~
steveklabnik
It can, and it was. See my other comments in this thread.

------
Dowwie
"Our proposal for execution contexts in Rust is similar but considers
execution threads, instead of object graphs, as the unit of isolation. Rust’s
low-level interface, safe memory management, and large community make it a
particularly good fit for operating system development. If future language
development can address the challenges we have demonstrated, Rust should be
well positioned to support the next generation of correct embedded operating
systems."

------
IshKebab
Hmm the callback issue is exactly what I have run into while trying to wrap
libsoundio. The user has to provide a read or write callback, but there's just
no ergonomic way to actually do that nicely. At least I can't think of one.
Maybe I just don't know the right combination of Arc, RefCell, etc.

------
ue_
Is the title an allusion to Proudhon's succinct quote: "Property is theft!"?

~~~
geofft
_Proudhon's_ quote? It's everyone's quote!

~~~
gmfawcett
Well played. :)

------
kibwen
Mods, can we get a [2015] in the title? Both the OS described within and Rust
itself have progressed significantly since this paper's original release. :)

EDIT: In particular: the people behind this recently began offering an IoT
development board, "Hail", that runs Tock (the OS from the OP):
[https://www.tockos.org/blog/2017/introducing-hail/](https://www.tockos.org/blog/2017/introducing-hail/)
(discussion on /r/rust, with authors present:
[https://www.reddit.com/r/rust/comments/61257c/introducing_hail_an_iot_development_board_that/](https://www.reddit.com/r/rust/comments/61257c/introducing_hail_an_iot_development_board_that/)
)

EDIT 2: I see that the title has been updated, thanks. :)

------
belovedeagle
Sorry, but you're rather sure of yourself when it's not clear whether you
understand the constraints kernel writers are operating under. On many
platforms, simply enumerating the valid RAM addresses where a heap may be
constructed, let alone mapping them, requires considerable effort. It will
inevitably be necessary to use memory somewhere off the stack to do that
work. Yet it will also be very difficult to write a custom allocator for that
environment.

One choice, for example, would be to rely on the bootloader's handling of .bss
sections to allocate a temporary 'heap'. Haven't tried it, but it probably
gets the job done well enough to bootstrap a real dynamic allocator. The point
is, though, you can't just say "throw lazy-static and a custom allocator at
the problem". Note, for example, my proposed solution requires writing the
custom allocator entirely outside of Rust, since it can't access any Rust
statics. It also requires two implementations of the custom allocator - one
for bootstrapping, and the other after bootstrapping. These will have to be
dynamically selected between at runtime since (AFAIK) Rust is hardly going to
support static switching between them. Then you have to consider
interoperability between them.

Finally, not all kernel designs are prepared to do any kind of dynamic
allocation at all: microkernels which hand all available physical memory to
userspace for management there cannot dynamically allocate (except, perhaps,
for an O(1) amount of allocation in a small, _static_ heap - but at that
point, what's the point?).

~~~
bascule
You are making the exact same strawman argument as the post I was responding
to. I gave a specific example of how lazy-static solves a specific problem
mentioned in the paper, and as far as I can see nothing you have said refutes
that.

Clearly YMMV as to whether or not you can support a kmalloc()-style API. If
you can, allocators will be great for you, and if not they are worthless.

I didn't mean to imply Rust has one-size-fits-all solutions to these problems.
Rust provides a lot of solutions to various problems, and it's up to you as
the Rust developer to figure out which ones are applicable to your problem at
hand.

~~~
belovedeagle
So what you're saying is, "if your use-case doesn't support kmalloc then shut
up and go away, no statics for you!"? That's ridiculous, inappropriate, and
generally unhelpful.

~~~
bascule
No, again that's a strawman. To explain it for the third time: I was
addressing a specific complaint made in the paper and providing a solution. So
far I have not seen any counterarguments against this specific solution to
this specific problem.

Later I pointed out allocators are a great solution if you're able to use
them. Again, if you can't, I'm afraid you're out of luck and I apologize this
solution doesn't fit your needs.

You can, of course, still make statics, you just won't be able to use Box to
conveniently allocate buffers in an automatic and type-safe way.

Also check out UnsafeCell if you want to assert as the developer on behalf of
your program that you're accessing data in a memory safe way.

------
EGreg
Property is theft!

