On Hubris and Humility (cliffle.com)
105 points by steveklabnik on Dec 7, 2021 | hide | past | favorite | 27 comments



I missed the launch and I didn't read the code. I've worked with RTOSes in the past and I feel the pain, but memory corruption and the like are not my primary concerns while writing code. Off the top of my head I might ask:

Interrupts: they are dispatched, so does the program have to go back to the "main" loop to handle an interrupt, or are they executed in the peripheral interrupt context? What is the "time-to-serve-an-interrupt" in Hubris (in real-time terms)? How is it guaranteed?

Also, how are shared interrupts handled and dispatched (interrupts like EXTI10_15)? Is there something that deals with (or prevents me from) requesting an interrupt for PA10 AND PB10?

Drivers: they are isolated; does this mean that my MCU with 4 UARTs needs 4 instances of the same code (one for each UART)?

Isolation: there are a lot of ways to make an embedded system misbehave other than corrupting memory. For example, what about a situation in which a task leaves (one, some, or all) interrupts disabled? Unless the OS won't let me fiddle with interrupts at all, which is also bad (think of a driver that goes into an "everybody, shut up! I'm reading a Dallas button" situation, with interrupts disabled).

Is there something in Hubris that protects me from starvation? Like an interrupt being fired constantly, making the system 100% busy? This is not a crash, and other tasks may not even notice the misbehavior, since they might not even be scheduled.

Also: the word "priority" is mentioned once in the article. What is the scheduling policy?

What resources is Hubris using? SysTick? PendSV?


You may enjoy reading through https://hubris.oxide.computer/reference/

To answer a few of your questions from the stuff I know offhand before I've had my coffee:

> Drivers: they are isolated; does this mean that my MCU with 4 UARTs needs 4 instances of the same code (one for each UART)?

You are free to organize this however you'd like; each approach has pros and cons, up to you. Tasks are isolated from each other, and the scope of a task is for you to decide.

> what about a situation in which a task leaves (one, some, all) interrupts disabled?

Users can't disable interrupts; they can only mask or unmask receiving notifications for said interrupts. This means one task cannot shut off another task's interrupts either.
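
The distinction can be illustrated with a tiny sketch (purely hypothetical, not Hubris's actual code or API): posting an interrupt's notification bit is separate from whether the task currently accepts it, so masking defers delivery without touching the hardware interrupt itself.

```rust
/// Hypothetical per-task notification state: a 32-bit pending set
/// plus a mask that controls which bits are currently deliverable.
struct Notifications {
    pending: u32,
    mask: u32,
}

impl Notifications {
    /// An interrupt posts its bit. Nothing is lost if the bit is
    /// masked: it stays pending until the task unmasks it.
    fn post(&mut self, bit: u32) {
        self.pending |= bit;
    }

    /// Only unmasked bits are actually visible to the task.
    fn deliverable(&self) -> u32 {
        self.pending & self.mask
    }

    fn unmask(&mut self, bits: u32) {
        self.mask |= bits;
    }

    fn mask_off(&mut self, bits: u32) {
        self.mask &= !bits;
    }
}
```

Note that nothing here reaches into NVIC state, which is the point: masking is a property of one task's notification set, not of the interrupt controller, so no task can starve another task of its interrupts.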

> Also: the word "priority" is mentioned once in the article. What is the scheduling policy?

Tasks have priorities; smaller numbers mean higher priority. Currently we support up to 256 levels.

At any time, the highest priority task that is runnable is being run. This means that within priority levels, scheduling feels cooperative, as we don't time-slice, but in between them, it's preemptive; if a higher priority task becomes unblocked, then it will run next.
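
That selection rule is simple enough to sketch in a few lines. This is an illustrative stand-in (the names and shapes are made up, not Hubris's real scheduler), but it captures the described policy: among runnable tasks, the smallest priority number wins, and ties go to whoever is already first, so there is no time-slicing within a level.

```rust
#[derive(Clone, Copy, PartialEq)]
enum TaskState {
    Runnable,
    Blocked,
}

/// Hypothetical task record: just a priority (smaller = higher)
/// and a run state.
struct Task {
    priority: u8,
    state: TaskState,
}

/// Pick the index of the next task to run: the runnable task with the
/// smallest priority value. `min_by_key` returns the first minimum, so
/// within a priority level the earlier task keeps winning until it
/// blocks -- the "feels cooperative" behavior described above.
fn next_task(tasks: &[Task]) -> Option<usize> {
    tasks
        .iter()
        .enumerate()
        .filter(|(_, t)| t.state == TaskState::Runnable)
        .min_by_key(|&(_, t)| t.priority)
        .map(|(i, _)| i)
}
```

Preemption then falls out of re-running this selection whenever a task's state changes: if a higher-priority task becomes runnable, it is immediately the new minimum.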

> What resources is Hubris using? SysTick? PendSV?

SVCall, SysTick, and PendSV are all used, yes. Mind you, these are Arm-specific things that wouldn't be used on other platforms, of course.


Thank you. Blocking IPC is a very interesting approach but you have to carefully plan ahead, even before writing the first lines of code.

I think the above article missed the "notifications" part. Without notifications, blocking IPC wouldn't make sense in an RTOS: a task would not be able to process an interrupt because it's blocked while "sending".

I guess notifications exist to give a UART driver/task a way to block waiting for interrupts. When the interrupt arrives, the driver deals with it and, instead of sending the data directly to a listener, it just "notifies" another task of incoming data (pretty much the "usual" way of doing things).

This means "UART" sits in its own task: you set the right priority for it, implement a circular buffer, etc. Complexity quickly escalates for, say, a "sensor fusion" driver, which has to manage 3 other drivers (accelerometer, gyro, and magnetometer), each one sitting on top of other drivers (SPI, I2C, etc.).
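
For what it's worth, the circular buffer such a UART task would keep between "interrupt fired" and "listener asked for bytes" is small and self-contained. A minimal sketch (illustrative only; Hubris drivers are not required to look like this, and real code would track overruns properly):

```rust
/// A fixed-capacity byte ring buffer, the kind a hypothetical UART
/// task might fill from its interrupt notification handler and drain
/// when a client task sends it a read request.
struct RingBuf {
    buf: [u8; 8],
    head: usize, // next byte to pop
    tail: usize, // next free slot
    len: usize,  // bytes currently stored
}

impl RingBuf {
    fn new() -> Self {
        RingBuf { buf: [0; 8], head: 0, tail: 0, len: 0 }
    }

    /// Returns false (overrun) if the buffer is full.
    fn push(&mut self, b: u8) -> bool {
        if self.len == self.buf.len() {
            return false;
        }
        self.buf[self.tail] = b;
        self.tail = (self.tail + 1) % self.buf.len();
        self.len += 1;
        true
    }

    fn pop(&mut self) -> Option<u8> {
        if self.len == 0 {
            return None;
        }
        let b = self.buf[self.head];
        self.head = (self.head + 1) % self.buf.len();
        self.len -= 1;
        Some(b)
    }
}
```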

But I guess that's a price to pay.

> Users can't disable interrupts, they can only mask or unmask receiving notifications for said interrupts.

Sorry. I understood that drivers are also tasks (or that they live in tasks?), so since they can interface with the hardware, I guessed setting PRIMASK=1 could be an option too.

Perhaps battery-powered systems are not your application target, but I would be interested to see how __wfi() works within the Hubris context.


As others have mentioned, QNX is very much a model here, with Hubris notifications looking a lot like QNX proxies. In my experience (with the disclosure that I worked at QNX back in the day), that model was -- contrary to your implication -- remarkably clean for modeling real-time systems. Indeed, it was so clean that I was genuinely confused why so many other systems had adopted a much more monolithic model, which so clearly yielded a much more complicated system. (In hindsight, QNX being proprietary very much limited its impact.)

I would strongly encourage you to check out the docs that Steve pointed you to, and of course, the source itself! And nothing beats actually throwing it on some hardware and messing around with it, which should help clarify the model -- and why we believe it to be a good fit for real-time applications.


I’m not implying it’s not clean. It seems very clean to me. I said that it may grow in complexity. Thank you both for the answers.


From what I understand, yes, drivers are duplicated because they are integrated into the tasks.


I've ordered the STM32F3 and am thinking of ordering the audio dev STM32F4 board to try out Humility.

I read all the docs, and I understand Humility as an OS: it's extremely simple (for an OS), and it is going to be fun to develop against. Or at least, I'm excited to give it a shot.

My initial ideas are audio controllers that parse sensor streams and convert that to OSC signals but I have no idea what implementing the network stack will look like as I need UDP from Rust on a microcontroller.

If anyone has experience and wants to share, drop me a line at @ben:matrix.graythwaite.ca. I'd love to chat about it.


Awesome -- if you want to play with the STM32F4, I recommend the STM32F411VE Discovery board[0], which seems to be in ample supply (no small feat!), works with Hubris out of the box, is pretty cheap, and has a built-in ST Link. Also, note that Humility is the debugger, but it's an entirely reasonable conflation, as much of playing with Hubris is in fact playing with Humility! We don't have too many drivers for the F4 (our SP is based on the H7), but it's a well-documented and understood part/board, so there should be plenty of fun to be had!

[0] https://www.st.com/en/evaluation-tools/32f411ediscovery.html


> no small feat

What a time to be sourcing for new hardware. We're rooting for you.

P.S. The Oxide design motif reminds me of 96-pin Euro/VME connectors.


Ha indeed, s/Humility/Hubris.

Thanks Brian for the link. Looking forward to watching where Open Firmware community / Oxide / Rust take embedded. Exciting stuff.


There's an open PR for ethernet support (albeit with a different board) that uses the smoltcp crate to provide a tcp/ip stack: https://github.com/oxidecomputer/hubris/pull/158


Oh awesome thanks, I'll take a look


Fascinating talk. The overall approach here strikes me as extremely influenced by QNX's design. Send/Receive/Reply as a messaging primitive is too often overlooked, and provides an incredibly powerful model that (as Cliff mentions) renders an enormous amount of scheduling complexity moot.

Anyone who's done the UWaterloo trains course will recognize these patterns immediately, and (IIRC) interrupt dispatching was done in a similar manner there as well.

Finally, the supervision patterns here strike me as being very similar to those within the BEAM, and remind me of the infamous quote from Robert Virding (http://erlang.org/pipermail/erlang-questions/2008-January/03...). Obviously a necessary reimplementation here, but humorous nonetheless.

Great project, great talk.


I couldn't stop thinking about this, so I went back and dug up an ancient (20yo+) copy of my OS implementation for the trains course (mat(t)OS). Sure enough, we did indeed dispatch the upper half of interrupts to processes, albeit via a dedicated blocking syscall rather than send/receive/reply semantics. Bottom half handlers (which were implemented behind task gates and had persistent stacks) just did the standard bottom half stuff: disabling interrupts & managing state so we could properly unroll after the interrupt had been handled.

What a throwback!

Interrupt code is at https://gist.github.com/mtrudel/c29fa60e5b2f3b6fdc46a9e3c65d.... I've been meaning for years to clean this stuff up and resurrect it. Maybe this is the kick in the ass I need to finally do so!


QNX is an explicit influence, yes.




What program was used to create the slides?


In my understanding, Cliff couldn't find anything that fit the purpose well, so he wrote his own, just for this.


I came to ask the same question after watching the video the other day; the slides were lovely.


This looks like a really neat design for many microcontroller level embedded systems.

However, it mentions that one of the use cases is the Oxide BMC. AFAICT, a "modern" BMC is typically a more or less full-featured OS running quite a lot of code: a TCP/IP (v4 & v6) stack, IPMI and Redfish implementations (which in turn require a web server with a TLS library, etc.), and so on. Without the capability to reuse the existing open source user-space code in these areas, this seems like quite a tall order to write and maintain. And how does the "static tasks" system, without the capability to launch tasks dynamically, handle things like multiple concurrent IPMI/Redfish users?


The whole point of their redesign is to NOT have all that stuff in the BMC. So an explicit design goal seems to be to not need Redfish.

The idea is to have a minimal BMC that only does very few things and instead puts as many things as possible to the host OS.


Since I work on this project, to chime in: yes, the siblings are correct. We call it a "service processor" and not a "bmc" for a reason; the SP fits into the server in a way similar to a BMC, but we are not doing all of that. We have an opportunity to re-think these things, and we think it's worth it. As you've noted, the surface area of a traditional BMC is huge.


Oxide's Service Processor isn't a PC-style BMC, although it fulfills the same role in managing the hardware. So Hubris doesn't need to support e.g. Redfish.


A bit late to the party, but you have written excellent documentation at https://hubris.oxide.computer/reference/. It immediately answered the questions I had after reading 'Animats' several comments on IPC. This is particularly impressive since "right now we are laser focused on shipping Oxide's product": I, at least, often don't prioritize documentation.


Note: This is not a philosophical article, but rather a transcript of a conference talk concerning a kernel written in Rust (https://github.com/oxidecomputer/hubris) and an associated debugger.


>The conference version of the talk has a constantly animated background that makes the video hard for some people to watch.

A completely unrelated anecdote.

When I clicked the link, I was greeted with a beautiful black and white and blue page, perfectly indented.

I scrolled and looked at the ascii and thought how great it was to just see a web page again.

Then I looked at the source to see if they had designed any extra special formatting and was greeted with <style id="brave_speedreader_style"> and realized I had Brave Speedreader on. I was a little disappointed when I turned it off and the hyperlinks turned pink.



