Hacker News new | past | comments | ask | show | jobs | submit login
Protothreads: Lightweight Stackless Threads in C (dunkels.com)
163 points by signa11 on May 16, 2019 | hide | past | favorite | 42 comments

If "running multiple tasks on an embedded system that share a stack" sounds interesting to you, Rust's RTFM is actually quite clever[0]. It only works on ARM MCUs right now, but tasks are IRQ triggered, have a priority, memory safe, deadlock-free and have very low overhead. Scheduling is done in hardware via the interrupt controller.

[0] https://github.com/japaric/cortex-m-rtfm

I wrote an entire I2C stack for an 8-bit microcontrolller using this protothread library.

I would highly recommend using an RTOS instead. ;)

any chance that code is up somewhere? Sounds interesting

My guess is following: some people will put in doubt the rationale of using threads for embedded development as such.

See, MCUs are much more interrupt driven than today's PCs, when it comes to I/O, and don't have to run interface logic as it's usually done in hardware.

You can make a widget feel quite snappy and interactive even with very simple linear code.

For the few tasks genuinely requiring real parallelism, or programming fancy multicore MCUs, it will likely be that you will need more than just thread separation, and it is better to do it properly than use hacks — in other world, use process separation and let complexities be handled by the OS.

> using threads for embedded development

Reminds me of something -

I used to work for an embedded shop where they used exception-like error handling that was implemented through some basic C macro fuckery. It made the code markedly more succinct, had next to zero overhead and it was just a fantastic piece of work in general. 20 years later and I still remember it fondly.

You can think of it as the reactor pattern. You only have one physical CPU so all you are doing is churning through callback-based event handlers.

The callbacks in this case might be the literal interrupt handlers (in which case your dispatcher "loop" is actually the hardware). Or else you make your interrupt write a little bit of state that then gets dispatched by your actual event loop.

Either way, you write responsive software by doing everything in short-running callbacks.

I find it amusing that (some) sophisticated GUI and server framework for big computers re-invent this.

One important difference is that in the MCU world interrupt handlers need to be really short and fast, because often various classes of interrupts are blocked from occurring while an interrupt handler is running. A common pattern is for the interrupt handler to just set some bits to say “this occurred” and return. The bits are then checked and handled in the main loop.

> doing everything in short-running callbacks

We call them "interrupt handlers." :)

It's amusing how you and alexhutcheson both corrected me in opposite directions.

For reasons Alex explains, interrupt handlers have to be really minimal. So you often use them to set state that makes the main loop branch off into the "real" handler.

But even then, you often don't want to wait a long time in that handler either. That too becomes a short-running event handler (though not necessarily a callback). Just not as extremely short running as it would have to be if did it in the interrupt handler.

> My guess is following: some people will put in doubt the rationale of using threads for embedded development as such.

I'm reading through the comments here and it is difficult to even address some of them, because there is apparent confusion as to the meaning of the word "thread".

In embedded you often do not need or even want threads. What you need is a way to deal with events (mostly interrupts) without losing your sanity. A reasonable approach is to have a state machine that reacts to events, and all interrupt handlers only inject events into the state machine. The state machine is the only "thread" that runs: the CPU runs it until its done processing the current event, and then goes to sleep. Interrupt handlers do as little as possible, basically just drop events into a queue (or even a single slot) for the state machine to process.

This works fine for simple state machines, but quickly gets tedious and complex, especially if you have tasks that involve several sequential events (initialize peripherals, setup ADC, read from ADC, calculate parameters, setup an output device based on parameters computed from ADC reads).

A very good way to hide the state machine is to use CSP (you might have seen it in the Go language or in Clojure as core.async). This lets you write the same thing using certain primitives (channels, blocking and non-blocking reads and writes). Note that until this point this has nothing to do with threads! Clojure's core.async works just as well in Clojurescript, which compiles down to JavaScript and runs in a single thread!

I never cleaned it up for release, but if you email me I could send through some interesting bits of it.

I2C and the peripheral we were dealing with is quite stateful, and writing the access code in a synchronous fashion wasn't possible due to overrunning other constraints. It seemed like a good idea at the time. The sibling comment has it pretty right... it would probably have been quicker in the end to go to an RTOS.

That said, I'm still fairly proud of the I2C state machine that runs in the interrupt handler– it was completely separate from the protothread code that layered over the top.

It's in a few microcontrollers in products we do, the funkiest is the smallest standard AVR package you can get, which imitated an obsolete tiny I/O expander chip we had to replace on a 6mm wide PCB. That was using I2C slave mode, but we could then run the entire sensor interfacing on that board rather than using a separate micro.

These seem to work with some clever (ab)use of switch statements: http://dunkels.com/adam/pt/expansion.html

That was the original implementation (and still used for non-GNU C).

On GCC and Clang though, labels-as-values allows a cleaner implementation: https://github.com/Akagi201/protothreads/blob/master/include...

Woo. I wrote the original implementation for that 14 years ago. I discovered local labels when we tried to write some code that used the had multiple yields in the same line. Due to how the switch based implementation worked, this wasn't possible, so I looked for other solutions.

Wait, how does this work with local variables? Wouldn't the scope be destroyed when you returned, how can it restore the stack if it doesn't save information in the local continuation?

Indeed, there is a problem with local variables. The example relies on using global variables. If you want multiple threads executing the same function, it is not going to work. Also there is no solution for calling other functions (and have them switch in the middle). A long time ago, I worked on an implementation that does not have these limitations. See: http://www.iwriteiam.nl/Ha_cmt.html

> The example relies on using global variables.

It could as well rely on local variables with static lifetime, though.

> If you want multiple threads executing the same function, it is not going to work.

That's true for the example, but in general the solution is to pass the protothreads the thread-local state in a struct.

> > The example relies on using global variables.

> It could as well rely on local variables with static lifetime, though.

For those following along, static local variables are the same thing as static global variables[1], C just enforces a scoping restriction as to who can access it at compile time. Most compilers will emit the same BSS section entries for both (though they will mangle the symbol name). It's basically compiler syntactic sugar for most compilers (though I'm sure the actual standard itself allows for some weird exotic use cases where they get handled completely differently).

[1] Static global variables are symbols that are restricted to a compilation unit (as opposed to regular global variables that can be linked to from another compilation unit). If you run `nm` on them, they'll all have a lower case letter indicating it's type.

If I understand it correctly, it doesn't. That's why they're called stackless threads. Variables local to the stack frame lose their state, so any internal state needs to be stored in a global.

The article mentions Duff's Device. And yes it works.

IIRC when Simon Tatham first proposed the mechanism, he suggested it should never be used in production, and whoever does so should be fired!

Quite the opposite, actually. He said that he will probably be fired, but should still insist on it being the right way: "Any coding standard which insists on syntactic clarity at the expense of algorithmic clarity should be rewritten. If your employer fires you for using this trick, tell them that repeatedly as the security staff drag you out of the building." https://www.chiark.greenend.org.uk/~sgtatham/coroutines.html

Excitingly, we soon may be able to have both (if you can compile your C code with a C++ compiler). Coroutines are in the working draft for the C++20 spec, so they may soon be provided by the runtime (and thus integrate into all of the other language features and syntax) instead of requiring hacking it up with string macros: https://en.cppreference.com/w/cpp/language/coroutines

If you want coroutines before upgrading to a C++20 toolchain, you can also try:

- boost::Fiber for C++: https://www.boost.org/doc/libs/1_70_0/libs/fiber/doc/html/in...

- boost::coroutine2 for C++: https://www.boost.org/doc/libs/1_70_0/libs/coroutine2/doc/ht... (lower level than boost::Fiber, and doesn't include an internal scheduler)

- libtask for C: https://swtch.com/libtask/

In the C++ context, this doc summarizes the distinction between "fibers" and "coroutines", although the terminology is not consistent across language ecosystems: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n402...

D also has fibers, and good C/C++ integration, so that could be an option too.


Adam is a CEO of Thingsquared =D

I hate protothreads with a passion.

Not because of what they do (it's a neat trick), but because it teaches bad development practices to programmers.

I had a junior programmer "correct" me on some embedded code saying that all my local variables should actually be declared static, because in their experience "static is the way to go".

This "embedded" system was actually running Linux, using proper threading semantics.

That's a matter of using the right tool for the environment, though, isn't it? Someone looking at the world from the small-system, bare-metal embedded development perspective might express similar feelings about garbage collection, for teaching junior programmers to be unacceptably sloppy about memory management.

I think it is simply to be expected that junior programmers will frequently pick the wrong tool for the job, for any job in any context, as they have not yet had time to master many tools and do not yet have enough experience to understand the edges of their ignorance. (Presumably we would start thinking of them as peers and calling them "senior programmers" at some point along this journey of growth...)

Inspired by Protothreads, I implemented a similar framework 5 years ago, well, not exactly a framework, just several lines of macros, which could be used in general purpose not just embedded system (I am not sure if Protothreads can be used in general purpose). I used it in product environment with libuv, here is the github link: https://github.com/huxingyi/c-block

Related: Go-style concurrency for C, http://libmill.org/

Although note that all libmill coroutines are mapped N:1 to OS threads, while goroutines are mapped M:N to OS threads. In other words, libmill won’t let you have run 2+ coroutines simultaneously on different OS cores, but Go will allow that.

In fact, this means that writing concurrent code is easier as you don’t have to worry about synchronizing access to shared state. It also means scaling applications is straightforward, since you are forced to design for horizontal scalability from the get go (similar to deploying node.js). Ive always found it strange that Go has both channels and locks as synchronization primitives.

Scaling out is natural when you have a large number of tasks, but it's not always an option when you have one big task. Maybe you need to sort a giant collection, or hash a giant input, or multiply a couple giant matrices. Threads with shared memory can get to work on those problems easily, but separate processes can't, at least not without trying to reorganize the problem to avoid doing tons of IO.

unless you start doing blocking read/write...

Libdill IO is nonblocking and calls returns a channel. Reading from the channel yields the coroutine, similar to the await statement in some languages.

oh cool ! does that happen for read/write to non-socket fd’s as well ?

Yeah, see the file descriptors section on this page http://libdill.org/documentation.html

i did before posting here, and could not find anything that might have indicated that i.e. non-socket-fd reads end up with coroutine like semantics...

is there an iOS port you'd know of - couldn't find one

How does it work?

The following page describes the implementation details: http://dunkels.com/adam/pt/expansion.html

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact