
A More Natural Stackless Coroutine in C That Maintains Local Variables - liuliu
https://liuliu.me/eyes/a-more-natural-stackless-coroutine-in-c-maintains-local-variables/
======
billziss
I have an implementation of nested coroutines for C in this file:

[https://github.com/billziss-gh/winfuse/blob/master/src/winfu...](https://github.com/billziss-gh/winfuse/blob/master/src/winfuse/coro.h)

This uses the macros coro_block to introduce a new coroutine block, coro_await
to invoke a nested coroutine, coro_yield to suspend the coroutine block and
coro_break to exit it.
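
For readers who have not seen the technique, here is a minimal sketch of a switch-based stackless coroutine in the Protothreads style. The names `CO_BEGIN`, `CO_YIELD`, `CO_END`, `struct co_state` and `counter_step` are hypothetical and are not the coro_* macros from coro.h; the sketch has no nesting or await. It only shows how a saved resume point plus "locals" kept in a struct let a function continue where it left off.

```c
/* Hypothetical macros for illustration only; not the winfuse coro.h API. */
struct co_state { int line; };   /* saved resume point: 0 = start, -1 = finished */

#define CO_BEGIN(s)  switch ((s)->line) { case 0:
#define CO_YIELD(s)  do { (s)->line = __LINE__; return 1; case __LINE__:; } while (0)
#define CO_END(s)    } (s)->line = -1; return 0

struct counter {
    struct co_state co;
    int i;       /* loop variable; lives here so it survives across yields */
    int value;   /* last value produced */
};

/* Returns 1 while suspended at a yield, 0 once the coroutine has finished. */
static int counter_step(struct counter *c, int limit)
{
    CO_BEGIN(&c->co);
    for (c->i = 0; c->i < limit; c->i++) {
        c->value = c->i * c->i;
        CO_YIELD(&c->co);        /* suspend; the next call resumes right here */
    }
    CO_END(&c->co);
}
```

A caller zero-initializes a `struct counter` and calls `counter_step` repeatedly, reading `value` after each call that returns 1. The usual limits of this trick apply: no two `CO_YIELD`s on the same source line, and no `CO_YIELD` inside a nested `switch` in the coroutine body.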

The implementation is fairly simple, at about 17 lines of code (omitting file
headers/comments). I use it extensively in the implementation of a Windows
kernel driver that implements the FUSE protocol (work in progress).

[https://github.com/billziss-gh/winfuse](https://github.com/billziss-gh/winfuse)

EDIT: For an example use see the implementation of READ:

[https://github.com/billziss-gh/winfuse/blob/d44307105b0f1a56...](https://github.com/billziss-gh/winfuse/blob/d44307105b0f1a5683a6cb1ef84d2f722a4f3211/src/winfuse/fuseop.c#L1244-L1301)

------
AnanasAttack
I can't be the only one who finds it cleaner to just use structs with state
and pointers to them than all this macro and typedef hackery.
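
To make the "structs with state" alternative concrete, here is a minimal sketch of an explicit state machine with no macros. The example (a reader for length-prefixed messages fed one byte at a time) and all names in it (`msg_reader`, `msg_reader_feed`, `WANT_LEN`, `WANT_BODY`) are hypothetical and do not come from the article or any library mentioned in this thread.

```c
#include <stddef.h>

enum msg_phase { WANT_LEN, WANT_BODY };

struct msg_reader {
    enum msg_phase phase;     /* explicit "where am I" instead of a hidden resume point */
    size_t need;              /* body bytes still expected */
    size_t len;               /* body bytes stored so far */
    unsigned char body[255];  /* a one-byte length prefix can never exceed 255 */
};

static void msg_reader_init(struct msg_reader *r)
{
    r->phase = WANT_LEN;
    r->need = 0;
    r->len = 0;
}

/* Feed one byte. Returns 1 when a complete message is in body[0..len), else 0. */
static int msg_reader_feed(struct msg_reader *r, unsigned char byte)
{
    switch (r->phase) {
    case WANT_LEN:
        r->need = byte;
        r->len = 0;
        if (r->need == 0)
            return 1;             /* an empty message is complete immediately */
        r->phase = WANT_BODY;
        return 0;
    case WANT_BODY:
        r->body[r->len++] = byte;
        if (--r->need == 0) {
            r->phase = WANT_LEN;  /* ready for the next message */
            return 1;
        }
        return 0;
    }
    return 0;
}
```

The trade-off is visible here: every suspension point becomes a named enum value and the control flow is spelled out by hand, which is explicit and debuggable but grows quickly once loops and nested calls are involved.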

~~~
inetknght
Structs with state and pointers to them? Might as well just use C with
classes.

------
cryptonector
The author should take a look at async.h[0], which is extremely simple. I
don't think you need a scheduler, but you do need what Rust calls an
"executor", which is a library of I/O functions and an event loop.

[0] [https://github.com/naasking/async.h](https://github.com/naasking/async.h)
[https://higherlogics.blogspot.com/2019/09/asynch-asynchronou...](https://higherlogics.blogspot.com/2019/09/asynch-asynchronous-stackless.html)

~~~
liuliu
My real use case involves transferring that event loop from an existing thread
to another spawned pthread; thus, having a "scheduler" / "executor" is easier.

I think async.h is very similar to Protothreads, both are very lightweight and
not as opinionated.

~~~
cryptonector
async.h does not preclude threads, as long as you ensure that no more than one
thread is running any given co-routine at any point in time.

async.h is indeed similar to protothreads, but simpler. I like its mechanism
for co-routines calling co-routines.

As for scheduling, there's a lot of history of M:N scheduling. M:N scheduling
hasn't worked out for, e.g., Solaris, Linux, Rust, and some others. Thread
libraries tend to be 1:1 nowadays. Erlang uses M:N and I see claims that it
works well enough, but I am not familiar enough with it to understand if
that's true or why.

M being the number of user-land threads and N being the number of threads
allocated in the OS kernel to run those user-land threads. M:N means having
them be different, with M>N and a user-land scheduler to choose which OS
thread runs which user-land thread (which looks a lot like a co-routine). It's
easy to end up with pathological conditions and leave performance on the table.
But I suppose a lot depends on just what exactly the workload looks like. An
I/O bound workload with no long CPU runs (or lots of yielding during them)
will probably work well enough with M:N scheduling, but 1:1 threading with as
many threads as CPUs should work even better.

~~~
mamcx
I think Erlang is successful where others are not because it fully commits to
the idea. It's not just "give me some way to do M:N"; everything is built around
actors. And actors are not just "some way to schedule stuff"; they are a full
paradigm.

~~~
cryptonector
Right, small actors == small (stack, code), preferably stackless coroutines
that hopefully don't do much CPU hogging and just lots of cooperative behavior
-- I'm ready to believe that M:N does well for that when the coroutines have
small stacks or are stackless, and that you don't even need a scheduler for
them if there's no yield operation as then you need only ever "schedule"
coroutines whose pending I/O events have occurred.
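
A minimal sketch of that idea, assuming POSIX `poll()` and a fixed small table of tasks: one pass of the loop resumes only the tasks whose descriptor is readable, so there is no scheduling policy beyond "whatever is ready". `struct task`, `task_resume`, `run_once` and `MAX_TASKS` are hypothetical names for illustration.

```c
#include <poll.h>
#include <unistd.h>

#define MAX_TASKS 8

struct task {
    int fd;              /* descriptor this task is waiting to read from */
    int resumed;         /* how many times the task has been resumed */
    unsigned char last;  /* last byte the task consumed */
};

/* Resume step. Only called when fd is readable, so read() should not block. */
static void task_resume(struct task *t)
{
    if (read(t->fd, &t->last, 1) == 1)
        t->resumed++;
}

/* One loop pass: wait up to timeout_ms, then resume only the ready tasks.
 * Returns the number of tasks resumed, or -1 on error. */
static int run_once(struct task *tasks, int n, int timeout_ms)
{
    struct pollfd pfds[MAX_TASKS];
    int i, ran = 0;

    if (n < 0 || n > MAX_TASKS)
        return -1;
    for (i = 0; i < n; i++) {
        pfds[i].fd = tasks[i].fd;
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }
    if (poll(pfds, (nfds_t)n, timeout_ms) < 0)
        return -1;
    for (i = 0; i < n; i++) {
        if (pfds[i].revents & POLLIN) {
            task_resume(&tasks[i]);
            ran++;
        }
    }
    return ran;
}
```

With two tasks waiting on the read ends of two pipes, writing a byte to one pipe and calling `run_once` resumes that task and leaves the other untouched.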

I.e., coroutines are just a C10K method, and you must end up with more of them
than you have OS threads and HW CPUs.

If, e.g., Bryan Cantrill and others who claim M:N is bad are wrong, they're
only wrong -I think- if they extend the claim to stackless / small stack
coroutines. But Bryan Cantrill's seminal paper on the badness of M:N threading
was not about stackless coroutines, but about very stackful coroutines
(pthreads).

M:N is necessarily bad if the M things have large stacks (which was and is the
case in, e.g., pthreads).

M:N is necessarily good (C10K) if the M things are extremely light-weight.

Everything we see in this space, from Scheme-style continuations, partial
continuations, to hand-coded CPS, to stackless co-routines, async/await
primitives that allow compilers to do partial CPS conversion / coroutines --
all these things are about the C10K problem, which is about a) using async I/O,
and b) compressing program state / reducing overhead to serve the most
possible clients.

As a program state compression technique, nothing beats hand-coded CPS, but
it's utterly not user-friendly. Scheme-style continuations mostly shift
program state from the stack to the heap. The sweet spot is async/await.
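
A minimal sketch of what hand-coded CPS looks like in C, under stated assumptions: the "async" operation is faked with a single pending slot that a driver completes later, and every name here (`cont_fn`, `async_get`, `complete_one`, `struct request`, the `step_*` functions) is hypothetical. The point is that the only state kept between steps is what the programmer chose to put in `struct request`, which is why it compresses so well and also why it is tedious to write.

```c
typedef void (*cont_fn)(void *ctx, int result);   /* a continuation: "what to do next" */

/* Fake async layer: one pending operation, completed later by complete_one(). */
struct pending { cont_fn k; void *ctx; int value; int armed; };
static struct pending slot;

static void async_get(int value, cont_fn k, void *ctx)
{
    slot.k = k;
    slot.ctx = ctx;
    slot.value = value;
    slot.armed = 1;
}

/* Deliver the pending result to its continuation. Returns 1 if one ran. */
static int complete_one(void)
{
    struct pending p;
    if (!slot.armed)
        return 0;
    p = slot;
    slot.armed = 0;          /* clear first: the continuation may arm a new one */
    p.k(p.ctx, p.value);
    return 1;
}

/* All state that survives between steps; nothing lives on a stack. */
struct request { int total; int done; };

static void step_finish(void *ctx, int value)
{
    struct request *r = ctx;
    r->total += value;
    r->done = 1;
}

static void step_second(void *ctx, int value)
{
    struct request *r = ctx;
    r->total = value;
    async_get(20, step_finish, r);   /* chain the next step explicitly */
}

static void request_start(struct request *r)
{
    r->total = 0;
    r->done = 0;
    async_get(1, step_second, r);
}
```

What would be three sequential lines in a coroutine or async/await version is split across three functions here, each naming its successor by hand.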

------
t-writescode
I understand the excitement of proving you _can_ do something.

Wouldn’t a language suited to async, or a framework implementing that from the
ground up, complete with all its natural ugliness, be ... better than trying
to shoe-horn async into such a low language?

~~~
liuliu
It totally would! I think that is where C++20's coroutine proposal (or Rust)
makes a lot of sense (and all these negative abstraction costs!). It was a
lot of fun to work on, though, and the end result is not bad (I translated
my old stateful coroutine-based code to this one in an afternoon:
[https://github.com/liuliu/ccv/commit/03c84ee1e3344b8458d8502...](https://github.com/liuliu/ccv/commit/03c84ee1e3344b8458d8502010b3ea73417ede75#diff-0576585e7e3ffa72c74264debbc013ad))

