
C++ Coroutine Theory (2017) - squiguy7
https://lewissbaker.github.io/2017/09/25/coroutine-theory
======
pokoleo
Waterloo's CS 343 (concurrency) course uses coroutines as a stepping stone
towards understanding concurrent programming.

The course notes are fantastic; coroutines start here:
https://www.student.cs.uwaterloo.ca/~cs343/documents/notes.pdf#page=33

~~~
3uclid
Would you recommend CS 343?

I'm disappointed the course teaches μC++ rather than referencing C++11
threading support.

~~~
legojoey17
It has been one of my favourite courses because it really dives into the
fundamentals and implementation of how concurrency models work, and it is very
learnable while still being applicable.

The primitives in uC++ make it very easy to build the concurrency models that
are present in other languages. These days the prof does exactly that and
shows how to implement common models such as channels, actors, and a few
others.

Everything covered maps well onto how other languages provide concurrency.

------
psyc
Do C++ coroutines give you a sane call stack when chained? I’ve only worked
with proprietary implementations that do not, and it is an utter catastrophe
for maintenance once they have infested a large code base.

~~~
wahern
There's something truly perverse about the way languages are recapitulating
the evolution of call stacks in the form of coroutines[1] and promises.

Erlang, Go, Lua, and Scheme get this right. Stackless Python never caught on
:( But Java may get proper stackful coroutines in the near future.

[1] Specifically stackless coroutines, which means you can't yield across
nested function invocations. The unfortunately named Stackless Python actually
implements stackful coroutines. "Stackless" in Stackless Python refers to not
implementing the Python call stack on the C ABI stack. Stackful coroutines are
basically threads (sometimes called microthreads or fibers to distinguish them
from the C ABI/kernel thread construct) with a language-level construct for
directly passing control to another thread.
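
A minimal Python sketch of that restriction, using generators as a stand-in for stackless coroutines. All function names here are made up for illustration. A nested call can only suspend the consumer if the helper is itself a generator and every intermediate frame delegates to it explicitly:

```python
def helper_gen(item):
    # To suspend from inside a nested call, the helper must itself be a
    # generator (its own kind of function).
    yield item
    yield item * 2


def producer(items):
    for item in items:
        # Every frame between the helper and the consumer has to delegate
        # explicitly with `yield from`.
        yield from helper_gen(item)


def forgot_delegation(items):
    for item in items:
        # This only creates a generator object and discards it; nothing
        # reaches the consumer from the nested call.
        helper_gen(item)
    yield "done"


print(list(producer([1, 2])))
print(list(forgot_delegation([1, 2])))
```

With a stackful coroutine, the helper could be an ordinary function and suspend the whole chain of callers without any of them being annotated.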

~~~
Rusky
There's nothing about stackless coroutines that means you can't have good
stack traces. For example, C# already does it, at least to some degree.

Stackful coroutines are clearly a viable tool, but they don't work in all use
cases. They require either segmented stacks, a precise GC, or memory usage
comparable to kernel threads. They are tricky to implement correctly on
Windows. Etc.

In my ideal world, we'd have stackless coroutines with great debugger support
everywhere, with languages free to experiment with the syntax: explicit
suspension points, implicit suspension points, effect polymorphism to make it
look like you're yielding across nested function calls, etc.

~~~
wahern

> They require either segmented stacks, a precise GC

Which is _exactly_ what stackless coroutines and promises do in a very
roundabout manner. Some languages are moving toward annotations to automate
chaining, but the problem with having to explicitly annotate coroutines is
that you no longer have (or can have) first-class functions; at best you now
have multiple classes of functions that only interoperate seamlessly with
their own kind, which is the opposite of first-class functions. Plus it's much
slower than just using a traditional call stack.

Implementing transparently growable or moveable stacks can be difficult, yes.
Solving C ABI FFI issues is a headache. And languages that stack-allocate
variables but cannot easily move them are in a real pickle. Though, they can
do as Go does and only stack-allocate variables that don't have their address
taken, and in any event that only applies to C++ and Rust. There's no excuse
for all the other modern languages. Languages like Python and JavaScript don't
have stackful coroutines because of short-sighted implementation decisions
that are now too costly to revisit. Similarly, Perl 6 doesn't officially have
them because they prematurely optimized their semantics for targets like the
JVM where efficient implementations were thought to be difficult. (MoarVM
implements stackful coroutines to support gather/take, which shows that it was
simply easier to implement the more powerful construct in order to support the
less powerful gather/take construct.)

If it were easy we wouldn't have stackless coroutines at all, because they're
objectively inferior in every respect; absent external constraints (being
beholden to the C stack), stackful coroutines can result in less memory usage
and fewer wasted CPU cycles in both the common and edge cases. But both PUC
Lua and LuaJIT do it properly and are among the fastest interpreted and JIT'd
implementations, respectively, so I think the difficulty is exaggerated.

I understand why these other constructs exist, but I still think it's
perverse. At some point we should just revisit and revise the underlying
platform ABI so we can get to a place where implementing stackful coroutines
is easier for all languages.

For example, the very same debugging information you might add to improve
stack traces can be used by implementations to help, say, move objects. Make
that mandatory as part of the ABI and a lot of cool things become possible,
including easy reflection in compiled languages. DWARF is heavyweight and
complex, but Solaris (and now FreeBSD and OpenBSD) support something called
Compact C Type Format (CTF) for light-weight type descriptions, which shows
how system ABIs could usefully evolve.

Newer languages shouldn't be tying themselves to incidental semantics from 40
years ago. Rather, they should be doing what hardware and software engineers
were doing 40 years ago when they defined the primary semantics--properly
abstract call stacks/call state into a first-class construct (i.e. thread of
control), while simultaneously pushing the implementation details down so they
can be performant.

~~~
zwieback
This seems like a super-interesting discussion but I can't quite follow due to
the mixed-up naming. Is there a good paper or website that clarifies what
stackful/less means in the various contexts?

~~~
lboasso
A good start could be "Coroutines in Lua" [1]. It explains the terminology
well and compares designs in different languages.

[1] http://www.inf.puc-rio.br/~roberto/docs/corosblp.pdf

~~~
BeeOnRope
How do the terms in that paper map to the stackful/stackless distinction being
discussed above? For example neither term appears in the paper at all, except
for one mention of "Stackless Python" as a name.

~~~
lboasso
I should have linked the other paper by Ierusalimschy, "Revisiting
Coroutines". wahern linked it in his reply. They are both good reads to better
understand coroutines.

------
KeepFlying
Dumb question, but is this something that a dev would expect the compiler to
do for us automatically, depending on what is deemed most efficient, or is
this something that a developer can write directly into the code to make it
work this way?

I want to be sure I understand what is going on here.

Can someone offer an example of where this ability would be particularly
useful?

~~~
Twisol
Coroutines are quite nice for managing sessions with asynchronous events, like
stateful GUIs or client-server interactions. If you're familiar with the
tendency of callbacks in e.g. JavaScript to nest deeply, you can think of
coroutines as a way to recover a flat, procedural style.
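
A rough Python sketch of that contrast. The `Loop` class and its `fetch` operation are hypothetical stand-ins for an asynchronous API, not any real library. The same two-step session is written once with nested callbacks and once as a generator-based coroutine:

```python
from collections import deque


class Loop:
    """Toy event loop: 'async' operations complete on a later turn."""

    def __init__(self):
        self.pending = deque()

    def fetch(self, key, callback):
        # Hypothetical async operation: delivers key.upper() later.
        self.pending.append(lambda: callback(key.upper()))

    def run(self):
        while self.pending:
            self.pending.popleft()()


def session_callbacks(loop, log):
    # Callback style: each step nests inside the previous one.
    def on_user(user):
        def on_profile(profile):
            log.append((user, profile))
        loop.fetch("profile", on_profile)
    loop.fetch("user", on_user)


def session_coroutine(log):
    # Coroutine style: the same steps read top to bottom.
    user = yield "user"        # suspend until the loop sends the result
    profile = yield "profile"
    log.append((user, profile))


def drive(loop, coro):
    """Resume coro each time the operation it asked for completes."""
    def step(value=None):
        try:
            key = coro.send(value)
        except StopIteration:
            return
        loop.fetch(key, step)
    step()


loop = Loop()
cb_log, co_log = [], []
session_callbacks(loop, cb_log)
drive(loop, session_coroutine(co_log))
loop.run()
```

Both versions record the same result; only the coroutine version keeps its steps in one flat function body.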

I'm a particular fan of how coroutines work in Lua. Here's an article that
helps explain them a bit in that context:
http://leafo.net/posts/itchio-and-coroutines.html

~~~
vvanders
Yeah, they're awesome for doing sequences of events that can have temporal
gaps in the middle.

We used to use them (in Lua) in games to do scripted sequences and AI. It was
simple enough that even our designers could edit/extend them.
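
A small Python sketch of what such a scripted sequence might look like. The original was in Lua; the `cutscene` script and `run_script` driver below are invented for illustration. The script yields the number of ticks it wants to pause, and the game loop resumes it when that time has passed:

```python
def cutscene(say):
    # Reads like a linear script even though it spans several ticks.
    say("guard walks to the door")
    yield 2                      # pause for two ticks
    say("guard opens the door")
    yield 1                      # pause for one tick
    say("guard leaves")


def run_script(script, max_ticks=10):
    """Step the script once per tick, honouring the delays it yields."""
    timeline = []
    tick = 0
    # The lambda reads `tick` at call time, so each line is stamped with
    # the tick on which it actually ran.
    coro = script(lambda text: timeline.append((tick, text)))
    resume_at = 0
    while tick < max_ticks:
        if tick >= resume_at:
            try:
                resume_at = tick + next(coro)
            except StopIteration:
                break
        tick += 1
    return timeline


timeline = run_script(cutscene)
```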

------
RajuVarghese
Modula-2, one of the early languages with coroutines, had a pretty simple
implementation. NEWCOROUTINE created a new coroutine (including the heap
memory that would function as a workspace for that coroutine), TRANSFER
transferred control from one coroutine to another, and IOTRANSFER did the same
but for interrupts. With these one could design a scheduler and off you went!

I built a coroutine system for a Pascal environment by implementing
NEWCOROUTINE and TRANSFER. Both turned out to be pretty simple in assembly
language. The workspace contained an area for the CPU registers and the stack.
So TRANSFER involved saving the registers of one coroutine in its workspace
and restoring the registers of the second from its own.
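
A loose Python analogy of that control flow, not the Modula-2 or assembly implementation. Python generators are asymmetric, so TRANSFER is emulated here with a small trampoline, and the interpreter does the state saving that the workspace did. The names `new_coroutine`, `run`, `ping`, and `pong` are made up for the sketch:

```python
def new_coroutine(body, *args):
    # Stand-in for NEWCOROUTINE: the generator object plays the role of
    # the workspace holding the coroutine's saved state.
    return body(*args)


def run(coroutines, start):
    """Trampoline emulating TRANSFER: a coroutine yields the name of the
    coroutine that should be resumed next."""
    current = start
    while current is not None:
        try:
            current = next(coroutines[current])
        except StopIteration:
            current = None       # a coroutine finished; stop the system


def ping(log, rounds):
    for i in range(rounds):
        log.append(f"ping {i}")
        yield "pong"             # roughly TRANSFER(ping, pong)


def pong(log):
    while True:
        log.append("pong")
        yield "ping"             # roughly TRANSFER(pong, ping)


log = []
coroutines = {
    "ping": new_coroutine(ping, log, 2),
    "pong": new_coroutine(pong, log),
}
run(coroutines, "ping")
```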

------
steveklabnik
The visualization here is excellent; exactly the kind of thing I've been
meaning to look up.

------
eptcyka
Seems like this would be more like tokio for Rust rather than Go's goroutines.

~~~
steveklabnik
Yes, and we're exploring a similar thing in Rust right now, though we call
them "generators."

A key question right now is, can we have an implementation where the heap
allocations that occur here don't have to? It's not 100% clear.

~~~
GolDDranks
It's interesting that people refer to the same thing as "generators",
"semicoroutines" or "stackless coroutines". (A resumable computation that
doesn't have a growable stack)

On the other hand, "stackful coroutines" and "coroutines" without a qualifier
often refer to things with dynamically growing stacks.

It seems to me that C++ Coroutines refers to the former, which likely
complicates the terminology further.

~~~
tpush
They are all specific restrictions on and/or implementation details of
delimited continuations anyway :).

------
jnordwick
Does anybody know if there has been some research and experimentation with
inlining? It is probably the most important part of an optimizing compiler,
and coroutines would seem to make that very difficult or impossible.

~~~
lucozade
I'd recommend looking at the work the LLVM team have done on this. See [0],
for example; there is also a YouTube video from one of the LLVM conferences,
if I recall correctly.

In a nutshell, they are still researching this, but their general approach is
to split the coroutine up and devirtualise/inline the parts where they can.

[0] https://llvm.org/devmtg/2016-11/Slides/Nishanov-LLVMCoroutines.pdf
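
To illustrate the general idea of splitting a coroutine up, here is a hand-written Python sketch. It is not what LLVM actually emits, and `CountdownFrame` is a made-up name. The coroutine's locals move into a frame object and the body becomes an ordinary `resume` function, which is the kind of plain code an optimizer can then inline:

```python
def countdown(n):
    # Coroutine form: the suspension point is the yield.
    while n > 0:
        yield n
        n -= 1


class CountdownFrame:
    """Hand-split form of countdown: locals live in the frame object and
    resume() is an ordinary function that dispatches on the saved state."""

    def __init__(self, n):
        self.n = n
        self.state = 0   # 0 = not started, 1 = suspended at the yield, 2 = done

    def resume(self):
        if self.state == 1:
            self.n -= 1              # the code that follows the yield
        if self.state != 2 and self.n > 0:
            self.state = 1           # suspend at the yield again
            return self.n
        self.state = 2
        raise StopIteration


def drain(frame):
    """Resume the frame until it finishes, collecting what it produces."""
    out = []
    while True:
        try:
            out.append(frame.resume())
        except StopIteration:
            return out
```

Draining `CountdownFrame(3)` produces the same values as `list(countdown(3))`.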

