My tutorial and take on C++20 coroutines (stanford.edu)
187 points by erwan 11 days ago | 136 comments

Some notes:

- C++20 coroutines are stackless, meaning that multiple coroutines will share a single OS thread stack. This is non-obvious when you first look into them because coroutines look just like functions. The compiler does all the work of ensuring your local variables are captured and allocated as part of the coroutine yield context, but these yield contexts are not stack frames. Every coroutine you invoke from a coroutine has its own context that can outlive the caller.

- C++20 coroutine functions are just ordinary functions, and can be called from C code. In fact, coroutine contexts can be cast to and from void* to enable their use from C.

- There's currently no generic utility in the C++20 standard library for using C++20 coroutines to defer work. std::future<> only works on threaded code, and you can't really use std::function<>. See Lewis Baker's 'task' class for a usable mechanism https://github.com/lewissbaker/cppcoro#taskt

To clarify what stackless means when you're writing code:

* calling into the coroutine from the caller uses the caller's stack (i.e. it just pushes on another stack frame).

* the lack of a coroutine stack ("stackful" coroutine) means that the coroutine can only "yield" a value from a single method; it cannot call a function F, and then have F yield back to the original coroutine's caller.

* In other words: you can think of it like a "coroutine with one stack frame for coroutine yielding semantics"

The compiler does some magic to slice up the coroutine into segments (between co_ statements) and stores state that must persist between these segments in a coroutine frame (in addition to the "slice" that the coroutine should continue at once it is called again).
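
To illustrate the "slicing" idea, here is a hand-written analogue of what a compiler might conceptually generate for a coroutine that yields 0 to n-1. This is only a sketch: `CountFrame` and `collect_count` are hypothetical names, and real compilers are free to lay out the frame and resume points differently.

```cpp
#include <optional>
#include <vector>

// Hand-written analogue of a coroutine whose body is:
//   for (int i = 0; i < n; ++i) co_yield i;
struct CountFrame {
  // State that must persist between segments lives in the "frame".
  int n;
  int i = 0;
  // Which segment to continue at on the next resume.
  enum class State { Start, AfterYield, Done } state = State::Start;

  explicit CountFrame(int n_) : n(n_) {}

  // Runs one segment; returns the yielded value, or nullopt when finished.
  std::optional<int> resume() {
    switch (state) {
      case State::Start:
        i = 0;
        break;
      case State::AfterYield:
        ++i;  // the loop increment that follows the co_yield
        break;
      case State::Done:
        return std::nullopt;
    }
    if (i < n) {
      state = State::AfterYield;
      return i;
    }
    state = State::Done;
    return std::nullopt;
  }
};

// Drives the state machine to completion and collects what it yields.
std::vector<int> collect_count(int n) {
  CountFrame frame(n);
  std::vector<int> out;
  while (auto v = frame.resume()) out.push_back(*v);
  return out;
}
```

Note that `resume()` is an ordinary function call on the caller's stack; only `n`, `i`, and `state` survive between calls.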

The real tricky part is the lack of library support. From what I've seen, it seems like a naming convention is all that defines what various coroutine functions are, e.g. `promise` or `promise_type`. This is very much like iterators, which can be written via structural subtyping: anything that has the special static type values and/or methods can be used as one.

You are right that you are effectively stuck in a single coroutine (non-stack) frame. But you can chain multiple such coroutine frames, because one coroutine can co_await another.

Thanks for this great article! I've been circling my way around understanding coroutines and this really put it in place.

I think this co_awaiting is the most confusing part for folks: in most languages with stackful coroutines, it makes sense how one can

1. call "co_await" on a coroutine at a high-level API boundary

2. have that coroutine execute normal code all the way down to some expensive part (e.g. I/O)

3. at that particular point "deep" down in the expensive parts, that single point can just call "yield" and give control all the way back to the user thread that co_awaited at 1, usually with some special capability in the runtime.

I believe the way you can do this in C++20 coroutines is to co_yield a promise all the way back to the originating "co_await" site, but I may be confused about this still...

It's totally clear to me why they didn't choose this for C++: keeping track of some heap-allocated stack frames could prove unwieldy.

I wish more effort went into explaining and promoting coroutines. Right now the advice seems to be "be brave and DIY, or just use cppcoro until we figure this out".

Also, if it’s not clear, within a coroutine, you can call any function (or coroutine) you want. It’s just that to co_yield, you have to be in the coroutine not deep in the stack.

Isn't it like that in most languages? I'm thinking about Python, C#, JS. If you call a blocking function from an async function, you cannot yield from deep inside the blocking function.

Why is this a big deal in C++? Am I missing anything that makes c++ coroutines less powerful than other mainstream solutions? Or are people comparing its power with e.g. Lisp or go?

It's a big deal because, while it has some downsides, being stackless means they can have next to no overhead, meaning it can be performant to use coroutines to write asynchronous code for even very fast operations. The example given https://www.youtube.com/watch?v=j9tlJAqMV7U&t=13m30s is that you can launch multiple coroutines to issue prefetch instructions and process the fetched data, so you can have clean code that issues multiple prefetches and processes the results. Whereas in Python (don't get me wrong, I love Python) you might use a generator to "asynchronize" slow operations like requesting and processing data from remote servers, C++ coroutines can be fast enough to asynchronously process "slow" operations like requesting data from main memory.

Wow, that talk is a fantastic link. He actually gets negative overhead from using coroutines, because the compiler has more freedom to optimize when humans don't prematurely break the logic into multiple functions.

All those languages also have stackless coroutines. Notably Lua and Go have stackful coroutines.

It is sort of a big deal because the discussion of whether to add stackful or stackless coroutines to C++ was an interminable and very visible one. Stackless won.

I'm trying to wrap my head around what the implication of stackless coroutines is.

Can I use them like `yield` in Python+Twisted, i.e. to implement single-threaded, cooperative parallelism? I would not expect to be able to plainly call a function with a regular call, and have that function `yield` for me - but can I await a function, which awaits another, and so on?

As far as I understand, C++20 coroutines are just a basis and you can build a library on top of it. It is possible to build single-threaded async code (python or JS style) as well as multi-threaded async code (C# style where a coroutine can end up on a different context), right?

Is there anything like an event loop / Twisted's `reactor` already available yet?

I'm really looking forward to rewriting some Qt callback hell code into async...

Stackless means when you resume a coroutine you're still using the same OS thread stack. Coroutine contexts/activation records are conceptually heap allocated (although in some cases that can be optimized away).

You can use coroutines for what you say, but there are no execution contexts (like a thread pool or an event loop) in C++20 standard library in which to execute coroutines, so to do networking and async I/O you need to use a library or DIY. Hopefully standard execution will come later as part of the Executors proposal.

You can currently use C++ native coroutines with the ASIO library, but this is probably subject to quite a bit of API churn in the future:


You can also wrap things like ASIO yourself. I did this in 2018 when I was learning about C++ coroutines to create a simple telnet based chatroom:


Note that this code is likely garbage by today's standards. One thing I can't remember is why I needed to use the cppcoro library in the end.

> Stackless means when you resume a coroutine you're still using the same OS thread stack

This is confusing, because it begs the question "same as what?" In fact, you can migrate a coroutine across threads, or even create a thread specifically for the purposes of resuming a coroutine.

But I suppose it is true that from the point at which you resume a coroutine to the point at which it suspends itself, it will use a particular stack for any function calls made within the coroutine. That's why you can't suspend a coroutine from within a function call, because the function call will use the stack.

C++, C#, Python coroutines are all stackless and pretty much equivalent. Lua, Go have stackful coroutines (i.e. one-shot continuations). There are libraries for stackful coroutines in C++ of course.

There is a proposal for executors that would add event loops to C++. In the meantime Boost.Asio (also available outside of Boost) is one of the best options (the standard proposal was originally based on Asio, and Asio is tracking the standard proposal closely).

> but can I await a function, which awaits another, and so on?

Whether a coroutine is stackful or stackless is largely an implementation detail that has some tradeoffs either way. In either case, coroutine semantics can allow you to write efficient asynchronous code imperatively or do your callback-to-async transformation.

It is not an implementation detail at all. The syntax and semantics are very different. Stackful coroutines allow suspending more than one activation frame (i.e. a subset of the call stack) in one go. You can of course "emulate" it with stackless coroutines by converting each frame in the call stack to a coroutine, but it is a manual and intrusive process.

I think if your syntax and semantics imply the stack-ness of your coroutine implementation then you have a language design problem. Coroutines are subroutines with two additional primitives (resume after call, and yield before return). Whether `yield`/`resume` imply stack swapping or a state transition between activation frames is just an implementation detail. Both have tradeoffs, to be sure.

> I think if your syntax and semantics imply the stack-ness of your coroutine implementation then you have a language design problem

A lot of languages, including C#, Python, C++, Rust and many others, specify the stackless-ness of coroutines. So it is not just an implementation detail. You could argue that specifying these semantics is a problem, and I would actually agree, but obviously it is not something many, if not most, language designers agree with.

> I'm really looking forward to rewrite some Qt callback hell code into async...

Last time I tried it was pretty easy to make C++ coroutines fit into Qt's event loop.

Are C++20 coroutines allowed to be recursive? Or does recursing require boxing?

For a stackless coroutine the compiler normally has to build a state machine to represent the possible execution contexts, but if the state machine includes itself then it has indeterminate size. Normally you solve this by boxing the state machine at yield points and using dynamic dispatch to call or resume a pending coroutine - which may be significantly slower than using a stackful coroutine to begin with (in which case, the stack is allocated on the heap up front, leading to a higher price to spawn the coroutine, but lower to resume or yield).

I'm curious. Intuitively, a non-recursive coroutine would yield a state machine that only goes "forward". If you add tail recursion into the mix, you could have cycles in the state machines (going back to the begin state), correct? Of course non-tail recursion would not work within a single frame.

Yes, a tail recursive coroutine could reuse its previous frame context across yields.

With a state machine transform the `resume()` method on a coroutine is a state transformation, it doesn't necessarily know what is "forward" or "backward" in the control flow graph. There are some tricky bits though, since tail recursive functions can have multiple exits but single entries. A recursive coroutine might have multiple exits and multiple entries, so it's not always clear what is "forward" and what is "backward."

The state is allowed to be heap-allocated but can be optimized onto the stack. But if it calls recursively without control flowing back out again, then I’d think the nested state could live on the stack so long as the compiler knows the nested state never has to be held across yield/resume.

There's none in the standard library, but e.g. Seastar already has coroutine support implemented for its future<>s and it works really well - the code looks clearer and in many cases it reduces the number of allocations to 1 (for the whole coroutine frame).

It seems really dumb that they are stackless. If you are saving/restoring the stack pointer anyway in your yield routine it's trivial to set it to a block of memory you allocated in the initial coroutine creation.

Is there no setjmp/longjmp happening? Are C++20 coroutines all just compiler sleight of hand similar to Duff's device with no real context switching?

> It seems really dumb that they are stackless

Why? C/C++ already has stackful coroutines. And that seems extremely wasteful unless you know you'll run the coroutines at the same time... With single-threaded stackful coroutines, you'd get two stacks and only ever use one at a time. That wastes megabytes, requires heavier context switching, makes cache misses more likely, etc.

Modern stackful coroutines don't (or shouldn't) use context switching (at least not anything like ucontext_t, just use threads if you're going to spend the cycles to preserve the signal mask) or setjmp/longjmp. Those tricks are slow and hazardous, especially when code needs to cross language boundaries or remain exception safe.

The compilation model is implementation defined, but yes, the expectation is that they compile down to a switch-like explicit state machine.

The advantage is that the worst case frame size is bounded, small and computable at compile time.

Stackful coroutines are of course more powerful and have, IMHO, better ergonomics.

> These yield contexts are not stack frames though. Every coroutine you invoke from a coroutine has its own context that can outlive the caller.

Well, they are obviously not stack frames because they do not follow a stack discipline, but they certainly are activation frames. I guess that's the point you were trying to make?


Part of this is that I’m tired, but it blows my mind how difficult C++ coroutines are as someone who considers themselves decent at C++ (although maybe I’m not) and uses coroutines in other languages. The amount of code needed to do almost nothing is extraordinary, and putting it all together doesn’t seem like you would often get on your first try. I get that new keywords basically can’t be added, but man, that’s painful.

C++ coroutines as they are in the standard are not really intended for the everyday programmer (1), but rather for library implementers who will use and abstract them into higher level features.

1: like most of C++ one would say

Ultimately, I think the simultaneous complexity and lack of built in functionality will limit how often people end up using this new part of the standard. I can use coroutines to write high performance, parallel code in other languages without being a mythical ~library implementor~. I usually even write libraries in those languages.

Agreed. I think the killer use is for event-driven programs that already do a lot of "stack ripping". If you are already manually putting all your "local" variables into an object and slicing your functions into many methods, then the grossness of C++20 coroutines will be easy to digest given how much more readable they can make your code.

Doing things right takes time. This is one step in making coroutines useful. Library authors now have something to work with to figure out what the next part of the C++ standard is. Expect standard library support in either C++23 or C++26.

I agree with GP, I feel a similar disdain when I hear that features that are merged into the standard are for the "library implementors". Boost.Coroutine and Boost.Asio have been around for how long? At least a decade? I think there has been more than enough experience with coroutines and async programming to get coroutines in standard C++ done so that at least the "average" C++ programmer can grok them.

The problem is that it took so long to standardize the internal language-level machinery that the committee ran out of time to standardize the user-level library bits. Instead of delaying coroutines again (they were originally scheduled to be part of C++17, but then they were taken out of the standard), they decided to merge what they had into the standard.

The problem is coroutines need compiler support before you can play with them. Boost coroutines were rejected because doing coroutines only as a library just isn't pretty (it works, but nobody liked it), so looking at other languages the current direction was decided. Only now that you can do something can libraries spend the decades to figure out how it all fits into C++.

Currently WinRT has the best support for C++ co-routines, Microsoft was after all the main driver of the proposal, and they map to the Windows concurrency runtime.

So on WinRT it is hardly any different from what C++/CX allowed for in concepts.

Not really, coroutines are supposed to be user-level constructs, i.e. most users should be able to write functions that yield or await. Unfortunately the machinery surrounding coroutines is very complex, but that can be hidden behind library functions.

This is a similar approach to what Kotlin did with its co-routine implementation. Aside from the suspend keyword, there are no language features that you work with directly. Instead everything is done via the kotlinx-coroutines library. Kotlinx means it is a multiplatform library with implementations for native, js, and jvm that do the appropriate things on each platform.

Basically how the library realizes this is opaque to the user and indeed it does things like reuse low level primitives available in each of the platforms. Basically, at its lowest level it is simply syntactic sugar for callback based implementations common in platforms that have things like Futures, Promises, etc.

In fact it's easy to integrate any third party library that provides similar constructs and pretend that they are suspend functions. E.g. the suspendCancellableCoroutine builder can take anything that has a success and fail callback and turn it into suspending code blocks. For example Spring supports coroutines and flows out of the box now and provides deep integration with its own reactive webflux framework. So, you can write a controller that is a suspend function that returns a flow and it just works as a nice reactive endpoint.

I actually would expect that these new C++ capabilities will also find their way into Kotlin's native coroutine implementation eventually. Kotlin native is still in Beta but seems popular for cross platform iOS and Android library development currently.

Kotlin's coroutine library took quite a while to get right for the Kotlin developers and evolved a lot between Kotlin 1.2 and 1.4. Parts of it are still labelled as experimental. Particularly, reactive flows are a relatively late addition and have already had a big impact on e.g. Android and server-side programming.

They are very complex, and they kept being patched up till the last minute as more corner cases were discovered.

I believe that the design is both needlessly complex and too inflexible, but this is the only design that managed to get enough traction to get into the standard and it took forever. There are competing designs and improvements, we will have to see if they will make it into the standard.

Part of this is that the standards committee decided to do a multi-phase rollout. They created the core language parts and will hopefully in a future version add the library support, making it easier to approach. (And then hopefully find that the design works with the library design ...)

Well put. I wish more features were rolled out as core language features before putting them in std. Making things language features makes them possible. Making them library features makes them vocabulary. I’m more eager for things to be possible than I am for them to be ergonomic.

Making something a core language feature means it has to be completely right. Doing something in a library leaves more of an option to fix it later or to create another version of the thing in a different header.

I also would like if std::variant were a language feature, not library :)

This is how I view iterators, but with an extra notch in difficulty. You really need to understand the coroutine model to get anywhere -- and the same is true of iterators. Are these leaky abstractions? I generally think of leaks as "pushing", but this is "pulling" -- I'd call 'em sucky abstractions. But I digress.

I just sat down with the tutorial and banged out what I consider to be a holy grail for quality of life. Alas, it's about 50 lines of ugly boilerplate. I'm omitting it, in part because it's untested, but mostly because I'm embarrassed to share that much C++ in a public forum. If anybody close to the C++ standards is listening... please make this boilerplate not suck so hard.

    #include <coroutine>
    #include <iostream>

    template<typename T>
    struct CoroutineIterator {
      // left for the reader!
      // hint: start with the ReturnObject5 struct in the article,
      //       and add standard iterator boilerplate
    };

    //! yields the integers 0 up to (but not including) `stop`, except for `skip`
    CoroutineIterator<int> skip_iter(int skip, int stop) {
      for(int i=0;i<stop;i++)
        if(i != skip)
          co_yield i;
    }

    int main() {
      for(auto &i : skip_iter(15, 20)) {
        std::cout << i << std::endl;
      }
    }

So, given sibling comments on stack and function calls, this:

    skip_iter(int skip, int stop)
      for(int i=0;i<stop;i++)

wouldn't work with complex types, boxed integers and so on? Because calling postfix ++ would be a function/method call?

Stackless coroutines can call other functions just fine (they would be completely unusable otherwise). What they can't do is delegate suspension to any called function, i.e. any called function must return before the coroutine can yield, as exactly one activation frame (the top level) can be suspended.

That's just a simple example. I think the following should work, possibly with modification for forwarding:

    class Stooge {...};

    CoroutineIterator<Stooge> three_stooges() {
       co_yield Stooge("Larry");
       co_yield Stooge("Curly");
       co_yield Stooge("Moe");
    }

This is all down to not having a "managed runtime", so it has to be shoehorned into the environment available where you have a stack, a heap, and that's basically it.

Coming from Rust, this just seems way overly complicated to me. Rust's async doesn't even need a heap to work: https://lights0123.com/blog/2020/07/25/async-await-for-avr-w...

Requiring heap allocation for coroutines was extremely contentious to say the least. But because C++ does not have a borrow checker, all the other allocation-less designs proved to be very error prone.

C++ coroutines do support allocators [1], so it is not a huge issue, but it does complicate the design further.

[1] the compiler is also allowed to elide the allocation, but a) it is not clear how this is different from the normal allocation elision allowance that compilers already had, and b) it is a fragile optimization.

Does that mean that every nested coroutine (async call) needs another heap allocation, or just the top level one?

What do you mean nested call? As a first approximation, you need a heap allocation for each coroutine function instance for its activation frame. Every time a coroutine instance is suspended, the previously allocated frame is reused. If you instantiate a coroutine from another coroutine, then yes you need to heap allocate again, unless the compiler can somehow merge the activation frames or you have a dedicated allocator.

Is that a given, though? Rust's generators are decent prior art - they generate a state machine that would only require heap allocation if the size of the state machine becomes unbounded (for example, a recursive generator). Otherwise the generator is perfectly capable of being stack allocated in its entirety. This turns out to be sufficient for a large amount of programs, with a sufficient workaround for the ones where you can't (box the generator, making the allocation explicit).

Oh, yes, in Rust coroutines do not normally allocate as far as I understand. This is not the case in C++ unfortunately. This was extremely contentious to say the least, but all alternative designs were either very unsafe or were presented very late, so the committee has preferred to go with something working now instead of something perfect in an indeterminate future.

There are already proposals to improve on the design, but we will have to see if they work out.

Great article, thanks for sharing!

You are meant to use a library like cppcoro https://github.com/lewissbaker/cppcoro rather than building all this on your own.

But folks working on gamedev libs, high-performance async, etc. would probably prefer making their own task/promise type for hand-crafted customization.

What exactly do you find complex? I see that it is just a blob of C++ that is difficult to read at a first glance, but if you get past that, it is actually super simple.

I hope C++ at some point gets another frontend in terms of syntax. Languages should probably be specified in terms of abstract syntax instead of being stuck with a bad frontend like C++.

I've used C++ coroutines (with io_uring). They are really useful in this case and allow one to write code like one would with the simple blocking API. And from what I've read they are better than Rust's coroutines for this use case (and for the yield use case they aren't good).

It adds an additional foot-gun w.r.t. e.g. by-reference parameters to functions and their lifetime. The function parameter might not be alive anymore after a "co_await", and unlike a lambda there is no capture list or any other hint that this is the case.

Then, the tooling isn't there yet (other than the missing standard library). Gdb doesn't show correct lines and can't print local variables when in coroutines. If there is a deadlock one can't see the suspended coroutine (and its call stack) holding the lock, etc.. Back to printf debugging...

> And from what I've read they are better than Rust's coroutines for this use case

Reference please? In what sense are they better, and what makes them better?

Rust's Futures don't have asynchronous destructors (I don't know if coroutines do).

When a Future is aborted early, it's destroyed immediately with no remorse. This means it can't simply offer its own buffers to the kernel, because the kernel could write back after the Future has been freed.

An API contract that includes "just be careful not to do the stupid thing" is not good enough by Rust's standards, so the only way to guarantee safety would be to have Future's destructor wait synchronously until the I/O operation is cancelled on the kernel side, but that's inelegant in an async context.

Shouldn't the Future's lifetime be tied to the I/O operation, so that the Future will not outlive the I/O operation? Can you post an example, please?

C++ coroutines for async functions (returning a task type such as cppcoro::task) seem to have completion semantics (they are not randomly interruptible). See e.g. the APIs in cppcoro.

However, that is not a general property of C++ coroutines. The generator-style coroutines also seem randomly cancellable.

Seems like once again we are stuck with an overly complex and un-ergonomic design that is going to burden every single C++ programmer for a decade, until someone gets fed up and fixes it. Just like chrono, just like random, just like std::string, all of std::algorithm, etc. God damn it.

You have hope that someone is going to fix it? You're an optimistic soul :)

> C++20 coroutines are implemented as a nice little nugget buried underneath heaps of garbage that you have to wade through to access the nice part. Frankly, I was disappointed by the design, because other recent language changes were more tastefully done, but alas not coroutines. Further obfuscating coroutines is the fact that the C++ standard library doesn’t actually supply the heap of garbage you need to access coroutines, so you actually have to roll your own garbage and then wade through it.

Well, that sounds like more fun than a tax increase.

What I find interesting about coroutines is that they are relatively trivially achieved in hand-coded assembly using `jmp` and being careful about registers in a way that makes sense. `jmp`ing around is a pretty normal way to code in assembly. In a sophisticated environment they become a beast. It surprises me that we actually do lose something meaningful when abstracting to higher level languages.

Of course we lose something when we abstract to higher level languages, that's exactly what it means to abstract out. It'd be more surprising the other way around. For example, when you're programming in a language like Haskell, Python or Java you cannot micromanage your pointers etc. This is because these languages abstract away the memory layout of objects. If you do hacky things and overwrite the memory intentionally you can cause e.g. garbage collection to misbehave etc. On the other hand in C++ or C you can manage exactly how objects are encoded in the memory, when and how you allocate memory for your system. All these things are not things you're meant to be able to "customize" in higher level languages.

It's insane how complex C++20 coroutines are when compared with Rust coroutines.

Screw Rust (which needs async-std), look at how Zig does coroutines.

You don't need async-std for coroutines (generators) in Rust (or async at all, they are two different, albeit related features). Async is implemented using generators.

Can you say more on this? What is Rust doing differently here that simplifies things?

I find the Rust design very simple: a coroutine is just a state machine, i.e. just a C struct. I find this very easy to reason about.

It does not require memory allocations, does not require a run-time, works on embedded targets, etc.

Also, the compiler generates all the boilerplate (the state machine) for you, which I find makes it very easy to use. And well, the compiler ensures memory safety, thread safety, etc. which is the cherry on top.

I'm not sure that answers my question; C++ also uses a state machine.

Most of the post is concerned with the compiler<->library interface - where Rust uses Generator, GeneratorState, Pin, etc. Is there something fundamentally different about the design here?

Rust's compiler/library interface is still much simpler than C++'s. An `async fn` call gives you an anonymous-typed object (much like a lambda) which implements the `Future` trait, which has a single method `poll`.

`Generator` and `GeneratorState` are not exposed or usable. There are no knobs to turn like C++'s `await_ready` or `initial_suspend`/`final_suspend`. There is no implicit heap allocation or corresponding elision optimization, and thus no need to map between `coroutine_handle`s and promise objects.

To be fair, C++'s design is a bit more flexible in that it supports passing data in and out of a coroutine. But even if you look at Rust's (unstable work-in-progress) approach to supporting this, the compiler/library interface is still way simpler. The difference is really not related to how much functionality is stabilized, but how scattered and ad-hoc the C++ interface is.

In Rust, the state machines just implement one trait, Future, which has one method: poll.

For a very long time, async/await were just normal Rust macros; there was no compiler<->library interface.

For a year or so, async/await have been proper keywords, which provides nicer syntax and some optimizations that were hard to do with macros (e.g. better layout optimizations for the state machines).

But that's essentially the whole thing.

Looking at safety, flexibility, performance and simplicity, Rust's design picks maximum safety, performance, and simplicity, trading off some flexibility in places where it really isn't necessary:

- you can't move a coroutine while it's being polled, which is something you probably shouldn't be doing anyways

- you can't control the layout of the coroutine state for auto-generated coroutines; but you can lay them out manually if you need to, for perf (the compiler just won't help you here)

- you need to manually lay out a coroutine and commit to the layout for using them in ABIs

C++ just picks a different design. 100% safety isn't attainable, maximum flexibility is very important, performance is important, but if you need this you have alternatives (callbacks, etc.). I personally just find the API surface of C++ coroutines (futures, promises, tasks, handles, ...) to just be really big.

Technically it's an enum, or tagged enum.

Rust's interface at the lowest level is just a single method, which in C++ could roughly look like:

   optional<T> poll(waker_t* waker)
When polled it either returns the final result or that it's pending. If it's pending, it keeps the reference to the waker arg, and uses it to notify when it's ready to be polled again.

This design is very composable, because a Future can trivially forward the poll calls to other futures, and the wakers can be plugged into all kinds of existing callback-based APIs.

async/await is a syntax sugar that auto-generates the poll methods from async function's bodies (where each poll() call advances state to the next .await point).

There's no built-in runtime in the language that does the polling. You make your own event loop, and that gives you freedom to make it simple, or fancy multithreaded, or deterministic for test environments, etc.
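To make the poll/waker shape concrete, here is a rough C++17 sketch of the same idea. It is only an analogy to Rust's `Future`, not its real API: `Waker`, `CountdownFuture`, `DoubledFuture` and `block_on` are all names made up for this example, and the "runtime" is a trivial single-threaded loop.

```cpp
#include <cassert>
#include <functional>
#include <optional>

// Hypothetical waker: a callback the future fires when polling again is worthwhile.
struct Waker {
    std::function<void()> wake;
};

// Toy future that stays pending for `remaining` polls, then yields `result`.
// A real future would wait on I/O or a timer instead of counting.
class CountdownFuture {
public:
    CountdownFuture(int remaining, int result)
        : remaining_(remaining), result_(result) {}

    // Returns the final result, or std::nullopt while still pending.
    std::optional<int> poll(Waker* waker) {
        if (remaining_ == 0) return result_;
        --remaining_;
        waker->wake();  // pretend progress happened right away; ask for another poll
        return std::nullopt;
    }

private:
    int remaining_;
    int result_;
};

// Composition: forwards poll() to an inner future and transforms its result.
class DoubledFuture {
public:
    explicit DoubledFuture(CountdownFuture inner) : inner_(inner) {}

    std::optional<int> poll(Waker* waker) {
        std::optional<int> r = inner_.poll(waker);
        if (!r) return std::nullopt;
        return *r * 2;
    }

private:
    CountdownFuture inner_;
};

// Deliberately tiny "bring your own" runtime: poll again each time the waker fired.
template <typename Future>
int block_on(Future fut, int* polls = nullptr) {
    bool woken = true;  // poll once to get started
    Waker waker{[&woken] { woken = true; }};
    int count = 0;
    for (;;) {
        assert(woken && "future returned pending without arranging a wake-up");
        woken = false;
        ++count;
        if (std::optional<int> r = fut.poll(&waker)) {
            if (polls) *polls = count;
            return *r;
        }
    }
}
```

Something like `block_on(DoubledFuture(CountdownFuture(3, 21)))` polls four times and returns 42. A fancier executor would park the thread instead of looping, but the interface between future and runtime would stay this small.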

Keep in mind that this is a really basic building block where you can bring your own runtime and hook coroutines into it, not something that is at all usable out of the box. This is exacerbated by the fact that the C++ standard library is still lacking support for non-blocking futures/promises and queued thread pools to run them.

To see how it can be used for actual asynchronous operations on a thread pool, take a look at asyncly, which I co-authored:


i used this : https://blog.panicsoftware.com/coroutines-introduction/ a while back. maybe someone else might find it useful too...

From the existing explanations, I’ve deduced that coroutines are a form of continuation-based cooperative multitasking. Is it possible to “restart” the sequence, and how would this be done?

There are two ways you can restart with the standard interface:

- create a new task (start over) and maybe delete the old task

- implement the task as a generator that can repeatedly yield values

I think this is as good as you can hope for without lots of other interfaces (and since you can implement all the promise types yourself, you can do that too).

As far as I understand, they are one-shot continuations, so no, they are not restartable unfortunately. I believe at some point one proposal had copyable (and hence multi-shot) continuations, but the proposal had other issues.

Thanks for the quality response. Without restarts, it limits the use cases.

I see they are excellent for concurrent programming, but is it possible to utilise all the cores of a processor effectively with coroutines?

Not necessarily. Concurrency is generally not parallelism.

Coroutines help solve IO-bound issues - a long networking API call doesn't block your program. You can continue doing other things. As a side effect this may use all your cores efficiently, but that's not the primary goal.

No. Think of the coroutines as goto:s with a dynamic label that does some magic to replace locals too.

You still need threads.
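The "goto with a dynamic label" picture can be sketched by hand-writing roughly what a stackless coroutine turns into. This is an illustration only, not what any compiler actually emits, and `SquaresStateMachine` is a made-up name. Note that nothing here runs in parallel: the caller's thread does all the work.

```cpp
#include <optional>

// Hand-written stand-in for a coroutine body like:
//   for (int i = 0; i < limit; ++i) co_yield i * i;
class SquaresStateMachine {
public:
    explicit SquaresStateMachine(int limit) : limit_(limit) {}

    // Each call jumps to wherever the previous call left off.
    std::optional<int> resume() {
        switch (state_) {
        case State::Start:
            i_ = 0;  // loop initialisation
            break;
        case State::AfterYield:
            ++i_;    // continue the loop just after the "co_yield"
            break;
        case State::Done:
            return std::nullopt;
        }
        if (i_ < limit_) {
            state_ = State::AfterYield;  // remember where to resume
            return i_ * i_;
        }
        state_ = State::Done;
        return std::nullopt;
    }

private:
    enum class State { Start, AfterYield, Done };
    State state_ = State::Start;  // the "dynamic label"
    int limit_;
    int i_ = 0;                   // a local that must survive between resumes
};
```

With `SquaresStateMachine sm(3)`, successive `sm.resume()` calls give 0, 1, 4 and then an empty optional.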

it'd be very trivial with something such as ASIO's io_context as your coroutine event loop: https://dens.website/tutorials/cpp-asio/multithreading

Typical HN comment I know, but how is this blog generated? It looks fantastic. Is this only Markdown + Pandoc?

"std::coroutine_handle<> *hp_;"

does it, the former line, really count as a tutorial at the very beginning?

As a developer, when you decide to use bleeding edge C++?? extensions keep in mind that when you do so you're making it much, much harder to run any of your code on a distro more than 4 years old. Is it worth it?

Similarly, as a user, keep in mind that when you use a 4 year old distro, it becomes much, much harder to run any software that has adopted new C++ features.

At least now. Back during the golden age of desktop from 2000 to about 2010 things were stable. But then dev funding for linux and its primary libs went back to being about creating the most bleeding edge, fastest possible environment for running server farms for mega-corps, and desktop stability was abandoned.

The end result is the fever that is containerization as a symptom of the sickness that is future shock from devs always using the latest.

I can't find the article, but there was a rant a little while back that essentially panned C++ as veering into the "move fast and break things" mentality. Their recommendation was to wait something on the order of 7 years: one should expect C++14 features to be generally solid and performant by now, but treat everything else with caution. If you're supporting 4 year old distros, you might stick with C++11.

Note that the author (in addition to being a great professor) is responsible for one of my favorite papers: https://news.ycombinator.com/item?id=11098655

Is learning modern C++ worth the effort in 2021?

I know classic c++, but don't really use it anymore.

I also know classic C++ and used it professionally, although it wasn't my main language. I've started using more C++ 11, 14, 17, lately. Particularly with constexpr, and some more stuff with templates and smart pointers.

I would say the experience is about what you would expect. There are some things that are great, that are cleaned up and more consistent. There are more conveniences. But then you run into weird edge cases / quirks, or a cascade of subtle compile errors, and just use a macro instead.

I'm writing a garbage collector and there are a bunch of things where it helps over C, but also a bunch of areas where it falls short. In summary, C++ is still great and frustrating at the same time. If anything that property seems to have increased over time: It's even more great and even more frustrating than it ever was :) Coroutines look like another example of that.

"worth the effort" is very difficult to quantify. If you haven't touched C++ in a while, I'd say the things in C++20 are intriguing, even if half-baked:

* ranges means that we're starting to get composable algorithms. Compared to doing `for_each` and then `accumulate` on a container, you can compose both operations into one reusable "pipeline" of operations. I really grew to like this from other languages, and am glad that C++ is starting to get it

* modules will help a lot, but I doubt we'll get widespread use for another year at least (until cmake has it; build2 has it, but that's a bit esoteric even though it's very pleasant to work with)

* concepts make working with template-laden code a lot easier. Most of the tricks for obscure SFINAE removal of templates are no longer necessary; one can just express it in a "positive" way as if to say "the thing matching this template should have these properties", instead of convoluted ways to turn off templates under certain circumstances.

* coroutines are very half-baked, but when combined with executors (which will hopefully come in 23) it will really be useful I think. The stackless nature makes them hard to reason about, but trying to add stackful coroutines would likely be a non-starter; it would require a lot of runtime support and modifications that would either require a lot of performance overhead OR backward-compat difficulties.

That said, unless you want to actually use it to DO something specific (e.g. make a game, work at a specific company), Rust or similar languages are probably more pleasant experiences. Without a standardized build tool with sane defaults, it's hard to just play around with things in a fun way. Trying to figure out how CMake + Conan works will take beginners at least a day in my experience, and the lack of a solid template "good defaults, applicable most places but easily changeable" for the build system + dependency system makes things a bit tedious to just try things out.

As much as I like some replacement candidates, C++ is everywhere, so if you want to do anything related to graphics, machine learning, compiler development, OS drivers, or IoT toolchains, without the burden of sorting out infrastructure problems by yourself, then C++ is the least painful way to go.

Some would say C, but in domains like machine learning, you really don't want to write plain old C for what is running on the GPGPU.

Yes, you do not program in C++ because you want to program in C++. You do because a lot of cool, interesting (and yes, well paid) stuff is done in C++.

Still, with all its issues, I do quite like the language, but unless you want to work on some industry that is C++-centric, I do not think it is worth learning just for the sake of it.

Same here, I keep referring how C++ was the upgrade path from Turbo Pascal because I never came clear with bare bones C.

Since 2002, it has been a mix of .NET, C++ and Java, with my C++'s usage decreasing since those days, yet it is never zero, there is always a library or OS API to create bindings for.

I also like the language, but I definitely do not fit the culture of every ms and byte counts that they inherited from C.

On the C++ code where I have full control, RTTI, exceptions and STL bounds checking are always enabled, and I am yet to find a project where it mattered to have done otherwise.

Depends on what you want to use it for. Game programming with UE4 is much nicer with modern c++ syntax, or anything where you want to push the boundaries; but I use kotlin for most things because the syntax is much more concise and the performance isn't an issue.

How much C++ coding does a modern game still require? I would've thought most of the (coding) effort would be in a scripting language inside the game engine's editors.

all my friends who do gamedev for their day job are 100% C++ except that one who generally works on the menus, settings, etc. (in semi-large / large french video game companies)

I would say, yes. C++11 is, in some ways, a totally new and more powerful language. I was sold on it as soon as I started learning. C++14 and C++17 add some useful features. I know basically nothing about C++20.

A question from a professional programmer who doesn't use C++: what share of the codebases you work on are significantly/fundamentally C++14/17?

I don't know how to answer that. For me, it is 100%, but it is a small sample size and one where I, in part, set the standard.

If you have a C++11 code base, you don't really lose anything by adopting C++17, and you do gain a few things.

Interesting; I've always mistakenly assumed that C++ codebases working with different standards would necessarily end up being a patchwork.

If you combine larger libraries from pre-eleven with post-eleven libraries you will see the different worlds (Qt as a prime example); small libraries can easily be enhanced with smart pointers etc., so you don't have much of a clash. After eleven most features are nice things, which make things a bit nicer here or there, but don't change the code as much as reliance on moves and smart pointers does with eleven. Also, an upgrade from C++11 to C++17 is as painless as a compiler update can be. (Meaning: in some corner case you will probably hit some issue caused by a compiler optimization, but for most code it's a non-issue)

I'm sure someone will come up with a counterexample, but my experience is that pretty much everything is backwards compatible.

C++17 was backwards compatible with the venerable C++11 in almost all ways except for the removal of some library constructs few people actually use and have long been replaced by superior models, like `std::auto_ptr` and the old binder functors.
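For anyone who hasn't seen the replacements, a quick sketch of what the migration tends to look like. The function names here are made up for the example, and note that some standard library implementations still ship the removed pieces as deprecated extensions.

```cpp
#include <algorithm>
#include <memory>
#include <utility>
#include <vector>

// Removed from the standard library in C++17:
//   std::auto_ptr<int> p(new int(5));
//   std::count_if(v.begin(), v.end(), std::bind2nd(std::greater<int>(), 2));

// std::unique_ptr replaces std::auto_ptr; transferring ownership is an explicit move.
std::unique_ptr<int> make_value(int v) {
    return std::make_unique<int>(v);
}

// Takes ownership of the pointer and returns the pointed-to value.
int take_value(std::unique_ptr<int> owner) {
    return *owner;
}

// A lambda replaces std::bind2nd(std::greater<int>(), threshold).
int count_greater(const std::vector<int>& values, int threshold) {
    return static_cast<int>(
        std::count_if(values.begin(), values.end(),
                      [threshold](int x) { return x > threshold; }));
}
```

Usage is `take_value(std::move(p))`, after which `p` is null; with `auto_ptr` the same transfer happened silently on copy, which is a large part of why it was replaced.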

We use C++17 to make our professional 3D printers go. We look forward to C++20 features when they become available. We shed a few hundred lines of code when we upgraded to 17 because we could drop hand-rolled stuff. It works. Nothing is meaningfully faster. And if you try hard it’s faster still.

> Is learning modern C++ worth the effort in 2021?

(Hopping onto this topic,) I wonder, what are the right resources for properly diving into the modern C++? Are there books that are up to date? Tutorials? What has worked for HNs the best?

I really liked Josuttis's book on C++17 (https://www.cppstd17.com/). The only C++20 book I know of is the Grimm book linked in the referenced blog post (https://leanpub.com/c20). I'm glad I read it, but honestly it's a bit rough still. To be fair, though, it's still only a draft so will probably improve. And I'm assuming you already have C++11. There are a lot of resources for that one. I happened to use Stroustrup 4th edition, but I'm willing to believe there are better books, particularly for people already familiar with C++03.

By 'classic' do you mean C++17, or do you mean 'C with Classes' from a quarter century ago?

Just learn Rust.

I'm a big Rust supporter/advocate (and I do use it for personal, non-trivial, programs), but I still suggest it only for a minority of the cases, as real-world is complicated.

In strictly technological terms, there's for example the gamedev domain, which is C++ dominated, so it's a legitimate and non-obvious question whether keeping up with the language is an appropriate choice or not.

Then there's the business side. It's a bit obscure how much Rust is used in BigCos; it's clearly catching on, but the extent is not obvious. Therefore, if one aims at, say (random pick) a Google job, up to date C++ knowledge may be more advantageous.

For the case when one is learning a new language (which is not the parent's case) _and_ they're not constrained by legacy/context, though, I agree that one should not even mention memory unsafe languages :)

I'm a bit confused by all of this. The idea of coroutines existed since the 60's. Then we came up with OOP that tries to combine function + data. And now we abolished this, are back at only functions and are now solving the 'state capture' problem for functions again with: Coroutines? How does the history of programming paradigms/patterns make any sense? :)

Good article though.

Your analysis of history there is a bit lacking. Coroutines didn't go out of fashion because of OOP; they went out of fashion because of structured programming and higher level languages.

Coroutines are doable if you're programming directly in assembly, but if you want to do it in C-like structured languages, it turns out that it's really tricky: the whole concept of those languages is about having a single stack and hierarchical lexical structure. You can do coroutines in languages like this, but unless you want to do nasty things with longjmp and manually manipulating the stack pointer, they simply aren't built for it. You can build structured languages with first-class support for constructs like coroutines, but it took a couple of decades of programming language development for that to happen. Even today, most of the languages that support coroutines (C++20 included) only have support for the stackless kind. Basically the only languages with full stackful coroutine support in wide use are Lua and (sort-of) Go.

There is no particular difficulty in having one-shot continuations in C (or C++) and in fact over the last few decades there have been plenty of libraries implementing them. They just never caught on with the general C and C++ programming population, although they were popular in some niches.

C with Classes had them from the beginning, as Stroustrup liked them in Simula, but then (like many other things) had to take them out of the language after user feedback.

Re stackless coroutines and language support, while it is relatively straightforward to have stackful coroutines as a library, the stackless kind in practice needs to be implemented as a language feature.

Or take advantage of the ABI of the runtime, and use assembly. [1] Yeah, not portable. But using setjmp()/longjmp() has issues as well (way too easy to mess it up because it doesn't do what you think it's doing).

[1] https://github.com/spc476/C-Coroutines

> Basically the only languages with full stackful coroutine support in wide use are Lua and (sort-of) Go.

Aren't Javascript Generators also coroutines?

they are, as far as I know, stackless. I.e. you can only yield from the generator top level frame.

The glib answer is that these kinds of things are cyclical.

But C++ coroutines don't seem too incompatible with the level of OOP you already get in C++, no?

> . And now we abolished this,

... no. coroutines are not supposed to be used in, like, more than an extremely small fraction of your codebase if you want to keep your code understandable

Eh, I think coroutines are a convenient way to achieve concurrency and parallelism. If you limit mutation and try to reduce the long reaching accessibility of variables then I think they’re generally very understandable, especially compared to other concurrency/parallelism paradigms.

> Eh, I think coroutines are a convenient way to achieve concurrency and parallelism.

but how often do you need concurrency and parallelism outside of handling network requests and performing some complicated math algorithm (which may be in a lib that you don't touch anyways, e.g. FFT)?

e.g. most UI code has to run on the main thread due to macOS / Windows limitations, and in most UI software it is extremely rare to have the kind of chains of callbacks whose readability gets improved by coroutines.

Coroutines are not really for parallelism. I doubt you'll see them much around math heavy code. Possibly if someone implements some Cilk style work stealing executor...

Ideally, but they taint everything they touch, that is why in C# I always start them from some point with Task.Run(), or leave it for the event and request handlers.

These are features built into the language designed to make multi-threaded code safer and easier to write.

Objects and coroutines are very closely related.

void main() does not exactly inspire confidence.


Umm... if you download the code you will see that main returns int, but the main1...main6 functions invoked by main return void because they don't need to return a value.

I copy and pasted `void main()` from the article. If you read the SO question I linked, you would see that `void` is not a valid return type for `main`.

Oh, that must be a typo in the document, sorry. I just fixed it. The actual corodemo.cc file was correct, though: http://www.scs.stanford.edu/~dm/blog/corodemo.cc

Also, here's a more authoritative source than stack overflow for the return type of main: https://timsong-cpp.github.io/cppwp/n4861/basic.start.main#2
