
Does anyone know how Rust's implementation compares to C++2a's? The C++ people seem to have spent a lot of time creating an extremely generic framework for async/await in which it is easy to swap out how the scheduler works. (I currently have a trivial work stack, but will be moving to something more akin to a deadline scheduler in the near future for a large codebase I am working with, which needs to be able to associate extra prioritization data with the task object, something that is reasonably simple to do with await_transform.) I am also under the impression that the existing implementation in LLVM already does some of the optimizations that Rust says they will get around to doing (as the C++ people also care a lot about zero-cost).


Disclaimer: I'm not an expert on the proposal, but have looked at it some, and can offer my impressions here. (Sorry, this got a bit long!)

The C++ proposal definitely attacks the problem from a different angle than Rust. One somewhat surface-level difference is that it implements co_yield in terms of co_await, which is the opposite of Rust implementing await in terms of yield.

Another difference is that in Rust, all heap allocations of your generators/futures are explicit. In C++, technically every initialization of a sub-coroutine defaults to being a new heap allocation. I don't want to spread FUD: my understanding is that the vast majority of these are optimized out by the compiler. But one downside of this approach is that you could change your code and accidentally disable one of these optimizations.

In Rust, all the "state inlining" is explicitly done as part of the language. This means that in cases where you can't inline state, you must introduce an explicit indirection. (Imagine, say, a recursive generator - it's impossible to inline inside of itself! When you recurse, you must allocate the new generator on the heap, inside a Box.)
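
To make the boxing concrete, here is a minimal sketch (the `count_down` function and the trivial no-op-waker executor are my own illustration, not from the post): a recursive async fn's state would have to contain itself, so the recursive call must go through `Box::pin`.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A recursive async fn has no finite inline size: its state would have
// to contain itself. Boxing the recursive call introduces the explicit
// indirection described above. (count_down is a made-up example.)
fn count_down(n: u32) -> Pin<Box<dyn Future<Output = u32>>> {
    Box::pin(async move {
        if n == 0 { 0 } else { count_down(n - 1).await + 1 }
    })
}

// Trivial single-future executor with a no-op waker, just to run the example.
fn block_on<F: Future + ?Sized>(mut fut: Pin<Box<F>>) -> F::Output {
    fn no_op(_: *const ()) {}
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, no_op, no_op, no_op);
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) };
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
    }
}

fn main() {
    assert_eq!(block_on(count_down(5)), 5);
}
```

Removing the `Box::pin` makes the function fail to compile with an infinite-size error, which is the explicitness being described: the indirection is visible in the source rather than inserted by the compiler.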

To be clear, the optimizations I'm talking about in the blog post are all implemented today. I'll be covering what they do and don't do, as well as future work needed, in future blog posts.

One benefit of C++ that you allude to is that there are a lot of extension points. I admit to not fully understanding what each one of them is for, but my feeling is that some of it comes from approaching the problem differently. Some of it absolutely represents missing features in Rust's initial implementation. But as I say in the post, we can and will add more features on a rolling basis.

The way I would approach the specific problem you mention is with a custom executor. When you write the executor, you control how new tasks are scheduled, and can add an API that allows specifying a task priority. You can also allow modifying this priority within the task: when you poll a task, set a thread-local variable to point to that task. Then inside the task, you can gain a reference to yourself and modify your priority.
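
A minimal sketch of that thread-local trick, with every name hypothetical: the executor publishes the running task's priority in a thread-local before polling it, and code inside the task reads or modifies it.

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical slot the executor fills in before polling a task.
    static CURRENT_PRIORITY: Cell<u8> = Cell::new(0);
}

// Executor side: publish the task's priority while its future is polled,
// then write back any change the task made.
fn poll_with_priority<R>(priority: &Cell<u8>, poll_fn: impl FnOnce() -> R) -> R {
    CURRENT_PRIORITY.with(|p| p.set(priority.get()));
    let result = poll_fn();
    CURRENT_PRIORITY.with(|p| priority.set(p.get()));
    result
}

// Task side: code running inside the task can raise its own priority.
fn raise_my_priority(new: u8) {
    CURRENT_PRIORITY.with(|p| p.set(p.get().max(new)));
}

fn main() {
    let task_priority = Cell::new(1);
    poll_with_priority(&task_priority, || raise_my_priority(7));
    assert_eq!(task_priority.get(), 7);
}
```

A real executor would wrap each call to `poll` on the task's future this way; the closure here just stands in for that poll.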


> In C++, technically every initialization of a sub-coroutine defaults to being a new heap allocation. I don't want to spread FUD: my understanding is that the vast majority of these are optimized out by the compiler. But one downside of this approach is that you could change your code and accidentally disable one of these optimizations.

I don't think this is correct. C++20 allows a lot of choices for implementing it without forcing a heap allocation.

See https://lewissbaker.github.io/2017/11/17/understanding-opera... and also this video that goes in depth on how to run billions of coroutines with C++: https://www.youtube.com/watch?v=j9tlJAqMV7U


Thanks for the information!!

On your last paragraph, the thing I'm concerned about is where this extra priority information is stored and propagated, as the term "task" is interesting: isn't every single separate thing being awaited its own task? There isn't (in my mental model) a concept that maps onto something like a "pseudo-thread" (though maybe Rust does something like this, requiring a very structured form of concurrency?), which would let me set a "pseudo-thread" property, right?

As an example: suppose I am already in an asynchronous coroutine and I spawn off two asynchronous web requests as sub-tasks, the results of which will be processed potentially in parallel on various work queues, and then I join those two tasks into a high-level join task that I wait on (so I want both of these things to be done before I continue). I'd want the background processing done on the results to be handled at the priority of the parent spawning task; do I have to manually propagate this information?

In C++2a, I would model this by having a promise type that is used for my prioritize-able tasks. To interface with existing APIs (such as the web request API) that do their own I/O scheduling, I'd use await_transform to adapt their promise type into one of mine that lets me maintain my deadline across the I/O operation and then recover it in both of the subtasks that come back into my multi-threaded work queue. Everything I've seen about Rust seems to assume that there is a single task/promise type that comes from the standard library, so it isn't clear to me how I could possibly do this kind of advanced scheduling work.

(Essentially, whether or not it was named for this reason--and I'm kind of assuming it wasn't, which is sad, because not enough people understand monads and I feel like it hurts a lot of mainstream programming languages... I might even say particularly Rust, which could use more monadic concepts in its error handling--await_transform is acting as a limited form of monad transformer, allowing me to take different concepts of scheduled execution and merge them together in a way that is almost entirely transparent to the code spawning sub-tasks. The co_await syntax is then acting as a somewhat-lame-but-workable-I-guess substitute for do notation from Haskell. In a perfect world, of course, this would be almost as transparent as exceptions are, which are themselves another interesting form of monad.)


The concept of a pseudo-thread you're referring to is a task. A task contains a whole tree of futures awaiting other futures. So no manual propagation is necessary.

Of course, it's possible for tasks to spawn other tasks that execute independently. (To be clear, if you are awaiting something from within your task, it is not a separate task.) For spawning new tasks, there's a standard API[1], which doesn't include any executor-specific stuff like priority. You'll have to decide what you want the default behavior to be when someone calls this; for example, a newly spawned task can inherit the priority of its parent.

To get more sophisticated, you could even have a "spawn policy" field for every task that your first-party code knows how to set. Any new task spawned from within that task inherits priority according to that task's policy. The executor implementation decides what tasks look like and how to spawn new ones, so you can go crazy. (Not that you necessarily should.)

To summarize the Rust approach, I'd say you have 3 main extension points:

1. The executor, which controls the spawning, prioritization, and execution of tasks

2. Custom combinators (like join_all[2]), which allow you to customize the implementation of poll[3] and, say, customize how sub-futures are prioritized (This is at the same level as await, so per-Future, not per-Task.)

3. Leaf futures (like the ones that read or write to a socket). These are responsible for working with the executor to schedule their future wake-ups (with, say, epoll or some other mechanism). For more on this, see [4].
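
As a sketch of point 3 (all of the types here are invented for illustration, with a plain thread standing in for epoll): a leaf future stores the waker it is handed on each poll, and the "reactor" side later uses that waker to tell the executor the task can make progress.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread;
use std::time::Duration;

// Hypothetical leaf future. Its first poll spawns a background thread
// (standing in for epoll/a reactor); every poll stores the latest waker
// so the "reactor" knows which task to wake when the event fires.
struct Delay {
    shared: Arc<Mutex<(bool, Option<Waker>)>>,
    started: bool,
}

impl Delay {
    fn new() -> Self {
        Delay { shared: Arc::new(Mutex::new((false, None))), started: false }
    }
}

impl Future for Delay {
    type Output = ();
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        let this = self.get_mut();
        let mut guard = this.shared.lock().unwrap();
        if guard.0 {
            return Poll::Ready(());
        }
        // Register the latest waker so the reactor can reach this task.
        guard.1 = Some(cx.waker().clone());
        if !this.started {
            this.started = true;
            let shared = Arc::clone(&this.shared);
            thread::spawn(move || {
                thread::sleep(Duration::from_millis(10));
                let mut guard = shared.lock().unwrap();
                guard.0 = true;
                if let Some(w) = guard.1.take() {
                    w.wake(); // signal the executor that the task can progress
                }
            });
        }
        Poll::Pending
    }
}

// Minimal executor whose waker unparks the blocked thread.
struct ThreadWaker(thread::Thread);
impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(v) => return v,
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    block_on(Delay::new());
    println!("delay completed");
}
```

The split is visible here: the executor only knows how to park and resume, while the leaf future and its background thread decide when to call `wake()`.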

[1]: https://doc.rust-lang.org/1.28.0/std/task/trait.Executor.htm...

[2]: https://rust-lang-nursery.github.io/futures-api-docs/0.3.0-a...

[3]: https://doc.rust-lang.org/1.28.0/std/future/trait.Future.htm...

[4]: https://boats.gitlab.io/blog/post/wakers-i/


Thank you so much for the context here!! And yeah, a big terminology clash is that many of the C++ libraries come from a common lineage (almost entirely from Lewis Baker, who has been involved in the STL abstractions, wrote cppcoro, and was then allocated by Facebook to work on Folly) and use the term "task" to essentially mean "a future that you can efficiently co_await". What I'm seeing so far seems reasonably sane and arguably similar to the abstraction I have been building up from lower-level components in C++, which all makes me very happy, as I anticipate eventually wanting to rewrite what I have done so far in Rust.


In a Rust implementation of your example, you might not necessarily spawn the two sub-operations as separate tasks. Awaiting them directly in a parent async function (probably using FuturesUnordered, like Promise.all in JS) will cause all of their work to be scheduled, prioritized, cancelled, etc. together because they’ll be a part of the same task. There’s a 1-many relation from tasks to Futures in Rust.


FWIW, what I meant by "join those two tasks into a high-level join task" was "call something akin to Promise.all to compose the two futures into a single one upon which I could await". It sounds like I need to learn more about the concept Rust has for "tasks", as maybe they are providing this "pseudo-thread" abstraction I was discussing in passing. I am seeing terms like "leaf Futures" and "top-level Futures" in some of the documentation I am quickly turning up.


I don’t fully understand your use case, but in case this helps:

- Future is a trait, not a type, so you can write your own future types.

- Future’s poll method takes a context argument, and async/await-based futures pass that argument unchanged to any sub-futures that they await. The context argument is a pointer to a standard library type (std::task::Context), which itself contains a pointer to a “waker” object that’s meant to be supplied by the executor, with some predefined methods. There’s some room for customization here, but it’s not as flexible as it probably should be – e.g. for now, as far as I can tell, you can’t get the pointer back out of the waker, only call the standard methods on it. Thread-local storage is an option, of course.


A “task” is the thing you submit to an executor, that is, the sum of all async/awaits in a single computation. Each await is not its own task.


It's interesting that support for recursion is no longer the default here. A partial reversal of what happened going from Fortran to Algol?


Aside from the high-level similarity of the "function -> state machine" transformation, Rust's is quite a bit different (and IMO both simpler and more flexible).

A C++ coroutine chooses a particular promise type as part of its definition. Its frame defaults to a separate heap-allocation per coroutine, with some allowance for elision. At a suspension point, it passes a type-erased handle to the current coroutine to an `await_suspend` method, which can either `return false` or call `handle.resume()` to resume the coroutine. A stack of `co_await`ing coroutines (or "pseudo-thread" as you call it) is thus a linked list of `coroutine_handle`s stored in the coroutine frames of their await-ees, rooted with whoever is responsible for next resuming the coroutine.

A Rust async function does things inside out, in a sense. It has no promise type; calling one directly returns its frame into the caller's frame, as a value with an anonymous type that implements the `Future` trait. This trait has a single method called `poll`, which resumes the function and runs it until its next suspension point. `poll` takes a single argument, a handle which is used to signal when it is ready to continue. This handle is threaded down through a stack of `poll`s (a "task" or pseudo-thread), and stored with whoever is responsible for notifying the task it should continue.
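
Roughly, the transformation looks like this hand-written sketch (simplified; the compiler's actual lowering differs in detail). A hypothetical `async fn add_one() -> u32 { ReadyTwo.await + 1 }` becomes an enum of suspension states whose `poll` threads the same handle down to the sub-future:

```rust
use std::future::Future;
use std::mem;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A hypothetical sub-future that is immediately ready.
struct ReadyTwo;
impl Future for ReadyTwo {
    type Output = u32;
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        Poll::Ready(2)
    }
}

// Roughly the state machine generated for:
//   async fn add_one() -> u32 { ReadyTwo.await + 1 }
enum AddOne {
    Start,
    Awaiting(ReadyTwo),
    Done,
}

impl Future for AddOne {
    type Output = u32;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        let this = self.get_mut();
        loop {
            match mem::replace(this, AddOne::Done) {
                AddOne::Start => *this = AddOne::Awaiting(ReadyTwo),
                AddOne::Awaiting(mut inner) => {
                    // The same handle is threaded down to the sub-future.
                    match Pin::new(&mut inner).poll(cx) {
                        Poll::Ready(x) => return Poll::Ready(x + 1),
                        Poll::Pending => {
                            *this = AddOne::Awaiting(inner);
                            return Poll::Pending;
                        }
                    }
                }
                AddOne::Done => panic!("polled after completion"),
            }
        }
    }
}

// No-op-waker executor, just to run the example.
fn block_on<F: Future + Unpin>(mut fut: F) -> F::Output {
    fn no_op(_: *const ()) {}
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, no_op, no_op, no_op);
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) };
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(v) = Pin::new(&mut fut).poll(&mut cx) {
            return v;
        }
    }
}

fn main() {
    assert_eq!(block_on(AddOne::Start), 3);
}
```

Note that `AddOne` is just a value holding its callee inline, which is exactly the "frame returned into the caller's frame" point above.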

One implication of the Rust approach is that the "executor" and the "reactor" are decoupled. An executor maintains a collection of running tasks and schedules them. A reactor holds onto those handles and notifies executors of relevant events. This lets you control scheduling without language hooks like await_transform: you can associate your prioritization data with a task when you spawn it on a particular executor, and it can await any reactor without losing that information.

Another implication is that you have a choice of whether to a) `await` a future, making it part of the current task, or b) spawn it as its own task, to be scheduled on its own, much like OS thread APIs. Option (a) can get really interesting with multiple concurrent sub-futures (with things like Promise.all or select); it can be as simple as having the caller poll all its children every time it wakes up, or as complex as wrapping `poll`'s handle argument and implementing your own scheduling within a task.
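
Option (a) in miniature, as a hedged sketch: `Join2` and `MaybeDone` are hand-rolled stand-ins for what real combinator libraries provide. The parent keeps both children inline in its own state and polls each with the same handle every time the task wakes up.

```rust
use std::future::Future;
use std::mem;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Tracks one child: still a future, or finished with an output.
enum MaybeDone<F: Future> {
    Running(F),
    Done(F::Output),
    Taken,
}

impl<F: Future + Unpin> MaybeDone<F> {
    // Returns true once the child has completed.
    fn poll(&mut self, cx: &mut Context<'_>) -> bool {
        if let MaybeDone::Running(f) = self {
            match Pin::new(f).poll(cx) {
                Poll::Ready(v) => *self = MaybeDone::Done(v),
                Poll::Pending => return false,
            }
        }
        matches!(self, MaybeDone::Done(_))
    }

    fn take(&mut self) -> F::Output {
        match mem::replace(self, MaybeDone::Taken) {
            MaybeDone::Done(v) => v,
            _ => panic!("child not finished"),
        }
    }
}

// Hand-rolled two-way join: both children live inside the parent's
// state, so they are scheduled and cancelled as one task.
struct Join2<A: Future, B: Future> {
    a: MaybeDone<A>,
    b: MaybeDone<B>,
}

impl<A, B> Future for Join2<A, B>
where
    A: Future + Unpin,
    A::Output: Unpin,
    B: Future + Unpin,
    B::Output: Unpin,
{
    type Output = (A::Output, B::Output);
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let this = self.get_mut();
        // Poll both children with the same handle; an unfinished child
        // leaves the task's waker registered wherever it is waiting.
        let a_done = this.a.poll(cx);
        let b_done = this.b.poll(cx);
        if a_done && b_done {
            Poll::Ready((this.a.take(), this.b.take()))
        } else {
            Poll::Pending
        }
    }
}

// No-op-waker executor, just to run the example.
fn block_on<F: Future + Unpin>(mut fut: F) -> F::Output {
    fn no_op(_: *const ()) {}
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, no_op, no_op, no_op);
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) };
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(v) = Pin::new(&mut fut).poll(&mut cx) {
            return v;
        }
    }
}

fn main() {
    let joined = Join2 {
        a: MaybeDone::Running(std::future::ready(2)),
        b: MaybeDone::Running(std::future::ready(3)),
    };
    assert_eq!(block_on(joined), (2, 3));
}
```

A smarter version could poll only the children whose wakers fired, which is the "wrapping `poll`'s handle argument" idea mentioned above.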


My understanding is that C++ opted more for a coroutine-first design, where a very generic coroutine abstraction is at the center and other things (like generators and async functions) are built around it. That makes it very universal, but probably also harder to understand if one only has a specific use case.

Rust's design, by comparison, was focused not only on async functions as the main design goal, but also on maintaining compatibility with "struct Future" types that can be implemented by hand and represent a state machine.

The latter will allow Rust to reuse lots of the async infrastructure that has been built in the (combinators & state-machine) Futures world over the last few years (e.g. the tokio library and everything on top of it).

One downside of Rust's approach might be that some parts feel a bit awkward and hard, e.g. the two different states of a Future (one where it hasn't started executing and can be moved around, and one where it has started executing and can't be moved anymore) and the pinning system. As far as I understand, C++ exposes fewer of these details to end users; this might be an area where the implicit allocations helped it.

As far as I understand, the async-function flavor of C++ coroutines also has run-to-completion semantics and can't be cancelled at any yield point the way Rust's futures can. This has the advantage of being able to wrap completion-based I/O operations in a more natural fashion than Rust. But it has the downside that users need to pass CancellationTokens around for cancellation, and that some code might not be cancellable.



