
How Rust optimizes async/await - tmandry
https://tmandry.gitlab.io/blog/posts/optimizing-await-1/
======
weiming
As a newcomer to Rust, I wish this post had been one of the _first_ ones I
read on this topic. It took scouring through many, many posts, some of them
here on HN, to grasp some of the same ideas. (I may not be alone,
judging from the very long discussion the other day:
[https://news.ycombinator.com/item?id=20719095](https://news.ycombinator.com/item?id=20719095))

~~~
hathawsh
Development of high quality async support in Rust is happening right now, so
remember to wear a hard hat. ;-) I like to watch
[https://areweasyncyet.rs/](https://areweasyncyet.rs/) and [https://this-week-
in-rust.org/](https://this-week-in-rust.org/) to see where things are.

------
zackmorris
This is one of the most concise tutorials on how generators, coroutines and
futures/promises are related (from first principles) that I've seen.

I'm hopeful that eventually promises and async/await fade into history as a
fad that turned out to be too unwieldy. I think that lightweight processes
with no shared memory, connected by streams (the Erlang/Elixir, Go and Actor
model) are the way to go. The advantage to using async/await seems to be to
avoid the dynamic stack allocation of coroutines, which can probably be
optimized away anyway. So I don't see a strong enough advantage in moving from
blocking to nonblocking code. Or to rephrase, I don't see the advantage in
moving from deterministic to nondeterministic code. I know that all of the
edge cases in a promise chain can be handled, but I have yet to see it done
well in deployed code. Which makes me think that it's probably untenable for
the mainstream.

So I'd vote to add generators to the Rust spec in order to make coroutines
possible, before I'd add futures/promises and async/await. But maybe they are
all equivalent, so if we have one, we can make all of them, not sure.

~~~
tmandry
It's the same underlying mechanism for generators as for futures: they are
_stackless_ coroutines. All the space they need for local variables is
allocated ahead of time.

In my experience, the fact that they are stackless is not at all obvious when
you're coding with them. Rust makes working with them really simple and
intuitive.
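
That "allocated ahead of time" property is directly observable: an async block (the same machinery as a generator) compiles to a value whose size is fixed at compile time. A small sketch of this, with the caveat that exact sizes are compiler-version-dependent:

```rust
use std::mem::size_of_val;

fn main() {
    // An async block compiles to an anonymous state-machine type whose
    // size is fixed at compile time: it must hold every local that is
    // alive across an .await point, like `buf` here.
    let fut = async {
        let buf = [0u8; 64];
        std::future::ready(()).await;
        buf.len() // `buf` is used after the await, so it lives in the state
    };
    // Exact numbers vary by compiler version, but there is no dynamic
    // stack growth: this one value is all the space the future ever needs.
    println!("state machine size: {} bytes", size_of_val(&fut));
}
```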

~~~
truncate
Debugging can be a pain though, as you may not have a meaningful stack to
inspect, which makes it harder to follow how the code executed in that
context. But yes, rather than writing an async state machine with callbacks, I
would prefer this.

~~~
steveklabnik
[https://crates.io/crates/tracing](https://crates.io/crates/tracing) is
attempting to solve this issue!

------
pcwalton
As a reminder, you don't _need_ to use async/await to implement socket servers
in Rust. You can use threads, and they scale quite well. M:N scheduling was
found to be slower than 1:1 scheduling on Linux, which is why Rust doesn't use
that solution.

Async/await is a great feature for those who require performance beyond what
threads (backed by either 1:1 or M:N implementations) can provide. One of the
major reasons behind the superior performance of async/await futures relative
to threads/goroutines/etc. is that async/await compiles to a state machine in
the manner described in this post, so a stack is not needed.
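
For reference, here is what a thread-per-connection server looks like in plain std, no async machinery involved. This is a minimal sketch with a built-in client for demonstration; a real server would loop over `listener.incoming()` and cap its thread count:

```rust
use std::io::{Read, Write};
use std::net::{Shutdown, TcpListener, TcpStream};
use std::thread;

// Echo a client's bytes back until it closes its write half.
fn handle(mut stream: TcpStream) {
    let mut buf = [0u8; 1024];
    while let Ok(n) = stream.read(&mut buf) {
        if n == 0 { break; } // client closed the connection
        if stream.write_all(&buf[..n]).is_err() { break; }
    }
}

// Round-trip one message through a thread-per-connection server.
fn echo_roundtrip(msg: &[u8]) -> std::io::Result<String> {
    // Port 0 lets the OS pick a free port.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;

    // One OS thread per connection: plain 1:1 scheduling, no executor.
    let server = thread::spawn(move || {
        if let Ok((stream, _)) = listener.accept() {
            handle(stream);
        }
    });

    let mut client = TcpStream::connect(addr)?;
    client.write_all(msg)?;
    client.shutdown(Shutdown::Write)?; // signal EOF so the echo loop ends
    let mut reply = String::new();
    client.read_to_string(&mut reply)?;
    server.join().expect("server thread panicked");
    Ok(reply)
}

fn main() -> std::io::Result<()> {
    println!("echoed: {}", echo_roundtrip(b"hello")?);
    Ok(())
}
```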

~~~
woah
I find async/await easier to reason about than threads for anything more
involved than the 1 request per thread web server use case. This is because
you avoid bringing in the abstraction of threads (or green threads) and their
communication with one another. You trade syntactical complexity (what color
is your function, etc), for semantic complexity (threads, channels, thread
safety, lock races).

~~~
kccqzy
They have the same semantic complexity. You have tasks in async/await and you
still need to deal with inter-task communication, locking, etc.

~~~
lalaland1125
The main difference is that in async/await you control where context switches
occur and the syntax (.await) explicitly points them out. This means you can
often avoid locks and do things in a more straightforward manner.

~~~
GolDDranks
Note that this applies only to tasks that are !Sync. If they are Sync, two
tasks accessing the same state could be moved to different threads and access
that state racily; in that case .await tells you nothing about accesses to
non-local state. However, for purposefully single-threaded task pools,
avoiding locks this way certainly seems possible.

------
saurik
Does anyone know how Rust's implementation compares to C++2a's? The C++ people
seem to have spent a lot of time creating an extremely generic framework for
async/await wherein it is easy to change out how the scheduler works (I
currently have a trivial work stack, but am going to be moving to something
more akin to a deadline scheduler in the near future for a large codebase I am
working with, which needs to be able to associate extra prioritization data
into the task object, something that is reasonably simple to do with
await_transform). I am also under the impression that existing implementation
in LLVM already does some of these optimizations that Rust says they will get
around to doing (as the C++ people also care a lot about zero-cost).

~~~
tmandry
Disclaimer: I'm not an expert on the proposal, but have looked at it some, and
can offer my impressions here. (Sorry, this got a bit long!)

The C++ proposal definitely attacks the problem from a different angle than
Rust. One somewhat surface-level difference is that it implements co_yield in
terms of co_await, which is the opposite of Rust implementing await in terms
of yield.

Another difference is that in Rust, all heap allocations of your
generators/futures are explicit. In C++, _technically_ every initialization of
a sub-coroutine defaults to a new heap allocation. I don't want
to spread FUD: my understanding is that the _vast majority_ of these are
optimized out by the compiler. But one downside of this approach is that you
could change your code and accidentally disable one of these optimizations.

In Rust, all the "state inlining" is explicitly done as part of the language.
This means that in cases where you can't inline state, you must introduce an
explicit indirection. (Imagine, say, a recursive generator - it's impossible
to inline inside of itself! When you recurse, you must allocate the new
generator on the heap, inside a Box.)

To be clear, the optimizations I'm talking about in the blog post are all
implemented today. I'll be covering what they do and don't do, as well as
future work needed, in future blog posts.

One benefit of C++ that you allude to is that there are a _lot_ of extension
points. I admit to not fully understanding what each one of them is for, but
my feeling is that some of it comes from approaching the problem differently.
Some of it absolutely represents missing features in Rust's initial
implementation. But as I say in the post, we can and will add more features on
a rolling basis.

The way I would approach the specific problem you mention is with a custom
executor. When you write the executor, you control how new tasks are
scheduled, and can add an API that allows specifying a task priority. You can
also allow modifying this priority within the task: when you poll a task, set
a thread-local variable to point to that task. Then inside the task, you can
gain a reference to yourself and modify your priority.
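
To make the executor idea concrete, here is a toy single-threaded sketch of what a priority-aware executor could look like. All names here (`PriorityExecutor`, `spawn`, `run`) are invented for illustration, the wake-up machinery is stubbed out, and real executors are structured quite differently:

```rust
use std::cell::RefCell;
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::task::{Context, RawWaker, RawWakerVTable, Waker};

// A do-nothing Waker: fine here only because the run loop re-polls
// tasks itself instead of waiting to be woken.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker { RawWaker::new(std::ptr::null(), &VTABLE) }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

// One queued task: a priority plus the pinned future to poll.
struct Task {
    priority: u8,
    seq: Reverse<u64>, // FIFO tie-break among equal priorities
    fut: Pin<Box<dyn Future<Output = ()>>>,
}

// Order tasks by (priority, spawn order) only; the future is opaque.
impl PartialEq for Task {
    fn eq(&self, o: &Self) -> bool { (self.priority, self.seq) == (o.priority, o.seq) }
}
impl Eq for Task {}
impl PartialOrd for Task {
    fn partial_cmp(&self, o: &Self) -> Option<std::cmp::Ordering> { Some(self.cmp(o)) }
}
impl Ord for Task {
    fn cmp(&self, o: &Self) -> std::cmp::Ordering {
        (self.priority, self.seq).cmp(&(o.priority, o.seq))
    }
}

// Hypothetical executor whose run queue is a max-heap keyed on priority.
struct PriorityExecutor {
    queue: BinaryHeap<Task>,
    next_seq: u64,
}

impl PriorityExecutor {
    fn new() -> Self {
        PriorityExecutor { queue: BinaryHeap::new(), next_seq: 0 }
    }

    // The spawn API is where a real executor could also consult a
    // thread-local "current task" to inherit the parent's priority.
    fn spawn(&mut self, priority: u8, fut: impl Future<Output = ()> + 'static) {
        self.queue.push(Task {
            priority,
            seq: Reverse(self.next_seq),
            fut: Box::pin(fut),
        });
        self.next_seq += 1;
    }

    fn run(&mut self) {
        let waker = noop_waker();
        let mut cx = Context::from_waker(&waker);
        while let Some(mut task) = self.queue.pop() {
            // Always poll the highest-priority runnable task first.
            if task.fut.as_mut().poll(&mut cx).is_pending() {
                self.queue.push(task); // a real executor would await a wake-up
            }
        }
    }
}

fn demo() -> Vec<&'static str> {
    let order = Rc::new(RefCell::new(Vec::new()));
    let mut ex = PriorityExecutor::new();
    for (priority, name) in vec![(1u8, "low"), (3, "high"), (2, "mid")] {
        let order = Rc::clone(&order);
        ex.spawn(priority, async move { order.borrow_mut().push(name); });
    }
    ex.run();
    Rc::try_unwrap(order).unwrap().into_inner()
}

fn main() {
    println!("{:?}", demo()); // the highest-priority task runs first
}
```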

~~~
saurik
Thanks for the information!!

On your last paragraph, the thing I'm concerned by is where this extra
priority information is stored and propagated, as the term "task" is
interesting: isn't every single separate thing being awaited its own task?
There isn't (in my mental model) a concept that maps into something like a
"pseudo-thread" (but maybe Rust does something like this, requiring a very
structured form of concurrency?), which would let me set a "pseudo-thread"
property, right?

As an example: if I am already in an asynchronous coroutine and I spawn off two
asynchronous web requests as sub-tasks, the results of which will be processed
potentially in parallel on various work queues, and then join those two tasks
into a high-level join task that I wait on (so I want both of these things to
be done before I continue), I'd want the background processing done on the
results to be handled at the priority of this parent spawning task; do I have
to manually propagate this information?

In C++2a, I would model this by having a promise type that is used for my
prioritize-able tasks and, to interface with existing APIs (such as the web
request API) that are doing I/O scheduling; I'd use await_transform to adapt
their promise type into one of mine that lets me maintain my deadline across
the I/O operation and then recover it in both of the subtasks that come back
into my multi-threaded work queue. Everything I've seen about Rust seems to
assume that there is a single task/promise type that comes from the standard
library, meaning that it isn't clear to me how I could possibly do this kind
of advanced scheduling work.

(Essentially, whether or not it was named for this reason--and I'm kind of
assuming it wasn't, which is sad, because not enough people understand monads
and I feel like it hurts a lot of mainstream programming languages... I might
even say _particularly_ Rust, which could use more monadic concepts in its
error handling--await_transform is acting as a limited form of monad
transformer, allowing me to take different concepts of scheduled execution and
merge them together in a way that is almost entirely transparent to the code
spawning sub-tasks. The co_await syntax is then acting as a somewhat-lame-but-
workable-I-guess substitute for do notation from Haskell. In a perfect world,
of course, this would be almost as transparent as exceptions are, which are
themselves another interesting form of monad.)

~~~
tmandry
The concept of a pseudo-thread you're referring to is a task. A task contains
a whole tree of futures awaiting other futures. So no manual propagation is
necessary.

Of course, it's possible for tasks to spawn _other_ tasks that execute
independently. (To be clear, if you are awaiting something from within your
task, it is _not_ a separate task.) For spawning new tasks, there's a standard
API[1], which doesn't include any executor-specific stuff like priority.
You'll have to decide what you want the default behavior to be when someone
calls this; for example, a newly spawned task can inherit the priority of its
parent.

To get more sophisticated, you could even have a "spawn policy" field for
every task that your first-party code knows how to set. Any new task spawned
from within that task inherits priority according to that task's policy. The
executor implementation decides what tasks look like and how to spawn new
ones, so you can go crazy. (Not that you necessarily should, that is.)

To summarize the Rust approach, I'd say you have 3 main extension points:

1\. The executor, which controls the spawning, prioritization, and execution
of tasks

2\. Custom combinators (like join_all[2]), which allow you to customize the
implementation of poll[3] and, say, customize how sub-futures are prioritized
(This is at the same level as await, so per-Future, not per-Task.)

3\. Leaf futures (like the ones that read or write to a socket). These are
responsible for working with the executor to schedule their future wake-ups
(with, say, epoll or some other mechanism). For more on this, see [4].

[1]: [https://doc.rust-
lang.org/1.28.0/std/task/trait.Executor.htm...](https://doc.rust-
lang.org/1.28.0/std/task/trait.Executor.html#tymethod.spawn_obj)

[2]: [https://rust-lang-nursery.github.io/futures-api-
docs/0.3.0-a...](https://rust-lang-nursery.github.io/futures-api-
docs/0.3.0-alpha.11/futures/future/fn.join_all.html)

[3]: [https://doc.rust-
lang.org/1.28.0/std/future/trait.Future.htm...](https://doc.rust-
lang.org/1.28.0/std/future/trait.Future.html#the-poll-method)

[4]:
[https://boats.gitlab.io/blog/post/wakers-i/](https://boats.gitlab.io/blog/post/wakers-i/)

~~~
saurik
Thank you so much for the context here!! And yeah: a big terminology clash is
that many of the libraries for C++ come from a common lineage (almost entirely
from Lewis Baker, who has been involved in the STL abstractions, wrote
cppcoro, and then got allocated by Facebook to work on Folly) and use the term
"task" to essentially mean "a future that you can efficiently co_await". What
I'm seeing so far seems reasonably sane and arguably similar to the
abstraction I have been building up using lower-level components in C++; which
all makes me very happy, as I'm anticipating eventually wanting to rewrite
what I have done so far in Rust.

------
continuational
I don't quite follow. What exactly is the overhead that other languages have
for futures that is eliminated here?

~~~
weiming
Going down the same rabbit hole earlier this week, found this to be a good
explanation:

 _All of the data needed by a task is contained within its future. That means
we can neatly sidestep problems of dynamic stack growth and stack swapping,
giving us truly lightweight tasks without any runtime system implications. ...
Perhaps surprisingly, the future within a task compiles down to a state
machine, so that every time the task wakes up to continue polling, it
continues execution from the current state—working just like hand-rolled
code._ [1]

[1] [https://aturon.github.io/blog/2016/09/07/futures-
design/](https://aturon.github.io/blog/2016/09/07/futures-design/)

~~~
davidw
> Perhaps surprisingly, the future within a task compiles down to a state
> machine, so that every time the task wakes up to continue polling, it
> continues execution from the current state

How are those tasks implemented, and what's scheduling them?

~~~
weiming
You need a third-party library to provide an executor; Rust does not ship one,
to keep the runtime small. The community seems to have coalesced around
[https://tokio.rs/](https://tokio.rs/) (under the hood it uses epoll or
whatever OS-specific functionality is available to schedule tasks M:N)

See:
[https://news.ycombinator.com/item?id=20722297](https://news.ycombinator.com/item?id=20722297)
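
Mechanically, an executor is a loop that calls `poll` on a future until it returns `Ready`, using a `Waker` to find out when a `Pending` future is worth polling again. A toy sketch (names invented; it busy-polls where a real executor would sleep until woken):

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A do-nothing Waker: acceptable only because this toy executor
// busy-polls instead of sleeping until it is woken.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker { RawWaker::new(std::ptr::null(), &VTABLE) }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

// Pending on the first poll, Ready on the second, so the executor's
// loop below actually has to poll more than once.
struct YieldOnce {
    polled: bool,
}

impl Future for YieldOnce {
    type Output = u32;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        if self.polled {
            Poll::Ready(42)
        } else {
            self.polled = true;
            cx.waker().wake_by_ref(); // a real future arranges its wake-up here
            Poll::Pending
        }
    }
}

// Toy executor: drive one future to completion on the current thread.
// Real executors park until the Waker fires instead of spinning.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
    }
}

fn main() {
    let n = block_on(async { YieldOnce { polled: false }.await + 1 });
    println!("{}", n); // 43
}
```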

~~~
davidw
How's that play out in terms of, say, performing some long-running
calculations, rather than something that performs IO?

~~~
ori_b
You're essentially implementing statically scheduled cooperative userspace
threads on top of a single thread with async/await.

So, for CPU-bound computation, this is a bad solution.

------
emmanueloga_
Related questions: does anybody know:

1) which was the first language that introduced the async/await keywords? I
want to say C#, but I’m not sure.

2) are there any papers that describe the code transformation to state machine
that is commonly performed to implement these keywords?

~~~
TkTech
For #1, there's a well-sourced Stack Overflow post
[https://softwareengineering.stackexchange.com/a/377514](https://softwareengineering.stackexchange.com/a/377514).

~~~
emmanueloga_
Cool, I'll take a look!

I was quickly searching around and found this paper [1]:

"Pause 'n' play: formalizing asynchronous C#".

It looks promising, although it is behind a paywall ;-(. Also, the keyword
"formalizing" tells me that maybe this goes a bit deeper than the kind of
description I'm looking for...

1:
[https://dl.acm.org/citation.cfm?id=2367181](https://dl.acm.org/citation.cfm?id=2367181)

~~~
girvo
sci-hub.tw usually helps with getting papers that are behind paywalls!

------
fnord77
I'm a big user of Rust, but I'm kinda dismayed that async IO wasn't part of
the original design.

It's nice they're making Futures a zero-cost abstraction, but it feels like it
is at the expense of ergonomics.

~~~
tmandry
Futures were developed outside Rust core, in a third-party library, before
being brought into the language. Working with them in combinator form
definitely _was_ less ergonomic, but async/await fixes that.

~~~
steveklabnik
Fun fact: there _was_ a future type in the rust standard library, long, long
ago.

[https://doc.rust-lang.org/0.10/sync/struct.Future.html](https://doc.rust-
lang.org/0.10/sync/struct.Future.html)

~~~
brson
That version of Rust also did async I/O in the runtime. Async I/O has always
been part of Rust. The model changed because there was too much overhead doing
it the more ergonomic way and it got booted out of the runtime.

~~~
steveklabnik
Yep, this is a great point.

Someday, we should get a book about the history of Rust together...

~~~
tmandry
I didn’t know about this.. I’d love to read that book :)

------
Paul-ish
I think the year is off by 1 in the blog post, or the title needs a 2018 tag.

~~~
giancarlostoro
Former, or that's some impressive time travel knowledge.

~~~
orthecreedence
Rustradamus.

------
MuffinFlavored
How does `yield` work under the hood? Does it add a reference to some array,
with code to loop over all the references with some small timeout until the
reference status changes from "pending" to "completed"?

~~~
TheCoelacanth
It gets converted into a state machine.

    
    
      || {
        yield 2;
        yield 3;
        yield 5;
      }
    

will get converted to a struct that implements the Generator trait[1] with a
resume method something like

    
    
      fn resume(self: Pin<&mut Self>) -> GeneratorState<i32, ()> {
        match self.next_state {
          0 => {
            self.next_state = 1;
            Yielded(2)
          },
          1 => {
            self.next_state = 2;
            Yielded(3)
          },
          2 => {
            self.next_state = 3;
            Yielded(5)
          },
          _ => Complete(())
        }
      }
    

Local variables become fields in the struct and if you have more complex
control flow, the state could end up jumping around instead of just increasing
by one each time. It's nothing that you couldn't write by hand, but it would
be very tedious to do so.

[1] [https://doc.rust-lang.org/beta/unstable-book/language-
featur...](https://doc.rust-lang.org/beta/unstable-book/language-
features/generators.html)

~~~
kzrdude
Just like for async, borrows across yield points are a special feature here
that you couldn't implement in Rust as an open-coded state machine; you'd have
to find a workaround.

------
strictfp
We're getting generators as well? Awesome.

~~~
tmandry
Not necessarily. They're an implementation detail of the compiler, and aren't
fully baked yet to boot.

But there's plenty of reason to want generators, including the fact that they
let you build streams. And the fact that async/await relies heavily on them
has pushed the implementation much closer to being ready. I hope we get them
at some point!

~~~
richardwhiuk
They are available in nightly - [https://doc.rust-lang.org/beta/unstable-
book/language-featur...](https://doc.rust-lang.org/beta/unstable-
book/language-features/generators.html) \- but are very unstable.

I think "at some point" is the best answer :)

~~~
steveklabnik
To be clear on how unstable: they have not received an actual RFC yet, so
they’re barely designed at all. It will be some time before they’re stable, as
in some sense, they haven’t even started the process.

I’d imagine that they would go through the stages faster than many other
features, though, given that the implementation will have received a lot of
testing via async/await.

------
Walther
Thank you for the great writeup <3 Eagerly waiting for the upcoming parts -
please keep it up!

