
Async and Await in Rust: a full proposal - KenanSulayman
https://boats.gitlab.io/blog/post/2018-04-06-async-await-final/
======
kibwen
Note that this post is from April, and a lot of experimentation and reams of
discussion and debate have taken place regarding the details of the two RFCs
in question (along with other related RFCs such as the one for the Pin trait).
In fact, one of the RFCs linked at the top of the OP has been closed and
superseded by a newer one:
[https://github.com/rust-lang/rfcs/pull/2418](https://github.com/rust-lang/rfcs/pull/2418). I've
actually heard that tomorrow (Monday) the futures and networking working
groups will begin a series of regular blog posts on design and implementation
leading up to the release of Futures 0.3 (which IIRC intends to be more-or-
less the final design for Futures 1.0).

As for this article, I _think_ the broad implementation details that it talks
about are largely still accurate, just don't take it as gospel. :)

------
lapinot
And yet we somehow don't acknowledge the fact that this is just do-notation
for some specific monad--yeah, that powerful abstraction that can't be
expressed in Rust because we don't allow higher-order polymorphism. Don't get
me wrong, I'm bitter because I feel like Rust really is almost in the right
direction for the future of language design. Yet it will be a long time before
we get a language with a really precise type system (linear, "pure") and thus
a highly efficient code generator (~Rust), that at the same time has all
the higher abstraction goodies, maybe even dependent types (cf quantitative
type theory [1]). Of course they took that into account, it's even explicitly
mentioned in the rfc, and it got turned down for being too experimental, but
at some point you gotta make a bold step forward and force people to actually
care about abstractions.

Now that Rust is getting some maturity I worry that it gets too complex in too
many directions just because it's not powerful enough to express the few
common abstractions, of which several instances are being added as distinct concepts
(async&try, const datakind, higher-order poly--especially for lifetimes, named
impls, before-after memory state for references...).

[1] [https://bentnib.org/quantitative-type-theory.pdf](https://bentnib.org/quantitative-type-theory.pdf)

~~~
withoutboats
I don't have time to write a whole essay, so let me just establish my
credentials:

\- I wrote the generic associated types RFC (how Rust will implement higher
kinded polymorphism).

\- I wrote the const generics RFC (the closest Rust will get to dependent
types).

\- I wrote the async/await RFC, as well as the linked blog post.

That is to say that I am intimately familiar with how Rust's type system can
be extended to support more "powerful" abstractions.

Monads as implemented in pure functional programming languages like Haskell
cannot usefully abstract over asynchronous and synchronous IO in Rust for a
variety of reasons having to do with the way the type system exposes low level
details by virtue of Rust being a systems programming language. I do not
believe that `do` notation could be a useful mechanism for achieving either
the ergonomics or the performance that async/await syntax will have in Rust.

I'm responding to you because you're the top comment, but I could write a
similar response to a lot of comments here. Monads, stackful coroutines, green
threads, CSP, etc - we've heard of them! :) We have well-motivated reasons to
choose async/await: it's the only solution that meets our requirements.

~~~
canndrew2016
> Monads as implemented in pure functional programming languages like Haskell
> cannot usefully abstract over asynchronous and synchronous IO in Rust for a
> variety of reasons having to do with the way the type system exposes low
> level details by virtue of Rust being a systems programming language. I do
> not believe that `do` notation could be a useful mechanism for achieving
> either the ergonomics or the performance that async/await syntax will have
> in Rust.

Sorry, but I don't buy it. I had a half-baked, unfinished proposal for an
effects system that would have allowed Rust to implement async/await just as
efficiently (no stackful coroutines) along with any number of other effects
[0]. Maybe it wouldn't have been a good idea due to stretching Rust's
complexity budget too far, but that's very different from saying it's
impossible. Having watched the development of Rust closely I really think that
the design team just didn't understand the theory side well enough to be able
to explore the design space here. (I'm not being as critical as I might sound,
PL
theory is _hard_ and the Rust devs have wielded it _much_ more competently
than the designers of any other non-research language).

[0] [https://internals.rust-lang.org/t/start-of-an-effects-system...](https://internals.rust-lang.org/t/start-of-an-effects-system-rfc-for-async-etc-is-there-any-interest-in-this/7215)

~~~
withoutboats
The first part of your comment is unresponsive to mine; the last part is
pretty rude & factually wrong (we are not _motivated_ to implement an effect
system right now; we understand the theory).

Sticking to the first part: an effect system is not what the user I was
responding to was talking about. They were talking about building do notation
on top of type classes with higher kinded polymorphism, which cannot
effectively abstract over the monadic operations in Rust.

~~~
canndrew2016
> The first part of your comment is unresponsive to mine;

I interpreted OP's comment as complaining about the lack of more general
abstractions in Rust that would allow you to implement async/await. Your
comment specifically mentioned Haskell-style monads (eg. a `Monad` trait), but
that's not the only way to implement something like this.

> the last part is offensive & wrong

Quoting steveklabnik:

> it’s an open research problem if do notation can work in Rust. Until that’s
> solved at all, we’re just not sure it’s possible. ... "Open question"
> doesn't mean "impossible", mind you. But nobody has ever come up with a
> design. In the meantime, we have users to support...

Isn't this what I was saying? "We don't know how to do it, so we're going with
the easier option."

__Edit:__ To be clear, I don't think the async/await we've ended up with is
necessarily in the wrong direction. But I also don't think that "we thoroughly
explored the design space of do/monads/effects and concluded that they were
impossible to implement ergonomically/efficiently" is really true.

~~~
Manishearth
"impossible" is a highly contextual term here. Adding this to Rust isn't
"impossible", of course it isn't. We can "just" slowly turn Rust into Haskell
using the edition mechanism. Done.

When folks say something is "impossible" in such a context, they mean "given
the constraints", which include goals the lang team has for the language. An
effects system is pretty heavyweight and may violate these goals.

~~~
withoutboats
I think that there is no definition of the Monad trait - not just
undesirable, but not possible - that can abstract over all Futures and Iterators
_as implemented in Rust_. You would have to use some kind of trait object &
lose the incredible inlining benefits that Rust gets from how these interfaces
are designed today.

This is separate from effect systems, which I never said was not possible.
rpjohnst's parallel response sums up the key differences between monads and an
effect system.
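The limitation described above can be sketched in modern Rust, where generic associated types have since stabilized. The trait and method names here (`Monad`, `bind`, `Bound`) are illustrative inventions, not from any RFC; the point is where the abstraction runs aground:

```rust
// Hypothetical sketch: with generic associated types (GATs) one can get this far.
trait Monad {
    type Item;
    type Bound<U>: Monad<Item = U>;
    fn bind<U, F>(self, f: F) -> Self::Bound<U>
    where
        F: FnOnce(Self::Item) -> Self::Bound<U>;
}

// Works fine for an eager container like Option...
impl<T> Monad for Option<T> {
    type Item = T;
    type Bound<U> = Option<U>;
    fn bind<U, F>(self, f: F) -> Option<U>
    where
        F: FnOnce(T) -> Option<U>,
    {
        self.and_then(f)
    }
}

// ...but it breaks down for Rust's lazy types. Iterator::map returns
// Map<Self, F>, and a future combinator returns something like Then<Self, F>:
// the closure's *anonymous type* F is part of the result type, and Bound<U>
// has no way to name it. The usual escape hatch is Box<dyn ...>, i.e. exactly
// the type erasure and lost inlining being discussed here.

fn main() {
    let doubled = Some(21).bind(|x| Some(x * 2));
    assert_eq!(doubled, Some(42));
    println!("{:?}", doubled);
}
```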

------
wruza
This thread is so confusing to me. Not because of complex differences in
language theory, but because of the choices developers follow due to
experience in ‘the past’, regarding light threads. I understand that Rust has
no runtime in its stdlib, that Go has one, and that JS just doesn’t have
switchable stacks. But _why_ is almost everyone inclined toward async-await in
general? The answer I got in other threads (not on this exact question, but
along those lines) is that async-await makes runloop-tearing points explicit.
But is it really so important? How does a regular JS guy manage state
encapsulation in between their futures’ callback invocations? All the code I
see is then-then-catch and it doesn’t account for, well, asynchronous effects
like races; it just occurs naturally by not modifying other chains’ states.
Why not just go with seamless coroutines then? Why make functions explicitly
async? (Or sync, if that matters.)

Maybe I’m misunderstanding something, but since people here are so fluent in
concurrent execution, can someone point me to an in-depth explanation of why
light threads, coroutines, current-continuations, etc. are so disfavored
relative to the async keyword (and futures in general) today?

I have some experience with low-level runlooping via coroutines in LuaJIT and
understand it well enough to be able to create an asynchronous system
consisting of a mix of OS threads and coroutines (and it worked smoothly until
our project was closed due to the company’s external issues). I can say I
never felt the need for something different, nor met the ‘complexity’ of the
everything-can-yield rule. And the possibilities it opens up, i.e. the
scalability of simple code around IO and CPU cores, are just outstanding. I am
genuinely curious what’s so great (or different) about futures, which seem to
me, for now, just a poor man’s light threads implemented via lexical closure
overhead along with syntactic snow, running on a single-core CPU. This topic
seems to be so narrow that modern google is too shallow to answer it. I
believe there should be a LtU or similar thread that discusses it in classic
depth. Thanks in advance!

~~~
dpc_pw
Not sure if I fully grasped your question, but here it goes.

Light threads and stackful coroutines require stack allocation. That is their
cost, and it is unbearable for a systems-level language priding itself on
zero-cost abstractions. E.g., AFAIK, it's the main reason calling C code from
Go is slow.

Also, (again AFAIK) Rust stackless coroutines and futures compile down to
state machines, so I don't understand the "lexical closure overhead".
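The state-machine claim can be made concrete. This is an illustrative hand-rolled sketch, not actual compiler output: roughly the shape an `async fn` lowers to, with each await point becoming an enum variant.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// The whole suspended "call stack" of the future is this one enum value:
// no closure per step and no per-step heap allocation.
enum AddLater {
    Start { a: u32, b: u32 },
    Done,
}

impl Future for AddLater {
    type Output = u32;
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        let this = self.get_mut(); // AddLater is Unpin, so this is safe
        match *this {
            AddLater::Start { a, b } => {
                *this = AddLater::Done; // advance the state machine
                Poll::Ready(a + b)
            }
            AddLater::Done => panic!("polled after completion"),
        }
    }
}

// A do-nothing Waker, just enough to call poll() by hand.
fn noop_waker() -> Waker {
    const VTABLE: RawWakerVTable = RawWakerVTable::new(
        |_| RawWaker::new(std::ptr::null(), &VTABLE), // clone
        |_| {},                                       // wake
        |_| {},                                       // wake_by_ref
        |_| {},                                       // drop
    );
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

fn main() {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut fut = AddLater::Start { a: 40, b: 2 };
    assert_eq!(Pin::new(&mut fut).poll(&mut cx), Poll::Ready(42));
}
```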

There's nothing forcing futures to be "running on a single-core cpu" in Rust,
since Rust can reason about thread-safety.

Personally, after a couple of months of using JS at my day job, I dislike JS
very much (as I thought I would), but I love coroutines/yield. I think they
will be glorious in Rust: they will allow writing reasonably nice code with
amazing performance.

~~~
wruza
To clarify: not why extreme requirements discourage heavier methods, but why
people don’t use them for daily jobs that don’t require anything other than
the fastest delivery time.

------
gpderetta
I'm somewhat disappointed that Rust is going with async-style stackless
continuations. I'm a huge fan of stackful coroutines/continuations as they
are much more elegant and flexible. The downside is that they need a full
stack, but I strongly believe (but can't prove) that Rust has enough
annotations and lifetime capabilities that it should be able to guarantee
single-frame allocation (or even no allocation for fully scoped coroutines,
like many generator use cases) in every situation that the async model would.

There is an ongoing discussion in the C++ world between traditional C#-style
async, a more extreme non-type-erased version (similar to the Rust
implementation I think, but unsafe) and stackful coroutines. Some (like me)
hope that a hybrid solution might be possible.

~~~
pcwalton
We already have stackful coroutines. They're called threads. If for some
reason you want M:N threading, we have that too, via the mioco library. (If
you think mioco's M:N threading will provide far superior performance to
regular 1:1 threads, though, you will probably be disappointed.)

If you look at the performance numbers of these approaches, you'll see why
stackless coroutines are desired.

~~~
gpderetta
I want stackful coroutines, with custom, fast user-space scheduling and task
switching with guaranteed optimisation to a single stack frame and no
allocation where possible.

I also want the ability to convert internal iterators to external iterators
with no overhead and even (especially) if the internal iteration function has
not been specifically marked (i.e. no red/blue functions).

Hey, a man can dream.

~~~
eddyb
> guaranteed optimisation to a single stack frame and no allocation where
> possible

You are literally describing _stackless_ coroutines. And the generator state
transform _is that optimization_.

If you want to get this _without_ using generators explicitly, it's still
stackless coroutines, just not how Rust supports stackless coroutines. There
was some discussion about making it more implicit, but no progress was made in
that direction.

~~~
gpderetta
Very much not. I want first class stackful continuation semantics that behave
as stackless in at least all (but ideally more) scenarios where a stackless
continuation would.

------
AndyKelley
This is exciting. It allows programmers to model their problems in code that
fully accounts for the parallelism.

For comparison, here is async/await in Zig:
[https://ziglang.org/documentation/master/#Coroutines](https://ziglang.org/documentation/master/#Coroutines)

Zig decided to go the other way - when you async call a function, it _does_
eagerly evaluate until the first suspend point. This is less overhead than
immediately suspending, plus it removes the dependency of the language feature
on a userland event loop. Users who want the immediate suspend feature can
call a userland utility method of the event loop which suspends and then tail
calls the async function in question.
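For contrast, Rust's design (as described in the article's RFCs and as later stabilized) went the lazy route: constructing a future runs none of its body until an executor polls it. A minimal sketch of that laziness:

```rust
fn main() {
    let mut ran = false;
    {
        // Unlike Zig's eager call-until-first-suspend model described above,
        // building the future executes nothing.
        let _fut = async { ran = true; };
        // _fut is dropped here without ever being polled.
    }
    assert!(!ran, "async block must not run eagerly");
    println!("async block did not run until polled");
}
```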

~~~
dom96
It's exciting to see so many systems programming languages implement
async/await.

For a further comparison, Nim works in the same way as Zig when it comes to
eager evaluation. Something that I'm particularly proud of when it comes to
Nim's async/await implementation is that everything, right down to the macro
which defines what `await` means, is implemented in the standard library. The
compiler only implements the coroutines. This means that the language isn't
bloated by this extra feature and makes it much easier for developers to
implement their own async/await.

I believe there was talk to do the same in Rust, but for some reason the
developers decided to implement it in the compiler instead.

~~~
aidanhs
Can you clarify what you're referring to with "decided to implement it in the
compiler instead", be it from this blog post or elsewhere?

Clearly you need coroutines in the compiler, and the way Rust treats ownership
means pinning is needed in some way (I've not been following closely so don't
know if this is in libstd or a language-level feature), but the rest
(according to this post) seems to be going in libstd so you can write your own
if you want.

~~~
steveklabnik
Pinning is almost entirely a library feature; it does use the unstable “auto
traits” functionality of the language, so it’s a _bit_ special in that regard.
But it’s provided by the standard library.

------
kbumsik
As an embedded-systems dev, I'm wondering if these async features can be
implemented on bare metal (or without a runtime). Maybe I'm dumb and it might
be a wild thought, but it would be great if I could easily integrate async
language features with hardware interrupts.

~~~
Nemo157
Definitely. We’re currently blocked on the builtin await using thread-local
storage, but that’s planned to be removed and replaced with something that
will work without an OS before stabilisation.

I have had the old macro based async code in Rust running on a Cortex M
device, completely runtime free. Once the TLS stuff is sorted I plan to port
this forward to work with the builtin syntax.
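To illustrate why this works without a runtime: the executor contract is just the `Future::poll` loop, which needs no OS, no allocator, and (once the TLS issue mentioned above is fixed) no thread-local storage. The sketch below uses `std` for brevity; `block_on` and the `wfi` comments are illustrative assumptions, not the actual embedded code the parent describes.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Sketch of a runtime-free executor of the sort usable on bare metal.
// On real hardware the waker would set an atomic flag from an interrupt
// handler, and the loop would sleep until the flag is set.
fn block_on<F: Future>(mut fut: F) -> F::Output {
    const VTABLE: RawWakerVTable = RawWakerVTable::new(
        |_| RawWaker::new(std::ptr::null(), &VTABLE), // clone
        |_| {}, // wake: on embedded, set an AtomicBool from the ISR here
        |_| {}, // wake_by_ref
        |_| {}, // drop
    );
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) };
    let mut cx = Context::from_waker(&waker);
    // Safety: `fut` lives in this stack frame and is never moved again.
    let mut fut = unsafe { Pin::new_unchecked(&mut fut) };
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => { /* e.g. cortex_m::asm::wfi() on a Cortex-M */ }
        }
    }
}

fn main() {
    let answer = block_on(async { 6 * 7 });
    assert_eq!(answer, 42);
    println!("{}", answer);
}
```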

~~~
bluejekyll
What’s the TLS stuff to be sorted out?

~~~
steveklabnik
IIRC, the initial implementation of async/await requires TLS. Eventually it
won’t.

~~~
bluejekyll
Oh! For some reason I jumped to Transport Layer Security, not Thread Local
Storage... duh.

Yeah the Pinning stuff is supposed to help with this as I understand.

~~~
steveklabnik
Ah! Super reasonable, yeah. It can be confusing.

------
TekMol
I'm waiting for the day when 'async' will be the default function type and
'await' will be the default type of function call.

If anywhere down the callstack a function needs to await something it has to
become an async function. And this needs to be done to the whole callstack
recursively. So over time more and more functions of every codebase turn into
async functions.

~~~
quotemstr
Having made "async" the default, you've just returned to regular threading,
which is what we should have stuck with all along. The thread model is
actually pretty useful, the pitfalls are well understood, and the tooling very
mature.

~~~
anko
Threading sucks. It's a situation where you have some external process (the
operating system) deciding when different pieces of work should be woken up,
with no way of feeding information back (so it just relies on relatively naive
schedulers).

In practice, most of your threads are in one form of wait loop or another and
you've just got polling both inside the threads and with the scheduler.

Have a look at Erlang if you want a better model :) Erlang "processes"
(different from OS processes) can intelligently wake up only when there is
work for them to do.

For a language to efficiently use cores, it really needs to include its own
scheduling.

~~~
weberc2
To pick some nits and generally elaborate, Erlang's VM also has a scheduler;
it's not the presence or lack of a scheduler, it's how efficient it is and
what guarantees it allows the programmer to make about their system. For
example, OS schedulers are typically pre-emptive, which means your OS thread
can get interrupted anywhere. On the other hand, Go's scheduler (I'm using Go
because I'm more familiar with it than with Erlang) only allows context
switching at well-defined points in your program. Further, operating-system
threads have more overhead than in Go (presumably also Erlang) because they
have a fixed stack size (yes, I know this isn't true for all OSes).

~~~
imtringued
>only allows context switching at well-defined points in your program.

Erlang works the same way. The VM scheduler will only context-switch on a
function call. For and while loops don't exist in Erlang, which means there is
no risk of blocking the scheduler.

~~~
weberc2
I assumed it must, I just wasn’t familiar. Thanks for clarifying!

------
Q6T46nT668w6i3m
I just finished reading the proposal and I’m super impressed and even excited.
I really appreciate the commitment to not leaking implementation details into
the standard library (i.e. exposing only the interface). This is fantastic for
both embedded and (operating) system applications. Nice work everyone.

------
nercury
I want to reflect on the fact that this is still done without any garbage collector.

~~~
UncleEntity
Why would anyone assume that a GC is needed for async/await?

The Future object seems to do all the bookkeeping needed by the borrow checker
(which I believe is what Rust uses to track the lifetime of objects).

~~~
Matthias247
Maybe because most future implementations (e.g. in JavaScript, the proposed
C++ solution, Seastar, etc.) work by scheduling a continuation on an event
loop, which obviously requires at least some form of dynamic memory for
queuing up these continuations.

~~~
gpderetta
I think Seastar futures are allocation-free. Standard C++ futures are not
really a paragon of efficiency or good design. Still no GC though.

~~~
Matthias247
I haven't read the code, but there are 2 areas where I expect allocations:

\- The continuation which is passed to .then, and which is typically a
closure, must be type-erased, which requires an allocation. Storing the
continuation in a std::function would allocate too. A short glance at
[https://github.com/scylladb/seastar/blob/master/core/future....](https://github.com/scylladb/seastar/blob/master/core/future.hh)
also confirms that there is a make_unique there.

\- Since continuations are most likely not called inline on completion but
deferred to the next event-loop iteration, there needs to be a dynamically
sized queue to hold the ready continuations. I am not 100% sure if that's the
case for Seastar too, but I would guess so.

~~~
gpderetta
I do not think the continuation is type-erased. I believe that then, at least
optionally (depending on the actual futurator passed in), can return a future
whose type encodes the continuation type and stores it inline.

The queue might be dynamically sized, which might eventually require
allocation, but that can be amortized across many futures. A large enough
queue might never require reallocation.

I'm also not 100% sure as I have never used seastar.
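Rust's futures take the same approach being guessed at here for Seastar: the continuation is stored inline in the returned future's type, not type-erased. A simplified sketch of a `map`-style combinator (real combinators avoid the `Unpin` bounds via pin projection; the names here are illustrative):

```rust
use std::future::{self, Future};
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// The closure F is a plain field, so it is part of the concrete type
// Map<Fut, F>: no boxing, no type erasure, and the compiler can inline
// straight through poll().
struct Map<Fut, F> {
    fut: Fut,
    f: Option<F>, // Option lets us take the closure exactly once
}

impl<Fut, F, T> Future for Map<Fut, F>
where
    Fut: Future + Unpin,
    F: FnOnce(Fut::Output) -> T + Unpin,
{
    type Output = T;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<T> {
        let this = self.get_mut();
        match Pin::new(&mut this.fut).poll(cx) {
            Poll::Ready(v) => {
                let f = this.f.take().expect("polled after completion");
                Poll::Ready(f(v))
            }
            Poll::Pending => Poll::Pending,
        }
    }
}

// A do-nothing Waker, just enough to call poll() by hand.
fn noop_waker() -> Waker {
    const VTABLE: RawWakerVTable = RawWakerVTable::new(
        |_| RawWaker::new(std::ptr::null(), &VTABLE),
        |_| {},
        |_| {},
        |_| {},
    );
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

fn main() {
    let mut mapped = Map { fut: future::ready(20), f: Some(|x: i32| x + 2) };
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    assert_eq!(Pin::new(&mut mapped).poll(&mut cx), Poll::Ready(22));
}
```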

------
Const-me
I think the hard part here is not language design, it’s implementation of the
runtime.

Linux is especially problematic: asynchronous IO arrived late, years after
IOCP and kqueue. It took multiple kernel versions to make it usable, and
still the APIs are questionable, e.g. files and sockets use different ones.

Even MS failed to do it right, see a bug I found:
[https://github.com/dotnet/corefx/issues/25066](https://github.com/dotnet/corefx/issues/25066)

------
EGreg
I know this may sound silly, but can someone make a quick rundown of the pros
(and maybe cons) of Rust as compared to NodeJS, Go and Erlang? Why would
people use it as opposed to these far more mature ecosystems, especially if
it’s hard to master based on the comments I have seen from Rust users? (Not
trying to be biased, actually want to ask people who do choose it.)

~~~
Retra
Hard to master doesn't mean it isn't worthwhile to master. And the only reason
the Rust ecosystem is not more mature is because not enough effort has been
put into it. By learning the language and working in it, both of the problems
you point out will solve themselves.

This is a classic "being traffic" comparison. If you're in a traffic jam, you
are as much the cause of it as anyone else is. Likewise, if the ecosystem is
immature, you are as much the reason for that as anyone else who doesn't
participate in it.

~~~
EGreg
The same could be said about any charity or relief effort, but typically
people join the ones that are already easy to join.

~~~
Retra
I wasn't meaning to imply any obligation that you should be using Rust or
anything. My point was only that the reasons people do things are already very
familiar to you, so it doesn't do much to be incredulous of their motivations.
I don't watch baseball, but I understand why people do it even if I can see no
real value in it. People are motivated by a rational advantage far less often
than by things like accessibility, familiarity, habit, curiosity, popularity,
culture, counter-culture, identity, or need -- all of which are reasons that
people use programming languages not present in your list.

Perhaps this is just obvious. But I do frequently see people pretending to be
objectively objecting -- complaining about traffic that they are themselves
the cause of as if they didn't already know the answer to their question.

------
api
How about lightweight threads, or are they there already?

Going to Go with its LWT goroutines was awesome. Would never want to go back
to async, which is really just a manual way of implementing LWT.

~~~
steveklabnik
You can sort of think of tasks as green threads. It just really depends on
exactly how you define your terms; this area has a ton of similar sounding
terms that different people define differently.

The downside of Go’s approach is the overhead of calling into C; Rust can’t
afford this. Go can. IMHO both languages are making the correct choices with
regards to their constraints.

------
majewsky
Usually we add notes like (2016) to old articles. In this case it would be
appropriate to add (April 2018) to the title given how fast the futures
ecosystem is moving in Rust.

~~~
pyed
2018 is assumed

~~~
gpm
But April isn't, and (April 2018) is less confusing than just a bare (April).

------
he0001
Why does it have to be either one? Why not the possibility of choosing the
implementation you want to solve the problem? A language where you could
easily run whatever you want would have my vote, not having it shoved down my
throat (I'm looking at you, JS). While it's possible to build most of it
yourself, you should have the possibility to choose.

~~~
sametmax
Be careful with this. I'm not saying you should not do it, but be aware that
you'll have to design it very carefully.

In Python you can choose which event loop to hook async/await into, and hence
we have gevent, qt, uvloop, twisted, asyncio, tornado, trio and curio as
competing implementations.

They are very difficult to mix, and their ecosystems are mostly isolated,
dividing the manpower to add features, fix bugs, provide support, create libs
or frameworks and write docs or tutorials.

Another problem is that you have (except for gevent, which causes other
problems by monkey-patching the stdlib) to set up the event loop explicitly.

Those mechanisms are complicated, easy to get wrong, confuse beginners, and
make docs' introductions long and annoying or misleading before getting to
anything interesting.

E.g. to use asyncio, you are exposed to an event loop, an event loop policy,
awaitables, coroutines, coroutine functions, futures, tasks and task factories.

But there is worse... You can set up any event loop any way, and at any time,
you want, and because the API is public, another lib can come and swap it. No
lib to my knowledge provides any form of locking.

This leads to some weird situations where libs are considered so low-level you
end up writing wrappers on top of them (e.g.
[https://github.com/Tygs/ayo](https://github.com/Tygs/ayo)) just to be able to
start using them sanely.

Now compare with JS.

I'm really not a fan of the language. However, even when you can't use
async/await, using a promise is straightforward. You don't have to bother with
creating the loop, starting it, stopping it, or cleaning up after it has
stopped. You don't have to wonder if somebody is going to swap the loop. You
don't have to get a reference to a loop to schedule anything. Actually you can
mostly ignore the loop and just code the solution to your problem.

Now this makes JS dependent on one loop implementation for each runtime. Also
you can't code any new async feature in JS, only use the existing ones. It's
probably not what you want for a language like rust.

However, you should learn from the Python ecosystem's fragmentation and overly
exposed low-level API to avoid the same mistakes.

Somebody who just wants to use async/await should not have to learn how the
implementation works in detail, nor take so many precautions to avoid
implementation lock-in or breaking somebody else's work.

And you really want a federated ecosystem. Having 7 incompatible websocket
libs sucks.

~~~
timClicks
Concur.

Although perhaps the situation in Python could have been mitigated somewhat if
a protocol had been defined early that implementations could have adopted?
Context managers, iterators and decorators all work nicely together, even
across language boundaries via the extensions API.

~~~
sametmax
> Although perhaps the situation in Python could have been mitigated somewhat
> if a protocol had been defined early that implementations could have adopted?

Yes. That's one thing the Rust community has to get right.

async/await was supposed to be that, but it's only a protocol to define what
blocks/doesn't and when you allow context switching. An event loop also has
the notion of scheduling, getting a reference to what is scheduled, requesting
the result or error of said scheduled thing, or cancelling it. An event loop
must bridge different implementations of concurrency (e.g.
asyncio.run_in_executor). An event loop also has a life cycle, which includes
at the very least a setup and a tear down. An event loop must integrate into
an environment: what do you do when you have several loops, or if you run one
loop per thread?

So you need to define a general behavior for all that. Then let anyone write
the implementation the way they want.

~~~
steveklabnik
That’s all of this: [https://doc.rust-lang.org/nightly/core/task/](https://doc.rust-lang.org/nightly/core/task/)

~~~
sametmax
That's an API. Like asyncio in Python is an API. If you make the event loop
swappable without making this API mandatory, however, it's never going to be a
protocol.

~~~
steveklabnik
The API is mandatory; it’s how async/await works.

~~~
sametmax
So everybody has to implement a task for any alternative event loop and accept
tasks from other implementations?

~~~
pimeys
Yep. You build your event loop to take structs implementing the Future trait.
The futures don't care about the executor, and it's easy to switch from an
event loop running on the current thread to a thread pool, if needed.

~~~
sametmax
Great. I wish we had done the same.

