Async and Await in Rust: a full proposal (boats.gitlab.io)
388 points by KenanSulayman 8 months ago | 190 comments

Note that this post is from April, and a lot of experimentation and reams of discussion and debate have taken place regarding the details of the two RFCs in question (along with other related RFCs, such as the one for the Pin trait). In fact, one of the RFCs linked at the top of the OP has been closed and superseded by a newer one: https://github.com/rust-lang/rfcs/pull/2418 . I've actually heard that tomorrow (Monday) the futures and networking working groups will begin a series of regular blog posts on design and implementation leading up to the release of Futures 0.3 (which IIRC intends to be more-or-less the final design for Futures 1.0).

As for this article, I think the broad implementation details that it talks about are largely still accurate, just don't take it as gospel. :)

And yet we somehow don't acknowledge the fact that this is just do-notation for some specific monad--yeah, that powerful abstraction that can't be expressed in Rust because we don't allow higher-order polymorphism. Don't get me wrong, i'm bitter because i feel like Rust really is headed almost in the right direction for the future of language design. Yet it will be a long time before we get a language with a really precise type system (linear, "pure") and thus a highly efficient code generator (~rust), that at the same time has all the higher abstraction goodies, maybe even dependent types (cf quantitative type theory [1]). Of course they took that into account, it's even explicitly mentioned in the rfc, and it got turned down for being too experimental, but at some point you gotta make a bold step forward and force people to actually care about abstractions.

Now that Rust is getting some maturity i worry that it gets too complex in too many directions just because it's not powerful enough to express the few common abstractions, of which several instances are being added as distinct concepts (async&try, const datakind, higher-order poly--especially for lifetimes, named impls, before-after memory state for references...).

[1] https://bentnib.org/quantitative-type-theory.pdf

I don't have time to write a whole essay, so let me just establish my credentials:

- I wrote the generic associated types RFC (how Rust will implement higher kinded polymorphism).

- I wrote the const generics RFC (the closest Rust will get to dependent types).

- I wrote the async/await RFC, as well as the linked blog post.

That is to say that I am intimately familiar with how Rust's type system can be extended to support more "powerful" abstractions.

Monads as implemented in pure functional programming languages like Haskell cannot usefully abstract over asynchronous and synchronous IO in Rust for a variety of reasons having to do with the way the type system exposes low level details by virtue of Rust being a systems programming language. I do not believe that `do` notation could be a useful mechanism for achieving either the ergonomics or the performance that async/await syntax will have in Rust.

I'm responding to you because you're the top comment, but I could write a similar response to a lot of comments here. Monads, stackful coroutines, green threads, CSP, etc - we've heard of them! :) We have well-motivated reasons to choose async/await: it's the only solution that meets our requirements.

Didn't mean to bash the lang design team at all; actually i was kinda hoping for you folks to reply with interesting and tricky stuff about how things are not so simple.

This is probably only because you didn't develop the answer fully, but i still struggle to see how monads (or other structures in that family) couldn't apply here: they aren't tied to any implementation and just provide an interface (e.g. not caring about rust at all, the semantics of the feature here are very monadic; it should mostly behave like the CPS monad). Anyway, i'm gonna stop arguing, look at how it is/will be implemented, and hopefully wait for that essay of yours, to have a broader picture.

Here are three problems:

Higher kinded polymorphism results in trivially undecidable type inference without something like currying; the restrictions needed to support it would be arbitrary and weird given that Rust does not have currying (essentially, some rules to reconstruct the restrictions of currying in type operator contexts).

Instances of both Future and Iterator do not implement the Monad type class as defined in Haskell, because the "fmap" operator does not return the "self" type parameterized by a new type; it returns its own new type. This is because the state machine they represent is leaked through their function signature.

Do notation doesn't work with the imperative control flow that Rust has, which other people have already discussed.
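The second point can be observed directly in the standard library (a small sketch): `Iterator::map` does not return `Self` parameterized by a new item type, as a Haskell-style `fmap` would; it returns a brand-new adapter type, `std::iter::Map<Self, F>`, whose name encodes the whole pipeline.

```rust
// Generic helper to inspect the concrete type of a value.
fn type_name_of<T>(_: &T) -> &'static str {
    std::any::type_name::<T>()
}

// The return type of `.map` is a new adapter, not the original
// iterator type with a different item parameter.
fn mapped_iterator_type() -> String {
    let mapped = vec![1, 2, 3].into_iter().map(|x| x * 2);
    type_name_of(&mapped).to_string()
}

fn main() {
    // Prints something like
    // "core::iter::adapters::map::Map<alloc::vec::into_iter::IntoIter<i32>, ...>"
    // -- the state machine is spelled out in the type itself.
    println!("{}", mapped_iterator_type());
    assert!(mapped_iterator_type().contains("Map"));
}
```

This is what "the state machine is leaked through the signature" means: a `Monad` trait would need `map`/`bind` to return `Self<B>`, but these APIs return a differently named type on purpose, so that composition stays fully concrete and inlinable.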

You don't need HKTs to implement 'do' notation. A good example is LINQ query syntax from C#! You actually have full-blown monadic comprehensions that you can use to easily write 'flattened' (i.e., no async pyramid of doom) Future/Promise code.

A better example would be computation expressions in F#: the async {} and query {} builders are super easy to work with and don't rely on HKTs to work (since there's no such thing in .Net). I think such a design would work pretty well in Rust, and doesn't require adding a bunch of single-purpose keywords - something I wish the C#/.Net team had done, considering F# had async first.

Async blocks require a keyword in Rust because they would otherwise conflict with struct constructor syntax.
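A minimal illustration of the ambiguity (the `Demo` struct here is made up): `Name { ... }` is already struct-literal syntax, so a block-shaped expression needs the keyword to tell the parser it is a future.

```rust
struct Demo {
    x: u32,
}

fn main() {
    let s = Demo { x: 1 };      // `Name { ... }`: struct-literal syntax
    let f = async { 41 + s.x }; // same surface shape; only the keyword
                                // tells the parser this builds a future
    let _ = f;                  // never polled; just showing it parses
    assert_eq!(s.x, 1);
}
```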

F# works around this by having expression builders be normal types; `async` is just an alias for Control.AsyncBuilder (actually, it's an instance of it, if I want to be technical), and it's not even a reserved word in the language specification. This is why I feel such an implementation would be a great fit for Rust: the parser already has to figure out the different contexts in which curly braces can be used (struct initializers, lexical scoping constructs, closures, the list goes on) - you can avoid adding an extra reserved keyword and get support for more than just async expressions.

Async is strictly more limited than the more generic LINQ comprehension syntax.

Actually, in F# computation expressions are way more powerful, as they can define their own "keywords" within given semantics. This is, for example, how the `query{}` computation expression works - and unlike LINQ it can support things like left joins natively, just because the entire SQL-like syntax is defined by the library, not hardcoded into the language. Same for async, yield generators, etc.

> Monads as implemented in pure functional programming languages like Haskell cannot usefully abstract over asynchronous and synchronous IO in Rust for a variety of reasons having to do with the way the type system exposes low level details by virtue of Rust being a systems programming language. I do not believe that `do` notation could be a useful mechanism for achieving either the ergonomics or the performance that async/await syntax will have in Rust.

Sorry, but I don't buy it. I had a half-baked, unfinished proposal for an effects system that would have allowed Rust to implement async/await just as efficiently (no stackful coroutines) along with any number of other effects [0]. Maybe it wouldn't have been a good idea due to stretching Rust's complexity budget too far, but that's very different from saying it's impossible. Having watched the development of Rust closely, I really think that the design team just didn't understand the theory side well enough to be able to explore the design space here. (I'm not being as critical as I might sound; PL theory is hard and the Rust devs have wielded it much more competently than the designers of any other non-research language.)

[0] https://internals.rust-lang.org/t/start-of-an-effects-system...

A Rust-flavored effect system is quite a bit different from do-notation. You're probably right that such a thing could achieve the same ergonomics and efficiency as built-in async/await, but you're also probably right that it would be a huge amount of complexity.

When people say that monads and do-notation don't work in Rust, they're talking about a user-level Monad trait and a simple CPS transform built on top of it:

- Such a monad trait is impossible to write even with HKT because it would have to abstract over both Option/Result (type constructors) and Iterator/Future (traits)

- Such a CPS transform would be extremely limited (not composable with built-in loop structures) and/or extremely tricky around TCE, lifetimes, and allocation (the typical type for `>>=` would involve `Fn` trait objects...).

Effects bypass this by leaving the CPS transform in the compiler, instead only exposing delimited continuations to userspace. Which is basically what async/await does, just non-generically.
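For a feel of what a user-level CPS transform costs, here is a toy "thunk" monad written against boxed closures. Every bind allocates a Box for the continuation, which is exactly the overhead the compiler-side transform avoids. (This is an illustrative toy, not real futures or any proposed API.)

```rust
// A deferred computation: the simplest possible "monadic" value.
type Thunk<T> = Box<dyn FnOnce() -> T>;

// return/unit: wrap a plain value.
fn unit<A: 'static>(a: A) -> Thunk<A> {
    Box::new(move || a)
}

// bind (>>=): sequence a computation with a continuation.
// Note the boxed continuation -- one heap allocation per step.
fn bind<A: 'static, B: 'static>(
    m: Thunk<A>,
    f: Box<dyn FnOnce(A) -> Thunk<B>>,
) -> Thunk<B> {
    Box::new(move || f(m())())
}

fn main() {
    // (return 20) >>= \x -> return (x + 22)
    let prog = bind(unit(20), Box::new(|x| unit(x + 22)));
    assert_eq!(prog(), 42);
}
```

Async/await gets the same sequencing by compiling the whole chain into one flat state machine, so no per-step boxes are needed.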

The first part of your comment is unresponsive to mine; the last part is pretty rude & factually wrong (we are not motivated to implement an effect system right now; we understand the theory).

Sticking to the first part: an effect system is not what the user I was responding to was talking about. They were talking about building do notation on top of type classes with higher kinded polymorphism, which cannot effectively abstract over the monadic operations in Rust.

> The first part of your comment is unresponsive to mine;

I interpreted OP's comment as complaining about the lack of more general abstractions in Rust that would allow you to implement async/await. Your comment specifically mentioned Haskell-style monads (eg. a `Monad` trait), but that's not the only way to implement something like this.

> the last part is offensive & wrong

Quoting steveklabnik:

> it’s an open research problem if do notation can work in Rust. Until that’s solved at all, we’re just not sure it’s possible. ... "Open question" doesn't mean "impossible", mind you. But nobody has ever come up with a design. In the meantime, we have users to support...

Isn't this what I was saying? "We don't know how to do it, so we're going with the easier option."

Edit: To be clear, I don't think async/await we've ended up with is necessarily in the wrong direction. But I also don't think that "we thoroughly explored the design space of do/monads/effects and concluded that they were impossible to implement ergonomically/efficiently" is really true.

"impossible" is a highly contextual term here. Adding this to Rust isn't "impossible", of course it isn't. We can "just" slowly turn Rust into Haskell using the edition mechanism. Done.

When folks say something is "impossible" in such a context, they mean "given the constraints", which include goals the lang team has for the language. An effects system is pretty heavyweight and may violate these goals.

I think that there is no definition of the Monad trait - not just undesirable, but not possible - that can abstract over all Futures and Iterators as implemented in Rust. You would have to use some kind of trait object and lose the incredible inlining benefits that Rust gets from how these interfaces are designed today.

This is separate from effect systems, which I never said was not possible. rpjohnst's parallel response sums up the key differences between monads and an effect system.

I'm scratching my head at the premise of this comment. The grandparent is positing that monads/do-notation would not be useful in Rust; this one appears to be trying to accuse it of saying that an effect system in Rust is impossible? Nowhere does withoutboats say anything about an effect system, or that anything is impossible. Did you reply to the wrong comment?

> Having watched the development of Rust closely I really think that the design team just didn't understand the theory side well enough to be able explore the design space here.

You might want to do a Google Scholar search on Niko Matsakis and Aaron Turon before making silly claims like this.

Off topic: wow, withoutboats, you didn't have an account here before? Welcome! It's great to see you coming here and giving many in-depth and very precise answers.

Have you considered algebraic effects and handlers? If you add a linearity restriction on the return continuations (easily doable with the existing type system of Rust) their implementation is no harder than async/await, yet they can express many useful monadic abstractions.

...I saw this thread too late, hopefully you will still see this comment. I'm genuinely curious.

There's a sibling subthread about this. https://news.ycombinator.com/item?id=17538191

Supporting do notation or not isn’t some sort of position we’re taking as a language; it’s an open research problem if do notation can work in Rust. Until that’s solved at all, we’re just not sure it’s possible.

(Previous version of this comment said "HKT" but that's not purely right; we have a path forward for something at least HKT-like, but do notation is the bigger question.)

Yeah, sure, i understand that; hn comments just need to be a bit spicy (btw, any refs to papers or other stuff in that area? i didn't follow closely but didn't see much moving; actually i don't even know what the big question marks are). Although i really believe it's not about if but how, so let's hope that when the time comes Rust will be able to take the tough changes necessary to unify the different branches that split off.

Like, what is the real roadblock, since we're talking here about abstractions that are gonna be monomorphised (at most into lifetime-polymorphic functions, which get erased anyway) and known statically, right? Is it only a problem of getting the typing rules consistent and robust?

There's multiple issues. The biggest ones that I can recall, off the top of my head:

First off, do notation desugars to closures. But Rust has three different kinds of closures. How does that work?

Furthermore, Rust has imperative control structures. How does all of this interact with each other? Are you now no longer allowed to use those constructs inside of do notation? That feels inconsistent.

I don't know of any languages that have something like do notation but don't have everything boxed. It makes everything easier, but that's not generally acceptable in Rust. For example, we can use async/await with no dynamic allocation. Could we with do notation?

"Open question" doesn't mean "impossible", mind you. But nobody has ever come up with a design. In the meantime, we have users to support...
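For reference, the three closure kinds a desugaring would have to pick between (a small sketch; the `takes_*` helpers are made up for illustration):

```rust
// Fn: can be called repeatedly through a shared borrow.
fn takes_fn(f: impl Fn() -> usize) -> usize {
    f()
}

// FnMut: may mutate captured state, so needs `&mut self` to call.
fn takes_fn_mut(mut f: impl FnMut()) {
    f();
    f();
}

// FnOnce: consumes its captures, so it can be called only once.
fn takes_fn_once(f: impl FnOnce() -> String) -> String {
    f()
}

fn main() {
    let s = String::from("hi");
    let by_ref = || s.len(); // Fn: immutably borrows `s`
    assert_eq!(takes_fn(by_ref), 2);

    let mut n = 0;
    takes_fn_mut(|| n += 1); // FnMut: mutably borrows `n`
    assert_eq!(n, 2);

    let consuming = move || s; // FnOnce: takes ownership of `s`
    assert_eq!(takes_fn_once(consuming), "hi");
}
```

A do-notation desugaring would have to decide which of these each `<-` step produces, and that choice changes what the surrounding code is allowed to do with the captured state.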

Thanks, that's interesting; i should dive a bit more into how async is implemented. On "which closure kind to choose", i feel like that's not the worst part (closure syntax already infers which trait to implement), but the fact that it is a closure, and how that fares with allocation, seems nontrivial.

For the moment, you can see the progress on empowering the type system in the RFC for generic associated types, which is intended both to be the simplest extension to the type system that allows for some important patterns (e.g. streaming iterators and collection traits) and to be forward-compatible with any additional extensions in the direction of HKT. Importantly, it's also intended to be intuitive for users who have no prior experience with higher-kindedness and usable without (hopefully) requiring anyone to learn anything about type theory. The RFC text can be read at https://github.com/rust-lang/rfcs/blob/master/text/1598-gene... and comments can be left on the tracking issue at https://github.com/rust-lang/rust/issues/44265 .

Note also that HKT and do notation are their own separate things; you can have HKT without do notation. That's all HKT stuff, not do notation stuff.

If you have HKT then isn't do notation a piece of cake? It just desugars into monadic bind and return operators. Or is there something specific to rust that would mess with that?

Imperative control flow, for one. Think about how you would implement "break" that can break out of multiple nested loops in do-notation…

Please see my comment here: https://news.ycombinator.com/item?id=17537878

I thought I had read at some point that generic associated types were equivalent to HKTs in terms of what they can express. Or maybe that they were theorized to be so. Is that not the case? (Not critical, just curious)

I have been doing programming, including functional programming, for more than two decades, and I still don't really know what a "monad" is. Each time someone explains it to me, I understand something different.

The way I accepted it after years of confusion, and still without understanding the word: A monad is any type which supports 2 specific operations. One maps a type into its monadic form, and the other applies a function to a monadic value and returns the monadic type again. The latter is sometimes called bind or flatmap.

For Future types, the first thing could be called MakeReadyFuture(type), the second one Future.then(result => functionWhichReturnsANewFuture).

... which allows a generic composition enabling imperative programming in a purely functional language. Haskell has do notation, that sugars the syntax to look like one is writing imperative code, even though it's all about composing Monads.
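Sketched concretely in Rust with Option (the `half` helper below is made up for illustration): `Some` plays the "wrap a value" role (return/unit), and `and_then` is bind/flatmap -- it applies a function that itself returns an Option, without nesting Options.

```rust
// A fallible step: succeeds only for even numbers.
fn half(x: i32) -> Option<i32> {
    if x % 2 == 0 {
        Some(x / 2)
    } else {
        None
    }
}

fn main() {
    let wrapped: Option<i32> = Some(8); // unit: i32 -> Option<i32>
    let result = wrapped.and_then(half).and_then(half); // bind, twice
    assert_eq!(result, Some(2));

    // A failing step short-circuits the whole chain.
    assert_eq!(Some(3).and_then(half), None);
}
```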

Monads are containers for which .map and .flatMap (>>= in Haskell) make sense.

Future, Option, List, State/IO, etc.

And the do-notation that some comments mention becomes a long list of flatMap/filter/map invocations.

  do {
    record   <- getDBrecord(username, passw)
    uid       = record.id
    username  = record.name
    _        <- setUserLastLoggedIn(uid, now())
    _        <- checkUserACL(uid, operation.roles)
  } yield username
This is just an abstraction, and it's not exactly a pretty one, but better than the flatMap hell in functional languages.

The do-notation uses the same monad throughout, in this example the Future one. The final yield is a map, the "<-" lines all represent a flatMap.
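The same desugaring, sketched in Rust with Option standing in for Future (all of the helper functions here are hypothetical stand-ins): each `<-` line becomes an `and_then`, and the final yield becomes a `map`.

```rust
// Hypothetical stand-ins for the DB/auth calls in the do-block above.
fn get_db_record(_user: &str, _pass: &str) -> Option<(u32, String)> {
    Some((7, "alice".to_string()))
}

fn set_user_last_logged_in(_uid: u32) -> Option<()> {
    Some(())
}

fn check_user_acl(_uid: u32) -> Option<()> {
    Some(())
}

// do { record <- getDBrecord ...; ... } yield username, desugared:
fn login_flow() -> Option<String> {
    get_db_record("alice", "passw").and_then(|(uid, name)| {
        set_user_last_logged_in(uid)
            .and_then(|_| check_user_acl(uid))
            .map(|_| name) // the final `yield` is a map
    })
}

fn main() {
    assert_eq!(login_flow().as_deref(), Some("alice"));
}
```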

Finally, Monads are very much like Vector Spaces in Maths. You have Axioms for both, and if the object satisfies them, it's a Monad or Vector Space!



I would remove "containers" from the first line. In my reading it is ANYTHING for which monad laws hold.

For better or worse, "container" is the most intuitive concept to use when trying to explain monads. Yes, there are monad instances that don't really fit the container model, but they're not important at first. Defining "monad" as a thing for which monad laws hold is exactly the sort of almost-tautological non-explanation that makes people roll their eyes at FP aficionados.

However, I agree that the important step in understanding monads as an abstraction is realizing how map and flatmap make sense for many things other than lists.

> Definining "monad" as a thing for which monad laws hold is exactly the sort of almost-tautological non-explanation that makes people roll their eyes at FP aficionados.

Thank-you for crystallizing and articulating a frustration I have often, as a beginner to intermediate Haskeller.

I guess flatMap is very misleading to newcomers. Probably simply calling it chaining, composition, or even "then" would help.

I like chaining, that's a good analogy.

I feel like, now that I have a certain level of intuitive understanding of what bind does, to me, the `>>=` symbol best represents what the operation does. It just looks like some kind of physical gadget that extracts a thing from a container, does something to it, and injects it into the next container. This is pretty ironic given Haskell's (somewhat deserved) reputation as impenetrable symbol soup. I don't know whether I'd endorse trying to use an intuitive mnemonic like this to explain monads to FP beginners.

I know a lot of symbols, but I still get irritated when I encounter a new one, and I can't search for it. (At least math got a nice list: https://en.wikipedia.org/wiki/List_of_mathematical_symbols )

For math it's kind of okay. Because there the point is to talk about that theory. You introduce definitions, and theorems, and use them in proofs. Or in calculations.

And even in math proofs usually come with a lot of explanation. (At least the better ones.)

In software engineering maintainability is important.

And, sure, we can just accept that Haskell is something that you can't learn by looking at real world Haskell code. After all, you can't really learn real world "research math" by looking at it.

But I think real world code, especially one that is looking for maintainers should not err on the side of indecipherability and inapproachability.

I continue to chastise Scalaz for its bad naming and documentation convention. (Haskell is ... well, it's irredeemable.)

And TypeScript is getting into this mess too. New releases come with new type system goodies (and I really mean it, I like powerful type systems, I just don't want to spend my life on understanding them; I'm happy to use them to get particular jobs done), but with barely enough documentation to let serious TS users with many years of experience understand them.

That's true, although it's better than "bind".

They have something in them (sometimes just a very fuzzy non-explicit state, maybe a seed for an RNG, but there's something there; otherwise it'd be just a value). Every monad is a [type] constructor, wrapping something.

Or could you show a counterexample please?

Several people found my blog entry on this enlightening: http://blog.reverberate.org/2015/08/monads-demystified.html

Just to add more different explanations, since that'll help.


This one is my favourite, and does it well by example and walking through three different cases, then generalizing.

This thread is so confusing to me. Not because of complex differences in language theory, but because of choices that developers follow due to experience in 'the past', regarding light threads. I understand that rust has no runtime in its stdlib, that go has one, and that js just doesn't have switchable stacks. But why is almost everyone inclined toward async-await in general? The answer I got in other threads (not on this exact question, but along those lines) is that async-await makes suspension points explicit. But is it really so important? How does a regular js guy manage state encapsulation between their futures' callback invocations? All the code I see is then-then-catch, and it doesn't account for, well, asynchronous effects like races; correctness just occurs naturally by not modifying other chains' states. Why not just go with seamless coroutines then? Why make functions explicitly async? (Or sync, if that matters.)

Maybe I’m misunderstanding something, but since people here are so fluent in concurrent execution, can someone point me to an in-depth explanation of why light threads, coroutines, current-continuations, etc. are so disfavored compared to the async keyword (and futures in general) today?

I have some experience with low-level runlooping via coroutines in luajit, and understand it well enough to have been able to create an asynchronous system consisting of a mix of os threads and coroutines (and it worked smoothly until our project was closed due to the company’s external issues). I can say I never felt the need for something different, nor met the ‘complexity’ of the everything-can-yield rule. And the possibilities that open up, i.e. the scalability of simple code across io and cpu cores, are just outstanding. I am genuinely curious what’s so great (or different) in futures, which seem to me, for now, just poor man’s light threads implemented via lexical closure overhead along with syntactic snow, running on a single-core cpu. This topic seems to be so narrow that modern google is too shallow to answer it. I believe there should be a LtU or similar thread that discusses it in classic depth. Thanks in advance!

Not sure if I fully grasped your question, but here it goes.

Light threads and stackful coroutines require stack allocation. That is their cost, and it is unbearable for a systems-level language priding itself on zero-cost abstractions. E.g., AFAIK it's the main reason calling C code from Go is slow.

Also, (again AFAIK) Rust stackless coroutines and futures compile down to state machines, so I don't understand the "lexical closure overhead".
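To make the state-machine claim concrete, here is a hand-rolled, heavily simplified version of what the compiler generates for a coroutine with one suspension point (no real Waker/Pin plumbing; all names here are made up for illustration):

```rust
// A stand-in for std::task::Poll, to keep the sketch self-contained.
enum SimplePoll<T> {
    Ready(T),
    Pending,
}

// The "async fn" becomes an enum: one variant per suspension point,
// storing exactly the locals that are live at that point.
enum TwoStepFuture {
    Start,
    Waiting { partial: i32 },
    Done,
}

impl TwoStepFuture {
    fn poll(&mut self) -> SimplePoll<i32> {
        match *self {
            TwoStepFuture::Start => {
                // The code before the first yield point runs here.
                *self = TwoStepFuture::Waiting { partial: 40 };
                SimplePoll::Pending
            }
            TwoStepFuture::Waiting { partial } => {
                // Resumed: the rest of the body runs.
                *self = TwoStepFuture::Done;
                SimplePoll::Ready(partial + 2)
            }
            TwoStepFuture::Done => panic!("polled after completion"),
        }
    }
}

fn main() {
    let mut fut = TwoStepFuture::Start;
    assert!(matches!(fut.poll(), SimplePoll::Pending));
    assert!(matches!(fut.poll(), SimplePoll::Ready(42)));
}
```

No stack is allocated for the suspended computation; its entire state is the enum itself, whose size is known at compile time.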

There's nothing forcing futures to be "running on a single-core cpu" in Rust, since Rust can reason about thread-safety.

Personally, after a couple of months of using JS at my dayjob, I very much dislike JS (as I thought I would), but I love coroutines/yield. I think they will be glorious in Rust: they will allow writing reasonably nice code with amazing performance.

To clarify: not why extreme requirements discourage heavier methods, but why people don’t use them for daily jobs that don’t require anything other than the fastest delivery time.

I have often wondered this as well and never quite found a fully satisfactory answer. I imagine that it has to do with ease of implementation, although there's an obvious and efficient implementation of making coroutines one-to-one with stacks and allowing them to yield between each other. In constrained environments like JVM and CLR, maybe that can't be done, which might explain C#'s choice of async/await. Perhaps in Rust it has to do with the interaction with the borrow checker?

Does async/await alleviate the need for a runtime component? Because then I could see the decision make sense, else I am also really puzzled why anyone would not, given the opportunity, go with green threads (other than complex implementation, maybe).

Yes, it does. Additionally, all of this stuff:


I'm somewhat disappointed that rust is going with async-style stackless continuations. I'm a huge fan of stackful coroutines/continuations, as they are much more elegant and flexible. The downside is that they need a full stack, but I strongly believe (but can't prove) that rust has enough annotations and lifetime capabilities that it should be able to guarantee single-frame allocation (or even no allocation for fully scoped coroutines, like many generator use cases) in every situation that the async model would.

There is an ongoing discussion in the C++ world between traditional C#-style async, a more extreme non-type-erased version (similar to the rust implementation, I think, but unsafe) and stackful coroutines. Some (like me) hope that a hybrid solution might be possible.

This is already true with “tasks”; the chain of futures has a known stack size when you submit it to the event loop. It allocates the “stack” for the entire thing at once. So while the primitive may not have them, it’s still sorta there. This is only possible because the individual generators are stackless; if they were stackful, each future would get a stack individually.

You emphasize that stackful coroutines could get down to a single allocation, but I'm not sure exactly what you mean. I could trivially get a stackful coroutine down to a single allocation, the same way that system threads do it: by allocating a defined stack size and raising a seg fault when you go over it.

The trick that "stackless coroutines" can achieve, and do achieve in Rust, is not just getting a single allocation, but getting a single perfectly sized alloc every time. It is exactly big enough to hold all the data it could ever need to hold, no more no less. You mention we have a lot of annotations, but the annotations that I know of that can achieve that are annotating every yield point so we can compute the stack space we need to store at that point, which is exactly what async/await notation is for.
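The perfectly-sized property is observable on stable Rust: the size of an async block's future is known statically and includes exactly the locals that live across an `.await` (a sketch; exact byte counts are compiler-dependent):

```rust
// Compare the statically computed state sizes of two async blocks.
fn future_sizes() -> (usize, usize) {
    let small = async { 1u8 };
    let big = async {
        // `buf` is read after the await, so it must be stored in the
        // future's state across the suspension point.
        let buf = [0u8; 256];
        std::future::ready(()).await;
        buf[0]
    };
    (std::mem::size_of_val(&small), std::mem::size_of_val(&big))
}

fn main() {
    let (small, big) = future_sizes();
    // Both sizes are known at compile time, with no heap allocation:
    // the 256-byte buffer shows up directly in the state size.
    println!("small: {} bytes, big: {} bytes", small, big);
    assert!(big >= 256 && big > small);
}
```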

By single stack frame I mean exactly that: a single perfectly sized (as it is known at compile time) allocation, as opposed to a potentially runtime-unbounded list of stack frames requiring a full stack (whatever form that might take). I think that the optimization of converting the unbounded stack to a fixed-size stack frame can be done for stackful coroutines iff the continuation of the calling coroutine (i.e. the one that was suspended to activate the current one) never escapes the coroutine function or any non-mutually-recursive function called from the coroutine that the compiler can see through (either via inlining or interprocedural optimization).

Interestingly, this is similar to the analysis required to completely optimize away the coroutine stack-frame allocation. And because Rust's type system is good at tracking lifetimes and nested scopes, I think it might be possible to extend it to guarantee this sort of optimization. Unfortunately this is way beyond my pay grade.

We already have stackful coroutines. They're called threads. If for some reason you want M:N threading, we have that too, via the mioco library. (If you think mioco's M:N threading will provide far superior performance to regular 1:1 threads, though, you will probably be disappointed.)

If you look at the performance numbers of these approaches, you'll see why stackless coroutines are desired.

I would also mention https://github.com/edef1c/libfringe which pretty much solves context-switches (by replacing them with compiler-generated minimal stack spills/restores).

But as you might be able to tell from their APIs, you still have to allocate stacks somehow.

I want stackful coroutines, with custom, fast user-space scheduling and task switching, with guaranteed optimisation to a single stack frame and no allocation where possible.

I also want the ability to convert internal iterators to external iterators with no overhead, even (especially) if the internal iteration function has not been specifically marked (i.e. no red/blue functions).

Hey, a man can dream.

> guaranteed optimisation to a single stack frame and no allocation where possible

You are literally describing stackless coroutines. And the generator state transform is that optimization.

If you want to get this without using generators explicitly, it's still stackless coroutines just not how Rust supports stackless coroutines. There was some discussion about making it more implicit but no progress was made in the implicit direction.

Very much not. I want first class stackful continuation semantics that behave as stackless in at least all (but ideally more) scenarios where a stackless continuation would.

Undelimited continuations are memory sieves. Delimited continuations are better but a bit less useful.

Genuine question: aren't all one shot continuations (like the ones this thread is about) naturally delimited? Is there even such a thing as an undelimited one shot continuation?

Sure there is: take an undelimited continuation API, and restrict the resulting continuations to be called only once. Most use cases still work just fine, and I think it would be strange to call that anything other than "undelimited one-shot continuations".

Maybe what you have in mind, though, is the traditional argument, that there's no such thing as undelimited continuations. All continuations are naturally "delimited" by the boundary of the language runtime, operating system, etc, even with undelimited, multi-shot continuations. One-shot continuations make this argument more natural, because they have an obvious stack-based implementation for both delimited and undelimited continuations, and stacks are clearly delimited.

I guess that's what I have in mind. Any practical coroutine API has an explicit coroutine creation and yielding point that act as continuation delimiters.

Mm, not quite. Racket's greenthreads are incredible, and they don't use explicit yielding. I've always wanted something similar for JS.

This is exciting. It allows programmers to model their problems in code that fully accounts for the parallelism.

For comparison, here is async/await in Zig: https://ziglang.org/documentation/master/#Coroutines

Zig decided to go the other way - when you async call a function, it does eagerly evaluate until the first suspend point. This is less overhead than immediately suspending, plus it removes the dependency of the language feature on a userland event loop. Users who want the immediate suspend feature can call a userland utility method of the event loop which suspends and then tail calls the async function in question.

> when you async call a function, it does eagerly evaluate until the first suspend point. This is less overhead than immediately suspending, plus it removes the dependency of the language feature on a userland event loop.

This seems wrong. There's no more overhead for Rust's approach- which to be clear immediately suspends only the callee, not the entire stack of async functions. In fact, if you immediately await the future, the control flow is literally no different from eagerly evaluating until the first suspend point.

There is also already no dependency on a userland event loop. This is, again, because only the callee is immediately suspended. It's still entirely up to specific leaf future implementations to interact with (or not) the event loop.
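The laziness described above can be made concrete with a small sketch (hypothetical names, stdlib only): nothing in an async fn's body runs until the returned future is first polled, so the ordering of side effects is visible at the await point. A `log` records when each piece of code actually executes.

```rust
use std::cell::RefCell;

// Sketch: a Rust future is inert until polled, so the callee's body runs
// at the await point, not at the call site.
async fn step(log: &RefCell<Vec<&'static str>>) -> u32 {
    log.borrow_mut().push("callee body");
    7
}

async fn caller(log: &RefCell<Vec<&'static str>>) -> u32 {
    let fut = step(log);                // constructs the future; nothing runs yet
    log.borrow_mut().push("between");   // executes before the callee's body
    fut.await                           // only now does "callee body" run
}
```

If the caller instead wrote `step(log).await` as one expression, the observable control flow would match Zig's eager evaluation: the body runs up to its first suspension as part of the same statement.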

It's exciting to see so many systems programming languages implement async/await.

For a further comparison, Nim works in the same way as Zig when it comes to eager evaluation. Something that I'm particularly proud of when it comes to Nim's async/await implementation is that everything, right down to the macro which defines what `await` means, is implemented in the standard library. The compiler only implements the coroutines. This means that the language isn't bloated by this extra feature and makes it much easier for developers to implement their own async/await.

I believe there was talk to do the same in Rust, but for some reason the developers decided to implement it in the compiler instead.

We did implement it out of tree at first. [0] Because Rust requires annotations on macros to distinguish them from built-in syntax, they are less ergonomic to use. In general, we believe in adopting a path of beginning as an out-of-tree macro, and then deciding to elevate that to first-class syntax if it's clear that it will be extremely commonly used.

[0]: https://github.com/alexcrichton/futures-await

As another comparison, Dart started out delaying execution of async functions and then switched to eager in Dart 2:


Async blocks require a keyword to be unambiguous with a struct constructor.

Can you clarify what you're referring to with "decided to implement it in the compiler instead", be it from this blog post or elsewhere?

Clearly you need coroutines in the compiler, and the way Rust treats ownership means pinning is needed in some way (I've not been following closely so don't know if this is in libstd or a language-level feature), but the rest (according to this post) seems to be going in libstd so you can write your own if you want.

Pinning is almost entirely a library feature; it does use the unstable “auto traits” functionality of the language, so it’s a bit special in that regard. But it’s provided by the standard library.

> Can you clarify what you're referring to with "decided to implement it in the compiler instead"

I'm referring specifically to the async/await syntax. Coroutines probably need to be implemented in the compiler, sure, but the actual async/await can be implemented with metaprogramming.

Can you explain "this is less overhead than immediately suspending"? AFAICT Rust's approach is as low-overhead as it gets, since the inherent separation between future creation and future execution means that future creation doesn't need to have any cost at all. I'm also not sure it makes sense to refer to Rust's behavior as "immediately suspending", since it's not suspending anything: until someone chooses to start executing the async function, there's no state to suspend?

As for the rationale for why Rust is aiming for this behavior, I think this comment buried in the RFC discussion (https://github.com/rust-lang/rfcs/pull/2394#issuecomment-382...) sums it up (and makes note of Dart 2.0 as a contrasting example):

"A fundamental difference between Rust's futures and those from other languages is that Rust's futures do not do anything unless polled. The whole system is built around this: for example, cancellation is dropping the future for precisely this reason. In contrast, in other languages, calling an async fn spins up a future that starts executing immediately."

"A point about this is that async & await in Rust are not inherently concurrent constructions. If you have a program that only uses async & await and no concurrency primitives, the code in your program will execute in a defined, statically known, linear order. Obviously, most programs will use some kind of concurrency to schedule multiple, concurrent tasks on the event loop, but they don't have to. What this means is that you can - trivially - locally guarantee the ordering of certain events, even if there is nonblocking IO performed in between them that you want to be asynchronous with some larger set of nonlocal events (e.g. you can strictly control ordering of events inside of a request handler, while being concurrent with many other request handlers, even on two sides of an await point)."

"This property gives Rust's async/await syntax the kind of local reasoning & low-level control that makes Rust what it is. Running up to the first await point would not inherently violate that - you'd still know when the code executed, it would just execute in two different places depending on whether it came before or after an await. However, I think the decision made by other languages to start executing immediately largely stems from their systems which immediately schedule a task concurrently when you call an async fn (for example, that's the impression of the underlying problem I got from the Dart 2.0 document)."

My understanding is that Rust uses LLVM coroutines[1]. In order to have the behavior where the function does not start executing immediately, you would follow the example that you can find by searching for "injected suspend point, so that the coroutine starts suspended". So what this looks like is:

* Function call

* The coroutine creates its frame (I believe this maps to "future creation".)

* Function return (the injected immediate suspend)

* (sometime later) Put the work on the event loop

* Function call (suspend resume)

* Function executes until a suspend point

Eager execution has no injected immediate suspend, so it looks like this:

* Function call

* The coroutine creates its frame

* The function executes until a suspend point (function return)

That's it. The coroutine is responsible for making sure that it gets resumed appropriately if it suspends. Await doesn't do anything with an event loop; it just suspends and then puts the suspended coroutine handle in the target coroutine's frame with an AtomicRMW. If it turns out the target coroutine already completed, then it grabs the result, destroys the target coroutine, and cancels suspending (which is a jmp instruction). Otherwise the suspend completes and the target coroutine will resume the suspended one when it completes.

> If you have a program that only uses async & await and no concurrency primitives, the code in your program will execute in a defined, statically known, linear order.

This is true with eager execution as well. You can see some examples here [2].

One more side note, how eager execution was valuable to me in the self-hosted compiler:

    pub async fn renderToLlvm(comp: *Compilation, fn_val: *Value.Fn, code: *ir.Code) !void {
        defer fn_val.base.deref(comp);
        // ...
At the callsite we make an async call to renderToLlvm, passing in a ref-counted fn_val. If the body didn't execute immediately, the callsite might deref fn_val and destroy it before renderToLlvm gets a chance to add a reference, but because of eager execution, the ref is guaranteed.

[1]: http://llvm.org/docs/Coroutines.html

[2]: https://github.com/ziglang/zig/blob/363f4facea7fac2d6cfeab9d...

> My understanding is that Rust uses LLVM coroutines[1].

IIRC it doesn't. Coroutines would induce a significant cost (one stack allocation per future) which would make futures unsuitable e.g. for embedded systems.

My understanding is that Rust will compile futures into state machines. So a future would be a closure with a data object that's a sum type with one variant per suspend point, that contains all the variables and references that are in scope at that suspend point. (That's also why Pin needs to be added for futures/generators: This set of variables may be self-referential.)
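That state-machine desugaring can be illustrated by hand. This is a toy sketch with hypothetical names, not the real compiler output: the real lowering also handles pinning for self-referential frames, which this `Unpin`-only example sidesteps.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

// Rough hand-desugaring of `async fn add_one(f: F) -> u32 { f.await + 1 }`:
// a sum type with one variant per suspend point.
enum AddOne<F> {
    // Suspended at the `f.await` point; `f` lives across the await.
    Start { inner: F },
    // Terminal state after yielding the result.
    Done,
}

impl<F: Future<Output = u32> + Unpin> Future for AddOne<F> {
    type Output = u32;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        let this = self.get_mut();
        match this {
            AddOne::Start { inner } => match Pin::new(inner).poll(cx) {
                Poll::Ready(n) => {
                    *this = AddOne::Done;
                    Poll::Ready(n + 1)
                }
                // Suspending is just returning; the inner future has
                // already stashed the waker from `cx` if it needs one.
                Poll::Pending => Poll::Pending,
            },
            AddOne::Done => panic!("future polled after completion"),
        }
    }
}
```

The enum is stored by value, so the whole chain of nested futures collapses into one flat frame with no per-await heap allocation.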

To clarify, we don't use LLVM's implementation of coroutines. We have our own implementation in rustc.

On top of the sibling comments pointing out that Rust doesn't use LLVM coroutines, which allocate each coroutine frame by default, you misunderstand the control flow of Rust async functions:

A Rust future is also "responsible for making sure that it gets resumed appropriately if it suspends." Rust's await is even cheaper than C++/LLVM's- it doesn't do anything with an event loop either; it just returns, without atomically messing with any coroutine handle (which doesn't exist).

Further, Rust futures are poll-based, not callback-based. Your bullet list looks like this:

* Function call to the main entry point (which is really more of a constructor)

* The future creates its frame as a value type (not as a heap allocation) and initializes it with the call arguments

* Function return (not a suspend, because it returns an initialized future rather than Poll::Pending- so again more of a constructor return)

* (sometime later, often immediately) Function call to `poll` (initial resume)

* `poll` executes until a suspend point (function return)

The initial call to `poll` can happen because the constructed future was put on the event loop and scheduled, as you describe, but it can also happen merely because its caller is also a future that is already running somehow.

Also, note that "the event loop" is a rather loose concept here- it's just "the thing that calls top-level `poll` functions." The language doesn't ever submit anything to it, or signal it, or anything. Each call to `poll` receives a handle to that top-level caller, and each leaf future stashes that handle somewhere that will signal it when it's ready to resume. This even works in embedded microcontroller scenarios.
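To make "the thing that calls top-level `poll` functions" concrete, here is a minimal hand-rolled executor sketch using only the standard library. It's a busy-polling toy, not a real scheduler: a real executor would park until the waker fires instead of spinning.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A no-op waker: good enough for an executor that just re-polls in a loop.
fn noop_raw_waker() -> RawWaker {
    fn clone(_: *const ()) -> RawWaker { noop_raw_waker() }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    RawWaker::new(std::ptr::null(), &VTABLE)
}

// The smallest possible "event loop": call the top-level `poll` until Ready.
fn block_on<F: Future>(mut fut: F) -> F::Output {
    // Safety: `fut` is shadowed and never moved again after this point.
    let mut fut = unsafe { Pin::new_unchecked(&mut fut) };
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => std::thread::yield_now(), // busy-wait until ready
        }
    }
}
```

Note that nothing here allocates, which is one reason the same structure can work on embedded targets.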

And finally, your eager execution example is addressed in Rust in two ways:

* First, ownership and the borrow checker prevent dangling references like this statically. In your particular case, the caller would increment the refcount as part of cloning an `Rc`, and then move the clone into the future. (Unless, of course, the future was short-lived enough, and you decided to take advantage of that to pass in a `&'a Value.Fn` instead.)

* Second, because future construction (the first three bullet points) and future execution (the last two) are decoupled, you can write your own function that does any extra construction work in the cases that it's actually necessary. For example, using Rust's syntax for async blocks:

    pub fn render_to_llvm(.., fn_val: *Value.Fn, ..) -> impl Future<Output = ()> {
        async {
            defer fn_val.base.deref(comp);
            // ...

As an embedded system dev, I'm wondering if these async features can be implemented on bare-metal (or without runtime)? Maybe I'm dumb and it might be a wild thought but it would be great if I could easily integrate async language features with hardware interrupts.

Definitely. We’re currently blocked on the builtin await’s use of thread-local storage, but that’s planned to be removed and replaced with something that will work without an OS before stabilisation.

I have had the old macro based async code in Rust running on a Cortex M device, completely runtime free. Once the TLS stuff is sorted I plan to port this forward to work with the builtin syntax.

I don't understand why people don't just support TLS in non-userspace code. It's so convenient for a bunch of things, and sadly Rust, for now, has nothing in between "fully explicit argument passing" and "scoped global state".

Super excited about that! This will be a major feature for embedded devices which perform network I/O, be it Internet of Things or industrial control over Ethernet.

What’s the TLS stuff to be sorted out?

IIRC, the initial implementation of async/await requires TLS. Eventually it won’t.

Oh! For some reason I jumped to Transport Layer Security, not Thread Local Storage... duh.

Yeah the Pinning stuff is supposed to help with this as I understand.

Ah! Super reasonable, yeah. It can be confusing.

I'm not familiar with the way Rust is going, but in general async can be implemented purely at compile time by turning an async function into a set of functions tail-calling each other or a switch statement. The stack frame needs to go somewhere, but with allocator support this does not need to be exclusively handled by the runtime.

Async/await is basically sugar for generators and futures.

This is likely doable if no I/O is involved in the scheduling of coroutines. Usually it is.

That's completely orthogonal though.

If you're interested in async features for the sake of concurrency, you might want to give synchronous concurrency a look - Céu is a very recent entry into this paradigm[0]. It specifically targets the embedded space due to its minimal overhead compared to other concurrency solutions. Also, there is a version that supports libuv for added async stuff[1].

[0] http://ceu-lang.org/

[1] https://github.com/ceu-lang/ceu-libuv

Pretty sure you need at least an operating system and memory allocation

Futures have landed in libcore, and do not inherently allocate.

You can write an executor with a fixed-size queue as well, and not require dynamic allocation at all.

Being able to do this is a hard constraint on the design.

If I’m reading the proposal correctly, it’ll be possible and even straightforward because the standard library only provides the interface. You’ll be able to swap implementations by replacing the package with whatever fulfills the interface requirements.

I'm waiting for the day when 'async' will be the default function type and 'await' will be the default type of function call.

If anywhere down the callstack a function needs to await something it has to become an async function. And this needs to be done to the whole callstack recursively. So over time more and more functions of every codebase turn into async functions.

That model is called green threads.

Continuation passing style means passing along your continuation to your callee, and is the implementation of the async model. But normal calls do pass their continuation: the return address, pushed on the stack, is the continuation function, and the stack itself is the scope, and together they are a closure. Make the stack a first class value that can be switched, and you get to continuations - and that's what the runtime gets with green threads.

It means much harder interop because you can't have native frames on your stack. But you have the same problem with async.
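The claim that a normal return address is already a continuation can be made concrete by writing the same computation in direct style and in explicit CPS (a toy example with hypothetical names):

```rust
// Direct style: the "continuation" is implicit in the return address.
fn add_then_double(x: u32) -> u32 {
    (x + 1) * 2
}

// CPS: the "return" becomes an explicit call to the continuation `k`.
fn add_cps(x: u32, k: impl FnOnce(u32)) {
    k(x + 1) // "return" = invoke the continuation
}

fn double_cps(x: u32, k: impl FnOnce(u32)) {
    k(x * 2)
}

fn add_then_double_cps(x: u32, k: impl FnOnce(u32)) {
    // The closure passed to `add_cps` is exactly "the rest of the
    // computation": double the result, then hand it to our own caller.
    add_cps(x, |r| double_cps(r, k))
}
```

In the direct version the compiler materializes the continuation as a stack frame plus return address; the CPS version reifies it as a closure, which is what makes it a first-class value that could be stored or resumed later.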

Green threads are actually very different from CPS, both from an implementation and a semantics point of view. In a CPS model the continuation is first class and is available at every single call point. In practice this would require heap-allocating every single function frame, which is expensive and a huge problem for interoperability (a smart compiler can turn it back into a traditional stack if the continuation is only used in traditional ways).

With the normal call/return model, the implicit continuation in a normal call stack can only be accessed via return. Green threads do allow accessing the continuation at specific yield points (although most green thread implementations do not expose it to the user), but because the yield points are much fewer than every single call, a whole stack can be allocated and used in one go. There aren't in fact many issues with interoperability, as foreign functions or even OS calls can be accommodated on this stack.

Async is a tradeoff: basically the programmer is responsible for marking functions that are to be CPS-transformed and that need their stack frame reified. This way no full stack is needed, nor are there the performance and interoperability issues of full CPS. You end up with the blue/red function issue, though.

If you want call/cc (for repeated executions of the continuation) you need a first class stack. A CPS execution model can use green threads right up until you want that explicit continuation in hand and split the stack. In most implementations of CPS the continuation is not in fact available for the user to execute: stack-oriented programming is the mental model, until you pull out call/cc or something similar.

Note that you don't get the continuation in hand for the async-everywhere advocated by GP, only blocking operations get the continuation, and most of those are implemented by the runtime.

Async being a “default” function type doesn’t make much sense unless you’re doing io. If you’re just computing, you often can’t proceed without the result and want to proceed as soon as you have the result—i.e. blocking. Both are critical to getting good performance.

I’m curious whether discerning which is the best type could be automated with static or tracing analysis.

Generators, a la Python for example, are completely equivalent to async functions and have nothing to do with IO.

True! However, I believe rust has a separate, distinct yield/generator functionality coming that should address those needs separately. I think off-hand it’d be pretty easy to wrap them in a stream, or to join a stream into a generator, if you did want to compose them.

That’s correct; async/await uses Generators under the hood, but we haven’t stabilized them yet. It’s the plan to though!

One thing I like about Go is that there is no dichotomy between sync and async. Every function call presents a sync interface, but under the hood all I/O is async. I really miss this in Python, where every library has an async twin, usually maintained by someone unaffiliated with the sync library.

I think it should be possible to design an async system that is generically async, i.e. whether a given function with no special annotations or syntax is async (including all of the async-able calls that function makes transitively) is entirely determined by the top-level caller at compile time. The called function doesn't need any special syntax, and in fact doesn't even need to consider being async at all: the async-capability bubbles up based on what other async-able functions it calls. Only the very ends of the chain: at the bottom doing IO via system calls, or at the very top where request handlers are first dispatched (for example) should need to think about being asynchronous or synchronous at all; everything in-between is compiled for whatever the top wants.

To allow for separate compilation you need to compile each function twice though: a CPS version and 'classic' one. This can be wasteful. You also need annotations in the object file to describe the frame size of each CPS function. At some point this was seriously considered for C++ and I'm sure some language implementations actually do it.

Or you go with cactus stacks which have their own set of issues.

> you need to compile each function twice ... This can be wasteful.

More wasteful than writing each function twice? See C# for endless examples of libraries that have manual duplicate X() and XAsync() for every single method. If you compare the two, they almost always have the exact same body, except the async version has "async" and "await" peppered in and calls duplicate XAsync methods (which are implemented the same, except they call XAsync methods... you might be seeing a pattern here).

If I'm not mistaken, most Rust code doesn't use separate compilation and is built from source. In this case the compiler can treat it just like normal generics and just not compile it until it's used. And even with separate compilation for, e.g., a library, a very basic LTO would trivially trim out the unused methods when compiled into a final binary. And if you don't want 2x the library itself, even then you can just use an escape hatch to turn it off at the crate/impl/function level, and it's still better than the status quo, where you have to manually add a bunch of tokens and junk that the compiler already knows how to handle precisely.

If anything is wasted it's human effort because we're wasting our time doing something the computer can do better and faster.

> "annotations in the object file to describe the frame size"

Wouldn't you need this anyway for anything async, even if it's manual? This seems like an argument against async in general, not automatic async.

I quite dislike the current async trend, so you do not have to sell me the alternatives. I was just describing the status quo.

I think that a good compromise would be implicitly inferring the 'async-ness' of template function instantiations (or whatever they are called in your language of choice) based on a magic continuation parameter, which would also get around the separate compilation issue (assuming the language does generic monomorphization).

Right, that's exactly how I would expect this to be implemented. The lowest-level IO functions (at the level of making syscalls) mark themselves as supporting async: e.g. "asyncable", which means they can be called either sync or async (which may actually be implemented with different OS APIs). Every function that calls one of these functions sees the results directly as if it were synchronous, but is also marked as "asyncable". This "mark" continues up the call chain, completely transparently to the users (unless they use an escape hatch). Near the very top of the program, the main function, a main event loop, your http server thread, etc., the code chooses to call the first function either synchronously or asynchronously and decides details like which executor to use. If your "main" function calls it async, the entire call stack is compiled to use the async versions, all the way down to the syscall functions.

The point is that only at the very top of the call stack or the very bottom should you ever need to care or think about whether code is async, because 99% of the time (every bit of code between main and syscalls... which is almost all of it) that's the only place you should need to care. This can be completely automatic because the compiler already knows where to insert await/async, it only needs to know when to use async (decided by main) and what to call at the end (decided by the syscall funcs).

I don't think you'd need to have two versions.

If I have a sync function and I call something async, I should be able to just .wait() it, I think, and get the sync behavior.

Technically you do not, as you pointed out you can use the cps version for everything, but it is suboptimal, especially for interoperability with other languages and the os.

You’d still need to specify and/or have a default executor, neither of which is a trivial task to design. Somewhat ironically, the Go runtime is the closest to this and has the most explicit invocation of async behavior of the languages that have seriously tried to tackle this problem.

One of the biggest advantages of await is that it's explicit. Otherwise literally any function call can block, and whether or not it blocks is dependent on the body of the function so you can end up with races appearing in random parts of your code because you update a package. It's bad. At that point you might as well just use threads and blocking calls.

Data races like that can only happen when you share mutable state across logical flows of control. Don't share mutable state, regardless of your concurrency framework. I don't see how explicit vs implicit yield points change this fact. Either way there's precisely one logical flow of control that should own your data. In a language like Rust this is strongly enforced, regardless.

Furthermore, in any complex asynchronous I/O app most functions will end up being tagged async. The only ones that won't are simple leaf functions that wouldn't implicitly yield anyhow. If someone has the bad idea to, e.g., put a yield point in non-obvious leaf functions as part of some kind of hack (e.g. logging, tracing), they're gonna do it in the async/await case too, because they're already convinced it has value. Having to drop a few more annotations here or there won't stop them from breaking the app.

Lastly, the bugs that do occur in these sorts of cases usually have to do with unpredictable latencies violating implicit or accidental ordering assumptions. async/await doesn't mitigate that at all because latencies are just as unpredictable. The solution, as always, is to avoid these ordering dependencies by not sharing mutable state.

This defense of async/await is a red herring.

Rust worked this way earlier in its development. The keyword as others mention is “green threads”. It moved away from this because this model has some overheads, and not every application is a socket server. Languages with this feature include Go, Haskell and Erlang.

And also the ABI compatibility for C interop is harder without a "classic" stack.

Having made "async" the default, you've just returned to regular threading, which is what we should have stuck with all along. The thread model is actually pretty useful, the pitfalls are well understood, and the tooling very mature.

Threading sucks. It's a situation where you have some external process (the operating system) deciding when different pieces of work should be woken up, with no way of feeding this information back (so it just relies on relatively naive scheduling heuristics).

In practice, most of your threads are in one form of wait loop or another and you've just got polling both inside the threads and with the scheduler.

Have a look at Erlang if you want a better model :) the erlang "processes" (different to os processes) can intelligently only wake up when there is work for them to do.

For a language to efficiently use cores, it really needs to include its own scheduling.

To pick some nits and generally elaborate, Erlang's VM also has a scheduler; it's not the presence or lack of a scheduler, it's how efficient it is and what guarantees it allows the programmer to make about their system. For example, OS schedulers are typically pre-emptive, which means your OS thread can get interrupted anywhere. On the other hand, Go's scheduler (I'm using Go because I'm more familiar with it than with Erlang) only allows context switching at well-defined points in your program. Further, operating system threads have more overhead than in Go (presumably also Erlang) because they have a fixed stack size (yes, I know this isn't true for all OSes).

>only allows context switching at well-defined points in your program.

Erlang works the same way. The VM scheduler will only context switch on a function call. for or while loops don't exist in Erlang which means there is no risk of blocking the scheduler.

I assumed it must, I just wasn’t familiar. Thanks for clarifying!

You can have userspace threads (or even hybrid m:n threading). They went out of fashion in the last decade for many reasons but were fairly common in the past.

I’m currently implementing threading into an editor scripting language. Could you talk more about this? It would be helpful.

The idea is that cooperative userspace task switching can be faster than kernel-space task switching (a handful of cycles vs thousands). So moving the scheduler into userspace seems a natural evolution. But now you lose the ability to run on multiple CPUs, as from the kernel's point of view the application is a single thread. The next step is to run n userspace schedulers, one per hardware CPU, each scheduler running a number of userspace threads (thus m:n). Effectively this creates a two-level scheduler (one in the kernel, one in userspace), and some OSes have custom APIs (look for "scheduler activations") that allow the two schedulers to cooperate, allowing full preemption of user threads in all circumstances.

The reason that m:n threading went out of style is that, for CPU-bound tasks (where you want to run exactly as many threads as there are CPUs), it is just useless overhead, while for the hundreds-of-thousands-of-IO-bound-threads scenarios, the cost of stack switching is dominated by cache misses anyway, and the cost of calling into the kernel is amortized by the fact that IO requires a call into elevated privileges anyway. At the same time, kernel thread scheduling has become very fast, and userspace threads, which require a whole stack of their own, are not significantly more lightweight than kernel threads.

The modern async model is a compromise. On one side, the 'threads', consisting of a single stack frame, are very lightweight; on the other side, there is no generic userspace scheduler, but scheduling is fully controlled by the application.

Why would kernel threads be polling in practice? Won't most be waiting on locks, file descriptors, timers, etc?

You're right, in theory, if your select() or epoll() doesn't have a timeout. And the threads aren't doing any active work. But if you're optimising for that state, then you don't care about inefficiencies of one model or another.

There are also costs in setting and checking those locks etc. Sure, you can build solutions to optimise a broken model, which we've done over decades (with quite a bit of success!), but it doesn't make the model less broken.

> But if you're optimising for that state, then you don't care about inefficiencies of one model or another.

Sure you do: there is noticeable memory and context-switching overhead (for a large number of them, of course) in stackful coroutines.

FWIW, Scheme (among others) has async by default on steroids (aka call/cc) and is nothing like regular threading.

You can build threading on top of it but also all kind of other abstractions.

I feel like you're making a very interesting point but it seems a bit broad and abstract for me to justify in my head. Could you clarify what you mean?

If you want easier concurrency with higher overhead, that's exactly what OS threads do. The OS kernel is the "event loop" in this case, and they're highly optimized for this, though there is some overhead which is hard to avoid.

Rust has gone back and forth on this. Pre-1.0 versions of Rust had a "green threads" mode. It was removed in favor of OS threads because it wasn't worth the complexity.

Go is a language in which "all functions are async" and it's pretty cool. But it needs a runtime, and interop with C / C++ has some real overhead related to switching to a C-style stack. Rust did not want to make those compromises, it wanted to be appropriate for bare-metal or kernel programming, and for libraries used by big high-performance C++ applications. In these cases, Go's model does not work well.

I personally use Go a lot, it's great for network services and command-line tools, and that's mostly what I do.

This is a very even-handed and fair description of the tradeoffs involved, something that is all-too-often missing in this space. Thanks for describing it so well.

> In these cases, Go's model does not work well.

I don't think there's anything about Go's concurrency model that precludes bare metal; I think it's just that the maintainers didn't want to support a bare metal runtime. In fact, I recall at least one other project that modified Go's runtime to support exactly this. I'm also not sure how much of the C interop overhead is due to the different stack models and how much is due to GC concerns or other things. But generally I agree with you--in practice Rust is well suited for these types of tasks and Go is not.

> Go is a language in which "all functions are async" and it's pretty cool.

Async has come to signify a very specific implementation strategy (i.e. stackless coroutines). Go definitely went in another direction.

This is the situation in Scheme, Erlang and Go (although the latter two strongly bind continuations with a specific, built-in scheduling), and what we're doing in Java as part of Project Loom. All Java functions will be able to run inside delimited continuations without modification. Continuations/coroutines are thus a purely dynamic rather than a syntactic entity.

Any news on Project Loom? When to expect it available?

We should have a public repo you can use to build Loom (a prototype at this stage) in a couple of weeks.

What does such build produce? A JVM? A bytecode converter which produced code for existing JVMs?

A full JDK (with a VM, of course). Loom includes both changes to the VM and the core JDK libraries.

Is there any roadmap for inclusion into release?

No. OpenJDK projects are no longer planned to target a specific release.

I think that being async-by-default makes sense for any new high-level language written these days, but the machinery required to make that happen is probably always going to be magic enough that there will be some space for low-level languages that will find value in giving the user choice between synchronous and asynchronous operations.

Not everything is a webserver/frontend/web related.

I'm curious as to why you think that matters? Async/Await are general purpose concurrent programming abstractions.

Alright, so I do computational plasma physics simulations for a living. Given that our problems often reach sizes which require teraflop levels of parallelism to finish before I retire (that's my scale; I'm sure you're aware of the petaflop sims the fluid/engineering guys do), we do need concurrency, but it's done by chopping up the domain, literally the volume of space we simulate, onto processors. Now, we could possibly do async at some level (sub-node level), but we absolutely need to be synchronized globally because we are simulating real-life physics and we need to maintain causality, especially when we pass information between nodes. Say a wave front passes across one node boundary into an adjacent one: waves must move in a finite and causal manner, of course, just as if you watched a ripple on the surface of a puddle.

If we did the whole problem async (async by default, as I interpret it) it would require so much bookkeeping to maintain causality that it would be unreasonably difficult. For things that require causality like that, we are lucky that the default mode of computation is synchronous, because it fits my problem domain perfectly.

That's why I say "not everything is web": while async maps well to problems like a server-client application, it doesn't map well to all problems, mine for example. Also, as someone else said, sync is easier to reason about for problems that don't require a lot of parallelism.

At worst, you'll have to explicitly wait for all your async calls to complete and produce results. I suppose normally the implicit wait occurs on an attempt to access the result. I also suppose you have to do it anyway in a multi-processor system, and you can't be doing it on a single core.

At best, some of your CPUs would be able to run calculations for two especially fast-to-compute pieces of volume space, while some other CPUs would be busy computing a particularly gnarly block of the volume space.

The problem is that async adds a lot of overhead, and while the mythical sufficiently good compiler could remove it, in practice it is a lot of work for little benefit.

For those CPU-bound jobs that do not fit the classical OpenMP-style scheduling, more dynamic async-style scheduling might be appropriate (Cilk-style work stealing, for example), but the async granularity is hardly ever the function boundary.

They are a real problem in desktop apps when you have to shut down the app gracefully. If you call an async function during shutdown you can't really tell when the async chain is done and you have nothing to wait for.

Just dealt with that in a .NET app that uses a 3rd-party SDK that heavily uses async/await. I had to build a whole layer of state management just to handle shutdowns. Simple threading would have been much easier.

It's different with server side code. There async/await is very nice.

That just sounds like the API wasn't designed to support async/await rather than an inherent limitation. Something like Promise.all [1] is all you need to wait for multiple promises if the API had a way to accept a promise instead of a synchronous callback.

[1] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
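For reference, Python's rough analog of JavaScript's Promise.all is asyncio.gather (the `fetch` coroutine and its arguments below are made up for illustration):

```python
import asyncio

async def fetch(name, delay):
    # Stand-in for some asynchronous work; name and delay are arbitrary.
    await asyncio.sleep(delay)
    return name

async def main():
    # gather awaits several awaitables concurrently and collects the
    # results in argument order -- the analog of Promise.all.
    return await asyncio.gather(
        fetch("a", 0.01),
        fetch("b", 0.02),
    )

print(asyncio.run(main()))  # ['a', 'b']
```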

The same problem would happen with a promise. In a desktop app the user clicks "Quit". If you now have to call an async function, the Quit handler will return and sometime later the await portion of the code gets called. From the Quit handler you can't tell when the await stuff has finished, so there is no clear time when you can shut down the app.

In a browser or on the server side it's different.
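One common workaround, sketched here in Python asyncio terms rather than .NET (all names are made up for illustration), is to route every background operation through a tracking helper so the Quit handler finally has something concrete to await:

```python
import asyncio

pending_ops = set()  # every in-flight background task the app has spawned

def spawn(coro):
    # Wrap task creation so shutdown can later wait for (or cancel) everything.
    task = asyncio.ensure_future(coro)
    pending_ops.add(task)
    task.add_done_callback(pending_ops.discard)
    return task

async def background_save():
    await asyncio.sleep(0.01)  # pretend to flush state somewhere
    return "saved"

async def on_quit():
    # The Quit handler awaits every tracked task before letting the app exit.
    await asyncio.gather(*pending_ops, return_exceptions=True)

async def main():
    spawn(background_save())
    await on_quit()
    return len(pending_ops)  # everything drained

print(asyncio.run(main()))  # 0
```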

They are, but calling a function asynchronously takes a lot longer than synchronously. For small functions the execution time could easily be increased by more than 100%. It's something that only really makes sense if the function will do (slow) I/O or a lot of computation, and the program has other things it could do in the meantime.

Making every function call asynchronous is likely to make most programs a lot slower.
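To make the overhead concrete, here is a rough micro-benchmark sketch in Python (function names and the iteration count are arbitrary, and absolute numbers vary by runtime); even a coroutine that never suspends pays for coroutine-object creation and await machinery on every call:

```python
import asyncio
import time

def plain_add(a, b):
    return a + b

async def async_add(a, b):
    return a + b

N = 100_000

def bench_sync():
    t0 = time.perf_counter()
    for _ in range(N):
        plain_add(1, 2)
    return time.perf_counter() - t0

async def bench_async():
    t0 = time.perf_counter()
    for _ in range(N):
        # Creates a new coroutine object and drives it each iteration.
        await async_add(1, 2)
    return time.perf_counter() - t0

sync_t = bench_sync()
async_t = asyncio.run(bench_async())
print(f"sync: {sync_t:.4f}s  async: {async_t:.4f}s")
```

On typical CPython builds the async loop is noticeably slower, which is the overhead being discussed; the exact ratio is implementation-dependent.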

Control is another reason. You relinquish control the more moving parts there are. If you ever looked into the tokio stack, you realise there is a Lot Of Stuff going on.

Which doesn't mean it's bad. I'm excited about tokio and rust's async story. But I love that I get to choose.

Concurrency has overhead. On top of that, synchronous is easier to reason about.

So we remove async/await from our async functions and add sync/block to our sync functions? :D

Actually Tokio already has `blocking` implemented!


This seems like the next logical step. A language where all functions are async/await.

And it’ll likely pull in some nice features from academia, like building a dependency graph of async statements so it can automatically reorder them to get optimal concurrency.

This is Haskell's IO type and its do notation!

And the reordering of statements exists too. It's called ApplicativeDo[0] and is heavily used at Facebook[1]

[0] https://www.microsoft.com/en-us/research/wp-content/uploads/...

[1] https://youtu.be/sT6VJkkhy0o

There's a huge advantage to not everything being async/await. Knowing exactly when you yield control can give you atomicity and more determinism for free.
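That atomicity can be illustrated with a short Python sketch (the counter and task names are made up): two tasks do unlocked read-modify-writes on shared state, and because neither awaits inside the critical section, the single-threaded event loop guarantees no lost updates:

```python
import asyncio

counter = 0

async def bump(n):
    global counter
    for _ in range(n):
        # No await between the read and the write, so no other task can
        # interleave here: this section is atomic without any lock.
        tmp = counter
        counter = tmp + 1
        await asyncio.sleep(0)  # the explicit, visible yield point

async def main():
    global counter
    counter = 0
    await asyncio.gather(bump(1000), bump(1000))
    return counter

print(asyncio.run(main()))  # 2000: no lost updates despite the "race"
```

With preemptive threads, the same unlocked read-modify-write could lose updates; here the yield points are exactly the awaits, which is the determinism being described.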

> a function needs to await something

According to a famous quote, the source of which escapes me, "await does not wait for anything, and async is not asynchronous." (This could be specific to C# though.)

I just finished reading the proposal and I’m super impressed and even excited. I really appreciate the commitment to not leaking implementation details into the standard library (i.e. exposing only the interface). This is fantastic for both embedded and (operating) system applications. Nice work everyone.

I want to reflect on the fact that this is still done without any garbage collector.

Why would anyone assume that a GC is needed for async/await?

The Future object seems to do all the bookkeeping needed by the borrow checker (which I believe is what Rust uses to track the lifetime of objects).

Maybe because most future implementations (e.g. in Javascript, the proposed C++ solution, Seastar, etc.) work by scheduling a continuation on an event loop, which obviously requires at least some form of dynamic memory for queuing up these continuations.

I think Seastar futures are allocation-free. Standard C++ futures are not really a paragon of efficiency or good design. Still no GC though.

I haven't read the code, but there are 2 areas where I expect allocations:

- The continuation which is passed to .then, and which is typically a closure, must be type-erased, which requires an allocation. Storing the continuation in a std::function would allocate too. A short glance at https://github.com/scylladb/seastar/blob/master/core/future.... also confirms that there is a make_unique there.

- Since continuations are most likely not called inline on completion but deferred to the next event loop iteration, there needs to be a dynamically sized queue to hold the ready continuations. I am not 100% sure if that's the case for Seastar too, but I would guess so.

I do not think the continuation is type-erased. I believe that then, at least optionally (depending on the actual futurator passed in), can return a future whose type encodes the continuation type and stores it inline.

The queue might be dynamically sized, which might eventually require allocation, but that can be amortized across many futures. A large enough queue might never require reallocation.

I'm also not 100% sure, as I have never used Seastar.

C++ has futures libs too. See the Seastar framework, for example.

I think the hard part here is not language design, it’s implementation of the runtime.

Linux is especially problematic: asynchronous IO arrived late, years after IOCP and kqueue. It took multiple kernel versions to make it usable, and still the APIs are questionable, e.g. files and sockets use different ones.

Even MS failed to do it right, see a bug I found: https://github.com/dotnet/corefx/issues/25066

I know this may sound silly, but can someone make a quick rundown of the pros (and maybe cons) of Rust as compared to NodeJS, Go and Erlang? Why would people use it as opposed to these far more mature ecosystems, especially if it’s hard to master based on the comments I have seen from Rust users? (Not trying to be biased, actually want to ask people who do choose it.)

Hard to master doesn't mean it isn't worthwhile to master. And the only reason the Rust ecosystem is not more mature is because not enough effort has been put into it. By learning the language and working in it, both of the problems you point out will solve themselves.

This is a classic "being traffic" comparison. If you're in a traffic jam, you are as much the cause of it as anyone else is. Likewise, if the ecosystem is immature, you are as much the reason for that as anyone else who doesn't participate in it.

The same could be said about any charity or relief effort, but typically people join the ones that are already easy to join.

I wasn't meaning to imply any obligation that you should be using Rust or anything. My point was only that the reasons people do things are already very familiar to you, so it doesn't do much to be incredulous of their motivations. I don't watch baseball, but I understand why people do it even if I can see no real value in it. People are motivated by a rational advantage far less often than by things like accessibility, familiarity, habit, curiosity, popularity, culture, counter-culture, identity, or need -- all of which are reasons that people use programming languages not present in your list.

Perhaps this is just obvious. But I do frequently see people pretending to be objectively objecting -- complaining about traffic that they are themselves the cause of as if they didn't already know the answer to their question.

You might find this article useful: https://thenewstack.io/safer-future-rust/

How about lightweight threads, or are they there already?

Going to Go with its LWT goroutines was awesome. Would never want to go back to async, which is really just a manual way of implementing LWT.

You can sort of think of tasks as green threads. It just really depends on exactly how you define your terms; this area has a ton of similar sounding terms that different people define differently.

The downside of Go’s approach is the overhead of calling into C; Rust can’t afford this. Go can. IMHO both languages are making the correct choices with regards to their constraints.

Usually we add notes like (2016) to old articles. In this case it would be appropriate to add (April 2018) to the title given how fast the futures ecosystem is moving in Rust.

2018 is assumed

But April isn't, and (April 2018) is less confusing than just a bare (April).

If there is an even newer resource that is better, just link it here in a comment?

Why does it have to be either one? Why not the possibility of choosing the implementation you want to solve the problem? A language where you could easily run whatever you want would have my vote. Not having it shoved down my throat (I’m looking at you JS). While it’s possible to build most of it yourself you should have the possibility to choose.

Be careful with this. I'm not saying you should not do it, but consider you'll have to design it very carefully.

In Python you can choose what event loop to hook async/await to, and hence we have gevent, qt, uvloop, twisted, asyncio, tornado, trio and curio as competing implementations.

They are very difficult to mix, and their ecosystems are mostly isolated, dividing the manpower to add features, fix bugs, provide support, create libs or frameworks, and write docs or tutorials.

Another problem is that you have (except for gevent, which causes other problems by monkey-patching the stdlib) to set up the event loop explicitly.

Those mechanisms are complicated, easy to get wrong, and confusing for beginners; they make doc introductions long and annoying, or misleading, before getting to anything interesting.

E.g.: to use asyncio, you are exposed to an event loop, an event loop policy, awaitables, coroutines, coroutine functions, futures, tasks and task factories.

But there is worse... You can set up any event loop any way and at any time you want, and because the API is public, another lib can come along and swap it. No lib, to my knowledge, provides any form of locking.

This leads to some weird situation where libs are considered so low level you end up writing wrappers on top of it (e.g: https://github.com/Tygs/ayo) just to be able to start using it sanely.
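That loop-swapping hazard can be sketched in a few lines of Python; `LoggingLoopPolicy` is a made-up stand-in for what uvloop, GUI integrations, and the like actually do when they install themselves:

```python
import asyncio

class LoggingLoopPolicy(asyncio.DefaultEventLoopPolicy):
    # Stand-in for a third-party policy: it replaces, process-wide, the
    # factory that hands out event loops.
    def new_event_loop(self):
        loop = super().new_event_loop()
        print("created loop:", type(loop).__name__)
        return loop

# Any library can do this at import time. There is no locking, so the
# last caller wins -- which is the fragmentation hazard described above.
asyncio.set_event_loop_policy(LoggingLoopPolicy())

async def main():
    return "ran on whichever loop the current policy provides"

loop = asyncio.new_event_loop()  # goes through the (now swapped) policy
try:
    print(loop.run_until_complete(main()))
finally:
    loop.close()
```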

Now compare with JS.

I'm really not a fan of the language. However, even when you can't use async/await, using a promise is straightforward. You don't have to bother about creating the loop, starting it, stopping it, cleaning after it has stopped. You don't have to wonder if somebody is going to swap the loop. You don't have to get a reference to a loop to schedule anything. Actually you can mostly ignore the loop and just code the solution to your problem.

Now this makes JS dependent on one loop implementation for each runtime. Also you can't code any new async feature in JS, only use the existing ones. It's probably not what you want for a language like rust.

However, you should learn from the python ecosystem fragmentation and overly exposed low level API to avoid the same mistakes.

Somebody who just wants to use async/await should not have to learn how the implementation works in detail, nor take so many precautions to avoid implementation lock-in or breaking somebody else's work.

And you really want a federated ecosystem. Having 7 incompatible websocket libs sucks.


Although perhaps the situation in Python could have been mitigated somewhat if a protocol was defined early that implementations could have adopted? Context managers, iterators and decorators all work nicely together, even across language boundaries via the extensions API.

> Although perhaps the situation in Python could have been mitigated somewhat if a protocol was defined early that implementations could have adopted?

Yes. That's one thing the rust community has to get right.

async/await was supposed to be that, but it's only a protocol defining what blocks and what doesn't, and when you allow context switching. An event loop also has the notion of scheduling, getting a reference to what is scheduled, requesting the result or error of said scheduled thing, or cancelling it. An event loop must bridge different implementations of concurrency (e.g. asyncio.run_in_executor). An event loop also has a life cycle, which includes at the very least a setup and a tear-down. An event loop must integrate into an environment: what do you do when you have several loops, or if you run one loop per thread?

So you need to define a general behavior for all that. Then let anyone write the implementation the way they want.

That's an API, like asyncio in Python is an API. If you make the event loop swappable without making this API mandatory, however, it's never going to be a protocol.
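The bridging point above can be shown concretely with asyncio's real `run_in_executor`, which turns blocking thread-pool work into an awaitable (the `blocking_io` helper here is made up):

```python
import asyncio
import concurrent.futures
import time

def blocking_io():
    # Pretend this is a legacy, blocking call that would stall the loop.
    time.sleep(0.01)
    return "done in a worker thread"

async def main():
    loop = asyncio.get_running_loop()  # Python 3.7+
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        # run_in_executor bridges thread-based concurrency into the
        # coroutine world by returning an awaitable future.
        result = await loop.run_in_executor(pool, blocking_io)
    return result

print(asyncio.run(main()))
```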

The api is mandatory; it’s how async/await works.

So everybody has to implement a task for any alternative event loop and accept tasks from other implementations?

Yep. You build your event loop to take structs implementing the Future trait. The futures don't care about the executor, and it's easy to switch from an event loop running on the current thread to a thread pool, if needed.
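A rough Python analog of that executor-agnostic property (both `block_on` and `YieldOnce` are hypothetical names, and this is a toy, not how Rust's Future trait actually works under the hood): the coroutine never learns what drives it, so any executor can run it.

```python
class YieldOnce:
    # Minimal awaitable: suspends once, then resolves to a value.
    def __init__(self, value):
        self.value = value

    def __await__(self):
        yield self          # hand control back to whatever executor runs us
        return self.value

async def work():
    a = await YieldOnce(1)
    b = await YieldOnce(2)
    return a + b

def block_on(coro):
    # Toy single-task executor: resume the coroutine until it finishes.
    # A real executor would park on I/O readiness instead of spinning.
    while True:
        try:
            coro.send(None)  # advance to the next suspension point
        except StopIteration as done:
            return done.value

print(block_on(work()))  # 3
```

Swapping `block_on` for a thread-pool-backed or I/O-driven executor would not require touching `work` at all, which mirrors the decoupling described above.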

Great. I wish we had done the same.

I hear your pain, but it wasn't what I meant. It's good to learn lessons, and I understand that it's hard to have them work together. But that's the thing: having 7 incompatible websocket libs does suck. And that's my point.
