Async-await on stable Rust (rust-lang.org)
1102 points by pietroalbini on Nov 7, 2019 | 380 comments

This is a major milestone for Rust usability and developer productivity.

It was really hard to build asynchronous code until now. You had to clone objects used within futures. You had to chain asynchronous calls together. You had to bend over backwards to support conditional returns. Error messages weren't very explanatory. You had limited access to documentation and tutorials to figure everything out. It was a process of walking over hot coals before becoming productive with asynchronous Rust.

Now, the story is different. Furthermore, a few heroes of the community are actively writing more educational materials, so newcomers can become productive with async programming much faster than those before them.

Refactoring legacy asynchronous code to async-await syntax offers improved readability, maintainability, functionality, and performance. It's totally worth the effort. Do your due diligence in advance, though, and ensure that your work is eligible for refactoring. Niko wasn't kidding about this being a minimum viable product.

Are there any resources you could point to for learning more about how to use this async programming?

Thanks for the links. But they are not clickable :)


So I guess Generic Associated Types should be next, then?

I was trying to refactor one of my Rust projects the other day and almost immediately got hit by the "No async fn in traits" and the "No lifetime on Associated Type" truck. Then, a few days later, along comes this article: https://news.ycombinator.com/item?id=21367691.

If GAT can resolve those two problems, then I guess I'll just add that to my wish list. Hope the Rust team keeps up the awesome work :)

They'll probably be a while still as they're quite a complex feature. But they'll be useful for all sorts.

Yes, this is amazing! And refactoring is actually quite easy in my experience, at least with the two projects I've done this with.

I think it's straightforward, but it took me quite a while to do correctly in my dns project.

Are there any lessons that would be useful for others to know about?

Yes, I think so. I’ve been planning on writing a blog post.

I’ve been playing with async/await in a vertical that is the polar opposite of its typical use case (high-TPS web backends), and I believe this was the missing piece to unlock great ergonomic and productivity gains for systems development: embedded no_std.

Async/await lets you write non-blocking, highly interleaved firmware/apps in allocation-free, single-threaded environments (bare-metal programming without an OS). The abstractions around stack snapshots allow seamless coroutines, and I believe they will make Rust pretty much the easiest low-level platform to develop for.

Have you ever heard of Esterel or Céu? They follow the synchronous concurrency paradigm, which apparently has specific trade-offs that give it great advantages on embedded. IIRC the memory overhead per Céu "trail" is much lower than for async threads, fibers or whatnot (on the order of bytes), but computationally it scales worse with the number of trails.

Céu is the more recent one of the two and is a research language that was designed with embedded systems in mind, with the PhD theses to show for it [2][3].

I wish other languages would adopt ideas from Céu. I have a feeling that if there was a language that supports both kinds of concurrency and allows for the GALS approach (globally asynchronous (meaning threads in this context), locally synchronous) you would have something really powerful on your hands.

EDIT: Er... sorry, this may have been a bit of an inappropriate comment, shifting the focus away from the Rust celebration. I'm really happy for Rust for finally landing this! (but could you pretty please start experimenting with synchronous concurrency too? ;) )

[0] http://ceu-lang.org/

[1] https://en.wikipedia.org/wiki/Esterel

[2] http://ceu-lang.org/chico/ceu_phd.pdf

[3] http://sunsite.informatik.rwth-aachen.de/Publications/AIB/20...

> Er... sorry, this may have been a bit of an inappropriate comment, shifting the focus away from the Rust celebration.

I think if it's interesting and spurs useful conversation, it's appropriate, tangent or not. I for one am thankful for your suggested links; they look interesting.

It's not the same, but Rust async/await tasks are also in the order of bytes, and you can get similar "structured concurrency"-like control flow with things like `futures::join!` or `futures::select!`.

Ceu looks very neat, I suspect (having not read much about it yet) that async codebases could take a lot of inspiration from it already.

> you can get similar "structured concurrency"-like control flow with things like `futures::join!` or `futures::select!`.

That sounds very promising, I should give it a closer look (I don't program in Rust myself so I only read blogs out of intellectual curiosity)

Or you can use something like protothreads or async.h [1] if you're stuck with C/C++ and need something lightweight.

[1] https://news.ycombinator.com/item?id=21033496

Céu compiles to C actually, and I believe Francisco Sant’Anna (the author of the language) makes a direct comparison to protothreads and nesC in one of his papers (probably the PhD thesis). I had not seen async.h before though, interesting!

[1] http://ceu-lang.org/publications.html

I also think async (the paradigm) is kind of weird in rust world. I agree with https://journal.stuffwithstuff.com/2015/02/01/what-color-is-....

The solution suggested by that article is to use M:N threading, which was tried in Rust and turned out to be slower than plain old 1:1 threading.

If you don't want to deal with async functions, then you can use threads! That's what they're there for. On Linux they're quite fast. Async is for when you need more performance than what 1:1 or M:N threading can provide.

> If you don't want to deal with async functions, then you can use threads!

Truly? If some very popular lib becomes async (like actix or reqwest, which I use), can I TRULY ignore it and not split my world into async/sync?

You can easily convert async to sync by just blocking on the result. The other way (sync to async) is more difficult and requires proxying out to a thread pool, but it's also doable.
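For readers wondering what "blocking on the result" looks like, here is a minimal sketch using only the standard library. The names `block_on`, `ThreadWaker` and `fetch_length` are hypothetical, made up for illustration; real executors (for example the ones shipped with async runtimes) are considerably more elaborate.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread which is blocked on the future.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Hypothetical helper: drive one future to completion on the current thread.
fn block_on<F: Future>(fut: F) -> F::Output {
    // Pin the future on the heap so it can be polled.
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            // Sleep until the waker unparks us, then poll again.
            Poll::Pending => thread::park(),
        }
    }
}

/// Stand-in for an async library call (assumption: does no real I/O).
async fn fetch_length(body: &str) -> usize {
    body.len()
}
```

Synchronous code can then call `block_on(fetch_length("hello"))` and receive a plain `usize`, without becoming `async` itself.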

An `async` function _can_ call blocking functions, of course, it just blocks the entire thread of execution which could otherwise continue making progress by polling another future.

Both actix and reqwest are already async, the fact that you haven't noticed yet just shows that you are already ignoring it.

But the signatures are not. If they mark the types/functions async, what happens?

I must rewrite all the calls?

Async is just syntactic sugar for a function that returns a Future<...>; you can poll() it manually.

You can take the event loop, run it on one thread and do whatever you wish on other threads.

Fearless concurrency, after all.
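To make "poll() it manually" concrete, here is a small sketch with no executor at all. `NoopWaker`, `ReadyOnSecondPoll`, `double` and `poll_twice` are illustrative names, not part of any library; the waker deliberately does nothing because the caller polls by hand.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that ignores wake-ups; we decide ourselves when to poll.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Hand-written future: pending on the first poll, ready on the second.
struct ReadyOnSecondPoll {
    polled_once: bool,
}

impl Future for ReadyOnSecondPoll {
    type Output = u32;

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        if self.polled_once {
            Poll::Ready(42)
        } else {
            self.polled_once = true;
            Poll::Pending
        }
    }
}

/// An ordinary async fn that awaits the future above.
async fn double() -> u32 {
    let value = ReadyOnSecondPoll { polled_once: false }.await;
    value * 2
}

/// Poll `double()` twice by hand and report what happened.
fn poll_twice() -> (bool, Option<u32>) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(double());

    let first_was_pending = fut.as_mut().poll(&mut cx).is_pending();
    let second = match fut.as_mut().poll(&mut cx) {
        Poll::Ready(v) => Some(v),
        Poll::Pending => None,
    };
    (first_was_pending, second)
}
```

The first poll suspends at the `.await`, and the second resumes from there and finishes, so `poll_twice()` returns `(true, Some(84))`.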

You can just `.await` any Future returned directly by an API, even if the function isn't marked as an async fn.

How hard was it tried?

I imagine there's a reason that languages like Go adopted M:N threading... obviously part of the reason is that it's way more scalable, but userspace threading is also supposed to be faster, as context switches don't need to switch to kernel space... were the problems tight loops (which AFAIK is also the problem in Go)? or maybe it's just so much easier / more efficient if you also have GC...

In my opinion they tried hard enough. Here are the links to the discussions of that time if you are into that kind of thing:




It is clear that they really wanted to make green threading work, but in the end it proved incompatible with other goals of the language.

The main problem as I understand it is with the stack: you can't make the stack big from the beginning (or your threads wouldn't be lightweight anymore), so you need to grow it dynamically. If you grow it by adding new segments to a linked list, you get the "stack thrashing"/"hot split" problem: if you are unlucky and have to allocate/deallocate a segment in a tight loop, the performance suffers horribly. Go solved the problem by switching to contiguous relocatable stacks: if there is not enough space in the stack, a bigger contiguous chunk of memory is allocated and the old contents of the stack are copied there (à la C++ vector). Now there is a problem with references to stack-allocated variables: they become invalid. In Go this problem is solvable because it is a managed, garbage-collected language, so the runtime can simply rewrite all the references to point to the new locations, but in Rust that is infeasible.

The reason is that M:N threading inherently requires a language runtime, and the advantage of increased scalability (GHC threads use less than 1KB of memory) comes with the disadvantage that FFI calls are really quite expensive to do, because you need to move to a separate native thread in case the FFI call blocks.

These languages (Erlang, Haskell, Go, ...) have no ambition to be useful for system programming; they're not intended as a replacement for C/C++ in that domain, unlike Rust.

Author's solution is threads:

> But if you have threads (green- or OS-level), you don’t need to do that. You can just suspend the entire thread and hop straight back to the OS or event loop without having to return from all of those functions.

Correct me if I'm wrong, but wasn't the lack of threads one of the biggest reasons why NodeJS originally outperformed most of its competitors?

Spinning up threads for each concurrent request was expensive, and (nonblocking) async code was by comparison ridiculously cheap, so the lack of overhead meant Node could just make everything async, instead of trying to decide up-front which tasks deserved which resources.

Granted, it's been over a decade since Node came out. Maybe thread overhead has gotten a lot better? But barring faulty memory, I definitely remember a number of people explaining to me back then that being single-threaded was the point.

I'm pretty sure that was the official talking point at the time, and some people may have even been motivated enough to actually believe it.

Of course that all changed once Node got threads.

Did it, though? AFAIK Node still doesn't have OS threads, which are the expensive version. The "About" page still says that Node is "designed without threads": https://nodejs.org/en/about/

All 5 of his points seem to be 2015 Javascript only. Some of them don't even apply to modern Javascript; I don't see any that apply to rust.

While I'm happy to believe you, with a short comment like that I just have to take your word for it. Could you give a short explanation of how Rust already addresses each of these points for those of us who don't program in it?

This article is 90% a rant against “callback hell” that js was facing before async/await was introduced. The remaining 10% stays valid even in the presence of async/await, but the trade-off (ignored by the article) of the alternative is having to manually deal with synchronization primitives (at least channels), which would make zero sense given that JavaScript is a single-threaded environment.

Rust is a different beast, you can have whichever model you like best (OS threads, M:N threads (with third party libs), async/await) but async/await is by far the most powerful, that's why it's such a big deal that it lands on Rust stable.

I hadn't heard of "synchronous concurrency", but looking at the Céu paper (only briefly so far), I think the model looks very close to how `async`/`await` works in Rust - possibly even isomorphic. This is really exciting, because Rust's model is unique among mainstream languages (it does not use green threads or fibers, and in fact does not require heap allocation), and I wasn't previously aware of any similar models even among experimental languages.

I'll open a thread on users.rust-lang.org to discuss the similarities/differences with Céu, but for now, here's the main similarity I see:

A Céu "trail" sounds a lot like an `async fn` in Rust. Within these functions, `.await` represents an explicit sync-point, i.e., the function cedes the runtime thread to the concurrency-runtime so that another `async fn` may be scheduled.

(The concurrency-runtime is a combination of two objects, an `Executor` and a `Reactor`, that must meet certain requirements but are not provided by the language or standard library.)

That is a correct understanding of how `await` works in Céu. What is important to note however is that all trails must have bounded execution, which means having an `await` statement. One abstraction that Céu uses is the idea that all reactions to external events are "infinitely" fast - that is, all trails have bounded execution, and computation is so much faster than incoming/outgoing events that it can be considered instantaneous. It's a bit like how one way of looking at garbage collected languages is that they model the computer as having infinite memory.

Having said that, the combination of par/or and par/and constructs for parallel trails of execution, and code/await for reactive objects is like nothing I have ever seen in other languages (it's a little bit actor-modelish, but not quite). I haven't looked closely at Rust though.

Céu also claims to have deterministic concurrency. What it means by that is that when multiple trails await the same external event, they are awakened in lexical order. So for a given sequence of external events in a given order, execution behavior should be deterministic. This kind of determinism seems to be true for all synchronous reactive languages[0]. Is the (single-threaded) concurrency model in Rust comparable here?

The thesis is a bit out of date (only slightly though), the manual for v0.30 or the online playground might be a better start[1][2].

[0] which is like... three languages in total, hahaha. I don't even remember the third one beyond Esterel and Céu. The determinism is quite important though, because Esterel was designed with safety-critical systems in mind.

[1] https://ceu-lang.github.io/ceu/out/manual/v0.30/

[2] http://ceu-lang.org/try.php

I'm not sure I see the connection between bounded execution and having an `await` statement. Rust's `async` functions, just like normal functions, definitely aren't total or guaranteed to terminate. They also technically don't need to have an `await` expression; you could write `async fn two() -> i32 { 1 + 1 }`. But they do _no_ work until they are first `poll`ed (which is the operation scheduled by the runtime when a function is `await`ed). So I believe that is equivalent to your requirement of "having an `await` statement". You could even think of them as "desugaring" into `async fn two() -> i32 { async {}.await; 1 + 1 }` (though of course that's not strictly accurate).
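The "no work until first polled" behaviour can be observed directly. The following is a small sketch using only the standard library; `two`, `NoopWaker` and `observe_laziness` are names invented for this illustration, and the counter exists only to make the side effect visible.

```rust
use std::cell::Cell;
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; this sketch polls by hand.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// An async fn without any `.await`; it bumps a counter when its body runs.
async fn two(counter: &Cell<u32>) -> i32 {
    counter.set(counter.get() + 1);
    1 + 1
}

/// Returns the counter before polling, the counter after polling, and the result.
fn observe_laziness() -> (u32, u32, Option<i32>) {
    let counter = Cell::new(0);

    // Calling the async fn only constructs the future; the body has not run yet.
    let mut fut = Box::pin(two(&counter));
    let before = counter.get();

    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let result = match fut.as_mut().poll(&mut cx) {
        Poll::Ready(v) => Some(v),
        Poll::Pending => None,
    };
    let after = counter.get();

    (before, after, result)
}
```

`observe_laziness()` returns `(0, 1, Some(2))`: the counter is untouched after the call to `two`, and the body only runs during the first `poll`.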

I think it's reasonable to consider well-written `poll` operations as effectively instantaneous in Rust, too, though since I haven't finished the Céu paper yet I don't yet understand why that abstraction is important to the semantics of trails.

As for `par`/`or` and `par`/`and`, I expect these are equivalent to `Future` combinators (which are called things like `.or_else`, `.and_then`, etc).

Since Executors and Reactors are not provided by the language runtime, I believe it would be easy (possibly even trivial) to guarantee the ordering of `Future`-polling for a given event. I am guessing that for Executors that don't provide this feature, the rationale is to support multi-threaded waking (i.e. scheduling `poll` operations on arbitrary threads taken from a thread pool). When using `async`/`await` in a single thread, I'd be mildly surprised if most Executors don't already support some kind of wake-sequencing determinism.

In any case, I've now opened a discussion on users.rust-lang about whether Rust can model synchronous concurrency: https://users.rust-lang.org/t/can-rusts-async-await-model-th...

In Céu, an `await` statement is ceding execution to the scheduler. It is the point at which a trail goes to sleep until being woken up again. So "bounded" here means "is guaranteed to go to sleep again (or terminate) after being woken up". The compiler rejects a loop unless it can guarantee that the loop either terminates or contains an `await` statement in every iteration.

I'll try to explain how the "instantaneousness" matters for the sake of modelling determinism. Take the example of reacting to a button press: whenever an external button press event comes in, all trails that are currently awaiting that button press are awakened in lexical order. Now imagine that there are multiple trails in an infinite loop that await a button press, and they all take longer to finish than the speed at which the user mashes a button. In that case the button events are queued up in the order that they arrive. If the user stops then eventually all button presses should get processed, in the correct order.

The idea is that external events get queued up in the order that they arrive, and that the trails react in deterministic fashion to such a sequence of events: if the exact same sequence of events arrives faster or slower, the eventual output should be the same. So while I might produce congestion if I mash a button very quickly, it should have the same results as when I press it very slowly.

Now, you may think "but what if you were asked to push a button every two seconds, with a one second time-window? Then the speed at which you press a button does matter!" Correct, but in that case the two second timer and one second time window also count as external events, and when looking at all external events, it again only matters in what order all of these external events arrive at the Céu program.

Lacking further knowledge about Rust I obviously cannot say anything about the rest of your comment, but I hope you're right about the `Future` combinators because I really enjoyed working with Céu's concurrency model.

Keep in mind that the current release only brings an MVP of the async/await feature to the table. The two things I've missed are no_std support and async trait methods, but there are reasons these haven't been completed yet. That doesn't mean they won't be available in the future, just that the team has prioritized releasing a significant part of the feature that many will already find useful.

Related: async/await is also amazingly useful for game development. Stuff like writing an update loop, where `yield` is "wait until next frame to continue".

This can let you avoid a lot of the pomp and circumstance of writing state machines by hand.
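As a rough sketch of the idea (not taken from any particular engine), the script below is written as straight-line async code and the game loop polls it once per frame. `NextFrame`, `next_frame`, `walk_right` and `run_game` are hypothetical names, and a real game would store many such scripts rather than one.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; the game loop polls every frame anyway.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// "Wait until next frame": pending the first time it is polled, ready afterwards.
struct NextFrame {
    yielded: bool,
}

impl Future for NextFrame {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            Poll::Pending
        }
    }
}

fn next_frame() -> NextFrame {
    NextFrame { yielded: false }
}

/// Entity script: move one unit to the right on each of three frames.
async fn walk_right(x: Rc<Cell<i32>>) {
    for _ in 0..3 {
        x.set(x.get() + 1);
        next_frame().await;
    }
}

/// Game loop: one poll per frame. Returns (frames elapsed, final x position).
fn run_game() -> (u32, i32) {
    let x = Rc::new(Cell::new(0));
    let mut script = Box::pin(walk_right(x.clone()));
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);

    let mut frames = 0;
    loop {
        frames += 1;
        if script.as_mut().poll(&mut cx).is_ready() {
            break;
        }
    }
    (frames, x.get())
}
```

Here `run_game()` returns `(4, 3)`: three frames that each move the entity, plus a fourth poll in which the last `next_frame()` completes and the script ends. The loop counter and position survive between frames inside the future, which is the state machine you would otherwise write by hand.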

I played around with exactly that concept during Ludum Dare last month, if you're interested: https://github.com/e2-71828/ld45/blob/master/doc.pdf

I don't know any Rust, but that was a really good read. What is your background, and what materials did you use getting into Rust?

Your explanations were surprisingly simple

Thanks. I was a software engineer at several different SF-area companies for about a decade, and then I decided to take some time off from professional programming. I'm now a full-time student in a CS Master's program. As I'll eventually need to produce a thesis, this was partly an exercise to practice my technical writing.

I picked up Rust for my personal projects a couple of years ago, and mostly worked from the official API docs, occasionally turning to blog posts, the Rust book, or the Rustonomicon. Because I didn't have any time pressure, I ignored the entire crate ecosystem and opted to write everything myself instead. This has left some giant gaps in my Rust knowledge, but they're in a different place than is typical.

As far as the explanations go, I realized that good authors don't say important things only once: they repeat everything a few different times in a few different ways. So, I tried to say the exact same things in the prose as in the code listings, and trusted that the fundamental differences between Rust and English would make the two complement each other instead of simply being repetitive.

The Fuchsia team also uses Rust's async/await this way in the implementation of their network stack.

Yeah, that's a super interesting idea that I'm also toying with in my head. One issue however is the "allocation-free" part. Sooner or later you typically hit a situation where you need to box a Future - either because you want to spawn it dynamically or need dynamic dispatch and type erasure. At that point of time you will need an allocator.

I'm still wondering if lots of the problems can be solved with an "allocate only on startup" instead of a "never allocate" strategy, or whether fully dynamic allocation is required. Probably needs some real world apps to find out.

I'm not familiar with rust futures stuff, but having researched similar problems in C++, the important part is giving the control of allocation (when and how) to the application.

Right. There are some possibilities with custom allocators. But that is still kind of an unexplored area in Rust, especially the error handling around it. Looking forward to learning more.

I've dipped my toes into embedded rust from time to time. Can you give me an example or two of how you would use async/await in an embedded environment? I'm just curious about how it would work.

Another infrequent toe dipper here - say you need to send some command to a peripheral, wait for the device to acknowledge it, then send more data. The naive implementation would poll for the interrupt or just go to sleep, meaning no other user code can run in that time. A naive async implementation would spread your code all over the place - it wouldn't just be a function call with three statements. There are libraries that can give you lightweight tasks, but you might need to keep separate stacks for each of the suspended tasks; there might be others that don't, but they require the code to be written in a non-trivial way. Rust with async/await on no_std can give you the best of both worlds: easy-to-read sequential code with the best possible performance.

>poll for the interrupt

Errr, no. You either poll, or set up an interrupt to catch an event. The point of an interrupt is to allow other code to continue to run while waiting for a peripheral.

Sorry, I mean polling for the ready register, not interrupt.

Which executor do you use? Tokio is a bit heavy, no?

Not the parent, but I've been keeping my eye on https://github.com/Nemo157/embrio-rs

Not much doc at the github. Can you provide a quick overview?

Not much I can say other than “an executor designed specifically for embedded.” I don’t do embedded dev myself, so it’s more of a curiosity thing for me than something that I can give you a lengthy explanation of.

Embrio provides only a very limited executor. It can drive only a single Future, which gets re-polled every time a certain system interrupt occurs. You can obviously have subtasks of that single task (e.g. via join! and friends), but you can't have other dynamic tasks.

jrobsonchase has implemented a more dynamic solution which uses custom allocators for futures: https://gitlab.com/polymer-kb/firmware/embedded-executor

I think there might also be other solutions out there. I remember the Drone OS project also doing something along those lines.

I'd like to hear more about this - I thought Tokio was just some tools on top of MIO which is just `select` (or similar). Is Tokio heavy?

It's like old Windows programming (Windows 3.x). Cooperative multitasking is the process-control version of manual, unsafe memory management. When your app suddenly freezes, you have discovered a bug.

It's for cases where alternatives don't exist or they are too expensive.

Language-level green threads are a safer abstraction over asynchronous I/O operations.

The difference is that in an embedded context hopefully everything is written and tested all together by the same entity, rather than a mish-mash of high and low quality code held hostage by the weakest link.

> I believe will make rust pretty much the easiest low-level platform to develop for.

This stuff is easily available for C. It's just a function call instead of syntactic sugar. On Windows, use Fibers. On Linux, there is a (deprecated) API as well (forget the name). Or use a portable wrapper library.

Or just write a little assembly. Can't be hard.

How exactly do I write embedded code for a tiny chip on a different architecture with just 4K of RAM and no OS that uses the Win32 fibers?

Probably not using green threads / async at all? That would require a number of stacks as well as a little runtime, which will quickly eat up your 4K.

Just to clarify for those following along, Rust async code does not use green threads and doesn't require a stack per task.

I'm missing some of the technical details here, but from a quick glance of the article it seems like Rust's futures are lazy. I.e. a stack would only be allocated when the future is actually awaited. But in order to execute the relevant code, a call stack per not-finished future is still needed, or am I missing something?

Afaik Rust's futures compile to a state machine, which is basically just a struct that contains the current state flag and the variables that need to be kept across yield points. An executor owns a list of such structs/futures and executes them however it sees fit (single-threaded, multi-threaded, ...). So there is no stack per future. The number of stacks depends on how many threads the executor runs in parallel.
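To illustrate the "struct with a state flag plus the variables kept across yield points" idea, here is a hand-written state machine of roughly the shape described above. This is only a sketch: the enum `AddAfterYield` and the helper names are invented, and the code the compiler actually generates for an `async fn` differs in layout and detail.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; the driver below simply polls in a loop.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Each variant is one suspension state and holds only the values
/// that must survive until the next poll.
enum AddAfterYield {
    Start { a: u32, b: u32 },
    Waiting { sum: u32 },
    Done,
}

impl Future for AddAfterYield {
    type Output = u32;

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        // The enum holds only plain integers, so it is Unpin and `get_mut` is allowed.
        let this = self.get_mut();
        match *this {
            AddAfterYield::Start { a, b } => {
                // Do the first chunk of work, save the result, and yield.
                *this = AddAfterYield::Waiting { sum: a + b };
                Poll::Pending
            }
            AddAfterYield::Waiting { sum } => {
                *this = AddAfterYield::Done;
                Poll::Ready(sum)
            }
            AddAfterYield::Done => panic!("polled after completion"),
        }
    }
}

/// Drive the state machine by hand. Returns (result, number of polls).
fn run_to_completion() -> (u32, u32) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut task = AddAfterYield::Start { a: 2, b: 3 };
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(sum) = Pin::new(&mut task).poll(&mut cx) {
            return (sum, polls);
        }
    }
}
```

`run_to_completion()` returns `(5, 2)`. Note that `task` is an ordinary value living on the caller's stack; every `poll` call runs on whatever stack the caller already has.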

> the current state flag the variables that need to be kept across yield points.

you mean like... a stack frame?

Like a stack frame, but allocated once with a fixed size, instead of being LIFO allocated in a reserved region (which itself must be allocated upfront, when you don't know how big you're going to end up).

The difference being: if your tasks need 192B of memory each, and you spawn 10 of them, you just consumed a little less than 2kB. With green threads, you have 10 times the starting size of your stack (generally a few kB). That makes a big difference if you don't have much memory.
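The per-task footprint can be inspected with `std::mem::size_of_val`. The sketch below uses two made-up async functions; exact sizes are a compiler implementation detail and are not guaranteed, so the only claims made here are that a buffer held across an `.await` has to be stored inside the future, and that a future without such a buffer is smaller.

```rust
use std::mem::size_of_val;

/// Keeps a 192-byte buffer alive across an `.await`,
/// so the buffer becomes part of the future's state.
async fn task_with_buffer(seed: u8) -> u8 {
    let buf = [seed; 192];
    std::future::ready(()).await;
    buf[191]
}

/// Keeps nothing alive across its `.await`.
async fn task_without_buffer() -> u8 {
    std::future::ready(()).await;
    7
}

/// Sizes in bytes of the two (not yet polled) futures.
fn future_sizes() -> (usize, usize) {
    (
        size_of_val(&task_with_buffer(1)),
        size_of_val(&task_without_buffer()),
    )
}
```

The first number returned by `future_sizes()` is at least 192 and the second is smaller. Ten such tasks therefore cost roughly ten times the first number, with no separate stack reserved for each of them.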

So that's actually green threads in my book (in a good implementation I expect to be able to configure the stack size), with the nice addition that the language exposes the needed stack size.

It's a stackless coroutine. AFAIK, the term “green thread” is usually reserved to stackful coroutines, but I guess it could also be used to talk about any kind of coroutines.

It's more efficient (potentially substantially more so). In a typical threaded system you have some leaf functions which take up a substantial amount of stack space but don't block, so you don't need to save their state between context switches. In most green threaded applications you still need to allocate this space (times the number of threads). The main advantage of this kind of green threads is that you can separate out the allocation which you need to keep between context switches (which is stored per task) from the memory you only need while actually executing (which is shared between all tasks). For certain workloads this can be a substantial saving. In principle you can do this in C or C++ by stack switching at the appropriate points, but it's a pain to implement, hard to use correctly (the language doesn't help you at all), and I've not seen any actual implementations of this.

Kinda like a stack frame, but more compact and allocated in one go beforehand.

Because of syntactic restrictions on how await works, at most you need to allocate a single function frame, never a full stack, and often it doesn't even need to be allocated separately and can live in the stack of the underlying OS thread.

So that async function cannot call anything else?

They can, but the function itself cannot be suspended by calling something else (i.e. await being a keyword enforces this), so any function that is called can use the original OS thread stack. Any called function can in turn be an async function, and will return a future[1] that in turn captures that function's stack frame. So yes, a chain of suspended async functions sort of looks like a stack, but its size is known at compile time [2].

[1] I'm not familiar with Rust semantics here, just making educated guesses.

[2] Not sure how Rust deals with recursion in this case. I assume you get a compilation error because it will fail to deduce the return value of the function, and you'll have to explicitly box the future: the "stack" would then look like a linked list of activation records.
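On the recursion point, that guess matches what happens in practice: an `async fn` that awaits itself directly is rejected by the compiler (error E0733), because its future would have to contain itself. The usual workaround is to box the recursive future. A minimal sketch follows, with invented names `sum_to` and `run_sum` and a hand-rolled polling loop standing in for an executor.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; the future below never returns Pending.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Recursive async computation of n + (n - 1) + ... + 0.
/// Each level of recursion is a separate heap allocation, so the suspended
/// "stack" is effectively a linked list of boxed futures.
fn sum_to(n: u32) -> Pin<Box<dyn Future<Output = u32>>> {
    Box::pin(async move {
        if n == 0 {
            0
        } else {
            n + sum_to(n - 1).await
        }
    })
}

/// Poll the boxed future until it completes.
fn run_sum(n: u32) -> u32 {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = sum_to(n);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

`run_sum(4)` evaluates to `10`. The boxing is exactly the point at which an allocator becomes necessary, which is relevant to the allocation-free discussion elsewhere in the thread.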

Async functions don't really call anything by themselves: the executor does, and every function called in the context of an async function is called on the executor's stack. You just end up with a fixed number of executors, each running on its own OS thread with a normal stack.

This is a big improvement; however, this is still explicit/userland asynchronous programming: if anything down the call stack is synchronous, it blocks everything. This requires every component of a program, including every dependency, to be specifically designed for this kind of concurrency.

Async I/O gives awesome performance, but further abstractions would make it easier and less risky to use. Designing everything around the fact that a program uses async I/O, including things that have nothing to do with I/O, is crazy.

Programming languages have the power to implement concurrency patterns that offer the same kind of performance, without the hassle.

Yup. See https://journal.stuffwithstuff.com/2015/02/01/what-color-is-... for a pretty good explanation of the async/await problem.

I know Rust is all about zero-cost abstractions, "but at what cost" beyond just runtime cost? I appreciate their principled approach to mechanical sympathy and interop with the C abstract machine, but I'm just not enthused about this particular tradeoff.

An alternative design would have kept await, but supported async not as a function-modifier, but as an expression modifier. Unfortunately, as the async compiler transform is a static property of a function, this would break separate compilation. That said, I have to wonder if the continuously maturing specialization machinery in Rustc could have been used to address this. IIUC, they already have generic code that is compiled as an AST and specialized to machine code upon use, then memoized for future use. They could specialize functions by their async usage and whether or not any higher-order arguments are themselves async. It might balloon worst-case compilation time by ~2X for async/non-async uses (more for HOF), but that's probably better than ballooning user-written code by 2X.

Async adds a runtime requirement, so Rust cannot just take the same approach as Go. You only have a scheduler if you instantiate one. And having a runtime or not has nothing to do with the C abstract machine, but with precise control on what the code does.

For instance, Go does not give you the option to handle this yourself: you cannot write a library for implementing goroutines or a scheduler, since it's embed in every program. That's why it's called runtime. In Rust, every bit of async implementation (futures, schedulers, etc) is a library, with some language support for easing type inference and declarations. This should already tell you why they took this approach.

Regarding async/await and function colors (from the article you posted), I would much rather prefer Rust to use an effect system for implementing this. However, since effects are still much into research and there is no major language which is pushing on this direction (maybe OCaml in a few years?) it seems like a long shot for now.

Go is even more different. Rust async has explicit yields with await, but Go does it implicitly at various key locations. This is actually pretty surprising to many folks, it was in the past and maybe still is possible to deadlock Go with a certain incantation of tight looping.

Another difference with Rust is that async functions are stackless. Go has to contend with having a non-conventional stack and interoperability with C code (and sometimes syscalls, vDSO, etc.) that expects a relatively large stack requires pivoting.

I do wish Rust could solve this problem, but the two approaches are very different indeed. I think it’s a fact of life that accidental blocking will exist in Rust; approaches that prevent it are complicated and indeed have runtime costs.
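To make the "explicit yields" and "stackless" points concrete, here is a minimal std-only sketch. `YieldNow`, `yield_now`, `NoopWaker`, `run_counting_polls` and `two_steps` are names made up for this example, not part of any library, and the busy-polling loop is only a toy stand-in for a real executor. Each `.await` on `yield_now()` is a visible suspension point, and the future is just a value that gets polled, with no separate stack:

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical helper: a future that yields to the executor exactly once.
struct YieldNow {
    yielded: bool,
}

fn yield_now() -> YieldNow {
    YieldNow { yielded: false }
}

impl Future for YieldNow {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            // Ask to be polled again, then hand control back to the caller.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// A waker that does nothing; good enough for a loop that polls constantly.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor: polls until completion and reports how many polls it took.
fn run_counting_polls<F: Future>(fut: F) -> (F::Output, usize) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return (out, polls);
        }
    }
}

/// The only places this function can be suspended are the two `.await`s.
async fn two_steps() -> u32 {
    let a = 1;
    yield_now().await;
    let b = 2;
    yield_now().await;
    a + b
}
```

Under these assumptions, `run_counting_polls(two_steps())` takes three polls to produce 3: one per suspension point plus the final one. Code between two awaits always runs to the next await without interruption, which is exactly where accidental blocking sneaks in.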

> [...] Go does it implicitly at various key locations. This is actually pretty surprising to many folks: it was in the past, and maybe still is, possible to deadlock Go with a certain incantation of tight looping.

This is being worked on here: https://github.com/golang/go/issues/24543

Is there any documentation on the latest status on this, how far along they've come and what technical solutions they're considering / have settled on?

The proposal is marked accepted, and commits are being made to reference it, so I suppose the design doc and that issue are most likely the source of truth.

Fascinating. This would make Goroutines one giant leap closer to being thread-like. It feels weird how much of the OS scheduler is being “duplicated” but you can’t really argue much with the results.

> so Rust cannot just take the same approach as Go

I did not suggest that it do so. In fact, I suggested an approach that wouldn't fit for Go!

In Go, you've got pretty strict modular compilation. Because of this, specialization -- which is pervasive in Rust -- is virtually absent in Go. Rust's rich generics system demands robust specialization machinery. This machinery, which handles instantiation of explicit generics, can be repurposed to handle automatic instantiation of implicit generics. In this case, I'm suggesting that a function could be compiled as either synchronous or asynchronous as determined by the use-site.

> In Rust, every bit of async implementation (futures, schedulers, etc) is a library

Nothing about my suggestion precludes this.

Exactly! Most code doesn't care whether it's sync or async; the compiler should decide which specialization to use when the program is built. The only places where you usually care are 1: the very top of the program (some event loop handler called near main) or 2: the very bottom, when you're implementing an IO library. The specialization is decided by what you do at the very top; that decision propagates down the call chain, specializing everything in the middle that doesn't care, until it gets to a function in your IO library where it chooses between the pre-defined sync or pre-defined async implementations.

So you have a synchronous HTTP server and you decide you want to make it async? OK, no problem: switch to an async-enabled request handler in main, and boom, everything that you wrote is recompiled into a state machine a la async, and at the very bottom, where it actually calls the library to write out to the socket, the library knows what the context is and can choose the async-enabled implementation.

I'm glossing over some important details and restrictions that might make this more complicated in practice, but I think it should at least be possible for functions to opt-in to 'async-specializability' to avoid having to rewrite the world for async.

> An alternative design would have kept await, but supported async not as a function-modifier, but as an expression modifier.

How is that different than an `async` block?

It's not. I didn't realize Rust had that, but the rest of my comment still applies: it would be nice if asynchrony was transparent to intermediate functions and compiled via specialization.

> Programming languages have the power to implement concurrency patterns that offer the same kind of performances, without the hassle.

Can you give one that reaches this goal? Go is often cited in that regard, but it doesn't really fit your description since it trades performance for convenience (interactions with native libraries are really slow because of that) and still doesn't solve all problems, since hot loops can block a whole OS thread, slowing down unrelated goroutines. (There's some work in progress to make the scheduler able to preempt tight loops, though.)

I think Project Loom would fit. It's a remarkably sane approach:


As a Java developer I'm really looking forward to Project Loom. I think it's a great approach that avoids the pitfalls of the two-colored functions approach.

However, Project Loom doesn't fit into the "zero cost abstraction" paradigm of Rust. Project Loom requires quite a lot of support from the JVM runtime:

> The main technical mission in implementing continuations — and indeed, of this entire project — is adding to HotSpot the ability to capture, store and resume callstacks not as part of kernel threads. JNI stack frames will likely not be supported.

It also still requires manual suspension, so JIT-compiled tight loops will likely still cause a problem.

I think that that makes it fairly similar to Rust's approach then... compiler support for suspension, and cooperative scheduling.

Clojure's pmap function (parallel map) is pretty cool: http://clojure.github.io/clojure/clojure.core-api.html#cloju...

There is Rayon in Rust land. This is a pretty different (simpler) thing.

Rayon uses iterators. Iterators are the dual of first order functions. Iterators are superior because they are more general. (You can implement pmap using iterators. You cannot implement iterators using pmap.)

Does Rayon use iterators, or just iterator-like combinators? I don’t think a for-loop would work with Rayon, would it?

You can also definitely implement iterators using first-order functions:

    const iterate = array => {
      // Each step exposes the current value and a thunk producing the next step.
      const step = index => ({
        value: array[index],
        next: () => step(index + 1)
      });
      return step(0);
    };

    // add some combinators on top

First order functions can in fact implement anything if you’re willing to accept some bad syntax and speed — that’s the Church-Turing equivalence.

Hm. You're right. I hadn't used Rayon in so long I forgot it is indeed iterator-like combinators using an entry point called `par_iter`.
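For comparison, here is a rough std-only sketch of a pmap-style function built from plain slice iterators and scoped threads. This is not how Rayon or Clojure implement theirs; `pmap`, its `workers` parameter and the chunking strategy are assumptions made up for the example:

```rust
use std::thread;

/// Hypothetical `pmap`: applies `f` to every item, splitting the slice into
/// roughly `workers` chunks and mapping each chunk on its own scoped thread.
/// Results come back in the original order.
fn pmap<T, U, F>(items: &[T], workers: usize, f: F) -> Vec<U>
where
    T: Sync,
    U: Send,
    F: Fn(&T) -> U + Sync,
{
    let workers = workers.max(1);
    // Ceiling division, with a floor of 1 so `chunks` never gets a zero size.
    let chunk_len = ((items.len() + workers - 1) / workers).max(1);

    thread::scope(|s| {
        // Spawn one thread per chunk; each runs an ordinary sequential `map`.
        let handles: Vec<_> = items
            .chunks(chunk_len)
            .map(|chunk| {
                let f = &f;
                s.spawn(move || chunk.iter().map(f).collect::<Vec<U>>())
            })
            .collect();

        // Join in spawn order so the output order matches the input order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker panicked"))
            .collect()
    })
}
```

Calling `pmap(&[1, 2, 3, 4, 5], 2, |x| x * 2)` should give the doubled values in order. A real library would use a shared pool and work stealing instead of spawning a thread per chunk.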

Python's greenthread implementations are quite good, monkey-patching all calls to work asynchronously with low mental overhead and automatic library compatibility (except for libraries that call native code obviously).

Of course Python generally fails to offer "the same kind of performance" for anything limited by CPU or memory, so it technically doesn't fit the description either.

> since hot loops can block a whole OS thread

Asking as a beginner, what does the above mean?

Not sure what "hot loop" means, or why it blocks an OS thread.

Go creates the illusion of preemptive multithreading by having implicit safe-points for cooperative multithreading. Each IO operation is such a safe-point. If you write an infinite loop like `for {}` where there are no IO operations in the loop body, it will block indefinitely. This will prevent the underlying OS thread from being available to other goroutines. The same thing can happen even if you do have IO operations in there, but the work being performed is dominated by CPU time instead of IO.

Note that this is being fixed in the next release: https://golang.org/issue/10958

That's cool, it's finally coming. This issue has been open since 2015! I wonder what performance impact this will have.

Maybe this is a dumb question, but what if a language with implicit safepoints lets you opt out with something like a nosafepoint block/pragma?

An “OS thread” is the most basic part of the program that can actually do things. When it is blocked, it can’t do anything.

OS threads can be expensive, so libraries try to only create a few OS threads.

A “hot loop” (usually, I think, called a “tight loop”) is a loop that does a lot of work for a relatively large amount of time all at once. Things like “looping through a list of every building in Manhattan and getting the average price over two decades.”

With some code like networking, you end up having to wait on other parts of the computer besides the CPU often, for things like uploads and downloads.

“Asynchronous programming” tries to make it easy to keep the CPU busy doing helpful things, even while some of its jobs are stuck waiting on uploads/downloads/whatever. This keeps the program efficient, because it can do a little bit of many tasks at once instead of having to complete each task entirely before moving on to the next.

The problem comes when you have a tight loop in a thread that is mostly expecting to be doing asynchronous work around I/O or networking. It is basically trying to juggle with one hand. The program can’t multitask as well, and you end up having to wait longer before it can start each piece of work you want.
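A toy model of that juggling problem, as a std-only sketch. None of this is a real runtime: `Step`, `Task`, `run_round_robin`, `polite` and `greedy` are hypothetical names, and tasks are plain closures that run until they voluntarily give control back:

```rust
use std::collections::VecDeque;

/// What a task reports after running one slice of work.
enum Step {
    Yield,
    Done,
}

/// A task is a name plus a closure that runs until its next voluntary yield.
struct Task {
    name: &'static str,
    run: Box<dyn FnMut() -> Step>,
}

/// Round-robin cooperative scheduler on a single thread.
/// Returns the order in which task slices were executed.
fn run_round_robin(tasks: Vec<Task>) -> Vec<&'static str> {
    let mut queue: VecDeque<Task> = tasks.into();
    let mut trace = Vec::new();
    while let Some(mut task) = queue.pop_front() {
        trace.push(task.name);
        // Nothing else can run until this call returns.
        if let Step::Yield = (task.run)() {
            queue.push_back(task);
        }
    }
    trace
}

/// A well-behaved task: does `slices` short pieces of work, yielding between them.
fn polite(name: &'static str, slices: u32) -> Task {
    let mut left = slices;
    Task {
        name,
        run: Box::new(move || {
            left = left.saturating_sub(1);
            if left == 0 { Step::Done } else { Step::Yield }
        }),
    }
}

/// A tight loop: all `iterations` happen in one slice, with no yield inside.
fn greedy(name: &'static str, iterations: u64) -> Task {
    Task {
        name,
        run: Box::new(move || {
            let mut acc = 0u64;
            for i in 0..iterations {
                acc = acc.wrapping_add(i);
            }
            std::hint::black_box(acc); // keep the loop from being optimized away
            Step::Done
        }),
    }
}
```

With `vec![polite("a", 2), greedy("hot", 1_000), polite("b", 2)]` the slices run in the order a, hot, b, a, b. The scheduler stays fair only because each slice is short; make the `greedy` iteration count huge and both polite tasks sit idle for the whole loop.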

Hot loops are often "expensive", or where your program is spending a lot of its time.

If you're doing a hot loop, which may be synchronous, you will block the event loop attached to that OS thread, because a hot loop is presumably not yielding control until it is done.

hmm, why is that a bad thing, if your entire thread is synchronous shouldn't it technically be blocking? Or is OS thread the main thread?

The problem is that the async model is a form of cooperative multithreading, so if one computation runs for a long time without returning to the main event loop, it can increase the latency for responses to other events. E.g., if one HTTP request takes a long time to process, and many of the worker-pool OS threads are handling such a request, response time goes up for all the other requests. OS-level concurrency is preemptive (timesliced), so one busy thread doesn't block other requests, but of course with much higher overhead in other ways. Best practice is usually to keep event handlers on event-loop threads short and push heavy computations to other OS-level worker threads.
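A minimal sketch of the "push heavy computations to worker threads" advice, using only std threads and a channel. The names (`sum_of_squares`, `offload`, `event_loop_with_offload`) are invented for the example, and the polling loop stands in for a real event loop that would be handling other events in the meantime:

```rust
use std::sync::mpsc;
use std::thread;

/// Stand-in for a CPU-heavy computation.
fn sum_of_squares(n: u64) -> u64 {
    (1..=n).map(|i| i * i).sum()
}

/// Runs `job` on a separate OS thread and returns a receiver for its result,
/// so the calling thread is free to keep doing short pieces of work.
fn offload<T, F>(job: F) -> mpsc::Receiver<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone; ignore the send error in that case.
        let _ = tx.send(job());
    });
    rx
}

/// Toy event loop: keeps "ticking" until the offloaded result arrives.
/// Returns the result and how many ticks happened while waiting.
fn event_loop_with_offload(n: u64) -> (u64, u32) {
    let rx = offload(move || sum_of_squares(n));
    let mut ticks = 0;
    loop {
        match rx.try_recv() {
            Ok(value) => return (value, ticks),
            Err(mpsc::TryRecvError::Empty) => {
                ticks += 1; // a real loop would handle another short event here
                thread::yield_now();
            }
            Err(mpsc::TryRecvError::Disconnected) => panic!("worker thread died"),
        }
    }
}
```

The tick count depends on scheduling, so only the computed value is deterministic. Async runtimes typically ship their own helper for this pattern; the sketch above just shows the shape of the idea.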

Ah, that makes sense. I think another user pointed out that spawning n goroutines does not actually spawn n physical threads but rather queues n tasks onto m threads in the pool, so if we exhaust the m threads, n-m tasks will be blocked.

Thanks for the explanation

I wonder at what point each trade-off makes sense (i.e. what counts as heavy vs. light computation; it is probably related to OS thread allocation time).

> interactions with native libraries are really slow because of that

Can you explain a bit? What is the connection between concurrency implementation (which I am assuming you are talking about multiplexing multiple coroutines over the same OS thread) and say slowness in cgo? Having to save the stack? I don't get it.

FFI is slow because the main Go compiler uses a different calling convention than everything else. I couldn't tell you if or how that's related to its concurrency features.

It's an important optimization, since otherwise each goroutine would use a lot of memory, but it's not required. The stack allocation strategy has changed a couple of times in the main compiler, gccgo originally support it and cgo functions behave like normal Go, modulo the stack.

> Can you give one that reaches this goal?

First-class continuations. They are an efficient building block for everything from calling asynchronous I/O routines without turning your code into callback spaghetti, to implementing coroutines and green threads. Goroutines are a special and poorly implemented case of continuations. Gambit-C has had preemptive green threads with priorities and mailboxes at less than 1KiB per thread memory overhead for over a decade now, all built on top of first-class continuations.


First-class continuations are too unrestricted to match the performance of Rust-style async/await.

If you take on a few limitations you can get there, but then those are exactly the things people complain about.

It is worse than that. First-class continuations add a small overhead for stack operations, and they always have more memory overhead than callbacks/async transformation into callbacks. But the comment I was replying to was about Go, not Rust. For the small overhead that first-class continuations have, they provide a simple and efficient way to handle coroutines, generators, asynchronous I/O, and green threads.

Erlang probably comes closest?

Not an Erlang user, but my understanding is that the Erlang VM (BEAM) schedules on function calls. Which works fine for that use, since Erlang does looping with tail calls, but is not a solution for procedural languages.

The Erlang VM is fully preemptive and schedules on something called a reduction. You can call an external function with EFI that can screw things up, but otherwise it's not necessarily on what you might consider a function call.

In practice, it's enough to preempt at potentially-recursive function entry and loop backedges. A thread which doesn't recurse or loop has to come to an end pretty quickly, at which point the scheduler will get a go.

There's no fundamental difference between inserting a yield before a tail call and inserting a yield before a continue statement or closing bracket of the for block, is there?

Fundamentally no, but it's a more difficult thing to do in practice at least. Unless you want to give up other optimizations like loop unrolling and other things like that since you end up losing the fact that the loop exists after some optimization passes.

BEAM (Erlang VM) provides pre-emptive scheduling

True for functions written in Erlang/Elixir, but not in NIF functions implemented in C.

Thanks to dirty schedulers that is not a big issue: https://medium.com/@jlouis666/erlang-dirty-scheduler-overhea...

Thanks for the link. It's a very deep and interesting technical post.

> If anything down the callstack is synchronous, it blocks everything.

If you use a work-stealing executor, tasks will get executed on another thread. Therefore the impact of accidentally blocking is lowered. Tokio implements such an executor.

I haven't used tokio but in my Scala days the execution context was backed by a thread pool. One blocking call wouldn't kill you because it would just tie up one thread, but the thread pool would quickly get exhausted and lock up the application. Does tokio have the same problem?

There is a maximum number of threads and it's by default set to the # of cores (based on the docs for tokio-executor's ThreadPool and Builder). The docs also say that the # of threads starts at 0 and will increase, so one can do the Scala strategy of starting with large threadpools - one of my projects last year defaulted to 100-200 threads per pool to avoid just this problem.

I think the question you're asking is, "are deadlocks possible?" and the answer to that seems like it would be yes. I would hope that Rust's memory model makes accidental memory sharing dependencies & deadlocks harder to cause, but you can always create 8 threads waiting on a mutex and a 9th that will release it, and have the 8 spinlock on the waiting part.

The "async-std" library, one of a few attempts to write async-aware primitive functions around blocking calls to the filesystem, mutexes, etc, implements async wrappers around mutex and others that should ensure that yield points are correctly inserted and the task goes back into the queues if it's blocked.

That seems to me the right solution - make sure all your blocking calls with a potential for deadlocking check "am I blocked?" first and if so, put themselves back onto the task queue instead of spinning.

Rust cannot guarantee the lack of deadlocks or overall thread-safety. It does guarantee no data races, though.
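A small std-only sketch of what that guarantee looks like in practice; `parallel_count` is a made-up name. The shared counter has to be wrapped in something like `Arc<Mutex<_>>` before the compiler will let several threads mutate it, which rules out the data race. Nothing here stops you from writing a deadlock, for instance by taking two mutexes in opposite orders on two threads:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Increments one shared counter from several threads and returns the total.
fn parallel_count(threads: usize, per_thread: u64) -> u64 {
    let counter = Arc::new(Mutex::new(0u64));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // The lock is the only way to reach the u64 inside.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}
```

Replacing the `Mutex<u64>` with a bare `u64` captured by mutable reference would be rejected at compile time, which is the "no data races" part. Lock ordering, on the other hand, is still entirely up to the programmer.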

Yes, is there some way I could have been more precise about that in my comment?

It will certainly have its limits. I don't know whether tokio spawns new threads if it detects all others are used up; likely not at this point in time. However, work-stealing at least allows mitigating the impact of some accidentally blocking code, e.g. if a syscall is made that takes longer than expected, or if a library holds a mutex for longer than necessary. It shouldn't be used as an excuse for simply blocking everywhere in an async task executor, but it will reduce the impact and give developers some time and wiggle room to improve the associated code.

> Async I/O gives awesome performance, but further abstractions would make it easier and less risky to use. Designing everything around the fact that a program uses async I/O, including things that have nothing to do with I/O, is crazy.

Microsoft kind of tried to do this with the new APIs for UWP: pretty much everything is async, the blocking versions of APIs were all eliminated, so there was no way for the async-ness to "infect" otherwise synchronous code. It was actually a pretty nice way to program; it's a shame it never took off.

They're finally opening the APIs (already have, I think, to some extent) for use with normal desktop apps and "UWP" apps outside of the Store. You can even embed the new UI stuff inside of Forms and WPF apps via XAML islands.

They also lowered their portion of the revenue share considerably for Store apps, afaik.

The JavaScript world is pretty close to this. Not quite everything is async, but almost everything is async-first.

The JavaScript world was forced into this. Since it doesn't (or at least didn't) expose threads, everything had to be non-blocking. Otherwise programs would be non-responsive all the time.

They do, but only at the expense of interoperability. Any "green thread" solution breaks down once you have to invoke code written in something else while allowing it to call you back. Async futures, on the other hand, can map to any C ABI with callbacks.

So until there's some standardized form of seamless coroutines on OS level, that is sufficiently portable to be in wide use, we won't see widespread adoption of them outside of autarkic languages and ecosystems like Erlang or Go.

Targeting the browser via WASM, we don't even have synchronous I/O for many/most things - I've been looking forward to async/await as a means of reining in some of the awkward APIs and callback hell.

I am waiting for a language to solve this with the type system and compiler. Give me the ability to mark a thread as async only and a clean (async) interface to communicate with a sync thread. If my async code tries to do anything sync, don't let it compile.

Whether a function is async-safe isn't black and white. A function that performs calculations may return instantly on a small dataset but block for "too long" on large parameters. What counts as "too long" will vary widely depending on your application.

But whether a function is not async-safe is pretty black and white (i.e. there's a lot of clearly unsafe code). Even a definition as simple as "performs blocking I/O" would be extremely helpful.

This is the value people get out of async-only environments like node. Yes, you still have to worry about costly for loops but I don't have to worry about which database driver I'm using because they are all async. In a mixed-paradigm language like Rust I would really appreciate the compiler telling me I grabbed the wrong database driver.

What about long-running computations? They're not really async-safe (they will block the thread), but they don't perform any IO.

I think the suggestion is: clearly blocking I/O should be marked as such (with some marker like unsafe), and other functions can be marked “blocking” if the creator decides they are blocking.

In the worst case a problem could still occur, but once found, the “problem” function can be marked appropriately. At the least that would start to solve the issue.

There are plenty of synchronous calls in node and js. Not everything is async https://nodejs.org/api/fs.html#fs_dir_readsync

And that's just io, most function calls are synchronous also

Aren't you effectively asking the compiler to solve the halting problem?

I think the best you could do would be heuristics - having inferred or user-supplied bounds on the complexity of functions, having rough ideas on how disk or network latency will affect the performance of functions, and bubbling that information up the call tree. It wouldn't be perfect, but it could be useful.

Compilers can "avoid" the halting problem if the underlying language is not Turing complete or can express function totality.

You can even express algorithmic complexity in a language.

But it's a bit more complex than that, really. You will have a harder time saying "this loop will block the event system for longer than I'd like".

Sorry, how is Rust not that language? You can use single threaded executors that only require Send and not Sync on data being executed on.

Nit: Single-threaded executors also don't require "Send", since they don't move things between threads. That allows you to e.g. use non-atomically refcounted things (Rc<T>) on single-threaded executors, which you can't use in the multithreaded versions.

Yes, you're quite correct, and I don't think a nit. That is much more accurate.
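The `Send` distinction can be seen without any executor at all. In this sketch (function names invented for the example) a plain closure plays the role of a task on a single-threaded executor, and `thread::spawn` plays the role of a multithreaded one, since it requires its closure to be `Send`:

```rust
use std::rc::Rc;
use std::sync::Arc;
use std::thread;

/// Same-thread sharing: `Rc` is fine because nothing crosses a thread boundary.
/// Returns the sum computed by the "task" and the number of owners seen.
fn share_on_one_thread(values: Vec<u32>) -> (u32, usize) {
    let shared = Rc::new(values);
    let task = {
        let local = Rc::clone(&shared);
        move || local.iter().sum::<u32>()
    };
    let owners = Rc::strong_count(&shared); // `shared` plus the clone held by `task`
    (task(), owners)
}

/// Cross-thread sharing: the closure must be `Send`, so `Arc` is needed.
/// Swapping `Arc` for `Rc` here is a compile error, since `Rc<T>` is not `Send`.
fn share_across_threads(values: Vec<u32>) -> u32 {
    let shared = Arc::new(values);
    let worker = {
        let remote = Arc::clone(&shared);
        thread::spawn(move || remote.iter().sum::<u32>())
    };
    worker.join().expect("worker panicked")
}
```

Multithreaded executors generally put the same `Send` bound on the futures you hand them, which is why a future holding an `Rc` across an `.await` only works on the single-threaded kind.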

Does the compiler prevent me from calling a library that calls a library that performs a sync action?

What is a 'sync action'?

Async is fundamentally cooperative multitasking. There is no real difference between the 'blocking'-ness of iterating over a large for-loop and doing a blocking I/O action - the rest of your async tasks are blocked either way while another task is doing something.

While the behavior of a large for-loop and a blocking I/O action doesn't change the event loop, I'd still appreciate the compiler helping me identify the blocking I/O loop. I'll take whatever help I can get.

I think I can agree with you, but first I think we'd have to somehow define how to even approach this feature.

E.g., fundamentally I feel like you're making a distinction between blocking I/O and a "blocking" for loop. At the end of the day, they're the same in my view - one is just more likely to be costly.

So I think for this feature to be done right, we'd have to somehow be able to analyze the likelihood of an expensive operation - and the negative consequence that action might have on the rest of the workload. Eg, I would want the same hypothetical behavior and compile-time warnings/errors that a huge file-load might cause, with a huge loop.

Otherwise a simple function call which involves no I/O and looks innocent could have the same terrible behavior as some I/O call does.

Defining that, and informing the compiler seems obscenely difficult. To that degree, I think any interaction with any sort of heap-y thing like iterating over a Vec would have to error the compiler if used in a Future context.

_everything_ would have to be willing to yield. Not sure I like it. Interesting thought experiment though. I imagine some GC languages do exactly this.

I mean, what you're asking for is just profiling, but only of your async methods. I'm not familiar enough with the details to predict how async messes with profilers, so maybe it'd need support. But using a flamegraph profiler would show a big chunk of time in a function that only has small amounts of time spent deeper in the stack.

No, but if that library claims to be an async library, wouldn't that be a bug in the library?

Edit: I'm interpreting your use of sync here as "blocking" and not as Sync in Rust, meaning safe to share across threads. To be clear in my initial response I was talking about shared memory across threads, and may have misunderstood your original statement.

Yes. And I want the language to use types and compilers to eliminate that whole class of bugs.

I'm not sure this is really possible. Given that async programming is cooperative by nature, how do you tell the difference between a blocking IO task, and a really long running loop in a piece of code that is itself blocking others from executing because it's doing too much work?

The blocking IO might be something in Rust that a type could be created for to denote that they are not async, and therefore warn you in some way, but I think that one is easy to detect in testing.
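One way such a marker type could look, as a purely hypothetical sketch: `Blocking`, `run_on_worker` and `slow_lookup` are invented names, and nothing like this is built into Rust. The wrapper has no method that runs the work in place, so the only way to get the result is to send it to another thread:

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical marker: wraps work that its author has declared to be blocking.
struct Blocking<F>(F);

impl<F> Blocking<F> {
    fn new(work: F) -> Self {
        Blocking(work)
    }

    /// The only way to execute the wrapped work: on a dedicated OS thread.
    fn run_on_worker<T>(self) -> thread::JoinHandle<T>
    where
        F: FnOnce() -> T + Send + 'static,
        T: Send + 'static,
    {
        thread::spawn(self.0)
    }
}

/// Example of a library function advertising in its return type that it blocks.
/// The sleep stands in for a synchronous database or filesystem call.
fn slow_lookup(key: u32) -> Blocking<impl FnOnce() -> u32 + Send + 'static> {
    Blocking::new(move || {
        thread::sleep(Duration::from_millis(10));
        key * 2
    })
}
```

This only records what the library author chose to declare. It cannot detect a blocking call hidden inside an ordinary function, which is the harder problem being discussed here.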

> I think that one is easy to detect in testing

I have seen that not detected in testing too many times. With a work stealing execution context the code will still run fine unless under heavy load (which will exhaust the thread pool and lock the application).

“Non-blocking” code is basically just code that takes a short enough amount of time that we don’t care that it blocks the thread. It’s inherently a matter of judgement.

Static analysis should help with this. Basically it should identify every call site where I/O happens (and other syscalls), and then you have to check them that they are invoked with the right async/nonblocking dance.

This is basically a code audit problem.

Of course something like taint analysis could also work. Every such callsite should be counted as tainted unless it gets wrapped with something that's whitelisted (or uses the right marker type wrapper).

Even effects as types can't help much, because the basic interfaces to the kernels (Linux, WinNT, etc.) are not typesafe, and as long as the language provides FFI/syscall interfaces you have to audit/trust the codebase/ecosystem.

There are two ways a language can help

1. A performant "i/o" layer in the standard library that allows a large number of concurrent activities (forget thread vs coroutine differences).

2. Ability of programmer to express concurrency. Ideally, this has nothing to do with "I/O". If I am doing two calculations and both can run simultaneously, I should be able to represent that. Similarly for wait/join.

Explicitly marking a thread as async-only will just force everyone else (who need sync and cannot track/return a promise/callback to their caller) write a wrapper around it for no reason.

The alternative is to write SansI/O code (https://sans-io.readthedocs.io/), so that your program doesn't have to think about that.

Besides, you don't have to put async/await everywhere: if your code is not performing IO, it can completely ignore this concern.

The problem is that most of your code is mixing I/O and non-I/O code, and people just don't think about it. E.g., a Django website is not just a web server, but also has plenty of calls to the session store, the cache backend, the ORM, etc.

Now you could argue that the compiler/interpreter is supposed to hide the sync/async choice from the user of the code. Unfortunately, this hides where the concurrency happens, and things have dependencies on each other. Some are exclusive, some must follow each other, some can be parallel but must all finish together at some point, some access concurrent resources...

You must have control over all that, and for that to happen, you can either:

- have guard code around each place you expect concurrency. This is what we do with threads, and it sucks. Locking is hard, and you always miss some race condition because it can switch anywhere.
- have implicit but official switch points and silos you must know by heart. This is what gevent does. It's great for small systems, not so much at scale.
- have explicit switch points and silos: async/await, promises, goroutines. This is tedious to write, but the dangerous spots are very clear and it forces you to think about concurrency upfront.

The last one is the least bad system we managed to write.

> this is still explicit/userland asynchronous programming: If anything down the callstack is synchronous, it blocks everything. This requires every component of a program, including every dependency, to be specifically designed for this kind of concurrency.

Welcome to the 1980s world of cooperative multitasking, but now with "multi-colored functions."

Or just use a preemptive scheduler (such as a regular OS scheduler). Or just be explicit, and take difficulties with being explicit as an indication that the data flow is maybe not very well designed.

I don't know, maybe there are valid applications for await (such as much frequented web servers, where you might want to have 10s of thousands of connections, that would be too expensive to model as regular threads, but still you just want to get some cheap persistence of state and it's not a big problem that the state is lost on server reboot). I can't say, I'm not in web dev.

But I bet it's much more common that await is simply a little overhyped and often one of the other options (real threads or explicit state data structures) is actually a better choice.

> Or just use a preemptive scheduler (such as a regular OS scheduler).

Well... I can't help it: whenever I see the await stuff, it reminds me of times when I had to do cooperative multitasking and was longing for OS and/or CPU support for something that is non-invasive to my algorithms. But then I'm not sure whether I'm the grumpy old man or it is just history repeating.

await lets you write concurrent (for IO) or parallel (for CPU) code as if it were serial.

The issue it solves is programmers having trouble executing parallel code in their heads; when relationships become intricate (a computation graph), they just break down and write buggy software.

A scheduler is targeted at use cases. A preemptive scheduler optimizes for latency and fairness and would apply to real-time work (say live audio/video or games), but for most other use cases you want to optimize for throughput.

With real threads you risk oversubscription or you risk not getting enough work, hence the task abstractions and a runtime that multiplexes and schedules all those tasks on OS threads. Explicit state data structures are the bane of multithreading: you want to avoid state, since it creates contention points, requires synchronization primitives, and is hard to test. The beauty of futures is that you create an implicit state machine without a state.

You don't need await for data (CPU) parallelism. You'd typically use something like https://github.com/rayon-rs/rayon or OpenMP instead.

OpenMP does not handle nested parallelism.

Compute-bound parallelism is not always data parallelism; for example, a tree search algorithm would need to spawn/async on each tree node.

You can't "avoid state". State is essential to any computation.

The only question is, can you express some of the state-relations as serial code? If "relationships became intricate (a computation graph)", chances are, you shouldn't use serial code anymore, because that splits dependencies in a two-class society: those that are expressed using code and those that are expressed using dead data. It is usually preferable to specify everything as dead data if the relationships get complex, and to then code sort of a virtual machine that "executes" the data.

So it's in fact the simple cases that lend themselves to being expressed as serial code. I won't dispute that there are nice-looking example usages. Problem is, as always, systems that help make the simple things easy often make the hard things impossible.

Haskell avoids state.

When I said state, I meant that "shared state" is the bane of multithreading. Amdahl's law shows that if 95% of your code is parallel, your speedup is limited to 20x even with thousands of cores; any shared state contributes to that 5%.
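The arithmetic behind that 20x figure, as a small sketch. Amdahl's law gives the speedup as 1 / ((1 - p) + p / n) for a parallel fraction p and n workers; the function names below are just made up for the example:

```rust
/// Amdahl's law: speedup with `workers` parallel workers when
/// `parallel_fraction` of the work (between 0.0 and 1.0) can be parallelized.
fn amdahl_speedup(parallel_fraction: f64, workers: f64) -> f64 {
    1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)
}

/// Upper bound on the speedup as the number of workers goes to infinity:
/// only the serial fraction remains.
fn amdahl_limit(parallel_fraction: f64) -> f64 {
    1.0 / (1.0 - parallel_fraction)
}
```

With p = 0.95 the limit is 1 / 0.05 = 20, and even 1000 workers only get to about 19.6. Every bit of shared state that forces serialization grows the (1 - p) term and lowers that ceiling.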

> Haskell avoids state.

That's a myth. You need state for computation. It's not a language issue. You need just as much state in Haskell as you need in other languages. In Haskell you just specify each little component as a function which takes as a parameter what is effectively global state.

Which, by the way, is usually a bad idea because it causes so much boilerplate. Plus, it makes it unclear how many instances of a given concept can actually exist in a running program.

Isn't this the most convenient setup though? I'm most familiar with async/await in UI programming and you most often have a main thread for synchronization. You want to assume that most of your main thread work is synchronous and non-yielding until you explicitly yield. Seems like it would be a lot harder to use main thread synchronization in the style you're suggesting.

Maybe I just can't imagine it. Whats a good language that shows off the style you're suggesting?

> What's a good language that shows off the style you're suggesting?

JavaScript.

What you describe sounds like native UI work since forever before javascript. "Don't block the main thread" and all that.

JavaScript is different in that it's single-threaded with an event loop. Synchronous functions execute until they end. Asynchronous functions are handled by the event loop, which "loops" over the pool and runs each one for some time, then switches to another, concurrently (think round robin). What happens when the runtime is running an asynchronous function and inside it reaches a synchronous one? It stops round-robin and executes this function until it ends.

What OP wants is a language like javascript but without having to write code distinguishing synchronous and asynchronous functions and instead having some other tool to tell the runtime when a function is synchronous or asynchronous without having to write it again.

Yes, I realize all this. My question is how you can have such a system and still keep UI thread synchronization without having the opposite problem of marking all your synchronous methods.

In a strict language? I don't think it's possible. If you take a closer look, you'll see that it's not enough to mark functions as sync or async, since inside the functions each line of code can be considered a synchronous function in its own right.

What you want is something like Haskell, which is lazy and is not about "executing statements" but rather "evaluating expressions".

Not the OP, but Go doesn't have this problem because all I/O is async under the hood, but it exposes a sync interface. This means the entire Go ecosystem is bought into a single concurrency model and runtime, which some find irksome, but it works pretty well most of the time. Of course, Go also lacks Rust's static safety features, but I think that's orthogonal to its concurrency approach.

We tried this in Rust and found it was slower than 1:1 threading.

What you tried wasn't "this", though. It was one particular implementation of lightweight threading that has to cope with Rust's peculiarities, special requirements and compilation targets. There is absolutely nothing essential about lightweight threads that prevents them from emitting essentially the same code as the stackless-coroutine approach. It's just that in Rust it might be very hard or even not worth it, given the language's target audience.

I don't understand what your objection is. It's a given that what I wrote applies to Rust. This is a thread about Rust. I didn't say that M:N threading is always slower than 1:1.

Besides, fibers don't emit essentially the same code as async code. One has a stack, and the other doesn't. That's a significant difference.

If the stack could be sufficiently small, it's not that different from heap-allocated async state. But you probably need segmented stacks, or at least separate stacks for async-preemptible and non-async-preemptible code (has anyone tried making a system like this?)

It isn't a given that M:N threading is slower than 1:1 threading even in Rust. A particular implementation you tried exhibited that behavior.

> One has a stack, and the other doesn't. That's a significant difference.

They both have some memory area to which they write state. Calling it "a stack" refers to the abstraction in the programmer's mind, not to how the memory is actually written/read. It is true that in order to support recursion, a thread might need to dynamically allocate memory, but so would async/await, except it'll make it more explicit.

> It isn't a given that M:N threading is slower than 1:1 threading even in Rust. A particular implementation you tried exhibited that behavior.

I don't see any way around the problems of segmented stacks and FFI. There is no way to implement stack growth by reallocating stacks and rewriting pointers in Rust, even in theory. It would break too much code: there is a lot of unsafe (and even safe!) code out there that assumes that stack pointer addresses are stable. In fact, async/await in Rust had to introduce a new explicit pinning concept in order to solve this exact problem while remaining backwards compatible. And when calling the FFI, you have to switch to a big stack, which was an insurmountable performance problem. Rust code by its nature is FFI-heavy; it's part of the niche that Rust finds itself in.

You can make what are virtually zero-cost copies from what you call a "big stack" to a resizable stack with virtual memory tricks. You don't even need to copy the entire stack, but cleverly rewrite the return address stored on the stack to do this kind of "code-switching". But it does mean doing backend manipulations in a platform-dependent way. There are several good ways to do this, none of them particularly easy. What is perhaps impossible is allowing FFI code to block the lightweight thread, but async/await doesn't solve this, either.

> You can make what are virtually zero-cost copies from what you call a "big stack" to a resizable stack with virtual memory tricks. You don't even need to copy the entire stack, but cleverly rewrite the return address stored on the stack to do this kind of "code-switching".

We tried it. It was too slow.

OK. It really is hard when you're what you call "FFI-heavy" and don't like a significant runtime. So Rust has several *self-imposed* constraints (whether they're all essential for its target domains is a separate discussion, but some of those constraints certainly are) that make this task particularly hard, but my point is that there is nothing fundamental to n:m threading that makes it slower than async/await, and async/await does fundamentally come at the significant cost of a particularly viral form of accidental complexity.

Fibers under the magnifying glass [1] might be a relevant paper here. Its conclusion, after surveying many different implementations, is that lightweight threads are slower than stackless coroutines.

[1]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p136...

No, its conclusion is that fibers with certain properties in C/C++ are slower -- and particularly hard to implement correctly -- than stackless coroutines in C/C++. That's because of the particular characteristics of those languages. In fact, you'll note that the only negative thing he says about Go is that it incurs an overhead when interacting with non-Go code.

And that overhead is a deal-breaker in Rust.

Sure, but that overhead is also not essential, but a feature of Go's particular implementation. Fibers aren't one thing and there are many, many ways of implementing them. As I said before, implementing them for Rust well would have likely required changes to LLVM and Web Assembly, and even then it would be harder than async/await, perhaps to the point of being too hard to be worth it and probably against aspects of Rust's philosophy (I would say that that is the main difference between the two: achieving similar performance is much easier for the language implementors with async/await). But it's just not true that there is something essential about them that makes them slower. After all, you're running all of your code inside a particular implementation of threads.

> Sure, but that overhead is also not essential, but a feature of Go's particular implementation.

The only way to get around the FFI performance problem would be for all fibers to have big stacks. At that point you've thrown away their biggest selling points: high scalability and fast spawning.

> The only way to get around the FFI performance problem would be for all fibers to have big stacks.

I don't know all of Rust's specific constraints, but it is not the case in general. There are two levels for FFI support in this context, based on whether you want to allow FFI to block the lightweight thread (perhaps through an upcall), or not. Only if you want to allow that do you need "big stacks", but even then they can be "virtually big" but "physically small". If you don't, then all you need to do is to temporarily run FFI code on a "big stack", but you know that all the FFI frames are gone by the time you want to block. Depending on your FFI, if you don't allow the FFI code to hold pointers into your language's stack, you're all good.

> Depending on your FFI, if you don't allow the FFI code to hold pointers into your language's stack, you're all good.

Rust does this with virtually all FFI.

It's not that crazy. There's an async ecosystem of libraries, you opt in to using them, if you run a blocking method inside an executor then you obviously pay the cost for that. Granted most executors run on multiple threads so you'd only be blocking one thread.

Still, in practice it's just not that hard: use async methods if you're writing async code.

Is it a single threaded state machine?

> Async I/O gives awesome performance

No, it doesn't really. 'Async' is a strictly Python problem, due to the insanity of the GIL. Predictably, the Python solution to it is also insane.

Why you have to turn a sane language like Rust into an insane one by cargo-culting a solution to a non-problem is a mystery to me.

Oh well, good thing at least C++ hasn't dropped the ball.

It’s important to distinguish 3 categories of coroutine implementations: stackful, stackless-on-heap, stackless-as-struct. C++ is stackless-on-heap. Stackless-as-struct is essentially creating an anonymous type (à la lambda), used to save the data across suspension points. This is the approach taken by Rust for its async/await implementation. I believe there were declaration/definition and ABI concerns about this approach for C++:

> "While it is theoretically not an insurmountable challenge, it might be a major re-engineering effort to the front-end structure and experts on two compiler front ends (Clang, EDG) have indicated that this is not a practical approach."

So the short answer seems to be that due to technical debt on the part of existing C++ compilers and their "straight" pipeline, the front-end cannot anticipate the size necessary for some book-keeping information traditionally handled by the code-generator. I'll take Richard Smith's word on Clang.

You are mixing up a lot of stuff here.

- Asynchronous programming is completely orthogonal to Python

- If you don't need it, don't use it. Rust without asynchronous functions still looks and works like before

- By the way: welcome to C++20's coroutines

> - Asynchronous programming is completely orthogonal to Python

Theoretically, yes. In practice, people are just copying Python mistakes word-for-word, in an attempt to score the "coolness" factor that async garnered in the Python community. (Never mind that the "coolness" came from solving a problem that doesn't even exist in other languages...)

If you want to see what asynchronous programming would look like if it weren't copied from Python, look at the recent C++ proposal.

> - If you don't need it, don't use it. Rust without asynchronous functions still looks and works like before

The problem is that every Python library now comes in two versions, regular and 'async'. Even if you don't need or care about anything 'async'.

I guess splintering your code base into two incompatible flavors is just the pythonic way of doing things? Now that the 2 vs 3 insanity is finally dying down, they needed to start a fresh one?

Why is Rust trying hard to repeat these mistakes?

> - By the way: welcome to C++20's coroutines

That I already answered above.

This is big! Turns out that syntactic support for asynchronous programming in Rust isn't just syntactic: it enables the compiler to reason about the lifetimes in asynchronous code in a way that wasn't possible to implement in libraries. The end result of having async/await syntax is that async code reads just like normal Rust, which definitely wasn't the case before. This is a huge improvement in usability.

Why would it be different for async code than sync code? The goal of Rust's checker is to track lifetime of an object so for example it knows that at the end of a function the object should be freed. Async shouldn't matter here.

The point is that Rust's borrow checker can't reason about lifetimes very well over function boundaries. It can reason about coarse things that are expressible in the type language, but everything more nuanced than that, such as reasoning about how control flow affects the lifetimes, is limited to inside function bodies.

The difference between synchronous code and async code implemented as libraries is that async code involves jumping in and out of functions a lot, while employing runtime library code in between. A piece of code that is conceptually straightforward, may, in the async case, involve multiple returns and restores. In the sync case it doesn't need to do that, since it just blocks the thread and does the processing in other threads and in kernel land.

Rust's async/await support makes it possible to write code that is structurally "straightforward" in much the same way as synchronous code would be. That allows the borrow checker to reason about it in much the same way it would reason about sync code.

> The point is that Rust's borrow checker can't reason about lifetimes very well over function boundaries. It can reason about coarse things that are expressible in the type language, but everything more nuanced than that, such as reasoning about how control flow affects the lifetimes, is limited to inside function bodies.

BTW this is a big pain point for me (unrelated to async). Code like this:

  let field_ref = &mut self.field;
  self.helper_mutating_another_field(); // rejected while field_ref is still in use

gets rejected because self.helper_mutating_another_field() will mutably borrow the whole struct. The workaround is either to inline the helper or to factor out a smaller substruct that the helper can borrow, which doesn't always look good.
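A sketch of the situation and of one possible way around it, namely handing the helper only the field it mutates. The `Counter` struct and its method names are invented for illustration:

```rust
// Hypothetical struct with two independent fields.
struct Counter {
    field: u32,
    other: u32,
}

impl Counter {
    // Takes just the field it mutates instead of `&mut self`, so the
    // caller may keep a live borrow of a different field.
    fn bump_other(other: &mut u32) {
        *other += 1;
    }

    fn update(&mut self) -> u32 {
        let field_ref = &mut self.field;
        // A helper taking `&mut self` would be rejected at this point,
        // because `field_ref` is still used below. Borrowing the
        // disjoint field `self.other` directly is accepted.
        Self::bump_other(&mut self.other);
        *field_ref += 10;
        *field_ref + self.other
    }
}
```

The borrow checker tracks disjoint fields inside one function body, but a `&mut self` signature tells it only that the whole struct is borrowed.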

Of course it is preferable that all information needed for the caller to check if the call is correct is contained in the function signature but it truly is frustrating to see the function body right there, know that it doesn't violate borrowing rules and still get the code calling it rejected.

Couldn't you just pass in the (other) borrowed field as an argument for the function? If you need it to work without adding the argument when called outside of the class, you could overload it with a version that borrows the field and passes it to the version that takes the field as an argument, right?

I'm newish to Rust, so this is just an intuitive guess. Please let me know if I'm wrong.

Yes, it works, but feels unnatural. I also don't like the possibility (quite remote, I admit) that the function gets called with a field from another instance.

The latter case should be impossible if the method is private?

In Rust, privacy isn't enforced like that. Private just means that things outside the module can't access them. There's no concept of privacy at the instance level.

Yes. So don't make any methods that could potentially be passed incorrect arguments public.

That still doesn't preclude the possibility that within the module the function gets called with a field from another instance. I think the idea is that by making it a method that just takes a reference to self, it's impossible to accidentally mutate a field on a different instance, while taking a reference to the field itself doesn't prevent the programmer from accidentally calling it with the wrong instance of the field.

Yes, that's the usual workaround.

Yes, encapsulation sometimes conflicts with the borrowing rules. After all, this is not so surprising: enforcing the borrowing rules requires following every concrete access to resources, while encapsulation aims to abstract them away.

It's really tricky; there's a tension here between just making your code work, and not making it brittle to further modification.

If you're familiar with this, can you describe some of those new concepts in ... slightly more detail? I say slightly, because I'm still seeking high level explanations, but at the same time I'm curious what new features might be making this async lifetime talk possible.

To further frame that question: I had assumed Rust was implementing Async within the capabilities of normal Rust. Such that, if lifetimes were being managed across function bounds, I assumed the lifetime would have "simply" been bound to the scope of the executor, polling the future you're actively waiting on.

However quickly I can see confusions in that description. Normal lifetimes within a function would need to bubble up and be managed by a parent, since that function's lifetime is, as you put it, being jumped in and out of frequently.

So I imagine that is partly where new features come into play? Giving Rust the ability to take a normal lifetime and extend or manage it in new ways?

I'm not going to elaborate super deeply in a HN comment, but here's the gist: inside a function body, you can take a reference to a thing, and store that reference in a variable. Then if that function "yields" to the executor, we have a situation that we couldn't have with sync code: in sync code, the "yielding" only happens at the function return, and by that point, all the internal references must be gone, as the function stack frame is going to disappear. But with async, as the yielding can happen without the function actually ending, we have to be able to store the stack frame, and the references stay alive. The new feature helps the compiler to reason about this. Before, the "yields" were implemented as just returns, so the compiler didn't allow borrowing over yield points.

Ah hah, that sounds awesome, thanks!

Yea my past experience with Futures in Rust, mostly manually, made them feel like they were "No magic, just Rust" and so the Yields were just a lot of returns.

I had thought the .await impl was largely just standardization around the syntaxes. Something you could do manually, so to say.

The yielding holding the stack frame, in my head then, sounds quite similar to actually returning. However, it's a special return that stashes that stack somewhere for later use. Neat!

The challenge with async code is that the state across yield points is put into a struct, and if a reference to a stack variable is used across yield points, then that struct is self-referential, which Rust can't reason about (yet?). I honestly do not know how much can be recreated with `unsafe` and `Pin` vs how much is built-in.

I'd love for Rust to eventually get a Move trait (something like C++ move constructors) to resolve this. Besides some complexity in designing it, there is resistance from some corners about having anything execute on a move.

That is exactly what `async` blocks are about: they do support self-referential structs, and `Pin` is what allows us to use those in a safe way, as `Pin`ned data can't be moved again (unless the data is `Unpin`, which self-referential structs are not), so the self-references are safe.

In Rust, it is normally not possible or at least very difficult to create structs where one field references another, and if you were to create a future that borrows some field, awaits a future, and uses the borrowed field, the resulting future will need to have a field with a reference to another.

This is the challenge and async await lets you make this kind of self referential types without unsafe code.
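A sketch of that pattern using only the standard library. `YieldOnce`, `NoopWake` and `block_on` are hypothetical helpers written for this example (the busy-polling `block_on` is deliberately naive); the point is that `sum_across_yield` keeps a reference to its own local across an `.await` with no `unsafe` in user code.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical leaf future that returns `Pending` exactly once.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref(); // ask to be polled again
        Poll::Pending
    }
}

async fn sum_across_yield() -> u32 {
    let data = [1u32, 2, 3];
    let first = &data[0]; // borrow of a local variable
    YieldOnce(false).await; // both `data` and `first` must survive this suspension
    *first + data[1] + data[2]
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Naive executor: pins the future on the heap and polls it in a loop.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut); // after pinning, the future never moves again
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

Calling `block_on(sum_across_yield())` evaluates to 6. The compiler-generated future stores `data` and a reference into it side by side, which is only sound because the future is pinned before its first poll.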

That's just the "Pin" type, which is heavily used in async code behind (and occasionally in front of) the scenes, but is by no means restricted to it.

It's impossible to use the guarantees provided by the `Pin` type without `unsafe` code, except in `async fn`.

Isn't it kind of a poor design choice that Rust will not actually begin execution of the function until `.await` is called? If I didn't want to execute the function yet, I wouldn't have invoked it. Awaiting is a completely different concept than invoking, why overload it?

If you want to defer execution of a promise until you await it, you can always do that, but this paradigm forces you to do that. The problem is then, how do I do parallel execution of asynchronous tasks?

In JavaScript I could do

   const results = await Promise.all([asyncTaskA(), asyncTaskB(), asyncTaskC()]);

and those will execute simultaneously and await all results.

And that's me deferring execution to the point that I'd like to await it, but in JavaScript you could additionally do

   const results = await Promise.all([promiseA, promiseB, promiseC]);

Where I pass in the actual promises which have returned from having called the functions at some point previously.

So how is parallel execution handled in Rust?

> Isn't it kind of a poor design choice that Rust will not actually begin execution of the function until `.await` is called?

Begin execution where?

If every future started executing immediately on a global event loop, that event loop would need to allocate space for every future on the heap. A heap allocation for every future is exactly the sort of overhead that Rust is trying so carefully to avoid. With Rust futures, you can have a large call tree of async functions calling other async functions. Each one of those returns a future, and those futures get cobbled together by the compiler into a single giant future of a statically known size. Once that state machine object is assembled, you can make a single heap allocation to put it on your event loop. Or if you're going to block the current thread waiting on it, depending on what runtime library you're using, you might even get away with zero heap allocations.
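A small sketch of the "statically known size" point. The `leaf` and `parent` functions are invented for illustration; the exact size of a compiler-generated future is unspecified, so the only claim made here is a lower bound.

```rust
use std::future::ready;
use std::mem::size_of_val;

// Hypothetical leaf: keeps a 64-byte buffer alive across an await point,
// so the buffer has to be stored inside the returned future.
async fn leaf(seed: u8) -> u8 {
    let buf = [seed; 64];
    ready(()).await;
    buf[63]
}

// Hypothetical parent: the futures returned by `leaf` are embedded
// directly in the parent's own state machine, without boxing.
async fn parent(seed: u8) -> u8 {
    let a = leaf(seed).await;
    let b = leaf(seed.wrapping_add(1)).await;
    a.wrapping_add(b)
}

fn parent_future_size() -> usize {
    let fut = parent(1); // only builds the state machine; nothing runs yet
    size_of_val(&fut)
}
```

`parent_future_size()` returns at least 64, and no heap allocation happens anywhere in this snippet; an executor can later decide whether to box the whole thing once.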

This sort of thing is also why Rust's futures are poll-based, rather than using callbacks. Callbacks would force everything to be heap allocated, and would work poorly with lifetimes and ownership in general.

This! Even if some of us would prefer „hot“ futures that immediately start execution from a conceptual point of view, they are simply not possible in Rust's async/await model. Executing a Future requires it to be in its final memory position and be pinned there forever. Otherwise borrowing across await points would not work. But if every future would immediately start in its final place you would not be able to return them, and composition would not be possible. That is why we have a Create - Pin - Execute and Await workflow.

These issues are not an issue in JavaScript and other languages, since objects are individually allocated on the heap there anyway.

In Rust, you can use a future adapter that does this:

    let (a, b, c) = futures::join!(asyncTaskA(), asyncTaskB(), asyncTaskC());

See the join macro of futures[0]. The way it works is, it will create a future that, when polled, will call the underlying poll function of all three futures, saving the eventual results into a tuple. Note that the macro awaits internally, so it must be used inside an async function or block and needs no trailing `.await`.

This will allow making progress on all three futures at the same time.

[0] https://docs.rs/futures/0.3.0/futures/macro.join.html

I don't know Rust but I understand the question if await is the only way to yield execution.

Without a yield instruction it's strange to ask "how do I start all these futures before I await", and join does make sense because it does both of those things. But other languages can start futures, yield, reenter, start more futures, and wait for them all while making progress in the meantime.

I'm curious what the plan is there.

Join doesn't do everything. It's just a way to take multiple futures, and return a Future wrapping them all. I think there's a misunderstanding of how Futures and Executors interact in rust here which is why everyone is having a hard time understanding things.

A future in Rust is really just a value of a type that implements the `Future` trait, whose single `poll` function returns either `Pending` or `Ready(T)`. When you create a future, you're just instantiating a type implementing that function.

For a future to make progress, you need to call its poll function. A totally valid (if very inefficient) strategy is to repeatedly call the Future's poll function in a loop until it returns Ready.
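To make the "poll it in a loop" strategy concrete, here is a hand-written future and a spinning driver, using only the standard library. `Countdown`, `NoopWake` and `spin_until_ready` are names invented for this sketch.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical future that reports `Pending` `remaining` times before
/// it becomes ready.
struct Countdown {
    remaining: u32,
}

impl Future for Countdown {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.remaining == 0 {
            Poll::Ready("done")
        } else {
            self.remaining -= 1;
            Poll::Pending
        }
    }
}

// A waker that does nothing; enough for a loop that re-polls regardless.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls `fut` in a tight loop and returns its output together with the
/// number of `poll` calls that were needed.
fn spin_until_ready<F: Future + Unpin>(mut fut: F) -> (F::Output, u32) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(output) = Pin::new(&mut fut).poll(&mut cx) {
            return (output, polls);
        }
    }
}
```

`spin_until_ready(Countdown { remaining: 3 })` returns `("done", 4)`. A real executor does the same thing but sleeps until a waker fires instead of spinning.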

All the `await` keyword does is propagate the Pending case up. It can be expanded trivially to the following:

    match future.poll(cx) {
        Poll::Pending => return Poll::Pending,
        Poll::Ready(val) => val,
    }

Now, when we create a top-level Future, it won't start executing. We need to spawn it on an Executor. The Executor is generally provided by a library (tokio, async-std, others...) that gives you an event loop and various functions to perform Async IO. Those functions will generally be your leaf-level/bottommost futures, which are implemented by having the poll function register the interest in a given resource (file descriptor or what not) so that the currently executing top-level Future will be woken up by the Executor when that resource has progressed.

So if you want to start a future, you will either have to spawn it as a top-level future (in which case you cannot get its output or know when it finishes executing unless you use channels) or you join it with other sub-futures so that all your different work will progress simultaneously. Note that you can join multiple times, so you can start two futures, join them, and have one of those futures also start two futures and join them. There's no problem here.

This is essentially what I assumed and what I believe ralusek assumes to be the case. This does not change our question.

What would be the syntax for how you would spawn a future, add it to the current Executor, cooperatively yield execution in the parent such that progress could be made on the child, but also return execution to the parent if the child yields but does not complete?

In C# I believe you could simply call

This will schedule the action to the current executor. If you yield execution through await that task will (have a chance to) run. The crux of the question is the fact that you do not need to await that specific task; you just need to cooperatively yield execution such that the executor is freed up.

    var shortTask = Task.Run(Task.Delay(100).Wait);
    var longTask = Task.Run(Task.Delay(500).Wait);
    await Task.Delay(1000);
    await shortTask;
    await longTask;

In C# this will take ~1000 milliseconds, as the child futures yield execution back to the parent such that it can start its own yielding task.

With async-std, you can just do

    let future = async_std::task::spawn(timer_future());

And then do more work. The timer_future will run cooperatively, and the future returned by spawn is merely a “join handle”.

But this is a feature of the executor, not something that’s part of the core rust async/await. With tokio, you’d have to spawn a separate future and use channels to get the return value.

It's the same. :)

    use std::time::Duration;
    use async_std::task;
    let shortTask = task::spawn(task::sleep(Duration::from_secs(1)));
    let longTask = task::spawn(task::sleep(Duration::from_secs(2)));

or simply:


I don't like this at all. Having to rely on futures::join! means that I don't have the flexibility to control the execution of these things unless Rust adds that specific utility, right?

In JS, for example, the `bluebird` library is a third party utility for managing execution of functions. You can do things like

    const results = await Promise.map(users, user => saveUserToDBAsync(user), { concurrency: 5});

And I pass in thousands of users, and can specify `concurrency: 5` to know that it will execute no more than 5 simultaneously.

Implementation of this behavior in user space is trivial in JS, is it possible in Rust?

The async map you laid out above could be accomplished with a `Stream` in async Rust. You can turn the array into a `Stream`, have a map operation to return a future, and then use the `buffered` method to run N at once:


Not only does the `futures` crate provide most things you'd ever want, it also has no special treatment: you can implement your own combinators in the same way that `futures` implements them if you need something off the beaten path.

It feels like you're complaining about things in the Rust language without taking the time to understand how the language idioms work. RTFM.

Additionally, you're making snarky comments about how you don't like how the base language doesn't handle something like JS...then reference a third party JS library. Base JS doesn't solve your 'problem' either.

To answer your question, async/await provides hooks for an executor (tokio being the most common) to run your code. You do things like that in the executor.


I reference a third-party library to show that you have full control of this stuff in user-space. Implementing Promise.all or Promise.map in userspace is trivial.

I'm not complaining about things without taking the time to understand how the language works, I'm giving examples of things that don't seem possible based off of my understanding of how the language works...in hopes that someone will either clarify or accept that this is a shortcoming.

Rust and JS have very, very different execution models. You can absolutely control how many futures are allowed to make progress at the same time fully in userspace. If you're joining N futures, and want to only allow M futures to make progress at a time, make an adapter that only calls the poll function of M futures at a time, until those futures return Ready.
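One way such an adapter could look, as a std-only sketch rather than anything taken from the `futures` crate. `JoinLimited`, `join_limited`, `yield_once` and `run_limited` are invented names; the futures are boxed to keep the example free of `unsafe`, and the driver loop at the bottom stands in for a real executor.

```rust
use std::cell::Cell;
use std::future::{poll_fn, Future};
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

type BoxFut<T> = Pin<Box<dyn Future<Output = T>>>;

/// Hypothetical adapter: completes every future, polling at most `limit` at a time.
struct JoinLimited<T> {
    queued: Vec<(usize, BoxFut<T>)>, // not started yet (stored in reverse order)
    active: Vec<(usize, BoxFut<T>)>, // the only futures that get polled
    results: Vec<Option<T>>,         // outputs, indexed by input position
    limit: usize,
}

// The inner futures stay pinned inside their boxes, so the adapter may move.
impl<T> Unpin for JoinLimited<T> {}

fn join_limited<T>(futs: Vec<BoxFut<T>>, limit: usize) -> JoinLimited<T> {
    assert!(limit > 0, "limit must be at least 1");
    let results: Vec<Option<T>> = futs.iter().map(|_| None).collect();
    let mut queued: Vec<_> = futs.into_iter().enumerate().collect();
    queued.reverse(); // `pop` then hands futures out in input order
    JoinLimited { queued, active: Vec::new(), results, limit }
}

impl<T> Future for JoinLimited<T> {
    type Output = Vec<T>;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Vec<T>> {
        let this = self.get_mut();
        loop {
            // Fill free slots from the queue.
            while this.active.len() < this.limit {
                match this.queued.pop() {
                    Some(entry) => this.active.push(entry),
                    None => break,
                }
            }
            // Poll only the active futures, passing our own waker down.
            let (mut freed_slot, mut i) = (false, 0);
            while i < this.active.len() {
                let idx = this.active[i].0;
                match this.active[i].1.as_mut().poll(cx) {
                    Poll::Ready(value) => {
                        this.results[idx] = Some(value);
                        this.active.swap_remove(i);
                        freed_slot = true;
                    }
                    Poll::Pending => i += 1,
                }
            }
            if this.active.is_empty() && this.queued.is_empty() {
                return Poll::Ready(this.results.drain(..).flatten().collect());
            }
            // Queued futures have registered no waker yet, so start them now.
            if !freed_slot || this.queued.is_empty() {
                return Poll::Pending;
            }
        }
    }
}

/// Hypothetical helper: suspends the calling task exactly once.
async fn yield_once() {
    let mut yielded = false;
    poll_fn(|cx| {
        cx.waker().wake_by_ref(); // ask to be polled again
        if std::mem::replace(&mut yielded, true) { Poll::Ready(()) } else { Poll::Pending }
    })
    .await
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Runs `n` tasks under `limit`; returns (outputs, peak number of started tasks).
fn run_limited(n: usize, limit: usize) -> (Vec<usize>, usize) {
    let (in_flight, peak) = (Rc::new(Cell::new(0usize)), Rc::new(Cell::new(0usize)));
    let mut futs: Vec<BoxFut<usize>> = Vec::new();
    for i in 0..n {
        let (in_flight, peak) = (in_flight.clone(), peak.clone());
        futs.push(Box::pin(async move {
            in_flight.set(in_flight.get() + 1);
            peak.set(peak.get().max(in_flight.get()));
            yield_once().await;
            in_flight.set(in_flight.get() - 1);
            i * 2
        }));
    }
    let mut joined = join_limited(futs, limit);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = Pin::new(&mut joined).poll(&mut cx) {
            return (out, peak.get());
        }
    }
}
```

With `run_limited(5, 2)` the outputs come back in input order and at most two tasks are ever in flight. Because futures are inert until polled, "not started yet" costs nothing more than keeping them in a `Vec`.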

Rust gives you all the flexibility you need here. It might not be trivial yet because all the adapters might not be written yet, but that's purely a maturity problem.

The `join` macro does nothing magical. Go check out its implementation, and it will make it obvious how to implement a concurrency argument.

Fortunately, `futures::join!` isn't provided by Rust- it's provided by library code. I am not aware (off the top of my head) of an existing equivalent to your `Promise.map` example, but it is also implementable in userspace in Rust.

The primitive operation provided by a Rust `Future` is `poll`. Calling `some_future.poll(waker)` advances it forward if possible, and stashes `waker` somewhere for it to be signaled when `some_future` is ready to run again.

So the implementation of `join` is fairly straightforward: It constructs a new future wrapping its arguments, which when polled itself, re-polls each of them with the same waker it was passed.
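A std-only sketch of that idea for two futures. This is not the `futures` crate's actual implementation: `Join2`, `join2`, `yield_once`, `step` and `block_on` are invented here, and the children are boxed purely to avoid `unsafe` pin projection, which the real combinator does not need to do.

```rust
use std::cell::RefCell;
use std::future::{poll_fn, Future};
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical two-way join.
struct Join2<A: Future, B: Future> {
    a: Pin<Box<A>>,
    b: Pin<Box<B>>,
    a_out: Option<A::Output>,
    b_out: Option<B::Output>,
}

// The children stay pinned inside their boxes, so `Join2` itself may move.
impl<A: Future, B: Future> Unpin for Join2<A, B> {}

fn join2<A: Future, B: Future>(a: A, b: B) -> Join2<A, B> {
    Join2 { a: Box::pin(a), b: Box::pin(b), a_out: None, b_out: None }
}

impl<A: Future, B: Future> Future for Join2<A, B> {
    type Output = (A::Output, B::Output);

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let this = self.get_mut();
        // Re-poll each unfinished child with the same waker we were given.
        if this.a_out.is_none() {
            if let Poll::Ready(value) = this.a.as_mut().poll(cx) {
                this.a_out = Some(value);
            }
        }
        if this.b_out.is_none() {
            if let Poll::Ready(value) = this.b.as_mut().poll(cx) {
                this.b_out = Some(value);
            }
        }
        if this.a_out.is_some() && this.b_out.is_some() {
            Poll::Ready((this.a_out.take().unwrap(), this.b_out.take().unwrap()))
        } else {
            Poll::Pending
        }
    }
}

/// Hypothetical helper: suspends the calling task exactly once.
async fn yield_once() {
    let mut yielded = false;
    poll_fn(|cx| {
        cx.waker().wake_by_ref();
        if std::mem::replace(&mut yielded, true) { Poll::Ready(()) } else { Poll::Pending }
    })
    .await
}

/// Hypothetical task that records when it starts and ends.
async fn step(log: &RefCell<Vec<String>>, name: &str) -> usize {
    log.borrow_mut().push(format!("{name} start"));
    yield_once().await;
    log.borrow_mut().push(format!("{name} end"));
    name.len()
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Naive single-threaded executor that simply re-polls until ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

Running `block_on(join2(step(&log, "a"), step(&log, "bb")))` logs "a start", "bb start", "a end", "bb end": both children make progress on one thread, interleaved at their suspension points.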

There are also more elaborate schemes- e.g. `FuturesUnordered` uses a separate waker for each sub-future, so it can handle larger numbers of them at some coordination cost.

The macro is just a wrapper around this (and a couple of other functions): https://docs.rs/futures/0.3.0/futures/future/fn.join.html

And from a quick scan of the source it doesn't look like anything there is impossible to implement in userspace: https://docs.rs/futures-util/0.3.0/src/futures_util/future/j...

The join macro is also implemented “in user space”.

Thank you, so what is the syntax that is used in order to execute without awaiting? Do you create a thread for each?

This is where things get tricky: Rust didn't standardize an event loop - it only standardized common traits that allow implementing one, and a way to communicate whether a computation needs more time or already has a result.

If what you want to do is run multiple CPU-bound computation and have a central event loop awaiting the result, then yes, you'll need to spawn threads and use some kind of channel to transfer the state and result. If what you want is to run multiple IO-bound queries, then you'll want to use the facilities of the event loop of your choice (tokio, async-std, etc...) to register the intent that you're waiting for more data on a file-descriptor.
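For the CPU-bound case, the thread-plus-channel approach can be sketched with the standard library alone. `sum_of_squares` and `run_jobs` are stand-ins invented for this example; one thread per job is only sensible for a handful of jobs.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical CPU-bound job: the sum of all squares below `n`.
fn sum_of_squares(n: u64) -> u64 {
    (0..n).map(|i| i * i).sum()
}

/// Runs one job per input on its own thread and collects `(input, result)`
/// pairs over a channel, sorted by input.
fn run_jobs(inputs: &[u64]) -> Vec<(u64, u64)> {
    let (tx, rx) = mpsc::channel();
    for &n in inputs {
        let tx = tx.clone();
        thread::spawn(move || {
            tx.send((n, sum_of_squares(n))).unwrap();
        });
    }
    drop(tx); // the receiver stops once every worker's sender is gone
    let mut results: Vec<(u64, u64)> = rx.iter().collect();
    results.sort();
    results
}
```

In an async program the receiving side would typically be an async-aware channel provided by the chosen runtime, so that waiting for results does not block the event loop.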

The "proper" way to execute without awaiting it on the current future is usually to spawn another future on the event loop. The syntax to do that with tokio is

    use tokio;
    let my_future = some_future();
    tokio::spawn(my_future);

You do not create a thread for each- that would defeat most of the purpose of futures.

Instead you call `Future::poll`, which runs a future until it blocks again, and provide it a way to signal when it is ready.

That signal would be handed off to an event loop (which tracks things executing on other hardware like network or disk controllers) or another part of the program (which will be scheduled eventually).
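As a minimal sketch of that loop using only the standard library (a real executor does a lot more): `Countdown`, `ThreadWaker`, and this `block_on` are hypothetical names, and the poll counter is there purely so the behaviour is visible. The future is polled, and whenever it returns `Pending` the thread sleeps until the waker fires.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Hypothetical future that reports "not ready" `remaining` times before finishing.
struct Countdown {
    remaining: u32,
}

impl Future for Countdown {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.remaining == 0 {
            return Poll::Ready("done");
        }
        self.remaining -= 1;
        // Signal that polling again will make progress. A real IO future would
        // instead stash the waker and let the event loop call it later.
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// The "way to signal readiness": waking simply unparks the executor thread.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Toy executor: polls until the future is ready, returning the output and
/// the number of times `poll` was called.
fn block_on<F: Future>(fut: F) -> (F::Output, u32) {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return (value, polls),
            // Sleep until the waker is invoked, then poll again.
            Poll::Pending => thread::park(),
        }
    }
}
```

With `Countdown { remaining: 3 }` the future is pending three times and ready on the fourth poll. No extra thread is created for it.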

In Rust, everything is in ‘user space’ since nothing is managed by the language itself, you have full control on how the Futures are executed. Even the executor (the “event loop” in js parlance) is a third party library. You can't get more control in js than what Rust gives you.

"At the same time"? How does that happen on a single thread?

It just interleaves execution. This really seems more similar to python's generator/yield semantics.

That's how it is implemented under the covers: it uses the nightly-only generators feature, which builds a state machine of the entire future.

Executors are not inherently single threaded. They can be, they can also not be.

> Awaiting is a completely different concept than invoking, why overload it?

The wording is a bit imprecise; you can't 'await' something to invoke it, exactly. It also won't begin execution until .await is called. What happens is, at some point, you have a future that represents the whole computation, and you pass it to an executor. That's when execution starts.

There's a join macro/function that works the same way as Promise.all, and is what you'd use for parallelism.

In rust, they separate the concept of a future from how it's actually run. Tokio is one way to run a bunch of futures, and it has many different options.

For instance, I use the CurrentThread runtime in tokio, because I'm using the rust code as a plugin to Postgres, and it accesses non-thread-safe APIs.

What you are asking for is essentially for the futures runtime to be hidden from you. That's fine for some languages that already have a big runtime and don't need the flexibility to do things differently, but doesn't work for rust.

JavaScript automatically starts the tasks and this is bad design IMHO. One loses referential transparency and the ability to run the workflow with different schedulers. It looks like Rust has done a better job.

On referential transparency, I couldn't tell from the blog but are the Rust futures memoized/cached?

If they are, then they're still not referentially transparent. But if they aren't then it might be a bit of a surprise to developers coming from other languages (especially ones not familiar with something like an IO monad).

Rust futures are just types that implement an interface providing a method to advance execution if possible and signal when they have more work to do.

JavaScript does not automatically start the tasks. It executes the function if you...execute the function. If you just want to pass around something with deferred execution, you can just pass the function around, or wrap it in a closure.

If I understand correctly, applying a JavaScript Async function is effectful and starts the background tasks. A better design is for the async function to yield an Async computation when applied. These computations can then be composed and passed around with full referential transparency. Only when we run the final composed Async computation, should the effects happen (threads running our computation). This is how the original F# asynchronous workflows were designed, which predate the C# and JavaScript implementations. Thankfully Rust works this way too!

It does automatically start the tasks. JavaScript asyncs are "hot". You can simulate "cold" asyncs using a function, as you describe, but in other languages this is how they work by default.

I’m not sure exactly what hot is here, but if that’s the case then all javascript functions are hot. It’s behaving consistently with any other function. I assume the point here is that Rust in this case changes the way function calls work based on async, which is, well, inconsistent.

An async function immediately returns a Future to the caller, so in that sense it is fully consistent with a sync function that immediately returns a value.

If you want the asynchronously computed value, then the async equivalent to "foo()" is "foo().await", and that is also fully consistent - the function body starts running, and returns the value once it's done.

But there's no sync equivalent of invoking an async function without awaiting it, which is where the hot/cold distinction manifests. Thus, there's no inconsistency, because there's nothing to be consistent with.

Are you sure? How would the JavaScript functions execute simultaneously on a single thread?

Async is about interleaving computations on a single thread.

Javascript has two differences that contribute here:

* First, it runs the callee synchronously until the first await, which can fire off network requests, etc.

* Second, continuations are pushed onto queues immediately- the microtask queue that runs when the current event handler returns, for example.

Rust does neither of these things:

* Calling an async function constructs its stack frame without running any of its body.

* Continuations are not managed individually; instead an entire stack of async function frames (aka Futures) is scheduled as a unit (aka a "task").

So if you just write async functions and await them, things behave much more like thread APIs- futures start running when you pass a top-level future to a "spawn" API, and futures run concurrently when you combine them with "join"/"select"/etc APIs.
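A small std-only sketch of the first Rust point, that calling an async function runs none of its body. The names `mark`, `laziness_demo`, `ThreadWaker`, and `block_on` are invented for this example, and the toy executor stands in for whatever runtime you actually use.

```rust
use std::cell::Cell;
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// Hypothetical async function with a visible side effect.
async fn mark(flag: &Cell<bool>) -> u32 {
    flag.set(true);
    7
}

// Toy executor (an assumption, not a real runtime).
struct ThreadWaker(Thread);
impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => thread::park(),
        }
    }
}

// Returns (flag before driving the future, flag after, the future's output).
fn laziness_demo() -> (bool, bool, u32) {
    let flag = Cell::new(false);
    let fut = mark(&flag); // only constructs the future; the body has not run
    let before = flag.get();
    let value = block_on(fut); // the body runs here, when the future is polled
    let after = flag.get();
    (before, after, value)
}
```

`laziness_demo()` returns `(false, true, 7)`: the flag is untouched after the call and only set once something polls the future. The equivalent JavaScript async function would have set the flag during the call itself.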

> * Calling an async function constructs its stack frame without running any of its body.

Running part of the body eagerly is actually possible by using an async block inside a regular function instead of an async function. E.g. you can write this:

    use std::future::Future;

    fn test() -> impl Future<Output = ()> {
        // Run some things directly here
        println!("I am running in the current frame");
        async {
            // Run some things in the await
            println!("I am running in the .await");
        }
    }

They're asynchronous tasks, there would be no point in using promises with synchronous tasks. So we're not talking about simultaneous execution of functions that are making use of the single thread, we're talking about serial execution of functions on a single thread which are basically doing nothing more than scheduling an asynchronous task and not bothering to wait for the result.

Think of it this way. I have 3 letters I need to send, and I'm expecting replies for each. A single threaded, synchronous language, would basically send the first letter, wait for the reply, send the second letter, wait for the reply, then send the third and wait for the reply. In JS, you're still single threaded, but you just recognize that there is no point in sitting around and waiting before moving onto the next item. So you send the first letter, and when it would be time to wait, you continue executing code, so you immediately send the next letter, and then finally send the 3rd.

How they're scheduled simultaneously on a single thread is exactly what makes JS so fast for IO. Once it starts making an http call, db call, disk read, etc, it will release the thread to begin execution of the next item in the event loop (which is the structure JS uses under the hood to schedule tasks).

So what really happens is when we call

JS will go into `asyncA`, run the code, and at some point it will get to a line that does something like "write this value to the database." This will be an asynchronous behavior, that it knows will be handled with a callback or a Promise, so it will immediately continue execution of the code. So now it pops out of executing `asyncA` and executes `asyncB`, meanwhile the call to the DB has gone out and we don't care if it has finished, we'll await both of these when we need them.

Async doesn't have to be single-threaded generally. JavaScript is limited to a single-threaded model, but in other languages (like clojure), futures are tasks run on some other thread. I don't use rust and this article doesn't make any mention of threads so I'm not really sure what's going on here.

> futures are tasks run on some other thread.

This is usually false, in Rust and elsewhere. Futures are tasks that run on whatever thread is scheduling them. They are an even lighter-weight version of green threads/fibers/M:N.

But only if you await them in that thread. If you `spawn` them, they can be executed in another free thread; although you'll miss the return value in this case.

So for people new to async/await in rust, how a web server usually works:

Request/response future is spawned to the runtime, and a thread decides to take the future for execution if the runtime uses multithreading.

Inside of this request/response future whatever futures are .awaited will continue the execution in the same thread when ready.

Users can also spawn futures inside the request/response future, which might go to another thread. The result of this future is not available for the spawning thread.

Some executors have special functions, such as `blocking` that will execute the block in another thread, giving the result back to the awaiting thread when finished.
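For a feel of how such a `blocking` helper could hand a result back, here is a rough std-only sketch. It is not how tokio or async-std implement it: `offload`, `Offload`, `Shared`, `ThreadWaker`, and `block_on` are hypothetical, and unlike most Rust futures this one starts its work when `offload` is called rather than on first poll.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// Shared slot: the worker writes the result, the future parks its waker here.
struct Shared<T> {
    result: Option<T>,
    waker: Option<Waker>,
}

/// Hypothetical stand-in for an executor's `blocking` helper.
struct Offload<T> {
    shared: Arc<Mutex<Shared<T>>>,
}

fn offload<T, F>(work: F) -> Offload<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let shared = Arc::new(Mutex::new(Shared { result: None, waker: None }));
    let worker_side = Arc::clone(&shared);
    // The blocking work runs on its own thread, off the thread polling futures.
    thread::spawn(move || {
        let value = work();
        let mut slot = worker_side.lock().unwrap();
        slot.result = Some(value);
        // Tell the awaiting task that the result is available.
        if let Some(waker) = slot.waker.take() {
            waker.wake();
        }
    });
    Offload { shared }
}

impl<T> Future for Offload<T> {
    type Output = T;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<T> {
        let mut slot = self.shared.lock().unwrap();
        match slot.result.take() {
            Some(value) => Poll::Ready(value),
            None => {
                // Not finished yet: remember who to wake, then yield.
                slot.waker = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }
}

// Toy executor (an assumption, not a real runtime).
struct ThreadWaker(Thread);
impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => thread::park(),
        }
    }
}
```

Usage would look like `block_on(async { offload(|| (1..=10u64).sum::<u64>()).await })`, where the sum is computed on the worker thread and the result comes back to the awaiting side.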

Fun fact: the code in the second snippet (awaiting promises that were invoked earlier) can result in node throwing `uncaught promise rejections` errors.

More info: https://dev.to/gajus/handling-unhandled-promise-rejections-i...

Those don't actually execute simultaneously in Javascript, though. In Javascript, C# and I suspect Java, each usage of await generates a state machine that can _defer execution_. So while one function is blocked, another can continue to execute. The concept is the same in Rust, except in Rust a nested await does not generate a new state machine. The entire async operation just uses a single state machine.

The reason Rust does this is because its priorities are different than these other languages. Rust is built around zero-cost abstractions because it is intended to be as fast as possible while still safe. That's one of many reasons why Rust is considered a systems language and Javascript is not. They're for different things.

For more on how async in Rust works, I'd invite you to read the actual manual on the subject (linked from the announcement): https://rust-lang.github.io/async-book/06_multiple_futures/0...

Java actually doesn't have async/await as a language level feature. It does have the Future interface, but it's just an interface, and not special in any way.

Got it, thanks for the info!

This is just the lazy way of handling tasks. Rust allows you to join two futures, and awaiting on the resulting future will run the underlying two in parallel and wait for them.

In a language like Haskell where everything is lazy, it's not uncommon for someone to model their logic in such a way that computations that are not necessary are never run but appear to be used in code anyway.

Depending on what you want to do, it's also possible to start threads/green-threads (using something like Tokio), and use message passing for async tasks where you do not need to process the result synchronously.

Python has the hybrid approach. Just calling an async function only returns a future, but its execution hasn't been started yet. As with Rust, it will only start when you await it. However, there is also a way to execute it in the background before the await by calling create_task(future).

I found that most code that actually uses create_task tends to be super complex and often bug-ridden, since eventually you will have to await it anyway, and it is easy to miss this, which will leave errors unhandled, especially when propagating cancel() to all these executing futures floating in the air.

You would use the join method to combine the futures (https://docs.rs/futures/0.2.1/futures/trait.FutureExt.html#m...).

Both will execute at the same time and you'll get a tuple of results.

Note that your first promise doesn't actually do anything in parallel. It just schedules all of those tasks at once. Execution will still be concurrent.
