Hacker News
Inside Rust's Async Transform (nemo157.com)
203 points by Nemo157 on Dec 9, 2018 | hide | past | favorite | 75 comments

The async/await pattern always confuses me; someone please let me know if I've got this right:

First, async/await does NOT mean "threading" or "multiprocessing" or "concurrency". It simply means "using a state machine to alternate between tasks, which may or may not be concurrent." Right?

Further, in JavaScript, futures and async are used heavily because we so frequently need to wait for IO events (e.g. network requests) to complete, and we don't want to block execution of the entire page just to wait for the IO to finish. So the JS engine allows you to fire off these network requests, do something else in the meantime, and then execute the "done" behavior when the IO is complete (and even in this case, we might not be concurrent, because the engine runs all of this on a single thread).

That makes sense to me.

But say I have written something in Rust that makes use of async/await, and say there is absolutely no IO or multithreading. Say I have some awaitable function called "compute_pi_digits()" that can take arbitrarily long to complete but does no IO; it's purely computational. Is there any benefit to making this function awaitable? Unless I actually spawn it in a different thread, the awaitable version of this function will behave identically to the non-awaitable version, correct?

And one last idea: the async/await pattern is becoming so popular across vastly different languages because it allows us to abstract over concepts like concurrency, futures, promises, etc. It's a bit of a "one size fits all" regardless of whether you're spinning up a thread, polling for a network event, setting up a callback for a future, etc?

I think the JS world might be particularly excited about async/await because in other languages people just escape to threads when they don't want to block everything on IO.

In both JS and Rust, you don't gain anything just by declaring your thing to be async or awaitable. Your function needs to be built around some kind of "primitive" that explicitly supports the "do something else in the meantime" mechanism. Using "await" on that thing lets your function piggyback on its support, but all your explicit, normal code is synchronously blocking as usual.
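To make that concrete, here's a std-only sketch (toy code; `compute_sum` is a made-up stand-in for any pure computation) showing that an "async" function with no await inside runs its whole body synchronously on the first poll:

```rust
use std::future::Future;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// An "async" function with no await inside: the whole body runs
// synchronously the first time the future is polled.
async fn compute_sum(n: u64) -> u64 {
    (1..=n).sum() // this loop blocks the polling thread, await or not
}

// Minimal do-nothing waker, just enough to be able to call poll().
fn noop_waker() -> Waker {
    fn clone(p: *const ()) -> RawWaker { RawWaker::new(p, &VTABLE) }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

fn main() {
    let mut fut = Box::pin(compute_sum(100));
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    // One poll drives it to completion: no primitive inside ever returned
    // Pending, so nothing else could have run "in the meantime".
    assert!(matches!(fut.as_mut().poll(&mut cx), Poll::Ready(5050)));
}
```

Nothing here is concurrent; the `async` keyword only changed how the result is delivered, not how the work is scheduled.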

In Rust, I think it's a fairly established pattern to turn blocking code, where the blocking part is not some IO action with explicit support for the futures mechanism, into an asynchronous, awaitable function by punting the work to a threadpool. That makes sense for CPU-bound work, as well as for IO done by libraries that don't support futures, or things like disk IO where the OS might not actually have decent support for doing it in a non-blocking fashion.

I'm not sure if it's the canonical mechanism, but this crate seems to implement what I'm thinking of: https://docs.rs/futures-cpupool/0.1.8/futures_cpupool/

The latest tokio has a work-stealing threadpool, so you don't need to do this explicitly anymore, IIRC.

Neat! Do you know if there's anything in there resembling the rayon::join API? I didn't see anything on a cursory look.

I am not sure, to be honest. I've been waiting until everything settles down to really dig in.

> Both in JS or Rust, you don't gain anything just by declaring your thing to be async, or awaitable.

I'm not familiar with Rust but in JS you do gain something. Just the fact that the function is async means that it now explicitly returns a promise, which means that anything awaiting that promise will be in a new execution context and will definitely not run synchronously.

> Is there any benefit to making this [compute_pi_digits()] function awaitable?

If you introduce suspension points in that (e.g. every 100 computed digits), then you can co-schedule other tasks (e.g. a similar `compute_phi_digits`) or handle graceful cancellation (e.g. if a deadline is exceeded, or its parent task aborted in the meanwhile).

> And say there is absolutely no IO

Well, then we can simply optimize away your entire program as it does nothing and running it has no side effects.

Even if your program is entirely CPU bound, there are uses for writing in an async/await style. As an example, parsers can be quite natural to write in that style.

Or you can use it to have multiple computations running at the same time, and give updates on their progress. It is voluntary time slicing, which has significantly less overhead than the OS doing the time slicing for you.

CPUs do the physical calculation and the number of cores determines the number of calculations that can happen at the same time.

Software threads, from the OS to your application, run on one of these cores for a certain amount of scheduled time, then get switched out with some other thread. If threads are waiting on IO then they're not making much use of the time they get and are wasting CPU capacity.

An older approach is to just make more threads and switch them out faster, but this is very inefficient. Await/async is a way to let threads not get stalled by a single function and switch to a different function in that process that does have work available. It's basically another step of granularity in slicing CPU time within a thread.

The keywords do not force anything, they are just signals to the underlying software that it may pause and come back later if necessary, along with setting up state to track results. Some methods may still run all on the same thread if there's nothing else to do, or if the async result is already available and there is no waiting needed.

Most async/await is usually built on top of yield, generators, promises or other constructs that basically are state-machines or iterators.
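For example, a body with one suspension point can be hand-desugared into exactly the kind of state machine the comment describes (the names and values here are illustrative, not real compiler output):

```rust
// States of the machine: where we are in the body, plus any saved values.
enum State { Start, AfterFirst(u32), Done }

// What a call to resume() can report, generator-style.
enum Step { Suspended, Finished(u32) }

struct Machine { state: State }

impl Machine {
    fn resume(&mut self) -> Step {
        match std::mem::replace(&mut self.state, State::Done) {
            State::Start => {
                // code before the suspension point runs here;
                // the value 10 is saved into the next state
                self.state = State::AfterFirst(10);
                Step::Suspended
            }
            State::AfterFirst(a) => {
                // code after the suspension point picks up the saved value
                Step::Finished(a + 32)
            }
            State::Done => panic!("resumed after completion"),
        }
    }
}

fn main() {
    let mut m = Machine { state: State::Start };
    assert!(matches!(m.resume(), Step::Suspended));
    assert!(matches!(m.resume(), Step::Finished(42)));
}
```

This is the same shape as an iterator's `next()`: each call advances the state and either yields control or produces the final value.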

Incremental computation can be useful for better responsiveness even if you only have one thread. A simple example in JavaScript would be a Mandelbrot viewer, where you don't want to lock up the UI doing a heavy computation. So, you could have something that looks like an async call that really does the heavy computation in small chunks using idle callbacks, and the future completes when it's done. (Using a background worker thread is probably better though.)

The async call itself doesn't return intermediate results, though, so you'd have to handle that a different way. And if you want to cancel the task, you need another way to handle that too.

Something like computing the digits of pi would be better represented by a stream or iterator since the caller should decide when it's done.
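A sketch of that caller-driven style as a plain Iterator (long division of 1/7 stands in for a real pi algorithm here): the stream is infinite and lazy, and the consumer decides how many digits to pull.

```rust
// Digits of num/den after the decimal point, produced one at a time.
struct FractionDigits { num: u32, den: u32 }

impl Iterator for FractionDigits {
    type Item = u32;
    fn next(&mut self) -> Option<u32> {
        self.num *= 10;                  // shift to the next decimal place
        let digit = self.num / self.den; // long-division step
        self.num %= self.den;
        Some(digit)                      // infinite, lazily produced stream
    }
}

fn main() {
    let digits: Vec<u32> = FractionDigits { num: 1, den: 7 }.take(6).collect();
    assert_eq!(digits, vec![1, 4, 2, 8, 5, 7]); // 1/7 = 0.142857...
}
```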

That's called cooperative multitasking, and it was roughly how things worked in the Windows 3.1 era. Nowadays it's much better to use real threads, as they much better protect against latency (responsiveness) issues.

Concrete example right on the Dart homepage: https://www.dartlang.org/

The correct way to deal with that problem is to decouple the UI from the computation; one process for each would be ideal, and anything less is going to get messy and require all kinds of hacks to give the same outward appearance. Better still: one supervisor process, plus one each for UI and computation.

Well, in a web app you don't get to make those choices, but you could use a web worker.

It's not about IO, it's about any long-running operation that can happen in a background thread. JS developers just think it is about IO because that's the only thing (web workers notwithstanding) in JS-land that can run in another thread.

It would be misleading to consumers of the interface to mark compute_pi_digits awaitable -- at least in the .NET world, which has quite a lot of experience with async/await at this point.

See http://blog.ploeh.dk/2016/04/11/async-as-surrogate-io/ for further discussion. Regular programmers are not using Task<T> as IO-monadic marker consciously, but they are surprised when a usage differs from that model.

Look into continuation-passing style. Semantically, async/await is much like syntactic sugar for CPS (or rather futures, but at the most basic level they can be thought of as single-shot continuations).

But ultimately, to make use of async, you need async primitives - something that lets you say "do this in the background somehow, and let me know once you're done". Any async/await call should ultimately end at one of those primitives, and it's at that point that another call might get interleaved. If you don't actually do I/O or anything else that can do a non-blocking wait, you're not getting anything useful from async.

It's only semantically CPS in as much as all code is semantically CPS. Thinking about the parallels with CPS does nothing to help, here. Async/await is just like threaded code with the blocking/concurrent calls specially marked.

Except it's not like threaded code, because there aren't necessarily multiple threads. And with a single thread event loop, you don't need locking and other synchronization mechanisms at all, further reinforcing the point.

Async/await is literally all about explicit continuations. It's not about concurrency or parallelism per se, although it can be used in that context.

That's how I understand it.

> is very different to other well-known implementations (C# and JavaScript [...]). Instead of performing a CPS-like transform where an async function is split into a series of continuations that are chained together via a Future::then method, Rust instead uses a generator/coroutine transform to turn the function into a state machine.

C# async/await is also very much resumable state machines

Yes, the state machine generation aspect is similar.

However, the execution aspect is a bit different: in C#, once a leaf future/Task gets resolved, it will in many cases synchronously call back into the state machine which awaited the Task (by storing a continuation inside it). A whole promise chain might resolve synchronously, directly on the stack of the caller. It's only "in many cases" because the whole thing depends on some very subtle properties, like whether a SynchronizationContext or TaskScheduler was configured.

In Rust's task system, a leaf future will never call back into the parent. It will only notify the associated task executor that it can retry running/polling the future to completion. When the task gets executed again, it will run from the normal scheduler thread in a top-down fashion.

This makes Rust's system a little less performant for some use cases, but also a lot less error-prone (no synchronization issues from not knowing where some code runs). It is also one of the key ingredients for avoiding allocations for individual futures.

JavaScript's system is closer to the C# mechanism, but avoids the error-prone part: when a leaf future finishes, the continuation of the parent future gets called. However, this is never done synchronously; it always happens on a fresh iteration of the event loop (to avoid side effects). That works fine for JavaScript because the event loop is guaranteed (it's not in C# async code), and futures are on the heap anyway.
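A toy, std-only sketch of the Rust model described above (this is not how tokio is actually implemented): the leaf future never calls into its parent, it only flags the executor via the waker, and the executor then re-polls the whole task from the top.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};
use std::task::{Context, Poll, Wake, Waker};

// The waker just sets a flag meaning "this task is ready to be re-polled".
struct FlagWaker(AtomicBool);

impl Wake for FlagWaker {
    fn wake(self: Arc<Self>) {
        self.0.store(true, Ordering::SeqCst);
    }
}

// A leaf future: Pending once (waking the task), then Ready.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 { Poll::Ready(()) }
        else { self.0 = true; cx.waker().wake_by_ref(); Poll::Pending }
    }
}

// A single-task executor: poll only when the flag says the task is ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let flag = Arc::new(FlagWaker(AtomicBool::new(true)));
    let waker = Waker::from(flag.clone());
    let mut cx = Context::from_waker(&waker);
    loop {
        if flag.0.swap(false, Ordering::SeqCst) {
            // Re-run the task top-down from here, never from inside a leaf.
            if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
                return v;
            }
        }
        std::hint::spin_loop(); // a real executor would park the thread here
    }
}

fn main() {
    let result = block_on(async {
        YieldOnce(false).await; // suspends once; the executor re-polls us
        6 * 7
    });
    assert_eq!(result, 42);
}
```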

Tried to clarify this: it's not that C# and JS are implemented as CPS-like, rather that that's how they're commonly explained; and because of the GC this difference is not really externally visible, whereas with borrowing in Rust it would be.

You can use ILSpy and disable async decompilation to see that C# creates a state machine.

IDK about JS runtimes, but this is very similar to the approach by most JS transpilers.

They will use language-level generators if compiling to ES 2015, or user-land generators if compiling below that.

Same for Kotlin.

Off topic, but I'd just like to point out how blindingly fast this site loads: quite literally instantly for me (I'm on mobile, so I can't give precise figures). I don't think I've ever used a site that loads that fast.

Is the website author here? What are you running server side that’s giving such great performance?

Only two requests: favicon and html page itself. No front-end framework, no tracking. The only JS is the 10 lines necessary for the buttons in the upper right corner.

Gosh I wish more websites were this minimal. It's refreshing to use something so streamlined.

What cokml19 said, it’s all about minimising the work. This is just a normal Jekyll site hosted on github pages, but with just minimal css and js inlined into the page. There are actually a few more lines of js hidden around the place, e.g. for adding the “play” buttons onto the code snippets, but it’s all simple library-less code.

I got lost in the weeds fairly quickly with this blog post. Why is it that you didn't have to implement Future?

Pinning is required here because your AsyncRead read_to_end returns a future bound by some reference lifetime?

The actual Future implementation comes from std, std::future::from_generator takes a generator with the right associated types and turns it into a future.

Yep, the generator created by "encrypt" is creating internal self-references from the future created by read_to_end into the AsyncRead it's storing in its environment. While this is happening the AsyncRead must not move, and therefore the generator must not move, which is what pinning represents.

Don't both JS (via Babel) and C# implement asynchronous functions as state machines in a similar fashion?

Yes, Babel's transform of `async` to ES5 is similar. It has to turn a function into a state machine. Rust goes a step further and turns the entire chain of futures into one large combined state machine.

But the rest of the event-loop machinery is quite different. JS's async is still fundamentally callback-based. Rust's futures are polled. In JS there's a single global event loop and promises run automagically. In Rust you create futures managed by their executors, each handling its own kind of tasks (CPU pools, network polling).

Thank you for the detailed explanation!

Would you mind elaborating on the polled aspect of Rust futures, or linking me to some documentation? Do you mean that there is a loop polling the result of a future? How does that work with things like select?

Futures do nothing until the poll method is called. That method returns an enum saying whether it's ready, pending, etc. The event loop calls poll repeatedly.
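A minimal std-only illustration of that contract: `std::future::ready` is a future that is ready on the first poll, and `poll()` hands back the `Poll` enum directly.

```rust
use std::future::Future;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Do-nothing waker: we only need something to put in the Context.
fn noop_waker() -> Waker {
    fn clone(p: *const ()) -> RawWaker { RawWaker::new(p, &VTABLE) }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

fn main() {
    let mut fut = Box::pin(std::future::ready(5));
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    // poll() returns Poll::Ready(value) or Poll::Pending.
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(v) => assert_eq!(v, 5),
        Poll::Pending => unreachable!("an event loop would just try again later"),
    }
}
```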

We have some really in-depth docs in the works here, but they're not quite ready yet.

Yes, the executor is that loop. Here is an older blog about Rust futures that is still relevant: http://aturon.github.io/2016/09/07/futures-design/

Could generating a single combined state machine result in code bloat? (Similar to inlining.)

The state machine here is a struct given to `poll()` methods that advance the state. It's similar to iterator structs that have `next()` calls.

It could cause code bloat, but in practice these are small functions that get inlined and optimized out to almost nothing (sometimes even the whole struct disappears).

One difference that may exist is that in Rust, async fns don't immediately execute; they simply create one of these values. I forget if JS and C# do something different, that is, execute up until the first suspend point. This was one of the major design decisions we made that's different from other languages.

Just for comparison, Dart started out with async functions that suspended immediately, but in Dart 2 they switched to running synchronously to the first await, for performance (fewer unnecessary suspends) and to avoid race conditions.

Without this, you sometimes had to write a wrapper function that does some synchronous setup and returns a Future, which was a bit annoying for stylistic reasons.

There's an interesting but somewhat old discussion here:


I wonder if anything changed since then? I'm not a Rust programmer so I didn't really understand the article.

Immediately executing in Rust doesn't work very well for a few reasons. One of them is "pinning". In order to "start" a future, it must be pinned to a certain memory location and must never be moved from there anymore. If a future started directly from the call which returns it, it would no longer be possible to move the future, e.g. to return it from another function, store it in a Vec<Future>, etc.
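A small std-only demonstration of that property: the future is freely movable right up until it's pinned for polling (a do-nothing waker is included just so poll() can be called).

```rust
use std::future::Future;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

fn noop_waker() -> Waker {
    fn clone(p: *const ()) -> RawWaker { RawWaker::new(p, &VTABLE) }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

fn main() {
    // Before its first poll, a future is just a value: it can be moved,
    // returned, or stored in collections freely.
    let fut = async { 2 + 2 };
    let moved = fut;             // fine: nothing has started yet
    let mut queue = vec![moved]; // also fine
    // To poll it, we must first pin it; from then on it can never move.
    let mut pinned = Box::pin(queue.pop().unwrap());
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    assert!(matches!(pinned.as_mut().poll(&mut cx), Poll::Ready(4)));
}
```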

I found it also works better together with some other features, like "select!" and the current cancellation mechanics. But I can't remember all the details right away, and it might be pretty hard to explain.

That said, the choice makes sense for Rust for those reasons, but it's not a general one! I think for JavaScript, C# and Dart, synchronously executing until the first await point is easier to understand. For example, I felt that "async" in F# was a bit harder than in C# due to the non-immediately-executing property.

A Rust async function doesn't work like Dart 1: it doesn't suspend immediately (i.e. return back to the event loop to be scheduled), it just doesn't run to the first await until it's actually awaited or otherwise `poll`ed itself.

So the performance concern simply does not apply- Rust suspends the same number of times as Dart 2. The race condition concern might, depending on how you look at it, but in return the execution model from within a Future is much more straightforward and predictable.
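That laziness can be demonstrated in a few lines of std-only code (the side-effect flag is just for illustration): creating the future runs none of its body; the first poll does.

```rust
use std::cell::Cell;
use std::future::Future;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

fn noop_waker() -> Waker {
    fn clone(p: *const ()) -> RawWaker { RawWaker::new(p, &VTABLE) }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

fn main() {
    let ran = Cell::new(false);
    let fut = async { ran.set(true); 7 };
    // Creating the future executed none of its body.
    assert!(!ran.get());
    let mut fut = Box::pin(fut);
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    // The first poll runs the body up to the first await point; with no
    // awaits here, that means straight to completion.
    assert!(matches!(fut.as_mut().poll(&mut cx), Poll::Ready(7)));
    assert!(ran.get());
}
```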

Yes, that was a big bit of feedback I was very happy to see.

I don’t 100% remember, but I think some of the details for us still ended up significantly different. There are so many ways to implement this stuff...

No, you're correct: C# async/await synchronously executes up to the first yield point. In general the tasks are 'hot' in C#, as opposed to F#'s 'cold' Async type.

That’s true for `async` methods, but keep in mind that you can `await` any value in C# that implements the ‘awaitable’ pattern. A custom ‘awaitable’ can perform its own scheduling however it likes, including when it begins its execution.

The pattern-based implementation of `await` is, in my view, the coolest part of the async/await feature set.

C++ follows up on the same idea.

Though F# Async is quite a bit different in that it is the computation rather than a cold invocation. One can use Tasks in F#, of course, but last time I checked it will only use a CPS transformation rather than the more efficient state machine representation C# has (F# does use state machines for seq expressions, so it's not a fundamental limitation but more a matter of work).

Aside: this can be a source of errors in F# code I've seen, as folks will mix up the cases where they want the result vs. running a computation many times. There certainly is plenty of expressivity, but in practice it's a muddier representation (it took me over a year to appreciate this).

JS starts immediately but even if the function's return value is "ready" synchronously, you can't get its value back synchronously.

I've never thought about the execution order before, but that makes a lot of sense. Thinking about it now, an async function is just sugar for wrapping the return in a Promise (besides the async/await restructuring). And constructing new Promises works exactly that way.

According to this blog post, no:

"... performing a CPS-like transform where an async function is split into a series of continuations that are chained together via a Future::then method"

They are referring to the C# and JS implementations of promises/futures here.

> Instead of performing a CPS-like transform where an async function is split into a series of continuations that are chained together via a Future::then method, Rust instead uses a generator/coroutine transform to turn the function into a state machine

State machines are also what Clojure(Script) core.async uses.

(Easy choice as there are no continuations available)

I'll take fibers that yield automatically on blocking operations over async/await most days, for most tasks. It's slightly less flexible, since you can only wait for one async action at a time per fiber, but it's a pleasure to use in comparison. For that, though, you need fibers built in.

Go sort of does the same thing, but insists on running fibers in separate threads at its convenience, which means giving up the lovely simplicity of cooperative multitasking for the same old multi-threaded circus.

I'm unfortunately not aware of any languages more recent than Smalltalk that get this right. My own baby, Snigl [0], is just getting to the point where it's doable.

[0] https://gitlab.com/sifoo/snigl

The problem with fibers is interop. The moment you need to do some FFI, especially FFI that involves callbacks, things get a lot more complicated, since code you're calling into/through doesn't have any of that fiber magic (and, depending on how you implemented yours, it might actually break it).

Another win for embedded languages/inside-out FFI in my book, since controlling the outside world (C in Snigl's case) makes it trivial to register a trampoline with whatever state is needed to deliver the call.

Sure, but now you need C code that needs to be aware of your trampoline, specifically. Good luck if it's an existing library. Also, what happens if there are multiple interleaved C parts of the stack, and the innermost one invokes the trampoline? What happens to the ones in the middle? It all sounds awfully like setjmp/longjmp (which is the one thing that you never do in C if you want to interoperate with anything in a sane fashion).

And I don't think FFI direction matters much. The moment you have callbacks, your stack has interleaving of languages anyway (i.e. X called into Y which called back into X). Does it really matter which language the innermost and the outermost stack frames belong to? You still need to handle the mix in the middle.

Why? You really only need the C library to be able to pass a data pointer to the callback, and most do.

This sounds like a native compiler perspective to me; with pure VM fibers like Snigl's these are not issues.

It's not about direction, it's about controlling the world from the outside.

You sound more like you're on a mission to prove to the world it's impossible, since Rust didn't manage to get it right.

What happens with all the interleaving stack frames when that callback gets invoked?

Have you given Erlang a look? What it calls processes sounds like what you call fibers, although it does have quasi-preemptive yielding as well as yielding when waiting for a message. Since Erlang is built on async message passing, you can easily send a request and block on the response, but you can also send multiple requests and process the responses as they arrive.

Sure, I played with Erlang back in the day. Preemption kills the whole idea, which is why you can't share values. And it's probably the least integrated language I've come across.

Wren and I'm guessing Lua have fibers that work similarly.

Lua has fibers? I thought they settled on throwing coroutines over the fence, which is very much not the same thing even though you can sort of build fibers using them. I'm pretty sure Wren's fibers aren't integrated with the IO-layer, but I'd be happy to be wrong there.

(shameless plug) The state machine approach is one I used in an old sweet.js library called cspjs [1], which implemented async tasks in JavaScript well before async/await. I still think there are some good ideas left there, especially async error handling and native "data flow variables".

[1] https://github.com/srikumarks/cspjs

Why this instead of "green threads"? Runtime overhead/footprint?


Bikeshedding: I think Solarized light/dark color themes have insufficient contrast ratios.

wow... talk about informative.


It isn't about “memes” or “ignorant coders.” Please grow up.

The way goroutines work isn't compatible with Rust's core language goal of zero-cost abstractions. In order for a goroutine to suspend and resume at arbitrary points in otherwise normal functions, each goroutine needs its own stack. This is convenient but comes at a cost, and I certainly wouldn't say that it matches reality or is close to the system.

The way Rust implements async ensures that there is zero overhead. You can think of the compiler as constructing an elaborate state machine. Each task is a normal structure in memory, just a few bytes rather than an 8 KB stack. This means that by paying the cost of dealing with a somewhat invasive language feature, Rust async code is more efficient than the equivalent Go code.

This is part of Rust's core design. The language should be as convenient or productive as possible, but never at the cost of performance or efficiency even if that cost is very small.

Those lose zero-overhead interop with C, which is a core constraint for Rust.

Your guesses are incorrect.

> lose zero-overhead interop

How many LOC is tokyo nowadays?

I don’t understand how that’s relevant.

To me, at least, this means that the overhead is substantially greater than zero.

That’s not what that phrase means.

Tokyo is a city
