How to think about async/await in Rust (cliffle.com)
191 points by mpweiher 10 months ago | 258 comments



Not specific to Rust, but I think asynchronous programming in general is overhyped.

It didn't start because it is so awesome, it started because JS can't do parallel any other way. That's the long and short of it. People wanted to use JS in the backend for some reason. The backend requires concurrency. JS cannot do concurrency. Enter the event loop. Then enter some syntactic sugar for the event loop. And since JS is popular, async became popular.

Code written using threads is, at least to me, much more readable and easier to reason about. Each section in itself is just synchronous code. The runtime/kernel takes care of concurrency. The overhead is negligible in an age when we have greenlet implementations. It works for both i/o bound concurrency and cpu bound parallel computing. It doesn't require entire libraries rewritten to support it. There is no callback hell. It scales both horizontally and vertically. Modern languages support it out of the box (Hello `go` keyword).

I realise that this is going to get a lot of downvotes. I don't really care. To me, async is just "cooperative multitasking" with a quick paintjob. We left behind that paradigm in Operating Systems decades ago, and with good reason.


Your comment seems to be conflating concurrency with parallelism.

JS doesn't have any language-level abstractions for parallelism (async or not) but you do have Web Workers[0] and process forking (depending on runtime) to get actual parallel programming. JS async deals with concurrency, not parallelism.

Threads are the opposite: They are interfaces for parallel programming and their use is orthogonal to how your application handles the resulting concurrency.

You say "the runtime/kernel take care of concurrency" - are you telling me you never write a mutex or implement locking logic? Because that's what "taking care of concurrency" is. I'd choose refactoring callback-/Promise-hell over debugging a complex deadlock any day (unless intra-process parallelism is actually a requirement, which may tip the scale in the other direction).

In the context of doing concurrency and parallelism in Rust, I'd 100% agree that the JS/C#-style async/await approach isn't necessarily always the best approach and it's good to consider alternative idioms if your requirements call for it. For anyone writing "apps" or glue-libraries, though, I'm thankful that they stay away from spawning threads all over my system by default and that they need more tools than "learn Rust async in 15 minutes" gives them to become dangerous.

Messing up your single-threaded event-loop concurrency can hog roughly a single CPU core and cause OoM. Messing up thread-based concurrency can have larger implications on the hosting system.

[0]: https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers...


> Your comment seems to be conflating concurrency with parallelism.

No, it really doesn't. I mention both concurrency and parallelism, and their main difference.

> Threads are the opposite: They are interfaces for parallel programming

No, they are not. Threads can do both. When waiting for an i/o bound operation, a thread can simply sleep. Added bonus: A thread-based implementation supports io bound concurrency and cpu bound parallelism using the exact same principle, letting the kernel/runtime take care of the details.

> are you telling me you never write a mutex or implement locking logic?

Pretty sure I never said that. As for what I prefer to debug: Most mutex-based synchronization tasks that come up in practice are easy. And if a "complex deadlock" does occur, it's usually pretty clear what resource was locked. Debugging that is just a question of going over all callers that access that resource.

And as mentioned before, all that code is synchronous. So each one of them is easy to reason about.

So yea, all in all, I prefer debugging problems arising from deadlocks over wading through callback-hell. By a huge margin.

Oh, and all that is before we even talk about using CSP as an approach to synchronizing threads, which makes it both harder to mess up, and again easier to reason about.


> It didn't start because it is so awesome, it started because JS can't do parallel any other way. That's the long and short of it

If the comment is not conflating concurrency and parallelism as you say, mind expanding on this part?


But concurrency is just "single core parallelism" anyways, so this isn't really germane to the discussion.

JS has neither.


Concurrency is not "single core parallelism". Concurrency describes tasks/threads of execution making independent progress of each other. Parallelism describes tasks/threads actually running at the same time.


> Concurrency is not "single core parallelism"

Of course it is. Concurrency gives the impression to the user that parallel processing is being done, even when it's not. That's why my parents' old 386 could render a moving mouse cursor and a progress bar at the same time (usually).

Concurrency lets you do things "in parallel" even if you can't actually do them in parallel.


> mind expanding on this part?

Certainly. That part is a sentence written to be short and catchy. It sacrifices precision for reasons of brevity and style. It also doesn't mention either concurrency or parallelism, it just uses the word "parallel".

This is acceptable, because the post goes on to more precise statements later on, quote:

    It works for both i/o bound concurrency and cpu bound parallel computing.
End Quote.


> When waiting for an i/o bound operation, a thread can simply sleep.

I mean if you're fine with blocking I/O then obviously you don't need async, but on the other hand having non-blocking I/O is the whole point of async ^^


It really just depends on what you mean by non-blocking I/O.

Most node code I see in the wild is just a simple `await loadData()` which doesn't block the main node thread but does block that code flow until the data returns. This is roughly the same as what would happen in a normal blocking multithreaded language other than the extra overhead of a thread. If you don't have enough threads (or they are efficient enough in your language of choice) for this overhead to be an issue then you are adding all this complexity for almost no benefit.

Basically it comes down to whether you trust your language of choice's threads more or less than your language of choice's event scheduler. Since Node is fully single-threaded there isn't really an option, but with other languages, a single thread per worker is much simpler.

In Python it is even more opaque which to use, as CPython's global interpreter lock keeps threads from running Python code in parallel, so you are comparing its thread implementation to its event scheduler implementation. For this small win you get to rewrite all your code to new, non-standard APIs.


> It really just depends on what you mean by non-blocking I/O.

> Most node code I see in the wild is just a simple `await loadData()` which doesn't block the main node thread but does block that code flow until the data returns.

Agreed. Higher level languages tend to discourage or outright decide not to expose asynchronous I/O. Instead, they optimize blocking I/O within their own runtime - skipping the higher resource needs of the system scheduler and thread representation.

If I am writing a web server in C or C++, I'm likely writing asynchronous I/O directly. I may also decide to use custom memory strategies, such as pooling allocators.

If I write one in classic Java, I'm allocating two threads to represent input and output for each active connection, and hoping the JVM knows how to make that mass of threads efficient. In Go, I'm likely using a lot of goroutines and again hoping the language runtime/standard library figured out how to make that efficient.

Java packages like NIO/Netty and Go packages like gaio are what expose asynchronous programming to the developer.

The footgun is that it is hard to use an asynchronous I/O package when you have a deep dependency tree that may contain blocking code, perhaps in some third party package. This was one of the attractions of server-side JavaScript; other than a few local filesystem operations, everything sticks to the same concurrency model (even if they may interact with it as callbacks, promises or async/await).


I've seen a lot of people who seem to think all blocking IO completely blocks the entire OS process.

A language + runtime like Go or Erlang doesn't so much have "blocking" or "non-blocking" IO as the terms simply not applying. I see them yielding far more confusion than understanding when people try to come from Node and apply them to such threaded runtimes.

But if you had to force a term on such a system, the better understanding is that everything in a large-number-of-threads language+runtime is non-blocking. Both terms yield incorrect understanding, but that one gets you closer to the truth.


> Threads [...] are interfaces for parallel programming

Threads are both for parallelism and for concurrency. Threads have been used for decades on machines without hardware parallelism.

> are you telling me you never write a mutex or implement locking logic

aside from the fact that mutexes vs futures is completely orthogonal to async vs threads, I definitely prefer dealing with mutexes. 99% of mutex usage is trivial and deadlocks are relatively easy to debug. The issue with locks is their performance implications.


async/await doesn't entirely remove the need for mutexes and locks. We still need them if we have multiple coroutines using a shared resource across multiple yield points.
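
For instance, a minimal sketch (assuming the tokio runtime; not from the article) where two spawned tasks bump the same counter, so an async-aware lock is still required:

    use std::sync::Arc;
    use tokio::sync::Mutex;

    #[tokio::main]
    async fn main() {
        let counter = Arc::new(Mutex::new(0u32));
        let (a, b) = (counter.clone(), counter.clone());
        // Each task holds the lock only while it updates the shared value.
        let t1 = tokio::spawn(async move { *a.lock().await += 1 });
        let t2 = tokio::spawn(async move { *b.lock().await += 1 });
        let _ = tokio::join!(t1, t2);
        println!("{}", *counter.lock().await); // prints 2
    }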


> We still need them if we have multiple coroutines using a shared resource across multiple yield points.

We still need them if we have multiple parallel tasks (coroutines spawned non-locally) using a shared resource across multiple yield points.

As long as the accesses to the shared variable are separated in time, sharing is fine.

This is correct code:

        let mut foo = 1;
        async { foo += 1 }.await;
        foo += 1;
        println!("{foo}");
See - a shared variable used across multiple yield points. Another (more useful) example I showed below in another post with `select!`.


the equivalent threaded code wouldn't need a mutex either:

   int foo = 1;
   std::thread([&] { foo += 1; }).join();
   foo += 1;
   std::cout << foo << '\n';
(sorry for the C++, I don't speak much rust).


Point taken. What about this pattern (pseudo code, obviously it would require e.g. adding some code for tracking how much data there is in the buffer or breaking the loop on EOF, but it illustrates the point):

   let mut buffer: &mut [u8] = ...;
   loop {  
     select! {
       _ = stream.readable() => stream.read(&mut buffer),
       _ = stream.writable() => stream.write(&mut buffer),
     }
   }


Once you add enough tracking metadata to know how much there is in the buffer, you have literally implemented an SPSC queue.


Well, not really, because async/await guarantees I don't have to deal with the problem of the producer adding data at the same time as the consumer is removing it in this case. In a proper SPSC queue some degree of synchronization is needed.


You stop adding data when the queue is full, you stop popping when it is empty. You need the exact same synchronisation for async, just different primitives.


But that's not synchronization between two concurrent things. I can still reason about the queue being full in a sequential way.

   select! {
     _ = channel.readable(), if queue.has_free_space() => read(&mut queue),
     _ = channel.writable(), if queue.has_data() => write(&mut queue),
   }
The point is I can implement `has_free_space` and `has_data` without thinking about concurrency / parallelism / threads. I don't even need to think about what happens if in the middle of my "has_free_space" check another thread goes in and adds some data. And I don't need to invoke any costly locks or atomic operations there to ensure proper safety of my queue structure. Just purely sequential logic. Which is way simpler to reason about than any SPSC queue.


As I mentioned elsewhere in the thread, if you do not care about parallelism you can pin your threads and use SCHED_FIFO for scheduling and then you do not need any synchronization.

In any case acq/rel is the only thing required here and it is extremely cheap.

edit: in any case we are discussing synchronization and 'has_free_space' / 'has_data' are a form of synchronization; we all agree that async and threads have different performance characteristics.


> As I mentioned elsewhere in the thread, if you do not care about parallelism you can pin your threads and use SCHED_FIFO for scheduling and then you do not need any synchronization.

I don't think it is a universal solution. What if I am interested in parallelism as well, only not for the coroutines that operate on the same data? If my app handles 10k connections, I want them to be handled in parallel, as they do not share anything so making them parallel is easy. What is not easy is running stuff concurrently on shared data - that requires some form of synchronization and async/await with event loops is a very elegant solution.

You say that it can be handled with an SPSC queue and it is only one acq/rel. But then add another type of event that can happen concurrently, e.g. a user request to reconfigure the app. Or an inactivity timeout. I can trivially handle those by adding more branches to the `select!`, and my code still stays easy to follow. With threads dedicated to each type of concurrent action and trying to update the state of the app directly I imagine this can get hairy pretty quickly.
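
A rough sketch of what I mean (assuming tokio; `reconfigure_rx`, `apply_config` and the buffer-handling bodies are hypothetical placeholders):

    loop {
        tokio::select! {
            _ = stream.readable() => { /* read into the queue */ }
            _ = stream.writable() => { /* write from the queue */ }
            Some(cfg) = reconfigure_rx.recv() => apply_config(cfg),
            // inactivity timeout: nothing happened for 60 seconds since the last event
            _ = tokio::time::sleep(std::time::Duration::from_secs(60)) => break,
        }
    }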


Don't you need some kind of way of telling the compiler you would like barriers here? I think otherwise the helper thread could run on another cpu and the two cpus would operate on their own cached copies of foo. But then again I'm not 100% on how that works.


There are barriers implied by join. But without barriers, the risk is compiler reordering, values being lifted into registers, or thread scheduling. The CPU cache would not be the direct cause of any “stale” reads. https://news.ycombinator.com/item?id=36333034


Well I knew there were possible issues both from the compiler and the cpu. It seems you are right that the cache is kept coherent, however there is another issue owing to out-of-order execution of cpu instructions. Either way, gpderreta is probably right that thread.join tells the compiler to make sure it's all taken care of correctly.


No. All synchronization edges are implied by the thread creation and join. Same as for the async/await example.


There is an implicit mutex/barrier/synchronization in the join.


You definitely need a mutex here (or use atomics), otherwise you have a race condition


Where exactly? Can you point me to the data race? Consider that the thread constructor call happens-before the thread start and the thread termination happens-before the join call returns.


Ah sorry, I missed that you only spawn a single thread. Mea culpa!


> process forking (depending on runtime) to get actual parallel programming

If you get into a time machine back to the 1980s, then yes.


Web Workers are not parallel - they are only concurrent to the main context: think OS threads. Async JS is akin to using very lightweight simulated threads.

You will not necessarily utilize more CPU cores by spawning additional Web Workers because they are not inherently parallel. The actual performance of Web Workers depends on how your browser and OS schedule threads.

They are OS threads despite the mountain of misinformation on the Internet about them implying that they are truly parallel. They are not.


I thought the primary purpose of web workers was that the browser can run the workers in parallel to the main thread. As the spec says:

> [Web workers] allow long tasks to be executed without yielding to keep the page responsive


The workers don't block painting and they do not run in a separate process. That's why it's concurrent but not parallel. The web worker does work whenever the main thread is not painting and there is a free time slot. The browser is not painting all the time.

You don't get extra calculation performance with web workers. You just create the illusion of a smooth experience because you don't block painting. It does not complete faster.


Threads can certainly run in parallel with one another if the OS schedules them on different cores. I did a quick experiment and the main thread and worker threads run in parallel.

https://github.com/jschaf/web-worker-experiment/

> You don't get extra calculation performance with web workers

The primary purpose of web workers is extra calculation performance. From MDN:

> Workers are mainly useful for allowing your code to perform processor-intensive calculations without blocking the user interface thread


I should clarify that you can't get extra calculation performance which easily scales with core count due to the gotchas around threading that you mentioned.


Asynchronous programming is a great fit for IO-driven programs, because modern IO is inherently asynchronous. This is clearly true for networking, but even for disk IO, generally commands are sent to the disks and results come back later. Another thing that’s asynchronous is user input, and that’s why JS has it.

As for threading vs. explicit yielding (e.g. coroutines), I’d say it’s a matter of taste. I generally prefer to see where code is going to yield. Something like gevent can make control flow confusing, since it’s unclear what will yield, and you need to implement explicit yielding for CPU-bound tasks anyway. Its green threads are based on greenlet, which are cooperative coroutines.

Cooperative multitasking was a big problem in operating systems, where you can’t tell whether other processes are looking for CPU time or not. But within your own code, you can control it however you want!


> Asynchronous programming is a great fit for IO-driven programs

Yeah, but this could already be solved without "async/await compiler magic" in native code just with OS primitives, for instance Windows-style event objects. It might look like this in pseudo-code:

    const event1 = read_async(...);
    const event2 = read_async(...);
    const event3 = read_async(...);
    wait_all(event1, event2, event3);
This would run three IO operations "in parallel", and you're waiting for all three to finish before execution continues after the wait_all() call.

Looks just as convenient as async/await style, but doesn't need special language features or a tricky code-transformation pass in the compiler which turns sequential code into a switch-case state machine (and most importantly, it doesn't have the 'function-color problem').

(this really makes me wonder why Rust has gone down the Javascript-style async/await route with function coloring - the only reason why it remotely makes sense is that it also works in WASM).


> this really makes me wonder why Rust has gone down the Javascript-style async/await route with function coloring - the only reason why it remotely makes sense is that it also works in WASM

As someone who’s done asynchronous programming in Rust before Futures (I’ll call it C style), then with Futures, then with async/await, it’s because it is far simpler. On top of that it allows for an ecosystem of libraries to be built up around common implementations for problems. Without it, what you end up with is a lot of people solving common state machine problems in ways that have global context or other things going on which make the library unportable and not able to easily be reused in other contexts. With async/await, we actually have multiple runtimes in the wild, and common implementations that work across those different runtimes without any changes needed. So while I’m disappointed that we ended up with function coloring, I have to say that it’s far simpler than anything else I’ve worked with while maintaining zero overhead costs, allowing it to be used in no_std contexts like Operating Systems and embedded software.


But the difference is that wait_all() is blocking the thread, right? Or does it keep running the event loop while it's waiting, so callbacks for other events can be processed on the same thread?

If it does the latter, the stack will keep growing with each nested wait call:

main -> runEventLoop -> someCallback -> wait_all -> runEventLoop -> anotherCallback -> wait_all -> ...

The async/await transformation to a state machine avoids this problem.


Yeah it blocks the thread, any other "user work" needs to happen on a different thread. But if you just need multiple non-blocking IO operations run in parallel it's as simple as it gets.

(the operating system's thread scheduler is basically the equivalent to the JS "event loop").


Desktop operating systems all have application event loops that run within a single thread because the OS thread scheduler is not the same thing. If you just want an event loop, trying to use threads instead for everything will often end up in tears due to concurrent data access issues.


An async/await runtime doesn't necessarily need to run everything on the same thread though (that's really just a Javascript runtime restriction), instead it could use a thread pool. In this case, the same concurrent data issues would apply.


Yes, basically the golang model


Use one or more service threads to do most work off the UI thread.


Yes, sure. Operating systems nowadays provide useful thread pool runtimes for this purpose, like Apple’s GCD.

In no way does it mean that you don’t need an event loop because threads exist, as was the contention here.


1. You don't need either thread pools or GCD for this. GCD generally makes things worse.

2. It absolutely does mean you don't need the main thread event loop for non UI-events.


Right, so it's less efficient than async.

Async would let you yield at the gather.


The wait_all() "yields" to the operating system's thread scheduler, which switches to another thread that's ready to run (or if fibers are used instead of threads, a 'fiber scheduler' would switch to a different fiber within the same thread).

Taking into account that an async/await runtime also needs to switch to a different "context", the performance difference really shouldn't be all that big, especially if the task scheduler uses fibers (YMMV of course).


That's not how an operating system models disk access though. You synchronously write to the kernel cache, and the kernel eventually gets those written to disk.

Wanting to do asynchronous I/O to disk is only useful if you're aiming to bypass the cache. In practice it is very hard to reach higher performance when doing that though.


I was referring to the fact that interaction with the disk itself is asynchronous. Indeed, the interface provided by a kernel for files is synchronous, and for most cases, that's what programmers probably want.

But I also think the interest in things like io_uring in Linux reflect that people are open to asynchronous file IO, since the kernel is doing asynchronous work internally. To be honest, I don't know much about io_uring though - I haven't used it for anything serious.

There's no perfect choice (as always) -- After all, for extremely high-performance scenarios, people avoid the async nature of IO entirely, and dedicate a thread to busy-looping and polling for readiness. That's what DPDK does for networking. And I think io_uring and other Linux disk interfaces have options to use polling internally.


Networking and disks are inherently entirely different.

Pretending they're the same under some generic I/O concept only goes so far.


> To me, async is just "cooperative multitasking" with a quick paintjob

It is, and not only to you. It is a way to save a call stack until a runloop calls it back.

But what I can’t agree with is the parallels with OSes. Cooperative MT is only problematic in OS multitasking. When it’s your code there’s no unknown bad actor, and having multiple cooperative (mostly waiting) processes without scheduling them on a thread pool is a useful concept regardless of thread availability.

E.g. when you have to wait on multiple sources, the options you have are:

- serialize

- perform few non-blocking calls and wait for any/all of them to complete

- schedule them as tasks on a thread pool and wait for their completion

Async can do all three, it’s orthogonal. I’d say that just awaiting on PMT task completion is much more convenient than setting up locking primitives. Same for NB polling. A Promise is just an abstraction and all it does is wait for an event to fire on the current thread’s runloop while retaining the comfort of a lexical scope, all with a couple of keywords.


But I'm my own worst enemy, and blocking the event loop is unpleasant even when I do it myself. Once I had to pass compute-intensive tasks (hashing some data, which took long enough to matter) to a thread pool, to not hurt latency for other tasks in the event loop.

> I’d say that just awaiting on PMT task completion is much more convenient that setting up locking primitives.

Don't use the primitives then - write e.g. a parbegin/parallel-map atop thread primitives, and use that.


Async is great for dealing with I/O but it forgets that the CPU is a resource too.


Which is why you queue up CPU intensive tasks to threads where you can time slice, and IO intensive tasks to async/fibers/whatever you wanna call them. It's hard to make this completely seamless. Power to Java for getting the closest.


Async is, in many situations, better than traditional threads.

Threads are a resource hog. They take a lot of system resources, and so you usually want to have as few of them as possible. This is a problem for applications that could, in theory, support thousands of concurrent connections, if not more. With a basic thread-based model, you need 1 thread per connection, and if you have long-lived connections with infrequent traffic, those threads mostly do nothing but consume precious system resources. When you're waiting for data, the thread is blocked and does nothing. With async/await, you can have far fewer threads, maybe even just one, and handle blocking through a system call that wakes a thread up whenever any one of the currently blocked tasks is ready to progress. In languages with much lighter thread alternatives, such as Go's goroutines or Erlang's Beam processes, this problem basically doesn't exist, and so those languages don't need async/await at all.


> Threads are a resource hog.

Not really on any decent operating system, but if they are too heavy, there's still fibers aka green-threads aka stack-switching (which at least on Windows are an operating system primitive - but can be implemented in user code on any system that gives you direct access to the CPU stack and registers).

I doubt that the async-await state machine code transformation which 'slices' sequential function bodies into many small parts which are then jumped in and out frequently is any better for performance than stack-switching (in async/await you still need to switch a 'context pointer' on slice-entry/exit instead of the stack pointer).

One obvious advantage of the state-machine code transformation is that it also works in very limited single-threaded runtime environments without access to the callstack (like WASM).

In any case, from the user perspective, async/await should just be language syntax sugar, how it is implemented under the hood ideally shouldn't matter (e.g. it should also be possible to implement it on top of a task scheduler that runs on fibers or threads instead of a state-machine code transformation).


The async/await model gives you exactly one guarantee: because the yield continuation is second class, at most one stack frame can be suspended, so the amount of space that needs to be reserved for a task is bounded and potentially can be computed statically. This can be important for very high performance/very high concurrency programs, so I think the upsides can be more than the downsides in something like Rust, C++ [1], and possibly C#. I still do not understand why async was deemed appropriate, for example, in python.

As an aside, there is a lot of confusion in this thread between general async operations and async/await.

[1] but of course C++ screwed it up by requiring hard to remove allocations.


> I still do not understand why async was deemed appropriate, for example, in python.

My best guess is that it's because of implementation limitations in CPython and likely other interpreters. Stackless Python is a fork of CPython with real coroutines/fibers/green threads but apparently they didn't want to merge that patch. Very disappointing, because async/await is a nearly useless substitute for my desired use case (embedded scripting languages with pauseable scripts).


There is also gevent, which is a library-only coroutine extension that didn't require any changes to the interpreter itself. I'm also sure it would be easier to maintain and evolve if it were part of Python core.


> I still do not understand why async was deemed appropriate, for example, in python.

Because queues backed by thread/process pools for serving web requests have sharp edges.


Userspace fibers (no clue about Windows fibers) still have the blocking IO problem. If your fiber calls read() but there's no data and read blocks for a few minutes, until the next message is received, no other fibers can be scheduled on that thread in the meantime. With async, the task just gets suspended, something like epoll gets called with info about all the suspended tasks, and the thread unblocks once any task can move forward, not necessarily the one that requested the read. This problem doesn't exist if your pseudo threads have first-class language and runtime support, see goroutines for example.


If the blocking function were fiber-aware, and yielded execution back to the fiber runtime until the (underlying) async operation has completed, it would "just work". One could most likely write their own wrapper functions which use the Windows "overlapped I/O" functions (those just have a completion callback if I remember right - PS or maybe a completion Event?)

Not possible with the C stdlib IO functions though (that's why it would be nice to have optional async IO functions with completion callback in the C stdlib)

PS: just calling a blocking read in async/await code would have the same effect though, you need an "async/await aware" version of read()


if your async task performs a raw read it also will block. In the coroutine case you of course need to call a read wrapper that allows for user mode scheduling. That can literally be the same function you use for async. Coroutines also allow library interposition tricks that transparently swap a blocking read with one that returns control to the event loop, so in principle existing blocking code need not change. Libpth did something like that for example. YMMV.


> Threads are a resource hog.

I agree. They are if we spawn OS threads every time we need a thread. The equivalent in async would be spawning the entire overhead of the event loop every time we need concurrency.

Obviously, we don't do that.

WorkerPools don't need respawning. Greenlets don't need respawning. Virtual Threads handled by the runtime don't need respawning.


Isn't that problem generally easily solved with a thread pool? (that's what nginx does, I believe)


There are use cases where a thread pool doesn't solve your problem. If you're handling a few short-lived connections at a time, it's more than enough, but if you're developing something like a push / messaging / queuing service, with thousands of clients connected for hours at a time and receiving very little data once every few minutes, a thread pool won't help you.


This is a solved problem.

I can run millions of goroutines on a laptop. These get mapped to a relatively small number (number of available CPU cores with default settings) by the runtime.


> a thread pool won't help you.

What do you see as the main limitations of spawning 2048 threads in a pool in this scenario?


2048 threads would be fine, but they are talking about 10s of thousands of clients.


> if you have long-lived connections with infrequent traffic

This is an interesting case. Is it difficult to recover state in the case of an error in such a connection? If not, then you could just use that ability. If so, that seems fragile.

Also, this doesn't sound like an inherent limitation of the design approach. Couldn't the linux kernel just improve the performance of that case?

> those threads mostly do nothing but consume precious system resources.

You mean a small amount of virtual memory?


One advantage of async/await is that it's easier to cancel things. For example, this leads to the design pattern where you have multiple futures and you want to select the one that finishes first and cancel the rest.

In regular threaded programming, cancellation is a bit more painful as you need to have some type of cancellation token used each time the thread waits for something. This a) is more verbose and b) can lead to bugs where you forget to implement the cancellation logic.
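
A small sketch of that pattern (assuming tokio): whichever future loses the race is simply dropped, i.e. cancelled, with no token plumbing:

    use std::time::Duration;
    use tokio::time::sleep;

    #[tokio::main]
    async fn main() {
        tokio::select! {
            _ = sleep(Duration::from_millis(10)) => println!("fast branch won"),
            _ = sleep(Duration::from_secs(10)) => println!("slow branch won"),
        }
        // the losing future was dropped (cancelled) when select! returned
    }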


> In regular threaded programming, cancellation is a bit more painful

No, it isn't. Nothing is stopping your threading library from implementing the same thing. It just turns out it is a bad idea to kill threads at random points in time because they may own things like locks. Or, in the case of async/await, to cancel in the middle of something that is thought to be atomic.


You're describing exactly why it's painful for threads. Cancelling co-routines only at yield points (which, unless you do dark magic, is the only time you can cancel them) is always safe.

A co-routine that yields in the middle of an atomic operation is an oxymoron. Anything can happen before you're scheduled again.


Well each time you use the await keyword you are saying it's a safe point to exit, which is more predictable than killing at random points. Holding locks across await points is an anti-pattern, and Rust at least can give a hint if you try to do that. Async/await implementations will also generally allow you to run cleanup code on cancellation (but the exact mechanism depends on the language).

In the end, it's about expressing a state machine in a more concise, implicit way, which is a suitable level of abstraction for most use cases.


> One advantage of async/await is that it's easier to cancel things. For example, this leads to the design pattern where you have multiple futures and you want to select the one that finishes first and cancel the rest.

> In regular threaded programming, cancellation is a bit more painful as you need to have some type of cancellation token used each time the thread waits for something. This a) is more verbose and b) can lead to bugs where you forget to implement the cancellation logic.

Yeah of course Rust just makes cancellation so easy by allowing the Futures to be dropped. What about the resources these Futures could have allocated within the context that are not just memory? You are saying it as if async/await somehow solved the whole problem of stack unwinding.


Since rust follows RAII, any and all resources allocated in the context should be deallocated when their destructor (the drop trait) is called. The unfortunate exception to this are resources which require an async call to deallocate properly. Though this can be worked around and there is work being done to fix this properly.
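
A tiny sketch of the RAII point (`some_io` here is just a stand-in for a real async call): the guard's destructor runs whether the future completes normally or is dropped mid-await:

    struct Guard;

    impl Drop for Guard {
        fn drop(&mut self) {
            // runs on normal completion *and* when the future is cancelled (dropped)
            println!("resource released");
        }
    }

    async fn some_io() { /* stand-in for a real async I/O call */ }

    async fn task() {
        let _guard = Guard;
        some_io().await; // if the task is dropped while suspended here,
                         // _guard is still dropped and the resource is released
    }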


> Since rust follows RAII, any and all resources allocated in the context should be deallocated when their destructor (the drop trait) is called. The unfortunate exception to this are resources which require an async call to deallocate properly. Though this can be worked around and there is work being done to fix this properly.

Yeah as I said async does not, in fact, "provide easily cancellable execution patterns".


Only if the resources are just used within this task. If an async function at some point in time generates another task (or even spawns a thread!) that cannot be synchronously cancelled, then it might outlive the destructor and thereby the async task. It's therefore nowhere near guaranteed that every Future can just be dropped to stop an action in a side-effect-free fashion.


I think the comment was about async in general, not just Rust (although that's the topic of OP).

In Python, cancellation causes an exception to be injected at the await site, which allows it to clean up whatever resources it likes (even if that means making other async calls). If you use Trio or the new TaskGroup in asyncio (inspired by Trio) then an exception leaking out of one task causes the others to be cancelled, and then the task group waits for all tasks to complete (successfully, with exception, or cancelled). It's extremely nice and easy to write reliable programs.

In principle, I think many of these ideas could be applied to threaded IO. But I haven't seen it done in practice.


> In principle, I think many of these ideas could be applied to threaded IO.

That's how POSIX deferred [1] cancellation works. An uncatchable exception is thrown from blocking calls if a thread is requested to terminate. As POSIX is C-centered, you can imagine that handling exceptions was never popular, but it should work fine in C++. For some reason it wasn't added to std::thread though.

[1] there is also async cancellation, but friends do not let friends use PTHREAD_CANCEL_ASYNCHRONOUS.


> In Python, cancellation causes an exception to be injected at the await site, which allows it to clean up whatever resources it likes (even if that means making other async calls). If you use Trio or the new TaskGroup in asyncio (inspired by Trio) then an exception leaking out of one task causes the others to be cancelled, and then the task group waits for all tasks to complete (successfully, with exception, or cancelled). It's extremely nice and easy to write reliable programs.

But not in JavaScript.


Fundamentally, async/await and threads are different tools. Async/await is "in vogue" at the moment, but there are still real advantages in certain scenarios.

For example, the blog author's project is an OS running on minimal resources that would not be appropriate for any threading model I'm aware of.

  This is a wee operating system written to support the async style of programming in Rust on microcontrollers. It fits in about 2 kiB of Flash and uses about 20 bytes of RAM (before your tasks are added). In that space, you get a full async runtime with multiple tasks, support for complex concurrency via join and select, and a lot of convenient but simple APIs.
Rust actually used to have green threads before 1.0. You can read about the proposal and reasoning for its removal here https://github.com/rust-lang/rfcs/blob/master/text/0230-remo....

If you'd like more info on the story around the adoption of async/await in rust you can see this excellent talk by Steve K. https://www.infoq.com/presentations/rust-2019/.


> it started because JS can't do parallel any other way

I remember doing async in C++ with CORBA and ACE's Reactor pattern about 25 years ago, it was not beautiful nor easy.

But if memory serves, most interpreted server-side languages used for web programming in the late 2000's didn't have mature multithreading or async capabilities and most of the deployments consisted of exec'd/forked full application servers. I would also bet that this is exactly what made Node.js popular.

To each its own, async is just another tool in the proverbial belt.


async/await lets you do concurrency without the need for explicit synchronization of shared data structures.

E.g. I can do:

    loop {
      select! {
         _ = src_channel.readable() => src_channel.read(&mut buffer),
         _ = dst_channel.writable() => dst_channel.write(&mut buffer),
      }
    }
without any mutex guarding the buffer, even though the reads and writes happen concurrently and share the same mutable buffer. This is possible because with async/await the concurrency is cooperative, the code precisely controls where context switches can happen (in this case this is the select! waiting for event), and the compiler can see that even though the code as a whole is concurrent, the branches of select do not run at the same time in parallel.

This is not possible to achieve with threads directly. If using blocking I/O + threads model, then you'd need to dedicate one thread for reading and one for writing and then synchronize access to the shared data structure (where using a queue/channel also counts as synchronization). Which obviously would be much harder to get right.


> Which obviously would be much harder to get right.

Not obvious to me I'm afraid. Using CSP, this is almost trivially easy. All access to the data goes through a guardian thread. Accessing the resource is just sending a message.

And mutexes are not hard either.


+1. This approach of having guardian threads/actors etc. communicating by messages is a natural fit for Rust, too, because it nicely deals with a whole pile of borrow checker & lifecycle issues.

crossbeam_channel FTW


This is great until someone calls your code from a multi-threaded task runner and you've just gone and added consistency problems and race conditions to your code.


But this approach already works with a multithreaded runner and there are no data races or consistency problems. They would be caught by the Rust borrow checker anyway.


SCHED_FIFO


The biggest benefit of async/await is, imo, in GUI programming where you simply can't have blocking code a lot of the time, and moving everything between the GUI thread and the backend thread can be rather costly and annoying, as well as buggy if people are not 100% aware which thread accesses what.

I find it rather odd that you say "easier to reason about". I find it much harder to keep a mental model in my head of which thread currently does what, compared to async/await code, which you can write like synchronous code. You generally don't have to be that hyper-aware.


> in GUI programming where you simply can't have blocking code a lot of the time,

This can be transparently solved by the GUI library. The main thread does a loop and polls events from a queue. Those events are generated by the GUI on another thread. It can be designed so the GUI itself can be manipulated on the main thread, and the event handling and rendering is double buffered, or synchronized for you.

> easier to reason about

The style of code OP is describing looks like `if then else`. You can reason about the state of the system using traditional programming logic.


What GUI system are you referring to? As far as I know, pretty much all major UI frameworks use a single thread, or at best a gui thread and a render thread.


I believe SDL works how I described. But yes, I was trying to describe a gui/render thread split.


Pulling out the render thread just helps offload the GPU calls, no? The GUI thread still has the usual single thread concurrency issues most programmers deal with.


There's a role for async, and it's when you're very I/O bound, you have one thread, and async means you don't need locking. This is simple to think about. That's the classic JavaScript model.

If you have compute-bound components, things get more complicated. If you have threads, locks, and async, all in one program, things get much more complicated. I'm not sure that's a win. At some point, it's easier to use something like Go's green-thread "goroutines", which try not to block, but can block if they have to.


> when you're very I/O bound, you have one thread

Yes, but this is very unusual. In a web server, you have a pool of threads that can respond to incoming connections. When one is blocking, another is ready to go. All the transitions are handled transparently by the kernel.


In node.js, each process used to be single-threaded. There's now a hokey threading model with limited shared memory areas.[1] Annoyingly, this is also Android's threading model.

[1] https://nodejs.org/api/worker_threads.html


The tradition of async programming goes back much further than JS. Doing async I/O — usually referred to as event-driven programming — has been a popular technique in C and C++ for decades, with epoll(), kqueue, libevent/libev/libuv, Boost Asio, ASE, and so on. A lot of modern C software is built on async I/O, notably projects like Nginx, Memcached, Tor, Chrome, ntp, Redis, etc.


> has been a popular technique in C and C++ for decades, with epoll(), kqueue, libevent/libev/libuv, Boost Asio, ASE, and so on.

JavaScript is older than all of those. (Libuv in particular was harvested from Node, which was itself built on top of JavaScript.)

Obviously JS didn't invent callback-based async IO, but I think you're forgetting how old JS is and how relatively new that style of IO is.


This thread is about async/await, which was added to JavaScript in 2017. Before then, async programming was only done with Node.js (unless you consider window.setTimeout() to be "event-driven programming"), which came out in 2009. Event-driven programming was an established paradigm years before then.


> This thread is about async/await, which was added to JavaScript in 2017.

The article is about async/await, but the comments I'm replying to seem to be about asynchronous programming in general. JS was doing async for many years before async/await was added.


Nobody was doing async JS before Node, which came long after async I/O was an established paradigm.


XMLHttpRequest shipped in 1999. AJAX was coined in 2005. The "A" in "AJAX" is for "asynchronous".


XMLHttpRequest may be async, but it's not an asynchronous programming model.


The alternative with threads on an IO-bound server eventually cycles back to async/await but with extra steps. You write synchronous request handlers until you notice IO-waits wasting all your thread time, add more threads to the pool, start hitting overhead from that, then implement greenthreads.

NodeJS did it right, and it's hard to call a 14-year-old technology a fad.


> add more threads to the pool, start hitting overhead from that

I would like to know more detail about this claim.

The scenario you are describing is one where 64-128 OS threads are fully blocked waiting for IO. If that's the case, is it likely that you will have additional unused IO resources that could be being utilized?

Also, what overhead do you see as the main limit on spawning a lot of threads? Is that the CPU time of context switching? If so, in this scenario CPU is not the bottleneck, and switching between processes will be nothing, especially with a 32-64 core CPU.

This is a genuine question as I have never worked on an application that got close to maxing out either approach.


> The scenario you are describing is one where 64-128 OS threads are fully blocked waiting for IO. If that's the case, is it likely that you will have additional unused IO resources that could be being utilized?

One likely scenario is that you've issued 128 RPCs to some other services and are waiting to hear back. Even if each RPC is, say, on a separate TCP connection, your network stack can handle plenty more.

> Also, what overhead do you see as the main limit on spawning a lot of threads? Is that the CPU time of context switching? If so, in this scenario CPU is not the bottleneck, and switching between processes will be nothing, especially with a 32-64 core CPU.

I don't remember what specific feature of OS threads contributes the most to overhead, and maybe someone else can answer this better. But context-switching burdens both the CPU and RAM (due to saved stacks).

> This is a genuine question

I always assume this anyway. Maybe not on Reddit ;)


Thanks for the reply. I am still having a hard time seeing why "turning up the number of threads" doesn't solve this. Maybe for languages with JIT runtimes where each process occupies a larger piece of memory, that could be a problem. But then I see virtual memory coming in, because as you say, most of those processes are doing nothing.

I think I'm going to do some research and see what benchmarks/measurements I can find.


Can't explain exactly why, but the overhead of one OS thread is much greater than the overhead of one routine in the event loop, and it's worth researching if you're interested. It's also worth looking into how kernel IO resources would deal with 3000 threads making calls all at once; like, the network stack has a queue. A while ago I ran a test of how many min-size UDP packets I could send per second with a multithreaded process.

Also can say at least that paging to/from disk or using memory compression would dominate all other overheads, and it's not something to rely on here.


Maybe I’m wrong but the “asynchronous programming started with JavaScript” take doesn’t seem factual.


Correct, I believe it originated with Microsoft via .NET in the mid-2000s and was picked up by the JS ecosystem much later. The fact that Microsoft had a hand in async/await’s emergence may influence how people feel about it.


> The fact that Microsoft had a hand in async/await’s emergence may influence how people feel about it.

I can see this.


This whole thread reads like backlash to the async hype, not so much the merits of the paradigm itself.


Yep


It began in C# in 2012 - 5 years before JavaScript. And C# does threads. And it made its way to JavaScript via TypeScript (whose creator also created C#).


It began in the 1970s with programming languages like Concurrent Pascal.

https://en.wikipedia.org/wiki/Concurrent_Pascal

Simula,

https://en.wikipedia.org/wiki/Simula

And plenty of other ones,

https://en.wikipedia.org/wiki/Coroutine#Implementations


Yes, I remember my mind being blown when I finally understood what this new async/await thing was doing in C#. It was nice to observe other languages adopt it with basically the same syntax (JavaScript, Python, Hack, then Rust with slightly different syntax). I had some satisfaction that my language got it first.

C# really helped to popularize async/await as a language feature, even though F# had something similar first.


Maybe I'm misunderstanding what you mean by "It began in C#", but F# introduced async about five years before C# 5 was released.


async in F# is not a language feature, it’s a library that leverages F# computation expressions (monads).

It’s also possible to do async-like behaviour - without the async/await language feature - in C# using LINQ; so you could argue C# has had the capability (like F#) since LINQ was released.

But, I believe C# was the first mainstream language to implement the async/await method-splitting coroutine state machine (as a language feature)


Async is a special case of continuations. If you have first-class continuations (and monadic do-notation in practice gives you that), you hardly need async as a language feature.


That's exactly my point. It's not a language feature, it's a library. Haskell, F#, and any other language that supports monads (or as you say, first class continuations), have the ability to do async/await - in a way that appears first-class - but actually is just regular code.

C#, and other languages that have taken the C# approach [to async/await], don't have first class continuations (well, C# does with LINQ, but that compromises most ways the average OO dev works). They implement async/await with first-class keywords that indicate where to slice a method in two.

In my language-ext [1] project I have added the LINQ operators to `Task<T>` which allows C# tasks to be used in the same way that Async is done in F#.

[1] https://github.com/louthy/language-ext/blob/main/LanguageExt...


Do-notation is a language feature though, and that's a superset of async/await.

Incidentally there are papers trying to improve on async in C++ by sneaking in generalized do-notation.


I know it influenced the async await design, but I didn't think it was quite the same. Will go do some reading!


Could be wrong about C#, but C# and F# come from the same parent.


I think that's spot on.

A more mature model of that is message passing, like in Erlang or, in a bit more bastardized version, in Golang. And it works there because you can write "normal" code with no colored functions and other baggage that JS-like async brings with it. And it works beautifully on multi-core machines.

async is just a strict, shitty subset of that, where you're limited either by a bad implementation (any single-process scripting language) or by language limitations (no GC in Rust would make Erlang/Go-like message passing much harder).


I feel like there are 2 mostly independent angles to this:

Some believe that async makes the code easier to reason about, especially in cases where num_threads << num_sessions. This is a very subjective engineering conversation. You could certainly make strict threads work in any situation with some DIY, but I would be near the front of the line to make an argument to at least start with async/await if there is a paying customer involved. I think handling I/O becomes a joy when you have these abstractions at your disposal.

The other angle is performance. If you are reaching for async programming out of the gate because you want to go fast, you are making an epic mistake. Unnecessary context switching between threads will chop multiple orders of magnitude off the single-threaded best case. Unless you are 900% sure that the cost of communicating between threads is worth the squeeze, you should stick with a single-threaded paradigm (or use async/await responsibly).

Now, you may decide that losing performance in order to leverage more "sugary" programming primitives is worthwhile. We certainly make that decision many times over - interpreted languages, GC, etc. While I could implement our webapp using a socket select server and still easily meet our performance objectives, the complexity of managing this is not worth it. Async/await would still kick my ass in terms of performance because the runtime has been so carefully tuned around it. I can beat it in latency terms, but only for a trivial # of clients.


async/await is about overlapping useful CPU computations with slow I/O operations.

For example, you would read data fragment D0 from a network socket, and right before doing any work on it, you ask the OS to fetch data fragment D1, etc. This is faster than reading and working in distinct time intervals. And despite whatever you seem to believe, it would also be faster than reading and working using thread parallelism. Because even if you eliminate the synchronization overhead or you devise a good lock-free algorithm, thread context switches still have a massive overhead, not to mention issues related to memory bandwidth and cache coherence.
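
A rough sketch of that pipeline (assuming a multi-threaded tokio runtime; `read_fragment` and `process` are hypothetical stand-ins, and a real version would stop on EOF):

    // hypothetical stand-ins for the real socket read and the CPU-bound work
    async fn read_fragment(_i: u64) -> Vec<u8> { Vec::new() }
    fn process(_data: &[u8]) { /* CPU-bound work */ }

    #[tokio::main]
    async fn main() {
        let mut next = tokio::spawn(read_fragment(0));
        for i in 0u64.. {
            let current = next.await.unwrap();         // fragment i has arrived
            next = tokio::spawn(read_fragment(i + 1)); // kick off the fetch of i + 1
            process(&current);                         // CPU work overlaps the next read
        }
    }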

Still, how does one gather the nerve to call a feature present in most modern languages, from C# to Zig, hype?


I should add that the concurrent algorithm I described for processing data from a socket, i.e. asynchronously read data fragment (i + 1) then do work on data fragment i, would only be optimal if the throughput of the work you do on the data is higher than the read throughput.

Even if the above condition doesn't hold, you would have to be very careful to do better with threads.

Is it just an arbitrary design decision that NodeJS is single-threaded?


Really weird to throw in JS as the culprit, this is a long-standing issue; the asynchronous nature of web work probably highlights it, but dealing with asynchronous tasks is part and parcel of writing complex performant applications. Much may be hidden by modern dev environments, but you won't get far beyond the most simple apps before you need to start thinking about how to deal with it.


The point is that other solutions were able to use traditional threads or green threads (via stack switching) to solve the same problem without the compiler having to transform sequential code into a state machine under the hood (since pre-emptive threads can be suspended and continued at any time, and green threads at specific 'yield-points').

Javascript is limited by the single-threaded browser runtime, and the way that WebWorkers were bolted on later didn't help much because WebWorkers with message passing are too inflexible to implement even green-thread-style task switching.

The state-machine code-transformation was indeed the only way out of this dilemma.

Async/await is essentially high-level language syntax sugar and it shouldn't matter whether it is implemented via code-transformation, or with green-threads or real threads, or a combination of all those under the hood (but in reality it matters because there's a difference between 'code slices' being scheduled on the same thread or different threads when dealing with shared resources).


Especially weird because the feature actually originates from C# and was only copied by JS later.

And in C#, you can absolutely combine it with Task.Run if you want it to run concurrently. This feature made asynchronous operations so much better than all the patterns that came before it - and I think that goes for every language that ended up copying it.


Your rant reminded me of this classic post, which I believe shares your views but from different reasoning. From the discussion, you may be interested in looking at zig[0].

https://journal.stuffwithstuff.com/2015/02/01/what-color-is-...

https://news.ycombinator.com/item?id=36597229 (fresh repost)

Discussed previously:

  https://news.ycombinator.com/item?id=8984648 (8ya)
  https://news.ycombinator.com/item?id=16732948 (5ya)
  https://news.ycombinator.com/item?id=23218782 (3ya)
  https://news.ycombinator.com/item?id=28657358 (2ya)

[0]: https://kristoff.it/blog/zig-colorblind-async-await/


[Somewhat offtopic] Speaking of Zig, I understand concurrent programming is undergoing a rethink of some sort as async has been temporarily removed? I am only following cutting-edge Zig peripherally so I am probably wrong here. Does anyone know what Zig’s future concurrency story is?


Stackless coroutines are pretty useful to model state machines. I particularly like generators (yield) to model lazy evaluation and iterators.

I do agree however that async as a concurrency approach sucks.


In C, I have written and worked on a lot of code that is event-based. Async is a natural extension to that.

So, no it is not a hype. These types of techniques have been used for a very long time.


Allow me to clarify:

Event Loops are an old technique and not a hype. I am aware of that. One of my earliest C experiments as a kid was writing a snake-implementation for the terminal. I still have that code, and it uses an event loop to process the input.

The hype I am talking about, is an assertion currently "en vogue", that asynchronous should be the default method of doing concurrency.

This hype, imho, started with the ubiquitous use of JS as a backend implementation language. JS couldn't work another way, so this is what all these JS programmers used, and JS is super popular. So, it must be cool and great. And that's how a workaround for a language limitation became a popular paradigm.


> Code written using threads is, at least to me, much more readable and easier to reason about.

It's “easier” because it lies to you and makes you assume that everything is sequential, but it's not, and sometime that “everything is sequential” abstraction is leaky, and you can't really see what's going on without diving to the bottom of every functions.

I've been accustomed so much to the transparency of async/await, that I now wish we had the same kind of thing for functions using blocking syscalls (for instance you could annotate the function with the `blocking` keyword and need to use `block` to call it) so you know you need to spawn a new thread if you don't want to wait until the completion of some I/O-bound function.


> “everything is sequential” abstraction is leaky,

Can you explain what details leak?

The sequential model was developed for programming because that's a natural way to reason about processes. `if then else`. `do this, then do that`. The "async/await" designers seem to agree, as they attempt to tame async by emulating this behavior.

Note that to do anything other than sequential is extremely complicated to reason about, not because of computers, but because of logic/math. All sorts of concerns like: race conditions, synchronization, dead lock, etc are inherent.

Any approach that does not directly address these issues is the one that's creating a leaky abstraction.

> so you know you need to spawn a new thread if you don't want to wait until the completion.

All functions take "blocking time" to execute. It's a spectrum of how long you want to wait.


> Can you explain what details leak?

You answer half of it a few lines later:

> All sorts of concerns like: race conditions, synchronization, dead lock, etc are inherent. > Any approach that does not directly address these issues is the one that's creating a leaky abstraction.

By writing `await` you're telling your reviewers, coworkers and even your future self that your program stops executing sequentially at this step, and that other concurrent tasks can do things in the meantime. When using blocking code, the same thing can happen, but this is hidden from you.

But in my perspective as a back-end engineer, the biggest issue is related to latency: with annotations you know (and tell others: code is written once but read many times) what takes significant time, with threads and hidden yield points you don't. It looks sequential, but the latency is an observable behavior that shows it's not: the definition of a leaky abstraction.

> All functions take "blocking time" to execute. It's a spectrum of how long you want to wait

That's technically correct, but keep in mind that the magnitude difference between your typical REST API call and a CPU instruction is roughly the same as the difference between the size of a football field and the distance to the Sun…


> When using blocking code, the same thing can happen, but this is hidden from you.

I'm not convinced. I don't see how it is hidden from you. The blocking code works exactly as written. The mistaken assumption would be that your program will never be preempted or have to wait for resources.

> the magnitude difference

Rarely is a function call in C a single instruction. They are typically algorithms of non-constant complexity. So yes... but you're also picking the most extreme comparison. What about `fopen` vs `partial_sort`?


> It's “easier” because it lies to you

No, it really doesn't. If I make the wrong assumption that "everything is sequential", then it's not the paradigm lying to me.

Pretty much the first thing I learned about threads: I have to assume that my code can, and will, be interrupted at arbitrary points.

As long as I keep that in mind, there is very little that can surprise me. Because the other side of that coin reads: "Unless it branches into more threads of execution, each block itself will run sequentially, no matter what", which makes it very easy to reason about each block.

The rest is a matter of synchronizing these interruptions to a useful outcome, which, as stated elsewhere, CSP and modern languages integrating the primitives for that natively, make really easy.


I strongly disagree. Async/Await is one of the nicest, cleanest ways to deal with asynchronous tasks, and asynchronous tasks are everywhere. Loading data from disk without blocking and doing something once this is done -> async. Memcpying data from CPU to GPU without blocking -> async. Memcpying data from GPU back to CPU without blocking -> async.

Sure, there are other ways to handle these things like polling state, but async makes it trivial and readable.

But this mostly applies to JS which has a kickass implementation of async/await. Whatever C++ tried to do, it's an awful mess so there I still use threads with busy loops, polling, etc., whatever makes sense for the task at hand.


> But this mostly applies to JS which has a kickass implementation of async/await.

The fact that JS simply has no other options for doing anything concurrently, might have something to do with that.


Huh? It had other options longer than it had async/await. Async/await is a fairly recent addition. E.g. before fetch with async/await, there was XmlHTTPRequest with callbacks. It also had Web Workers as a means for parallel&concurrent processing for way longer than it had async/await.


This is an occurrence of the co-Blub paradox https://reasonablypolymorphic.com/blog/coblub/index.html


> And it’s not hard to see why; while humans have dedicated neural circuitry for natural language, it would be absurd to suggest there is dedicated neural circuitry for fiddling around with the semantics of pushing around arcane symbol abstractly encoded as electrical potentials over a conductive metal.

Huh? It's certainly the case that people who program have dedicated neural circuitry.

Most of the work is not being done by language centres[0]

[0] https://hub.jhu.edu/2020/12/17/brain-activity-while-reading-...


> We left behind that paradigm in Operating Systems decades ago, and with good reason.

I'm curious, what reason?

I grew up on Python and C#, and only know async/await, never done real threading (C# async is threading and coroutines under the hood, Python is just coroutines, single-threaded). I find that way of writing code very elegant, as one can encode points of blocking/switching explicitly. A bit like encoding logic into the type system (cliffle has an article on the type state pattern, a good read!): the underlying async implementation can change without code adjustments.


> I'm curious, what reason?

In days gone by, processes that got to run on the single core that contemporary CPUs had available had to actively relinquish control of the core back to the kernel.

If a single process refused to do so, e.g. because the program hung, there was nothing the kernel could do about it, and the entire OS was blocked. The scheduler never ran, no other process would get CPU time, the whole thing was dead in the water, and all you could do was kick the "Reset" button (if present) or pull the power cord and reboot.

Obviously, this is a very bad situation for an OS, which runs many processes from many sources. And because of that, we ditched this system, and went on to preemptive multitasking, where control is relinquished back to the scheduler after a time whether the process is okay with that or not.

Async basically re-invented that system in userspace. We have an event loop, and we have processes that actively yield control to it. What happens if a subroutine refuses to do so? There is nothing the event loop can do about that.

And it's really easy for this to happen. All it needs is a single synchronous call, say, to an external datastore, somewhere deep down in the callstack, and the awesome throughput of async goes bye bye.
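
To make that concrete, here's a minimal sketch (assuming a single-threaded Tokio runtime; none of this is from the article) of how one blocking call starves everything else on the loop:

    use std::time::Duration;

    #[tokio::main(flavor = "current_thread")]
    async fn main() {
        // A cooperative "process": prints a tick and yields back to the loop.
        tokio::spawn(async {
            loop {
                println!("tick");
                tokio::time::sleep(Duration::from_millis(100)).await;
            }
        });

        // A synchronous call deep in some callstack: for these two seconds the
        // event loop is starved and the ticker above never runs.
        std::thread::sleep(Duration::from_secs(2));

        // The async-aware equivalent yields to the loop instead of blocking it.
        tokio::time::sleep(Duration::from_secs(2)).await;
    }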


The reason we left that paradigm in _Operating Systems_ is that OS's are supposed to be resilient. A single buggy app could easily freeze/crash the whole Windows 3.1 system, because the system has the naive assumption that all the apps are benevolent, bug-free, and happily co-operate with time-sharing. Try the same in Windows 2000; you can't, because the system is pre-emptive and forcibly ends the time slots of apps that don't yield. (And possibly kills the app; "MyApp isn't responding" etc.)

However, that same reason doesn't apply within a single app, because a single app by a single author _can_ safely co-operate with itself. So co-operative time sharing can work and make sense within single app.


> However, that same reason doesn't apply within a single app, because a single app by a single author _can_ safely co-operate with itself. So co-operative time sharing can work and make sense within single app.

That's not the case in any non-trivial app. Any real program truly has dozens if not hundreds of authors whose code you're using sight unseen, and which may be difficult to modify.

And your program getting stuck can be just as bad as the OS getting stuck. Suddenly your program doesn't reply to API requests, or doesn't relinquish some expensive resource (like an expensive VM), or some such.


For me async/await was a godsend. I've had such bad experiences with multithreading in C++, where I ended up in a Semaphore/Mutex hell in a big project, that async/await is a really welcome compromise between using idle CPU resources and just not doing multiple things at once.

Then there's `#[tokio::main(flavor = "multi_thread", worker_threads = 2)]` which actually lets you use multithreading with async/await which is just a nice feature to have, given that Python can't do this due to the GIL. But asyncio in Python is also so great.


Coming from C, having used threads, and being totally ignorant of JavaScript, this article really helped me grasp async in JS by clearing away implicit misconceptions I held:

https://medium.com/young-coder/5-misconceptions-about-asynch...

And also that talk on the event loop : https://www.youtube.com/watch?v=cCOL7MC4Pl0


Rust does not have a runtime and making the kernel take care of it is not remotely as efficient.


Do you think async/await started with JavaScript? This take is pretty revisionist.


Threads are way slower than an event loop. Having 2-3 threads, each with their own event loop, is massively faster than having every task running on its own thread.


>Not specific to rust, but I think asynchronous programming in general is a hype.

Hmm, wasn't the whole point of doing things in an event loop that it way outperforms a thread-based architecture? Like when Nginx way outperforms Apache? On a single-core CPU of course.

Edit: It is not that an event loop is better than threads in general, just that in a web server scenario it performs much better.


The parent comment mentions green threads. While there is some performance hit to using them, I don’t think it is way less performant. I mean, go was built for being a web backend, and is based on green threads.

For rust specifically, though, green threads/coroutines were discarded because they are not zero-cost.


Until Go 1.14 it was basically the same as any other async/await under the hood. Every function call included an implicit .await - that is, it offered a `yield` to the runtime scheduler. All the I/O was built around non-blocking/polling, etc. Tight loops in Go could potentially screw up your app performance because there were no yields. In 1.14 they introduced some sort of preemption for tight loops too.


Stackful vs stackless, that's the big difference and the point of the original comment. Stackful abstractions are strictly more powerful than stackless ones (Go coroutines subsume async/await, but not vice versa).

You can have good ergonomics and performance with stackful cooperatively scheduled tasks instead of a stackless async/await abstraction.

Async/await makes sense when you have so many tasks that you can't afford to dedicate a full stack to each of them and segmented stacks or heap-allocated frames are not an option (for performance or compatibility).


Coroutines, async, parallelism and concurrency is my main hobby.

Business logic programmers shouldn't be dealing with machine level parallelism and async unless heavily abstracted, such as in a job queue or evented message queue.

JMP or RET is how the machine transfers control flow at the machine level. So coroutines are a natural solution to switching between code at the machine layer.

If you're working in Javascript, then you shouldn't have to worry about this stuff.

Cooperative multitasking is elegant within a process, for scheduling, but not as the main approach for the operating system to switch between processes; it's a subtle difference.

If the operating system depends on cooperative multitasking, some buggy processes can keep control flow to themselves. But using cooperative multitasking inside a process for code elegance, is a good way of scheduling and decoupling concerns.

I have a lightweight thread runtime similar to Go and I find event loops really interesting, I want to make the pain of async and parallelism go away.


> it started because JS can't do parallel any other way

There are two different subcultures pushing for asynchronous. You are correct, one is based on people learning to program in JS and suffering from either Stockholm syndrome or some other problem that blocks them from thinking in sequential IO.

The other grew mostly out of the C10k problem in the late 00s, and is very correct in its assessment that asynchronous code allows for some high-performing architectures that are much better than anything you can get synchronously.

Those two groups are trying to solve different problems, with different architectures, in different parts of the software stack. Rust's async supports both of them, so the discourse is really confusing.


Implementing full-duplex protocols asynchronously is much simpler.


Upvote from me. I couldn't have written it better myself. Async is a bug not a feature. The only problem with threading (aka CSP aka goroutines aka actors) is scalability to very large numbers of threads. imho it's better to focus on solving that problem than on switching to an unworkable alternative concurrent model.


> It didn't start because it is so awesome, it started because JS can't do parallel any other way.

That's just not true. It started because starting a thread per connection isn't scalable at all. Asynchronous programming was in use way before Node.js, even in languages that have proper threads.


Most languages don't have preemptively scheduled green threads like golang.

In Lua and Zig we have "cooperative multitasking" but we get to use the same library for both kinds of applications :)


Look at this program:

    val another_path = await readFile(path);
    val data = await readFile(another_path);
    console.log(data.length);
How would you do that using threads?


For this specific case, just use traditional blocking functions:

    val another_path = readFileSync(path);
    val data = readFileSync(another_path);
    console.log(data.length);
The "runtime" (for instance the OS, or a task system) will take care of scheduling other things that are ready to run while the blocking functions are "stuck". That's exactly what processes and threads had been invented for.


This program has no concurrency so you could do it without threads.


Not in Rust if you use something like Tokio.


I'm pretty sure this program has no concurrency even if you port it to Rust and use Tokio.


This particular one, yes. However, most programs are a bit more complicated than that. That, and async is mostly used for I/O operations. In which case, the value here is not from non-blocking operations but from freeing up the CPU while this operation finishes execution. You can think of that as another form of executing this program on a separate thread (since the CPU will be freed to do other tasks).


Doing a blocking read also frees your CPU to do other tasks!


Exactly like that, but using blocking functions. Then I run the code in a worker pool or in a greenlet runtime. The kernel and runtime take care of putting threads to sleep when they hit blocking IO.


uh, open a thread that does that and `join` on it waiting for it to complete, but why would you need that in your example?
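
Roughly, a sketch of that in Rust (blocking reads on a worker thread, joined by the caller; "path" is just the placeholder from the example above):

    use std::fs;
    use std::thread;

    fn main() -> std::io::Result<()> {
        let handle = thread::spawn(|| -> std::io::Result<usize> {
            // Blocking reads, exactly as in the sync example above.
            let another_path = fs::read_to_string("path")?;
            let data = fs::read(another_path.trim())?;
            Ok(data.len())
        });
        // Wait for the worker to finish, much like awaiting the future.
        println!("{}", handle.join().unwrap()?);
        Ok(())
    }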


Which, ironically, is what ought to have been done in the example using join!(a,b).await.


`epoll` was added to Linux in 2002.


Proper asynchronous programming is not something that's hype. In fact, it's something that is quickly disappearing from most I/O-bound code. Low-level asynchronous I/O (think epoll, kqueue, IOCP) was popularly implemented by many high-performance servers in the early 2000s, often as response to the C10k problems. It's really Nginx, Haproxy, Lighttpd, libevent (and later libev and libuv) which popularized this programming style in systems programming.

It's worth noting that during the 1990s, multi-threaded programming did not completely dominate as the model for network servers. I believe it was mostly because multi-threading support was uneven across the different UNIX flavours of the day, but regardless of the cause, some popular servers (most notably Apache httpd) started out as multi-process based, using a pool of forked processes the same way you'd use a thread pool. Other servers were written using the 1990s incarnation of asynchronous programming, essentially using select() or poll() (or WSAWaitForMultipleEvents on windows). From the programmer's perspective, these act mostly the same as epoll, but are just less efficient.

It was during the time when the C10k problem and its asynchronous solution were experiencing their peak hype cycle that high-level languages got interested in the game, and implemented asynchronous I/O with callbacks. I believe it started with Python and Twisted, but node was the poster child. OS-level threads were either not supported by the language (Node.js) or severely encumbered by having a GIL (Python). Green threads or coroutines would have probably been a better fit for these languages, but if you're just writing a library or a runtime for a language you don't control, that's harder (of course, gevent in Python went and managed to do that anyway).

By the time async/await came to JavaScript, this wasn't part of a hype. JavaScript had already widely adopted callbacks and then promises as a bottom-up, library-oriented solution. Most I/O APIs were promise-based. Even if ES6 had added Go-like coroutines, all the APIs you had were already accepting a callback or returning a promise. You'd still have had to do something like "await(myApi())" every time you called that API, not to mention having to introduce synchronization primitives to the language and watching code break that never had to care about synchronization before.

Async/await by itself is not really asynchronous programming. Behind the scenes, it is implemented asynchronously (just the same as I/O in goroutines is!), but the programmer is writing code that looks linear and synchronous. The real trend nowadays is to eschew synchronous I/O and hide the complexity of asynchronous I/O behind synchronous-looking code. Explicitly asynchronous programming (like callbacks or non-awaitable promises) is just as trendy as Ruby on Rails or flip phones, that is - yeah, sure, it was fairly trendy back in 2005.

Nowadays you've got two popular M:N thread models for running multiple synchronous tasks which perform asynchronous I/O behind the scenes: The green thread model (Go, Java's Virtual Threads) and the state machine transformation model (a.k.a. async/await). If you think the async/await model is inferior to the green thread model used by Go, that's a different story. I think each has its own pros and cons, but claiming that only async/await receives hype is untrue. The green threading model receives a fair share of its own hype ("What color is your function"), and it's usually the proponents of the green threading model who claim that their model is strictly superior while the other model has no merit at all, not the other way around.

If you go back to Rust, Rust definitely tried the green threads model, as many people have already said. It had to abandon it. Go does not aim to be a full-spectrum systems language, and can get along pretty well with being garbage collected and running its own scheduler. Rust has to run in some environments and contexts where you just can't do that. Rust is also very sensitive to overhead introduced by features (that's the entire "zero-cost abstraction" theme), and it does not shy away from adding some complexity in exchange for performance. Otherwise why wouldn't it just do away with lifetimes altogether?

While I concur the callbacks hype in JS was probably misguided (although understandable), I find it hard to believe that the decision to use async/await in Rust was based on hype.


I think this is a really important comment, and until about 6mo ago I would have completely agreed with you. I even made these same arguments with my coworkers; it's just cooperative multithreading, it's making up for a defect in js, just use threading primitives. I think some people might use async in a fad-y way when they don't need to, or don't understand what it really is and think of it as an alternative to multithreading. You've generated a lot of good discussion, but maybe having a specific example of where async/await made writing a multithreaded process easier will help.

What changed my mind was accidentally making a (shitty/incomplete) async system while implementing a program "The Right Way" using threads and synchronization primitives. The program is for controlling an amateur telescope with a lot of equipment that could change states at any moment with a complex set of responses to those changes depending on what exactly the program is trying to accomplish at the time. Oof, that was a confusing sentence. Let's try again; The telescope has equipment like a camera, mount, guide scope, and focuser that all periodically report back to the computer. The camera might say "here's an image" after an exposure is finished, the mount might say "now we're pointing at this celestial coordinate", the focuser might say "the air temperature is now X", and the guide scope might say "We've had an error in tracking". Those pieces of equipment might say those things in response to a command, or on a fixed period, or just because it feels like it.

Controlling a telescope can be described as a set of operations. Some operations are fairly small and well contained, like taking a single long exposure. Some operations are composed of other operations, like taking a sequence of long exposures. Some operations are more like watchdogs that monitor how things are going and issue corrections or modify current operations. When taking a sequence of long exposures the program would need to issue commands to the telescope depending on which of those messages it receives from the telescope or the user; If the tracking error is too high (or the user hits a "cancel" button) we might want to cancel the current exposure. If the air temperature has changed too much we might want to refocus after the currently running exposure is finished. If the telescope moves to a new celestial coordinate we probably want to cancel the exposure sequence entirely. So, how do we manage all that state?

The way I solved it was to make a set of channels to push state changes from the telescope or user. Each active operation would be split into multiple methods for each stage of that operation, and they would return an object that held the current progress and what it needed to wait on before we could move on to the next stage. That next stage would be triggered by a controlling central method that listened for all possible state changes (including user input) and dispatched to the next appropriate method for any of the operations currently running. To make things a little simpler I made a common interface for that object that let the controlling central method know what to wait on and what to call next. This allowed me the most control over how different concurrent operations were running while staying completely thread-safe. It was great, I could even listen to multiple channels at the same time when multiple operations were happening concurrently.

At this point I realized I'd accidentally made an async system. The central controlling method is the async runtime. The common interface is a Future (in rust, or Promise in js, or Task in C#). Splitting an operation into multiple methods that all return a Future is the "await" keyword. Once I accepted my async/await future, operations that were previously split across multiple methods with custom data structures to record all of the intermediate stages evaporated and became much more clear.

I'm still using multiple threads for the problems that benefit from parallel computation, but making use of the async system in rust has made implementing new operations much easier.
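
To give a flavour of what that looks like (an entirely hypothetical sketch, not the commenter's actual code), one such operation can become a single async function that selects over equipment events and cancellation, instead of being split across stage methods plus a central dispatcher:

    use tokio::sync::{broadcast, mpsc};

    // Hypothetical equipment events pushed from the hardware threads.
    enum ScopeEvent {
        Frame(Vec<u16>),
        TrackingError(f64),
        Temperature(f32),
    }

    async fn exposure_sequence(
        mut events: mpsc::Receiver<ScopeEvent>,
        mut cancel: broadcast::Receiver<()>,
        frames: usize,
    ) -> Vec<Vec<u16>> {
        let mut out = Vec::new();
        while out.len() < frames {
            tokio::select! {
                // User hit "cancel": stop the whole sequence.
                _ = cancel.recv() => break,
                Some(ev) = events.recv() => {
                    match ev {
                        ScopeEvent::Frame(img) => out.push(img),
                        // Abort if the guide scope reports too much tracking error.
                        ScopeEvent::TrackingError(e) if e > 2.0 => break,
                        // Temperature changes etc. would be handled by other operations.
                        _ => {}
                    }
                }
            }
        }
        out
    }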


My opinion is the opposite, to the point I would argue that anyone advocating for multithreading for reasons other than executing things in parallel on different cores is extremely dangerous and shouldn't be allowed anywhere near a serious codebase.


Rust pretty much alleviates these dangers. At least no memory safety bugs because of multithreading. Logic bugs are still possible, of course, but the channel API and the scoped thread API in the standard library do help with those.


That is not true at all, "fearless concurrency" gives no valuable guarantees at all and is widely seen as one of the worst concurrency models in literature.


I don't dispute that, but I thought you were having C++ with threads as the starting point, in which case Rust very much gives you valuable guarantees.


Erlang/OTP has entered the chat.


Are you talking about Rust (which I don't know)? In Python the async people don't care much about correctness.


It's language-agnostic. The language with the most advanced concurrency memory model is C++.


I probably wouldn't go as far as saying "extremely dangerous" but I do agree. For the most popular use case (i.e. web services) a thread per core with an event loop in each thread is the best model.


Others have already answered to you, but maybe a bit more affirmation may help.

Concurrency and parallelism are different concepts. Many many years ago it was easy to confuse them both because there was no parallelism. You had a single big core and CPU pipelines were much simpler.

I won't delve into details, although I found them fascinating, but concurrency and parallelism are different tools. I confess I found the name "concurrency" not useful.

Concurrency allows you to transfer control from one piece of text (I mean executable code) to another while it waits for the return.

Parallel means instructions are being executed, well, in parallel.

The OS scheduler does not inspect the code, nor know that the next instruction will be a noop sleep. Some languages with runtime environments provide basic functionality for intercepting calls and nudging the OS.

There were attempts in the past at requiring each application to be explicit about sharing control, but they failed. One single bad application could hang and compromise the entire system. As a matter of fact, some RTOSes still use this premise of development.

Async has been implemented by providing a runtime library which saves the context and swaps tasks. The control is only hidden from the programmer.

I do not know about Golang, but I suppose its coroutines are implemented in a different way; it seems to me that the compiler handles this. But I don't know.


Hiding an asynchronous, co-routine-style program flow behind a syntax which looks like a sequential program flow is IMHO a design mistake I wish Rust had not made.

It's partially hiding what's under the surface. It makes programs harder to reason about. And, worst of all, it "infects" whole programs, pushing async further and further up the call-chain, as the "sync to async" boundary is not a good story in Rust.

Finally, the fact that async in Rust is effectively "tokio" means this framework also gets all over the place, leading to dependency bloat and framework-itis. It potentially makes things that could be simple and re-usable components much more complicated and coupled to tokio than they could be.

I would have rather seen the language provide a nice way of doing continuation passing and an explicit coroutine construct.

That's my hot take. Personally I try to avoid async and stick with explicit concurrency & communicating by channels, until I am in a situation where I'm provably blocked primarily on I/O in a way that it would make sense to reach for it.

But I'm also not writing web services.


Last time I used Rust was long ago, before it even had async, so idk what the outcome looks like. What kinds of libs depend on async in an infectious way? Cause if they didn't, the only alternatives seem to be spawning threads or asking for a threadpool.


An HTTP client library is the one that got me yesterday. "reqwest" has a blocking mode, but it's less featureful than its async version, and the code I needed to do signing etc. was only possible with the async API.

I don't think this is the right way to design libraries; I think the right thing to do is to expose the state machine etc that drives the I/O and then provide async and sync entry points to the same thing. But 'async' keyword makes it too easy to just sprinkle it all the way down, and now you've imposed your lifestyle choices on your user.
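
For what it's worth, here is a sketch of that shape (not reqwest's actual design; all names here are made up): the protocol core stays I/O-free, and a sync and an async entry point just shuttle bytes for it.

    // The I/O-free protocol core: it consumes bytes and says what it needs next.
    struct Protocol; // parser state, buffers, etc. elided

    struct Response { body: Vec<u8> }

    enum Step {
        NeedsRead,      // caller should feed more bytes via `feed`
        Write(Vec<u8>), // caller should send these bytes
        Done(Response),
    }

    impl Protocol {
        fn feed(&mut self, _bytes: &[u8]) { /* advance the state machine */ }
        fn step(&mut self) -> Step { Step::NeedsRead /* placeholder */ }
    }

    // Sync entry point, driven by std::net.
    fn fetch_sync(stream: &mut std::net::TcpStream, proto: &mut Protocol) -> std::io::Result<Response> {
        use std::io::{Read, Write};
        let mut buf = [0u8; 4096];
        loop {
            match proto.step() {
                Step::Write(out) => stream.write_all(&out)?,
                Step::NeedsRead => {
                    let n = stream.read(&mut buf)?;
                    proto.feed(&buf[..n]);
                }
                Step::Done(resp) => return Ok(resp),
            }
        }
    }

    // Async entry point: the exact same state machine, driven by Tokio.
    async fn fetch_async(stream: &mut tokio::net::TcpStream, proto: &mut Protocol) -> std::io::Result<Response> {
        use tokio::io::{AsyncReadExt, AsyncWriteExt};
        let mut buf = [0u8; 4096];
        loop {
            match proto.step() {
                Step::Write(out) => stream.write_all(&out).await?,
                Step::NeedsRead => {
                    let n = stream.read(&mut buf).await?;
                    proto.feed(&buf[..n]);
                }
                Step::Done(resp) => return Ok(resp),
            }
        }
    }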


Yeah, seems like HTTP client libs should just be used in blocking/sync mode, and the user can wrap the calls in async if desired. I said the alternative is a threadpool under the assumption that the lib actually needs concurrency, but this doesn't.

Now that you mention it, this reminds me of ObjC where promises were a cool new thing and got put into tons of libs for no reason. The fad died down after a year or so.


The article shows a great example of how to implement a state machine with internal delays (do something, wait for a defined time, do something else), which is very useful in a driver or embedded context where you often just have to wait for an external device to be ready.

However, it doesn't really address how you'd construct a state machine with an external tick. It's pretty common to have a state machine called at a fixed frequency. I guess to implement that (using the pending!() Macro between state actions) you'd need to implement a custom executor?


Code using a fixed timestep usually explicitly takes advantage of this design, and is intentionally a hand-rolled state machine, so async doesn't improve it IMHO. Async is meant to hide the event loop (the ticks).

It depends how deep you want the ticks integrated with your async code. At minimum you can do:

   async { loop { next_tick_time().await; do_tick(); } }
You could also write your own async executor that just polls all spawned futures on every tick, instead of the event (waker) mechanism used by async.

But both approaches are IMHO pointless. Async is meant to be a sugar on top of events and run code only when the events happen, not run it all the time at a fixed timestep.


Your case sounds like one where you'd use select()/select! to wait on multiple things? A lot of writing about async/await neglects to mention multiple potential events, but that case is really async/await's reason for existing.

    select! {
        () = wait_for_tick() => println!("tock"),
        v = woken_thing() => println!("woke with {v}"),
    }


What is the difference between „call state machine nextStep() with a fixed timer“ vs „call async fns with a delay“?


You mean calling async functions with internal delays? The delay is defined internally to the async function, rather than externally.

The difference between:

  async fn bla() {
     doWork();

     waitForSecs(x);

     doMoreWork();

     waitForSecs(x);

     lastBit();
  }
  
and

  fn somethingElse() {
    // state is persistent
    match state {
      FirstState => { doWork(); state = SecondState; }
      SecondState => { doMoreWork(); state = ThirdState; }
      ThirdState => { lastBit(); state = Done; }
      _ => ()
    }
  }
is that the first example controls the delay period, while in the second example the caller decides the period. The timing of the first example is also dependent on the execution time of the work functions, while the timing of the second is only dependent on the caller.

The benefit of the second example is that it can be completely synchronous with other parts of the system. You know that when your global tick happens, all the state transitions also happen. If each function manages their own time delays, that's not a given.


I'm a strong believer in structured concurrency over async.

That said, I do not know if there would be an easier way to implement the state machine in the article using structured concurrency over async. Maybe that is actually one place where async would be better.

I need to look into that.

However, for those in the comments arguing that async is better for I/O-bound stuff, I heavily disagree.

I implemented a multiplexer system. You can start any number of operations you want and then multiplex them. This blocks until one operation is done, yes, but hey, you have threads, so use another to do something else if you need.

But this multiplexer allows me to decide what function gets called for each type of task that finishes. This means that the caller still controls what to do when the "future" completes.

So it's equivalent to async, but it's still synchronous. Nice.

I can have it multiplex on multiple types of things too. I actually haven't implemented asynchronous I/O with it yet; I mostly use it to multiplex on child processes.

So I agree with one of the top-level comments: async is a hype. There are better ways for most use cases.


> So it's equivalent to async, but it's still synchronous. Nice.

Based on your description this is equivalent to async/await implemented with callbacks but not async/await implemented via polling.


Do you mean polling, as in calling poll()? Or polling, as in continuously checking if something is finished?


The latter.


In the broad use case where you have an IO-bound server, how is this solution better than async? It sounds like your non-async solution was to reimplement async, which shows the value of the feature.


I reimplemented async, except that the event loop is only invoked explicitly and under the programmer's control.

I know it doesn't sound like a lot, but it is.


Is the idea that you might be fanning out and want to delay starting the event loop? Fanning out in JS looks like:

  const fn = async (arg) => { ... };  // calls some RPC
  const results = await Promise.all(args.map(fn))
which might technically be starting the first func before the second is in the event loop, but I don't see why that matters for what I'm doing.


In Rust (which is the async/await system mentioned in the article), futures don't start until they're polled, so doing the pattern you describe wouldn't even have the ordering part you mention (unless you wanted it to, in which case you could use something like Tokio's `spawn`).


It does matter for what I'm doing.


You said most use cases, so I was thinking of a typical backend that's handling requests, calling RPCs, doing DB queries, etc.


async event loops in Rust are invoked explicitly by the programmer as well.


Are they always invoked explicitly? Or is it sometimes implicit?


The closest to implicit you can get is the `#[tokio::main]` attribute macro [1], which expands to something like this

    fn main() {
        tokio::runtime::Builder::new_multi_thread()
            .enable_all()
            .build()
            .unwrap()
            .block_on(async {
                println!("Hello world");
            })
    }
[1]: https://docs.rs/tokio/latest/tokio/attr.main.html#using-the-...


That's technically something the runtime implementation can choose, but for Tokio, the most common runtime, the sibling comment is correct that the only way to avoid having to explicitly define it is to use an annotation on the main function (which just generates the call to instantiate the runtime via a macro).




The article only barely touches the performance reason for async. Without an explanation of that vs OS threads, it doesn't make much sense what you're trying to accomplish with async. And while you can think of it as the caller having control like the article says, you can also think of it the more typical way, that you're issuing work and getting async responses.

So I would still start at https://rust-lang.github.io/async-book/01_getting_started/02... And if you want to go deeper, think about how you'd build a massive IO-bound server without async (the answer will look like async but with extra steps).


I think Go got it right by inverting the logic around async/await. In Go you have to explicitly state that a function is to run in the background via "go fn(...)". This makes it much clearer that this code will execute concurrently. In the async/await world you can't tell by looking at a function call if it will block until it's done. Forgot an await? No compile error but your program might behave in weird ways. This has bitten me in JS too many times. Haven't done too much async Rust yet but I don't think it solved this issue from what I've seen. Why can't "await" be the default when calling an async function and if you don't need the result right away then call it with "async func(...)"?


> Haven't done too much async Rust yet but I don't think it solved this issue from what I've seen.

In Rust an async function is really just a const fn that synchronously only constructs and returns a state machine struct that implements the Future trait.

So

    async fn foo(x: i32) { }

essentially desugars to

    const fn foo(x: i32) -> FooFuture { FooFuture { x } }

    struct FooFuture { x: i32 } // technically it's an enum modelling the state machine

    impl Future for FooFuture { ... }

You have to explicitly spawn that onto a runtime or await it (i.e. combine it into the state machine that your code is already in). That's what is actually really cool about how Rust handles async: an async fn really isn't doing any magic, it just constructs a state machine and never interacts with (or spawns onto) a runtime at all, so it never starts running in the background, and you are always in full control. And by throwing the future away, you are essentially cancelling it; there's no need to interact with any runtime either.


It's not const (const fn has a very specific meaning in Rust), but other than that you're correct.


It is "const enough", as in, there is nothing preventing it from being called at compile time, in fact with nightly features (and no changes to the async fn), you can call it at compile time just fine. I also put const fn there to emphasize that it really can't do all that much beside constructing the state machine.


You're talking about particular JS implementation problems, not general async/await problems.

> In Go you have to explicitly state that a function is to run in the background via "go fn(...)".

In Rust you have to explicitly `spawn` a task to detach it from the current coroutine and make it run in background. Typically this is much more costly than not spawning and executing async function concurrently as part of the same coroutine's state machine (and Go actually doesn't give you that option at all).

> In the async/await world you can't tell by looking at a function call if it will block until its done.

   foo().await;    <-- blocks
   foo();          <-- doesn't block
> Forgot an await? No compile error

   warning: unused implementer of `futures::Future` that must be used
> Why can't "await" be the default when calling an async function

For similar reasons you don't want `clone()` to be implicit or rethrowing errors to be implicit (like exceptions in Java).

Awaiting implicitly would hide a potentially long and important operation. Await typically means the control is yielded back to the executor and it can switch to another task. You don't want it in a language that wants to give as much control about performance as possible to the developer. Being able to see that "this fragment of code will never be preempted" is a great thing for predictability. Rust is not Go/Java - nobody is going to celebrate achieving sub 1 ms latency here.

Additionally there are certain things you are not allowed to keep across await points, e.g. mutex guards or other stuff that's not safe to switch between threads. E.g. using a thread-local data structure across await points might break, because you could be on a different thread after await. If await was hidden, you'd likely be much more surprised when the compiler would reject some code due to "invisible" await.
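
A minimal sketch of that last point (not from the thread): holding a std::sync::MutexGuard across an await point makes the future non-Send, so tokio::spawn rejects it at compile time.

    use std::sync::Mutex;
    use std::time::Duration;

    static COUNTER: Mutex<u64> = Mutex::new(0);

    async fn bad() {
        let guard = COUNTER.lock().unwrap();
        // The guard is still alive across this await point...
        tokio::time::sleep(Duration::from_millis(10)).await;
        println!("{}", *guard);
    }

    #[tokio::main]
    async fn main() {
        // ...so this does not compile: the future returned by `bad()` is not Send,
        // because the MutexGuard cannot be sent between threads.
        tokio::spawn(bad()).await.unwrap();
    }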


> Awaiting implicitly would hide a potentially long and important operation.

but as you point out elsewhere in the thread, you can still hide blocking and potentially expensive operations in any function, so not seeing await gives no guarantee that the operation won't block (it only guarantees that the operation won't return to the event loop, assuming that the Rust event loop is not reentrant).

Hence await doesn't really protect any useful invariant.


    > foo();          <-- doesn't block
Only if you know that foo is an async function. You can't tell by the function call itself.

    > warning: unused implementer of `futures::Future` that must be used
Interesting, I haven't seen this warning in the Rust codebase I worked a little with. I'll have to check the compiler settings. Anyways wouldn't it make sense to actually throw an error instead of just a warning?

    > Additionally there are certain things you are not allowed to keep across await points, e.g. mutex guards or other stuff that's not safe to switch between threads. E.g. using a thread-local data structure across await points might break, because you could be on a different thread after await. If await was hidden, you'd likely be much more surprised when the compiler would reject some code due to "invisible" await. 
Why couldn't the compiler clearly state the reason for the error though?


> You can't tell by the function call itelf.

You can't know that in general. Any regular Go function could spawn a goroutine and return immediately too. In JS a "blocking" function could call setImmediate(…) and return too. Even in C, a function could spawn a thread and return immediately too.

You never know at the call site whether a function will block or not, in any language.

So I think polled futures actually are closest to knowing this, since the block-or-not decision can be bubbled up to the caller. In Rust the "doesn't block" example would more likely be `runtime.spawn(foo())`, since the executor is not built into the language, so spawning asynchronously is easier when left up to the caller.


> Only if you know that foo is an async function. You can't tell by the function call itelf.

That's a fair point, but traditionally you don't use blocking functions in async contexts at all. It is fairly easy to lint for by prohibiting some inherently blocking calls, e.g. std::io, although they might sneak in through some third-party dependency.

This doesn't have an easy solution because Rust is a general purpose language that allows different styles of concurrency adapted best to the situation, instead of one-size-fits-all like Golang.

Rust has means to annotate functions so maybe there will be some automation to deal with that in the future, similar to how `#[must_use]` works now. E.g. `#[blocking]` or whatever.

> I'll have to check the compiler settings.

This is with default compiler settings.

> Why couldn't the compiler clearly state the reason for the error though?

Stating the reason is probably a solvable problem, but there is another problem: what if 5 layers down the call chain something suddenly introduces a potentially blocking (awaiting) operation? This would mean that some code that previously compiled now has to stop compiling even though it hasn't changed and even though none of the signatures it uses changed. I guess it would break things like separate compilation.

And again, it would be less readable than it is now. Now it is fairly simple - you don't have to look down the call chain to know that something can do await.


You can't change the async/await rules of Rust anymore. I get that. But if it started like I described from the beginning I don't see why that wouldn't work. It's just a question of syntax. Someone adding a blocking call 5 layers down wouldn't be any different than someone adding an "await foo()" right now. Code would still compile fine. As long as everything follows the same rules. Can't mix them obviously.


> wouldn't be any different than someone adding an "await foo()" right now

It would. `.await` works only inside `async` context. So if the method wasn't async at the top level, then adding `.await` somewhere down the call chain would force changing all the signatures up to now become `async`.

So you cannot just freely add `.await` at random places that don't expect it. Which is sometimes a blessing and sometimes a curse. Definitely when trying to hack a quick and dirty prototype this is a slowdown. But it is really good when you aim for low latency and predictability.


It wouldn't work. In Go, it doesn't matter how deeply nested a call to a blocking function is, when you do "go f()" the runtime takes care of things.

With async however, if "await" is the default, then as soon as an async function calls another async function, it would block, completely defeating the point of async in the first place.

I guess you could flip the rules and say that within an async function async is the default and within a regular function await is the default, but actually in most languages a regular function can't call an async function directly because async needs to propagate all the way to the event loop. So you'd just have async as the default again.

My explanation sucks but if you want to go into this rabbit hole look up "stackful vs stackless coroutines".


It would just make things more explicit. Whenever you want to obtain a future you'd have to add "async". The execution of async stuff would work the same just instead of having to explicitly "await" things you'd have to explicitly "async" things. Of course you can't change the way Rust does async/await now without having to rewrite all the async code so not going to happen.


I don't think so, it would only mislead and hide what's really going on.

Creating a future in Rust does not have any side effects like running the future in background. This is not JS. Creating a future is just creating an object representing future (postponed) computation. There is nothing spawned on the executor. There are no special side effects (unless you code them explicitly). It works exactly as any other function returning a value, hence why should it be syntactically different?

If you called something that returned a future but you forgot to use the returned future - how is that different from e.g. opening a file for write and forgetting to write to it or from creating a User object and discarding it immediately, forgetting to save it to a database? There isn't really a difference, and therefore all those cases are handled by `#[must_use]` warning.

On the contrary, an `await` is an effectful operation. It can potentially do a lot - block execution for an arbitrarily long time, switch threads, do actual computation or I/O... So I really don't understand why you want to hide this one.

Maybe the naming is confusing - because `await` does not really just `await`. It runs the future till completion. You should think about it more as if it was named `run_until_complete` (although it is still not precise, as some part of that "running" might involve waiting).
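
A tiny demonstration of that distinction (just a sketch; the printed order is the point): constructing the future prints nothing, and only the `.await` actually runs the body.

    async fn noisy() {
        println!("running");
    }

    #[tokio::main]
    async fn main() {
        let fut = noisy();       // nothing printed yet: just builds the state machine
        println!("before await");
        fut.await;               // now "running" is printed
    }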


    > Creating a future in Rust does not have any side effects like running the future in background. This is not JS. Creating a future is just creating an object representing future (postponed) computation. There is nothing spawned on the executor. There are no special side effects (unless you code them explicitly). It works exactly as any other function returning a value, hence why should it be syntactically different?
Fair point.

    > Contrary, an `await` is an effectful operation. It can potentialy do a lot - block execution for arbitrary long time, switch threads, do actual computation or I/O... So I really don't understand why you want to hide this one.
I disagree here. Any normal function call can do these things. On the other hand, an async function returning a future does nearly nothing. It sets up an execution context but doesn't execute (in Rust). But it usually looks like a function call that actually performs the action - not so! An explicit "async" in front of it would make the program flow more clear instead of hiding it.

    > Maybe the naming is confusing - because `await` does not really just `await`. It runs the future till completion. You should think about it more as if it was named `run_until_complete` (although it is still not precise, as some part of that "running" might involve waiting). 
That's exactly speaking to my previous point. The program flow is not 100% immediately obvious anymore. One could argue that "await" is fine as is but maybe adding "async" to the call and not just function signature would add clarity.


> Any normal function call can do these things.

A normal function cannot switch threads.

   foo();   // executed on thread 1
   doSomeIO().await; 
   bar();   // possibly continued on thread 2
Now if foo() does some native calls that write some data to thread-local storage and bar() relies on that storage - that can make a huge impact on correctness. Rust is a systems programming language, so details like that matter.


surely lifetimes and the borrow checker are a better way to statically check for these sort of issues than relying on await side effects? What if an await is inadvertently introduced later inside your (implicit) critical section?


The borrow checker does catch those issues. But it does not do whole-program analysis. It analyzes code locally, by looking at signatures of functions being called.

And also being forced to read distant code to understand if given snippet is correct would be a maintainability nightmare.

I've had enough problems dealing with Java exceptions which are allowed to pop up from anywhere and are not visible in the code.


What I mean is that if preserving invariants across function calls is important enough that an async call can break it, you want the invariant to be enforced statically by the compiler and you do not want to rely on visual inspection of the source to confirm lack of reentrancy.

Once you do that, you do not need a call site annotation that a function can be preempted as the compiler will check it for you.

Rust is uniquely equipped to enforce these guarantees.


> Haven't done too much async Rust yet but I don't think it solved this issue from what I've seen.

If you don't await your variable contains a future - how are you using that like e.g. an int, without a compiler error?


The issue arises if you don't use the returned value. Let's say there's a function "async fn saveToDisk()". You call this function before you exit the program. Now if you forget to use await on it, your program will exit without having saved the data to disk.


In any sensible API, saveToDisk would return an error status (a Result type in Rust). If you don't check for errors, then you probably didn't care whether the data was actually saved or not.


Futures in Rust are annotated with the #[must_use] attribute [1], same as the Result type [2].

This means the compiler will emit a warning (can be upgraded to an error) if you forget to await a future even if it doesn't return anything.

[1]: https://doc.rust-lang.org/nightly/src/core/future/future.rs.... [2]: https://doc.rust-lang.org/nightly/src/core/result.rs.html#49...
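
For example, one way to upgrade that warning to a hard error for a whole crate (a sketch; `save_to_disk` is made up) is the usual lint-level attribute:

    #![deny(unused_must_use)]

    async fn save_to_disk() { /* ... */ }

    async fn run() {
        save_to_disk();       // now a hard error: unused implementer of `Future` that must be used
        save_to_disk().await; // fine
    }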


You don't want the safety of your program to depend on whether the compiler emits a warning or not.

And turning warnings into errors just encourages people to write 'let _ = ...' to get rid of the error.


This has nothing to do with safety, just correctness.

> And turning warnings into errors just encourages people to write 'let _ = ...' to get rid of the error.

No? Writing `let _ = make_future()` will clearly not await the future; why would you do that instead of just adding `.await`?

Using `let _ = ...` is sometimes fine for Result if you're really sure you don't care about the potential error you got, but it's a no go with futures.


Note that the lint for un-awaited Future doesn't mention the way to silence them by assigning to _:

  warning: unused implementer of `Future` that must be used
   --> src/main.rs:9:5
    |
  9 |     foo();
    |     ^^^^^
    |
    = note: futures do nothing unless you `.await` or poll them
    = note: `#[warn(unused_must_use)]` on by default


It's actually very likely you'll have a compile error. Async functions return a Future (like a Promise in JS) and this isn't a value you can typically use in place of others. There is also an on-by-default warning if you don't use the Future value at all.


Concurrency is like a single cook preparing many different recipes at the same time, and parallelism is multiple cooks in the same kitchen. Single-threaded processes can use async/await to context switch and it feels like parallelism, but it's not unless you're executing each task on its own OS thread.

... and that didn't click for me until I understood that concurrency (async/await) and parallelism (multiple OS threads and processes) are different execution modes, and that the Tokio runtime in Rust allows you to combine those different modes.


Is multithreading (always) parallelism? I never questioned this, but it just occurred to me that on a single-core CPU with a single hardware thread, multithreading in user code is effectively a no-op for performance: the code will run, but won't be any faster.


Yeah, async is a type of greenthreading.


When I think of green threads / fibers, it's threading handled in user space, where each stack starts small, can grow, and is managed by the language's runtime.


When I read async, I think about an event loop in a single thread.

Meanwhile when I read greenthreading, I think of lightweight threads like goroutines or Java's virtual threads.

Am I wrong?


Right. Both of them are running on a single OS thread with application-layer context switching. Async's form of context switching is the event loop, greenthreading's is something analogous to OS threads.

Maybe it's inaccurate to say async is a type of greenthreading and better to say it's a form of application-layer context-switching comparable to greenthreading.


Thanks for this article.

I feel the goal of a tool should be to make common patterns easy to represent: "sprinkling async everywhere" only fails when some common, desirable pattern isn't easily representable in modern languages, or because of the languages' gotchas.

I have a lightweight thread scheduler written in C and I communicate between threads with a lockless ringbuffer. IO threads do IO. I like the idea of event loops and libuv but I want parallelism.

I wanted to understand how to compile async/await, so I wrote a multithreaded unrolled state machine in Java. Each async keyword sends work to another thread. I haven't yet got it to send to a different thread each time for load balancing; I need to spend more time on it.

  task1:
    handle1 = async task2();
    handle2 = async task3();
    handle3 = async task4();
    // at this point, I want task2, task3, task4 to be going on in parallel ON DIFFERENT threads
    await handle1;
    await handle2;
    await handle3;
I am designing a notation for concurrency and asynchronicity that is directly a state machine. I need to write a specification, and I'm working on a Java runtime. It looks like this; it's inspired by BNF and Prolog facts.

  thread(s) = state1(yes) | send(message) | receive(message2);
  thread(r) = state1(yes) | receive(message) | send(message2);
I am inspired by Pony lang, Go and Erlang, but I think there's still a lot of potential innovation to be had in this space.
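
(For comparison, the task1 pattern above maps fairly directly onto a multi-threaded async runtime like Tokio; a rough sketch with placeholder tasks:)

    async fn task2() {}
    async fn task3() {}
    async fn task4() {}

    #[tokio::main]
    async fn main() {
        // Spawning hands each future to the runtime's thread pool, so they can
        // run in parallel on different worker threads.
        let handle1 = tokio::spawn(task2());
        let handle2 = tokio::spawn(task3());
        let handle3 = tokio::spawn(task4());

        // Each JoinHandle yields a Result because the task may have panicked.
        handle1.await.unwrap();
        handle2.await.unwrap();
        handle3.await.unwrap();
    }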


Take a look at Cilk (a lightweight dialect of C at first, later of C++), which uses pretty much the same syntax you are using. Scheduling is done via work stealing.

The Cilk papers in particular are very very good. They discuss the programming model, the compilation strategy, the scheduling algorithms and more.

edit: this one for example http://supertech.csail.mit.edu/papers/cilk5.pdf


Rust explicitly has "Send" and "non-Send" futures that can execute either on any thread, or always on the same thread. Details depend on the executor, and you can roll your own if you want.

This works in Rust:

    let thread_s = spawn(async { join!(state1(yes), send(message), receive(message2)) });
    let thread_r = spawn(async { join!(state1(yes), receive(message), send(message2)) });
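
To illustrate the Send / non-Send split (a sketch assuming Tokio): tokio::spawn requires a Send future so the task can migrate across worker threads, while spawn_local accepts non-Send futures pinned to one thread.

    use std::rc::Rc;
    use tokio::task::LocalSet;

    #[tokio::main]
    async fn main() {
        // Rc is not Send, so a future holding one across an .await is not Send either;
        // it can only be spawned onto a LocalSet that stays on the current thread.
        let local = LocalSet::new();
        local.run_until(async {
            let handle = tokio::task::spawn_local(async {
                let data = Rc::new(5);
                tokio::task::yield_now().await;
                println!("{data}");
            });
            handle.await.unwrap();
        }).await;

        // A Send future can go onto any worker thread in the pool.
        tokio::spawn(async { println!("runs on the thread pool") }).await.unwrap();
    }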


Maybe take a look at pi-calculus and session types if you're not familiar; the notation you're describing sounds very similar/related to that.


Thanks for your reply.

I am coming from the perspective of states and behaviour. I have read about session types but my syntax is not inspired by them.

Types are important and useful, but I am more interested in the rigid parts of code that typed data flows through, otherwise known as control flow. I feel it's an ignored part of computer science.

My goal is easy parallelism, asynchronicity and reactivity.

