Async Cancellation

quietbritishjim · on Nov 12, 2021

A little off topic, but there's an excellent async library for Python called Trio (based on a conceptually similar library Curio) that implements structured concurrency [1] and timeouts [2] really well. Things are obviously a bit different from the Rust world, partly because of garbage collection (i.e. references everywhere rather than the borrow checker) but even more so because of Python's liberal use of exceptions, which are a very natural way to handle cancellation (and one of Trio's innovations is the MultiError exception). But the library documentation [3] is still an interesting read even for a Rust developer interested in concurrency.

[1] Notes on structured concurrency, or: Go statement considered harmful https://vorpus.org/blog/notes-on-structured-concurrency-or-g...

[2] Timeouts and cancellation for humans https://vorpus.org/blog/timeouts-and-cancellation-for-humans...

[3] Tutorial -- Trio documenation https://trio.readthedocs.io/en/stable/tutorial.html

d0mine · on Nov 12, 2021

The syntax enabling the structured concurrency may be included in Python 3.11 https://www.python.org/dev/peps/pep-0654/

pcwalton · on Nov 12, 2021

That's an excellent document, thanks for writing it up!

One motivation for completion futures that either wasn't mentioned or is an unexpected side effect of point #1 (compatibility with C++) is that async code in C/C++ can take non-owning references into buffers. For example, on Linux if you issue an async read() call using io_uring and you later get cancelled, you have to tell the kernel somehow that it needs to not touch the buffer after Rust frees it. There are ways to do this, such as giving the kernel ownership of the buffers using IORING_REGISTER_BUFFERS, but having the kernel own the I/O buffers can make things awkward. (Async C++ folks have shown me patterns that would require copies in this case.) Have you all given any thought as to the best practices here? It's a tough decision, as the only real solution I can think of involves making poll() unsafe, which is unsavory (NB: not necessarily wrong).

yxhuvud · on Nov 12, 2021

The cancel operation in io_uring returns a status when it completes, which mean the memory can't be freed until the ring acknowledges the cancellation. That accomplishes telling the kernel that it needs not touch the referred to memory. This follows the normal uring operation and mean that cancellation of any outstanding io in itself would need to be waited upon.

There is also need to be able to handle the race condition where the task completes successfully during cancellation in any case, or else you will get completions which you don't know to handle. So the support for cancellation must already handle processing of the completion events until the cancellation is acknowledged, even if the userspace submission of the cancellation happened long ago.

This mean io cancellation will be fairly cheap but not necessarily totally wait free. Is that really that big of a problem? It seems you make a lot of problems for yourself if waiting for acknowledgement isn't good enough.

Arnavion · on Nov 12, 2021

We have prior art of the rio crate where the Future is responsible for keeping the buffer alive until the kernel is done with it, which it does by blocking in its Drop. But since futures can be std::mem::forget'd that doesn't work (to put it charitably), and is why rio warns its users not to do that.

AFAIK there's no better way than to have the library that wraps io_uring also own the buffers instead of letting the user own them, so that it can also control when / if they're dropped.

pcwalton · on Nov 12, 2021

The way to make that work would be to make poll() unsafe and require async runtimes to uphold the invariant that poll() must be called until completion if Pending is ever returned. That way, runtimes would ensure memory safety.

This doesn't mean that functions using async/await would have to be unsafe, just functions that manually implement poll. I still don't love that, though.

Arnavion · on Nov 12, 2021

There's no way for an async runtime to do such a thing. An async runtime only knows about the top-level Future of each task. It knows nothing about what other Futures' poll()s the top-level Future's poll() calls in turn.

Async-await does not obviate this problem. `let (value, _) = futures::future::select(f1, f2).await.factor_first();` has the same problem, one of the futures is going to be dropped before it resolves.

pcwalton · on Nov 12, 2021

> It knows nothing about what other Futures' poll()s the top-level Future's poll() calls in turn.

That's why calling poll would be marked unsafe. When you call poll() manually it's on you to uphold the invariants.

> Async-await does not obviate this problem. `let (value, _) = futures::future::select(f1, f2).await.factor_first();` has the same problem, one of the futures is going to be dropped before it resolves.

Yes, you also need to reform select.

Matthias247 has done work to flesh out what this looks like: https://github.com/Matthias247/rfcs/pull/1

Arnavion · on Nov 12, 2021

>Matthias247 has done work to flesh out what this looks like: https://github.com/Matthias247/rfcs/pull/1

By making a new kind of async function that cannot be called from existing async functions, I see. I started reading it with the expectation that RTCFutures would automatically spawn() themselves to enforce the RTC requirement instead of introducing a third function color. I'm pessimistic a third function color will catch on, but let's see how it goes.

fowl2 · on Nov 12, 2021

One mechanism that doesn't seem to be mentioned is the .net style "manual" cancellation where CancellationToken is passed around everywhere and manually checked, often throwing an exception to early return.

Pros are that it's very explicit - there are no unexpected places for the code "stop", which is useful if there are effects that aren't easily modeled RAII style.

Cons are that it's very explicit, having to me manually passed around everywhere, with all the possibility of forgetting.

I guess "panic on cancel" wouldn't be very popular :P and I have no idea if/how the cancellation token scheme allocates.

yoshuaw · on Nov 12, 2021

> One mechanism that doesn't seem to be mentioned is the .net style "manual" cancellation where CancellationToken is passed around everywhere and manually checked, often throwing an exception to early return.

The async-rs/stop-token [1] library does exactly this.

This post is just the first in a series. In a follow-up post I'm planning to zoom in on the uses and design of cancellation tokens.

[1]: https://github.com/async-rs/stop-token

fowl2 · on Nov 15, 2021

looking forward to it!

Joker_vD · on Nov 12, 2021

That's also the Go's style, isn't it? With concisely named "context.Context" (not to be confused with any other relevant context) being passed around everywhere?

asdfasgasdgasdg · on Nov 12, 2021

Yes, this is the same style as go. Fortunately you don't have to check the cancel token everywhere since clients can just proceed as if you had been cancelled (in well designed systems). You just need to check it before and during potentially expensive or long running operations (IO or expensive computation, basically).

WorldMaker · on Nov 12, 2021

It's not entirely "manually" checked in .NET, it is also sometimes "panic on cancel" in that the most common "manual" "check" in .NET is a simple `cancellationToken.ThrowIfCancellationRequested()`. While it is great to implement high-level, intelligent cancellation, for the most part in .NET as long as you are passing the CancellationToken to enough low-level components you are increasingly more likely going to get a TaskCancelledException from your low-level components and maybe don't even need to wire high-level checks. (Depending on what your business logic is, and yes, the usefulness of unwinding effects at a high level such as carefully rolling back transactions or what not.)

Also it's really interesting that the original F# approach to .NET Cancellation was to handle them automatically in async { } blocks, handling and for the most passing them automatically, and after a lot of usage patterns seen in the real world they decided that the overhead of all the automated checks wasn't worth it enough and the new in F# 10 task { } blocks follow the rest of .NET in preferring entirely manual CancellationTokens.

lmilcin · on Nov 12, 2021

I have tested a lot of async frameworks in a lot of languages and I like ReactiveX for Java (RxJava, Reactor) the most.

    // process1 does nothing and takes 20s to complete
    // if cancelled, it will print a message
    private Mono<Void> process1() {
        return Mono
                .just(new Object())
                .delayElement(Duration.ofSeconds(20))
                .then()
                .doOnCancel(() -> log.info("I have just been canceled"));
    }


    // assume it does the same as process1
    private Mono<Void> process2() {
        ...
    }
    
    private Mono<Object> cancellationSignal() {
        return Mono
                .just(new Object())
                .delayElement(Duration.ofSeconds(10))
                .doOnNext(n -> log.info("oops, sending cancellation"));
    }

    // executes process1, then process2
    // if cancellationSignal happens before they finish they will be safely cancelled, you will only receive message from process1
    private Mono<Void> compositeSequential() {
        return process1()
                .then(process2())
                // timeout() is just one way to cause cancellation (in this case on an arbitrary signal)
                .timeout(cancellationSignal());
    }

    // executes both processes concurrently
    // you will receive messages from both processes as they are both getting the cancellation
    private Mono<Void> compositeConcurrent() {
        return Flux
                .just(process1(), process2())
                .flatMap(identity())
                .then()
                .timeout(cancellationSignal());
    }

azth · on Nov 12, 2021

Check out structured concurrency proposal in Loom:

* https://cr.openjdk.java.net/~rpressler/loom/loom/sol1_part2....

* https://wiki.openjdk.java.net/display/loom/Getting+started

* https://wiki.openjdk.java.net/display/loom/Structured+Concur...

vips7L · on Nov 12, 2021

Having used Rx extensively in a desktop app, I’m very looking forward to reactive programming dying. It’s done nothing but add technical debt and increase maintenance costs at this point.

sizediterable · on Nov 12, 2021

In the first `async_std::task` example, we see that `JoinHandle::cancel` is an async function. I'm curious what would you do if you didn't care whether or not the cancellation succeeded and didn't want to await the cancellation. Would you wrap the cancellation in a task?

ibraheemdev · on Nov 12, 2021

The underlying implementation sets the task as cancelled the first time it is polled, and then waits for it to stop running [0]. So you could just call `cancel` and poll the future once:

    futures::poll!(task.cancel());

[0]: https://docs.rs/async-task/4.0.3/src/async_task/task.rs.html...

sizediterable · on Nov 12, 2021

This seems to be for the `async-task` crate. My question was about `async_std::task` in particular since it doesn't have cancel-on-drop behavior.

ibraheemdev · on Nov 12, 2021

async-std uses async-task under the hood.

(async-std -> async-global-executor -> async-executor -> async-task)

sizediterable · on Nov 12, 2021

ah, makes sense!

volta83 · on Nov 12, 2021

What's rust support for structured concurrency ?

C++ appears to be going to use it as its "main"/"most fundamental" asynchronous algorithms model: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p230...

Seems quite a bit more flexible at least than Rust futures, but I am not sure I understand how they exactly differ.

rackjack · on Nov 12, 2021

Tangential, but from my understanding async code can be modelled with algebraic effects. I am excited for when they finally get added to OCaml.

slaymaker1907 · on Nov 12, 2021

Algebraic effects are way more powerful than async/await because they don't have the function color program. Higher order functions can work with both async and non-async functions and are only async if their input is async.

rackjack · on Nov 12, 2021

Yes, that's why I'm excited for them.

IshKebab · on Nov 12, 2021

I don't understand how not having to manually implement `Future` means that the "async functions might not run to completion" problem is solved (section 11).

Surely you can still accidentally write code that accidentally puts an `await` in a section that must run atomically. You still need to learn and think about the surprising cancellation behaviour.

idolaspecus · on Nov 12, 2021

Off topic but I love how this site is designed. It's unique without being wild, minimal but stylish. I really dig it.

Edit: It also looks good (better maybe) on mobile. Props.

keewee7 · on Nov 12, 2021

How does the async/await story in Rust compare with C#?

In C# it feels 95% done and the remaining 5% can lead to some significant confusion.

GolDDranks · on Nov 12, 2021

It still feels like a prototype. There's still lack of integration with some major language features such as traits, and the standard library is missing all but the most rudimentary primitives; third party libraries help, but at the moment you are required to "takes sides" which async runtime library ecosystem you choose to use (for example, to spawn tasks), even though the thing you are doing doesn't require anything specific from that library. Finally, there are unsolved design problems about cancellation and destructors. And some murky behind-the-scenes stuff about the memory safety and soundness of self-referential types (which Futures often are).

I think that at least the trait integration stuff is taking some steps forward these days, and hopefully the cancellation stuff too, which this blog post is indicative of.

GolDDranks · on Nov 12, 2021

To be sure, Rust is trying to solve very hard problems here, given to requirements: the async system is supposed to work in very high-performance situations, without allocators on embedded etc., while trying to keep the individual features that support the system relatively straightforward and orthogonal. It feels like there has been not much progress since the initial release of async in 2018, but it's a good thing that the designs are not rushed.

pas · on Nov 12, 2021

Ergonomic async depends on a smarter type system (GAT), and a lot of progress has been made on that.

bouke · on Nov 12, 2021

Can you expand on the 5%?

sershe · on Nov 12, 2021

C# story is a classic example of well-implemented, but leaky abstraction. I would say 99% of the time it works fine and very intuitive. Then 1% of the time something utterly bizarre happens, I've definitely seen very un-intuitive examples when await is used with ContinueWith spawned in an unconventional manner, and the things get hopelessly muddled (you await task t in try-finally but finally only executes after some task that is not part of t completed is something I've seen iirc).

I've also seen un-intuitive perf issues with await where replacing it with manual continuations sped things up a lot; again, I think it's very rare, but when it happens it happens.

And yeah like the other comment mention, ConfigureAwait 0_o

rawling · on Nov 12, 2021

ConfigureAwait, probably

noduerme · on Nov 12, 2021

I was hoping this related to JS. It's a shame that Promise cancellations were dropped from the spec. When they are brought back and finally implemented - and I believe they will be - all that absurd fetch AbortController signaling code can be refactored.

slaymaker1907 · on Nov 12, 2021

I don't think they will be. Cancellation of a task after it has started in JS is fundamentally unsafe since you don't have destructors for cleaning up code mid execution. The best thing you could do would be to have cancellation work as an exception whenever you call await that can be caught.

dangerbird2 · on Nov 12, 2021

Javascript does have finalizers now[1], which is a much-needed feature for webassembly interop where you need to explicitly call destructors to clean up any wasm data the object owns. However, this wouldn't help with promises where you need cleanup code to be called deterministically, not whenever the GC gets around to cleaning up the object. As you mention, using cancellation tokens that raise an exception is a reasonable solution

[1] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...

noduerme · on Nov 13, 2021

I'm not sure I understand the distinction. Right now if you throw an exception inside an async function, that "cancels" it and the function's object references are eventually GC'd. You catch the exception outside the promise. I'd imagine external cancellations would work in a similar way. Why is it unsafe for something outside to tell the async function to just throw an exception now, halt and dump its object references? If you're planning on explicitly doing it you'd need to catch it anyway. The difference is being able to do it from outside once the async function is running.