Structured concurrency in Rust

At its core, the futures system actually looks pretty close to the Trio concepts.

A task, represented as a struct/enum implementing Future, is logically a tree of nested structs/enums implementing Future that represent the various active sub-tasks. This structure naturally corresponds to the cancellation/concurrency scope stack/tree. These futures are poll-driven: when a wakeup comes, the main future is polled, and it polls the currently active sub-futures (or, in some cases, just the one that is ready to do work) in turn.

A unit of cancellation here is a subtree that corresponds to some Future. For example, if we have something like

future::select(do_some_work(), timeout(Duration::from_secs(5)))

it is represented as a structure with two child futures. When either of them completes, the other one is dropped and its Drop::drop implementation (finalizer) is called. The finalizer is synchronous, so normally it should not do any more async work. Due to the poll-based logic there’s no need to unwind the stack, as the cancellation happens outside the future being cancelled. It can happen at any point where the child future is not running (i.e., has yielded control to the parent future / executor).
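For concreteness, here is a small runnable sketch of that behavior (using the futures 0.3 crate; the Finalizer guard and the stand-in timeout are mine):

use futures::future::{self, Either};
use std::time::Duration;

// The guard's Drop impl plays the role of the synchronous finalizer.
struct Finalizer;

impl Drop for Finalizer {
    fn drop(&mut self) {
        println!("do_some_work was cancelled; finalizer ran");
    }
}

async fn do_some_work() {
    let _guard = Finalizer;
    // Suspend forever; only cancellation (being dropped) ends us.
    future::pending::<()>().await;
}

// Stand-in for a real timer future from an executor.
async fn timeout(_d: Duration) {}

async fn demo() {
    let winner = future::select(
        Box::pin(do_some_work()),
        Box::pin(timeout(Duration::from_secs(5))),
    )
    .await;
    match winner {
        Either::Left(((), _timeout_future)) => println!("work finished first"),
        // select() hands the losing future back; dropping it here runs the
        // finalizer, at a point where do_some_work() is not being polled.
        Either::Right(((), work_future)) => drop(work_future),
    }
}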

The description above, however, is the ideal case. tokio, the most popular executor, also provides the ability to spawn independent tasks, and users (and some libraries) use that for various kinds of background workers, which do not fit the tree-based model.


Yeah, there’s a consensus model of “futures” that you see in Python’s Twisted or asyncio, or in C#, Java, or JavaScript. But Rust’s futures are totally different – they’re much more like ZIO’s monadic concurrency (see Zio (Scala library)).

My personal feeling is that this adds a ton of complexity without adding much value – especially once you start adding special syntax to make futures more usable, so now you have redundant procedural-ish APIs and monadic/functional-ish APIs for the same underlying features. Trio needs support from the language for suspending/resuming a call-stack, and then adds nurseries and cancel scopes as library features, and… that’s the whole machinery. The end result is roughly similar, but the complexity budget seems much smaller. This isn’t really related to structured concurrency per se though, it’s just a general comment :-).

Anyway, as far as structured concurrency goes, obviously tokio::spawn is inconsistent with it :-). But there’s also a more subtle thing I worry about:

Using drop for cancellation seems very worrisome to me, because it means that cancellation is unilateral and the code being cancelled cannot prevent or delay it, or report the result. This is very different from most mature systems I know, where cancellation is a three-phase process: (1) request, (2) wait, (3) report success/failure. For example: this is how Trio and ZIO work. @alanb is emphasizing the need to be able to delay cancellation in the Java Loom design:

Or here’s Russ Cox explaining part of the rationale for how Go channels work:

There almost always need to be two steps in a cancellation: a request for the cancellation and an acknowledgement that work has in fact stopped.

Well, that’s the argument from authority, but what concrete problems does it cause? Here are some fundamental cases where I’ve run into this:

  • Delegating work from async code to a thread: a very standard situation is that some async code needs to call into some legacy synchronous library, so it delegates that to a thread. Now your async code gets cancelled. What happens to the thread? You can’t cancel it synchronously, so in tokio I think the only option is to leak the thread… (see the sketch after this list)

  • Using Windows IOCP: this is sort of similar to the previous item – say you want to write to a file using IOCP. You call WriteFile, and this schedules the work to happen in the background (generally in some state machine running in the kernel rather than a thread per se, but it’s pretty similar from your point of view). IOCP has a powerful cancellation API, but it’s not synchronous: CancelIoEx submits a request to cancel a particular operation, but the cancellation doesn’t happen immediately, and you don’t find out until later whether the operation was actually cancelled or not. Again in tokio I think the best you can do is to submit the request and then let the operation continue in the background…
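To illustrate the first point, here is a sketch of the thread-delegation problem using tokio's spawn_blocking (the sleeping closure stands in for a call into a legacy synchronous library):

use std::time::Duration;

async fn call_legacy_library() {
    // The closure runs on tokio's blocking-thread pool.
    let handle = tokio::task::spawn_blocking(|| {
        // Stand-in for a legacy synchronous call.
        std::thread::sleep(Duration::from_secs(60));
    });
    // If this async fn is cancelled (dropped) while parked at the await
    // below, the JoinHandle is dropped, but the closure keeps running on
    // its thread until it finishes on its own: the thread is leaked.
    let _ = handle.await;
}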

What this means in practice is: say you want to write to a file on disk without blocking the event loop. You have to use either a thread (most popular on all platforms) or WriteFile (on Windows, if you’re OK with some trade-offs that are irrelevant here). Then suppose your disk write operation is cancelled. In Trio, the write operation will either return successfully (the cancel request was too late, and ignored), or else it will return a Cancelled error (in which case the file wasn’t written). Either way, you know the state of the filesystem. In tokio, IIUC, the write operation will be abandoned to keep running in the background. It may or may not actually succeed, so the filesystem is now in an indeterminate state. And even if you read from the filesystem to find out what the state was, that doesn’t help, because the write might still be pending in the background and complete at some unknown future moment – which is a classic example of the kinds of problems structured concurrency is supposed to prevent :-)

I’m really not an expert on Rust and I have huge respect for the folks working on this stuff, so I could be missing something. But this worries me.

If I were going to give more arrogant, un-asked-for advice, I’d suggest:

  • Implement async functions as a pure coroutine-style feature, where async functions have a unique type that’s distinct from regular functions

  • Leave out await syntax, i.e., use regular function call syntax for calling async functions. None of the usual arguments for decorating every call site with await apply to Rust. The compiler should enforce that async functions can only be called by other async functions.

  • Write a library that implements Trio-style nurseries + cancel scopes on top of your new coroutines, with something vaguely like Trio’s cancellation protocol: https://trio.readthedocs.io/en/latest/reference-hazmat.html#low-level-blocking. Cancellation is reported as an error using Rust’s usual Result type. Nurseries automatically cancel their children if one panics or returns an Err, and then propagate the cancel/error.

  • Integrate the existing futures code by defining an async method on futures (like fut.wait() or similar), that translates between the two regimes (e.g. when it receives a cancellation request it drops the future).

(Kotlin did several of the things above – in particular leaving out await, and using an explicit method to translate between future-land and coroutine-land.)


I believe this is the futures-rs issue discussing the synchronous Drop question: https://github.com/rust-lang-nursery/futures-rs/issues/1278 [n.b. it’s been a while since I looked at it, and my understanding is limited]

A proposal for a Kotlin-inspired async design was discussed here (and in the linked, preceding thread): https://internals.rust-lang.org/t/explicit-future-construction-implicit-await/7344/1

(For what it’s worth, though I lack deep expertise, I also found that appealing. I also feel like it never had a realistic chance of being accepted. My – probably biased, reductive, and by now hazily remembered – sense of that situation was that (a) there is a vocal (and possibly also large) contingent of Rust users who are extremely averse to anything that smells like implicit or hidden control flow, (b) the people driving the async/await effort were not enthusiastic about the thought of shaking things up in any dramatic way (async/await was originally planned to be a part of the Rust 2018 edition), (c) they were also wary of diverging from the more popular and familiar C# model, (d) they felt that distinguishing async fn from fn() -> Future and having to explain the difference would be unjustified complication.)


Back-linked this discussion from that discussion :-) Hope it’s helpful and will not cause a flamewar

I also feel like it never had a realistic chance of being accepted. My … sense of that situation was that

This matches my perception as well. The irony is that async/await is also blocked on selecting a specific concrete syntax for await 🙃

I was one of the primary proponents of the “explicit async, implicit await” system, and I still love it. But I’ve since been convinced that it doesn’t work for Rust, and I’ll share a couple of the reasons that convinced me, in hopes that you can follow the reasoning.

All of that said (below), it doesn’t have much effect on cancellation. Many cases do benefit from cancelling being Drop + not polling again. I suspect we’ll end up with a Cancellable + Future or similar to support the cases that need full cancellation and so it stays “choose your abstractions”. I’ll leave that to the futures team, though.

Implicit async is great, but comes with a lot of incidental complexity. How do you actually get implicit async? Well, let’s start with "immediately await! any impl Future".

This works for simple cases, like async |foo: fn() -> impl Future| { foo() }. But what about async |foo: impl Future| { foo }? The only consistent solution I came up with is also surprising: the first usage awaits it and thus makes it move, no matter how it’s used. This is also a very implicit await point (as opposed to a fn call, which is a “normal” edge in control flow). Also, this means that just implementing a trait for a concrete type can change the control flow of a function that takes or even generates it. So that solution is basically out.

The other main part of the proposal is that async fn only be callable in an async context. But this creates more problems itself: sync code has to transition to async at some point (typically via a task pool executor), so it has to be able to work with async types to be able to do so. So let’s handwave and say Future::await bridges an impl Future back to async and calling an async fn in a sync context creates a Future. All’s well and dandy, right? Well, not really.

There’s a reason people want async fn() -> T == fn() -> impl Future<Output=T> in usage (which is broken by making async context only auto-await async fn, which we’ve already conceded). And that reason is lifetimes and startup work.

Consider async fn(&str, &str) -> T. The desugaring of that is fn(&'a str, &'b str) -> impl (Future<Output=T> + 'c) where 'c: 'a + 'b. The output automatically captures all input lifetimes because the future state machine could use them at any point. By writing out the desugared version manually (which may include an async closure), the implementor can both dispatch some work immediately/synchronously and use such dispatch to support short-lived input lifetimes that don’t need to be captured for the entire impl Future lifetime.
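For example, here is a manual desugaring along those lines (all names are illustrative): the setup work runs synchronously while the inputs are still borrowed, and only an owned value is captured, so the returned future is 'static:

use std::future::Future;

fn lookup(config: &str, key: &str) -> impl Future<Output = String> + 'static {
    // Synchronous startup work: runs immediately, while `config` and
    // `key` are still borrowed.
    let resolved_key = format!("{}/{}", config, key);
    async move {
        // Only `resolved_key` (owned) lives in the state machine, so the
        // caller's borrows don't have to outlive the future.
        do_request(resolved_key).await
    }
}

async fn do_request(key: String) -> String {
    key // stand-in for real async I/O
}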

This outright cannot be represented using async fn. If it could, you could make an argument for exposing fn() -> impl Future through a wrapper async fn { fn().await() }, but the fact is that this wrapper would both delay the synchronous work and extend the theoretically short input lifetime. You can’t even do async fn() -> impl Future because, as we’ve already discussed, the auto-awaited type has to be unnameable, so this requires Future::await and the calling code has to do a .await() anyway.

At that point, if most big library utilities will require the .await() bridging call anyway (since they will have to deal with lifetimes like this), why not make await! a first-class construct and drop all this effort to make async fn more special than a bit of syntax sugar for capturing lifetimes into an impl Future?

On top of that, Rust’s already used most of its strangeness budget. Sticking with a somewhat familiar model of async/await rather than async/async will make it less strange (even if it would help avoid false non-parallelism).

Ah, but this is a radically different proposal than the one I wrote :-).

In my proposal, remember, async is just a marker for the compiler that you want to opt-in to a coroutine / green-threads-like system – there’s no built-in connection to the Future machinery. In this approach, the only distinction between a sync function call vs. an async function call is that during an async call, the coroutine runner has the option of temporarily preempting this green-thread and letting some other green-threads run.

In Python or JS, this distinction is very important. In these languages we have what you might call “fearful concurrency” ;-). The object model is a big soup of heap-allocated mutable objects full of pointers, so preemption is terrifying! Any time you get preempted, the entire world might shift under your feet. So we have Global Interpreter Locks, and when we need concurrency we like to run it all on a single thread, and make sure all the preemption points are explicitly marked with await, so we can at least keep that scary preemption out in the open where we can see it.

But in Rust, preemption isn’t scary at all! The whole language is designed around the assumption that your code will be running on multiple threads, and that the kernel might preempt you absolutely anywhere. And in fact our coroutines will be running on multiple threads, so the kernel can preempt our code absolutely anywhere. Async function calls add a second type of preemption – now the coroutine runner can preempt us too, not just the kernel – but… from the user’s perspective, we’re going from “we can be preempted anywhere” to “we can be preempted anywhere”, so who cares? The compiler cares about the difference between an async call and a sync call, because it has to handle the stack differently. But from the user perspective, the semantics are absolutely identical in every way that matters.

And this is also a generic response to all the other concerns you raised – if my async function proposal really had some showstopper problem with lifetimes or whatever, then that would mean that regular functions in threaded Rust had the same showstopper problem. But I’m pretty sure they don’t.

So… I’ve argued that IF Rust adopted a simpler async system that was just about compiler-supported coroutines, THEN the design would work fine. But the big question is whether it’s reasonable for Rust to adopt such a system at all :-). And that gets into what @glaebhoerl was saying:

I don’t really believe that the Rust designers are scared to invent new ideas to make concurrency work better – that does not match my observations :-). But my proposal does involve basically deprecating the entire existing futures ecosystem. It would be a soft deprecation with interoperability and a migration path, but even so, that’s an absurd thing to contemplate, especially when it’s coming from some random yahoo like me who’s never written a line of Rust.

What I can say though is: ~4 years ago, Python had built up an ecosystem around explicit Future objects, and based on this decided to commit to a C#-style async/await design, where async functions are sugar for functions-returning-Futures. And … this wasn’t exactly a mistake, because in 2015 there was no way for us to know any better. But here in 2019, AFAICT everyone involved regrets that choice. We’re spending an immense amount of effort trying to dig back out from the decision, the community may still end up essentially deprecating the futures-based system (but now with another 4 years of built-up adoption and support commitments), and if we could go back in time we would absolutely do things differently. So it’s agonizing to sit and watch Rust apparently taking the same path.

I think there’s a nasty trap here for language designers, that turns our normal instincts against us. It goes something like this:

  1. We should support lightweight/async concurrency! But how?
  2. Well, jumping straight into adding language extensions is never the right idea, so let’s try implementing it as a library first.
  3. But it turns out that to describe async programs, we need all the same control flow options as regular code and more, but we can’t use our language’s normal control structures because those are all based around the stack, so instead we invent increasingly complicated systems with like, monadic lifting and combinators and stuff, so we’re basically inventing a secondary programming language awkwardly embedded inside our regular language.
  4. Eventually it becomes clear that yeah, the pure library approach is pretty awkward. We’d better add some syntax to clean it up.
  5. But now we have tunnel vision – originally, our goal was to support lightweight/async concurrency. Now, our goal is to make the monadic system more usable. So we end up with a complex Futures layer, and then language extensions on top of that to try to rein in the complexity. And it’s terribly easy to miss that once we have language extensions, we don’t even need the Futures layer at all! It only existed in the first place to work around our language’s inability to suspend a call stack.

If you don’t have a Future type exposed to the user, then you don’t have to worry about async functions that return Futures, or about what happens when a user adds impl Future to an existing type. There is no worry about different lifetimes for the code that executes directly inside the Future-returning function vs. inside the Future itself, because there is no such distinction. (It would be like worrying about the different lifetimes for code that sets up a regular function’s stack frame, versus the code that actually runs inside the stack frame.) You don’t have to worry about what happens when dropping a Future, and how that relates to other kinds of cancellation, because there are no Futures to drop.

(Well, there is a subtlety here: obviously you will need some kind of object inside your coroutine runner that represents a partially-executed coroutine stack, that owns all the stack-allocated objects. But (a) very very few people implement their own coroutine runners so it’s fine if that part of the API has extra complexity, and maybe requires unsafe or something, (b) coroutine runners should make the guarantee that once they instantiate a coroutine, they will always drive it to completion. This makes coroutines exactly like threads: once you create a thread, the kernel guarantees that it will keep scheduling it until it completes. So everything you know about lifetimes and threads transfers over directly.)

Anyway… I hope that at least clarifies why I still think implicit await makes sense for Rust, and why I think that the idea of moving away from Futures entirely is worth thinking about, even if it seems ridiculous on first glance.


Hi everyone,

I’m the person who wrote https://github.com/rust-lang-nursery/futures-rs/issues/1278 as quoted by @glaebhoerl above.

I actually had some more thoughts on Rust’s Future model vs. alternate approaches, and eventually wanted to write an article about it. Here are my main ideas:

The most important choice for me is whether the lightweight tasks that are offered by a framework have run-to-completion semantics or not. With run-to-completion semantics, any given async function will run normally until a normal return point. This is equivalent to normal functions in most common programming languages. Lightweight tasks without run-to-completion semantics, however, can be stopped at any yield point from the outside. It’s not guaranteed that all their code runs.

The distinction between those two models has some important consequences:

Support for IO completion operations

This is what is referenced in my ticket. With run-to-completion semantics, we can naturally wrap asynchronous operations that use IO completion semantics and are not synchronously cancellable. E.g. the mentioned IOCP operations require callers to keep a buffer alive until completion is signaled, and are not synchronously cancellable. This is something that isn’t naturally supported by Rust, which is somewhat sad, since we don’t have zero-cost abstractions here. Lots of asynchronous APIs (e.g. IOCP, boost asio) follow this model.

There are workarounds available, e.g. only using “owned” buffers and passing ownership to some kind of IO manager, which drives the operation to completion even if the async task (Future) has already been dropped.
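A rough sketch of that workaround, with hypothetical types:

use std::io;

struct IoManagerFile;

impl IoManagerFile {
    // Ownership of the buffer moves into the I/O manager. Even if the
    // calling future is dropped mid-operation, the manager (not the
    // dropped future) owns the buffer until the kernel signals completion,
    // and then hands it back or frees it.
    async fn write_owned(&self, buf: Vec<u8>) -> (io::Result<usize>, Vec<u8>) {
        // ... submit `buf` to the completion-based API here ...
        (Ok(buf.len()), buf)
    }
}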

I think this issue has probably not seen as much review as it should have, since most pioneers of async programming in Rust looked only at Unix environments, with readiness-based semantics (which work fairly well in Rust).

Cancellation

This might belong in the larger “Cancellation” topic. In general I think there are also two cancellation models:

  • Cooperative cancellation (which involves signalling and wait-for-termination steps, as @njs also pointed out)
  • Hard cancellation (which means the parent task can cancel a child task without the child task doing anything)

Most environments only support cooperative cancellation. It works fine in ordinary threaded environments as well as in asynchronous ones. The hard cancellation model, however, only works in certain environments, e.g. in Rust’s futures model. It is important to be aware that Futures allow both models. Dropping a Future performs a hard cancellation of the subtask, which can have fatal consequences: e.g. if a task is cancelled while half of a long message has been sent to a socket, the socket is essentially left in a garbage state afterwards.
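To make that hazard concrete, here is a sketch (using tokio's write_all; the length-prefix framing is invented for illustration):

use tokio::io::{AsyncWrite, AsyncWriteExt};

async fn send_message<W: AsyncWrite + Unpin>(
    socket: &mut W,
    body: &[u8],
) -> std::io::Result<()> {
    socket.write_all(&(body.len() as u32).to_be_bytes()).await?;
    // If the future is dropped at this yield point (hard cancellation),
    // the peer has received a length prefix but no body; the stream can
    // no longer be used for framed messages.
    socket.write_all(body).await?;
    Ok(())
}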

We can, however, also implement cooperative cancellation in Rust. E.g. I implemented something along the lines of CancellationTokens (or Go’s Context) via an async ManualResetEvent, which allows subtasks to wait for cancellation signals from a parent task and exit at well-defined points. The code in the subtask looks something like:

use futures::{join, pin_mut, select, FutureExt};
use futures_intrusive::sync::LocalManualResetEvent;
use std::time::Duration;

async fn parent() {
    let event = LocalManualResetEvent::new(false);
    let task_a = async {
        // select! (futures 0.3) wants fused, pinned futures.
        let wait_for_cancellation_future = event.wait().fuse();
        let other_future = do_something_async().fuse();
        pin_mut!(wait_for_cancellation_future, other_future);

        select! {
            _ = wait_for_cancellation_future => {
                // We got cancelled. Return gracefully.
                return;
            },
            _ = other_future => {
                // The normal operation succeeded.
            }
        }
    };
    let task_b = async {
        // Request cancellation of the other task after 1s.
        // (`timer` is some timer service, e.g. futures-intrusive's.)
        timer.delay(Duration::from_millis(1000)).await;
        event.set();
    };
    join!(task_a, task_b);
}

So technically Rust implements a superset of cancellation compared to other approaches. However, there might be a certain bias toward using hard cancellation, since it’s so much easier to utilize.

This is also where I see the pros and cons of this: The hard cancellation in Rust comes mostly for free, and is super easy to utilize. For supporting graceful cancellation people have to add more extra code.

The downside, however, is that it might be really hard to reason about code where the execution of a function stops somewhere in the middle. I don’t think most people would expect that, or have thought about the corner cases enough.

The downside of cooperative cancellation is that it has to be explicitly added at each layer, especially if there is no unified framework to help with it. This is tedious and error-prone. We can e.g. see how long it took for Go to add support for Context everywhere, and where it is still lacking. Adding cancellation support in that way is also really hard. E.g. cancelling an IOCP operation might require performing a callback when cancellation is first issued, which in turn requires some storage space for the callback, storage that won’t be available in a fixed-size context that is also suitable for embedded programs.

Efficiency

I think the run-to-completion model unlocks various optimizations that are not possible in the always-cancellable model. E.g. I already elaborated on the IO completion operations above. Another example I encountered was in my implementation of an asynchronous ManualResetEvent in Rust, which stores the tasks to wake up as raw pointers in an intrusive wait queue. In a run-to-completion implementation (such as the one in cppcoro), the list of waiters can be accessed without additional synchronization, since it’s guaranteed that a task won’t disappear as long as it’s blocked on the event. In my Rust implementation that guarantee doesn’t hold, since the Future can be dropped while it’s waiting to be completed. That required me to add an additional Mutex.

Applicability to programming language

I think the “stopping a task from the outside” model works somewhat reasonably for Rust - mostly due to the support for destructors. With most types implementing RAII semantics, the proper cleanup actions still run even under external cancellation. E.g. any (async) mutexes that might have been locked in the task will be automatically unlocked when the task is cancelled, and other resources will be released.

I think this model does not work at all in a language which does not support destructors and relies on e.g. finally/defer blocks for cleanup. As far as I can see, e.g. Go, JavaScript, and Kotlin all implement run-to-completion semantics. I’m actually not familiar enough with Python to tell for sure.

I am not 100% sure about C++20 coroutines. From the examples that I have seen (e.g. the usage of Lewis Baker’s cppcoro library), I think it also favors run-to-completion. However, there are some destroy methods for coroutines, which afaik should be callable in a suspended state.

Personal Summary

My current personal estimate is that both models work and have some pros and cons. I think I might have preferred the run-to-completion approach for Rust too, since it seems to unlock a few other zero-cost abstractions and sometimes seems easier to understand. I also think it would have avoided requiring the weird Pin<Pointer> type, which is super hard to understand and get right if one really relies on pinned data structures.

However we can probably build good software with the current design too. The benefit of the current design is that more average users will get at least some cancellation right, since there is less work involved for it.

The question whether await is explicit or implicit is actually of secondary importance to me. If we had run-to-completion, it would probably be a mostly syntactic distinction. Without it, an explicit await seems preferable, since it at least shows users that a method might never resume beyond that certain call.


A long time ago Rust used to have green threads; they were removed in 2014, about half a year before 1.0. The motivation for the runtime system’s removal is written down in RFC 230. In brief, the green threading system had significant overheads related to stack management (memory overhead if a full stack is allocated for each task; performance costs for stack switching, FFI problems, and fighting LLVM with segmented stacks) and was difficult to embed into non-Rust applications compared to native threading. We’ve come a long way since and need a new solution for async, but now we’ve got more requirements.

One of them is that we (Rust users) want async to work everywhere it can work. We want it to work on tiny microcontrollers without setting aside a quarter of the total memory for a second stack. We definitely want it to work with WebAssembly and interoperate with Promises, we want it to be usable when Rust code is embedded into a non-Rust application, and so on. The big plus of the Future/async ecosystem as it is growing now is that it looks like function calls from the CPU side too. Suspending/resuming a call stack is something that’s almost trivial for an interpreted language like Python, but for a compiled language it requires its share of magic that may cause problems when you need to interoperate with other code inside a single application or in more unusual execution environments.

There are multiple crates out there that implement coroutines, but for now the futures ecosystem seems to be some orders of magnitude more popular.

I think this is basically exception safety? And is strongly reminiscent of the difficult situation in C++.

Oh, really? I thought that was more fundamental – the issue being not whether you poll() it to completion, but that it would be unsafe to move it between two of those poll()s.

Right. I guess one way to think about it is: normally in Rust there are two kinds of error-handling: panic! for contract violations and unrecoverable errors, and Result for ordinary/expected errors.

In most systems, cancellation is modeled as a normal error that async functions might return, and that the caller can then decide how to handle – log, do some special cleanup, propagate to the caller, etc. In Rust, this would mean that cancellation was an Err, and then you could use the type system to track which operations are cancellable, based on their return type – if your error handling code forgets to think about cancellation, then the compiler complains about a non-exhaustive match; all that stuff comes along for free.
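For illustration, a tiny sketch of what that could look like (the RecvError enum is invented):

enum RecvError {
    Closed,
    Cancelled,
}

fn handle(result: Result<Vec<u8>, RecvError>) {
    match result {
        Ok(msg) => println!("got {} bytes", msg.len()),
        Err(RecvError::Closed) => println!("peer closed the connection"),
        // Forgetting this arm would be a compile-time error, so the type
        // system forces callers to think about cancellation.
        Err(RecvError::Cancelled) => println!("cancelled; unwind"),
    }
}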

Futures instead use a different model, where cancellation is instead a kind of quasi-panic!. It’s not exactly a panic – it doesn’t respect panic=abort, it can’t be caught with catch_unwind, it can only happen at yield points, it only unwinds to the point where the Future was dropped – but it’s similar to panics in that it’s a forced unwind that’s invisible to the type system. Or I guess you could think of it as an Err that’s invisible to the type-system, and every time you await! there’s an implicit ?, but the ? only applies to this one type? Either way, it’s unusual! [This is an example of how futures systems always seem to grow into a new language that handles common language issues like error handling and control flow in a quirkily different way from the host language.]

Now, there are definitely some benefits to the quasi-panic way of handling cancellation. The big huge problem that makes cancellation hard is that it cuts across abstraction layers – if I want to cancel this docker API call, then somehow that needs to get translated into cancelling a socket send, which is like 5 libraries down below where I’m working. This means that (a) everyone has to integrate with the cancellation system or it doesn’t work, (b) the cancellation system itself has to take responsibility for figuring out how far the cancellation should propagate; no individual layer can figure that out on its own. Most designs fail at one or both of these points. (And I think unfortunately your ManualResetEvent is doomed because of this.)

Rust’s Futures are like, one of the two designs I know of that actually solves both of these problems! It solves (a) by saying that all code in the Futures ecosystem simply must support cancellation, because it will happen whether it wants it or not. It solves (b) by using the point where a Future is dropped to “delimit” the extent of the cancellation – and there’s no way for code in between to mess this up by catching the cancellation early. This is really cool. If our only options were C#/Go-style cancellation tokens or drop-based cancellation, then you could make a pretty compelling case for drop-based cancellation.

But, those aren’t the only options :-). In my hand-wavy-fantasy-Rust, we would use cancel scopes. My “Timeouts and cancellation for humans” article has the full details, but the basic idea is implicitly passed cancel tokens. In Rust I guess it would be something like:

let timeout = CancelScope::with_timeout(TIMEOUT);
let result = timeout.apply(async || {
    // arbitrary async code
});

This means: if TIMEOUT seconds pass, and the code inside the block is still running, then any cancellable operations start returning CancelRequestedError (waking up if necessary to do so). Like drop-based cancellation, this automatically works everywhere – no need to pass explicit Context objects around, and we integrate it with the core sleeping functions, so that every way of putting a coroutine to sleep automatically has to support being cancelled. And, like drop-based cancellation, it’s delimited – the timeout cancels all the code inside the timeout.apply block, and only that code.

Let me elaborate a bit on that last part. The way this works is, you need two different Err types: one type to represent “you’re inside an operation that’s been cancelled, you need to unwind”, which is the CancelRequestedError that can be returned by any cancellable operation. And a second type, maybe CancelSucceededError, to represent “that operation was indeed cancelled and has been unwound”. The only thing that returns this is CancelScope.apply. And CancelScope.apply's logic is: if the block evaluates to CancelRequestedError, and this was the cancel scope that injected that error in the first place, then convert that into CancelSucceededError; otherwise, return the value unchanged.
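A minimal sketch of that logic in today's Rust, with an explicit future argument just to pin down the control flow (every name here is hypothetical):

use std::future::Future;

struct CancelRequestedError { scope_id: u64 }

enum CancelError {
    // An *outer* scope's cancel is unwinding through us: keep propagating.
    Requested(CancelRequestedError),
    // *This* scope's cancel completed; the body has fully unwound.
    Succeeded,
}

struct CancelScope { id: u64 }

impl CancelScope {
    async fn apply<T>(
        &self,
        body: impl Future<Output = Result<T, CancelRequestedError>>,
    ) -> Result<T, CancelError> {
        match body.await {
            Ok(v) => Ok(v),
            // We injected this error in the first place: translate it.
            Err(e) if e.scope_id == self.id => Err(CancelError::Succeeded),
            // Someone else's cancellation: return it unchanged.
            Err(e) => Err(CancelError::Requested(e)),
        }
    }
}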

This is the one downside I can think of compared to drop-based cancellation: you do have to teach people the difference between these two error types, and that if they get a CancelRequestedError, they should always propagate it outwards. I don’t think it’s a big deal, but it’s something.

OTOH, this approach also has several major benefits:

  • The unfortunate reality is that on common platforms like Unix and Windows, we have to be able to cope with operations that are uncancellable, or where the cancellation takes some time. Examples include delegating blocking work to threads (which is the only way to do async disk I/O on Unix), or IOCP (which is how you do everything on Windows). This approach supports these operations; drop-based cancellation can’t.

  • Cancellation is now integrated with the regular Rust error handling system, instead of an unusual “quasi-panic”.

  • You no longer need to work with explicit Future objects to cancel things. One of the core advantages of my hand-wavy-fantasy-Rust is the elimination of the explicit Future type and all the complexity that comes along with it, and this is a crucial enabler for that.

Of course for complicated problems like this it’s really hard to figure everything out without actually implementing it, so I could be missing something. But AFAICT the cancel scope approach is more powerful AND simpler to understand AND easier to implement. They’re also very cheap – I think just a few bytes per coroutine, a few bytes per CancelScope object, and in the uncancelled case a single highly-predictable branch per yield.

Right. But all those problems came from wanting to support yielding in any function, not just explicitly marked async functions. From the compiler’s perspective, turning explicitly-marked async functions into coroutines is exactly the same as turning explicitly-marked async functions into Futures – you have the same implementation techniques available, and the final code is basically the same. (Stepping a coroutine is the same thing as polling a future, etc.) The difference is in how they integrate with hand-written Futures. Sticking to pure coroutines has similar runtime and memory complexity, and much lower API complexity.

That’s all fine. Compiler-supported coroutines don’t need large stacks, they can interoperate with foreign APIs just as well as Futures, and they’re just as efficient as Futures in terms of both memory and CPU. Because they’re basically the same thing :-).

Right – if you don’t have compiler support, then coroutines are really inefficient (you need to allocate big stacks, like threads) and futures are the only viable path. If you do have compiler support, then they’re equally efficient, and coroutines are simpler and easier to work with.

That’s why async is such a trap for language designers: if you knew from the start that you were going to add syntax/compiler-support for async, then coroutines would be the obvious way to go. But generally by the time you’ve convinced yourself that you need syntax/compiler support, then you already have a large futures ecosystem, and you’ve trained yourself to think in terms of futures. It’s very difficult to step back and realize that as soon as you add async syntax, all your awesome futures APIs, which were important and tricky and crucial, suddenly become unnecessary bloat.


If we’re talking about coroutines with exactly the same limitations as Futures, the next obvious question is: what would we win by saying “There are no Futures, the return value of a coroutine call is some (magic dust) that can only be awaited or …”?

This change has some obvious drawbacks. At least it blurs the mental picture, because… for example, in

let timeout = CancelScope::with_timeout(TIMEOUT);
let result = timeout.apply(async || { /* arbitrary async code */ });

what type does the timeout.apply method receive? Obviously it has to be a coroutine, and the Rust-native way of saying “the argument must be a coroutine and the return value is its output” is

fn apply<C: Coroutine>(coroutine: C) -> C::Output;

…so now we’ve just renamed a Future into a Coroutine. I really doubt that Rust should go out of its way to make this kind of objects magically unnameable, given that even basic properties like “can be sent to other threads” are represented as named traits.

So now our Coroutine trait exists, but maybe we could forbid implementing it manually at all? Technically it could be done, but that would limit the ecosystem significantly, because the building blocks would all have to live in std. join and select? WebAssembly and integration with Promises? Custom leaf futures for a microcontroller, like waiting for a specific interrupt? Waiting for a POSIX signal? Now the standard library has to know about all of them. So IMO the furthest point from the current design would be an unsafe trait Coroutine that you can implement manually but whose invariants you have to uphold yourself.

Now let’s look back at what we have and think again about what we’ve just said. There’s no radical difference between compiler-supported futures and compiler-supported coroutines. What does that mean?

That means that, realistically, we don’t need to break everything. The code that exists today either builds futures with combinators, or passes them directly to an executor, or implements futures manually. The first two cases don’t actually need any changes to work with a different cancellation model. The parts that would need changes are executor implementations, combinator implementations for the combinators that can drop an incomplete future, and the manually-implemented futures (primarily the ones that actually do drop child futures before completion), as these are the only places that influence the run-to-completion versus run-while-polled difference in semantics. The leaf futures also need to change to be taught about the “cancellation pending” flag, but they are manually-implemented too, so they are already on this list.

I’m not sure if you realized it yet, but a similar cancellation mechanism has been implemented for 20 years in one of the most-used programming languages: Java!

As far as I understand, CancelRequestedError is basically equivalent to Java’s InterruptedException, which can be thrown by blocking APIs when external cancellation is requested. Of course what you describe is more powerful, since Java doesn’t have a CancelScope and the scope of cancellation is a whole Thread. However, Java has demonstrated that such cancellation errors are often not used properly. I’ve seen enough try/catch blocks that don’t know what to do with that error and just ignore it.

I think the naming of CancelRequestedError might be a bit better than the Java version, and help people understand better what they are supposed to do. Maybe also the fact that cancellation in async code is far more typical than in synchronous code makes it easier to teach. But it will still be a challenge. Maybe on par with the CancellationToken/Context approach.

What would the return type for those cancellable operations then look like? Something like Result<Result<SuccessType, DomainSpecificError>, CancelRequestedError>?

That seems a bit unergonomic for most operations. But we also can’t add the cancellation error to all other error code enumerations.

Looking forward to the next @njs blog post:
“Dear language designers: Don’t go back to the futures!”


Given how different Rust futures are to asyncio futures, it is unfortunate that they share the same name. The fact that Rust “async functions return futures” gives the reader used to Python a completely wrong idea of Rust async. To disentangle the mess, I’ve found a more useful mapping to be:

  • Rust future trait is equivalent to the Python awaitable protocol.
  • The concrete (opaque) type implementing Future that is returned by a Rust async fn is roughly equivalent to Python’s concrete coroutine object type.

The fact that both Python and Rust provide a distinction between the interface (Rust future/Python awaitable) and the implementation (Rust impl Future created by async fn/Python coroutine object returned by invoking a coroutine function) is very useful because it allows innovations in how coroutines are actually implemented. For example, in Python it allows cython to implement efficient coroutines, or extensions to expose coroutines from languages like Rust at Python level. In Rust it allows you not to care whether a future is implemented by async fn or by manually implementing the trait, where the reason for the latter might also be the desire to expose a more basic system, e.g. through C FFI.
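A small illustration of that split in Rust: the consumer only sees the Future trait, whether the implementation comes from an async fn or from a hand-written state machine:

use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

async fn from_async_fn() -> u32 {
    42
}

struct Manual;

impl Future for Manual {
    type Output = u32;
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        Poll::Ready(42)
    }
}

// Callers are oblivious to which implementation strategy was used.
fn takes_any(_fut: impl Future<Output = u32>) {}

fn demo() {
    takes_any(from_async_fn());
    takes_any(Manual);
}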

My personal feeling is that this adds a ton of complexity without adding much value

I’m curious what complexity you are referring to here - the larger API surface and difficulty in teaching the language, or complexity in the implementation and usage of futures? Because at this level the only difference between Rust futures and Python awaitables is that the former, despite being low-level and light, offer the combinator API. But that doesn’t seem like a big problem in practice. As a future implementor, you get the combinators for free because they are automatically provided by the trait, and you get them regardless of whether your future comes from async fn or from a type that implements the trait manually.

I’m talking about the API surface complexity. Both in terms of teaching (cf the comment above that “Rust has already used most of its strangeness budget”), and in terms of how features like this interact in complicated ways with other parts of the system – complexity costs are super-linear. For example, there’s a whole discussion here about how dropping Futures has tricky semantics and maybe it would be better to handle cancellation another way – but if Futures are exposed, then you’re forced to also decide what happens when they’re dropped, and all Future code has to be prepared to be dropped, etc. In the approach I’m advocating, this tricky design problem simply disappears.

Yes, that’s right. Having seen how this turned out, I also think that Python would have been better off with just the coroutine object type – that experience is a major part of why I’m hoping Rust doesn’t make the same mistakes :-).

I don’t think it’s actually useful to allow for innovations here. Think of Rust functions (ones with no explicit ABI tag). IIRC Rust doesn’t guarantee anything about their ABI; the compiler is free to change how Rust functions are implemented. And in practice, this is fine. If you have some other kind of functions (e.g., C functions), you can still call them via a translation layer; and the compiler even has some built-in features for generating these translation layers in specific cases. None of that requires that Rust publish and fix the Rust function ABI as part of the Rust API.

We could handle coroutines the same way – the main difference is that having alternative implementations is actually less useful for coroutines. For functions, there’s an existing de facto standard ABI you need to support for FFI (i.e., the C function ABI). For coroutines, there is no existing de facto standard like this. And since coroutines are necessarily tied to some kind of event loop / coroutine runner, I doubt there ever will be one. You can still interoperate with existing libraries – it just happens at a slightly higher level, by writing native async functions that wrap the other library and handle the translation between the semantics.

In Trio, we only use Python’s concrete coroutines, and systematically pretend that the whole “awaitable” protocol doesn’t exist. It works fine. We wrap C libraries all the time. Not a problem at all.

In practice, there are three kinds of independent implementations of awaitables in Python:

  • legacy libraries that use Future-like abstractions, like tornado/asyncio/twisted. This is the backwards-compatibility case that I’m arguing doesn’t provide enough value to be worth the costs.
  • library authors that exploit the awaitable protocol to do unnecessarily wacky things, that make their APIs weirder and more confusing. I suspect these will disappear as we collectively get more experience with what kinds of usage are considered tasteful, so it’s not a huge negative, but it’s a negative.
  • Cython. This is a very special case, tied up in very specific details of Python that don’t apply to Rust. Cython doesn’t just implement the awaitable protocol; it does the best it can to implement the concrete coroutine object type. (It even monkeypatches Python’s type introspection APIs to make Cython’s coroutine type look like Python’s coroutine type.) The reason they do this is because Cython’s goal is a second independent implementation of all of Python’s semantics, with seamless interoperability, and the only difference being that it’s statically compiled. The Rust equivalent would be to try to support seamless cross-linking between crates that were compiled with different independent implementations of the Rust compiler. This is not something Rust cares about, and if that ever changes then it will take a lot more than a Future API to make it possible.

Java’s Thread.interrupt is an interesting case study, yeah.

At a high-level, there are only two states that a cancellation API ecosystem can be in: either almost everyone writes cancel-safe code, or almost no-one writes cancel-safe code. This is because cancel-safety is an AND property – a complex routine can only be cancel-safe if all the subroutines it uses are also cancel-safe. So if any measurable fraction of the libraries in your ecosystem aren’t cancel-safe, then in complex programs you have to assume that cancellation just isn’t safe to use at all. And that means that there’s no motivation to worry about making your code cancel-safe either. So none of your users can use the cancel system either… and the spiral continues.

To avoid this, it’s not enough for any specific person to try to write cancel-safe code; you need to design things so that for 99% of devs, the perceived benefits of writing cancel-safe code outweigh the perceived costs.

And to make things worse, writing cancel-safe code is just intrinsically, unavoidably harder than writing non-cancel-safe code. It’s something that you have to be thinking about in the back of your mind, all the time. You can’t avoid this with clever API design; it’s just inherent in the idea. For example, think of some code that takes a lock, manipulates the locked data, and then releases the lock. If it gets cancelled in the middle, while the data is in an inconsistent state, what happens? If the programmer doesn’t make sure to unwind the inconsistent state correctly, then everything will be screwed up.
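A sketch of that lock example (using tokio's async Mutex; the transfer logic is invented):

use std::sync::Arc;
use tokio::sync::Mutex;

async fn transfer(accounts: Arc<Mutex<(i64, i64)>>, amount: i64) {
    let mut guard = accounts.lock().await;
    guard.0 -= amount; // first half of the update
    // Yield point: if the future is dropped here, the lock is released
    // (tokio's async Mutex does not poison), but the data stays
    // half-updated and inconsistent.
    tokio::task::yield_now().await;
    guard.1 += amount; // second half never runs if we were dropped
}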

(Side note: This is also why Rust mutexes are poisoned on panic. I don’t think Rust’s mutexes are currently poisoned if a Future is dropped while holding a lock. That seems like a dangerous choice to me, given that Future cancellation currently has the same control flow as panic.)

So if you’re trying to design a cancellation system, this is a pretty dire problem. Obviously part of the solution is to make it as painless as possible to write cancel-safe code – but we can’t expect to eliminate the pain entirely. There are always going to be perceived costs. And we can try to force people to support cancellation, basically increasing the pain of not handling it right, but this doesn’t go very far. Our only hope is to somehow provide one heck of a carrot, so that everyone can feel the value they’re getting from the cancellation system – not just us concurrency nerds.

This is where Thread.interrupt falls down. Cancelling a thread is a fairly rare requirement. It’s a real requirement; when you need it, you really need it! But there are also a lot of people who have never needed it, or who just aren’t experienced enough yet to realize why it’s important. And since they don’t feel the problem themselves, they aren’t motivated to understand the solution or spend energy handling it correctly. Yet, Java forces them to do something, because InterruptedException is a checked exception. So, you get the half-assed code you mentioned that just catches and discards it, and then this starts the spiral I talked about above, where if interruption doesn’t work reliably, then why should anyone else bother supporting it either…

I think there is one carrot that might work: timeouts. Everyone who does any kind of networking has to handle timeouts. They’re ubiquitous, and in most APIs trying to manage them correctly is a constant source of frustration. So if you can tell people hey, our cancellation system makes timeout handling easy and frictionless, then that’s really compelling. But, Java missed this opportunity: Thread.interrupt is totally useless for timeouts! Even APIs like HttpClient that do support Thread.interrupt have a totally different method for managing timeouts.

Cancel tokens have the opposite problem: they provide a clean solution to the timeout problem, so they have a lot of perceived value, but there’s too much perceived drudgery in passing them around everywhere and integrating them into different blocking constructs, so programmers rebel. And then you end up with situations like C# or Go, where they have this lovely cancel token system but even their own socket libraries don’t support it, and here comes our spiral. Or in table form:

Design approach      Perceived drudgery   Perceived value
Thread.interrupt     Low                  Low
Cancel tokens        High                 High
Cancel scopes        Low                  High

Let’s nail down some terminology. In the coroutine approach, there are two separate types: there are async functions, what you get by writing async fn .... They’re a thing you can call. And there are coroutine objects, which represent a call-in-progress.

So at the implementation level, yeah, the two approaches are isomorphic: async functions are like Future-returning-functions, and coroutine objects are like Futures.

But at the user level, the coroutine approach is simpler. With Futures, there are three steps: call the Future-returning-function -> get a Future -> await the Future to get a regular value. With coroutines, the last two steps are collapsed together inside the compiler, so it’s just: call an async function -> get back the value it returned.

It has to be an async function, not a coroutine object. I’m not sure if this is what you meant or not :-). But it’s very straightforward, just like any other higher-order function in Rust, just with some asyncs added:

// Return value simplified, see below for more discussion
async fn apply<T>(fun: AsyncFnOnce() -> T) -> T

OTOH the natural Future-based version would be:

fn apply<T>(fut: Future<T>) -> Future<T>

This is really different – remember that a Future is equivalent to a coroutine object, not an async function.

Here’s a straw man proposal for how this could all fit together:

We define async fn as a new concrete type that’s built into the compiler, just like fn. We also define a concrete type CoroutineCall to represent an in-progress coroutine.

There are two things you can do with an async fn:

  1. You can call it, just like a regular function, except there’s a restriction: this can only be done from inside another async function.

  2. There’s an operation – with special compiler support – that lets you extract a CoroutineCall object from an async fn. Maybe something like: unsafe fn coroutineify<T>(fun: AsyncFnOnce() -> T) -> CoroutineCall<T>.

As you can see, there are trait versions of async fn, just like for fn. And, just like for fn, these are defined in terms of the concrete type async fn. E.g.:

#[lang = "async_fn_once"]
#[must_use]
pub trait AsyncFnOnce<Args> {
    type Output;
    extern "rust-call" async fn call_once(self, args: Args) -> Self::Output;
}

(Compare to the definition of FnOnce.)

I’ll be vague about the exact API for CoroutineCalls, but they would provide some sort of mechanism to step them, get back values when they suspend themselves, etc. I guess they could have a corresponding trait, beyond just the concrete object – it wouldn’t cause any particular problem – but I don’t see how it would be useful right now either, so I’ll leave it out.

Not so! This system is flexible enough to handle all that stuff. If you don’t believe me, check out Trio, which actually does it :-). The primitives you need are:

  • A way to suspend the current coroutine stack
    • while specifying how to wake it up if it’s cancelled
  • A way to request the coroutine stack be scheduled again

All I/O operations and “leaf futures” can be implemented in terms of these. Throw in nurseries and cancel scopes (which can be implemented inside the coroutine runner library, no need to be in std), and now you can do anything.

There are different ways to split up this functionality between the language runtime vs libraries. In Python, we just have “suspend with value” and “resume with value”, and then Trio implements these primitives on top of those. In Rust, the static typing, lifetimes, and desire for minimal overhead mean that you’d probably want to iterate a bit to find exactly the right way to encode these core primitives. But as a straw man, maybe a wait-for-control-C primitive might look roughly like this:

async fn wait_for_ctrlc() -> Result<(), CancelRequested> {
    suspend!(
        // first argument: setup function
        // arranges to call request_wakeup() at some future point
        |request_wakeup| {
            ctrlc::set_handler(move || request_wakeup());
        },
        // second argument: abort function
        // arranges that request_wakeup() *won’t* be called
        || {
            ctrlc::unset_handler();
            AbortSucceeded
        },
    )
}

Notes:

  • All the language runtime has to support here is suspending the coroutine stack and passing these two callbacks out to the coroutine executor. Everything else can be done by the coroutine executor. (Subtlety: only the coroutine runner knows how to add a coroutine back to its scheduling queue, so we let it choose how request_wakeup is implemented.)

  • The protocol for handling aborts has to be somewhat complicated, to handle all the cases of (1) abort is easy and can be done immediately, like here, (2) abort is impossible, (3) abort is possible, but happens asynchronously, like with IOCP. This is the simple case, as indicated by the AbortSucceeded return value.

  • Thinking about lifetimes… if we want to avoid heap allocations, then I think we would need to say that if the abort function returns AbortSucceeded, then that invalidates the request_wakeup callback, which would mean we needed some unsafe in here. Maybe this can be avoided by being clever somehow. If not, then I don’t think it’s a big deal? Very few people need to implement their own I/O primitives from scratch using the lowest-level API. And the common cases for this are like, implementing your own async-friendly Mutex, which is an extremely tricky thing to do no matter how you slice it.

This part I do agree with :-). If you want to pass between the old Future style and the new async fn style, you could do that by writing async fn await<T>(fut: Future<T>) -> Result<T, CancelRequested>. It would look something like our wait_for_ctrlc function: use combinators to create a new Future that selects between the user’s future and a second future that gets resolved by the abort function, and then calls request_wakeup when that resolves, and then tokio::executor::spawn this new Future and go to sleep.

But, Futures do become a second-class citizen: they need to be explicitly awaited, they use a different cancellation system, etc. I think people would migrate away from them over time.

What a great question! I don’t know :-).

I guess for apply, being the only thing that can return CancelSucceeded, the double-nested-Result might indeed make sense. But I think you’re right, for the majority of operations, all of which can return CancelRequested, using double-nested-Result would probably suck.

In general, the functions that can return CancelRequested are functions that do I/O. And in general, functions that do I/O already have a ton of different error types. So I was imagining that we’d embed CancelRequested into the various I/O error enums. Maybe impl TryFrom<YourFavoriteErrorType> for CancelRequested? This is a place where my lack of Rust expertise means I really can’t tell for sure what would make sense, so maybe this is where the whole idea founders! I would be interested to hear folks’ thoughts.


Ok, I didn’t know that - I automatically assumed that Trio avoided “fat” asyncio-style awaitables (awaitable objects providing additional functionality and API), but was fine with however the awaitable itself is implemented.

The awaitable protocol seems like a reasonable answer to the question “how do I implement an async def in C” or “how do I await in Python/C”? That kind of thing sometimes pops up on StackOverflow, and while the solution is far from pretty, it’s at least possible. I understand why you’d want to avoid asyncio futures with their unstructured and callback-centered API, but awaitables seem little more than a thin duck-typing generalization of Python’s existing implementation of coroutines. What is the benefit of ignoring that?

Purely implementational innovations, like Cython, seem like a plus. A more far-fetched example would be a native wrapper over a Rust or C++20 coroutine. Sure, you could still achieve a similar effect by creating an async def and using it to drive the foreign coroutines over a wrapper, but this has two drawbacks:

  • It is slower. As anyone who does Python for a living knows all too well, Python abstractions are high-cost, and every additional one (on a hot path) affects performance of the system.

  • It breaks with the broader Python conception of decoupling the virtual machine and object system from the interpreter. Most of Python doesn’t care if a callable is created with def, as long as it implements __call__, or whether an iterable is created using a generator or an itertool, as long as it implements __iter__, and so on. There are exceptions to this, and coroutines could be another, but the benefits need to clearly outweigh the downsides.

(I think this is an instance of some of the general difficulties involved with Rust’s error handling model – using nominal sum types to represent possible errors works (and to be clear, the benefit of compiler-checked exhaustiveness means I prefer it to most of the mainstream alternatives!), but achieving compositionality can be laborious. Having either row polymorphic variants or effect types, so that the set of possible errors can be expanded or contracted locally, as needed, rather than having to define new nominal enums and embeddings between them by hand (or working with nested Results, or whatever), would probably be helpful here.)

Of course, we have to think in terms of actually-existing Rust. Given that ErrorKind already contains things like WouldBlock, TimedOut, and Interrupted, your assumption that we could just add Cancelled to the list (or maybe repurpose one of the others?) doesn’t seem unreasonable to me on its face. But I’m not really familiar with the details w.r.t. what consequences that would have.
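For instance, error handling at a call site could then look like this sketch (hypothetically reusing the existing ErrorKind::Interrupted; a dedicated Cancelled variant would read the same way):

use std::io;

fn classify(res: io::Result<usize>) {
    match res {
        Ok(n) => println!("wrote {} bytes", n),
        // Cancellation arrives through the ordinary io::Error channel,
        // so callers must handle it like any other I/O failure.
        Err(e) if e.kind() == io::ErrorKind::Interrupted => {
            println!("cancel requested; propagate outwards");
        }
        Err(e) => println!("other I/O error: {}", e),
    }
}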

How impractical or inadvisable would it be to fold the cancellation effect into async itself? That is, any async fn has the potential to be cancelled, and when calling one, it automatically checks whether it has been (as if the ?-for-cancellation were implicit), so cancellation automatically propagates (if I understand things correctly) back up to the point where it was requested.

Is the issue that for the less-simple “(2) abort is impossible, (3) abort is possible, but happens asynchronously, like with IOCP” cases, automatic propagation is not what you want, and you need to do something else when you get a CancelRequested error? (And is that also the problem with the Drop-based approach, which the paragraph above otherwise resembles?) If that’s the case, could it be solved by making intercept-cancellation (rather than propagate) the special case, maybe with some sort of async fn catch_cancel<T>(impl AsyncFnOnce() -> T) -> Result<T, CancelRequested> primitive?