Structured concurrency in Rust

Yeah, there’s a consensus model of “futures” that you see in Python Twisted or asyncio, or C# or Java or JavaScript. But Rust’s futures are totally different – they’re much more like ZIO’s monadic concurrency (see Zio (Scala library)).

My personal feeling is that this adds a ton of complexity without adding much value – especially once you start adding special syntax to make futures more usable, so now you have redundant procedural-ish APIs and monadic/functional-ish APIs for the same underlying features. Trio needs support from the language for suspending/resuming a call-stack, and then adds nurseries and cancel scopes as library features, and… that’s the whole machinery. The end result is roughly similar, but the complexity budget seems much smaller. This isn’t really related to structured concurrency per se though, it’s just a general comment :-).

Anyway, as far as structured concurrency goes, obviously tokio::spawn is inconsistent with it :-). But there’s also a more subtle thing I worry about:

Using drop for cancellation seems very worrisome to me, because it means that cancellation is unilateral and the code being cancelled cannot prevent or delay it, or report the result. This is very different from most mature systems I know, where cancellation is a three-phase process: (1) request, (2) wait, (3) report success/failure. For example, this is how Trio and ZIO work. @alanb is emphasizing the need to be able to delay cancellation in the Java Loom design.

Or here’s Russ Cox explaining part of the rationale for how Go channels work:

There almost always need to be two steps in a cancellation: a request for the cancellation and an acknowledgement that work has in fact stopped.
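To make the three phases concrete, here's a minimal sketch that uses plain `std` threads rather than any async runtime. The `Job` type, its `request_cancel` and `wait` methods, and the `Outcome` enum are all names made up for this illustration; this isn't how Trio, ZIO, or Tokio implement anything. The point is just that the request returns immediately, and the caller only learns the real outcome once the worker acknowledges that it has stopped.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// What the worker reports back once it has actually stopped.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// The work finished before the cancel request was noticed.
    Completed(u32),
    /// The worker saw the request and stopped after this many steps.
    Cancelled { steps_done: u32 },
}

/// Hypothetical handle for a cancellable background job.
struct Job {
    cancel_requested: Arc<AtomicBool>,
    done: mpsc::Receiver<Outcome>,
}

impl Job {
    /// Start a job that performs `total_steps` units of pretend work.
    fn start(total_steps: u32, step_time: Duration) -> Job {
        let cancel_requested = Arc::new(AtomicBool::new(false));
        let flag = Arc::clone(&cancel_requested);
        let (tx, done) = mpsc::channel();
        thread::spawn(move || {
            for step in 0..total_steps {
                // The worker, not the caller, decides where it is safe to stop.
                if flag.load(Ordering::SeqCst) {
                    let _ = tx.send(Outcome::Cancelled { steps_done: step });
                    return;
                }
                thread::sleep(step_time);
            }
            let _ = tx.send(Outcome::Completed(total_steps));
        });
        Job { cancel_requested, done }
    }

    /// Phase 1: request cancellation. Returns immediately.
    fn request_cancel(&self) {
        self.cancel_requested.store(true, Ordering::SeqCst);
    }

    /// Phases 2 and 3: wait until the worker has stopped, then report
    /// what actually happened.
    fn wait(self) -> Outcome {
        self.done.recv().expect("worker exited without reporting")
    }
}
```

Calling `request_cancel()` and then `wait()` on a long job yields `Outcome::Cancelled { .. }`, while a job that finishes before it notices the request reports `Outcome::Completed`, which is the "too late, and ignored" case.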

Well, that’s the argument from authority, but what concrete problems does it cause? Here are some fundamental cases where I’ve run into this:

  • Delegating work from async code to a thread: a very standard situation is that some async code needs to call into some legacy synchronous library, so it delegates that to a thread. Now your async code gets cancelled. What happens to the thread? You can’t cancel it synchronously, so in tokio I think the only option is to leak the thread…

  • Using Windows IOCP: this is sort of similar to the previous item – say you want to write to a file using IOCP. You call WriteFile, and this schedules the work to happen in the background (generally in some state machine running in the kernel rather than a thread per se, but it’s pretty similar from your point of view). IOCP has a powerful cancellation API, but it’s not synchronous: CancelIoEx submits a request to cancel a particular operation, but the cancellation doesn’t happen immediately, and you don’t find out until later whether the operation was actually cancelled or not. Again in tokio I think the best you can do is to submit the request and then let the operation continue in the background…

What this means in practice is: say you want to write to a file on disk without blocking the event loop. You have to use either a thread (most popular on all platforms) or WriteFile (on Windows, if you’re OK with some trade-offs that are irrelevant here). Then suppose your disk write operation is cancelled. In Trio, the write operation will either return successfully (the cancel request was too late, and ignored), or else it will return a Cancelled error (in which case the file wasn’t written). Either way, you know the state of the filesystem. In Tokio IIUC, the write operation will be abandoned to keep running in the background. It may or may not actually succeed, so the filesystem is now in an indeterminate state. And even if you read from the filesystem to find out what the state was, that doesn’t help, because the write might still be pending in the background and complete at some unknown future moment – which is a classic example of the kinds of problems structured concurrency is supposed to prevent :-)
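The indeterminate-state problem can be reproduced without any async runtime at all. The sketch below is a stand-in, not Tokio code: it uses `std::thread`, where dropping a `JoinHandle` detaches the thread instead of stopping it. `FakeDisk`, `spawn_blocking_write`, `abandoned_write`, and `acknowledged_write` are hypothetical names, and the `gate` channel only simulates a slow synchronous call so that the timing is deterministic.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Stand-in for a file on disk: `None` until the write lands.
type FakeDisk = Arc<Mutex<Option<String>>>;

/// Delegate a blocking "write" to a thread. The thread blocks on `gate`
/// first, which stands in for time spent inside a slow synchronous call.
fn spawn_blocking_write(
    disk: FakeDisk,
    data: &str,
    gate: mpsc::Receiver<()>,
) -> thread::JoinHandle<()> {
    let data = data.to_string();
    thread::spawn(move || {
        let _ = gate.recv();
        *disk.lock().unwrap() = Some(data);
    })
}

/// Cancel by dropping the handle, then look at the disk twice: right after
/// the "cancellation", and again once the abandoned write has landed.
fn abandoned_write() -> (Option<String>, Option<String>) {
    let disk: FakeDisk = Arc::new(Mutex::new(None));
    let (open_gate, gate) = mpsc::channel();
    let handle = spawn_blocking_write(Arc::clone(&disk), "hello", gate);

    // Unilateral cancellation: the thread is detached and keeps running.
    drop(handle);
    let seen_after_cancel = disk.lock().unwrap().clone();

    // The slow call finishes at some later moment of its own choosing.
    open_gate.send(()).unwrap();
    while disk.lock().unwrap().is_none() {
        thread::yield_now();
    }
    let seen_later = disk.lock().unwrap().clone();
    (seen_after_cancel, seen_later)
}

/// The alternative: wait for the thread to stop before reporting, so the
/// disk state is settled by the time this returns.
fn acknowledged_write() -> Option<String> {
    let disk: FakeDisk = Arc::new(Mutex::new(None));
    let (open_gate, gate) = mpsc::channel();
    let handle = spawn_blocking_write(Arc::clone(&disk), "hello", gate);
    open_gate.send(()).unwrap();
    handle.join().expect("writer thread panicked");
    let settled = disk.lock().unwrap().clone();
    settled
}
```

In `abandoned_write`, the caller sees an empty disk right after "cancelling" and the data shows up anyway later on, so reading the state back tells it nothing reliable. In `acknowledged_write`, whatever the caller reads after `join` is final.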

I’m really not an expert on Rust and I have huge respect for the folks working on this stuff, so I could be missing something. But this worries me.

If I were going to give more arrogant, un-asked-for advice, I’d suggest:

  • Implement async functions as a pure coroutine-style feature, where async functions have a unique type that’s distinct from regular functions

  • Leave out await syntax, i.e., use regular function call syntax for calling async functions. None of the usual arguments for decorating every call site with await apply to Rust. The compiler should enforce that async functions can only be called by other async functions.

  • Write a library that implements Trio-style nurseries + cancel scopes on top of your new coroutines, with something vaguely like Trio’s cancellation protocol: https://trio.readthedocs.io/en/latest/reference-hazmat.html#low-level-blocking. Cancellation is reported as an error using Rust’s usual Result type. Nurseries automatically cancel their children if one panics or returns an Err, and then propagate the cancel/error.

  • Integrate the existing futures code by defining an async method on futures (like fut.wait() or similar), that translates between the two regimes (e.g. when it receives a cancellation request it drops the future).

(Kotlin did several of the things above – in particular leaving out await, and using an explicit method to translate between future-land and coroutine-land.)
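For a rough feel of the nursery + cancel scope suggestion, here's a sketch built on `std::thread::scope` (Rust 1.63+). It uses OS threads rather than coroutines, so it only illustrates the shape of the protocol and not the proposed language feature. `CancelScope`, `TaskError`, `Child`, `run_nursery`, and the example children are all hypothetical names. Panic handling is left out to keep it short: a panicking child here would not cancel its siblings.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum TaskError {
    /// Cancellation is reported as an ordinary error value.
    Cancelled,
    Failed(String),
}

/// Hypothetical cancel scope: a flag that children poll at checkpoints.
struct CancelScope {
    cancelled: AtomicBool,
}

impl CancelScope {
    fn new() -> Self {
        CancelScope { cancelled: AtomicBool::new(false) }
    }

    fn cancel(&self) {
        self.cancelled.store(true, Ordering::SeqCst);
    }

    /// Returns `Err(Cancelled)` once cancellation has been requested.
    fn checkpoint(&self) -> Result<(), TaskError> {
        if self.cancelled.load(Ordering::SeqCst) {
            Err(TaskError::Cancelled)
        } else {
            Ok(())
        }
    }
}

type Child = fn(&CancelScope) -> Result<(), TaskError>;

/// Run every child, cancel the rest as soon as one returns `Err`, and
/// return only after all of them have stopped.
fn run_nursery(children: &[Child]) -> Result<(), TaskError> {
    let cancel_scope = CancelScope::new();
    let scope_ref = &cancel_scope;
    let results: Vec<Result<(), TaskError>> = thread::scope(|s| {
        let handles: Vec<_> = children
            .iter()
            .map(|&child| {
                s.spawn(move || {
                    let result = child(scope_ref);
                    if result.is_err() {
                        scope_ref.cancel();
                    }
                    result
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().expect("child panicked"))
            .collect()
    });
    // Prefer a real failure over the Cancelled errors it caused in siblings.
    let mut outcome = Ok(());
    for result in results {
        match result {
            Err(TaskError::Failed(msg)) => return Err(TaskError::Failed(msg)),
            Err(TaskError::Cancelled) => outcome = Err(TaskError::Cancelled),
            Ok(()) => {}
        }
    }
    outcome
}

fn failing_child(_scope: &CancelScope) -> Result<(), TaskError> {
    Err(TaskError::Failed("disk full".to_string()))
}

/// Would loop forever if nobody cancelled it.
fn patient_child(scope: &CancelScope) -> Result<(), TaskError> {
    loop {
        scope.checkpoint()?;
        thread::sleep(Duration::from_millis(1));
    }
}

fn quick_child(_scope: &CancelScope) -> Result<(), TaskError> {
    Ok(())
}
```

Running `patient_child` next to `failing_child` returns the `Failed` error, and only after `patient_child` has noticed the cancellation and stopped. Nothing is left running in the background when `run_nursery` returns.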
