Structured concurrency in Rust

Hi everyone,

I’m the person who wrote https://github.com/rust-lang-nursery/futures-rs/issues/1278 as quoted by @glaebhoerl above.

I actually had some more thoughts on Rust’s Future model vs alternative approaches, and eventually wanted to write an article about them. Here are my main ideas:

The most important choice for me is whether the lightweight tasks that are offered by a framework have run-to-completion semantics or not. With run-to-completion semantics, any given async function will run normally until a normal return point. This is equivalent to normal functions in most common programming languages. Lightweight tasks without run-to-completion semantics, however, can be stopped at any yield point from the outside. It’s not guaranteed that all of their code runs.

The distinction between those two models has some important consequences:

Support for IO completion operations

This is what is referenced in my ticket. With run-to-completion semantics, we can naturally wrap asynchronous operations that use IO completion semantics and are not synchronously cancellable. E.g. the mentioned IOCP operations require callers to keep a buffer alive until completion is signaled, and are not synchronously cancellable. This is something that isn’t naturally supported by Rust, which is somewhat sad, since we don’t have zero-cost abstractions here. Lots of asynchronous APIs (e.g. IOCP, boost asio) follow this model.

There are workarounds available, e.g. only use “owned” buffers, and pass the ownership to some kind of IO manager which drives the operation to completion even if the async task (Future) had already been dropped.
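The owned-buffer workaround can be sketched roughly like this. It is only an illustration of the ownership transfer, with no real IO behind it: `IoManager`, `OpHandle`, `submit` and `complete` are hypothetical names, not the API of any existing library.

```rust
use std::collections::HashMap;

/// Hypothetical IO manager: owns the buffers of all in-flight operations.
#[derive(Default)]
struct IoManager {
    next_id: u64,
    in_flight: HashMap<u64, Vec<u8>>,
}

/// What the task (Future) would hold. It does not own the buffer.
struct OpHandle {
    id: u64,
}

impl IoManager {
    /// Starts an operation; ownership of `buffer` moves to the manager.
    fn submit(&mut self, buffer: Vec<u8>) -> OpHandle {
        let id = self.next_id;
        self.next_id += 1;
        self.in_flight.insert(id, buffer);
        OpHandle { id }
    }

    /// Called once completion is signaled; only now is the buffer released.
    fn complete(&mut self, id: u64) -> Option<Vec<u8>> {
        self.in_flight.remove(&id)
    }
}

/// Returns (buffers still alive after the handle was dropped,
/// length of the buffer handed back on completion).
fn owned_buffer_demo() -> (usize, usize) {
    let mut manager = IoManager::default();
    let handle = manager.submit(vec![0u8; 1024]);
    let id = handle.id;

    drop(handle); // the task gets cancelled while the operation is in flight

    let still_alive = manager.in_flight.len();
    let returned = manager.complete(id).map_or(0, |buf| buf.len());
    (still_alive, returned)
}
```

The price is visible in the signature of `submit`: callers can no longer lend a borrowed buffer, which is where the zero-cost property gets lost.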

I think this issue has probably not seen as much review as it should have, since most pioneers of async programming in Rust looked only at Unix environments, with readiness-based semantics (which work fairly well in Rust).

Cancellation

This might be a larger subject that belongs in the “Cancellation” topic. In general I think there are two cancellation models:

  • Cooperative cancellation (which involves signalling and wait-for-termination steps, as @njs also pointed out)
  • Hard cancellation (which means the parent task can cancel a child task without the child task doing anything)

Most environments will only support cooperative cancellation. This works fine in normal threaded environments, as well as in asynchronous environments. The hard cancellation model however only works in certain environments, e.g. in Rust’s futures model. It is important to be aware that Futures allow both models. Dropping a Future will perform a hard cancellation on the subtask. This might have fatal consequences. E.g. if a task is cancelled while half of a long message has been sent to a socket, the socket would essentially be in a garbage state afterwards.
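The half-sent message case can be reproduced with the standard library alone. This is a minimal sketch, not real networking code: `wire` stands in for the socket, and `NoopWaker`, `YieldOnce` and `send_message` are hypothetical helpers made up for the illustration.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for polling a future by hand.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Future that is pending on the first poll and ready on the second,
/// i.e. exactly one yield point.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Sends one message in two halves, with a yield point in between.
async fn send_message(wire: Rc<RefCell<Vec<u8>>>) {
    wire.borrow_mut().extend_from_slice(b"HELLO ");
    YieldOnce(false).await; // the task may be dropped while suspended here
    wire.borrow_mut().extend_from_slice(b"WORLD");
}

/// Polls the task once, then drops it. Returns what reached the "socket".
fn hard_cancel_demo() -> Vec<u8> {
    let wire = Rc::new(RefCell::new(Vec::new()));
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);

    let mut task = Box::pin(send_message(wire.clone()));
    assert!(task.as_mut().poll(&mut cx).is_pending());
    drop(task); // hard cancellation: the second half is never written

    let bytes = wire.borrow().clone();
    bytes
}
```

After `hard_cancel_demo()` the wire contains only `HELLO `, and nothing inside `send_message` had a chance to react.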

We can however also implement cooperative cancellation in Rust. E.g. I implemented something along the lines of CancellationTokens (or Go’s Context) via an async ManualResetEvent, which allows subtasks to wait for cancellation signals from a parent task and exit at well-defined points. The code looks something like:

async fn parent() {
    let event = LocalManualResetEvent::new(false);
    let task_a = async {
        let wait_for_cancellation_future = event.wait();
        let other_future = do_something_async();

        select! {
            _ = wait_for_cancellation_future => {
                // We got cancelled. Return gracefully
                return;
            },
            _ = other_future => {
                // The normal operation succeeded
            }
        }
    };
    let task_b = async {
        // Cancel the other task after 1s
        timer.delay(Duration::from_millis(1000)).await;
        event.set();
    };
    join!(task_a, task_b);
}
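For comparison, here is a std-only sketch of the same idea that can be polled by hand. Note that it swaps the event-based wait for a plain flag that the subtask checks between messages, so it is a simplification and not the ManualResetEvent approach itself. `NoopWaker`, `YieldOnce` and `send_until_cancelled` are hypothetical names, and `wire` again stands in for a socket.

```rust
use std::cell::{Cell, RefCell};
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for polling a future by hand.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Future that is pending on the first poll and ready on the second.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Sends messages until cancelled. The flag is only checked between
/// complete messages, so a message is never cut in half.
async fn send_until_cancelled(wire: Rc<RefCell<Vec<u8>>>, cancelled: Rc<Cell<bool>>) {
    loop {
        if cancelled.get() {
            return; // well-defined exit point
        }
        wire.borrow_mut().extend_from_slice(b"HELLO ");
        YieldOnce(false).await;
        wire.borrow_mut().extend_from_slice(b"WORLD\n");
    }
}

/// Signals cancellation while the task is mid-message, then keeps
/// polling until the task has exited on its own.
fn cooperative_cancel_demo() -> Vec<u8> {
    let wire = Rc::new(RefCell::new(Vec::new()));
    let cancelled = Rc::new(Cell::new(false));
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);

    let mut task = Box::pin(send_until_cancelled(wire.clone(), cancelled.clone()));
    assert!(task.as_mut().poll(&mut cx).is_pending()); // suspended mid-message
    cancelled.set(true); // signal ...
    while task.as_mut().poll(&mut cx).is_pending() {} // ... and wait for termination

    let bytes = wire.borrow().clone();
    bytes
}
```

The parent has to do both steps here, signalling and then driving the task until it finishes. If it dropped the task right after setting the flag, we would be back at hard cancellation.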

So technically Rust implements a superset of cancellation compared to other approaches.
However there might be a certain bias for using the hard cancellation, since it’s so much easier to utilize.

This is also where I see the pros and cons of this: The hard cancellation in Rust comes mostly for free, and is super easy to utilize. For supporting graceful cancellation people have to add extra code.

The downside is however that it might be really hard to reason about code where the execution of a function stops somewhere in the middle. I don’t think most people would expect that, or have thought about the corner cases enough.

The downside of cooperative cancellation is that it has to be explicitly added on each layer, especially if there is no unified framework which can help with it. This is tedious and error prone. We can e.g. see how long it took for Go to add support for Context everywhere, and where it is still lacking. Adding cancellation support in that way is also really hard. E.g. cancelling an IOCP operation might require performing a callback when cancellation is first issued. Which again requires some storage space for storing the callback, which won’t be available in a fixed size context that is also suitable for embedded programs.

Efficiency

I think the run-to-completion model unlocks various optimizations that are not possible in the always-cancellable model. E.g. I already elaborated on the IO completion operations above. Another example I encountered was in my implementation of an asynchronous ManualResetEvent in Rust, which stores the tasks to wake up as raw pointers in an intrusive wait-queue. In a run-to-completion implementation (such as the one in cppcoro) the list of waiters can just be accessed without additional synchronization, since it’s guaranteed that a task will not disappear as long as it’s blocked on the event. In my Rust implementation this can happen, since the Future can be dropped while it’s waiting to get completed. That required me to add an additional Mutex.

Applicability to programming language

I think the “stopping a task from the outside” model works reasonably well for Rust - mostly due to the support for destructors. With most types implementing RAII semantics, the proper cleanup actions can be implemented even with an external cancellation. E.g. any (async) mutexes that might have been locked in the task will be automatically unlocked when the task is cancelled. And other resources would be released.
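The destructor behaviour is easy to check by hand. In the following sketch `Guard` is a stand-in for a real mutex guard and only flips a flag, and `NoopWaker`, `YieldOnce` and `hold_lock` are hypothetical helpers for the illustration:

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for polling a future by hand.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Future that is pending on the first poll and ready on the second.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Stand-in for a lock guard: marks the lock as free again when dropped.
struct Guard(Rc<Cell<bool>>);
impl Drop for Guard {
    fn drop(&mut self) {
        self.0.set(false);
    }
}

/// Takes the "lock" and holds it across a yield point.
async fn hold_lock(locked: Rc<Cell<bool>>) {
    locked.set(true);
    let _guard = Guard(locked);
    YieldOnce(false).await;
    // Not reached if the task is dropped while suspended above.
}

/// Returns (locked while the task was suspended, locked after dropping it).
fn drop_releases_lock() -> (bool, bool) {
    let locked = Rc::new(Cell::new(false));
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);

    let mut task = Box::pin(hold_lock(locked.clone()));
    let _ = task.as_mut().poll(&mut cx);
    let while_suspended = locked.get();

    drop(task); // runs the destructors of everything the task still holds
    (while_suspended, locked.get())
}
```

Dropping the suspended task drops `_guard` together with the rest of the task’s state, so the flag is cleared even though the code after the `.await` never runs.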

I think this model does not work at all in a language which does not support destructors, and e.g. relies on finally/defer blocks for cleanup. As far as I can see e.g. Go, JavaScript and Kotlin all implement run-to-completion semantics. I’m not familiar enough with Python to tell for sure.

I am not 100% sure about C++20 coroutines. From the examples that I have seen (e.g. in the usage of Lewis Baker’s cppcoro library), I think it also favors run-to-completion. However there are some destroy methods for coroutines, which afaik should be callable in suspended state.

Personal Summary

My current personal estimate is that both models work, and have some pros and cons. I think I might have preferred the run-to-completion approach for Rust too, since it seems to unlock a few other zero-cost abstractions and sometimes seems easier to understand. I also think it would have avoided requiring the weird Pin<Pointer> type, which is super hard to understand and get right if one really relies on pinned data structures.

However we can probably build good software with the current design too. The benefit of the current design is that more average users will get at least some cancellation right, since there is less work involved for it.

The question whether await is explicit or implicit is for me actually of secondary importance. If we had run-to-completion, it would probably be a mostly syntactical distinction. Without it, an explicit await seems preferable, since it at least shows users that a method might never resume beyond that certain call.
