Structured Concurrency Kickoff

A post was split to a new topic: The ParaSail language

A post was split to a new topic: Zio (Scala library)

I split out some posts into new topics. In my experience with Discourse forums so far, splitting into topics like this seems to make things easier to follow than having different discussions intermingled with each other. It’s pretty easy to split topics like this, though, so you don’t have to worry too much; if a discussion gets tangled up we can fix it afterwards :-).

So Trio is actually pretty fundamentalist here: it simply does not have any way to create a task without a nursery, or a nursery without binding it to a stack (using one of Python’s with blocks).

I think we are both approaching the same problem from different sides. Namely: how do you spawn a thread that lives in the scope defined by its parent?

void foo() {
    bar();
}

void bar() {
    go(quux()); // this should be canceled when foo exits, but how?
}

Here’s Trio’s solution:

async def foo():
    async with trio.open_nursery() as n:
        bar(n)

def bar(n):
    n.start_soon(quux)

And here’s how you would do the same thing in libdill:

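void foo() {
    int b = bar();
    // ... foo now owns the bundle ...
    hclose(b); // closing the bundle cancels quux
}

int bar() {
    int b = bundle();
    bundle_go(b, quux());
    return b; // ownership passes to the caller
}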

Now, both are functionally equivalent, but look at the ownership of the scope (nursery, bundle). In the former case it’s first owned by foo and then by foo and bar in parallel. In the latter case there’s always exactly one owner. First it’s bar, then it passes it to foo, but by that time bar is no longer running. The swap of ownership is atomic.
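To make the single-owner hand-off concrete, here is a minimal Python sketch that uses ordinary threads in place of libdill coroutines. The Bundle class and its go/close methods are hypothetical, modelled loosely on libdill's bundle(), bundle_go() and hclose(); since Python threads cannot be killed, cancellation here is cooperative through a threading.Event.

```python
import threading
import time


class Bundle:
    """Hypothetical libdill-style bundle: owns threads and cancels them on close()."""

    def __init__(self):
        self.cancelled = threading.Event()
        self.threads = []

    def go(self, fn, *args):
        # Each worker gets the bundle's cancel flag so it can check it cooperatively.
        t = threading.Thread(target=fn, args=(self.cancelled, *args))
        t.start()
        self.threads.append(t)

    def close(self):
        # Request cancellation, then wait until every worker has exited.
        self.cancelled.set()
        for t in self.threads:
            t.join()


def quux(cancelled, log):
    # Work in short slices until the owning bundle is closed.
    while not cancelled.wait(0.01):
        pass
    log.append("quux exited")


def bar(log):
    b = Bundle()
    b.go(quux, log)
    return b  # ownership moves to the caller; bar is no longer running


def foo(log):
    b = bar(log)  # foo is now the single owner
    time.sleep(0.05)
    b.close()  # closing the bundle cancels quux
    log.append("foo done")


log = []
foo(log)
print(log)  # ['quux exited', 'foo done']
```

At every moment exactly one function holds b, which is the property being contrasted with the shared nursery above.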

Shared ownership results in some weird scenarios:

async def foo():
    async with trio.open_nursery() as n:
        n.start_soon(quux)
        bar(n)

def bar(n):
    n.cancel_scope.cancel()  # cancels quux, which bar never started

Note how bar cancels quux, which, as a naive person would argue, it should not even be aware of. In other words, it looks like a violation of encapsulation.

Yeah, this is an interesting issue! There’s nothing really specific about KeyboardInterrupt here – you can have the exact same issue with any two exceptions that race against each other in concurrent tasks.

There actually may be something specific to KeyboardInterrupt here (or to a broader set of exceptions that KeyboardInterrupt is part of).

Consider an I/O error. If it happens after the thread has been asked to exit, who cares? From the user’s perspective, the thread is already dead, and asking a program to report network outages that occurred after it has been shut down doesn’t sound like a reasonable expectation.

Not so for KeyboardInterrupt, though. If it’s dropped, it will cause the entire program to misbehave. Maybe there’s a “native” scope for each exception? For instance, that KeyboardInterrupt should always go directly to the main thread?

Well, sure, if you pass a nursery across an encapsulation boundary, then you have explicitly chosen to violate encapsulation :-). You can also write the same thing in libdill:

void foo() {
    int b = bundle();
    bundle_go(b, quux());
    bar(b);
}

int bar(int b) {
    return hclose(b); // cancels quux, which bar never started
}

I think the gap here is that Python really doesn’t use “passing ownership” as an idiom. The closest analogue to “ownership” in Python is to use a with block on something. So if bar wants to create something, while enforcing that its caller takes ownership, then the way you do that in Python is to define bar in such a way that you have to use it in a with block:

async def foo():
    async with bar():  # this can create a nursery
        ...            # the block body runs while the nursery is open
    # this dedent closes the nursery

And if we did things the way you suggest, with heap-allocated nursery objects whose ownership could be passed arbitrarily between functions, then we would lose many of Trio’s key advantages – no more automatic exception propagation (how do we know whether the exception should be sent to foo or bar?), no more using cancel scopes to delimit a cancellable operation (how do we tell which nurseries are inside the cancel scope if they’re just integers that can be moved around arbitrarily?).
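A runnable approximation of the with-block pattern above, assuming asyncio in place of Trio. ToyNursery and open_toy_nursery are hypothetical stand-ins for Trio's nursery and handle far fewer corner cases; the point is only that bar creates the nursery while the caller's async with block decides when it closes.

```python
import asyncio
from contextlib import asynccontextmanager


class ToyNursery:
    """Hypothetical stand-in for a Trio nursery; not Trio's implementation."""

    def __init__(self):
        self.tasks = []

    def start_soon(self, fn, *args):
        self.tasks.append(asyncio.ensure_future(fn(*args)))


@asynccontextmanager
async def open_toy_nursery():
    nursery = ToyNursery()
    try:
        yield nursery
        # Normal exit from the block: wait for every child (the "dedent").
        await asyncio.gather(*nursery.tasks)
    except BaseException:
        # The block or a child failed: cancel the remaining children, then re-raise.
        for t in nursery.tasks:
            t.cancel()
        await asyncio.gather(*nursery.tasks, return_exceptions=True)
        raise


async def quux(log):
    await asyncio.sleep(0.01)
    log.append("quux finished")


@asynccontextmanager
async def bar(log):
    # bar creates the nursery, but the caller's `async with` owns its lifetime.
    async with open_toy_nursery() as n:
        n.start_soon(quux, log)
        yield


async def foo():
    log = []
    async with bar(log):
        log.append("foo body done")
    # The dedent has closed the nursery, so quux has finished by now.
    log.append("after dedent")
    return log


print(asyncio.run(foo()))  # ['foo body done', 'quux finished', 'after dedent']
```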

OK, that’s fair, yeah. Since a KeyboardInterrupt in particular can end up in totally arbitrary parts of the program, that means it can potentially end up in parts that aren’t properly set up to handle an exception like it.

I think this is an example of a pretty general trade-off. In a “serious” program, you probably want control-C (and also SIGTERM) to trigger some kind of controlled shutdown. Maybe a graceful shutdown, definitely some kind of cancellation and unwinding. The standard Python thing of just materializing a KeyboardInterrupt at some arbitrary location cannot be made fully safe and predictable, in lots of ways. What if you were in the middle of a critical section? Trio tries its best to make it at least reliable enough to be useful in practice, but there’s no possible way to write a test showing that an exception which can happen after any instruction is always handled correctly. This is just inherent in the idea of tossing an exception into an arbitrary place and crossing your fingers. So we have ways for serious programs to catch the control-C in a controlled way and then cancel everything, etc.

So KeyboardInterrupt is broken! …except. What if you have a buggy program? For example, one where there’s a task caught in an infinite loop and ignoring cancellation? In that case a controlled shutdown is impossible, and throwing in a KeyboardInterrupt grenade is likely to work pretty well. Or what if you have a quick script whose author never spent any time at all thinking about control-C or controlled shutdown? (And of course there’s a lot of overlap between the “quick script” and “buggy program” cases :-).) KeyboardInterrupt is theoretically wrong, but in practice it handles these cases pretty well.

Slightly off-topic here, but yes, in general this observation is correct: Python’s way of handling Ctrl-C mostly works for small programs. Larger programs with multiple threads and a mainloop should implement a signal handler that catches it and shuts down in an orderly way without raising an exception (usually by signalling the mainloop).

So treating the KeyboardInterrupt exception as a way to make your application shut down in an orderly way in a structured-concurrency environment may be the wrong goal. Instead, the signal should be caught, and the application can then cancel all the threads/nurseries/bundles/pools that it wants to cancel in order to terminate cleanly.
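A minimal sketch of that approach with plain threads, assuming only the standard signal and threading modules: the handler merely records the request, and each worker notices it and finishes cleanly. signal.raise_signal (Python 3.8+) simulates the user pressing Ctrl+C; the worker loop itself is hypothetical.

```python
import signal
import threading

shutdown_requested = threading.Event()


def on_sigint(signum, frame):
    # Python runs this in the main thread. Instead of raising KeyboardInterrupt
    # at some arbitrary point, record the request and let everyone unwind.
    shutdown_requested.set()


def worker(i, finished):
    # A well-behaved worker notices the request and cleans up on its own.
    while not shutdown_requested.wait(0.01):
        pass
    finished.append(i)


def main():
    previous = signal.signal(signal.SIGINT, on_sigint)
    try:
        finished = []
        threads = [threading.Thread(target=worker, args=(i, finished)) for i in range(3)]
        for t in threads:
            t.start()
        signal.raise_signal(signal.SIGINT)  # simulate Ctrl+C (Python 3.8+)
        for t in threads:
            t.join()
        return sorted(finished)
    finally:
        signal.signal(signal.SIGINT, previous)  # restore the original handler


result = main()
print(result)  # [0, 1, 2]
```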

Both of Trio’s strategies for handling control-C really, really benefit from structured concurrency, though. By default, we do the usual Python thing of raising KeyboardInterrupt, and it works as well as it ever does. (I.e., not 100% reliably in theory, but basically just fine in practice.) This relies on our ability to propagate exceptions properly, which we get from structured concurrency. The KeyboardInterrupt arrives in some arbitrary task, and then as it propagates out it automatically cancels everything else, runs finally blocks etc. Or, if you register a signal handler, then structured concurrency makes it easy to cancel everything and shut down in an orderly way – call root_cancel_scope.cancel() and the whole program unwinds itself.

By comparison, Trio’s competitors like asyncio don’t have any useful default behavior – in fact a common reaction to control-C is for KeyboardInterrupt to be raised inside the mainloop’s guts, which corrupts its state and makes everything crash hard :-(. Or, if you do register a signal handler, it’s very difficult to figure out what all your different tasks/callbacks/etc. are doing so you can shut them all down in an orderly way.
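The “register a handler, cancel the root” strategy can be approximated in asyncio as well, but only if the tasks are arranged as a tree. In this sketch asyncio.gather is assumed to stand in for a nursery, and loop.add_signal_handler (Unix-only) turns SIGINT into a cancellation of the root task, so both children's finally blocks run as the tree unwinds.

```python
import asyncio
import signal


async def child(name, log):
    try:
        await asyncio.sleep(3600)  # pretend to block "forever"
    finally:
        log.append(f"{name} unwound")  # cleanup runs as cancellation propagates


async def root(log):
    # gather() is only a rough stand-in for a nursery: cancelling the
    # task that awaits it cancels both children.
    await asyncio.gather(child("a", log), child("b", log))


async def main():
    log = []
    loop = asyncio.get_running_loop()
    root_task = asyncio.ensure_future(root(log))
    # Unix-only API: turn Ctrl+C into "cancel the root of the task tree".
    loop.add_signal_handler(signal.SIGINT, root_task.cancel)
    try:
        await asyncio.sleep(0.01)  # let the children start
        signal.raise_signal(signal.SIGINT)  # simulate Ctrl+C (Python 3.8+)
        try:
            await root_task
        except asyncio.CancelledError:
            log.append("root cancelled")
    finally:
        loop.remove_signal_handler(signal.SIGINT)
    return log


result = asyncio.run(main())
print(result)
```

Without the tree structure there would be no single root_task whose cancellation reaches every piece of running work.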

There is one frustrating limitation with Trio though: if you have a program that uses trio.run_in_worker_thread(...) to call into some code that blocks for an indefinite period, then the program tends to freeze when you hit control-C :-(. The reason is that we have no generic way to cancel threads, and Trio relies on cancellation to unwind the program after control-C. This seems to be basically unsolvable in the general case, though you can do things like manually make the thread check for cancellation, and hopefully as Trio’s ecosystem grows then people will have less need to call into legacy blocking libraries like this.
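One way to “manually make the thread check for cancellation” is sketched below, assuming the blocking work can be split into short, timed waits. The queue-based job source and the cancel flag are illustrative only; this does not use any Trio API.

```python
import queue
import threading


def blocking_worker(jobs, cancel, done):
    # Rather than one indefinite jobs.get(), block in short slices and
    # check the cancel flag in between, so cancellation can get through.
    while not cancel.is_set():
        try:
            job = jobs.get(timeout=0.05)
        except queue.Empty:
            continue
        done.append(job * 2)  # stand-in for the real blocking work
        jobs.task_done()


jobs, cancel, done = queue.Queue(), threading.Event(), []
t = threading.Thread(target=blocking_worker, args=(jobs, cancel, done))
t.start()
for n in (1, 2, 3):
    jobs.put(n)
jobs.join()  # wait until the three jobs have been processed
cancel.set()  # the "cancellation" that a plain blocking call would ignore
t.join(timeout=1)
print(done, t.is_alive())  # [2, 4, 6] False
```

The trade-off is latency: the thread can take up to one timeout slice (50 ms here) to notice the request.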

Fair enough. Both C and Python have their own shortcomings and special considerations which muddle the thinking about the problem. Let’s instead think of some kind of ideal language that has SC baked in:

thread_scope {
     go foo(); // lifetime is automatically bound to the enclosing scope
} // foo gets canceled here

A function can be an implicit thread scope:

void bar() {
     go foo(); // lifetime is automatically bound to the lifetime of bar
} // foo gets canceled here

Now, that’s nice and easy. It makes it hard to shoot yourself in the foot. My ideal language should definitely support that kind of thing. But the original observation on this point was that it is not sufficient, and the example given was a socket object with a thread inside that’s returned from a function. We need a different syntax for that.

In the end I feel like there’s a case for having two different constructs.

This is hard to express in Python though, given that it (I guess) allocates everything on heap and uses GC to take care of lifetimes.

I think Floris may be getting at the same point as I did: KeyboardInterrupt is special. If we handle it just like any other exception we end up with the weird corner cases where it is not respected (see my scenario in the original post).

Once you start thinking of it as something special some solutions pop up. For example, one could make KeyboardInterrupt “level-triggered”, i.e. once user presses Ctrl+C, every blocking function from that point on will end immediately with KeyboardInterrupt. But that doesn’t work if you want to give the server a grace period to shut down after Ctrl+C.

So, my thinking was: Can’t we route the Ctrl+C event always to the main thread?

It’s easy to implement in the language and it has the desirable properties – it never gets overshadowed by an exception from a sibling thread, given that main has no sibling threads.

The problem is that it looks weird and special. But in fact, it is not. Consider how network interrupts are handled: they are captured by the OS and routed to the socket that cares about that particular connection. And if we are doing interrupt routing anyway, why not simply route Ctrl+C to the main thread?

As for the common vocabulary, that’s what I’ve tried to do, but as already said, I am not sure my list is exhaustive.

On the technical level, I would advise against making “soft cancel” part of the language. I’ve tried that, and it resulted in a lot of complexity; even more importantly, it made the entire “structured” thing much less obvious and intuitive.

Why not simply say that graceful shutdown is to be handled by the user? The user can open a channel to the thread, send it a “shutdown in 10 secs” message, then wait 10 secs and cancel it by exiting the scope. The thread, in turn, now knows that it’s supposed to exit within 10 secs, but it doesn’t have to be too paranoid about it: if it doesn’t comply, it will be canceled by the parent anyway.
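A minimal sketch of this protocol with Python threads, assuming a queue.Queue as the channel. Because Python threads cannot be cancelled by force, the final “cancel by exiting the scope” step is modelled as a second flag; cooperative_server and stubborn_server are hypothetical examples of a thread that complies and one that ignores the message.

```python
import queue
import threading


def cooperative_server(control, hard_cancel, log):
    while not hard_cancel.is_set():
        try:
            grace = control.get(timeout=0.01)  # a "shutdown in `grace` seconds" message
        except queue.Empty:
            continue  # no message yet: keep serving
        log.append(f"finished early (grace was {grace}s)")
        return


def stubborn_server(control, hard_cancel, log):
    # Ignores the shutdown message entirely; only the hard cancel stops it.
    hard_cancel.wait()
    log.append("hard-cancelled")


def shut_down(server, grace=0.05):
    control, hard_cancel, log = queue.Queue(), threading.Event(), []
    t = threading.Thread(target=server, args=(control, hard_cancel, log))
    t.start()
    control.put(grace)  # 1. tell the thread it has `grace` seconds
    t.join(timeout=grace)  # 2. wait for the grace period
    hard_cancel.set()  # 3. then cancel it, as exiting the scope would
    t.join()
    return log


print(shut_down(cooperative_server))  # ['finished early (grace was 0.05s)']
print(shut_down(stubborn_server))  # ['hard-cancelled']
```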

EDIT: Actually, this is a problem I’ve banged my head against for a year or so before I realized how to solve it. I’ll write a separate post about it.

I actually like the reified lifetimes that nurseries give you. I just posted some of the reasons why in the ZIO thread. And of course you also need them for objects that encapsulate a nursery. E.g. in Trio you write:

async with open_websocket("https://...") as ws:
    message = await ws.receive()
    await ws.send("reply")

Here the async with open_websocket internally opens a nursery, whose scope extends over the open_websocket’s block. The ws object internally holds a reference to this nursery, so it’s helpful that it is an object :-). But code inside the async with block doesn’t have any way to access this nursery directly. (For example, it can’t spawn new tasks into it.) So reified lifetime objects are really useful for encapsulation and abstraction.
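To illustrate the encapsulation point, here is a sketch assuming asyncio instead of Trio: open_fake_websocket and FakeWebSocket are hypothetical, and a single background task stands in for the hidden nursery. The caller gets the ws object but no handle on the task it owns, and leaving the block tears that task down.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeWebSocket:
    """Hypothetical stand-in for a websocket fed by a background task."""

    def __init__(self):
        self._inbox = asyncio.Queue()
        self.sent = []

    async def receive(self):
        return await self._inbox.get()

    async def send(self, msg):
        self.sent.append(msg)


async def _reader(ws):
    # Background work owned by the connection, not by the caller.
    await ws._inbox.put("hello")
    await asyncio.sleep(3600)  # pretend to keep reading forever


@asynccontextmanager
async def open_fake_websocket(url):
    ws = FakeWebSocket()  # url is unused in this toy
    background = asyncio.ensure_future(_reader(ws))  # private to this function
    try:
        yield ws
    finally:
        background.cancel()  # leaving the block tears down the background work
        try:
            await background
        except asyncio.CancelledError:
            pass


async def main():
    async with open_fake_websocket("wss://example.invalid") as ws:
        message = await ws.receive()
        await ws.send("reply")
    return message, ws.sent


print(asyncio.run(main()))  # ('hello', ['reply'])
```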

Also note that you can construct a reified nursery from an implicit nursery, though it’s kind of awkward:

# In a made-up language with `go` statements that are implicitly scoped to the
# surrounding function
def pseudo_nursery_manager(nursery):
    while True:
        thunk = nursery.receive()
        go thunk

Now whenever I want a nursery object that I can pass around, I write:

def uses_pseudo_nursery():
    nursery = open_channel()
    go pseudo_nursery_manager(nursery)
    # These both get to spawn tasks into my pseudo-nursery by
    # sending on the channel

Doing things this way is clunky and awkward, but you haven’t actually stopped people from shooting themselves in the foot if they really try :-).
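A runnable asyncio version of the pseudo-nursery trick, assuming asyncio.Queue as the channel and a None sentinel (a hypothetical addition) so the manager knows when to stop accepting work:

```python
import asyncio


async def pseudo_nursery_manager(channel):
    children = []
    while True:
        thunk = await channel.get()
        if thunk is None:  # hypothetical sentinel: nobody will spawn anything else
            break
        children.append(asyncio.ensure_future(thunk()))
    # The manager's own scope: it does not finish until its children do.
    await asyncio.gather(*children)


async def uses_pseudo_nursery():
    results = []

    async def job(n):
        await asyncio.sleep(0.01)
        results.append(n)

    nursery = asyncio.Queue()  # the "nursery object" is just a channel
    manager = asyncio.ensure_future(pseudo_nursery_manager(nursery))
    # Anyone holding the channel can spawn tasks into the manager's scope.
    await nursery.put(lambda: job(1))
    await nursery.put(lambda: job(2))
    await nursery.put(None)
    await manager
    return sorted(results)


print(asyncio.run(uses_pseudo_nursery()))  # [1, 2]
```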

Anyway… disallowing “dynamic” / “heap-allocated” nurseries actually works pretty well for us. And you have to admit: heap-allocated nurseries are a wishy-washy compromise that lets unstructured control-flow leak into your language. Have the courage of your convictions ;-)

I think the key thing you need to make Trio’s approach practical is to have a language that lets users define their own “block types”. So like in Python, anyone can invent a new kind of with block. And that’s what allows open_nursery to be encapsulated inside some user-defined function like open_websocket – they just have to make their function a with block.

In many modern languages, the way you would do this is instead to use some kind of closure/block syntax. For example, in JS you’d probably make the primitive look like this:

withNursery(nursery => {
    // ... code that uses nursery ...
});

And then for the websocket, you’d do:

withWebsocket(url, ws => {
    // ... code that uses ws ...
});

There are similar idiomatic features in Ruby, Swift, Rust, etc. I think this is the key feature that C is missing, that’s making your life difficult.
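The same closure style can be written in Python when a with block isn't wanted. The sketch below is hypothetical and uses threads: with_nursery calls body with a nursery and joins every thread that body started before returning, so nothing spawned inside can outlive the call. Unlike a real nursery, it cannot cancel the threads if body raises.

```python
import threading


class ThreadNursery:
    """Hypothetical thread-based nursery, used only to show the callback shape."""

    def __init__(self):
        self._threads = []

    def start_soon(self, fn, *args):
        t = threading.Thread(target=fn, args=args)
        t.start()
        self._threads.append(t)


def with_nursery(body):
    nursery = ThreadNursery()
    try:
        return body(nursery)
    finally:
        # Nothing spawned inside `body` may outlive this call.
        for t in nursery._threads:
            t.join()


def demo():
    results = []

    def body(nursery):
        for i in range(3):
            nursery.start_soon(results.append, i)
        return "body done"

    status = with_nursery(body)
    return status, sorted(results)


print(demo())  # ('body done', [0, 1, 2])
```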

This is actually what Trio does when it gets a Ctrl+C at an awkward time where it can’t deliver it immediately – it sets a flag, and then uses its cancellation system to inject the KeyboardInterrupt into the main task at the next available opportunity. But, this isn’t for the reason you suggest :-). Trio does it this way because it needs to deliver it to some task, and the main task is guaranteed to always be there, so it’s a convenient choice. But it doesn’t help with the issue you’re thinking of: in Trio the main task is not really special with respect to exception handling, and the KeyboardInterrupt could still get lost, if one of the main task’s children crashes while the KeyboardInterrupt is propagating.

Oh sorry, when I said “common vocabulary”, I meant, a generic way for all Trio apps to talk about it – so if my app has an embedded HTTP server, an embedded websocket server, and something else, and they’re all written by different third parties, then it’s very helpful if there’s a standard uniform way to say “All right all of you, do a graceful shutdown”.

I’d like to hear more about this! I was thinking it seemed like a pretty small and natural extension, that fits naturally with the “structured thing”; since we already have a way to deliver a cancellation at a branch of the task tree, extend that mechanism to deliver soft-cancellations as well.

We’ve experimented with “user space” implementations of this. But a channel is pretty awkward here. Take our accept loop:

while True:
    conn = await listener.accept()
    nursery.start_soon(handler, conn)

The accept call might block indefinitely. But when the graceful shutdown is requested, we want the accept call to exit immediately (while any handlers are allowed to keep running of course). So if we use a channel for this, then it means we need some kind of accept-a-socket-or-else-receive-from-a-channel operation, which is really difficult. (For regular OS sockets it’s possible, if you go all concurrent ML, but that’s a whole pile of complexity that you don’t need just for this use case, and it doesn’t necessarily work for cases where listener is more complicated than a bare OS socket.)

Instead, we’d write:

while True:
    with cancel_if_graceful_shutdown_requested() as cancel_scope:
        conn = await listener.accept()
        nursery.start_soon(handler, conn)
    if cancel_scope.was_cancelled:
        # Graceful shutdown requested: stop accepting; running handlers continue
        break
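A rough asyncio approximation of that accept loop, assuming an asyncio.Queue in place of the listener and an asyncio.Event as the graceful-shutdown request. Cancelling a dedicated accept task plays the role of the hypothetical cancel_if_graceful_shutdown_requested scope: the blocked accept exits immediately, while handlers already started run to completion.

```python
import asyncio


async def serve(listener, shutdown, handled):
    handlers = []

    async def handler(conn):
        await asyncio.sleep(0.05)  # work that outlives the shutdown request
        handled.append(conn)

    async def accept_loop():
        while True:
            conn = await listener.get()  # may block indefinitely
            handlers.append(asyncio.ensure_future(handler(conn)))

    acceptor = asyncio.ensure_future(accept_loop())
    await shutdown.wait()  # graceful shutdown requested
    acceptor.cancel()  # soft cancel: interrupt only the blocked accept
    try:
        await acceptor
    except asyncio.CancelledError:
        pass
    await asyncio.gather(*handlers)  # in-flight handlers run to completion


async def main():
    listener, shutdown, handled = asyncio.Queue(), asyncio.Event(), []
    server = asyncio.ensure_future(serve(listener, shutdown, handled))
    for conn in ("c1", "c2", "c3"):
        listener.put_nowait(conn)
    await asyncio.sleep(0.01)  # let the accept loop pick them up
    shutdown.set()
    await server
    return sorted(handled)


print(asyncio.run(main()))  # ['c1', 'c2', 'c3']
```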

Looking forward to it :-)


A post was split to a new topic: Project Loom – lightweight concurrency for the JVM

2 posts were split to a new topic: Thread locals and dynamic scoping

Uh. I looked at how KeyboardInterrupt works in Python and it doesn’t look like it can be intercepted and routed to the main thread. That makes the entire discussion moot. Other languages may try this approach, though.

(An interesting insight here is that this cannot be done without structured concurrency because unless the program is structured there’s no concept of main thread – all threads are equal and therefore there’s no obvious candidate to handle Ctrl+C events.)

EDIT: After even more investigation, it seems that it’s possible to install a custom interrupt handler in Python:

signal.signal(signal.SIGINT, signal_handler)

So maybe the signal can be re-routed to the main thread after all.

And, oh, maybe the exception thrown in the main thread should be Cancelled rather than KeyboardInterrupt. That would make the main thread the same as all other threads: It can be canceled only by its “parent”, which, in this case, is the user pressing Ctrl+C.

Terminology is a problem and “scope” is already overused.

Yes, that really sucks. We should come up with something less generic than “scope” or “bundle” and less arbitrary than “nursery”. Also, being a core concept, the name should be two syllables or fewer.

After spending some time browsing the thesaurus I’ve found “twine”. It represents the concept well (it’s a collection of threads), it’s 1 syllable long and, most importantly, the name is not already taken. Would that work for other people here?

There are a few other concepts, such as nesting and a “non-cancellable” scope to shield fibers from cancellation during recovery, cleanup or critical operations. This may be relevant to some of the discussion here.

I’ve made a separate post about this topic here (Graceful Shutdown - #3 by sustrik) but I think what I wrote is not easy to grasp. I’ll try to write a blog post about the topic or something.

I just split off a few more threads:

Feel free to edit titles if I got them wrong.

For what it’s worth, the Python interpreter’s signal handling machinery already routes all signals to the main thread, in the sense that signal handlers always run inside the main thread (the one that started first). This is how the default KeyboardInterrupt handling works: the signal handler can raise an exception (!), and then it propagates into whatever the main thread was doing when it paused to run the signal handler. There are some subtleties here around Unix vs. Windows, and C-level signal handlers versus Python-level signal handlers, but that’s the basic idea.
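This is easy to check, assuming a POSIX system (SIGUSR1 does not exist on Windows) and Python 3.8+ for signal.raise_signal. The Python signal documentation states that handlers always execute in the main thread, even when the signal was received in another thread; the sketch raises a signal from a worker thread and records where the handler ran.

```python
import signal
import threading
import time

seen_in_main = []


def handler(signum, frame):
    # Record whether the Python-level handler is running in the main thread.
    seen_in_main.append(threading.current_thread() is threading.main_thread())


previous = signal.signal(signal.SIGUSR1, handler)  # POSIX-only signal
try:
    # Raise the signal from a *worker* thread...
    t = threading.Thread(target=signal.raise_signal, args=(signal.SIGUSR1,))
    t.start()
    t.join()
    for _ in range(100):  # ...and give the main thread a moment to run the handler
        if seen_in_main:
            break
        time.sleep(0.01)
finally:
    signal.signal(signal.SIGUSR1, previous)

print(seen_in_main)
```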

I think this is probably irrelevant though :-) I don’t think there’s much demand for a Python structured concurrency library that uses actual threads.

Uh, we are back to terminology stuff. By “threads” I meant coroutines, or whatever they are called in Python.


Ah, OK. Then yeah, signal handlers can be used to route to the main task/fiber/whatever-you-want-to-call-it. That’s how Trio does it, in the cases where its policies decide this is the right thing to do :-)