Structured Concurrency Kickoff

Thanks for your flexibility here, and this awesome kick-off post!

I want to say – I think the forum has some nice features, and it was easy to set up this way since we maintain the forum for Trio anyway, but if people feel uncomfortable with using a “project branded” space like this then please speak up. And to be 100% clear, the intention is that the “Structured concurrency” category here is totally open to any project, not specific to any one in particular.

So Trio is actually pretty fundamentalist here: it simply does not have any way to create a task without a nursery, or a nursery without binding it to a stack (using one of Python’s with blocks). Partly this is necessary because Trio is fundamentalist about never letting exceptions be accidentally thrown away, and if we allowed “heap-allocated” nurseries, then we would inevitably end up in cases where a task crashed but we had nowhere to re-raise the error. And partly it’s a tactical decision – when in doubt, do the more restrictive thing; even if it doesn’t work out, then you’ll at least learn something :slight_smile:

That means you have to structure code like this differently; in particular, you have to bind the lifetime of your object to some stack frame. It could potentially be some higher-up parent stack frame, but it has to be some stack frame. (This also answers the question you asked earlier, about why Trio allows nursery objects to be passed around, and whether that’s an anti-pattern – it’s certainly not something you want to do if you can avoid it, but it does make it possible to handle cases like these without relaxing our principles.)

There’s more discussion here:

Interestingly this part seems to be working out OK so far. When I wrote that post I used websockets as a hypothetical example, but we now have a real websocket library that uses this approach, and I haven’t seen anyone complain. A key part of this is that Python with blocks are extensible/composable, so the trio-websocket library defines its own with block to open a websocket, and that with block manages the nursery internally:

async with open_websocket_url('wss://echo.websocket.org') as ws:
    await ws.send_message('hello world!')

But you cannot do ws = open_websocket_url(...), for the reasons you give.
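
To make the "with block owns the background work" idea a bit more concrete, here is a rough sketch. It deliberately uses only the standard library's asyncio as a stand-in, so it is not how trio-websocket is actually implemented; EchoConnection, open_echo_connection, and get_message are hypothetical names invented for illustration. One caveat: in this stand-in a crash in the background task would only surface when the block exits, whereas a Trio nursery would also interrupt the body.

```python
import asyncio
from contextlib import asynccontextmanager


class EchoConnection:
    """Hypothetical connection whose background task echoes messages back."""

    def __init__(self):
        self._outgoing = asyncio.Queue()
        self._incoming = asyncio.Queue()

    async def _pump(self):
        # Background work: runs for as long as the connection is open.
        while True:
            message = await self._outgoing.get()
            await self._incoming.put(message)

    async def send_message(self, message):
        await self._outgoing.put(message)

    async def get_message(self):
        return await self._incoming.get()


@asynccontextmanager
async def open_echo_connection():
    # The background task is created on entry and torn down on exit,
    # so its lifetime is tied to the caller's "async with" block.
    conn = EchoConnection()
    pump = asyncio.create_task(conn._pump())
    try:
        yield conn
    finally:
        pump.cancel()
        try:
            await pump
        except asyncio.CancelledError:
            pass


async def main():
    async with open_echo_connection() as conn:
        await conn.send_message("hello world!")
        return await conn.get_message()


reply = asyncio.run(main())
```

Because the only public way to get a connection is through the with block, there is no code path that leaves the background task running after the caller has moved on.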

Yeah, Trio has things easy here, since it’s an async / cooperatively-scheduled framework. This has two advantages:

  • Cancellation is one of those things that’s easy to add to a system if you do it at the beginning, but almost impossible to retrofit later, since you end up having to basically audit all existing user code. Since we’re an async framework, we have to rewrite all the I/O routines anyway to route through our event loop, which makes it easy to add ubiquitous+uniform cancellation semantics at the same time.

  • We already have a mechanism to make the schedule points visible (async/await syntax). So we can re-use it to make cancellation points visible too.
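
As a small illustration of the second point, here is a sketch showing that cancellation is only delivered where the code says await. It uses the standard library's asyncio rather than Trio so that it runs without extra packages, and the worker function and log list are made up for the example; Trio's rules about which awaits are checkpoints are stricter and more uniform than what this shows.

```python
import asyncio

log = []


async def worker():
    # No await on these lines, so nothing can interrupt them.
    log.append("step 1")
    log.append("step 2")
    try:
        # The await marks both a schedule point and a cancellation point.
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        log.append("cancelled at await")
        raise
    log.append("never reached")


async def main():
    task = asyncio.create_task(worker())
    await asyncio.sleep(0)  # let the worker run up to its first await
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass


asyncio.run(main())
```

After this runs, log holds the two uninterrupted steps followed by the cancellation notice, and the line after the await is never reached.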

There are other ways to make cancellation points visible to users – for example, if you have a type system to track what kinds of errors can happen (like in the Rust/Swift/Go style), then you can probably re-use this to keep track of cancellation too. But I don’t have any brilliant ideas on how to retrofit legacy libraries to make them tolerate cancellation.

Small point of clarification: I really mean, it’s the only way to make it work the way Python users naively expect it to work. Because Python made a decision long ago that control-C injects a magic KeyboardInterrupt exception, and Python users all learn this long before they start learning about concurrency. There are definite trade-offs to this design choice, and I don’t know if I’d repeat it or not in another language. But given that Trio is aimed at Python users, we wanted to keep things familiar.

Yeah, this is an interesting issue! There’s nothing really specific about KeyboardInterrupt here – you can have the exact same issue with any two exceptions that race against each other in concurrent tasks. And we’ve found that people do get bitten by this in practice: example 1, example 2, example 3.

There is an analogous thing that can happen with regular sequential code: if an exception handler crashes, then the resulting exception will tend to preempt the original exception. E.g. here:

try:
    raise ValueError("bad value")
finally:
    file_handle.clsoe()

…you won’t get a ValueError, you’ll get an AttributeError complaining that the file object has no attribute named clsoe. (Of course more static languages would catch this specific error at compile time, but I’m sure you can think of other situations where cleanup handlers might crash.)

Python 3 has a neat way to help debug cases like this, called implicit exception chaining: when our AttributeError preempts the ValueError, the original ValueError gets attached to the AttributeError as its “context”. Then when the traceback is printed, Python shows you both exceptions, and how they relate:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
ValueError: bad value

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
AttributeError: '_io.TextIOWrapper' object has no attribute 'clsoe'
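
The same relationship can be inspected programmatically through the winning exception's `__context__` attribute. A minimal sketch, assuming a made-up Resource class in place of a real file object (Resource and cleanup_crashes are hypothetical names for this example only):

```python
class Resource:
    """Hypothetical stand-in for a file object: it has close() but no clsoe()."""

    def close(self):
        pass


def cleanup_crashes():
    handle = Resource()
    try:
        raise ValueError("bad value")
    finally:
        handle.clsoe()  # deliberate typo, so the cleanup itself fails


try:
    cleanup_crashes()
except AttributeError as exc:
    winner = exc                # the exception that preempted the original
    original = exc.__context__  # the ValueError, attached implicitly by Python
```

Here winner is the AttributeError and original is the ValueError("bad value") that it preempted, which is exactly the pair the traceback above prints.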

So we have a plan to extend this system to handle cross-task exception preemption as well. In your example, the unrelated exception would still “win”, but the traceback would record that, at the point where it propagated past the other task in the call tree, it preempted a KeyboardInterrupt. (This is part of a larger redesign of how we represent cross-task exceptions; search that issue for __context__ to read about this part specifically.)

Yeah, IMO a timeout system needs to make it easy to apply timeouts to arbitrary operations – so the “targeting” system needs to be more fine-grained than a bundle/nursery or a task. And you need to handle nesting (because otherwise you lose encapsulation, and it makes Dijkstra sad.) This is the reasoning behind Trio’s “cancel scope” system – the connection establishment code can put a 10 second timeout on the handshake code without needing to know about any other timeouts that might be in effect. I assume you’ve seen this article before, but for those who haven’t, it goes into much more detail: Timeouts and cancellation for humans — njs blog
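
Here is a rough sketch of the nesting property, for readers who want something to poke at. It uses asyncio.wait_for from the standard library as a stand-in, so it is not Trio's cancel scope API; connect, fast_handshake, and stalled_handshake are hypothetical names, and the outer limits are chosen only to keep the demo quick.

```python
import asyncio


async def fast_handshake():
    await asyncio.sleep(0.01)
    return "handshake ok"


async def stalled_handshake():
    await asyncio.sleep(60)
    return "never returned"


async def connect(handshake):
    # The connection code puts its own 10 second limit on the handshake,
    # without knowing what timeouts its callers may have in effect.
    return await asyncio.wait_for(handshake(), timeout=10)


async def main():
    # Generous outer limit, fast handshake: neither timeout fires.
    first = await asyncio.wait_for(connect(fast_handshake), timeout=30)

    # Outer limit tighter than the inner one: the outer limit fires first,
    # and connect() did not need to change to make that work.
    try:
        await asyncio.wait_for(connect(stalled_handshake), timeout=0.05)
        second = "completed"
    except asyncio.TimeoutError:
        second = "outer timeout fired"
    return first, second


results = asyncio.run(main())
```

The point of the sketch is only that the inner and outer limits compose without knowing about each other; the linked article covers why Trio models this with scopes rather than per-call timeout arguments.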

We have a long discussion of graceful shutdown here: More ergonomic support for graceful shutdown · Issue #147 · python-trio/trio · GitHub

I think there really is value in having a “common vocabulary” for graceful shutdown, because otherwise it’s very difficult to build a complex application that might embed a third-party webserver etc., and still coordinate shutdown. We’re thinking about things like adding a “soft cancelled” state to cancel scopes, but haven’t settled on any one design for certain.