Discussion: "Notes on structured concurrency, or: Go statement considered harmful"

You can use this thread to discuss the blog post Notes on structured concurrency, or: Go statement considered harmful.


Out of curiosity, what is your opinion about this post?


Great article, I’m sold on Nurseries. You mentioned “The only other software I’ve used that thinks “print something and keep going” is a good error handling strategy is grotty old Fortran libraries, but here we are.”

It also reminds me of VB Script and VBA where people commonly wrote on error resume next at the top of the script without even thinking what that meant, and wondering why their code silently failed.


That’s probably my lack of Python knowledge, but in “Automatic resource cleanup works” you write that you can’t access already-closed file handles. However, doesn’t this depend on the ordering of the with blocks?

That is, this will work as described:

with open("my-file") as file_handle:
    async with trio.open_nursery() as nursery:
        nursery.start_soon(read_file, file_handle)

However, this will end in an error:

async with trio.open_nursery() as nursery:
    with open("my-file") as file_handle:
        nursery.start_soon(read_file, file_handle)

The problem here is that start_soon returns immediately, which exhibits goto-like control-flow problems (as described). However, they are localized, which is the important part 🙂

I forgot to reply to this at the time, but it just came up again in chat, and I ended up writing a little reply that I think might be a good complement to Cory’s post that @belm0 linked to. So I’ll paste it here too:

yeah, python async/await is definitely a two-color system
I think my main frustration with that post is that it takes for granted that having two function colors is obviously a bad thing
…though to be fair, I can see how if you’re starting with js callbacks as your main experience with them, then it does feel pretty obviously problematic
but fundamentally, function color is a way to encode certain behavioral properties into your type system, like “this function does IO”. The thing about that property is that it’s transitive over call stacks: if A calls B and B does IO, then A does IO too. So if you think “this function does IO” is a useful thing to encode in your type system, then function color is how you do it. And that plausibly is a useful thing to include in a type system, e.g. haskell does it.
so IMO the question is really whether any given function color system has a good enough ergonomics/functionality tradeoff to be worthwhile.
in trio, the property it encodes is “could release the GIL and could be interrupted by cancellation”, which are both pretty important behavioral properties! putting them in the type system has a lot of value. and migrating to async/await is painful, but it’s way less painful than callbacks.

Yeah, I was being a bit hand-wavy there. The point is that you can now see which code is running inside the with block and which code isn’t: the nursery.start_soon call happens inside the with block, and indeed, the file will remain open for the start_soon call. But if you know what start_soon does (it schedules a task to start running in the near future, but returns before it’s actually started), then hopefully it should be pretty obvious that closing the file after calling start_soon is not what you want. And – crucially! – you can tell that just from looking at the source code inside the with block; you don’t have to go look inside read_file.
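To make the “returns before it’s actually started” part concrete without installing Trio, here’s a stdlib-only sketch using asyncio’s create_task, which schedules a task roughly the way start_soon does:

```python
import asyncio

ran = []

async def child():
    ran.append("child ran")

async def main():
    # Like start_soon: the task is scheduled, but hasn't started yet.
    task = asyncio.create_task(child())
    assert ran == []    # the child really hasn't run at this point
    await task          # explicitly wait for it to finish
    assert ran == ["child ran"]

asyncio.run(main())
```

So anything you tear down right after the scheduling call (like closing a file) happens before the child ever gets a chance to run.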

Here’s another example that might make this clearer, one that doesn’t involve concurrency at all. It’s totally possible to use regular with blocks in a broken way too, for example:

with open("my-file") as file_handle:
    another_var = file_handle
another_var.read()  # runs after the with block has already closed the file

Obviously this code won’t work correctly, but you can write it if you want. So in general, with blocks don’t, like, strictly guarantee that you can never access a closed file handle. (For that you need something like Rust’s lifetime tracking.) But in practice, it’s OK; people don’t write code like this by accident, because it’s obviously wrong. And once you’re used to nurseries, then your “bad” example is obviously wrong too, in basically the same way.
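Concretely, here’s what CPython does if you do smuggle a handle out of a with block (using a temp file just for the demo):

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

with open(path, "w") as file_handle:
    file_handle.write("hello")
    another_var = file_handle   # smuggle the handle out of the block

# The with block has closed the file, so the smuggled reference is useless:
error = None
try:
    another_var.write("again")
except ValueError as exc:       # "I/O operation on closed file."
    error = exc

os.remove(path)
```

Python doesn’t stop you from writing this, but at least it fails loudly at the point of use.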

I have just read your thoughts on concurrency APIs and this post about the “go statement”. First, thanks for these insights.
Something troubles me, though. It’s about error handling. You said that:

As noted above, in most concurrency systems, unhandled errors in background tasks are simply discarded. There’s literally nothing else to do with them.

To be honest I am still a “newbie” in concurrent programming, and I just started using gevent and learning asyncio, but I think your previous quote is not fair. Those two frameworks have a task/future mechanism for retrieving an exception that occurred in a task and returning it to the caller.

Also, I’m not sure that propagating errors to the parent task is the best strategy, since it discards all the child tasks that are still running. In the context of an application server, I think it’s problematic that all user queries can be cancelled because of an error that occurred in a single query.

Thanks in advance for the answers 🙂

Right… every system has a way to handle errors explicitly – in regular Python or Trio you use try/except/finally; in asyncio or gevent you use try/except/finally+some custom method to explicitly check exceptions in background tasks. But the key word in the text you quoted is unhandled – sometimes people don’t write explicit error handling code like this, so as framework authors we have to decide what should happen. We have to pick some default behavior.

The Zen of Python suggests: Errors should never pass silently, unless explicitly silenced. That’s what Python and Trio do: exceptions keep going until you catch them or until the program exits. Of course this may not always be what you want (like in your application server), but the solution is easy enough: just catch the exception and do something else with it 🙂 It’s the safest default, but you can always override it.

In other frameworks like asyncio or gevent, if an error occurs in a background task, and you don’t remember to explicitly check for it, then they just print a message on the console and then discard the error. Sometimes, if you’re lucky, this might be what you want. But the framework really has no way to know which unhandled errors you intended to discard, and which ones are actually some serious bug that needs to be handled. So it’s a pretty dangerous thing to do by default.
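You can see this “check explicitly or it’s lost” model in the stdlib’s concurrent.futures too: an exception in a background task is captured on the future and stays invisible until someone asks for it.

```python
from concurrent.futures import ThreadPoolExecutor

def background_task():
    return 1 / 0   # raises ZeroDivisionError in the background task

with ThreadPoolExecutor() as executor:
    future = executor.submit(background_task)

# The exception already happened, but nothing was raised here: it's
# stored on the future, and if nobody ever checks, it's simply lost.
caught = None
try:
    future.result()   # the explicit check: *now* it propagates
except ZeroDivisionError as exc:
    caught = exc
```

If you delete the `future.result()` call, the program runs to completion with no sign that anything went wrong.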

Thank you for the reply. I’m currently reading the Trio tutorial; I’m going to try it out.


Thanks, yes you are right, you can always leak the file handle. Just wanted to make sure I understand how this works in Python correctly 🙂


Thank you for the blog post, I really enjoyed reading it and thinking about what could be done with this idea.

One thing I thought of when you talked about not allowing background tasks to just go off, is some code in Discourse that wants to run a block of code after the request is done being rendered - that is, it wants to escape the nursery of the request! Oh no, that’s bad, and means we need an escape hatch around the nursery concept… right?

Well, let’s open the code and see what we can do about this.

def self.defer_track_visit(topic_id, ip, user_id, track_visit)
  Scheduler::Defer.later "Track Visit" do
    TopicViewItem.add(topic_id, ip, user_id)
    TopicUser.track_visit!(topic_id, user_id) if track_visit
  end
end

… huh, Scheduler::Defer.later sounds one heck of a lot like start_soon, except later rather than sooner, doesn’t it?

Aside: What’s this code doing? It’s logging some simple stats about GET requests. Waiting on a database write operation before you finish up a read is going to be really slow. So, we want to punt the database write until after the response has been fully served. The Deferred blocks run at the end of the request, right before returning to the listen/accept.

So all we really need to satisfy this use case is to escape the “HTTP request” context and lob a task onto the “HTTP server” context. This is solved by the server framework simply exposing a nursery somewhere that tasks like this can be tossed onto. In this case - a request just needs to expose an API to put callbacks in a list, and call them all after the request is done and before going onto the next one.
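A toy sketch of that idea (the names here are made up for illustration, not Discourse’s actual API): the request object keeps a list of deferred callbacks and runs them after the response is finished, before returning to listen/accept.

```python
class Request:
    """Hypothetical request object with a Scheduler::Defer-style hook."""

    def __init__(self):
        self._deferred = []
        self.log = []

    def defer(self, callback):
        # Like Scheduler::Defer.later: remember the work for later.
        self._deferred.append(callback)

    def respond(self):
        self.log.append("response sent")

    def finish(self):
        # Runs after the response, before the next accept().
        for callback in self._deferred:
            callback()

req = Request()
req.defer(lambda: req.log.append("visit tracked"))
req.respond()
req.finish()
```

In structured-concurrency terms, `finish()` is the enclosing server scope draining the tasks that were lobbed onto it.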

(Any truly long-lived tasks in Discourse go onto Sidekiq, where they get stored in Redis and picked up by background worker processes.)

You know, I hear the Go team has been struggling to come up with something that is truly worthy of calling something ‘Go 2.0’. May I propose this tagline:

Go 2 - ‘go’ less

[Sorry, Akismet banished you to moderation purgatory and I only just noticed. Unrelatedly, do you happen to know if there’s any way to get email notifications when there are posts in the moderation queue?]

Yep, that’s it exactly! The point of “structured concurrency” isn’t to normatively force everyone into strict-lexical-scope structure, it’s that you should have some explicit structure (and probably in 95% of cases it will look like lexical-scope, just because functions are a fantastic way to organize code in general).

  • The unification of single-threaded and multi-threaded and asynchronous and distributed
    • async/await, Project Loom fiber, Warehouse/Workshop Model

The Grand Unified Programming Theory: The Pure Function Pipeline Data Flow with Warehouse/Workshop Model

I really like this post, and I think it doesn’t get enough praise. I share it every chance I get, and it has really shaped the way I think about using async elsewhere (not just asyncio, but also in other programming languages). Even without using Trio, I think applying the ideas presented in the post really does improve overall code quality. Thanks for this!


What if the functions in the nursery take a very long time? Does the main thread wait on the results?

@njs If there were a programming language with no callbacks and no new Thread etc., just a keyword like nursery for doing concurrent things, how could we use it to build a UI app? It seems like all UI apps need callbacks.

The task/thread/whatever that opened the nursery does wait for all the functions in the nursery to finish before it can close the nursery. And the main task/thread can do other stuff while the nursery is open – it only stops to wait once it tries to close the nursery. Anyway, this is often what you want – like how when function A calls function B, and function B takes a long time, then function A will wait for it to finish :-). But if you want to stop the children sooner, in Trio you can just cancel them, and hopefully other structured concurrency systems will have similar features.
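Here’s a stdlib analogy for that “only blocks when closing” behavior, using concurrent.futures (the executor’s with block plays the role of the nursery: leaving it is where the parent waits):

```python
import time
from concurrent.futures import ThreadPoolExecutor

order = []

def slow_child():
    time.sleep(0.2)
    order.append("child finished")

with ThreadPoolExecutor() as executor:    # roughly: open the nursery
    executor.submit(slow_child)
    order.append("parent keeps working")  # the parent isn't blocked yet
# Leaving the with block waits for slow_child, like closing the nursery.

order.append("after the block")
```

The parent gets to do its own work while the child runs; it only stops at the end of the block.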

Most current UI frameworks all use callbacks to deliver event notifications, but really all you need is some way for your program to get notified that something happened (a key is pressed, a checkbox gets clicked, etc.). So one possible option is that you write a loop like:

async for event in ui_toolkit.events():
    if event.kind == "checkbox-toggled":
        print(f"{event.checkbox.name} is now set to {event.checkbox.state}")
    elif event.kind == ...

This is roughly how Redux and Elm work. QTrio is experimenting with some similar ideas, where a task can block waiting for some Qt signal to get sent, like await widget.button.wait_clicked(), or async for signal in qtrio.subscribe(widget.button.clicked, widget.button.destroyed, ...): ....
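Here’s a runnable sketch of that loop, using asyncio with a plain Queue standing in for the hypothetical ui_toolkit.events() (the Event class and its fields are made up for the demo, and flattened compared to the event.checkbox.name spelling above):

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Event:
    kind: str
    name: str = ""
    state: bool = False

seen = []

async def ui_loop(events):
    # Stand-in for `async for event in ui_toolkit.events():`
    while True:
        event = await events.get()
        if event.kind == "quit":
            break
        elif event.kind == "checkbox-toggled":
            seen.append(f"{event.name} is now set to {event.state}")

async def main():
    events = asyncio.Queue()
    events.put_nowait(Event("checkbox-toggled", name="dark-mode", state=True))
    events.put_nowait(Event("quit"))
    await ui_loop(events)

asyncio.run(main())
```

The toolkit’s job shrinks to pushing events onto the queue; all the app logic lives in one straight-line loop instead of scattered callbacks.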

I suspect Kotlin also has some nice strategies here, because Kotlin is used heavily for GUI apps and they’re big on structured concurrency, but I’m not as familiar with the details of how those APIs work. You should check them out if you’re interested though :-).

Also note that it’s kind of early days for structured concurrency so far: it’s likely that we’ll figure out better patterns for these kinds of apps over time. But those are some initial thoughts.


I don’t know if things like Redux are a good idea; they split code everywhere.
Maybe it should be something like:

async with trio.open_nursery() as nursery:
    while True:
        btn = render_button()

        async def watch_clicks():
            while True:
                event = await btn.click.accept()

        nursery.start_soon(watch_clicks)

Just pseudocode; I’m not familiar with Python.

In C, I’ve implemented it the other way round: Cancel by default; wait-for-all requires an explicit action.

I am still not sure which pattern is better. It could be that it’s domain-specific: wait-by-default in supercomputing, cancel-by-default in frontend apps, or something in a similar vein.

A side thought: The wait-by-default naturally follows from the concept of all peer threads being equal. If none of them is special, it’s not obvious which of them should cause cancel-by-default. (Unless you cancel when the first one exits, which is, in my experience, a pretty rare scenario.)

However, my feeling is that the all-threads-are-equal concept doesn’t really fit reality. On the technical level, one thread is special in that it continues to use the stack of the parent thread. That’s not an argument, but it may be a hint.

Furthermore, in real life, one thread tends to do a different kind of work than its peers: consider a socket accept loop in one thread and the handling of individual TCP connections in the remaining threads.

The pattern, I think, generalizes: you often have one “control” thread and many “worker” threads. But if so, then the all-threads-are-equal concept does not apply, and the wait-by-default pattern doesn’t automatically follow from the fundamentals. It may as well be replaced by “cancel when the control thread exits”.
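That “cancel when the control thread exits” pattern can be sketched with a threading.Event (a toy illustration in Python, not the C implementation described above):

```python
import threading
import time

stop = threading.Event()
log = []

def worker(n):
    # A "worker" loops until the control thread signals shutdown.
    while not stop.is_set():
        time.sleep(0.01)
    log.append(f"worker {n} cancelled")

def control():
    # The "control" thread does its job, then triggers cancellation.
    log.append("control done")
    stop.set()

workers = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
for t in workers:
    t.start()

ctrl = threading.Thread(target=control)
ctrl.start()
ctrl.join()
for t in workers:   # the parent still waits for the unwinding
    t.join()
```

Note that even here the parent joins everything before moving on, so the structured-concurrency guarantee is preserved; only the shutdown trigger differs.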

Yeah, that’s similar to what qtrio is doing – one task can subscribe to one set of events, while another task is subscribed to a different set.

I guess the important point as far as “structured concurrency” goes is that either way, the parent always waits for the children to exit. Even if you request a cancel, the parent still waits for them to notice it and unwind themselves. Exactly how you request the cancel or not is just a minor question of ergonomics 🙂

BTW, you might enjoy this discussion, where we’re talking about adding special “helper” or “service” tasks that are automatically cancelled at the end of the nursery… but with the twist that they’re never cancelled before that. (So if you cancel the whole thing, that just gets delivered to the main task, it can continue using the services while it’s unwinding, and then once it finishes unwinding the helper tasks are unwound too.)