Runtime cost of the nursery model vs go/fork and join

A similar model to Trio’s nursery model is used in Rust’s Rayon::scope: https://docs.rs/rayon/1.2.0/rayon/fn.scope.html

One thing that caught my attention is that it suggests using manual joins instead of Rayon::scope when possible when performance or heap allocations need to be limited. Of course, this is largely because Rust is a systems language and it largely won’t matter in a language like Python, but it does make me wonder about how SC affects runtime costs.

Mentioning a few bullet points to get discussion started.

  1. Different implementations may have very different costs. Full stackful coroutines implemented with continuations are going to be more expensive than stackless coroutines which are compiler sugar for the CPS transform.

  2. In compiled languages, Nurseries can be viewed either as a static scope or as a dynamic object which can be passed around as a function parameter rather than as a lifetime parameter.

  3. The nursery models invariants may sometimes hinder algorithms that require the threads to not be structured as trees, but it also creates more opportunities for optimizations.

  4. Scopes & nurseries work well with RAII, and possibly escape analysis.

  5. (absolutely unlikely to be relevant near term but a pet issue of mine that may be relevant in 3-4 decades) Scoped concurrency handling helps contain side effects, so it’s easier to keep it around on a reversible computing architecture where you need to be able to unjoin processes to reverse computation in order to be able to reclaim the entropy they produced.

1 Like

The nursery models invariants may sometimes hinder algorithms that require the threads to not be structured as trees…

Just curious: do you have an example of such an algorithm? Ever since I discovered structured concurrency, I have been trying to find an example.

At this point, however, I am starting to wonder if structured concurrency will gain its own theorem along the lines of the structured program theorem, but a good counterexample would prove me wrong.

1 Like

Fundamentally, a nursery is a way to use the type system to force code to join all child tasks. So there’s really no intrinsic cost except for what’s required to join all child tasks. That has a small cost – you need some kind of data structure to track which tasks are still running, and tasks need some kind of reference to the nursery so they can report completion. But… in a correct program, you basically have to do this tracking anyway, and my guess is that if you’re doing it by hand, then on average you’re probably going to be less efficient than the version built into the runtime, not more efficient.

It’s mostly academic though – keeping a list of running tasks is a pretty trivial cost no matter how you do it.

1 Like

The structured program theorem says that any computation that can be expressed with GOTOs can also be expressed with structured control flow primitives. It’s true, but mostly[1] useless, because what we actually care about is whether any computation that can be expressed elegantly with GOTOs can also be expressed elegantly with structured control flow, and the theorem does not say anything about elegance :-).

The situation for structured concurrency is exactly the same. You can implement a general go primitive using nurseries: just create a global nursery at the top of your program, stash it in a global variable, and then define go as a function that spawns a task into that nursery that runs the user’s code, captures any exceptions, and discards them. Therefore, any computation that can be expressed with go can also be expressed with structured concurrency… but it’s super gross so it doesn’t answer the question we really wanted to know. (Though one could argue that it’s merely revealing the inherent grossness of go statements.)

[1] The structured program theorem does actually have one surprisingly important practical application that I know about: JS/WASM only allow structured control flow, so C/C++ → JS/WASM compilers like emscripten actually rely on this theorem to remove the GOTOs from C/C++! They call it the “relooper”.

4 Likes

But then again, how do you define “elegance”? Short code? Ease of writing the code? Ease of guaranteeing correctness?

I personally think that the latter is the best definition, and I also think that structured concurrency enables it more strongly than unstructured concurrency, just as structured programming enables it more strongly than unstructured programming.