Hi everybody!
There are structured-concurrency-related efforts going on for different programming languages but the entire effort is kind of scattered, without people being aware of each other and speaking to each other.
If we had a common forum, we could share the use cases, the problems, the ideas and the solutions. Each of us, irrespective of the language we are working with, could benefit from this common pool of knowledge.
First, I wanted to create a cross-language mailing list to bring everyone together. I’ve even written a kick-off email. But then I realized that Nathaniel beat me by a few days by creating this forum. So, what follows is my introduction to the topic.
What’s structured concurrency?
Structured concurrency is an extension of the structured programming paradigm (goto considered harmful etc.) into the domain of concurrent programming. In very broad terms, the main idea is that the physical layout of the program (on screen) should correspond to its execution flow (in time). When you violate that principle, as Dijkstra argues, you’ll get ugly spaghetti code.
For structured concurrency in particular, it means that the lifetime of a thread (I am using the term broadly to mean anything between a process and a coroutine) is bound to a particular syntactic construct, typically a scope or a code block.
To give the simplest possible example, a thread may be automatically canceled when a block is exited:
{
    ...
    go foo()
    ...
} // foo gets canceled here
There are different ways to explain why structured concurrency is desirable. The simplest one is to note that threads, as we know them today, don’t provide even the slightest encapsulation guarantees: you call a function and once it returns you believe it is done. But unbeknownst to you it has launched a thread in the background which is still running and causing mischief. This problem is particularly bad because of its transitive nature. To find out whether a function launches a thread, it’s not sufficient to examine the code of the function. You have to examine the code of every single dependency, every dependency of a dependency, and so on.
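To illustrate the problem, here’s a contrived Python sketch (the function name is made up); nothing in the caller’s code hints that anything is still running after the call returns:

import threading
import time

def innocent_looking_function():
    # Returns immediately, but quietly leaves work running in the background.
    threading.Thread(target=time.sleep, args=(60,), daemon=True).start()

innocent_looking_function()
# The caller assumes all the work is done here, yet a thread is still
# running behind its back.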
This line of thought was exhaustively explored in Nathaniel Smith’s blog post “Notes on structured concurrency, or: Go statement considered harmful”:
https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful
(If you are not familiar with the topic and you want to read just one article on structured concurrency, this is your best choice.)
However, there are different ways to think about the problem. Specifically, the above approach focuses heavily on the language user’s point of view and doesn’t give many hints to the language implementer.
As an alternative, I find the metaphor of a “call tree” (as opposed to a “call stack”) to provide a much sharper focus on what structured concurrency actually means on the technical level: the same way you wouldn’t consider unwinding a stack frame before all its child frames have exited, a thread shouldn’t exit before all of its child threads are finished.
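To make the analogy concrete, here’s a minimal sketch using Python and trio (one of the implementations listed below; the task names are made up):

import trio

async def child(name, seconds):
    await trio.sleep(seconds)
    print(name, "finished")

async def parent():
    async with trio.open_nursery() as nursery:
        nursery.start_soon(child, "a", 1)
        nursery.start_soon(child, "b", 2)
    # Just as a stack frame doesn't unwind before its callees return,
    # execution only reaches this line once both children have finished.
    print("parent finished")

trio.run(parent)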
I’ve explored this point of view in the “Structured Concurrency in High-level Languages” article:
Implementations
This list may be incomplete but I am aware of implementations in:
C: http://libdill.org/
Python: https://trio.readthedocs.io/en/latest
Kotlin: https://kotlinlang.org/docs/reference/coroutines/basics.html#structured-concurrency (native!)
What have we learned so far?
- Thread lifetimes should be bound to scopes. Kind of.
While the example above looks nice and neat, it doesn’t really work. Consider the use case of a web server accepting connections:
{
    l = listen(addr)
    while(...) {
        c = accept(l)
        go connection_handler(c)
    }
}
If the lifetime of the connection handler were bound to the innermost scope (the scope of the while loop), as happens to be the case with regular variables, each connection would be canceled when the while loop rolls over, i.e. immediately. Instead, we want the handlers to be canceled when the outer scope is exited.
That seems to imply we need two different kinds of scopes. Libdill calls these special scopes that cancel threads “bundles”. Trio calls them “nurseries”. In Kotlin it’s just plain “scopes”. It would be great if we could unify the terminology. Anyway, in this memo I’ll just call them “thread scopes”.
Here’s an example in Python:
async with trio.open_nursery() as n:
    while ... :
        ...
        n.start_soon(connection_handler, c)
        ...
# The handlers are canceled here.
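Filling in the blanks, a fuller sketch of the same pattern might look like this (the echo handler is just a placeholder; for brevity it accepts on the first listener only):

import trio

async def connection_handler(stream):
    # Placeholder handler: echo until the peer closes the connection.
    while True:
        data = await stream.receive_some(4096)
        if not data:
            return
        await stream.send_all(data)

async def serve(port):
    listeners = await trio.open_tcp_listeners(port)
    async with trio.open_nursery() as nursery:
        while True:
            stream = await listeners[0].accept()
            nursery.start_soon(connection_handler, stream)
    # The nursery block can't be left while any handler is still running;
    # if an exception escapes the accept loop, all outstanding handlers
    # are canceled.

trio.run(serve, 8080)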
- Thread scopes don’t necessarily live on the stack.
It’s kind of like with traditional variables. In most cases we want them to live on the stack (and be scoped accordingly) but once in a while one needs to do a dynamic allocation and thus escape the strict scoping rules.
Consider the example of an object that represents a network connection. It may contain a thread that sends keepalives. A function may want to create the object and pass it back to its caller. With simple data types (int) you can do that by copying the data to the caller’s stack frame. With threads this is not a good idea. Copying the stack of a running thread elsewhere is dangerous, breaks pointers to things on the stack, etc.
Ideally, there would be a way to create the thread scope on the heap, so that it doesn’t get deallocated when the local scope is exited:
Example in C:
connection *new_connection(void) {
    connection *c = malloc(sizeof(connection));
    c->sock = connect(...);
    c->bundle = bundle();
    bundle_go(c->bundle, send_keepalives(c));
    return c;
}

void free_connection(connection *c) {
    close(c->bundle); /* The keepalive thread is canceled here. */
    close(c->sock);
    free(c);
}
What are the open questions?
- The elephant in the room with structured concurrency, of course, is thread cancelation. With no thread cancelation there’s no structured concurrency. Period.
In theory, simple EINTR-like cancelation and a lack of asynchronous interrupts/signals is a sufficient foundation for building structured concurrency. In practice, this raises a whole host of questions: Can we guarantee such semantics with actual threads (as opposed to coroutines)? What about processes? What would be the exact semantics of turning async events into EINTR-like exceptions? What are the possible cancelation points? And doesn’t all that actually turn preemptively scheduled threads into half-cooperatively scheduled Frankenstein monsters? And is that even desirable?
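For illustration, this is roughly the answer the coroutine-based implementations give today. A trio sketch, where cancelation is delivered as an exception, but only at well-defined blocking calls:

import trio

async def worker():
    try:
        while True:
            # Cancelation can only be delivered at checkpoints such as this
            # sleep -- roughly the EINTR-like model described above.
            await trio.sleep(1)
    except trio.Cancelled:
        # Run cleanup code here, then let the exception propagate.
        raise

async def main():
    with trio.move_on_after(0.1):  # cancel everything in this scope after 100 ms
        await worker()
    print("worker was canceled, main continues")

trio.run(main)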
- How to handle errors?
The promise of consistent handling of errors in a concurrent environment is a big part of the appeal of structured concurrency. But it’s not that easy.
Once again, Nathaniel Smith wrote an entire article on the topic:
https://vorpus.org/blog/control-c-handling-in-python-and-trio/
His claim is that structured concurrency is the only way to make Ctrl+C handling work the way users naively expect it to.
And it’s easy to see why: if exceptions are propagated up the call tree, freely crossing thread boundaries, the KeyboardInterrupt exception will eventually reach the top of the call tree (the main function) and the program can exit cleanly.
However, there’s a hidden race condition involved: imagine two threads in a scope, one gets a KeyboardInterrupt and the other one raises an unrelated exception. If the latter exception bubbles up to the parent thread faster, it will cancel the thread that’s processing the KeyboardInterrupt, and the Ctrl+C gets lost.
There are some rather deep philosophical questions about the nature of errors here which I am not going to get into now. But the topic is definitely worth discussing.
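To ground the discussion, here’s how the propagation looks in trio today (a sketch; the task names are made up): an exception in one child cancels its siblings and then travels up to the parent scope.

import trio

async def fails_fast():
    await trio.sleep(0.1)
    raise ValueError("boom")

async def runs_forever():
    try:
        await trio.sleep(60)
    except trio.Cancelled:
        print("sibling canceled because the other child failed")
        raise

async def main():
    try:
        async with trio.open_nursery() as nursery:
            nursery.start_soon(fails_fast)
            nursery.start_soon(runs_forever)
    except Exception as exc:
        # Depending on the trio version, this is either the bare ValueError
        # or an ExceptionGroup wrapping it.
        print("parent saw:", repr(exc))

trio.run(main)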
- Who owns the thread scope?
Is the thread scope owned by some kind of external entity or is the ownership shared between all the threads in the scope?
The latter case means that each thread in the scope can close it, thus canceling its siblings. Is that a good idea?
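As a data point, trio picks the shared-ownership answer: any task that holds a reference to the nursery can cancel the whole scope, siblings included (a sketch with made-up task names):

import trio

async def watchdog(nursery):
    # A child that "owns" enough of the scope to shut its siblings down.
    await trio.sleep(1)
    nursery.cancel_scope.cancel()

async def sibling():
    try:
        await trio.sleep(60)
    except trio.Cancelled:
        print("canceled by a sibling")
        raise

async def main():
    async with trio.open_nursery() as nursery:
        nursery.start_soon(watchdog, nursery)
        nursery.start_soon(sibling)

trio.run(main)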
I’ve written about the question here: http://250bpm.com/blog:139
- Timeouts.
How exactly are we going to handle timeouts? The most obvious way is the Go-style context approach, where you just associate a deadline with a scope.
But the thing is much more complicated than that. A scope applies to all the contained threads (e.g. to all connections handled by a web server), but you may want to have a different deadline for each thread. For example, you may want to close a connection that hasn’t completed the initial handshake within 10 seconds.
Then there are grace periods. (“Close the server but give all connections 1 second to shut down cleanly.”) When grace periods enter the picture, the entire topic becomes even more complex.
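For the per-connection deadline and the grace period mentioned above, nested cancel scopes are one possible shape. A trio sketch (perform_handshake and serve_requests are hypothetical helpers):

import trio

async def perform_handshake(stream):
    # Hypothetical handshake: wait for one message from the peer.
    await stream.receive_some(4096)

async def serve_requests(stream):
    ...  # the actual per-connection work would go here

async def connection_handler(stream):
    # Per-connection deadline: if the handshake hasn't completed within
    # 10 seconds, only this handler gives up; the surrounding scope and
    # its siblings keep running.
    with trio.move_on_after(10) as handshake_deadline:
        await perform_handshake(stream)
    if handshake_deadline.cancelled_caught:
        await stream.aclose()
        return
    await serve_requests(stream)

async def shut_down_with_grace(nursery):
    # Grace period: give all handlers one more second, then cancel them.
    nursery.cancel_scope.deadline = trio.current_time() + 1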
I’ve tried to enumerate all use cases, but I am in no way sure I’ve covered everything:
http://libdill.org/structured-concurrency.html#what-are-the-use-cases
- Typed scopes.
Should all threads running in a scope be of the same type? This is a weird theoretical question, but one that I find interesting. In a strongly typed language one could make sure that a given thread scope holds only a single kind of thread (e.g. user connections as opposed to admin connections).
b = bundle<connection_handler>();
bundle_go(b, connection_handler()); // ok
bundle_go(b, database_maintenance()); // error!
Done this way, it would give programmers a way to reason about logical groups of threads as atomic entities and maybe even associate specific behaviours with each group (cancelation policy, deadlines, etc.)