Structured concurrency proposal for C++20

There’s a really interesting proposal under consideration for C++20 that’s highly relevant to us here: https://wg21.link/p0660 (PDF), which also has a draft implementation. According to Herb Sutter, the proposal is “design-approved”, and the hope is that the exact wording will be formally approved in July.

The idea is to add a new type std::jthread, which has the following properties:

  • Creating a jthread object spawns a thread
  • The thread is automatically passed a cancel token argument (the proposal also adds a standard cancel token type, called std::stop_token, whose semantics seem to closely follow the equivalent in C#)
  • When a jthread object goes out of scope, it automatically cancels the token and then waits for the thread to exit (a minimal sketch of this is just below)
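
For concreteness, a minimal sketch of what that looks like, assuming the interface described in the proposal and its draft implementation:

#include <chrono>
#include <stop_token>
#include <thread>

int main() {
    // Constructing the jthread spawns a thread and passes it a stop token
    std::jthread worker([] (std::stop_token stoken) {
        while (!stoken.stop_requested()) {
            std::this_thread::sleep_for(std::chrono::milliseconds(100));  // placeholder
        }
    });
    // ... do other work ...
}   // ~jthread: requests a stop on worker's token, then joins the thread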

This is essentially the same combination of features that we’ve seen before in libdill’s bundles, Trio’s nurseries, golang’s errgroups, etc.

One difference from Trio is that “stop tokens” are passed explicitly, and unlike C# there’s no notion of linked stop sources. This means that if you have a cancellable operation that takes a stop_token argument, and it internally uses jthreads, then a cancel on the external stop_token doesn’t reach the internal threads directly; it only propagates indirectly, when the jthread destructors run. This is similar to how libdill handles things.

It also does something different from libdill – and from all other known “structured concurrency” implementations. Each jthread object manages only a single thread, versus a collection of threads in all the other libraries.

Each of these decisions seems reasonable enough on their own. But, as @sustrik pointed out in one of his blog posts, together they cause an unfortunate situation, because it means that if you have multiple threads you want to stop, then issuing stop requests happens in a totally sequential manner.

For example, in this code, if a stop is requested on external_stoken (via the std::stop_source it came from), then it might take up to 3 seconds for all work to stop: the main loop has to notice the request (up to 1 second), then nested_worker_2’s destructor requests a stop and joins (up to 1 second), and only then does nested_worker_1’s destructor do the same. In the equivalent code in other systems, it would only take at most 1 second, because the request would be delivered to all threads simultaneously:

#include <chrono>
#include <stop_token>
#include <thread>

void myfunc(std::stop_token external_stoken = {}) {
    std::jthread nested_worker_1([] (std::stop_token nested_stoken_1) {
        while (!nested_stoken_1.stop_requested()) {
            std::this_thread::sleep_for(std::chrono::seconds(1));  // placeholder for real work
        }
    });

    std::jthread nested_worker_2([] (std::stop_token nested_stoken_2) {
        while (!nested_stoken_2.stop_requested()) {
            std::this_thread::sleep_for(std::chrono::seconds(1));  // placeholder for real work
        }
    });

    // Do some cancellable work in the main thread
    while (!external_stoken.stop_requested()) {
        std::this_thread::sleep_for(std::chrono::seconds(1));  // placeholder for real work
    }
}

I guess one could avoid this by explicitly setting a callback on the parent stop token to immediately propagate cancellation requests to the nested threads, e.g. the example function could do something like:

std::jthread nested_worker_1(...);
std::stop_callback cb1{
    external_stoken,
    // Explicit cast-to-void is required by the draft spec
    // I don't know why
    // It might be an error in the draft?
    [&] { (void)nested_worker_1.request_stop(); }
};

// (and repeat for nested_worker_2)

But this seems potentially tiresome and error-prone… especially if we have a dynamic set of jthreads.
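
For concreteness, here is roughly what that bookkeeping might look like with a dynamic set of workers. This is just a sketch; the worker struct is something I made up for illustration, and the member order matters so that each stop_callback is destroyed before the jthread it pokes:

#include <chrono>
#include <functional>
#include <list>
#include <stop_token>
#include <thread>

// Hypothetical helper: one jthread plus a stop_callback that forwards stop
// requests from a parent token to the thread's own token
struct worker {
    std::jthread thread;
    std::stop_callback<std::function<void()>> propagate;

    explicit worker(std::stop_token parent)
        : thread([] (std::stop_token stoken) {
              while (!stoken.stop_requested()) {
                  std::this_thread::sleep_for(std::chrono::seconds(1));  // placeholder for real work
              }
          }),
          // Declared after `thread`, so it is destroyed first: the callback is
          // deregistered before the jthread destructor runs
          propagate(parent, [this] { thread.request_stop(); }) {}
};

void myfunc(std::stop_token external_stoken = {}) {
    // std::list, so workers are never moved (the callback captures `this`)
    std::list<worker> workers;
    for (int i = 0; i < 10; ++i) {
        workers.emplace_back(external_stoken);
    }

    // Do some cancellable work in the main thread
    while (!external_stoken.stop_requested()) {
        std::this_thread::sleep_for(std::chrono::seconds(1));  // placeholder for real work
    }
}   // the worker destructors still stop-and-join one at a time, but the stop
    // requests themselves have already fanned out through the callbacks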


I posted to Josuttis’ repo to let him know about this discussion.

Inevitably, there’s now a bunch of replies over there instead of over here…

Not being able to cancel threads in parallel is the smaller issue here, IMO.

The bigger issue is garbage collection.

Consider an accept loop. For each accepted connection you create a jthread object and put it into a list. Once the server is shutting down, the accept loop gets interrupted, the list of jthreads goes out of scope, and the individual jthreads get cancelled. So far so good.

Now imagine that the server runs for a long period of time (months, years). It accepts, say, 100 connections per second. So, eventually, there’ll be a lot of jthreads in the list (over 3 billion a year). Most of them will already be done with their work, kept around just as placeholders so that the accept loop doesn’t have to deal with exiting threads in an asynchronous manner. And even if the placeholders are small, they will eventually eat all the available memory.
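
To make the shape of that concrete, here is roughly the pattern being described. accept_connection() and handle_connection() are stand-ins I made up so the loop is self-contained; the point is only that nothing ever removes finished entries from the container:

#include <chrono>
#include <stop_token>
#include <thread>
#include <vector>

// Stand-ins for real networking, just so the loop compiles
int accept_connection() {
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    return 42;  // pretend file descriptor
}
void handle_connection(int fd, std::stop_token stoken) { /* serve the client */ }

void accept_loop(std::stop_token stoken) {
    std::vector<std::jthread> connections;
    while (!stoken.stop_requested()) {
        int fd = accept_connection();
        connections.emplace_back([fd] (std::stop_token st) {
            handle_connection(fd, st);
        });
        // Nothing ever erases finished entries, so at ~100 connections per
        // second this vector accumulates ~3 billion dead handles per year
    }
}   // only here do the destructors run, stopping and joining every handle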

If, on the other hand, a jthread object could manage multiple threads, the problem would disappear. As a worker thread exits, it would be removed from the list in the background, by the jthread object itself. There would be no memory leaks.
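
Something along those lines could be hacked up on top of jthread today, e.g. a small group object that reaps finished threads whenever a new one is spawned. The thread_group name and interface below are made up; this is only a sketch, and getting this bookkeeping right is exactly what one might hope a standard "bundle" type would do for you:

#include <list>
#include <mutex>
#include <stop_token>
#include <thread>
#include <vector>

// Hypothetical "bundle" of jthreads that cleans up after workers as they exit
class thread_group {
public:
    template <typename F>
    void spawn(F f) {
        std::lock_guard lock(mutex_);
        reap_locked();  // drop any threads that have already finished
        auto it = threads_.emplace(threads_.end());
        *it = std::jthread([this, it, f = std::move(f)] (std::stop_token stoken) {
            f(stoken);
            std::lock_guard lock(mutex_);
            finished_.push_back(it);  // mark ourselves as ready to be reaped
        });
    }

    ~thread_group() {
        // Request stop on everything up front, so the remaining workers can
        // wind down in parallel; the list destructor then joins them
        std::lock_guard lock(mutex_);
        for (auto& t : threads_) {
            t.request_stop();
        }
    }

private:
    void reap_locked() {
        for (auto it : finished_) {
            threads_.erase(it);  // joins a thread that has already finished
        }
        finished_.clear();
    }

    std::mutex mutex_;
    std::vector<std::list<std::jthread>::iterator> finished_;
    // Declared last, so it is destroyed first, while mutex_ and finished_ are
    // still alive for workers that are just now finishing
    std::list<std::jthread> threads_;
};

Usage would be along the lines of thread_group g; g.spawn([](std::stop_token st) { /* handle one connection */ }); inside the accept loop, with the group’s destructor taking the place of the list going out of scope.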

Since each jthread represents an actual operating system thread, I guess it’s intended for cases where you know the total number of threads in advance (e.g., one thread per cpu)? On most systems, if you have a dynamic, unbounded set of tasks to do, and you try to spawn an OS thread for each of them, then the kernel scheduler will fall apart long before you start having problems from leaking old jthread handles.

Though, the one platform where this is somewhat workable is 64-bit Linux, and 64-bit Linux is a pretty common deployment target for high-performance servers.

Also Solaris. But yes, agreed: with a moderate number of threads created per second this is unlikely to become a major problem before the server is restarted. Still, I kind of feel uneasy about having a construct in the language that is, in common scenarios, basically a memory leak.

CppCon talk by Lewis Baker on structured concurrency. This is using coroutines, not jthread.

C++ RAII doesn’t support async scope exit, so they go the route of making all concurrency operations lazy. There is hope that RAII will be extended to support an async destructor in the future.