Thread locals and dynamic scoping

Thanks for the initiative @sustrik! It is really convenient to have an umbrella portal that opens onto (hopefully) all the structured-concurrency-related discussions and developments in different languages.

While I was thinking about how I should summarize the state of things in OpenJDK, @alanb took the lead – for which I am thankful, since he is the one co-leading the entire engineering effort on the Java front.

A subject that I personally find highly relevant, but missing from structured concurrency discussions, is the handling of variable scopes. For decades, Java and many other languages have provided means for declaring variables in thread-local state. Some used it to avoid expensive synchronization (e.g., ThreadLocalRandom), some used it to carry an implicit attachment through the processing of a certain task (e.g., logger MDCs, authentication information), and some used it to associate certain parts of a process with a particular CPU core. As you might guess, this list is far from complete, and I am not going to rant about whether these usages were justified or not, because these patterns and how to deal with them in the context of fibers have already been discussed by experts in the field. In this context, I really favor dynamic scoping, as in Common Lisp, as a perfect fit for structured concurrency, because dynamic scoping can be leveraged to provide structured variable scoping by confining the exposure of variable assignments to the parent fiber scope. What do you people think about variable scopes from a structured concurrency point of view?

Can you give a concrete example of what that would look like?

In the early days Trio had a concept of “task-local variables”, which was basically just a dict attached to each task. When you spawned a new task, the child’s dict was initialized as a shallow copy of the parent’s dict. The user-level API was modelled after Python’s API for thread-local variables.
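For reference, here is the classic thread-local API it was modelled after (Python’s standard threading.local, shown with plain threads; the attribute name is just an illustration):

import threading

# Classic thread-local storage: each thread sees only its own attribute values.
state = threading.local()

def worker(name):
    state.label = name                 # set in this thread only
    print(threading.current_thread().name, "sees", state.label)

threads = [threading.Thread(target=worker, args=(n,)) for n in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()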

More recently, Python gained a built-in concept of “context variables” (spec, interpreter docs, trio docs). The motivation for moving this into the interpreter itself was that there are libraries that currently use thread-local state, and they were doing very weird things when mixed with user-space scheduling. For example, numpy doesn’t know anything about networking or concurrency or Trio or asyncio, but it lets you set how IEEE 754 errors like overflow/divide-by-zero/etc. should be handled. And this is supposed to be thread-local, but the settings were leaking between different Trio tasks / asyncio callbacks / etc.
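Roughly, the built-in API looks like this (a minimal sketch; the variable name and values are made up, and this is not numpy’s actual code):

import contextvars

# A context variable behaves like a thread-local, except that each
# asyncio/Trio task runs against its own copy of the context.
overflow_mode = contextvars.ContextVar("overflow_mode", default="warn")

def on_overflow():
    return overflow_mode.get()      # whatever the current context holds

overflow_mode.set("raise")          # plain imperative set
print(on_overflow())                # -> "raise"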

So one important constraint was that the new thing had to be about as fast as thread-locals to access (because target users like numpy are extremely speed-sensitive), and it had to support all the operations that classic thread locals supported, so that we could port these legacy APIs over. In particular, this meant that in addition to a dynamic-scoping-style API like with variable_handle.temporarily_set_to(new_value): ..., we also needed a traditional-style API like variable_handle.set(new_value).
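temporarily_set_to is just a placeholder name here; the real contextvars module only provides set(), which returns a token, and reset(token), but a with-style helper can be layered on top of those, something like:

import contextvars
from contextlib import contextmanager

overflow_mode = contextvars.ContextVar("overflow_mode", default="warn")

# Hypothetical with-style helper built on the real set()/reset() primitives.
@contextmanager
def temporarily_set_to(var, value):
    token = var.set(value)          # traditional-style set, returns a token
    try:
        yield
    finally:
        var.reset(token)            # restore the previous value on exit

with temporarily_set_to(overflow_mode, "raise"):
    assert overflow_mode.get() == "raise"
assert overflow_mode.get() == "warn"    # the binding is gone after the block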

And if you have a plain set-style API like this, you have to define the scope in which the new value is visible. I think this is the biggest choice to make: do you only allow creating new dynamic bindings at the level of the current scope, or do you allow updating existing bindings, and if the latter, then where are the bindings created?

For Python we basically kept the concept of task-local variables, which are inherited from the spawning task as a shallow copy. If you’re using the with-based API you can’t really tell (it sets the new value on entry, then restores the old value on exit). But if you use the set API, then that sets the value in the current task.
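A small illustration of that inheritance, using asyncio’s built-in tasks (which, like Trio, start each task with a copy of the spawning context); the variable name is made up:

import asyncio
import contextvars

tag = contextvars.ContextVar("tag", default="parent")

async def child():
    tag.set("child")                    # only affects this task's copied context
    print("child sees:", tag.get())     # -> "child"

async def main():
    await asyncio.create_task(child())  # the task starts with a copy of our context
    print("parent sees:", tag.get())    # -> still "parent"

asyncio.run(main())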

Also, with Python context vars, we can’t quite emulate dynamic scoping, because of a wrinkle in how Python generators and context vars interact. The issue is: what happens if you set a context var inside a generator and then suspend the generator? With real dynamic scoping, the setting should apply inside the generator, go away when the generator is suspended, and then come back again when the generator is resumed… this requires a significantly more complicated implementation, with the interpreter runtime maintaining a stack of values that are automatically pushed/popped as generators are entered/exited.

Also, it turns out there are some cases in Python where you really want the generator body to be treated as if it were part of the surrounding scope – in particular when defining new context managers. Like, take numpy.errstate, which I linked to above. Maybe I want to define a new abstraction based on errstate, like:

# shorthand for 'with errstate(over="raise")'
with error_on_overflow():
    ...

The standard way to define error_on_overflow would be:

from contextlib import contextmanager
from numpy import errstate

@contextmanager
def error_on_overflow():
    with errstate(over="raise"):
        yield

So what’s going on here is that we define error_on_overflow as a generator (!), and then the @contextmanager decorator wraps that generator into an object that implements the context manager protocol that with uses. When I write with error_on_overflow(): ..., it creates the generator, steps it once to get to the yield, then executes the body of the with statement, then steps the generator again to do any cleanup. So this is a really different use case for generators than the classic one of defining an iterator. And this is what makes dynamic scoping tricky: for the iterator use case, dynamic scoping should isolate the body of the generator from the caller. But in the context manager use case, the entire purpose is to change the value of a context variable in the caller’s context!
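If it helps, here is a rough sketch of the mechanics (ignoring exception propagation, and with the errstate calls replaced by prints so it runs standalone):

def error_on_overflow_gen():
    print("enter: pretend to set the overflow mode")     # stands in for entering errstate(...)
    try:
        yield
    finally:
        print("exit: pretend to restore the old setting")  # stands in for leaving errstate(...)

# Roughly what `with error_on_overflow(): body()` does:
gen = error_on_overflow_gen()
next(gen)               # step to the yield -- the "enter" half runs
print("with-block body runs here")
try:
    next(gen)           # step past the yield -- the "exit" half runs
except StopIteration:
    pass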

Sorry, I hope that summary made sense – it’s pretty complicated :slight_smile:

Anyway we did come up with some schemes for how we could make the full dynamic-scoping-style version work (PEP 550, PEP 568), but Guido was very skeptical about whether it was worth the complexity, and pushed us to leave it out. So for now, context var settings can leak out of generators.
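To make that concrete, here is a small demonstration of the leak (the variable name is made up):

import contextvars

mode = contextvars.ContextVar("mode", default="outer")

def gen():
    mode.set("inner")   # runs in the caller's context -- generators get no context of their own
    yield

g = gen()
next(g)                 # advance to the yield; the set() has now happened
print(mode.get())       # -> "inner": the setting leaked out of the generator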

I’m sure a lot of the considerations here are specific to Python. But I’m guessing you have similar issues around backwards compatibility, and I expect you’ll want to think hard about all the ways your new delimited continuations might be used beyond just implementing lightweight threads, and how they should interact with scoped variables.


Oh, one other tiny technical note that you might find interesting: In Trio, new tasks actually inherit their task-locals from the task that called start_soon. In 99% of cases this means they get the same task locals as were in effect at the point where the nursery was created. But in principle they can diverge, if the nursery gets passed into another task, or just if the task local gets mutated after the nursery is created. We could have implemented it either way, but this seemed like the option that was less likely to surprise people.

Example:

with numpy.errstate(over="raise"):
    async with trio.open_nursery() as nursery:
        nursery.start_soon(f)  # This task has over="raise"
        with numpy.errstate(over="ignore"):
            nursery.start_soon(g)  # This task has over="ignore"
        nursery.start_soon(h)  # This task has over="raise"

So as it works out, there isn’t a direct 1-1 connection between the structured concurrency task tree, and the way task locals are inherited.

However, there is a pretty strong connection in practice. In particular, if I do:

with numpy.errstate(over="raise"):
    nursery.start_soon(f)

And I don’t pass a nursery object in to f, and f doesn’t mutate the errstate, then that guarantees that f and all of its descendants will run with over="raise" set. So this handles one of the major use cases for task-locals: setting some kind of tag for logging purposes.


Hi everyone!

I’m interested in the issue of how tasks use dynamic scoping because I designed Julia’s current logging system to store the logger (defined as the user-configurable type which filters, maps or sinks log events) in Task-local state. Essentially this makes the logger dynamically scoped, which is very interesting from a composability point of view. I also made the logger inherit from the parent Task because that seemed appropriate, again from a composability point of view.

In Julia we don’t yet have structured concurrency, which means that Julia Tasks may outlive their parent function call; this inconsistency has led to confusion for users of the logging system. I hope that will change though; I was super pleased to see that @StefanKarpinski has signed up here (Hi Stefan :slight_smile: ).

I’d say we’re still in the process of seeing how the Julia logging system design will pan out and whether it’s better than just having the equivalent of MDCs, but I feel that signs are promising so far.

(PS: You mentioned Common Lisp, so you might find it amusing that this dynamically scoped design came about partly because the Julia parser is written in a Scheme-like Lisp which has a with-bindings macro for setting variables in dynamic scope. I had never used any kind of Lisp before, so this was my introduction to the idea of dynamic scoping in general.)
