Concurrency and trio as an implementation detail in a blocking/sync method

Hi there,

I’m not quite sure this is the right place to ask, so let me know if this would be better as a GitHub issue or somewhere else.

I’m thinking I might be approaching my problem the wrong way, or misunderstanding some things.

I’m implementing a sync method on an object. It has to be sync, as it is part of a protocol that other objects already implement and others already use; think Python’s __getitem__: you can’t write async def __getitem__.

Within these methods I could potentially use limited internal concurrency, so I’m tempted to define a private async equivalent method, and have the sync one call trio.run(self._async_impl).
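Roughly this shape (a simplified sketch; Store and _async_getitem are placeholder names for my actual code):

import trio

class Store:
    def __getitem__(self, key):
        # Sync protocol method: spin up a Trio run just for the
        # duration of this one call.
        return trio.run(self._async_getitem, key)

    async def _async_getitem(self, key):
        # Private async implementation, free to use internal
        # concurrency (e.g. fetch several chunks in parallel).
        ...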

Now my concern is that I want this to be an implementation detail, which it currently is not: the main application that uses this method may itself run under Trio, and you are not supposed to nest trio.run().

In Trio’s main run() function, the initial comment says:

    # It wouldn't be *hard* to support nested calls to run(), but I can't
    # think of a single good reason for it, so let's be conservative for
    # now:

It is indeed relatively easy: with a context manager that saves and restores GLOBAL_RUN_CONTEXT, I can run a nested trio.run() inside that context manager.
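Something along these lines (a rough sketch poking at Trio’s private internals, so inherently fragile):

import contextlib

from trio._core._run import GLOBAL_RUN_CONTEXT

@contextlib.contextmanager
def suspended_run_context():
    # GLOBAL_RUN_CONTEXT is a threading.local: stash this thread's
    # attributes, clear them so trio.run() believes no loop is
    # running, then restore them on the way out.
    saved = GLOBAL_RUN_CONTEXT.__dict__.copy()
    GLOBAL_RUN_CONTEXT.__dict__.clear()
    try:
        yield
    finally:
        GLOBAL_RUN_CONTEXT.__dict__.clear()
        GLOBAL_RUN_CONTEXT.__dict__.update(saved)

With that, the sync method can do `with suspended_run_context(): return trio.run(self._async_impl)` even when called from inside an outer Trio run.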

I have the impression that most talk about “nested” event loops is about preserving concurrency, i.e. not having inner loops block the outer one, but in my case I’m fine with blocking.

Am I missing something? Would this be a valid case for making run() (or some alternative) re-entrant, with the understanding that only the tasks in the deepest run() get processed at any given time? What are the implications for interrupts and cancellation that I’m not seeing?

So, yeah, this is definitely an awkward situation. I don’t have a nice argument or conclusion but I can dump some thoughts anyway :slight_smile:

Currently, trio.run and asyncio.run both have this limitation that they refuse to do nesting.

As far as I can think of, for Trio, making it work technically really would be as simple as saving/restoring GLOBAL_RUN_CONTEXT. But I’m skeptical that blocking the outer event loop while doing this is really OK in the long run… if you have an outer loop, and that outer loop is invoking some sync protocol like __getitem__, and that sync protocol has enough internal complexity and IO waits that it makes sense to spin up an entire concurrent event loop, then wouldn’t it make more sense for the outer loop to invoke that protocol in a thread? And you can nest Trio event loops that way, e.g. trio.run(trio.to_thread.run_sync, trio.run, ...)
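Concretely, something like this toy example (with sync_protocol standing in for your __getitem__-like method):

import trio

async def inner_main():
    # A full inner event loop, free to use nurseries, timeouts, etc.
    await trio.sleep(0.1)
    return "value"

def sync_protocol():
    # The sync __getitem__-like method that wants internal concurrency.
    return trio.run(inner_main)

async def outer_main():
    # The outer loop pushes the blocking call into a worker thread, so
    # its other tasks keep running while the inner loop spins.
    return await trio.to_thread.run_sync(sync_protocol)

print(trio.run(outer_main))

The outer run stays responsive because only the worker thread blocks on the inner run.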

There’s also a very relevant thread here, though it’s mixed with discussion about “reentrancy”, which is a slightly different feature that a lot of people really want, but is much harder to support: https://mail.python.org/pipermail/async-sig/2019-March/thread.html

(Though one of the concerns about allowing nested loops is that folks will think that they’re doing something reentrant, and get confused.)

@oremanj also wrote this recently, which is some kind of argument for something:

It’s kind of a more-reasonable version of reentrancy, and you could imagine writing sync-flavored code that detects whether a loop is running and then either uses greenback to reenter it or starts a new loop, as appropriate. The problem though is that greenback requires that you do some setup to enable it while you’re still in async-flavored code, so I don’t know if that’s appropriate for your situation.

Thanks for the reply, and the useful links.

then wouldn’t it make more sense for the outer loop to invoke that protocol in a thread?

I think some of the same concerns come up in the Python mailing list thread you pointed to. “Just do it in a thread” gets thrown around a lot, but in many cases you can’t / don’t want to run in a thread. In IPython/Jupyter we avoid threads as much as possible, as user code sometimes really cannot tolerate running outside the main thread (OpenCV, for example).

Even if I run my subloop in a thread to get local concurrency, it does not really help the main thread: I want to emulate a blocking call, so my main thread would just immediately block waiting to join. Now, if I use your trio.to_thread.run_sync approach, I have the overhead of a thread AND a loop, so it’s strictly worse. If I just run in a thread without blocking the main loop, then I also lose any guarantee that no code mutates values between async calls (if I understand correctly). So I either lose the guarantees async gives me, or pay the extra cost of a thread.

Yes, re-entrancy is way more complicated, and I’m of course not talking about re-entrancy here. But if you are afraid that users will be confused, we could call this something completely different, like trio_blocking_subloop. I also would not expect it to be in the top-level namespace, as it is really meant for internal implementations anyway (but yes, I know you can’t trust users).

A sync protocol does not need to have that much complexity to need a subloop; it may just fetch many resources asynchronously, each with non-negligible latency. I’m also thinking it might be possible to not completely spin up a new loop, and instead reuse most of the infrastructure already in place by just saving/restoring fields of GLOBAL_RUN_CONTEXT. Attached is what I currently have, but I’m also concerned about the propagation of timeouts/KeyboardInterrupt. The gist also contains a class that automatically generates a sync version of all the async methods on a class (sketched below).
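The sync-version generation is roughly this kind of thing (a simplified sketch; the <name>_sync naming is mine):

import functools
import inspect

import trio

def _make_sync(afn):
    # Wrap one async method in its own short-lived trio.run().
    @functools.wraps(afn)
    def sync_version(self, *args, **kwargs):
        return trio.run(functools.partial(afn, self, *args, **kwargs))
    return sync_version

class AutoSync:
    # For every async method defined on a subclass, attach a blocking
    # twin named <name>_sync.
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        for name, fn in list(vars(cls).items()):
            if inspect.iscoroutinefunction(fn):
                setattr(cls, name + "_sync", _make_sync(fn))

So an async def getitem(self, key) on a subclass automatically grows a blocking getitem_sync(self, key).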

And anyway, the outside loop, if there is one, is going to be blocked regardless of whether the internal implementation blocks or not. Yes, it would be good to have everything async all the way down, but gradual async-ification from both the top and the bottom would be great.

I feel like your reply is referencing a lot of context about your specific situation, and I don’t have that context so it’s hard to figure out what to say.

Let’s back up: we both know that in general, doing blocking operations inside an async loop is fundamentally-kinda-broken, because it causes other random unrelated tasks to freeze, and that makes your system more fragile. But there’s something specific about your setup that you think makes that OK and even preferable. What is that? What are you actually trying to do? Without knowing what you’re trying to do, it’s like you’re asking “what’s the right way to do the wrong thing?”, and it’s hard to give useful advice if we’re starting from the assumption that I can’t tell what you actually need :slight_smile: So far all I know is that you’re implementing something that’s like-__getitem__-but-not and for unspecified reasons you think threads might not be appropriate…

BTW, I don’t actually know if this is relevant or not, but from my vague understanding of the kinds of things you’ve been worrying about over in Jupyter/IPython-land, I think you might find this new feature interesting (just landed in master, not quite released yet): Introspecting and extending Trio with trio.lowlevel — Trio 0.23.1+dev documentation

If we did do nested run loops, then timeouts wouldn’t propagate at all: Trio would want to cancel the task that’s blocked inside the nested trio.run, but it wouldn’t be able to do anything, both because sync functions aren’t cancellable, and also because the outer trio.run can’t process timeouts at all if you’re not letting its scheduler run.

KeyboardInterrupt is a bit trickier. With your current hack, it’s definitely possible that KeyboardInterrupt might get delayed and not noticed until after the nested run finishes. It wouldn’t be too hard to fix this though. (Basically we’d need to teach the nested run to register its SIGINT handler when it starts – right now it skips that if it detects that you’ve registered a non-default SIGINT handler, and it’s not smart enough to realize that the outer Trio’s SIGINT handler is “close enough” to the default SIGINT handler that it could be temporarily overridden.)

(New users can only post 2 links, so some links at the end)

For context, and sorry, I’m likely going to repeat myself a few times.

  1. I avoided giving precise details in order to better understand things in general and to push my understanding of the global async situation; also, I’m curious.
  2. I’m working on Zarr, and in particular on the Zarr spec v3 (a), which aims to be friendlier to network stores. As part of writing the spec, I’m writing a draft of what an API may look like (b), in particular a shim/adapter v2<->v3 API which is synchronous, as v2 is synchronous. The way spec v3 is designed, there are potentially a number of places where async would be great internally (fetch metadata and data concurrently, join and decode once you have both, or make many requests and return the first that succeeds), so basically I need each method to be internally asynchronous, but exposed to the user as synchronous. Folks are starting to wonder about async (c) and using it in their top-level code.
  3. I think there is probably something monad-ish here, and I feel like I actually want a function which is sync in a sync context and async in an async context for the implementation, but that’s not the subject of this discussion.

I could likely use threads for this specific implementation, but I’m trying to avoid it; that’s also partly just me being me and seeing how far I can get (hey, that’s how we got an async REPL, right?).

So I ended up making my API async and showing the user a sync version using Trio, since I know that if it’s running from within Jupyter then I can mostly do a trio.run without issue. But I would also really like to run IPython itself entirely under an event loop like Trio, and then I wouldn’t be able to nest. (We do have some use cases of nesting IPython.embed as well, but there we use the pseudo-sync loop that just advances coroutines; see the sketch below.)
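(That pseudo-sync loop is essentially just this, and it only works for coroutines that never actually need to suspend:)

def pseudo_sync(coro):
    # Advance the coroutine by hand, with no event loop at all; this
    # only works if it completes without truly waiting on anything.
    try:
        coro.send(None)
    except StopIteration as exc:
        return exc.value
    raise RuntimeError("coroutine attempted a real async wait")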

So that’s where I’m coming from.

Yes, I agree that in a perfect world all the layers are async-aware; it’s not always the case.

The current API I want to mimic is sync, and I don’t want to introduce potentially task-switching behavior where there was none before. Now, I’m happy to encourage users of the new API to use async in the future, but I can’t really tell them right now that it’s async and that’s their only choice; plus some of the abstractions are fundamentally sync, like __setitem__. The end-user code will really look like high-level numpy:

import zarr
import numpy as np
z = zarr.zeros((10000, 10000), chunks=(1000, 1000), dtype='i4')
z[0, :] = np.arange(10000)
z[:, 0] = np.arange(10000)

And here __setitem__ really can’t be async on the user-facing side, and it seems it never will be, though internally it could be async-based. That kind of code may also well live in libraries like pandas, matplotlib… seaborn, and god knows what kind of expectations they have around the synchronicity of set/getitem and global state if we allow re-entrancy. If I do a concurrent “fetch data and plot it with plt.plot()”, and now in the middle of matplotlib’s plotting routine it switches plots because data[slice()] is actually a coroutine, I’m not too sure what the result would be.

Unless you think it would be OK for anything that looks like a numpy getattr/setattr to introduce a task switch.

Hope that clarifies where I’m coming from.

(And I also miss our in-person conversations from when we were at BIDS; hope you are doing well.)

a) https://zarr-specs.readthedocs.io/en/core-protocol-v3.0-dev/protocol/core/v3.0.html
b) https://github.com/Carreau/zarr-spec-v3-impl
c) async in zarr · Issue #536: https://github.com/zarr-developers/zarr-python/issues/536

(I made your links live and manually bumped you out of the “new user” purgatory so it shouldn’t hassle you anymore)

Oof, sorry for the slow reply here. Mostly because this is a really hard problem and I wasn’t sure what to say…

So, I totally get how for your use case you want __setitem__ to be sync, both for compatibility with your old API and to work around Python’s syntactic limitations. And it makes sense that you’d want concurrency inside something like zarr. But I don’t know if there’s any great way to fit the pieces together…

The thing is, event loops are kind of expensive to start up, and their semantics are all built around the idea that they’ll keep running continuously. And concurrency also has overhead. So it doesn’t make a ton of sense to start a new event loop on “small” operations that complete in less than, say, 1 ms. But OTOH if it’s a “large” operation and the user is using an async library of their own, then they need “large” operations to be async, to avoid blocking their event loop!

I hate to say it, but I wonder if this is one of those rare cases where a background thread is actually the best option. The idea would be that you spin up a dedicated thread that hosts your event loop, and synchronous operations like __setitem__ submit requests to that thread to execute async/concurrent code on demand. E.g. in trio, this would be trio.from_thread.run (“from thread” because you’re entering trio from another thread.) That way, your event loop is happy because it runs continuously (and you can do any relevant background management stuff there, like e.g. keepalives or cluster discovery), and your sync-colored APIs are usable from any context, whether that’s a good idea or not :slight_smile:
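In rough outline it might look like this. (LoopThread and .sync() are made-up names for the sake of the sketch, not Trio APIs; only trio.from_thread.run/run_sync and the token machinery are real.)

import threading

import trio

class LoopThread:
    # Hypothetical helper: hosts one long-lived Trio run in a
    # background thread and lets sync code submit async work to it.
    def __init__(self):
        self._started = threading.Event()
        self._thread = threading.Thread(
            target=trio.run, args=(self._main,), daemon=True
        )
        self._thread.start()
        self._started.wait()

    async def _main(self):
        self._token = trio.lowlevel.current_trio_token()
        self._done = trio.Event()
        self._started.set()
        # Keep the loop alive; background tasks could also live here.
        await self._done.wait()

    def sync(self, async_fn, *args):
        # Called from any non-Trio thread: blocks until async_fn
        # finishes inside the background run, returning its result.
        return trio.from_thread.run(async_fn, *args, trio_token=self._token)

    def stop(self):
        trio.from_thread.run_sync(self._done.set, trio_token=self._token)
        self._thread.join()

Your __setitem__ body then becomes a one-liner along the lines of loop.sync(self._async_setitem, key, value).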

I happened to think of a solution of sorts for this. Trio runs end up being composable if you are willing to start up a new thread for each one. In my specific code the requirement was for a sync generator rather than a sync method on a class, but it has a lot of the flavor of @njs’s post above: starting a single thread and Trio run, reused across many iterations:

The key part is reproduced below:

import queue

import trio

# (amain was not in the original excerpt; this is a minimal
# reconstruction: run func over every item concurrently and push each
# result onto the thread-safe queue as it completes.)
async def amain(result_queue, func, items, args, kwargs):
    async def worker(item):
        result_queue.put(await func(item, *args, **kwargs))

    async with trio.open_nursery() as nursery:
        for item in items:
            nursery.start_soon(worker, item)

def map_concurrently_in_subthread_trio(func, items, args=(), kwargs={}):
    trio_outcome = None
    sentinel = object()
    result_queue = queue.SimpleQueue()

    def trio_main():
        trio.run(amain, result_queue, func, items, args, kwargs)

    def deliver(result):
        # Called by Trio's thread cache with the outcome of trio_main;
        # the sentinel wakes up the consuming loop below.
        nonlocal trio_outcome
        trio_outcome = result
        result_queue.put(sentinel)

    trio.lowlevel.start_thread_soon(trio_main, deliver)
    while (result := result_queue.get()) is not sentinel:
        yield result

    # unwrap() re-raises any exception from the Trio run here, in the
    # calling thread.
    return trio_outcome.unwrap()

It is able to perform arbitrary concurrency, and unwrapping the Trio outcome this way conveniently brings any exceptions back into the main thread. A missing piece is what to do when the generator is garbage-collected or receives an exception via throw(), but that seems tractable too.