Graceful Shutdown

Sorry to jump in, but I had some thoughts.

If B is not GS aware, what is B’s expected behavior when C exits (without an exception)?

  • B doesn’t know C exited. This is fine since HC will clean up B anyway.
  • B exits as soon as all of its children exit. B eventually exits which is what we wanted.
  • B detects that C exited and decides to spawn C again. C gets no work because hopefully we’ve stopped accepting new work in B’s ancestors. Both B and C get HC and exit.
  • B detects C exited and treats it like an error state, raising an exception or exiting with an error. This doesn’t seem like a resilient program unless that error is handled before it brings down the entire process. Hopefully the program author sees this crash in practice and adds appropriate handling in an ancestor during GS? I guess this is more of an escalation from GS to HC while the stack unwinds up to some parent, potentially hurting GS across the tree.

The last case is problematic for GS. It sucks that a single leaf in the tree could cause HC behavior across many other nodes, but that’s the point of structured concurrency - at least they exit and have an opportunity to clean up some resources.

Honestly the last case feels like a bug in B since the absence of an error from C on exit probably shouldn’t propagate out as an error - the author of B doesn’t really understand the contract with C.

If GS is an application construct, it has to be passable to every layer of your program. Every library needs to implement support for it, and they all need to follow some standard conventions (or you need to add extra layers to convert between them). If D does not follow that convention, E will never get the signal and will get HC.

I like this suggestion:

I would expect every checkpoint within the graceful cancel scope to act as though the scope was cancelled. GracefulCancelScopes created after the parent cancel scope has entered graceful cancel will immediately raise a cancellation on the first checkpoint.

It would be nice if every cancel scope could have a graceful_cancel() method that would propagate like regular cancel() but only be raised within GracefulCancelScopes. That way you can control what parts of your program are gracefully shutting down.

Or maybe every cancel scope has a graceful property that can be set to True. If you are in a scope where graceful = True (which is also inherited from its parent scope), any checkpoint you hit will raise a cancel.

async def producer(queue):
    try:
        while True:
            with trio.GracefulCancelScope() as cancel_scope:
                # or set cancel_scope.graceful = True
                message = await conn.recv()
            if cancel_scope.cancelled_caught:
                break
            await queue.put(message)  # don't want to cancel this on GS
    finally:
        # this will succeed on GS but will raise on HC
        await queue.put(STOP_SENTINEL)
        await conn.send_close()  

async def consumer(queue):
    while True:
        item = await queue.get()
        if item is STOP_SENTINEL:
            await queue.put(STOP_SENTINEL)  # make sure the other consumers get the message
            break
        # do stuff with item 

async def server(queue):
    # A case where you want to make sure the consumers drain the queue on GS
    async with trio.open_nursery() as nursery:
        nursery.start_soon(producer, queue)
        for i in range(5):
            nursery.start_soon(consumer, queue)
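
The sentinel handoff between consumers can be tried out without the proposed API. This is a minimal sketch that swaps in the standard library's asyncio.Queue for Trio, purely so it runs as is; the `messages` and `seen` parameters and the `run_demo` and `demo` helpers are made up for illustration. It shows one sentinel from the producer being passed along until every consumer has stopped, with all queued items processed first.

```python
import asyncio

STOP_SENTINEL = object()


async def producer(queue, messages):
    try:
        for message in messages:
            await queue.put(message)
    finally:
        # Announce shutdown exactly once; consumers forward it to each other.
        await queue.put(STOP_SENTINEL)


async def consumer(queue, seen):
    while True:
        item = await queue.get()
        if item is STOP_SENTINEL:
            # Put the sentinel back so the next consumer also sees it.
            await queue.put(STOP_SENTINEL)
            break
        seen.append(item)  # stand-in for "do stuff with item"


async def run_demo(n_consumers=5, n_messages=20):
    queue = asyncio.Queue()
    seen = []
    await asyncio.gather(
        producer(queue, range(n_messages)),
        *(consumer(queue, seen) for _ in range(n_consumers)),
    )
    return seen


def demo():
    # Sorted, because which consumer handles which item is not the point here.
    return sorted(asyncio.run(run_demo()))


if __name__ == "__main__":
    print(demo())
```

The sketch covers only the draining half of the example; whether the producer's final put survives is what the graceful bit on the enclosing scope decides.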

You should also be able to set graceful = False to disable this behavior. It is a little like shield except that you can opt in again later.

async def first():
    with trio.GracefulCancelScope():
        # any checkpoint in second will cause a cancellation on graceful_cancel()
        await second()

async def second():
    with trio.CancelScope() as cancel_scope:
        # if this does not shield from the graceful scope above
        # then any checkpoint in third() will cause a cancellation on graceful_cancel()
        # use cancel_scope.graceful = False to disable
        await third()

async def third():
    # since this is graceful cancel aware,
    # it should be explicit about blocking graceful cancel
    with trio.CancelScope(graceful=False):
        with trio.GracefulCancelScope():
            await trio.sleep(10)  # simulate a good graceful stopping point
        await trio.sleep(100)  # simulate work you don't want to gracefully stop

In the interest of keeping things simple, I think new cancel scopes should inherit the graceful bit. To handle graceful cancel intentionally, set graceful = False at the entry point of any code that supports graceful cancel. From above, server becomes:

async def server(queue):
    # A case where you want to make sure the consumers drain the queue on GS
    async with trio.open_nursery() as nursery:
        nursery.cancel_scope.graceful = False  # protect children since this is GS aware
        nursery.start_soon(producer, queue)
        for i in range(5):
            nursery.start_soon(consumer, queue)

# caller of server
async def worker():
    with trio.GracefulCancelScope():
        # if the nursery cancel scope within server does not disable the graceful bit
        # the consumers would just die instead of gracefully exiting
        await server(queue)  

async def main():
    async with trio.open_nursery() as nursery:
        nursery.start_soon(worker)
        await trio.sleep(10)
        nursery.graceful_cancel(10)

This keeps things simple for the runtime and libraries. If your code has no concept of graceful cancellation, the cancellation scope above you will determine if you are cancelled on graceful cancellation or not.

# if conn.recv() makes a new cancel scope, it must inherit the graceful bit
# otherwise any caller can't opt in to graceful cancellation when calling this coroutine
async def recv(self):
    with trio.move_on_after(TIMEOUT):
        # read from somewhere
        ...

In this way, the graceful bit is different from shield in that a graceful cancel scope should be cancelled if any scope above it calls graceful_cancel, even if there is a graceful = False scope in between.
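
To check that these rules hang together, here is a toy model in plain Python. It is not Trio API: the `Scope` class, its method names, and the choice that a root scope with no setting counts as graceful = False are all assumptions made for the sketch. A checkpoint cancels when the nearest explicit graceful setting is True and some scope at or above it has called graceful_cancel().

```python
class Scope:
    """Toy stand-in for a cancel scope; not Trio's CancelScope."""

    def __init__(self, parent=None, graceful=None):
        self.parent = parent
        self.graceful = graceful  # None means "inherit from the parent"
        self.graceful_cancel_called = False

    def graceful_cancel(self):
        self.graceful_cancel_called = True

    def effective_graceful(self):
        # The nearest explicit setting wins; assume False if nobody set it.
        scope = self
        while scope is not None:
            if scope.graceful is not None:
                return scope.graceful
            scope = scope.parent
        return False

    def requested_above(self):
        # Unlike shield, a graceful=False scope in between does not stop this walk.
        scope = self
        while scope is not None:
            if scope.graceful_cancel_called:
                return True
            scope = scope.parent
        return False

    def checkpoint_cancels(self):
        return self.effective_graceful() and self.requested_above()


def demo():
    # Mirrors the first() / second() / third() nesting from earlier.
    root = Scope()                                   # nursery in main
    first = Scope(root, graceful=True)               # GracefulCancelScope
    second = Scope(first)                            # plain scope, inherits
    third_outer = Scope(second, graceful=False)      # blocks graceful cancel
    third_inner = Scope(third_outer, graceful=True)  # opts back in
    root.graceful_cancel()
    scopes = {
        "root": root,
        "first": first,
        "second": second,
        "third_outer": third_outer,
        "third_inner": third_inner,
    }
    return {name: scope.checkpoint_cancels() for name, scope in scopes.items()}


if __name__ == "__main__":
    print(demo())
```

In this model, third_inner is cancelled even though third_outer sits between it and the scope that called graceful_cancel(), while a checkpoint directly inside third_outer is left alone.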