Should a non-graceful close be instant or thread/fiber-blocking?

#1

This question has been on my mind for some time, and since I’m implementing a new async API, it has become relevant again.

Let’s say I have a typical stream-based interface:

interface Stream {
    async fn read(buffer: Buffer)
    async fn write(buffer: Buffer)
    [async] fn close()
}

Interfaces like this are generally expected to be thread-safe, or at least to allow one thread to read and one thread to write at a time, while close can be called at any time. close is expected to close the stream and make any outstanding operations fail with an error.
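A minimal sketch of that contract, using Python’s asyncio purely for illustration (InMemoryStream, StreamClosedError and write_nowait are hypothetical names, not any library’s API). A lock admits one reader at a time, and close() only flips a flag, so it returns immediately while an already-pending read wakes up and fails.

```python
import asyncio


class StreamClosedError(Exception):
    """Raised by operations that are pending, or started, after close()."""


class InMemoryStream:
    """Hypothetical stream: one reader at a time, close() always allowed."""

    def __init__(self) -> None:
        self._queue = asyncio.Queue()      # data written but not yet read
        self._read_lock = asyncio.Lock()   # at most one concurrent reader
        self._closed = asyncio.Event()

    async def read(self) -> bytes:
        async with self._read_lock:
            if self._closed.is_set():
                raise StreamClosedError()
            get = asyncio.ensure_future(self._queue.get())
            closed = asyncio.ensure_future(self._closed.wait())
            try:
                # Wake up on whichever happens first: data arrives or close().
                done, _ = await asyncio.wait(
                    {get, closed}, return_when=asyncio.FIRST_COMPLETED
                )
            finally:
                get.cancel()      # no-op if already done
                closed.cancel()
            if get in done:
                return get.result()
            raise StreamClosedError()

    def write_nowait(self, data: bytes) -> None:
        if self._closed.is_set():
            raise StreamClosedError()
        self._queue.put_nowait(data)

    def close(self) -> None:
        # Instant and non-blocking: set the state and wake pending readers.
        self._closed.set()


async def main() -> str:
    stream = InMemoryStream()
    reader = asyncio.create_task(stream.read())
    await asyncio.sleep(0)  # let the reader start waiting
    stream.close()          # returns immediately
    try:
        await reader
    except StreamClosedError:
        return "pending read failed with StreamClosedError"
    return "unexpected"


print(asyncio.run(main()))
```

Note that close() here says nothing about when the failed read has actually finished unwinding; that is exactly the gap the rest of the question is about.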

Now the question is: Should close be blocking and wait until the outstanding asynchronous operations have finished and released their resources? Or should it be more or less fire&forget, accepting that those operations complete later? The question applies not only to graceful closes (where we might, e.g., need to wait until buffers have been flushed) but also to forceful ones (which might still need to wait for some time until all subtasks have finished).

My feeling is that, from an API design perspective, the right thing to do is either:

  • Let close block at least until the other operations have finished and released their resources. In asynchronous environments this might require an async/suspend/etc. modifier or an attached callback.
  • Allow the user to call close() only after they have manually cancelled all other outstanding operations. This is super clean and easy to implement, but doesn’t play as nicely with the common pattern of interrupting a pending operation by closing the underlying resource from the side.
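The first option could look roughly like the following sketch, again in asyncio with hypothetical names (DrainingStream, aclose). It keeps both flavours discussed here: close() is instant and callable from non-async code, while aclose() additionally suspends until every in-flight operation has run its cleanup. The read never receives data; it only stands in for real I/O that gets interrupted by close.

```python
import asyncio


class StreamClosedError(Exception):
    pass


class DrainingStream:
    """Hypothetical stream whose aclose() waits for in-flight operations."""

    def __init__(self) -> None:
        self._closed = asyncio.Event()
        self._in_flight = 0
        self._idle = asyncio.Event()
        self._idle.set()  # nothing in flight initially
        self.log = []

    async def read(self, name: str) -> bytes:
        if self._closed.is_set():
            raise StreamClosedError()
        self._in_flight += 1
        self._idle.clear()
        try:
            # Stand-in for real I/O: this read is only ever ended by close().
            await self._closed.wait()
            raise StreamClosedError()
        finally:
            # Simulated asynchronous cleanup, e.g. returning a pooled buffer.
            await asyncio.sleep(0.01)
            self.log.append(f"{name} released")
            self._in_flight -= 1
            if self._in_flight == 0:
                self._idle.set()

    def close(self) -> None:
        """Instant close: callable from sync code, does not wait."""
        self._closed.set()

    async def aclose(self) -> None:
        """Close, then suspend until every in-flight operation has cleaned up."""
        self.close()
        await self._idle.wait()
        self.log.append("aclose done")


async def main() -> list:
    stream = DrainingStream()
    readers = [asyncio.create_task(stream.read(n)) for n in ("r1", "r2")]
    await asyncio.sleep(0)  # let both reads start
    await stream.aclose()   # returns only after both reads released resources
    results = await asyncio.gather(*readers, return_exceptions=True)
    assert all(isinstance(r, StreamClosedError) for r in results)
    return stream.log


print(asyncio.run(main()))
```

The cost of this design is the bookkeeping (the in-flight counter and idle event) that every operation has to participate in.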

However, the fire&forget implementation also has the appeal that it can always be called, regardless of whether the caller is itself an async function.

What’s this community’s take on this? I surveyed various libraries and mostly found instant close methods; e.g., the ones in Kotlin are instant. Other ecosystems have both an async and an instant method: .NET has Dispose() and DisposeAsync(), node.js has .end() and .destroy(), and Trio apparently has .aclose() as well as a synchronous .close() method on sockets.

The fire&forget implementation seems hard and really messy from a structured concurrency perspective. One thing I wanted to implement is a wrapper around a stream that adds internal buffering. It takes an inner stream and a buffer from a buffer pool. On close, the inner stream needs to be closed and the buffer needs to be released.

With the fire&forget implementation, releasing the buffer is really hard: it might still be in use by a pending async operation, so it can’t be released until we know for sure that the operation has finished. My workaround was that each async operation steals the buffer using atomic ops, performs its work, and when done (or cancelled) checks whether the inner stream has been closed in the meantime and, if so, lazily releases the buffer. This of course makes things less deterministic.
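The buffer-stealing workaround can be sketched like this (BufferPool and BufferedReader are hypothetical; the inner stream is omitted). Since asyncio runs on a single thread, a plain swap with no await in between stands in for the atomic exchange a multi-threaded runtime would need. close() cannot release the buffer while a read owns it, so the read releases it in its finally block, which also runs on cancellation.

```python
import asyncio


class StreamClosedError(Exception):
    pass


class BufferPool:
    """Hypothetical pool that only counts checked-out buffers."""

    def __init__(self) -> None:
        self.outstanding = 0

    def acquire(self) -> bytearray:
        self.outstanding += 1
        return bytearray(4096)

    def release(self, buf: bytearray) -> None:
        self.outstanding -= 1


class BufferedReader:
    """Hypothetical buffered wrapper; the inner stream is omitted."""

    def __init__(self, pool: BufferPool) -> None:
        self._pool = pool
        self._buffer = pool.acquire()  # None while an operation owns it
        self._closed = False

    async def read(self) -> int:
        if self._closed:
            raise StreamClosedError()
        # Steal the buffer. There is no await between reading and writing
        # _buffer, so on asyncio's single thread this swap cannot interleave.
        buf, self._buffer = self._buffer, None
        if buf is None:
            raise RuntimeError("another read is already in progress")
        try:
            await asyncio.sleep(0.01)  # stand-in for filling buf from the inner stream
            if self._closed:
                raise StreamClosedError()
            return len(buf)
        finally:
            # Runs on success, error and cancellation alike.
            if self._closed:
                self._pool.release(buf)  # closed meanwhile: release lazily
            else:
                self._buffer = buf  # hand the buffer back for the next read

    def close(self) -> None:
        """Instant close: release the buffer only if no read currently owns it."""
        self._closed = True
        buf, self._buffer = self._buffer, None
        if buf is not None:
            self._pool.release(buf)


async def main() -> tuple:
    pool = BufferPool()
    reader = BufferedReader(pool)
    task = asyncio.create_task(reader.read())
    await asyncio.sleep(0)  # the read now owns the buffer
    reader.close()          # returns at once; the buffer is still in use
    after_close = pool.outstanding
    try:
        await task
    except StreamClosedError:
        pass
    return after_close, pool.outstanding


print(asyncio.run(main()))  # (1, 0)
```

The first number shows the nondeterminism mentioned above: right after close() returns, the buffer is still checked out, and it only goes back to the pool once the pending read has unwound.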

@elizarov I would also be interested in your opinion on this, since you probably know why the APIs in KotlinX went for a non-suspending close.

#2

The reason for a non-suspending close is to support cancellation. Conceptually, a “cancel” must be a fire-and-forget operation: it must be prompt, ideally lock-free (or at least non-blocking). I don’t see how it complicates structured concurrency; on the contrary. In Kotlin the participants in structured concurrency are coroutines (their Jobs): a parent coroutine always waits for its children to complete before completing itself. We don’t have to involve channels in this at all, since channels are only communication pipes between coroutines. We can always wait for the corresponding coroutine when there is a need to.

You write: “Should close be blocking and wait until the outstanding asynchronous operations have finished and released their resources?” But how does a channel know when “an outstanding operation completes”? The fact that somebody received from the channel does not mean anything; it does not mean the operation has completed. Only by waiting on the corresponding coroutine can you ensure that its job is complete.
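The same split between requesting cancellation and waiting for completion exists in Python’s asyncio, used here only as a stand-in for kotlinx.coroutines (Task.cancel() plays roughly the role of Job.cancel(), and awaiting the task the role of join()). The sketch shows that cancel() returns before the worker’s cleanup has run, and that only awaiting the task tells you the work, including its cleanup, is done. The worker function is hypothetical.

```python
import asyncio


async def worker(log: list) -> None:
    try:
        await asyncio.sleep(3600)  # pending operation, e.g. a receive
    finally:
        await asyncio.sleep(0.01)  # asynchronous cleanup takes time
        log.append("worker cleaned up")


async def main() -> list:
    log = []
    task = asyncio.create_task(worker(log))
    await asyncio.sleep(0)     # let the worker start
    requested = task.cancel()  # returns at once; it is only a request
    log.append(f"cancel requested={requested}, done={task.done()}")
    try:
        await task             # the only way to know the worker has finished
    except asyncio.CancelledError:
        pass
    log.append(f"after await, done={task.done()}")
    return log


print(asyncio.run(main()))
```

Cancellation stays prompt and non-blocking, while anyone who needs the stronger guarantee waits on the task itself rather than on the channel or stream.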