When are 'callbacks' acceptable?

From reading Nathaniel’s blog posts, I understand that avoiding callbacks is a central concept of structured concurrency. The Trio design docs back this up:

Task spawning is always explicit. No callbacks, no implicit concurrency, no futures/deferreds/promises/other APIs that involve callbacks.

However, there are APIs in Trio that use what I would call callbacks: functions passed in which will be called when something happens. Specifically, serve_tcp() takes a handler function to be called for each new incoming connection.

I understand that this doesn’t break structured concurrency: the new tasks are either children of the task running serve_tcp, or belong to a nursery explicitly passed in. And the docs have a warning that uncaught errors in handlers will crash the server. But is there a good explanation for why callbacks are a sensible choice in this specific case? Or do other people not consider the handler argument a callback?

For context: I’m playing with some code using Trio, and a contributor likes a design where each incoming message triggers a callback in a newly started task. I’ve got a feeling that this is best avoided, even if it’s technically possible with Trio. But I can’t articulate why exactly it’s different from serve_tcp’s callbacks on new connections.

I’m not familiar with serve_tcp() specifically, but it’s likely similar to the trio-websocket API where you supply a handler for new connections.

Certainly we’re not saying that functions should never be passed into APIs: witness nursery.start_soon(), etc.

The callbacks synonymous with “callback hell”, and which async/await addresses, are specifically the ones where you request an action and supply a function to be called when that action is done or ready. That’s quite different from handlers meant to manage events whose origin is truly asynchronous, and which may need a context that lives indefinitely (e.g. a connection handler).

each incoming message triggers a callback

If the messages need to be consumed sequentially, a more typical pattern is for the API to provide an async generator.

You could use an async generator for new connections or unrelated messages too, but then you have to jump through hoops to ensure that simultaneous connections / messages are processed concurrently.

I guess I was thinking: is there a fundamental difference between “call this when new data arrives” and “call this when a new connection is opened”, which makes the latter OK?

I’m currently thinking that callbacks imply spawning new tasks, so in scenarios where you would start a new task anyway, an API that accepts a callback is a harmless convenience. You’d often want to start a task for each connection even if the API made you write the accept/start loop by hand, so things like serve_tcp are just a shortcut. But you wouldn’t generally start a new task for each message received.

This still isn’t a totally satisfactory answer, though. If you’re used to callback-style APIs, I think it would be easy to reflexively say “yes, we want a new task for each X” - e.g. if handling an incoming message involves async operations, and you want to process more than one at once, it would be easy to reach for the hammer of ‘start a task on each message received’, even if that’s not actually the best design.

@takluyver I’ve only just seen this, and it’s incredible. You’re a genius for asking such an interesting question. :slightly_smiling_face: