Hi! I have a question which, while not strictly about structured concurrency, feels like it should fit this forum.
Context: I am working on an “incremental compiler”, rust-analyzer, most of which is utterly boring, concurrency-wise. There are several bits that feel like they should be fizz-buzz-level concurrency, though:
- The interface is JSON-RPC with a single client.
- The compiler is CPU-bound, so there’s a threadpool for doing parallel processing of requests.
- We watch the file-system for changes.
- Finally, sometimes we launch a `cargo` process and process its stdout in a streaming fashion.
When implementing those things, I immediately hit a problem which felt disproportionately tricky to solve. And it seems like even the folks whose job is to design concurrency struggle with this problem (@elizarov)!
I organize concurrency using the “worker” pattern: simplistic, hand-coded actors. Each worker, like the virtual file system, runs a loop where it accepts requests from an inbox and sends replies back:
```rust
fn run(self, inbox: Receiver<Request>, sender: Sender<Response>) {
    // the loop ends once every sender to the inbox has been dropped
    while let Ok(message) = inbox.recv() {
        let response = self.process(message);
        sender.send(response);
    }
}
```
Workers form a hierarchy (only two levels deep); when the parent exits, it makes sure to wait for the child worker (yay, structured concurrency!).
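For illustration, here is a minimal sketch of what owning such a worker looks like, assuming crossbeam-channel, std threads, and the `Worker` type with the `run` method from above (`WorkerHandle` and `spawn_worker` are made-up names, not anything rust-analyzer-specific):

```rust
use crossbeam_channel::{unbounded, Receiver, Sender};
use std::thread;

struct WorkerHandle {
    sender: Sender<Request>,
    receiver: Receiver<Response>,
    thread: thread::JoinHandle<()>,
}

fn spawn_worker(worker: Worker) -> WorkerHandle {
    // unbounded() for illustration only; how to size these channels is
    // exactly the question this post is about
    let (req_sender, req_receiver) = unbounded();
    let (resp_sender, resp_receiver) = unbounded();
    let thread = thread::spawn(move || worker.run(req_receiver, resp_sender));
    WorkerHandle { sender: req_sender, receiver: resp_receiver, thread }
}

impl WorkerHandle {
    fn shutdown(self) {
        // dropping the sender closes the child's inbox, ending its loop...
        drop(self.sender);
        // ...and the parent blocks until the child has actually exited
        self.thread.join().unwrap();
    }
}
```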
And here is the problem: the parent and the worker are connected with a pair of channels: requests flow in, responses flow out. How should I size the channels to avoid deadlocks?
I thought that a good rule of thumb is to make channels synchronous (unbuffered), but this immediately leads to a deadlock:
```rust
fn parent_run(inbox: Receiver<ParentRequest>) {
    let (child_sender, child_receiver) = spawn_child();
    loop {
        let event = select! {
            recv(inbox) -> it => it.ok().map(Event::ParentRequest),
            recv(child_receiver) -> it => it.ok().map(Event::ChildResponse),
        };
        match event {
            Some(Event::ParentRequest(request)) => {
                let child_request = ...;
                child_sender.send(child_request); // blocked on child
            }
            Some(Event::ChildResponse(response)) => {
                ...
            }
            None => break, // the other side hung up
        }
    }
}

fn child_run(inbox: Receiver<ChildRequest>, sender: Sender<ChildResponse>) {
    while let Ok(message) = inbox.recv() {
        let response = process(message);
        sender.send(response); // blocked on parent
    }
}
```
The communication structure contains a cycle, and if both parties do `.send` without `select`, they deadlock each other.
The pragmatic solution I am using is to just make all the channels unbounded (note that a fixed non-zero capacity is the worst option here, as you get deadlocked only sometimes; you might guess how I’ve figured that out). I think this is typical for actor systems? But this obviously removes automated backpressure handling and makes latency worse. I don’t care about those things in rust-analyzer (yet?), but I am curious whether there are better solutions here. And, also, if actors use unbounded mailboxes, how do they implement backpressure?
Here are some proper (but easy to get wrong) solutions that I think would work.
First, when the workers form a strict hierarchy, it should be possible to make forward-edge channels synchronous and keep only back-edges unbounded. If you add a biased select to the mix (i.e., the parent’s select should always prefer child responses over grandparent requests), you’d even have backpressure, provided that each request produces a bounded number of responses. The problem here is that you need to decide, for each channel, between a buffer of 0 and ∞, and that seems like an easy off-by-one error.
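For concreteness, here is a sketch of what such a biased select could look like on top of crossbeam-channel (whose `select!` picks among ready operations at random, with no built-in bias): poll the preferred channel with `try_recv` first, and only then fall back to a blocking `select!`. The `next_event` helper is made up; `Event` and the channel types are the ones from the example above.

```rust
use crossbeam_channel::{select, Receiver};

fn next_event(
    inbox: &Receiver<ParentRequest>,
    child_receiver: &Receiver<ChildResponse>,
) -> Option<Event> {
    // Bias: drain the child's responses (the back-edge) before accepting
    // new requests from upstream (the forward edge).
    match child_receiver.try_recv() {
        Ok(response) => return Some(Event::ChildResponse(response)),
        Err(_) => {} // nothing buffered (or child gone): fall through
    }
    // Nothing ready from the child: block on either side.
    select! {
        recv(inbox) -> it => it.ok().map(Event::ParentRequest),
        recv(child_receiver) -> it => it.ok().map(Event::ChildResponse),
    }
}
```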
Another problem is that the number of responses is not necessarily bounded, so you might still have a backpressure problem. A good example here is the VFS actor in rust-analyzer. The request to the VFS is “please, now watch this set of globs”. The responses are “this file path has changed”. If we bias towards processing responses, we might end up in a situation where the child VFS worker floods us with “this path changed” events, while the “please, now watch the empty set of globs already” request sits in the pipeline.
Second, I think you can break the cycle if there’s a `select!` which covers both directions. I.e., if at least one of parent or child doesn’t do a bare `.send`, and always `select!`s between send and recv, then the deadlock can be avoided. The first problem with this solution is that it just puts the buffer elsewhere: if you are a worker which cannot send a response, you have to accumulate requests in some local variable until the parent reads your response. The second problem is that I prefer giving workers opaque callbacks, instead of channels, for sending responses. That way I don’t depend on a particular channel library, and, even if the worker uses channels internally, that stays an implementation detail. But the difference between a callback and a channel is exactly that a callback is not selectable: it can only block, it can’t signal readiness.
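To make the “buffer moves elsewhere” trade-off concrete, here is a sketch of the child side of this approach, assuming crossbeam-channel, the types from the example above, and `ChildResponse: Clone`: the child never does a bare send; unsent responses pile up in a local `VecDeque` instead.

```rust
use crossbeam_channel::{select, Receiver, Sender};
use std::collections::VecDeque;

fn child_run(inbox: Receiver<ChildRequest>, sender: Sender<ChildResponse>) {
    // the buffer hasn't disappeared, it has just moved into the worker
    let mut pending: VecDeque<ChildResponse> = VecDeque::new();
    loop {
        if pending.is_empty() {
            // nothing to flush: just block on the next request
            match inbox.recv() {
                Ok(message) => pending.push_back(process(message)),
                Err(_) => break, // inbox closed
            }
        } else {
            let next = pending.front().unwrap().clone();
            select! {
                recv(inbox) -> it => match it {
                    Ok(message) => pending.push_back(process(message)),
                    Err(_) => break, // inbox closed
                },
                send(sender, next) -> res => {
                    if res.is_err() { break; } // parent hung up
                    pending.pop_front();
                }
            }
        }
    }
    // best-effort drain of whatever is still buffered
    for response in pending {
        if sender.send(response).is_err() {
            break;
        }
    }
}
```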
In general, it seems like I want a weaker programming model: instead of a `select!` which works for both sends and receives, I want only a `merge` operation, which works only for receives. I wonder if putting senders into the `select!` block breaks the unidirectional dataflow pattern…
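For reference, a receive-only `merge` can be sketched on top of crossbeam-channel like this (the `merge` name is mine): each input is forwarded by a dedicated thread through a zero-capacity channel, so the merged receiver still exerts backpressure on both producers. Merging differently typed channels (requests and responses, say) would require mapping both into a common `Event` type first.

```rust
use crossbeam_channel::{bounded, Receiver};
use std::thread;

fn merge<T: Send + 'static>(a: Receiver<T>, b: Receiver<T>) -> Receiver<T> {
    // capacity 0: a forwarder blocks until the consumer actually receives,
    // so backpressure propagates to both inputs
    let (sender, receiver) = bounded(0);
    for input in [a, b] {
        let sender = sender.clone();
        thread::spawn(move || {
            for message in input {
                if sender.send(message).is_err() {
                    break; // consumer dropped the merged receiver
                }
            }
        });
    }
    receiver
}
```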
To sum this overly long post up, here is the list of specific questions I am interested in:
- If actor systems use unbounded channels, how do they handle backpressure?
- Is there a way of avoiding deadlocks in a hierarchy of workers which is as simple as “always use unbounded channels”, but exerts backpressure and otherwise places limits on resource usage?
- Is this an XY problem? Should I use something else instead of focusing on actor-ish workers?