Structured concurrency in Rust

Java’s Thread.interrupt is an interesting case study, yeah.

At a high level, there are only two states that a cancellation API ecosystem can be in: either almost everyone writes cancel-safe code, or almost no one writes cancel-safe code. This is because cancel-safety is an AND property – a complex routine can only be cancel-safe if all the subroutines it uses are also cancel-safe. So if any measurable fraction of the libraries in your ecosystem aren’t cancel-safe, then in complex programs you have to assume that cancellation just isn’t safe to use at all. And that means that there’s no motivation to worry about making your own code cancel-safe either. So none of your users can use the cancel system either… and the spiral continues.
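To put a rough number on the AND property, here's a back-of-the-envelope sketch. It assumes, purely for the arithmetic, that a routine depends on n subroutines and that each one independently fails to be cancel-safe with probability p. Both the independence assumption and the example numbers are illustrative, not measurements of any real ecosystem.

```latex
P(\text{routine is cancel-safe}) = (1 - p)^{n},
\qquad\text{e.g. } p = 0.05,\ n = 50:\quad 0.95^{50} \approx 0.077
```

So under these assumptions, even if 95% of subroutines get it right, a routine with fifty dependencies has well under a one-in-ten chance of being cancel-safe.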

To avoid this, it’s not enough for any specific person to try to write cancel-safe code; you need to design things so that for 99% of devs, the perceived benefits of writing cancel-safe code outweigh the perceived costs.

And to make things worse, writing cancel-safe code is just intrinsically, unavoidably harder than writing non-cancel-safe code. It’s something that you have to be thinking about in the back of your mind, all the time. You can’t avoid this with clever API design; it’s just inherent in the idea. For example, think of some code that takes a lock, manipulates the locked data, and then releases the lock. If it gets cancelled in the middle, while the data is in an inconsistent state, what happens? If the programmer doesn’t make sure to unwind the inconsistent state correctly, then everything will be screwed up.

(Side note: This is also why Rust mutexes are poisoned on panic. I don’t think Rust’s mutexes are currently poisoned if a Future is dropped while holding a lock. That seems like a dangerous choice to me, given that Future cancellation currently has the same control flow as panic.)
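Here's a minimal sketch of that difference, using only the standard library. It only covers `std::sync::Mutex`; async-aware mutexes from runtime crates may behave differently. `YieldOnce`, `NoopWake`, and the two functions are hypothetical helpers made up for this example, and the future is polled by hand instead of by a real executor.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

/// Hypothetical helper: a future that returns Pending once, then Ready.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

/// Hypothetical helper: a waker that does nothing, since we poll by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// A panic while the guard is held poisons the mutex. Returns is_poisoned().
fn panic_while_locked(pair: &Arc<Mutex<(u32, u32)>>) -> bool {
    let p = Arc::clone(pair);
    let result = thread::spawn(move || {
        let mut guard = p.lock().unwrap();
        guard.0 += 1; // first half of the update
        panic!("simulated failure mid-update");
    })
    .join();
    assert!(result.is_err());
    pair.is_poisoned()
}

/// Dropping a half-finished future releases the lock through the normal
/// drop path, so the mutex is not poisoned. Returns is_poisoned().
fn drop_future_while_locked(pair: &Mutex<(u32, u32)>) -> bool {
    let mut fut = Box::pin(async {
        let mut guard = pair.lock().unwrap();
        guard.0 += 1; // first half of the update
        YieldOnce(false).await; // suspension point, guard still held
        guard.1 += 1; // second half, never reached in this demo
    });
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    assert!(fut.as_mut().poll(&mut cx).is_pending());
    drop(fut); // "cancel" the future in the middle of the update
    pair.is_poisoned()
}

fn main() {
    let a = Arc::new(Mutex::new((0, 0)));
    assert!(panic_while_locked(&a)); // poisoned after the panic

    let b = Mutex::new((0, 0));
    assert!(!drop_future_while_locked(&b)); // not poisoned after the drop
    assert_eq!(*b.lock().unwrap(), (1, 0)); // but the data is half-updated
}
```

In the second case the next `lock()` succeeds and hands back the half-updated pair with no warning, which is exactly the inconsistent-state hazard described above.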

So if you’re trying to design a cancellation system, this is a pretty dire problem. Obviously part of the solution is to make it as painless as possible to write cancel-safe code – but we can’t expect to eliminate the pain entirely. There are always going to be perceived costs. And we can try to force people to support cancellation, basically increasing the pain of not handling it right, but this doesn’t go very far. Our only hope is to somehow provide one heck of a carrot, so that everyone can feel the value they’re getting from the cancellation system – not just us concurrency nerds.

This is where Thread.interrupt falls down. Cancelling a thread is a fairly rare requirement. It’s a real requirement; when you need it, you really need it! But there are also a lot of people who have never needed it, or who just aren’t experienced enough yet to realize why it’s important. And since they don’t feel the problem themselves, they aren’t motivated to understand the solution or spend energy handling it correctly. Yet, Java forces them to do something, because InterruptedException is a checked exception. So, you get the half-assed code you mentioned that just catches and discards it, and then this starts the spiral I talked about above, where if interruption doesn’t work reliably, then why should anyone else bother supporting it either…

I think there is one carrot that might work: timeouts. Everyone who does any kind of networking has to handle timeouts. They’re ubiquitous, and in most APIs trying to manage them correctly is a constant source of frustration. So if you can tell people hey, our cancellation system makes timeout handling easy and frictionless, then that’s really compelling. But, Java missed this opportunity: Thread.interrupt is totally useless for timeouts! Even APIs like HttpClient that do support Thread.interrupt have a totally different method for managing timeouts.

Cancel tokens have the opposite problem: they provide a clean solution to the timeout problem, so they have a lot of perceived value, but there’s too much perceived drudgery in passing them around everywhere and integrating them into different blocking constructs, so programmers rebel. And then you end up with situations like C# or Go, where they have this lovely cancel token system but even their own socket libraries don’t support it, and here comes our spiral. Or in table form:

| Design approach | Perceived drudgery | Perceived value |
|---|---|---|
| Thread.interrupt | Low | Low |
| Cancel tokens | High | High |
| Cancel scopes | Low | High |
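To make the cancel-token drudgery concrete, here's a rough sketch in plain std Rust. `CancelToken`, `Error`, `recv_cancellable`, and `read_pair` are all hypothetical names invented for this example, not any real library's API. The thing to notice is that the blocking call has to be wrapped by hand, and every layer above it has to accept the token and pass it down.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::sync::Arc;
use std::time::{Duration, Instant};

/// Hypothetical cancel token: a shared flag plus an optional deadline.
#[derive(Clone)]
struct CancelToken {
    cancelled: Arc<AtomicBool>,
    deadline: Option<Instant>,
}

#[derive(Debug, PartialEq)]
enum Error {
    Cancelled,
    Disconnected,
}

impl CancelToken {
    fn with_timeout(timeout: Duration) -> Self {
        CancelToken {
            cancelled: Arc::new(AtomicBool::new(false)),
            deadline: Some(Instant::now() + timeout),
        }
    }

    fn cancel(&self) {
        self.cancelled.store(true, Ordering::SeqCst);
    }

    /// True if cancel() was called or the deadline has passed.
    fn is_cancelled(&self) -> bool {
        self.cancelled.load(Ordering::SeqCst)
            || self.deadline.map_or(false, |d| Instant::now() >= d)
    }
}

/// Each blocking construct has to be adapted by hand to honor the token.
fn recv_cancellable<T>(rx: &Receiver<T>, token: &CancelToken) -> Result<T, Error> {
    loop {
        if token.is_cancelled() {
            return Err(Error::Cancelled);
        }
        // Wake up periodically so a cancel() or an expired deadline is noticed.
        match rx.recv_timeout(Duration::from_millis(10)) {
            Ok(value) => return Ok(value),
            Err(RecvTimeoutError::Timeout) => continue,
            Err(RecvTimeoutError::Disconnected) => return Err(Error::Disconnected),
        }
    }
}

/// ...and every layer above has to take the token and thread it through.
fn read_pair(rx: &Receiver<u32>, token: &CancelToken) -> Result<(u32, u32), Error> {
    let a = recv_cancellable(rx, token)?;
    let b = recv_cancellable(rx, token)?;
    Ok((a, b))
}

fn main() {
    let (tx, rx) = mpsc::channel();
    tx.send(1).unwrap();

    // Only one value is ever sent, so the second receive hits the deadline.
    let token = CancelToken::with_timeout(Duration::from_millis(50));
    assert_eq!(read_pair(&rx, &token), Err(Error::Cancelled));

    // Manual cancellation goes through the same path.
    let manual = CancelToken::with_timeout(Duration::from_secs(60));
    manual.cancel();
    assert_eq!(recv_cancellable(&rx, &manual), Err(Error::Cancelled));
}
```

The timeout handling itself is pleasant (one deadline covers the whole `read_pair` call), which is the high perceived value. The cost is that any function in the call chain that forgets the `token` parameter silently opts out of cancellation.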