Zio (Scala library)

Thanks for the link! I was vaguely aware of Zio but had not found the right reference to figure out how the big ideas fit together. Impressions from watching the video:

Well… there’s a tremendous amount of machinery used for static typing and the pure functional monad part (you don’t write a program, you write a program that returns a program, etc.). Of course this way of approaching things has benefits; I’m just not sure I followed all of it :-).

But if you squint past that, I think the core concurrency primitives are:

  • A very traditional go-like operator (they call it fork) that returns a “fiber handle”, and then you can use this handle object to join or interrupt the “fiber”. Conceptually I think the only way this really differs from a JS Promise is that it supports cancellation, and it’s pretty much isomorphic to a Twisted Deferred or asyncio Future (which do support cancellation), or even to a POSIX thread if we pretend POSIX thread cancellation is usable.

    • But they do go beyond these systems by thinking through cancellation in a more comprehensive way – operations can register handlers for how they handle a cancellation, you can shield specific bits of code from cancellation, etc. I saw a lot of parallels between the practical details of their cancellation implementation and the practical details of Trio’s cancellation implementation, which reassures me that we’re both on the right track :-). Their public API is a bit inside-out from how Trio does it, but the end effect is very similar. (They tie cancellation to fibers, and then make putting an arbitrary computation into a fiber something that’s cheap and can be done retroactively; Trio makes cancellation of arbitrary computations a first-class concept, and then makes the fiber mechanism aware of it.)
  • And there are a bunch of nice higher-level tools built on top of the bare “fiber” concept that carefully manage the fiber handles and interrupt things appropriately. For example, race is implemented using the public fiber APIs, and manually keeps track of which fibers are running, handles errors from them, cancels the loser, etc. Having done all that, they end up with something whose semantics looked pretty safe and sensible to me. The talk makes a big deal about most other implementations of race not getting these details right, which sounds fair.

  • And then you have supervise, which lets you manually demarcate a chunk of code, and when you get to the end of that code, if there were any fibers that it created but then leaked, those fibers are automatically cancelled.
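To make the fork/join/interrupt vocabulary concrete, here is a rough sketch that uses asyncio Tasks as the “fiber handle”, since those are one of the analogues mentioned above. This is not ZIO’s implementation: `slow`, `race`, `demo_interrupt` and `demo_race` are names I made up for illustration, and this race is simplified to “first to finish wins, whether it succeeded or failed”.

```python
import asyncio


async def slow(value, delay):
    """Stand-in workload: sleep for `delay` seconds, then return `value`."""
    await asyncio.sleep(delay)
    return value


async def race(*coros):
    """Run the coroutines concurrently, return the first result, cancel the rest."""
    # "fork": each Task is a handle that can be awaited (join) or cancelled (interrupt).
    tasks = [asyncio.ensure_future(c) for c in coros]
    try:
        done, _pending = await asyncio.wait(
            tasks, return_when=asyncio.FIRST_COMPLETED
        )
        # .result() re-raises the exception if the first finisher failed.
        return next(iter(done)).result()
    finally:
        # Interrupt the losers and wait until they have really stopped,
        # so that nothing outlives the call to race().
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)


async def demo_interrupt():
    fiber = asyncio.ensure_future(slow("never", 10))  # fork
    await asyncio.sleep(0)  # give the child a chance to start
    fiber.cancel()  # interrupt
    try:
        await fiber  # join
        return "finished"
    except asyncio.CancelledError:
        return "interrupted"


async def demo_race():
    return await race(slow("fast", 0.01), slow("slow", 5))
```

The bookkeeping in the `finally` block is the part that is easy to get wrong by hand, and it is exactly the kind of detail the talk says most race implementations miss.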

So… this is a really interesting example to me. They seem to be worrying about the right things, and I get the feeling that you can indeed use this system to effectively solve lots of real problems. But… the one thing they’re missing is exactly the stuff from the “go statement considered harmful” article. And if they had that, I think it would make their design substantially simpler and better behaved at the same time.

There are two key differences between this approach and how Trio does it:

  • We reify an object to represent each call to supervise
  • Then we make this reified object a mandatory argument to fork

This seems like a pretty small change, but it does a lot!

It eliminates fiber leaks by making them inexpressible. Users can’t forget to call supervise.

It’s great that they were very clever and careful about implementing higher-level operations like race, but with my version, you don’t have to be clever and careful, because the type system forces you to handle the other fibers.

Since you can’t leak fibers, you get better local reasoning about code – if I invoke some kind of subroutine, and I don’t pass in a supervise object, then the type system guarantees that any fibers that the call spawns internally cannot outlive the call. I’ve previously called this property “respecting causality”, and it makes many common errors inexpressible. With their current design, you could get this by wrapping every subroutine invocation in a supervise, but who wants to do that?

And in terms of design complexity, you get all these benefits essentially “for free” – it’s just a very simple tweak to stuff they needed anyway, no new concepts, no rocket science. It’s so simple that in this respect, Trio actually manages to get stronger guarantees out of Python’s type system than ZIO gets out of Scala’s.
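Here is a minimal sketch of those two changes, written against plain asyncio so it runs without extra dependencies. `Scope` is a hypothetical stand-in for a reified supervise block (what Trio calls a nursery); it is not Trio’s implementation, it skips error propagation between siblings and outer cancellation, and the guarantee only holds under the assumption that code starts children exclusively through `Scope.fork`.

```python
import asyncio


class Scope:
    """Hypothetical reified "supervise" block; children cannot outlive it."""

    def __init__(self):
        self._tasks = []
        self._open = False

    async def __aenter__(self):
        self._open = True
        return self

    def fork(self, coro):
        # The only way to start a child is through a Scope the caller holds.
        # No handle is returned: there is nothing to join or leak.
        if not self._open:
            raise RuntimeError("scope is closed")
        self._tasks.append(asyncio.ensure_future(coro))

    async def __aexit__(self, exc_type, exc, tb):
        if exc_type is not None:
            # The body failed, so stop the children instead of waiting on them.
            for task in self._tasks:
                task.cancel()
        # Wait for every child, including ones forked while we were waiting.
        while True:
            pending = [t for t in self._tasks if not t.done()]
            if not pending:
                break
            await asyncio.wait(pending)
        self._open = False


async def worker(name, delay, log):
    await asyncio.sleep(delay)
    log.append(name)  # results come out through side effects, not a join


async def subroutine(log):
    # No Scope parameter: anything started here must finish before we return.
    async with Scope() as scope:
        scope.fork(worker("a", 0.02, log))
        scope.fork(worker("b", 0.01, log))


async def demo_scope():
    log = []
    await subroutine(log)
    # Both children are done by the time subroutine() has returned.
    return log
```

A subroutine that should be allowed to start longer-lived children has to say so in its signature, by accepting a `Scope` from its caller.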

When people start learning Trio, this is a very common question: “how do I get a reference to the enclosing nursery?” They expect there to be some equivalent to ZIO’s fork that lets them put a task into the enclosing supervise, wherever it may be. So this is why we don’t have that operation :-)

On the other hand, an interesting thing about the ZIO style is that it’s probably easier to retrofit into existing libraries/ecosystems. (I think Kotlin works this way too?) I wonder if it would have applications in, for example, Golang.

Two more minor thoughts:

When their supervise block exits, they cancel all the nested fibers, similar to libdill. When Trio exits a nursery block, it stops and waits for all the nested tasks to complete. One trade-off is that this forces ZIO/libdill to have fiber handles, and a join operation to wait for a fiber to finish. Trio is able to get away with skipping both of those concepts. I can see how ZIO’s approach makes sense for them, since the emphasis on pure-functional-style means they really want to join fibers to find out what they evaluated to, and the emphasis on high-level combinators like race and par hides the tedium of joining from most users. Trio makes side-effects a more first-class citizen, so we use those as our primitive for getting results out of tasks, instead of join. Oh huh, and it looks like more recently ZIO added the Trio style as an option, too.
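For contrast, here is a rough asyncio sketch of the cancel-on-exit style. `supervise`, `compute` and `demo_supervise` are hypothetical names of mine, not ZIO’s or libdill’s API. It shows why this style needs handles and an explicit join: anything you do not join before the block ends gets cancelled.

```python
import asyncio
import contextlib


@contextlib.asynccontextmanager
async def supervise():
    """Hypothetical scope that cancels any children still running at exit."""
    children = []

    def fork(coro):
        task = asyncio.ensure_future(coro)
        children.append(task)
        return task  # the handle is needed because exit does not wait

    try:
        yield fork
    finally:
        for task in children:
            task.cancel()  # no-op for children that already finished
        # Wait until the cancelled children have actually stopped.
        await asyncio.gather(*children, return_exceptions=True)


async def compute(x, delay):
    await asyncio.sleep(delay)
    return x * 2


async def demo_supervise():
    async with supervise() as fork:
        wanted = fork(compute(21, 0.01))
        leaked = fork(compute(0, 10))
        result = await wanted  # explicit join to get the value out
    # `leaked` was never joined, so leaving the block cancelled it.
    return result, leaked.cancelled()
```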

Error handling: ZIO’s way of handling and propagating errors is sufficiently foreign to me that I don’t really know how to compare it to Trio! I worry that their tracebacks might not be very useful? And I noticed that they implement the “crash handler” pattern, where you register some callback to handle errors from unjoined fibers. (Or all crashed fibers?) This pattern always makes me wonder how the crash handler preserves enough context to know how to handle an error. But I am curious how it all fits together.