Concurrency fuzzing

Async/await-style concurrency makes it harder to introduce deadlocks and concurrency-based bugs compared to thread-based concurrency. Nevertheless it is possible. I think it should be possible to systematically test for race conditions and deadlocks. Has such a system been implemented?

My suggestion would use hypothesis. Hypothesis is a “fuzzer”: it is designed to take a function, a description of its inputs, and some invariants and explore the space of permissible inputs in an attempt to find inputs that make the function violate its invariants. Hypothesis can also “shrink” these inputs in an attempt to find a minimal example that causes the failure.

In an async/await program, there are a finite number of places where a context switch can occur, and a finite number of possible places execution can go next. Trio chooses among the currently runnable coroutines at random to make it clear that there are no guarantees which gets run. With appropriate plumbing, hypothesis should be able to explore the possibilities in search of a problem.

In fact, hypothesis has a “stateful testing” system, in which the system under test can be set up, taken through a sequence of operations, and then checked for its invariants. As with data fuzzing, these sequences of operations can be shrunk to (attempt) produce a minimal example.

A hypothetical hypothesis concurrency fuzzer would take over the event loop. Each time control returns to the event loop, either one or more coroutines are ready to go, or all coroutines are awaiting something external. The simplest approach here would have hypothesis exploring the possible different selections of coroutine to execute next. It would then explore these sequences to see if it could trigger a deadlock or exception. A more sophisticated approach could also explore the orderings of the external events (which may well need to be mocked anyway) - the HTTP request could come back first, or the timeout could expire first.

This implementation (in Python) would not be trivial - it would need to include a custom event loop, and possibly mock implementations of I/O and other primitives at the other end of the async stack. But I don’t think it would necessarily require much knowledge of the internals of hypothesis: a concurrent execution tree is a stateful object, and the selection of actions at each stage is the set of coroutines that are currently ready to execute, as well as fudging any primitives whose readiness can be fudged. Clearly expressing the sequence of scheduling decisions should be doable, though not simple.

I am not aware of any system implementing this sort of concurrency fuzzing. Is anyone? Is it just a terrible idea?

1 Like

I’d love to see something like this in Trio, and there’s actually a bunch of notes and links in this old thread:

The biggest challenge is figuring out a good heuristic for exploring “interesting” schedules, because the space of valid schedules is so huge that you can easily waste all your time on running similar schedules over and over without finding the edge cases you’re looking for.

There are also a bunch of things that would make this more useful, like detecting when the test successfully finds a deadlock:

and having better testing shims for networking: