Structured concurrency in golang

After seeing the tweet below, I can’t help but wonder: are the Go language maintainers following along? Does anyone know?

Susan Potter (@SusanPotter)

5/17/19, 23:42

I can’t help myself… “We studied six popular Go software [projects] including Docker, Kubernetes, and gRPC. We analyzed 171 concurrency bugs in total, with more than half of them caused by non-traditional, Go-specific problems.” songlh.github.io/paper/go-study…

There’s some discussion of Go + structured concurrency in this thread; see in particular Ian Lance Taylor’s comment here. I don’t know of any discussion beyond that though.

Also, I think it would be super cool if someone went through the bugs in that paper to check how many would be prevented by structured concurrency. Obviously some of us have an intuition that it might be a lot of them, but you don’t know until you check!

I just read/skimmed the paper. Informally I’d guesstimate that around 2/3 to 3/4 of the bugs they examined could have been avoided with structured concurrency. To be more accurate, somebody with better Go knowledge than me would have to classify the individual patches manually. While the categories they sorted the bugs into do hint at whether structured concurrency would have prevented them, I sampled the “blocking/docker” bugs and found what look like exceptions in both directions.

Expecting structured concurrency to have prevented 3/4 of the bugs seems optimistic to me. I haven’t looked at the raw data (is it even available?), but they tell us that ~1/4 of the bugs were classic race conditions, and I wouldn’t expect structured concurrency to have any direct effect on those. (I guess we could hypothesize about an indirect effect where structured concurrency makes things simpler overall so you have more brain power left over to spend on noticing race conditions, but that’s beyond the scope of this methodology.)

And @elizarov wrote a nice article here about a deadlock bug when using channels in a structured program – structure is nice but it’s not a panacea.

OTOH, all mistakes in using the WaitGroup API seem like they’d trivially be prevented. And more interestingly, the two examples in their paper that I had to stare at the longest to understand – Figs. 6 and 12 – both involve a standard module’s surprising use of background goroutines. In a structured language, you couldn’t have those APIs. And maybe you wouldn’t need them, since they both involve timeout/cancellation handling, and in structured languages we expect those to be handled by the runtime.

I just can’t tell how representative these examples are. Someone could email the authors to ask for access to the data or invite them to participate here, but I don’t have the time myself right now.

The github archive mentioned in the paper contains an exhaustive spreadsheet, plus patches to every bug they classified.

Yes, structure by itself is not a panacea, but OTOH structure would enable the runtime, or even static analysis, to more stringently check for partial no-progress conditions or lock priority violations.

Evaluating their list in that light would boil down to another research paper – something which I’d love to write but unfortunately have zero free time for.

The linked paper does not seem to account for survivorship bias. The bugs that they examined include only the subset of bugs that made it through initial testing and code review (before the commit was accepted into the corresponding project), so it’s not necessarily valid to use that analysis to try to reason about the defect rates of concurrency patterns in general.

Do note that there are libraries for structured concurrency in Go, such as golang.org/x/sync/errgroup. Whether it can or will be incorporated into the language or standard library remains to be seen.

(I gave a related talk at last year’s GopherCon: Rethinking Classical Concurrency Patterns.)

That’s totally true. I think for their purposes that’s fine, since they’re interested in applications like building tools to catch those bugs that would otherwise slip through.

For the purposes of assessing the structured concurrency idea, it’s more of a problem. And in fact, there’s an even worse problem: sorting through the bugs in that paper won’t tell us anything about all the bugs that would have been created if Go was a structured-concurrency language, but weren’t because it’s not.

Really the only valid approach would be to go back in time, create two parallel universes that are identical except for how Go was designed, and then compare defect rates in Kubernetes etc. across these two universes.

Since we can’t do that experiment, I think it’s still interesting to learn what we can from studies like this. But you’re right that we have to be very careful with the interpretation.

Also, the standard library has sync.WaitGroup, which is specifically designed to group all the goroutines of a task together and wait for them to terminate (their “joining”).