Hacker News
Understanding Real-World Concurrency Bugs in Go [pdf] (golangweekly.com)
165 points by oldgun on Mar 2, 2019 | 70 comments

The two key takeaways for me are:

(1) "Contrary to the common belief that message passing is less error-prone, more blocking bugs in our studied Go applications are caused by wrong message passing than by wrong shared memory protection."

(2) "Shared memory synchronization operations are used more often than message passing, ..."

So basically, (1) people have more trouble writing correct multithreaded routines using message passing, and (2) in large production applications people tend to fall back to using shared-memory primitives rather than message-passing anyway.

It seems the intention of the Go language designers was to make a language that would be simple and easy for programmers to pick up. Given that most programmers are already accustomed to multithreaded programming using shared memory and that this is conceptually simpler, I think the language designers made a mistake by throwing channels, a relatively new and experimental approach (yes, I know CSP is from 1978, but I'm talking about widely-adopted industrial languages), into the language. I think it was the single biggest mistake in the design of Go.

”people have more trouble writing correct multithreaded routines using message passing”

The paper makes that claim for blocking bugs, but it also says ”Message passing causes less non-blocking bugs than shared memory synchronization”

Also, having more blocking bugs in message passing could be explained by users using message passing only for the harder cases. I don’t see that discussed in the paper, but ”Message passing […] was even used to fix bugs that are caused by wrong shared memory synchronization.” might hint at it.

Finally, programmers having less familiarity with message passing code might explain the difference in number of blocking bugs, rather than writing message passing code being more difficult.

How appropriate that the other thread covers this exact situation: https://news.ycombinator.com/item?id=19278839

Channels are appealing, but they're trickier than they seem, especially when you're selecting on two channels, waiting for one event to happen first, but then both happen at the same time. Eventually you end up with special code in both cases to check whether the other thing happened and try to stop it, and then what if you're late, ... The code is easy to write, but hard to make correct.
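To make the both-ready case concrete, here is a minimal sketch (the name countSelections is made up for illustration). When both channels are ready, select picks one case pseudo-randomly, so neither branch can assume the other event has not happened yet:

```go
package main

import "fmt"

// countSelections runs `rounds` selects in which both channels are ready,
// and reports how often each case was chosen. (Hypothetical helper.)
func countSelections(rounds int) (first, second int) {
	a := make(chan struct{}, 1)
	b := make(chan struct{}, 1)
	for i := 0; i < rounds; i++ {
		// Both events have "happened" before the select runs.
		a <- struct{}{}
		b <- struct{}{}
		select {
		case <-a:
			first++
			<-b // the other event also happened; it must be handled here too
		case <-b:
			second++
			<-a
		}
	}
	return first, second
}

func main() {
	first, second := countSelections(1000)
	fmt.Println("a chosen:", first, "b chosen:", second)
}
```

Both counts come out non-zero, which is why each branch needs its own "did the other thing also happen?" handling.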

The one thing I would note is that most of the problems with channels seem to fall into the type where incorrect ordering causes a blocking channel operation that will never complete.

Channels are definitely hard things to get right in go consistently, and they can be a bit non-intuitive. They result in deadlocks/stalls and leaked goroutines. However what they don't result in is memory corruption issues -- none of the cases wind up reading or writing the wrong values. So in that sense I think you can say that concurrency fails with channels tend to be "safer" in that they will not cause operations to proceed with the wrong data. Which is one advantage over many cases of shared-memory concurrency bugs.

> most programmers are already accustomed to multithreaded programming using shared memory

Aren't most programmers JavaScript programmers accustomed to single threaded applications passing variables around?

I've been programming for 20 years, as a career, and I've never once used shared memory in the sense I have in mind when I think "shared memory". Maybe some languages I've used do things internally with shared memory, I don't know.

My point here is to be aware that the people you work with and know aren't necessarily representative of "most programmers" even though it feels that way to each of us. We expose ourselves to what we know more than what we don't know, and that influences how we each see the world.

> My point here is to be aware that the people you work with and know aren't necessarily representative of "most programmers" even though it feels that way to each of us.

Definitely a good point and something we all should keep in mind.

With that being said:

> Aren't most programmers JavaScript programmers

I would say no. Outside of Silicon Valley, most programmers are employed working on internal enterprise applications using backend languages like Java, C#, Ruby, and Python. Shared memory is the standard way multithreaded programming is done in all these languages. It is also the standard way multithreaded programming is done in C and C++ and therefore most of the serious software out there. I think this definitely covers "most programmers," or at least most application developers, the subset of programmers most likely to try to learn Go.

You've never needed to share a cache (like a map of ID -> object) across threads? Or implement a queue for distributing work to threaded workers? Or maintain a shared rate limiter across threads, to prevent saturation of some API or other shared resource? Those are all shared memory examples.
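For the first of those, a minimal sketch of an ID -> object cache shared across goroutines in Go. The Cache type and its methods are illustrative names, not from any library:

```go
package main

import (
	"fmt"
	"sync"
)

// Cache is a hypothetical ID -> object map guarded by a read/write mutex.
type Cache struct {
	mu    sync.RWMutex
	items map[int]string
}

func NewCache() *Cache { return &Cache{items: make(map[int]string)} }

// Get takes the read lock, so many readers can proceed at once.
func (c *Cache) Get(id int) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.items[id]
	return v, ok
}

// Set takes the write lock, excluding readers and other writers.
func (c *Cache) Set(id int, v string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[id] = v
}

func main() {
	c := NewCache()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			c.Set(id, fmt.Sprintf("object-%d", id))
		}(i)
	}
	wg.Wait()
	fmt.Println(c.Get(3))
}
```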

Avoiding sharing memory for concurrent workers is rare, in my experience, unless the workers are completely isolated -- which they rarely are. You can move some things outside (for example, Redis as a cache), at the expense of performance or simplicity.

Channels, I think, make sense in more nuanced cases where you want to design around a more complex but isolated concurrency problem. A good example is timers/tickers, which lean heavily on channels in their implementation. Same for things in the sync package.

I think using channels for every-day synchronization is often reached for too early when concurrency likely isn't even needed in the first place, and when it is, it's probably going to work well enough with shared memory until it becomes more complex.

Timers are kind of my go-to for how channels quickly get difficult. Oh, it's just a channel? I can totally compose that with other channels in select. Go offers condition variables, but not with timeouts. You want cond.Wait(timeout)? You're going to build it yourself with a timer and a wake channel. But now you've got the problem where you need to stop the timer. Except you might miss the race, so you have to remember to drain the channel. And your wakeup channel is more like a semaphore than a condvar. If you get multiple wakeups, you need to remember to drain that too. And unlike cond.Signal(), wakers can block on the channel send if you're not careful. So now you've decided to put the timer in its own goroutine, which will simply call cond.Signal() at intervals, but you still need to manage stopping and resetting the timer in the function that calls cond.Wait(). Edge cases abound if you want precise, consistent results. The documentation for Timer.Reset spends more words telling you how not to use it than what it does.

That's just because time/sync predates context IMO. If you could use context it would be easy and the same as in other situations.

My point wasn't about using timers which present channels to the user, but rather implementing timers/tickers is largely done using channels (though, also runtime help to make it faster/more efficient) and it's those types of problems which channels are a good solution IMO.

I feel like Golang is turning out to be like C++ with all of its gotchas. It's not that it didn't have good intentions, it's just they didn't nail the corner cases for the end coder.

There are a lot of features of C++ that would have been awesome, had it not been for those corner cases.

The ratio of gotchas in Go compared to C++ has to be on the order of 1:10,000

>I feel like Golang is turning out to be like C++ with all of its gotchas. It's not that it didn't have good intentions, it's just they didn't nail the corner cases for the end coder.

In many cases in Go they left footguns and open corner cases for the end coder in favor of an easier compiler implementation.

It's like C++ but without metaprogramming, without a huge library of standard algorithms, and with much worse performance, yes.

This is pretty good and deserves more attention. Github makes bug studies so much more approachable. Hopefully the study will make people more cautious about Go's concurrency model, which is as error prone as shared memory multithreading, if not more.

What concurrency model do you prefer? Seems like it’s easy enough to opt into a share-nothing model in Go, and this is generally what I do unless I have performance concerns (and as a consequence, I very rarely see concurrency bugs). I’m also not sure how much static analysis a la Rust could help the problem without accepting Rust levels of productivity.

The only foolproof way to opt into shared nothing in Go is not to spawn any goroutines. Global variables (which are all mutable, because there aren't any other kinds of variables in Go) are used all over the place in Go, including in the standard library.

How many of the concurrency bugs in this paper trace to the combination of "share-nothing" designs (channels) with mutable global variables in the standard library? If the answer is "none", what evidence does this paper present that mutable globals in the standard library are a practical impediment to "share-nothing" designs in real programs?

I mean, standard library global variables not being thread-safe would generally be treated as bugs in the standard library, so I wouldn't expect to see bugs there.

The point is that shared-nothing designs are just not how Go typically works. You can see that in functions like http.HandleFunc() (and everything else that uses the DefaultServeMux), which registers a global handler across all threads of the program.

Is that a good example of impediments to correctness in shared-nothing designs in Go? Using the default mux makes it hard for two different services to share the same Go program, and reduces the flexibility of libraries, but it doesn't appear to harm correctness or drag programs into shared memory designs. It seems like more of a purity test than a practical critique.

But the default serve mux is shared memory (it's literally a mutable global variable in net/http). If it isn't shared memory, what is?

I mean, Go doesn't force you to use shared memory; no language with threads does. It encourages it through library design (e.g. the default serve mux) and language design (e.g. package init() functions) though.

Right, I'm not disagreeing with you that the Go standard library and a few idiomatic Go standard library transactions use shared memory. I'm disputing that these are real impediments to building programs that benefit from shared-nothing designs. Obviously, Go does a better job of minimizing shared memory than eliminating it. I'm asking: does this distinction matter in practice?

There's Go functionality such as pprof that requires the use of the default serve mux and therefore you can't use pprof with a true shared nothing design. The log module is similar.

Does that lead to more bugs in programs? I don't know. It's quite possible it doesn't matter in terms of defect count. It's not shared nothing, though, is all I'm saying.

Right, thus the distinction I'm making between purity test concerns and practical concerns. Go is not, and has never claimed to be, a pure share-nothing design. It's pretty up front about not being that!

Right, but I don’t need “foolproof”, especially if it comes at high costs. I can already get mostly correct code in Go and the time I spend debugging is less than the overhead in other languages.

I wouldn't say I'm an expert in this area, but I do find Erlang/Elixir's actor model quite compelling.

The big advantage of actors is that they are compatible with network loss. Erlang and Akka actors are network transparent. Local-only actors are missing the point, IMO.

I'm fairly confident that Erlang's "actors" (the language authors didn't know about Hewitt's work at the time) were originally local-only, since the objective was robust processing on standalone network routers.

I suspect the fact that asynchronous messaging turns out to be particularly well-suited to network communications was a happy accident.

Anyway, I guess my point is that local-only actors can be useful, but I definitely agree that network transparency is a huge win.

It's one advantage but not the only one. Actor systems can greatly increase throughput and scalability with easier code without complex synchronization. There are plenty of benefits running only on a single host.

I've not used an actor model--is there some router component that resolves the process to which a message needs to be sent? Is every send() operation a tuple of `(address, message)`? If so, presumably the router component decides whether there is a local actor or whether it needs to go out over the network?

Erlang runs atop its own VM that includes scheduler, routing services, etc.

There are (at least) a couple of different ways to identify the recipient of a message: process ID (unique identifier for another actor) or Erlang's name service.

The VM does indeed know whether the recipient is local; the sender typically neither knows nor cares, although the information is available if useful.

In Erlang, you do send a message to a destination. It also offers a way to register a name to a process, which you can build on to make whatever routing you need. Erlang/OTP ships with a module called pg2 that builds a synchronized group of processes in your distributed system; sending to the group will by default pick a local process over a remote process, or you can broadcast to all the processes.

Every 'proc' (like a thread) is id'd. You're responsible for finding the proc you want. Some procs are globally named and there can only be one. The VM translates proc IDs when you message pass across the network. If the proc dies, your message will fail, so you'll have to find the new target or drop what you were trying to do.

How do you make optimal use of a computer's memory hierarchy? Do messages allow structural sharing of data? Or is every message assumed to potentially go over the network, hence destroying opportunities for optimizations (e.g. think of the internal workings of a high-performance database engine).

>How do you make optimal use of a computer's memory hierarchy?

Apparently more easily, with a faster time to market, and with fewer errors than alternative languages. E.g:



But the idea is not that you as the end programmer "make optimal use of a computer's memory hierarchy".

It's that you write in a way that makes it easy for the runtime to ensure what you do is solid and correct and scalable -- and it's up to the runtime makers to make sure it makes optimal use of the computer's memory hierarchy.

And it's also that you sacrifice some of that "optimal use", to get the "solid and correct and scalable" part.

Iirc the idea here is that there are four very smart engineers working on the core that have taken care of that for you. Everything on top is quite battle tested.

Erlang does have a shared binary type, but the other types are simply copied. This is likely not space efficient, and it likely uses a lot more memory bandwidth. However, this allows for the garbage collection to be extremely simple. It may be pretty useful for NUMA systems as well; although, I'm not sure how well the VM manages that, it may be more potential than actual.

Other actor systems may vary.

> I’m also not sure how much static analysis a la Rust could help the problem without accepting Rust levels of productivity.

You're implying that Rust has productivity issues, which is not true. I'd say I'm as productive in Rust as I was in C# and more productive than I was in JavaScript. Productivity isn't a problem in Rust; the true issue is the level of involvement and effort needed to learn the language in the beginning.

I would not cite Rust as proper static concurrency analysis, but Pony. Pony is the only fast language (in use) with safe and proper static concurrency analysis and support for shared memory.

And there exist many other concurrency-safe slower languages with message-passing only, which don't have the Go problems. Go has at least the tools to detect deadlocks at run-time in testing, but these tools do not replace proper static type-checks for concurrency bugs.

I prefer ReactiveX, because you're still thinking in terms of data flows, parallelization comes as a bonus.

Well, yes. Go's shared memory model is essentially the same as that of C and C++. You have locks, and you have data objects, and the language has no idea which lock covers what data. That's the big thing Rust got right.

Go's message passing model is somewhat error prone. It lets you pass references. So you can create shared data through the channels. It's easy to do this accidentally by passing slices, which are references to the underlying array, across a channel.
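A small sketch of the slice case (mutateViaChannel is a made-up name). Sending a slice copies only the slice header, so the receiver writes into the sender's backing array unless the sender copies the data first:

```go
package main

import "fmt"

// mutateViaChannel sends s over a channel to a goroutine that overwrites
// element 0, then waits for that goroutine to finish. (Hypothetical helper.)
func mutateViaChannel(s []int) {
	ch := make(chan []int)
	done := make(chan struct{})
	go func() {
		received := <-ch
		received[0] = 99 // writes through to whatever array the sender passed
		close(done)
	}()
	ch <- s
	<-done
}

func main() {
	shared := []int{1, 2, 3}
	mutateViaChannel(shared)
	fmt.Println(shared) // [99 2 3]: the "message" aliased the sender's data

	private := []int{1, 2, 3}
	mutateViaChannel(append([]int(nil), private...)) // send a copy instead
	fmt.Println(private) // [1 2 3]
}
```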

More generally, Go channels are one-way. Since you often want an answer back, you have to create some two-way mechanism from the raw one-way channels. It seems to be common to do this at the raw channel level, rather than using some "object" like encapsulation. Creating a mechanism which shuts down cleanly is somewhat tricky, too.
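One way the two-way mechanism is often built, sketched with hypothetical names (request, startSquarer, square): each request carries its own reply channel, and closing the request channel is the shutdown signal.

```go
package main

import "fmt"

// request pairs the input with a channel on which the answer comes back.
type request struct {
	n     int
	reply chan int
}

// startSquarer starts a server goroutine. Closing requests shuts it down;
// done is closed once the goroutine has exited. (Hypothetical helper.)
func startSquarer() (requests chan request, done <-chan struct{}) {
	requests = make(chan request)
	d := make(chan struct{})
	go func() {
		defer close(d)
		for req := range requests {
			req.reply <- req.n * req.n
		}
	}()
	return requests, d
}

// square hides the raw channels behind an ordinary function call.
func square(requests chan<- request, n int) int {
	reply := make(chan int, 1) // buffered so the server never blocks on reply
	requests <- request{n: n, reply: reply}
	return <-reply
}

func main() {
	requests, done := startSquarer()
	fmt.Println(square(requests, 7)) // 49
	close(requests)                  // clean shutdown: no more requests
	<-done                           // wait for the server goroutine to exit
}
```

Wrapping the send and receive in a function like square is the kind of encapsulation the comment above says is often skipped.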

"Surprisingly, our study shows that it is as easy to make concurrency bugs with message passing as with shared memory, sometimes even more. For example, around 58% of blocking bugs are caused by message passing. In addition to the violation of Go’s channel usage rules (e.g., waiting on a channel that no one sends data to or close), many concurrency bugs are caused by the mixed usage of message passing and other new semantics and new libraries in Go, which can easily be overlooked but hard to detect."

On the one hand, they're absolutely right.

On the other hand, statistical methods don't give quality its due.

I'd rather fight a blocking bug than a data-updated-via-race-condition bug.

confirms some of my suspicions about claims of how sexy and easy go is when it comes to concurrency. if anything it will give you a false sense of security but god help you in case things go south and you’ve abused these features.

is there a html version?

As much as I don't like javascript, there is a whole swathe of concurrency bugs that exists in golang and not in nodejs.

Makes sense since Nodejs has no parallelism features.

Actually the event loop in nodejs uses several threads. There is parallelism. It's the granularity of the atomic operations that causes certain types of data race bugs.

In node, var x = 5; is atomic. In golang var x int = 5 is not atomic.
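On the Go side, a small sketch of what "not atomic" means in practice (countAtomic is a made-up name). A plain n++ from several goroutines is a data race; sync/atomic, or a mutex, makes each update indivisible:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countAtomic increments a shared counter from several goroutines and
// returns the final value. (Hypothetical helper.)
func countAtomic(workers, perWorker int) int64 {
	var n int64
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				// A plain n++ here would be a data race, and the final
				// count would not be guaranteed.
				atomic.AddInt64(&n, 1)
			}
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&n)
}

func main() {
	fmt.Println(countAtomic(8, 1000)) // 8000
}
```

Running the plain n++ variant under go run -race is the usual way to see the race reported.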

Go doesn't really have parallelism features either. A language designed for parallelism needs to have data-parallel features like parallel iterators, good support for SIMD, and concurrent data structures, none of which Go has. (In fact, due to lack of generics, you can't even build your own generic concurrent data structures!)

Your comment makes no sense: goroutines are multiplexed across cores, so Go has parallelism.


"Parallel computing is a type of computation in which many calculations or the execution of processes are carried out simultaneously."

Do you enjoy trolling and dismissing Go in general? Your comments read like envy of Go's popularity. When you're not complaining about concurrency features it's about Go's GC.

And Node has workers and can thus achieve parallelism as well. What matters isn't whether the language can achieve parallelism: it's whether the language is designed for it. Go is designed for concurrency, not parallelism.

It's Rob Pike's view [1] that concurrency enables parallelism "for free". I think that's oversimplified and at odds with the way people who write CPU-bound code actually use parallelism features, but that's how Go was designed: it omits traditional parallelism features in favor of concurrency features.

[1]: https://blog.golang.org/concurrency-is-not-parallelism

Yikes. It's impossible to do parallel processing on my old pentium pro because it lacks MMX?

I didn't say it was impossible to do parallel work in Go. You can get parallelism in Node, too, by using workers, or by just forking processes and having them communicate with pipes. Rather, what I was trying to emphasize is that Go—and Node—aren't designed for parallelism. Go has good support for concurrency, not parallelism. Parallel languages have feature sets more like CUDA, C++ with OpenMP, OpenCL, etc.

The traditional definition was doing multiple operations at the same time. SIMD, SMP, multicore, and clustering would qualify. So, yeah, you could do parallel processing in a dual- or quad-CPU configuration. Beowulf clusters started with 486s using MPI and PVM. So, scale the Pentium Pros up and out.

EDIT: Glanced at Wikipedia out of curiosity. Forgot it was used in ASCI Red. So, definitely can use them for parallel computing. To pass a teraflop, you just need 1,600 sq. ft. of them running at 850 kW. ;)


Well, also because it only features a single processor.

You could mimic parallel processing effects with multitasking tho.

Note however how parent didn't say that Go "can't do" parallelism, but that Go is not "a language designed for parallelism".

And that's a large reason why I use Go: scaling to multiple cores works out of the box if you build it with goroutines.
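A rough sketch of what that looks like for CPU-bound work (parallelSum is a made-up name). Each goroutine sums its own chunk into its own slot, so there is no shared write, and the Go scheduler spreads the goroutines over the available cores:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// parallelSum splits nums into up to `workers` chunks and sums each chunk
// in its own goroutine. (Hypothetical helper.)
func parallelSum(nums []int, workers int) int {
	if workers < 1 {
		workers = 1
	}
	partial := make([]int, workers) // one slot per worker: no shared writes
	chunk := (len(nums) + workers - 1) / workers
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		lo := w * chunk
		if lo >= len(nums) {
			break
		}
		hi := lo + chunk
		if hi > len(nums) {
			hi = len(nums)
		}
		wg.Add(1)
		go func(w, lo, hi int) {
			defer wg.Done()
			for _, v := range nums[lo:hi] {
				partial[w] += v
			}
		}(w, lo, hi)
	}
	wg.Wait()
	total := 0
	for _, p := range partial {
		total += p
	}
	return total
}

func main() {
	nums := make([]int, 1000)
	for i := range nums {
		nums[i] = i + 1
	}
	fmt.Println(parallelSum(nums, runtime.NumCPU())) // 500500
}
```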

Python threads don't either due to the GIL, yet they hit the same concurrency bugs.

It's the granularity of the concurrency.

doesn't node have workers ?

when i worked with it there were special npm modules to manage a cluster to use all the cpus. why wouldn’t express just use workers?

well they are relatively new: https://nodejs.org/api/worker_threads.html probably experimental, too

The terminology could use some refinement. Not sure "message passing" is the right phrase to describe Go channels.

Most encounters I've had with "message passing"--whether it was AJAX, Win32 PostMessage, or grade school--have been non-blocking. Go channels block.

edit: Here's a pretty good write-up from 2016 on why channels are an anti-pattern:


It's fine. It's still message passing and it's obvious they refer to Go flavored CSP style message passing whenever they talk about it, not anything else.

(I've been using Go since it was released and like the language.)

Go tried CSP in the beginning but in CSP the 'sequential process' preempts only on blocking (IO) or until finished.

Also message passing is just that. You can't reach out and modify a message after you have posted it, IRL. This means messages are 'safe' to use concurrently since they are copies. That, in conjunction with strict CSP semantics (processes always run to completion with no preemptive cooperation) gets you 'safe concurrency' at the price of performance and at scale 'reasoning' about what is happening.

So, the practical decisions were (a) honors system message passing via references, (b) preemption at IO and elsewhere, and (c) introduction of mutex, etc. to fall back to [more performant] 'light weight threads' and 'locks' paradigm.

I don't see why blocking is an issue given that goroutines are light weight threads. They are really cheap and can be created as needed, eliminating much of the need for async patterns. It's one of the nicest things about Go given that async is just manually implemented lightweight threading.

I would say though that go channels have not turned out to be as useful as I thought they would be. I don't find myself using them that much. But maybe it's just my style or the type of stuff I am writing.

> It's one of the nicest things about Go given that async is just manually implemented lightweight threading.

No, async/await is typically stackless, while goroutines have stacks. That's a significant difference.

Blocking is a pretty big deal.

With something like pipes or sockets, there are timeouts and saner handling of breakages. With Go channels, one false move, and you've got a panic or an infinite wait.

I like Go, just not a big fan of channels.

Lightweight threads are different than async models. With the latter work continues until io or a manual yield happens. With threads, your workload can be preempted outside your control.

Blocking implies you can get deadlocks if you're not cautious, whereas with async you just can't.
