Hacker News

Go concurrency is just threads. It's a particularly idiosyncratic userland implementation of them.

There are two claims here I'd like to unpack further:

1. "I've measured goroutine switching time to be ~170 ns on my machine, 10x faster than thread switching time." This is because of the lack of switchto support in the Linux kernel, not because of any fundamental difference between threads and goroutines. In 2013 a Google engineer had a patch [1] to add this support, but it unfortunately never landed. Windows already has this functionality, via UMS. I would like to see Linux push further on this, because kernel support seems like the right way to improve context switching performance.

2. "Goroutines also have small stacks that can grow at run-time (something thread stacks cannot do)." This is a frequent myth. Thread stacks can do this too, with appropriate runtime support: after all, if they couldn't, then Go couldn't implement stack growth, since Go's runtime is built in userland on top of kernel threads. Stack growth is a feature of the garbage collection infrastructure, not of the concurrency support. You could have stack growth in a 1:1 thread system as well, as long as that system kept the information needed to relocate pointers into the stack.

Goroutines are threads. So the idea that "Go has eliminated the distinction between synchronous and asynchronous code" is only vacuously true, because in Go, everything is synchronous.

Finally, Go doesn't do anything to prevent data races, which are the biggest problem facing concurrent code. It actually makes data races easier than in languages like C++, because it has no concept of const, even as a lint. Race detectors have long existed in C++ as well, at least as far back as Helgrind.

[1]: https://blog.linuxplumbersconf.org/2013/ocw/system/presentat...




That Go concurrency is threads is the point of the article. It says this explicitly: "You can think of goroutines as threads, it's a fairly good mental model. They are truly cheap threads". If OS threads become as cheap as userland threads then the implementation of userland threads becomes unnecessary, but the conceptual model stays the same.

In other languages, asynchronous APIs return futures or have explicit callbacks, whereas synchronous APIs return the type directly, thus creating a type level distinction between asynchronous and synchronous code. Go's model eliminates that distinction even though it calls asynchronous OS APIs under the hood.

Go isn't the king of cheap threads, though. Some Cilk-style task-parallelism implementations have figured out clever tricks to make threads and synchronisation even cheaper. They're so cheap, in fact, that a recursive fibonacci function that spawns threads for its recursive calls is almost as fast as one that doesn't. This is achieved by spawning those threads lazily: if a core is idle, it looks at the call stacks of other cores and retroactively spawns threads for some of the remaining work on those stacks. It steals that work by mutating the call stack in such a way that if the other core returns to the stack frame, it will not perform that work itself but will use the result computed by the core that stole the work. This not only avoids thread-spawning cost unless the parallelism is actually used, but also avoids synchronisation cost, because synchronisation is only necessary if work was actually stolen. Cilk-style task parallelism traditionally focuses on parallelism rather than concurrency, but there's no reason the same implementation strategies couldn't work for concurrency. OS threads have no hope of beating this.


Note that there is a widely-used implementation of Cilk for Rust, known as Rayon, used in Firefox Quantum among other projects.


After a web search I read that Rayon is a word play on Cilk, but neither is a dictionary word. Would someone explain the word play in the names for us non-native speakers?


Rayon[1] is artificial silk.

[1]: https://en.wikipedia.org/wiki/Rayon


I've never heard of cilk before, but it seems fascinating. Is there any material I could read on how this is implemented?


This is the canonical Cilk paper: http://supertech.csail.mit.edu/papers/cilk5.pdf


Here is the paper describing the lazy thread spawning implementation: https://ece.umd.edu/~barua/ppopp164.pdf


>Goroutines are threads

This is a thing I have to continually remind newer engineers who are using Go. Goroutines have exactly the same memory observability / race conditions / need for locking/etc. as any threaded language. Every race bug that exists in e.g. Java also exists in Go. The only real difference for most code is that it pushes you very hard to use channels directly, as they're the only type-safe option.

There are some neat tricks lower down, e.g. where you don't have to explicitly reserve a thread for most syscalls (but there are caveats there too, which is why runtime.LockOSThread() exists), which are legitimately nice conveniences. But not a whole lot else.


> The only real difference for most code is that it pushes you very hard to use channels directly, as they're the only type-safe option.

It is, but it is also a super important difference: Go provides inbuilt libraries to make inter-thread synchronization more bearable for the average user, e.g. channels and waitgroups. These are a lot harder to misuse than bare mutexes and condition variables. Since those are often not available in other languages and are too complex for most users to implement on their own (especially when select{} is required), the solutions there often end up more error-prone.


Practically every language includes stuff higher level than bare mutexes and atomics. That people use them doesn't mean other options aren't available. And yeah, totally 100% agreed, most people should never implement them themselves.

But "futures", "blocking queues", "synchronized maps", and "locked objects" (e.g. a synchronized wrapper around a java object) are extremely common and often higher level than channels and selects and waitgroups[1].

[1]: Waitgroups in particular are covered by normal counting mutexes, marking them as low-level constructs like they are, and are also available nearly everywhere.


True, there are plenty of primitives available. However I found many of them are mainly for synchronizing low-level access to shared-data (e.g. the concurrent data structures, synchronized objects, mutexes), etc.

Primitives for synchronizing concurrent control flow (like Go's select) seem less common. E.g. in Java I would have some executor services, and could post all tasks to the same single-threaded executor to avoid concurrency issues. But it's clearly a different way of thinking than with channels/select.

You are also fully right about waitgroups. They are really nothing special. But they might see more use in Go since some structuring patterns are more often used there: E.g. starting multiple parallel workflows with "go", and then waiting for them to complete with a waitgroup before going on.


Select is a bit less common (at least in use) from what I've seen, yea... until you start looking at Promises and Rx. Then it's absolutely everywhere, e.g. `Promise.race(futures...)` (which trivially handles a flexible length of futures, unlike select) or any merging operator in Rx. More often though I see code waiting on all rather than any, and Go doesn't seem to change that.

Channels though are everywhere, and sample code tends to look almost exactly like introductory Go stuff[1] - Java has multiple BlockingQueues which serve the same purpose as a buffered channel, and SynchronousQueue is a non-buffered one. Though they don't generally include the concept of "close"...

But streams do have the "close" concept in nearly all cases, and streams are all over the place in many many forms, and have been for a long time. They generally replace both select and channels, but are usually much easier to use IMO (e.g. `stream.map(i -> i.string())` vs two channels + for/range/don't forget to safely close it or you leak a goroutine). Some of that is due to generics though.

[1]: https://docs.oracle.com/javase/7/docs/api/java/util/concurre...

---

some alternate stream concepts are super interesting too, e.g. .NET's new pipelines: https://blog.marcgravell.com/2018/07/pipe-dreams-part-1.html


I think more languages should have typed channels, but untyped channels are readily available via OS primitives. Unix has had pipe() since 1973.


True! Haven't thought about it, but it's actually an OS-backed untyped channel, which even supports select().

Guess the differences are: It operates on bytes and not on objects, so a custom protocol is needed on top of it. And it can't provide the same guarantees as an unbuffered channel, where sending an item is guaranteed to have reached the other side, which makes guaranteed resource handoff through channels a bit easier.


> It operates on bytes and not on objects, so a custom protocol is needed on top of it.

That's why typed channels are useful, at least at the copying level (when it's more than just an array of lockable memory addresses underneath and data has to actually get transferred): the typing is the message delimiter, which itself is the protocol.


Nothing you're saying is wrong, but your perspective is totally off.

For someone experienced with C++, or say, Rust (ahem), obviously Go is a bit backwards when it comes to concurrency and race conditions. Obviously you can mimic go's goroutine stacks, obviously you can obtain fast thread switching.

But Go isn't targeting C++ or Rust, and it's not targeting the domains those languages are best at (although admittedly there are some overlaps between Go and Rust). Go is trying to replace Ruby, Python, and JS. For programmers who only know those languages, or haven't had the opportunity to work in more "heavy" languages, Go makes it dead simple and intuitive to do things that previously would have been completely out of reach. All you're arguing is that if you go farther "down" the stack so to speak you can accomplish everything Go does, which of course is true, since it's turtles all the way down.

The fact of the matter is, if someone boots up a brand new Linux laptop, goroutine switching _will_ be faster than thread switching. That doesn't mean Go is super crazy performant, or better than C++ or Rust; it means someone whose only programming experience is Rails apps can now write performant multithreaded code with orders of magnitude less domain experience. Same with race detectors. Multithreaded Python is a complete minefield for race conditions. Sure, you might not segfault, but you have to think about whether to use fork() or spawn() depending on the OS, you have to install libraries to detect races, you have to write special tests. With Go all of that comes out of the box and it makes it _easy_.

Go does nothing new, and a lot of languages do things a lot better. Erlang is mentioned in other comments, Erlang is a fantastic language for concurrency and a fantastic language in general. It's also incredibly hard to find programmers who code in it, or are willing to learn it. It's also incredibly hard to sell to the business people higher up. It's also very hard to find Ops people who can competently support an Erlang stack. C++ gives you the power to build formally correct real time systems, but it also gives you the power to blow your whole leg off if you don't know _exactly_ what you're doing. I can go even further down the stack with C and assembly but I think you get the point, it's all about tradeoffs. Go allows programmers to reason and think about concurrency without having to worry about linux kernel PRs, without having to worry about how to share memory, without having to worry about stack performance.

*edited for clarity


"But Go isn't targeting C++ or Rust.. Go is trying to replace Ruby"...

If you listen to Ken Thompson and Rob Pike talk about the first days of Go, it was directly targeting C++. They mention the absurdly slow compilation times of C++ code at Google, and the high complexity of code that folks were writing. I believe Steve Francia's "Standing on the shoulder" talk goes into this, but I don't have time right now to re-watch and make sure I'm citing the right talk.


This is correct, but they quickly pivoted to targeting the Python codebases. Rob Pike has a talk where he talks about how Google used C++ and C to rewrite hot Python paths and how they wanted Go to be able to completely replace that whole pattern. Russ also has a blog post (maybe? it also could have been a comment in a github issue, tbh I can't remember) where he mentions converting python programmers was orders of magnitude easier within Google, as the C++ programmers often had rose colored glasses about their own abilities and the tradeoffs of C++.

It's interesting, as I don't think Go would have been successful without the pivot, but I also don't think it would have been as successful if they had started off trying to replace Python.


Well, why didn't they just use C#? It already had most, maybe even all, of Go's current features.

Hell, the main thing of Go, the go statement, is basically await.


C# would be a good option today, but at the time it was an expensive proprietary closed-source blob maintained by one of their primary competitors, and async/await was still years away. I don't like Go very much, but it was a reasonable choice for Google.


In 2008 Mono was already in quite a good shape.


Why exactly do you say Go is backwards to C++ with regards to concurrency and race conditions? Really curious.

I worked on a sizeable C++ codebase, and it had a home-grown, buggy thread-pooling & task-cancellation engine (similar to Go's context.Context). Go's builtin goroutines were a breeze afterwards. Also, I debugged a race condition in this codebase. Once, and it took me a few weeks of full-steam digging, thinking and mental construction. I didn't even know I had one at the beginning, just had this nagging feeling. (Valgrind would take the app to a crawl, and it was speed-critical.) In Go, I can just run tests with the -race flag, and it finds me a truckload of races in a blink, pointing to the exact place where they happened, with a stack trace sprinkled on top. I really find it hard to understand how you find the Go experience backwards here. But I'm also open and listening, and very curious whether I could maybe learn something enlightening!


Your experience mimics my experience almost exactly (although it sounds like you've worked on much larger and more complex C++ projects than I have), however I've been lucky enough to work with some extremely talented C++ programmers that have been able to accomplish astounding things with good, clean, C++ code. The problem, of course, is that 99.9% of us are not extremely talented C++ programmers.


Ok; so if you confirm my experience, I don't understand how you can at the same time claim Go is "backwards to C++" w.r.t. concurrency & race conditions. Did you mistype "superior" as "backwards"? Btw, we also had extremely talented C++ programmers. Some of them sent ISO C++ proposals from time to time, and I think some were even accepted.


It seems weird to switch from Python to Go. Python is Lispy in that it maximises flexibility and developer power at the cost of speed and inbuilt correctness checks. Go is Java-ish in that it heavily limits what the developer can do (a crippled type system) in exchange for speed and correctness.

I checked my assumptions; you're kind of right about Python developers switching, but it's hard to know the real base distribution of people who know those languages (there might just be more Python/JS developers, not more switching proportionally). Not many Ruby folks though, and plenty of Java/C/C++.

https://blog.golang.org/survey2017-results


> but your perspective is totally off.

Well, his perspective is well known in all Go threads on HN, and this is not only my opinion (look here [1]). He is repeating the same things[2] again[3] and again[4] and again: about M:N in Go, about why Go is worse because it doesn't have a generational GC; but when asked whether he reached out to the Go team about that, there is no response. If you look at his comments from the last month you will see how much downplaying of Go and other languages there is. It seems that he only praises one language (ahem) in his comments.

1. https://news.ycombinator.com/item?id=17886153

2. https://news.ycombinator.com/item?id=18101986

3. https://news.ycombinator.com/item?id=17886144

4. https://news.ycombinator.com/item?id=17886122


> It seems that he only praises one language (ahem) in his comments.

So, I've seen this pop up a few times, but I'd also like to make this really clear: Patrick formally stepped down from working on Rust a year and a half ago, and was inactive for a while before then. At this point, he's the same as any other user.

That is to say nothing about my opinions about his opinions, but let's be clear, rather than insinuating things: Patrick speaks for himself, not for the Rust team.


Thanks for the clarification, I didn't know about that. That doesn't change anything I wrote, but it's good to know this was not coming from a current member of the Rust team.


His perspective while familiar to you is new to me. While he is adding value, you are simply a distracting commenter on a goose chase.


> Finally, Go doesn't do anything to prevent data races, which are the biggest problem facing concurrent code. It actually makes data races easier than in languages like C++, because it has no concept of const, even as a lint. Race detectors have long existed in C++ as well, at least as far back as Helgrind.

While it can't guarantee correctness outside of runtime (and even then, obviously only if whatever you are running actually triggers the race), a race detector has been part of the Go core since 2012[1].

[1]: https://blog.golang.org/race-detector


Also there's a "best effort" concurrent map access detector that runs even when you don't compile with race detector support enabled.


So what's preventing Linux from adopting switchto support? I'd love to see the discussion around it.


Beats me! The author no longer works at Google from what I can tell—emails to him bounced.

I'd love to see 1:1 threading become competitive with M:N for the heaviest workloads. It just plays so much nicer with the outside world than M:N does.


> I'd love to see 1:1 threading become competitive with M:N for the heaviest workloads. It just plays so much nicer with the outside world than M:N does.

After working for lots of years with all of the available async paradigms (event loops, promises, async-await, observables, etc.) I would tend to agree. Even though many of those things (including Rust's futures implementation) are very well-engineered, the integration problems (as outlined e.g. in the "What Color Is Your Function" article) are very real. And the extra amount of understanding that is required to use and implement those technologies might often not justify the gains. One basically needs to understand normal threaded synchronization as well as async synchronization (e.g. as with .NET Tasks or Rust's futures) to build things on top of it. The same goes for the extra level of type indirection (Task&lt;T&gt; vs T), or the distinction between "hot" and "cold" tasks/promises.

If it were possible to get 1:1 threading into the same performance region for most normal applications (e.g. everything but million-connection servers), it would seem very favorable.


Doesn't catch everything, but it catches a lot: https://golang.org/doc/articles/race_detector.html


It's great when it works though; I thought I was running into a race condition once but wasn't sure (new to the language), and having something verify it was indeed a race condition was really awesome.


> Goroutines are threads. So the idea the "Go has eliminated the distinction between synchronous and asynchronous code" is only vacuously true, because in Go, everything is synchronous.

Abstractly, I agree, but practically the distinction is important. In Go you don't have to use a frustrating async interface to get decent performance. You don't have to manage a threadpool or other tricks. You pretty much get the threadlike interface you want to use without the difficulty.


Let's be clear: the one-OS-thread-per-connection model does yield decent performance. We used to call async I/O "solving the C10K problem"—i.e. serving 10,000 clients simultaneously.

I'm speaking from experience here, having tried to implement M:N and abandoning it in favor of 1:1, which yielded better performance. Can M:N yield better performance than 1:1? Sure, in some circumstances. But I think that, ideally, we should be striving for 1:1 everywhere.


All true. But I have been waiting nearly 30 years for a 1:1 implementation that scaled to the #threads I want to have. Still waiting...

The point of M:N schemes isn't to achieve better performance but rather to achieve higher scaling.


How many threads do you want?


I'm not the person you're asking, but I want one thread per connection, and I want one million connections per host.


But then with a million threads, either M:N or 1:1, you have other problems: the whole shared-memory multithreading model breaks, and you can forget about all the locks/channels if you want to actually do something useful with that.


I believe there are boxes in production with 1E6 live connections.


If they exist, they are running software written in either C or Rust, which can avoid memory-efficiency traps and can schedule their workers in a more optimal way than a general-purpose language.

That said, I'm not sure such thing exists. Just the memory overhead some common libraries impose on connections is enough to fill some 32GB.


Erlang works just fine for that many connections (depending on how much CPU you spend doing real work per connection). Socket buffers are tunable. 32GB isn't that much RAM for a server; Android phones are shipping with 6GB, and you can get 6TB into a server without getting too exotic.


>That said, I'm not sure such thing exists.

Just to clarify: my knowledge on this is pretty deep, so the only reason I didn't say "they exist" is that I don't personally have one in my DC (I've only managed up to 200k live connections). I know people very well who do. But yes, of course the application would be written in some efficient language fit for the purpose like C, Rust, Go, Erlang. Probably not Java.

Also 32GB is not much memory these days.


3 million connections: https://medium.freecodecamp.org/million-websockets-and-go-cc...

Heck, you can probably handle a million connections using PHP (with Swoole).


In the article they got rid of goroutines to get to 3 million connections.


Can you elaborate on what you mean by "the whole shared memory multithreading model breaking"? I would love to hear ;)

> Finally, Go doesn't do anything to prevent data races, which are the biggest problem facing concurrent code

I've been working on a multiprocessing library -- I built a wrapper function that makes any function an atomic operation on the state.

https://zproc.readthedocs.io/en/next/user/atomicity.html

(Since it's protected by the actor model, not locks, it's an enforcing mechanism.)

Do you think this is a step in the right direction?


I'm trying to find out what you mean by "switchto support". Do you have a link?


See the bottom of the post you’re replying to.


Just focus on C++



