
Even in Go, concurrency is still not easy - benhoyt
https://utcc.utoronto.ca/~cks/space/blog/programming/GoConcurrencyStillNotEasy
======
Animats
That's a classic deadlock bug - two producer/consumer relationships in
opposite directions.

Have another goroutine receive and output the "found" results, rather than
doing it in the first loop. Now you have a proper pipeline.
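Concretely, that pipeline shape could look something like this (a minimal sketch with illustrative names, not the article's actual code):

```go
package main

import (
	"fmt"
	"sync"
)

// findMatches fans work out to producer goroutines and drains the
// results in a separate consumer loop, so producers never block
// forever on an unread results channel.
func findMatches(items []int) []int {
	found := make(chan int)
	var wg sync.WaitGroup

	// Producers: one goroutine per item (bounded in real code).
	for _, it := range items {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			if n%2 == 0 { // stand-in for the real predicate
				found <- n
			}
		}(it)
	}

	// Close the results channel once all producers finish.
	go func() {
		wg.Wait()
		close(found)
	}()

	// Consumer: drain results while producers run.
	var out []int
	for n := range found {
		out = append(out, n)
	}
	return out
}

func main() {
	fmt.Println(len(findMatches([]int{1, 2, 3, 4, 5, 6})))
}
```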

The big win with goroutines is that they're green threads. They can block, and
you can have lots of them. With regular threads, too many will use up
resources. With "async", anything that blocks or just takes too long locks up
the whole system. Go has the best of both worlds, which is very useful for the
case of a server holding open connections to a huge number of human-paced
clients.

~~~
mathw
I've never worked with a green threads system in any real sort of environment
where I was actually solving real problems at scale, but I always wonder, does
the underlying runtime ever push them into actual real threads that can
actually run on another core, or are they always the imaginary concurrency of
using the gaps where another green thread needs to wait the comparative
eternity for some I/O?

~~~
_ph_
By default, a go program runs computations on as many threads as there are
CPUs in your system (this can be adjusted via GOMAXPROCS). The execution of
all goroutines is distributed amongst these threads. For maximum efficiency
you want to avoid spawning many more threads than there are CPUs, as
scheduling has an overhead and each thread consumes valuable memory for the
stack space. The Go model of running many goroutines distributed across
threads is a very efficient solution and also very flexible, as the program
writer does not have to make any assumptions about the CPU count of the
machine the program runs on.

------
naikrovek
Hmm, I don't know. I had this limited concurrency issue on the first Go
program I ever wrote, and I'm very new to concurrent stuff, certainly if you
only count the past ten years or so.

I figured this out in Go about 30 minutes after I realized I needed to limit
concurrency, and I'm a dumbass, basically.

So, I'm not sure that "still not easy" is accurate, especially given that I
have a lot of trouble conceptualizing async and await, as used in C#. In fact,
I don't think I've ever successfully used that paradigm in C#. I've been told
that C# async and await are "easy" and this article says Go concurrency is
"still not easy" and I've had the exact opposite experience in both cases.

In fact I really don't understand why async and await is even a thing anymore,
given the paradigm that Go uses and the example it sets. It is almost
supernatural in its ease, for me.

~~~
pansa2
> I really don't understand why async and await is even a thing anymore

Programming with threads is hard because they can context-switch at arbitrary
points. Goroutines are nothing more than (lightweight) threads.

Stackful coroutines (as in Lua) aren’t much better, because they can context-
switch at any function call.

Code that uses async-await is different - it can only context-switch at points
explicitly marked with `await`.

~~~
diek
async/await in the general sense doesn't have anything to do with the
preemption policy of a particular language implementation. For instance, C#
implements async/await as keywords on top of its existing concurrency model.

That's really what it comes down to: cooperative multi-tasking vs preemptive
multi-tasking.

Windows 3.1 had cooperative multi-tasking. Everything went great until one
program didn't yield control, and then the whole system ground to a halt. That
should sound familiar to any Node.js developers with their event loop.

Cooperative multi-tasking also greatly complicates code that actually uses the
CPU. If you look inside libuv, it even has to jump through hoops with its
encryption functions, effectively calling yield() in between rounds of
calculation.

~~~
csharptwdec19
>If you look inside libuv, it even has to jump through hoops with its
encryption functions by effectively calling yield() in between rounds of
calculation.

I would have assumed the yield was there to help protect against timing
attacks...

------
inaseer
Concurrency is hard and we have very poor support for testing correctness of
concurrent and distributed systems. Language abstractions help but they aren't
nearly enough (as evidenced by this post). My team at Microsoft leverages
Coyote to check the safety of our services against such subtle race
conditions. We blogged about using it to reliably reproduce and fix a very
subtle bug in a bounded buffer implementation over at
[https://cloudblogs.microsoft.com/opensource/2020/07/14/extre...](https://cloudblogs.microsoft.com/opensource/2020/07/14/extreme-
programming-meets-systematic-testing-using-coyote/)

If you're using .NET in your projects, you can start taking advantage of such
tools _today_. I would like for such tools and testing techniques to become
more and more common place in the industry as concurrent and distributed
systems are _hard_ and we should use all the help we can get.

~~~
ereyes01
Go comes with a race detector (built on ThreadSanitizer), which you can enable
with `go test -race`. If your unit test exercises a data race, this will blow
up your test with stack traces of the race.

It sounds a bit like Coyote, which also looks very useful for C# applications.
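For example, here is a tiny program with a deliberate race that `go run -race` (or the same code under `go test -race`) will flag; names are illustrative:

```go
package main

import "fmt"

// racyIncrement increments a shared counter from two goroutines
// without any synchronization. Built with -race, the detector
// prints a "WARNING: DATA RACE" report with both stack traces.
func racyIncrement() int {
	counter := 0
	done := make(chan struct{})
	go func() {
		counter++ // unsynchronized write
		close(done)
	}()
	counter++ // concurrent unsynchronized write
	<-done
	return counter
}

func main() {
	fmt.Println(racyIncrement())
}
```

Without `-race` the program runs and prints 1 or 2 depending on interleaving, which is exactly why the detector is worth enabling in CI.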

~~~
inaseer
Neat to learn about thread sanitizer. It sounds similar to another tool from
Microsoft Research called Torch ([https://www.microsoft.com/en-
us/research/project/torch/](https://www.microsoft.com/en-
us/research/project/torch/)) which automatically instruments binaries to
detect data races. Coyote is similar in some ways but different in others.
Coyote serializes the execution of the entire program (running one task at a
time), exploring one set of interleavings and then rewinding, and then
exploring another set of interleavings, hoping to hit hard-to-find safety and
liveness bugs. In addition to finding concurrency bugs in one isolated
process, we use it to find bugs in our distributed system by effectively
running our entire distributed system in one process and having Coyote explore
the various states our system can be in. It sounded mind-bogglingly cool when I
first came across this way of testing distributed systems through Foundation
DB
([https://www.youtube.com/watch?v=4fFDFbi3toc](https://www.youtube.com/watch?v=4fFDFbi3toc));
we're emulating this kind of testing in our distributed system through Coyote.
And unlike Foundation DB which had to develop their own variant of C++ to be
able to do this kind of testing (kudos to them for doing it), Coyote allows us
to do it on regular C# programs written using async/await asynchrony and
benefit from decades of Microsoft Research in exploring large state spaces
effectively.

------
pcjthug
Did Go ever make concurrency easy?

Any modern language allows you to create threads (semantically equivalent to
goroutines) and pass messages between them. Most of them even have high-level
concurrency support such as parallel map, structured concurrency, reactive
extensions, type-safe generic collections...

We could say that Golang makes concurrency using thread semantics scale to a
large number of threads. When this paradigm is appropriate, then indeed Go is
nice. The select statement is nice. But this is not always the right approach.

Yes, it is true that Go needs slightly fewer characters to launch a naked
thread than in C# or Java. In that sense, it is easier. Anyone who considers
that a major selling point is a walking race condition.

As a tongue in cheek analogy, what if we consider my new language G, in which
G is an alias for goto. G is going to finally make control flow easy. I've
done away with complicated control flow constructs such as for or while -
users wouldn't understand them anyway. Users are encouraged to use the G
keyword liberally. After all, it's only 1 character. Indeed, by that metric,
it's even simpler than go's go keyword. What could go wrong?

~~~
navidr
ESL here. What do you mean by “walking race condition”?

~~~
recursivecaveat
A "walking X" is a person who is or is going to create an X. For example, an
unpleasant boss might be a "walking turnover problem".

~~~
navidr
Thank you for your detailed explanation. I really appreciate it.

------
klodolph
My personal style here is to use a number of goroutines equal to the desired
concurrency. I know goroutines are cheap and there is no particular reason to
conserve them most of the time, but this feels more natural to me than using a
semaphore--although it is less flexible.

Simplified:

    
    
        var wg sync.WaitGroup
        wg.Add(jobs)
        ch := make(chan workItem, jobs)
        for i := 0; i < jobs; i++ {
            go func() {
                defer wg.Done()
                for item := range ch {
                    processItem(item)
                }
            }()
        }
        for _, item := range items {
            ch <- item
        }
        close(ch)
        wg.Wait()

~~~
andonisus
Here's how I like to do it:

[https://play.golang.com/p/AB6exUIkNRg](https://play.golang.com/p/AB6exUIkNRg)

Contents pasted below.

    
    
      package main
    
      import (
          "fmt"
          "sync"
      )
    
      func main() {
          wg := &sync.WaitGroup{}
          e := NewParallelExecutor(uint64(10))
    
          for i := 0; i < 100; i++ {
              wg.Add(1)
              i := i // capture the loop variable per iteration
              e.Submit(func() {
                  fmt.Println(i)
                  wg.Done()
              })
          }
    
          wg.Wait()
          fmt.Println("All done.")
      }
    
      type Executor interface {
          Submit(func())
      }
    
      type parallelExecutor struct {
          queue  []func()
          inCh   chan func()
          doneCh chan struct{}
      }
    
      func NewParallelExecutor(limit uint64) Executor {
          pe := &parallelExecutor{
              queue:  make([]func(), 0, limit),
              inCh:   make(chan func()),
              doneCh: make(chan struct{}),
          }
    
          go func() {
              running := uint64(0)
              for {
                  select {
                  case <-pe.doneCh:
                      running--
                  case f := <-pe.inCh:
                      pe.queue = append(pe.queue, f)
                  }
                  if len(pe.queue) > 0 && running < limit {
                      running++
                      fmt.Printf("Goroutines in flight, waiting: %d | %d\n", running, len(pe.queue))
                      f := pe.queue[0]
                      pe.queue = pe.queue[1:]
                      go func() {
                          f()
                          pe.doneCh <- struct{}{}
                      }()
                  }
              }
          }()
    
          return pe
      }
    
      func (p *parallelExecutor) Submit(f func()) {
          p.inCh <- f
      }

------
valenterry
Go just sells itself well.

Languages that _actually_ make concurrency easy, such as Haskell, Scala or
OCaml, or that make distributed concurrency easy (Erlang/Elixir), are way ahead
of Go, but they are not as hyped and don't market themselves as well.

~~~
jshen
Concurrency isn’t easy in any language.

~~~
Jtsummers
Erlang comes pretty close to achieving this. When/if people get past the
“weird” syntax, it's a very straightforward model that minimizes a lot of the
hazards, though deadlocks are definitely still possible.

~~~
neilwilson
The actor model struggles when you run up against something that is naturally
sequential and the order matters.

If you want 1000 households to visit a set of shops in a repeatable random
order until the shops run out of stock, that's a list sort with a seed and a
sequential loop.

Make the 1000 households concurrent and sending messages, and it becomes a
major exercise in clocking, scaling and scheduling.

~~~
mikhailfranco
Erlang processes (Actors) solve this kind of problem naturally, with code that
is concise, asynchronous and parallelized. For example, look at solutions for
the 'Sleeping Barber' problem. It's possible to write a full solution in <40 SLOC.

Starting 1000 processes is trivial and fast in Erlang. See the first few pages
of Joe's presentation from 20 years ago: process creation time ~10us (up to
30k processes); message time ~1us [1]. The code is in his book and the email
thread [2]:

[1]
[https://www.rabbitmq.com/resources/armstrong.pdf](https://www.rabbitmq.com/resources/armstrong.pdf)

[2] [https://erlang.org/pipermail/erlang-
questions/2007-July/0280...](https://erlang.org/pipermail/erlang-
questions/2007-July/028068.html)

~~~
neilwilson
It’s rather more tricky than the sleeping barber. It’s more that there are 100
hairdressers, each household has a set of seven of them, and each hairdresser
only has a fixed amount of hair colour, so not everybody will be served. And
you need to be able to repeat the visits deterministically based on a random
seed, so the whole process is verifiable.

------
_ph_
I do think that Go has a great set of tools to write concurrent programs. But
I think it is a fallacy to believe that as a consequence "concurrency is easy"
in a general way. These tools still require the programmer to be aware of the
challenges of concurrency. They do make it easier to deal with them.

It is the same story as with garbage collection. Garbage collection prevents
some kinds of errors and in general makes it much easier to deal with dynamic
memory allocation. However, garbage collection does not mean you don't have to
think about allocation patterns and object lifetimes, for example.

Go does make concurrency much easier through the tools it provides. Most
important are goroutines, which are very lightweight, so within reasonable
limits you don't have to be concerned about the number of goroutines you
spawn - but as the example shows, you shouldn't try to spawn more goroutines
than there are file handles available to your process, if every goroutine
allocates a file handle. Not only are goroutines very lightweight with small
growable stacks, but as they are part of the language specification, the
compiler can generate code which helps with scheduling; in most cases you get
efficient cooperative scheduling, so thread-based preemption is less frequent.

On top of that, channels provide a very easy to use abstraction for
communication between goroutines. There are a lot of use cases, where
goroutines and channels give you very easy and safe concurrency. It does not
save you from understanding concurrency issues, and especially deadlocks. Which
in a general sense is impossible to achieve, because proving that a program is
deadlock-free would require proving that all goroutines return, and that is
equivalent to the halting problem, which is unsolvable.

It should also not be underestimated that, due to goroutines and channels
being part of the language spec, these are very commonly used features.
Concurrency is present in most Go programs. As a consequence, when one reads a
lot of Go code, there will be plenty of examples of their usage. And any
library needs to be thread-safe, as the likelihood is very high that it will
get called from goroutines.

------
axaxs
The problem is defined, but none of the proposed solutions seem fitting.
When using a semaphore, it's usually for a reason. I think the solution with
the least overhead, and perhaps the most idiomatic, is to spin up the found-
reading loop in a goroutine BEFORE you start any others, and leave the rest of
the code as is. This prevents having x number of idling goroutines, where x
can be billions (but not in this case). Though again, in this case, using a
semaphore and perhaps goroutines at all seems spurious.
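For context, the semaphore pattern in question is usually just a buffered channel; a minimal sketch with hypothetical names:

```go
package main

import (
	"fmt"
	"sync"
)

// runLimited bounds concurrency with a buffered channel used as a
// counting semaphore: acquire by sending, release by receiving.
func runLimited(tasks []int, limit int) int {
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	var mu sync.Mutex
	total := 0
	for _, t := range tasks {
		wg.Add(1)
		sem <- struct{}{} // acquire: blocks while `limit` goroutines run
		go func(n int) {
			defer wg.Done()
			defer func() { <-sem }() // release
			mu.Lock()
			total += n // stand-in for real work
			mu.Unlock()
		}(t)
	}
	wg.Wait()
	return total
}

func main() {
	fmt.Println(runLimited([]int{1, 2, 3, 4, 5}, 2))
}
```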

------
donatj
I’ve been wondering for like 15 years when a language is actually going to
show up where I can just write my code procedurally and the compiler
automatically figures out what it can parallelize.

Like if I have 3 definitions in a row that set variables to the result of
methods that don’t share memory, that seems pretty obviously parallelizable.
Why isn’t anyone doing these sorts of optimizations?

I would have thought it would have been solved by now, but I was wrong.

~~~
gonzo41
That sounds 'cheap', but if you think of the clock rate of a CPU, doing those
tasks in parallel takes time to set up, process and tear down. That example
doesn't really work well.

Fitting problems into grids of processing like SIMD or GPUs is tricky, because
not a lot of problems fit them the way images do. Most of what you're asking
for is about understanding business flow, which a language will never do.

------
appleboy46
The following is my solution:

See the playground:
[https://play.golang.org/p/pXJaGQ0efe8](https://play.golang.org/p/pXJaGQ0efe8)

Source Code is available on GitHub: [https://github.com/go-
training/training/blob/2ddb95d08c654a6...](https://github.com/go-
training/training/blob/2ddb95d08c654a6410c2cfa70b0f506d8deceda9/example38-concurrency-
is-still-not-easy/answer02/main.go#L1)

------
xmprt
Maybe I'm dumb but I don't understand the bug. I understand why the main code
blocks if no tokens are available in the found channel but why would writing
to `found` block the code (and why only in the case when there's a lot of go
processes)?

~~~
jeffbee
It deadlocks if len(pss) > concurrencyProcesses.

~~~
xmprt
Even in that case, wouldn't the main loop block for a short time but then
unblock after the goroutine returns (and the defer function is executed)?

I thought the whole point of limitCh is to support the case where len(pss) >
concurrencyProcesses but in this case you're telling me that it's breaking
things?

~~~
jeffbee
The inner goroutine cannot return because it blocks at `found <- p`: found is
not buffered and there is no reader.

This would work if the consumer was started in a goroutine first.
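A minimal sketch of that fix, mirroring the shape of the article's code (variable names illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

// collect starts the consumer of the unbuffered `found` channel
// BEFORE any producer goroutine, so every send has a reader and
// nothing deadlocks, no matter how many producers there are.
func collect(pss []int) []int {
	found := make(chan int) // unbuffered, as in the article

	results := make(chan []int)
	go func() { // consumer first
		var out []int
		for p := range found {
			out = append(out, p)
		}
		results <- out
	}()

	var wg sync.WaitGroup
	for _, p := range pss {
		wg.Add(1)
		go func(p int) {
			defer wg.Done()
			found <- p // never blocks forever: the consumer is draining
		}(p)
	}
	wg.Wait()
	close(found)
	return <-results
}

func main() {
	fmt.Println(len(collect([]int{1, 2, 3, 4, 5})))
}
```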

~~~
xmprt
Thanks. The part about starting the consumer in a goroutine actually made more
sense as a solution than either of the given solutions.

I guess I completely forgot how channels work and didn't realize that until
there's a reader, the channel will block (which makes perfect sense in
hindsight).

------
yencabulator
A more common way to write limited concurrency is to use a worker pool:
[https://blog.golang.org/pipelines](https://blog.golang.org/pipelines)
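A compact version of that worker-pool pattern, in the spirit of the linked post (all names hypothetical):

```go
package main

import (
	"fmt"
	"sync"
)

// squareAll runs a fixed pool of workers that read jobs from a
// shared channel and write to a results channel, then sums the
// results. The work itself (squaring) is just a placeholder.
func squareAll(nums []int, workers int) int {
	jobs := make(chan int)
	results := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for n := range jobs {
				results <- n * n
			}
		}()
	}
	go func() { // feed the pool, then signal no more work
		for _, n := range nums {
			jobs <- n
		}
		close(jobs)
	}()
	go func() { // close results once every worker is done
		wg.Wait()
		close(results)
	}()

	sum := 0
	for r := range results {
		sum += r
	}
	return sum
}

func main() {
	fmt.Println(squareAll([]int{1, 2, 3}, 2))
}
```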

------
rad_gruchalski
The protoactor library for golang: [https://github.com/AsynkronIT/protoactor-
go](https://github.com/AsynkronIT/protoactor-go).

I can’t recommend it highly enough. It makes working with highly concurrent Go
so much easier, and it's a blessing for anybody with prior experience in Erlang
or Akka.

------
atombender
When I first came to Go, I'd heard a lot of people speak profusely about CSP
and how Go made concurrency easy. I was disappointed at just how _not_ easy it
was.

For example, Go has no generic support for atomic primitives, arrays, slices,
or maps. It has sync/atomic, but it's not generic. I always reach for Uber's
atomic library, which has typesafe atomic wrappers such as atomic.Bool.
They're so common, you'd think a CSP-aware language would provide keywords to
declare things atomic:

    
    
      var b atomic[bool]
      if cas(b, false, true) {
        ...
      }
    

"Classical" Go didn't have errgroups, contexts or cancellation. To build
something truly robust, "modern" Go ends up involving all three (plus atomics,
of course). That's because almost all complex situations need to build what
are effectively nested trees of goroutines that all need to quickly abort and
unwind on errors or panics.

But as you use more and more of these primitives, your code (or at least my
code) becomes more and more obscured by the layers of error-handling, cancel-
detection, retrying, and so on. I really wish some of this was built into the
language, especially cancellation. For example, channels don't support
contexts. So if you have a loop like this:

    
    
      for evt := range taskCh {
        handle(evt)
      }
    

...then you have to rewrite it to support cancellation. So you have a few
options. One is to select on both the channel and context:

    
    
      for {
        select {
          case evt := <-taskCh:
            handle(evt)
          case <-ctx.Done():
            return ctx.Err()
        }
      }
    

Because of this, you can no longer use a simple range loop. Immediately the
code gets more bloated and less readable.

Another option is to spawn a goroutine whose only job is to abort the channel:

    
    
      go func() {
        <-ctx.Done()
        close(taskCh)
      }()
      for evt := range taskCh {
        handle(evt)
      }
      return ctx.Err()
      

That's a bit better (and you could move the for loop into its own function for
clarity, without changing anything else). But it's still worse than a simple
for loop!

As a real-world example, here [1] is some code I've been working on lately. A
controller starts N workers that need to process a task queue. If the
controller is stopped, all the workers need to stop. So I use the pattern
above. But I can't use a for loop! That's because I use a task queue
abstraction that needs to support features that Go channels don't.

I can't use a for loop or select block to block on the task queue, because
only channels support that. So that's another problem with Go: Often, you can
use raw channels as your data processing primitive if things are simple
enough, but in practice, channels are best for coordination, and you have to
build your own primitives — and yet, by building those primitives, you lose
language expressiveness.

To address the concrete example, I'd love to do this instead:

    
    
      // Range over task queue (custom object with blocking
      // semantics), automatically cancel if ctx is cancelled
      for task := range taskQueue with ctx {
        handle(task)
      }
    

[1]
[https://gist.github.com/atombender/6bcff2c2d8fec32bc80ce1f57...](https://gist.github.com/atombender/6bcff2c2d8fec32bc80ce1f57a8e61f6)

------
fulafel
What would a test for this look like, given a lot of processes?

~~~
inaseer
That's a great question. Stress testing, which is what you are suggesting,
helps but is not super effective and often misses bugs. You need tools which
can precisely control the task/go-routine scheduling during testing and
systematically explore the various interleavings which can happen in the
system. We generally don't have good tool support for such testing. There are
promising tools emerging however; here is a case study of one such tool and
how it was used to reliably reproduce and fix a subtle concurrency bug:
[https://cloudblogs.microsoft.com/opensource/2020/07/14/extre...](https://cloudblogs.microsoft.com/opensource/2020/07/14/extreme-
programming-meets-systematic-testing-using-coyote/)

------
jacktang
try erlang!

------
The_rationalist
But with Kotlin flow it is!

------
apta
Using goroutines in golang is not dissimilar from threads in other languages
(except that you can spawn many more of them), with all the downfalls and
gotchas. As a matter of fact, it's even worse than other languages with proper
concurrency libraries and data structures (e.g. it has nothing remotely close
to Java's `java.util.concurrent` package).

What we see with golang is a phenomenon where people just parrot what some
well known figures said at some point in time, without proper evidence or any
basis (and in some cases, even when the evidence is counter to those claims).

I'm looking forward to Java's green thread implementation (Project Loom), as
it has proper ways to manage cancellation, deadlines and hierarchies, all of
which are quite verbose and error prone, or not supported at all, in golang.

