
Proposal: Non-cooperative goroutine preemption - mseepgood
https://github.com/golang/proposal/blob/master/design/24543-non-cooperative-preemption.md
======
tinco
The way I understand this, and that I think some other people in this thread
misunderstand, is that this is not a change to the Go language in any way.

The way Go works is that you can spawn green threads, and the Go runtime will
magically run them concurrently, in such a way that you do not have to worry
about many problems usually associated with concurrent programming.

This proposal seeks to modify the way the "magically" part works. Right now,
it suggests that what Go does is to 'cooperatively' preempt at function
prologues. This change would instead allow the runtime to preempt at a choice
of "safe points" the compiler suggests throughout your Go code.

Note that this is a little bit in the gray area between cooperative and non-
cooperative. It's not like the runtime will just randomly preempt, it still
requires cooperation from the code to have these safe points. But maybe I'm
misunderstanding the details.

Anyway, not much changes for the Go programmer. The Go programmer can expect
his code to be a little more predictable, at the cost of having his code be a
little less performant. This performance cost, I think, will not be
significant, if not fully compensated by the increased parallelism. All the
promises the Go runtime makes towards the programmer will still hold.

~~~
masklinn
> This proposal seeks to modify the way the "magically" part works. Right now,
> it suggests that what Go does is to 'cooperatively' preempt at function
> prologues. This change would instead allow the runtime to preempt at a
> choice of "safe points" the compiler suggests throughout your Go code.

That's how it already works; it's just that safe points only get introduced at
function prologues. This is an issue because CPU-heavy functions without any
non-inlined function calls contain very few safe points, which is problematic
for GC and timely scheduling and may even lock up the entire system.

Previous attempts were made to add loop preemption but there are issues with
that approach.

The approach in TFA is to make safe points "opt-out", all Go code would be
considered safe by default with some "unsafe" regions being excluded. As the
author states, this would make Go preemptive rather than cooperative ("I
propose that we implement fully non-cooperative preemption").

> Anyway, not much changes for the Go programmer. The Go programmer can expect
> his code to be a little more predictable

Go programmers can expect their code to be a little _less_ predictable, not
more: currently your code is guaranteed to run uninterrupted between two
function calls, that will not be the case anymore.

In theory it should not make any difference; in practice it will probably
uncover concurrency bugs you're currently shielded from by the runtime's
behaviour (though likely not that many, as you can already have multiple
goroutines running in parallel).

~~~
skj
Not just function prologues, but also channel operations. Might sound like a
minor nit but it's a pretty important nit.

~~~
masklinn
Fair enough, though I'd assume channel operations to be considered IO, and I
would very much expect them to be places which always yield.

------
Cieplak
Interesting old thread comparing actors to software transactional memory
(STM):

[https://www.reddit.com/r/haskell/comments/175tjj/stm_vs_acto...](https://www.reddit.com/r/haskell/comments/175tjj/stm_vs_actors/)

Seems like pre-emptive actor-based concurrency is simple to implement using
STM, but much harder to implement STM with actors.

------
anonacct37
I'm really excited about this both for the improved long tail latency as well
as (what I believe is) a new way of handling this problem.

I'm not aware of another M:N threading model that handles preemption this way
with signals. Erlang accomplishes it with reduction budgets and reduced
performance.

------
PDoyle
Pre-emptive runtimes are hard. Inserting yield points judiciously in loops
isn't that incredibly difficult, and neither is reducing the cost of a yield
point. It's far harder to make sure every machine instruction in every part of
the runtime is prepared to be interrupted and have all its assumptions violated
by the time it resumes.

~~~
bradfitz
> Inserting yield points judiciously in loops isn't that incredibly difficult,
> and neither is reducing the cost of a yield point.

The Go team has done both over the past few years but we haven't been happy
enough with the performance. Hence Austin's proposal.

------
stefanchrobot
I'd love to see this implemented, especially since I already enjoy this feature
a lot in Elixir/Erlang/BEAM.

~~~
dmytrish
The problem with that approach in Go is that BEAM embraces the shared-nothing
approach, and the Go runtime is quite promiscuous about sharing mutable data.
I am afraid that introducing non-deterministic preemptive stops will just make
Go concurrency a mess similar to the mutex/conditionals mess they've been
trying to avoid.

~~~
jerf
"I am afraid that introducing non-deterministic preemptive stops will just
make Go concurrency a mess similar to the mutex/conditionals mess they've been
trying to avoid."

They already took a few steps inside of Go to try to _ensure_ that concurrency
is quite non-deterministic. Even very simple, naive code like

    
    
        package main

        import (
            "fmt"
            "time"
        )

        func main() {
            for i := 0; i < 2; i++ {
                go func(j int) {
                    fmt.Println("First", j)
                    fmt.Println("Second", j)
                }(i)
            }
            time.Sleep(100 * time.Millisecond)
        }
    

will very frequently produce different results; I just ran that four times and
got four different results. This is good, because it tends to bring bugs to
the forefront more quickly, rather than having an almost-but-not-quite-
deterministic model as has been the case in many other environments.

(Yes, I know sleep is not production-quality here, just trying to keep it
simpler than several more lines for a proper waitgroup usage.)

I doubt there's a lot of code written at the Go level that would explode if
you suddenly ran it with a pre-emptive runtime that isn't already exploding.
As the proposal discusses, there are a lot of issues down at the runtime level,
but at the layer of abstraction presented by the language itself I wouldn't
expect a lot of problems. Non-zero, and even then, it would still be things
that are technically already bugs. And Go code is already running
_concurrently_ all the time in the real world. I'm not even sure how I'd
construct an example that runs correctly in the current runtime but fails with
true pre-emption.

~~~
lloeki
> They already took a few steps inside of Go to try to ensure that concurrency
> is quite non-deterministic

I would not be surprised, as they already proactively ensure that things like
map ordering are randomised so that people just can't rely on some
deterministic ordering: run the example code here[0] multiple times and it
produces not just a seemingly random result (as would be the case for a typical
hash table) but a truly different result _for each run_.

[0]: [https://nathanleclaire.com/blog/2014/04/27/a-surprising-feature-of-golang-that-colored-me-impressed/](https://nathanleclaire.com/blog/2014/04/27/a-surprising-feature-of-golang-that-colored-me-impressed/)

~~~
kardianos
Yes, map iteration is explicitly randomized for exactly this reason.

------
nostalgeek
Interesting. Is the current situation one of the reasons the Go debugging
experience is so poor? The Delve debugger is constantly crashing for me, making
step debugging extremely hard, especially when debugging tests. If there is
one thing about Go that is inferior to Java and C#, it's definitely the
debugging experience.

~~~
anonacct37
I've actually never run into a problem debugging go binaries with gdb. Which
is weird because everyone's always told me it doesn't work. I'm sure there are
edge cases but over the last 5 years or so my experience has been pretty good.

emacs + gud mode is a great experience for go debugging. I've even used it to
debug low-level issues like plugin loading and verification.

~~~
isaachier
On Mac, I find lldb terribly low level. IIRC gdb on Linux isn't great either.
Even Delve will frequently have missing/optimized-out variables that cannot be
printed.

~~~
yorwba
I frequently hit the problem with variables that are <optimized out> when
debugging C code using gdb, and it's especially infuriating when I can look at
the disassembly and see that the variable is _right there_ in the register
file.

Does anyone know whether DWARF simply doesn't support the necessary debug
information, or whether compilers just don't retain it through the optimization
process?

~~~
twoodfin
The latter. With quality register allocation, the same register can be used
for multiple variables (or unnamed intermediate results) within the span of a
few instructions. The inverse is also true: the same variable could find its
value in multiple registers.

DWARF appears to support mapping symbol locations to registers, but I’ve never
seen a language environment/debugger that takes advantage of it for optimized
binaries.

------
krylon
Can someone make an informed guess on how much that would complicate the
implementation? And/or how much of a performance impact it would have (for
better or for worse)?

------
arghwhat
So, preemptive green-threads? This seems like a counter-productive and highly
complicated re-implementation of what the OS thread model does, but inside the
Go runtime.

I would very much like for goroutines to remain cooperative. It is much
simpler, naturally more performant, and quite easy to reason about. A local
implementation of preemptive execution seems like a very high price to pay to
only handle a few corner-cases.

That is not to say that preemptive scheduling does not have its place: it makes
a _lot_ of sense at the OS level, where independent processes are scheduled,
but little sense as an internal green-thread implementation.

~~~
mseepgood
> So, preemptive green-threads? This seems like a counter-productive and
> highly complicated re-implementation of what the OS thread model does, but
> inside the Go runtime.

You can't easily spawn 10,000 OS threads.

> I would very much like for goroutines to remain cooperative. It is much
> simpler, naturally more performant, and quite easy to reason about.

Cooperative is less simple to reason about, as the proposal explains.
Preemption fully delivers what the developer already expects of goroutines.

~~~
vidarh
> You can't easily spawn 10,000 OS threads.

I just spawned 10,000 OS threads, each running a separate mRuby instance, on my
laptop to see if it'd cause any problems (because I happened to have an app
sitting around that spawns separate threads; note: these are not Ruby threads,
mRuby does not have built-in threading). None at all. So yes, you can. I didn't
check how much memory it took, however, or attempt to measure overheads in any
way, so I'm not saying there aren't potential issues with it.

~~~
jashmatthews
Simply using 1,000+ OS threads is often very close to as efficient as using
coroutines or an event loop: [https://www.slideshare.net/e456/tyma-paulmultithreaded1](https://www.slideshare.net/e456/tyma-paulmultithreaded1)

~~~
zzzcpan
If your event loop is as slow as 1000+ coroutines or OS threads, you are doing
it very very wrong.

~~~
arghwhat
[Citation needed]

~~~
jashmatthews
[https://github.com/golang/go/issues/12061](https://github.com/golang/go/issues/12061)

"Mark termination does not have to rescan stacks that haven’t executed since
the last scan"

With pre-emptive scheduling of OS threads, you potentially dirty a ton more
stacks, and have to scan them all for pointers to heap memory. I'm not sure
how VMs using lots of OS threads deal with this.

------
alex7o
They should just make it possible to add safe points manually.

~~~
icholy
isn't that what runtime.Gosched does?

