
The Go 1.1 scheduler - thepumpkin1979
http://morsmachine.dk/go-scheduler
======
davidw
Interesting, looks like they've made progress on this:

[https://code.google.com/p/go/issues/detail?id=543](https://code.google.com/p/go/issues/detail?id=543)

Also, from the article:

> Go garbage collector requires that all threads are stopped when running a
> collection and that memory must be in a consistent state.

People often ask about how Go differs from Erlang. That one's a fairly large
difference under the hood. Erlang does GC on a per process (Erlang process,
not OS process) basis.

~~~
krobertson
I hadn't looked at that issue before, but have definitely been bitten by it.

On our main codebase, we experienced some major issues when moving from Go 1.0
to 1.1 from this exact issue. We had a goroutine that was doing some remote
calls wrapped with a timeout, and the call was consistently timing out even
though the remote service was perfectly fine (it was another service on the
same box).

We found the cause was another goroutine running something in a for loop that
didn't do anything that would allow a pause in execution for another
goroutine. So, the scheduler just obsessed on that goroutine, running none of
the others until that one was done, and by then the timeout on the remote call
had expired.

We fixed up that case, but also found a few others where we simply added a very
small sleep call... for no other reason than allowing the scheduler to
evaluate other goroutines. Meh. It made sense when we finally tracked it down,
but was one of those things where we had to pause and ask "really?"... and
adding a sleep call with comments "yes, I am really calling sleep".

~~~
georgemcbay
While it would be nice if the programmer never had to worry about this
situation in the first place, when confronted with something like this I use
runtime.Gosched() rather than a sleep call to yield. It more directly performs
what you're attempting to do and is much more clearly self-documenting for
situations where you really don't need to sleep for whatever period of time
but do need to yield.
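
A minimal sketch of what that looks like, assuming a busy loop that waits on a flag set by another goroutine (`waitForFlag` is a made-up name for illustration). With a single context and the scheduler behavior described in this thread, a loop with no scheduling point could keep the other goroutine from ever running; the explicit yield gives it a chance.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// waitForFlag (hypothetical) spins until another goroutine sets a flag,
// yielding on every iteration, and returns the final flag value.
func waitForFlag() int32 {
	var flag int32
	go func() { atomic.StoreInt32(&flag, 1) }()

	for atomic.LoadInt32(&flag) == 0 {
		// Yield the processor so other goroutines can run; unlike a
		// sleep, this does not ask for any particular delay.
		runtime.Gosched()
	}
	return atomic.LoadInt32(&flag)
}

func main() {
	// Restrict to one context to mimic the situation from the thread.
	runtime.GOMAXPROCS(1)
	fmt.Println("flag:", waitForFlag())
}
```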

~~~
krobertson
I hadn't seen runtime.Gosched() before, will take another look at it.
Mentioned it to a coworker and they already knew of it, so maybe it was
already switched to call it instead of sleep. :)

------
hosay123
"Don't read the design doc, it's too complicated. Instead read this cutpaste!"

The single "lock free" idlep looks like it's just moved the futex contention
elsewhere. This will almost certainly bounce like crazy on a many-cored
system. Would be interested to see benchmarks of the new design before
considering it somehow better.

~~~
pron
> The single "lock free" idlep looks like it's just moved the futex contention
> elsewhere.

Possibly. The Go 1.1 scheduler is inspired by Java's fork/join scheduler,
which suffered from the same problem in Java 7. In Java 8 it's been improved
to no longer have a single wait-list, and external submissions of tasks (i.e.
tasks that are not submitted by tasks running in the thread pool, but
elsewhere) are multiplexed randomly (IIRC) among the individual thread queues.

------
Arnor
Thanks for this. After reading the article I was able to go to the design
document and keep my head above water. Go has a lot going for it, but one
underrated aspect is excellent propagation of information about the language.
The talks by the likes of Rob Pike and the writing here and elsewhere drive
the language in popularity and productivity. We learn the language, true, but
we also learn very strong computer science. No wonder people become more
productive when they switch to Go! By using it and reading about it, they get
better at programming! (I'll grant that this is true of learning new languages
in general, but maintain it's especially true with Go...)

------
bfrog
I've said this before. Go's biggest problem will forever be a blocking GC due
to the global heap they've decided to go with versus a hybrid global/goroutine
local heap style like the BEAM Erlang VM has.

The problem will show up when people try to fire up millions of goroutines and
then wonder "why is my latency suddenly spiking into the seconds! WTF!"

~~~
GhotiFish
That definitely surprised me to learn that Go's garbage collection required
all threads stop. That can't possibly manifest as anything other than a
visible and uncontrollable " _THUNK_ " in your application.

I'd love to see a pros/cons comparison between Go's all-at-once strategy vs
Erlang's per-process strategy.

~~~
pcwalton
The pros and cons are pretty simple.

Go's advantage is that you can share data cheaply while retaining memory
safety [1]. The disadvantage is that you have stop-the-world GC and potential
for data races, so you must rely on the race detector. Erlang's advantage is
that you have no data races and no stop-the-world GC (and Erlang's GC is
easier to implement). The disadvantage is that all messages must be copied and
parallel algorithms that require data sharing are more difficult to write.

There are hybrid approaches like Singularity, JS with transferable data
structures, and Rust (disclaimer: I work on Rust). These systems use some form
of static or dynamic access control scheme (for example, uniqueness or
immutability) to control data races and perform memory management for shared
data structures, while retaining Erlang's thread-local GC.

[1] With one exception, the memory unsafe data race in maps and slices. See:
[http://research.swtch.com/gorace](http://research.swtch.com/gorace)

~~~
pron
Go has gone the Java route (I find both runtimes to be somewhat similar --
well other than the whole JIT thing -- with Java some years ahead in terms of
GC and scheduling), and I suppose that Go, too, will get a concurrent GC
sooner or later, but even the concurrent GC on the JVM has a few stop-the-
world phases.

------
thepumpkin1979
I really wish D could have a scheduler and goroutines like Go does. I think D
has the perfect foundation for goroutines by supporting actors [0] and
fibers[1], they just need to be put together by someone clever on the topic.

[0]
[http://dlang.org/phobos/std_concurrency.html](http://dlang.org/phobos/std_concurrency.html)
[1]
[http://dlang.org/phobos/core_thread.html#.Fiber](http://dlang.org/phobos/core_thread.html#.Fiber)

------
parennoob
I don't understand much about processes and threads, but I do know a bit about
queues. What is the meaning of this?

"Once a context has run a goroutine until a scheduling point, it pops a
goroutine off its runqueue, sets stack and instruction pointer and begins
running the goroutine."

Do they mean: "begins running the _next_ goroutine"?

