
How Stacks Are Handled in Go - jgrahamc
http://blog.cloudflare.com/how-stacks-are-handled-in-go/?
======
pron
In Quasar[1] -- a library that adds true fibers (just like Go's goroutines) to
the JVM -- we've decided to go with copying stacks as well (this is the
current, and original, implementation). Of course, Java doesn't have pointers
into the stack, so there's no need to fix any pointers when copying, and the
implementation can be in pure Java.
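
For contrast, Go code can hold pointers to stack variables, so the Go runtime has to adjust them when it copies a stack. A minimal sketch of what that looks like from user code follows. The names `grow` and `pointerSurvivesGrowth` are made up for illustration, and whether `x` really lives on the stack is up to the compiler's escape analysis, so treat this as a demonstration of the guarantee rather than of the mechanism.

```go
package main

import "fmt"

// grow recurses with a sizeable frame so the goroutine's stack has to be
// enlarged several times on the way down. It returns 0+1+...+depth.
//
//go:noinline
func grow(depth int) int {
	var pad [128]int
	pad[depth%len(pad)] = depth
	if depth == 0 {
		return pad[0]
	}
	return grow(depth-1) + pad[depth%len(pad)]
}

// pointerSurvivesGrowth takes the address of a local, forces deep recursion,
// then writes through the pointer and checks that the local changed.
func pointerSurvivesGrowth() bool {
	x := 42
	p := &x // pointer to a local; stack or heap placement is the compiler's call
	grow(10000)
	*p++
	return x == 43
}

func main() {
	fmt.Println(pointerSurvivesGrowth())
}
```

The program never sees the relocation: if the stack was moved, `p` was updated along with it.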

[1]:
[https://github.com/puniverse/quasar](https://github.com/puniverse/quasar)

~~~
smegel
Interesting. Is it like Python's Gevent in that it patches otherwise blocking
calls (say socket.recv) to return to the Quasar runtime so it can switch
fibers?

~~~
pron
It does use bytecode injection, but it doesn't automagically intercept calls.
Quasar's users use fiber-compatible implementations of existing APIs (so the
code is the same, but you have to explicitly call a fiber compatible method).
Those implementations aren't built from the ground up, but thin wrappers
around the original library. Usually, the library's asynchronous API is used
under the covers to implement its synchronous API in a way that's fiber-
rather than thread-blocking.
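
The wrapper idea can be sketched in Go terms, where goroutines play the role of fibers. This is only an analogy and not Quasar's API: `fetchAsync`, `fetch`, and `result` are hypothetical names, and the sleep stands in for real I/O.

```go
package main

import (
	"fmt"
	"time"
)

// fetchAsync is a hypothetical callback-style API: it returns at once and
// invokes done later from another goroutine.
func fetchAsync(key string, done func(value string, err error)) {
	go func() {
		time.Sleep(10 * time.Millisecond) // stand-in for I/O latency
		done("value-for-"+key, nil)
	}()
}

// result carries the callback's arguments back to the waiting caller.
type result struct {
	value string
	err   error
}

// fetch exposes the same operation as a synchronous call. Only the calling
// goroutine is parked on the channel receive; the OS thread underneath is
// free to run other goroutines in the meantime.
func fetch(key string) (string, error) {
	ch := make(chan result, 1) // buffered so the callback never blocks
	fetchAsync(key, func(v string, err error) {
		ch <- result{value: v, err: err}
	})
	r := <-ch
	return r.value, r.err
}

func main() {
	v, err := fetch("a")
	fmt.Println(v, err)
}
```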

------
mox1
This may be kind of a dumb question, but does Go use the stdlib C stack or are
they talking about their own stack?

I'm assuming that Go has a standard / vanilla / kernel aware C-style stack
living somewhere right? Then the stack they are referring to is part of their
run time environment...which is actually allocated on the heap somewhere.

This is all so meta. :)

~~~
masklinn
> This may be kind of a dumb question, but does Go use the stdlib C stack or
> are they talking about their own stack?

Their own.

> I'm assuming that Go has a standard / vanilla / kernel aware C-style stack
> living somewhere right?

Not unless it has to call into C, which would require a C-style stack.

> Then the stack they are referring to is part of their run time
> environment...which is actually allocated on the heap somewhere.

Yes, as it is in C really. The difference is that the C stack is bounded and
"fairly limited" (and highly variable, from 16k to a few MB[0]).

[0] [http://lists.gnu.org/archive/html/bug-coreutils/2009-10/msg0...](http://lists.gnu.org/archive/html/bug-coreutils/2009-10/msg00262.html)

~~~
4ad
> Yes, as it is in C really.

Not really. What people understand by "the C stack" is the stack that
automatically comes when new threads are created. This stack is not on the
heap; in fact it's at the top of the user address space and it has some
special properties, like the operating system setting guard pages for you.

Pthreads allow you to set your own stack for a thread, but it is a feature
seldom used, though it is used by Go when cgo is employed.

~~~
rsc
Go does not choose the thread stack when using pthreads. Pthreads does.

~~~
4ad
You're right, I was confused for a moment. Go doesn't do this. It is possible
though, see pthread_attr_setstackaddr.

------
jerf
So first let me say I assume there's a good answer to this. I'm asking to find
out what it is, not as any sort of criticism.

Why not leave the segmented stacks, and simply refuse to shrink them? When a
stack grows, then shrinks back down, you could just leave yourself a note at
the very end of the stack about where the next stack segment is, so you won't
be reallocating. Plus you could start trimming off the end of the stacks if
you shrank some % of the way back down to zero.

~~~
pcwalton
It's not just the allocation—just the cost of switching between stack segments
is really expensive. Function calls are 2 ns on most architectures; any
overhead can easily make that 5x or 10x slower.

~~~
jerf
Thanks.

I know it's the "wrong language" for you, but after years of working in Perl
and occasionally profiling it, it was a shock to switch to Go and profile
_anything_ in "nanoseconds". To see even a "microsecond" appear in Perl code
is often a bit of a pleasant surprise. To some extent I'm still not used to
being concerned that adding 2ns to a code path might have noticeable
performance impacts....

------
norswap
tl;dr The stack is grown in chunks. Previously, Go maintained a linked list of
chunks, but repeated allocation/deallocation of chunks within a loop was bad for
performance. They switched to a "realloc-type" of stack growth (allocate a
bigger region and copy the old stack over).
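
A toy model of the "allocate bigger and copy" step, including the pointer fix-up that copying implies. Everything here is an assumption for illustration: the `slot` and `stack` types, the `isPtr` flag (a real runtime would need compiler-provided pointer maps), and the bump allocator that hands out simulated addresses. It is not the Go runtime's code.

```go
package main

import "fmt"

// slot is one stack word. isPtr marks words that hold a simulated address.
type slot struct {
	val   int
	isPtr bool
}

// stack is a toy contiguous stack; base is the simulated address of mem[0].
type stack struct {
	base int
	mem  []slot
	sp   int // number of slots in use
}

var nextAddr = 1000 // bump allocator for simulated addresses

func alloc(n int) int {
	a := nextAddr
	nextAddr += n
	return a
}

func newStack(size int) *stack {
	return &stack{base: alloc(size), mem: make([]slot, size)}
}

// grow allocates a region twice as large, copies the live slots, and shifts
// every pointer that pointed into the old region by the relocation distance.
func (s *stack) grow() {
	oldBase, oldLen := s.base, len(s.mem)
	bigger := make([]slot, 2*oldLen)
	copy(bigger, s.mem[:s.sp])
	newBase := alloc(len(bigger))
	delta := newBase - oldBase
	for i := 0; i < s.sp; i++ {
		if bigger[i].isPtr && bigger[i].val >= oldBase && bigger[i].val < oldBase+oldLen {
			bigger[i].val += delta
		}
	}
	s.base, s.mem = newBase, bigger
}

// push stores a word, growing first if the stack is full, and returns the
// simulated address of the stored word.
func (s *stack) push(v slot) int {
	if s.sp == len(s.mem) {
		s.grow()
	}
	s.mem[s.sp] = v
	s.sp++
	return s.base + s.sp - 1
}

// load reads the word at a simulated address.
func (s *stack) load(addr int) slot { return s.mem[addr-s.base] }

func main() {
	s := newStack(2)
	a := s.push(slot{val: 7})
	s.push(slot{val: a, isPtr: true}) // slot 1 points at slot 0
	s.push(slot{val: 9})              // stack is full, so this triggers grow
	p := s.mem[1]
	fmt.Println(p.val != a, s.load(p.val).val) // pointer moved, still reads 7
}
```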

Now my question: why didn't they simply keep a pool of available stack chunks
to reuse instead of constantly allocating/freeing them?

~~~
rsc
There is (was) a pool. It's not the alloc/free that is the problem. It's all
the bookkeeping that is pure overhead compared to a simple CALL or RET
instruction.
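
To make the per-call overhead concrete, here is a rough event-counting sketch of a loop whose call happens to straddle a segment boundary. The `segmented` and `contiguous` types are invented for this comparison and only count events; they say nothing about actual timings in the Go runtime.

```go
package main

import "fmt"

// segmented models a segmented stack: every segSize frames, a call crosses
// into the next segment and the matching return crosses back.
type segmented struct {
	segSize  int
	depth    int
	switches int // segment crossings, each of which needs bookkeeping
}

func (s *segmented) call() {
	if s.depth > 0 && s.depth%s.segSize == 0 {
		s.switches++ // current segment is full: move to the next one
	}
	s.depth++
}

func (s *segmented) ret() {
	s.depth--
	if s.depth > 0 && s.depth%s.segSize == 0 {
		s.switches++ // popped the only frame in a segment: move back
	}
}

// contiguous models a copying stack that doubles when it runs out of room.
type contiguous struct {
	capacity int
	depth    int
	copies   int
}

func (c *contiguous) call() {
	if c.depth == c.capacity {
		c.capacity *= 2
		c.copies++
	}
	c.depth++
}

func (c *contiguous) ret() { c.depth-- }

// run fills the first 4 frames, then calls and returns n times right at the
// boundary, and reports how often each model had to do extra work.
func run(n int) (switches, copies int) {
	seg := &segmented{segSize: 4}
	con := &contiguous{capacity: 4}
	for i := 0; i < 4; i++ {
		seg.call()
		con.call()
	}
	for i := 0; i < n; i++ {
		seg.call()
		seg.ret()
		con.call()
		con.ret()
	}
	return seg.switches, con.copies
}

func main() {
	switches, copies := run(1000)
	fmt.Println(switches, copies)
}
```

In this model the segmented stack pays on every iteration even if segments come from a pool, while the copying stack pays once and then runs plain calls and returns.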

~~~
norswap
Am I missing something or is it strange to speak about "a single CALL
instruction"? It implies you'll have to execute some function, so it can't be
really considered as a single instruction.

If there was a pool, then the article is at least misleading on that point
because it seems to imply that in loops, you repeatedly allocate and free,
which would not be necessary with a pool.

But yes, in the end it comes down to whether pool/linked-list bookkeeping
is better or worse than data copy + pointer bookkeeping.

~~~
rsc
A single CALL instruction on the x86 does a very limited amount: it pushes the
address of the next instruction onto the stack, and then it sets the
instruction pointer to the thing that was called. What happens next is up to
the function and not part of the CALL instruction.
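
As a rough illustration of how little that is, here is a toy model of CALL and RET. The `machine` type is hypothetical and ignores everything else a CPU does; the 5-byte length used in `main` is the size of a near `CALL rel32` on x86.

```go
package main

import "fmt"

// machine is a toy CPU with only an instruction pointer and a stack of
// return addresses.
type machine struct {
	ip    int
	stack []int
}

// call mimics CALL for an instruction instrLen bytes long: push the address
// of the next instruction, then jump to target.
func (m *machine) call(target, instrLen int) {
	m.stack = append(m.stack, m.ip+instrLen)
	m.ip = target
}

// ret mimics RET: pop the saved return address into the instruction pointer.
func (m *machine) ret() {
	n := len(m.stack) - 1
	m.ip = m.stack[n]
	m.stack = m.stack[:n]
}

func main() {
	m := &machine{ip: 0x100}
	m.call(0x400, 5)
	fmt.Printf("after call: ip=%#x\n", m.ip)
	m.ret()
	fmt.Printf("after ret:  ip=%#x\n", m.ip)
}
```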

Allocate and free are perfectly reasonable terms to describe the use of a pool
too.

------
justinsb
This article is not technically accurate.

Most importantly, the distinction between virtual-memory stacks and copying
stacks is incorrect: if the process uses more stack space than it has physical
RAM, it is going to (try to) swap, whether virtual or copying.

~~~
known
I believe the process will crash.

------
haberman
I think Rust also moved away from segmented stacks? Seems like a pretty strong
signal that segmented stacks aren't the way to go when everyone is moving away
from them.

[https://mail.mozilla.org/pipermail/rust-dev/2013-November/00...](https://mail.mozilla.org/pipermail/rust-dev/2013-November/006314.html)

------
ufo
Moving the coroutine stack when it's too big is also how Lua does it, isn't it?

------
cbsmith
So, they're handled basically how most coroutine runtimes handle them...

------
olegp
Would be great to see this added to fibers in Node.js.

