I'm assuming that Go has a standard / vanilla / kernel-aware C-style stack living somewhere, right? Then the stack they are referring to is part of their runtime environment... which is actually allocated on the heap somewhere.
This is all so meta. :)
> I'm assuming that Go has a standard / vanilla / kernel aware C-style stack living somewhere right?
Not unless it has to call into C, which would require a C-style stack.
> Then the stack they are referring to is part of their run time environment...which is actually allocated on the heap somewhere.
Yes, as it is in C really. The difference is that the C stack is bounded and "fairly limited" (and highly variable, from 16k to a few MB).
Not really. What people understand by "the C stack" is the stack that comes automatically when a new thread is created. This stack is not on the heap; in fact it's at the top of the user address space, and it has some special properties, like the operating system setting up guard pages for you.
Pthreads allow you to set your own stack for a thread, but it is a seldom-used feature; though it is used by Go when cgo is employed.
// In case of cgo or Solaris, pthread_create will make us a stack.
// Windows and Plan 9 will layout sched stack on OS stack.
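For concreteness, here is a minimal sketch of that seldom-used pthreads feature: allocating a block of memory yourself and handing it to a new thread as its stack via pthread_attr_setstack. Sizes and error handling are kept minimal; compile with -pthread.

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    static void *worker(void *arg) {
        (void)arg;
        printf("running on a caller-supplied stack\n");
        return NULL;
    }

    int main(void) {
        size_t stacksize = 1 << 20;   /* 1 MB; must be at least PTHREAD_STACK_MIN */
        void *stack;
        /* page-aligned block that will serve as the new thread's stack */
        if (posix_memalign(&stack, (size_t)sysconf(_SC_PAGESIZE), stacksize) != 0)
            return 1;

        pthread_attr_t attr;
        pthread_attr_init(&attr);
        pthread_attr_setstack(&attr, stack, stacksize);  /* use our block, not the default */

        pthread_t tid;
        pthread_create(&tid, &attr, worker, NULL);
        pthread_join(tid, NULL);

        pthread_attr_destroy(&attr);
        free(stack);   /* safe to release once the thread has exited */
        return 0;
    }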
Let's take a look at runtime·malg again. runtime·malg
calls go/src/runtime/stack.c:/^runtime·stackalloc which uses
runtime·sysAlloc to allocate memory. Moving back to Linux,
go/src/runtime/mem_linux.c:/^runtime·sysAlloc just calls mmap(2).
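Roughly speaking, that bottoms out in an anonymous, private mmap. Here is a hypothetical sketch of what such an allocation looks like (illustrative names, not the runtime's actual code):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    /* Illustrative only: an mmap-backed allocation roughly in the spirit of
     * runtime·sysAlloc. The real function also does accounting and error
     * handling that is omitted here. */
    static void *stack_alloc(size_t n) {
        void *p = mmap(NULL, n, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        return p == MAP_FAILED ? NULL : p;
    }

    static void stack_free(void *p, size_t n) {
        munmap(p, n);
    }

    int main(void) {
        size_t n = 8192;                /* e.g. an 8K goroutine-sized stack */
        void *stk = stack_alloc(n);
        if (stk == NULL)
            return 1;
        memset(stk, 0, n);              /* the region is ordinary usable memory */
        printf("stack at %p\n", stk);
        stack_free(stk, n);
        return 0;
    }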
So, the answer to "who allocates system stacks" seems to be "it depends", but I already qualified this in my original answer (omitting cgo; sorry about that). Now, is there anything wrong in my description?
If you want to call into C code (it's rather unrealistic not to have any C in any of your library dependencies), then you need to make a C-style stack available, following the conventions of the system (e.g. growing downwards vs upwards, using a guard page for incremental allocation, etc.) along with enough reserve for typical C maximum call stack depth. It needs to be set up before the call to C, but it doesn't need to hang around when C code is not in the chain of current procedure activations.
The requirements that the memory pointed to by the stack pointer must meet for it to be considered a stack are defined by the platform ABI, which in turn is heavily dependent on the CPU architecture. But as long as you meet the rules, you're fine as far as the OS is concerned. Third-party code may be more sniffy.
Some coroutine implementations copy the coroutine stacks back into the main C stack before resuming execution, but usually it's not worth the trouble to do that and you just run everything on the heap.
The question remains how Go's new runtime deals with calling out to third-party libraries that don't obey their new stack protocol.
Why not leave the segmented stacks, and simply refuse to shrink them? When a stack grows, then shrinks back down, you could just leave yourself a note at the very end of the stack about where the next stack segment is, so you won't be reallocating. Plus you could start trimming off the end of the stacks if you shrank some % of the way back down to zero.
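In terms of data structures, that suggestion amounts to something like the following, purely hypothetical, sketch: each segment keeps a "note" pointing at an already-allocated next segment, so growing again after a shrink does not reallocate.

    #include <stddef.h>
    #include <stdlib.h>

    /* Hypothetical segmented-stack bookkeeping for the idea above;
     * not how the Go runtime is actually written. */
    struct segment {
        struct segment *prev;         /* segment we grew out of, if any */
        struct segment *cached_next;  /* the "note": a retained, empty next segment */
        size_t size;                  /* usable bytes in this segment */
        /* ... the segment's stack memory would follow here ... */
    };

    /* Grow: reuse the cached next segment if we shrank out of one earlier,
     * otherwise allocate a fresh segment and remember it. */
    static struct segment *grow(struct segment *cur, size_t size) {
        if (cur->cached_next != NULL)
            return cur->cached_next;            /* no allocation needed */
        struct segment *next = calloc(1, sizeof *next + size);
        if (next == NULL)
            return NULL;
        next->prev = cur;
        next->size = size;
        cur->cached_next = next;                /* leave the note for next time */
        return next;
    }

    /* Shrink: step back to the previous segment but keep the empty one around.
     * Trimming, as suggested above, would walk cached_next and free the tail. */
    static struct segment *shrink(struct segment *cur) {
        return cur->prev;
    }

    int main(void) {
        struct segment base = {0};
        struct segment *s = grow(&base, 4096);  /* allocates once */
        s = shrink(s);                          /* back to base; note retained */
        /* growing again reuses the cached segment instead of reallocating */
        return grow(&base, 4096) == base.cached_next ? 0 : 1;
    }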
I know it's the "wrong language" for you, but after years of working in Perl and occasionally profiling it, it was a shock to switch to Go and profile anything in "nanoseconds". To see even a "microsecond" appear in Perl code is often a bit of a pleasant surprise. To some extent I'm still not used to being concerned that adding 2ns to a code path might have noticeable performance impacts....
Now my question: why didn't they simply keep a pool of available stack chunks to reuse instead of constantly allocating/freeing them?
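A pool like that is essentially just a free list of fixed-size chunks. A hypothetical sketch of the idea (not what the runtime actually does):

    #include <stdlib.h>

    /* Hypothetical free list of fixed-size stack chunks: instead of returning a
     * shrunk stack to the allocator, push it here and pop it on the next growth. */
    enum { CHUNK_SIZE = 8192 };

    struct chunk {
        struct chunk *next;   /* links free chunks; unused while the chunk is live */
    };

    static struct chunk *free_list;

    static void *chunk_get(void) {
        if (free_list != NULL) {       /* reuse a previously released chunk */
            struct chunk *c = free_list;
            free_list = c->next;
            return c;
        }
        return malloc(CHUNK_SIZE);     /* fall back to a fresh allocation */
    }

    static void chunk_put(void *p) {
        struct chunk *c = p;
        c->next = free_list;           /* O(1) push; no call into the allocator */
        free_list = c;
    }

    int main(void) {
        void *a = chunk_get();
        chunk_put(a);                  /* the next chunk_get reuses this block */
        void *b = chunk_get();
        return a == b ? 0 : 1;
    }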
This literally hurts my brain. They are moving pointers in memory for no reason, because they cling to the notion that tightly packed stacks are needed, because they want to support 32-bit processors as first class, because they want to have billions of threads at once.
But every step of that thought process is wrong. You don't want to have billions or even millions of threads at once because then your latency is large and scheduling and order of operations are unpredictable and unstable. You don't want to support 32-bit processors as first-class because even new phones are 64-bit nowadays. You don't want tightly packed stacks because it's a complicated waste to move them and grow them and virtual memory does the same job better.
I just don't get it, there's just so much wasted time and talent wrapped up in this language. I mean they created an entire toolchain just for something that should be a few k of runtime, and all it does in the end is create inefficiencies and problems with interop with everything else.
If there was a pool, then the article is at least misleading on that point because it seems to imply that in loops, you repeatedly allocate and free, which would not be necessary with a pool.
But yes, in the end it comes down to whether pool/linked-list bookkeeping is better or worse than data copying plus pointer bookkeeping.
Allocate and free are perfectly reasonable terms to describe the use of a pool too.
Most importantly, the distinction between virtual-memory stacks and copying stacks is incorrect: if the process uses more stack space than it has physical RAM, it is going to (try to) swap, whether the stacks are virtual-memory based or copied.