> I think it is because of the way go does the stack which is unusual. Perhaps that is done to facilitate green threads.
Kinda. C's stacks are big (1~8MB default depending on the system IIRC), and while that's mostly vmem Go still doesn't want to pay for a full C stack per goroutine, plus since it has a runtime it can make different assumption and grow the stack dynamically if necessary.
So rather than set up a C stack per goroutine, Go sets up its own stack (initially 8K, reduced to 2K in 1.4) and if it hits a stack overflow it copies the existing stack to a new one (similar to hitting the limit on a vector).
But C can't handle that, it expects enough stacks, and it's got no idea where the stack ends or how to resize it (the underlying platform just faults the program on stack overflow), so you can't just jump to C code from Go code, you need an actual C stack for things to work, and that makes every C call from Go very expensive.
Rust used to do that as well, but decided to leave it behind as it went lower level and fast C interop was more important than builtin green threads.
Erlang does something similar to Go (by default a process has ~2.6K allocated, of which ~1.8K is for the process's heap and stack) but the FFI is more involved (and the base language slower) so you can't just go "I'll just import cgo and call that library" and then everybody dies.
Really? Thats not my understanding, its much smarter than that. Goroutines steal memory from the current pthread machine stack, that is, the machine stack. The problem calling C from a goroutine is that whilst you have a real C stack right there .. other goroutines expect to be able to steal memory from it, and once you call C you don't know how much stack C is going to use. So whilst C is running, goroutines cannot allocate stack memory, which means they cannot call subroutines. The only way to fix that is to give the C function being called its own stack.
The problem you're going to have is that if 10K goroutines all call PCRE you need 10K stacks, because all the calls are (potentially) concurrent.
What makes go work is that the compiler calculates how much local memory a goroutine requires and so after a serialised bump of the stack pointer the routine cannot run out of stack. Serialising the bump between competing goroutines is extremely fast (no locks required). Deallocation is trickier, I think go uses copy collection, i.e. it copies the stack when it runs out of address space on the stack, NOT because its out of memory (the OS can always add to the end), but because the copying compacts the stack by not copying unused blocks. Its a stock standard garbage collection algorithm .. used in a novel way.
The core of Go is very smart. Pity about the rest of the language.
> Really? Thats not my understanding, its much smarter than that. Goroutines steal memory from the current pthread machine stack, that is, the machine stack.
There is no "machine stack", and yes in the details it tries to set up and memoise C stacks, but it still need to switch out the stack and copy a bunch of crap onto there, and that's expensive.
Kinda. C's stacks are big (1~8MB default depending on the system IIRC), and while that's mostly vmem Go still doesn't want to pay for a full C stack per goroutine, plus since it has a runtime it can make different assumption and grow the stack dynamically if necessary.
So rather than set up a C stack per goroutine, Go sets up its own stack (initially 8K, reduced to 2K in 1.4) and if it hits a stack overflow it copies the existing stack to a new one (similar to hitting the limit on a vector).
But C can't handle that, it expects enough stacks, and it's got no idea where the stack ends or how to resize it (the underlying platform just faults the program on stack overflow), so you can't just jump to C code from Go code, you need an actual C stack for things to work, and that makes every C call from Go very expensive.
Rust used to do that as well, but decided to leave it behind as it went lower level and fast C interop was more important than builtin green threads.
Erlang does something similar to Go (by default a process has ~2.6K allocated, of which ~1.8K is for the process's heap and stack) but the FFI is more involved (and the base language slower) so you can't just go "I'll just import cgo and call that library" and then everybody dies.