
Abandoning segmented stacks in Rust - dbaupp
https://mail.mozilla.org/pipermail/rust-dev/2013-November/006314.html
======
cespare
As noted in the link, Go is also moving to contiguous stacks[1]. This is
supposed to be implemented in 1.3.

[1]
[https://docs.google.com/document/d/1wAaf1rYoM4S4gtnPh0zOlGzW...](https://docs.google.com/document/d/1wAaf1rYoM4S4gtnPh0zOlGzWtrZFQ5suE8qr2sD8uWQ/pub)

~~~
afhof
I thought the segmented stack was the key to having hundreds of thousands of
userland threads. It seems pretty farfetched (though not impossible) to
realloc the entire thread stack when it gets too big.

~~~
qznc
I think Go is primarily designed for servers and servers are overwhelmingly
64bit these days. In a 64bit virtual address space, you can have millions of
contiguous stacks each of gigabytes size.

~~~
tomp
Not really. Assuming a 4KB page size, and given that on most systems a page
is the minimal amount of memory that can be committed, you will exhaust all
4GB of your server's physical RAM with only about 1,100,000 threads.
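The arithmetic behind that figure can be sketched as a quick back-of-the-envelope check (assuming, as the comment does, one committed 4KB page per thread stack):

```go
package main

import "fmt"

func main() {
	const pageSize = 4 << 10  // 4 KB: the minimum commit granularity assumed above
	const threads = 1_100_000 // the thread count from the comment
	committed := pageSize * threads
	fmt.Printf("%.1f GB committed\n", float64(committed)/(1<<30))
	// → about 4.2 GB, i.e. all of a 4 GB machine (and then some)
}
```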

~~~
eliben
I don't think many real servers have 4 GB these days. The dev box under my
desk has 32. Linux servers often have 1 TB of physical memory. So based on
your calculation that's 256 million threads. Sounds enough :)

~~~
ithkuil
The whole point of cheap concurrency like goroutines is the ability to avoid
callback-based async IO (i.e. the Go standard libraries perform
select/epoll/... under the hood, but your code behaves as if it were blocking
on IO).

You might have better uses for your memory (caching, in-memory databases)
than wasting it on even a fraction of a million routines that are just
waiting for something to happen.

------
axaxs
I'm a big fan of Mozilla and the Go team, but do recognize this is something
Dlang's Walter Bright criticized from the beginning. As I assume all parties
know much more than I do, I wonder why both languages chose this route...

~~~
Ygg2
I'd rather someone try segmented stacks and determine they aren't a good fit
than just discard them based on something someone said. This way future
researchers have more data points, whereas dismissing them out of hand leaves
a rather big question mark.

~~~
axaxs
Well, I agree I'd rather someone try it for another data point. However, this
isn't just a random internet comment; it's from a man who has probably
written more compiler code than almost anyone in recent times.

~~~
Ygg2
That probably is true, but lots of great men were wrong about things in their
own domain - see Einstein and the cosmological constant.

------
jpollock
The hot split problem is an example of a poor grow/shrink algorithm, and the
contiguous stack would exhibit the same behaviour if the growth and shrink
thresholds are the same.

How about avoiding the hot split by changing the deallocation algorithm to
allow empty blocks, and seeing what happens?
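The suggestion amounts to hysteresis: shrink at a lower threshold than you grow, so a call/return loop sitting exactly on a segment boundary stops triggering an alloc/free pair per iteration. A toy model of the bookkeeping (my own sketch; `segSize` and the counts are illustrative, not from any real runtime):

```go
package main

import "fmt"

// stack models segment accounting only: depth and capacity in frames.
// It grows when depth exceeds cap, and shrinks when depth falls to
// cap - shrinkSlack. shrinkSlack == segSize reproduces the hot split;
// a larger slack keeps one empty segment around, as suggested above.
const segSize = 4

type stack struct {
	depth, cap, allocs int
	shrinkSlack        int
}

func (s *stack) push() {
	s.depth++
	if s.depth > s.cap {
		s.cap += segSize // allocate a new segment
		s.allocs++
	}
}

func (s *stack) pop() {
	s.depth--
	if s.cap > segSize && s.depth <= s.cap-s.shrinkSlack {
		s.cap -= segSize // free the top segment
	}
}

// churn parks the stack exactly on a segment boundary, then runs a hot
// call/return loop across it and reports how many allocations resulted.
func churn(slack int) int {
	s := &stack{cap: segSize, shrinkSlack: slack}
	for i := 0; i < segSize; i++ {
		s.push()
	}
	s.allocs = 0
	for i := 0; i < 1000; i++ {
		s.push()
		s.pop()
	}
	return s.allocs
}

func main() {
	fmt.Println("eager shrink:", churn(segSize))   // thrashes: 1000 allocations
	fmt.Println("hysteresis:  ", churn(2*segSize)) // allocates once
}
```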

~~~
tomp
> The hot split problem is an example of a poor grow/shrink algorithm

Not really. With segmented stacks, when you hit the end of a stack, you must
(1) allocate a new segment to extend the stack and (2) copy the current
function frame to the new segment and resume execution. Then, when the
function returns, you have to (3) deallocate the new segment and shrink the
stack, and (4) resume execution of the last function frame on the previous
stack segment.

Now, steps (1) and (3) are optional, but steps (2) and (4) are not, and incur
a significant overhead (compared with non-segmented, contiguous stack, e.g. C
stack).
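The four steps above can be sketched as a toy model (my own illustration, with segments as heap slices and a one-frame copy per crossing; real implementations differ in detail), which makes the per-crossing costs countable:

```go
package main

import "fmt"

// A toy model of one segmented-stack boundary crossing, following the four
// steps enumerated above.
const segSize = 8

type segment struct {
	data []int
	prev *segment
}

var mallocs, frameCopies int

// grow performs steps (1) and (2): allocate a segment, copy the frame over.
func grow(cur *segment, frame []int) *segment {
	next := &segment{data: make([]int, 0, segSize), prev: cur} // (1) allocate
	mallocs++
	next.data = append(next.data, frame...) // (2) copy the current frame
	frameCopies++
	return next
}

// shrink performs steps (3) and (4): drop the segment (reclaimed by GC
// here) and resume on the previous one.
func shrink(cur *segment) *segment {
	return cur.prev
}

func main() {
	base := &segment{data: make([]int, 0, segSize)}
	top := base
	for i := 0; i < 100; i++ { // a hot loop calling across the boundary
		top = grow(top, []int{i})
		top = shrink(top)
	}
	fmt.Printf("mallocs=%d frameCopies=%d for 100 crossings\n",
		mallocs, frameCopies)
}
```

Even if the allocations in (1) and (3) are cached away, the frame copy in (2) and the return path in (4) remain on every crossing, which is tomp's point.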

~~~
Daniel_Newby
You could add a step (5): recognize that you have a hot split, and amend step
(2) to copy all of the hot frames to the new segment.

Their point is that you have done something wrong if you have to dynamically
turn on profiling to keep the stack from catching on fire.

~~~
tomp
That's a possible solution, but it requires that the stack be relocatable
(i.e. that you can find and rewrite all pointers to data on the stack), which
is a significant complication for a segmented-stacks implementation. If you
have a relocatable stack, you might as well use the copy-the-whole-stack
strategy with contiguous stacks.

~~~
simonmar
Segmented stacks are better even if you have a relocatable stack. GHC switched
from monolithic copy-the-whole-thing-to-grow-it to segmented stacks a while
ago, and it was a big win. Not just because we waste less space, but because
the GC knows when individual stack segments are dirty so that it doesn't need
to traverse the whole stack. To avoid the thrashing issue we copy the top 1KB
of the previous stack chunk into the new stack chunk when we allocate a new
chunk.
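That mitigation can be sketched as follows (my own illustration of the idea; the real GHC code works on raw stack words, and `overlap` stands in for the "top 1KB"):

```go
package main

import "fmt"

// newChunk allocates a fresh stack chunk and copies the top `overlap`
// entries of the old chunk into it, so a short run of returns after the
// call that triggered the growth stays inside the new chunk instead of
// bouncing back across the boundary.
const chunkSize = 16
const overlap = 4 // stands in for GHC's "top 1KB"

func newChunk(old []int) []int {
	chunk := make([]int, 0, chunkSize)
	if n := len(old); n > 0 {
		start := n - overlap
		if start < 0 {
			start = 0 // old chunk smaller than the overlap: copy it all
		}
		chunk = append(chunk, old[start:]...) // copy the hot frames over
	}
	return chunk
}

func main() {
	old := []int{1, 2, 3, 4, 5, 6, 7, 8}
	fmt.Println(newChunk(old)) // the top 4 entries travel with the new chunk
}
```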

------
akavel
_" [...] I expect that on 64-bit platforms the number of concurrent tasks will
be comparable to using segmented stacks. On 32-bit platforms, with their
limited address space, the situation will not be as good, but this is a
calculated risk that we can live without the same amount of concurrency
there."_

I thought Rust was meant to aim for a "C replacement", but doesn't the above
quote raise some concerns for ARM and more embedded targets (e.g. AVRs, ...)?
Or maybe, given multithreading etc., my hopes were misguided from the
beginning? I'm confused now.

~~~
dbaupp
Embedded targets normally don't want segmented stacks anyway, and running
hundreds of thousands of threads/tasks (which is what segmented stacks allow)
is unlikely to be particularly useful(?).

In any case, this ML post is basically saying "Rust is reverting to C-like
stack behaviour", with some checks to handle stack overflow safely (i.e.
preserve memory safety & security); the current implementation just aborts
the whole process on stack overflow.

------
mattgreenrocks
Disclaimer: not a language designer/implementer (yet).

Couldn't segmented stacks for async/await-style IO be simulated by swapping
the stack out at every await? I mean copying out the registers and the raw
stack to some heap location, then restoring them by reversing the process
later.

(One problem I see is that it'd be slow when lots of IO operations are
involved.)

~~~
pcwalton
Doesn't work if threads can have pointers into each other's stacks (which we'd
like to support someday for fork/join parallelism).

