You learn fairly early on that the call stack is small and everything on it must have its size known ahead of time by the compiler. Anything large and/or dynamically-sized goes on the heap. This is just the way it is.
But I was thinking about it: it's all just memory, so why can't the stack expand to fit the data (assuming that data doesn't need to be returned outside of the current stack frame)? And, even stranger, why can't more memory be requested for dynamically-sized values as needed? Why does everything on the stack need to have a statically-known size?
And then I found this C function that seems to do just this (but is very rarely used): https://linux.die.net/man/3/alloca
The best explanation I could gather from reading different discussions on the internet is:
1) You don't put large amounts of data on the stack because the stack is small
2) The stack is small because address space on 32-bit systems was really limited, so you didn't want to eat it up by reserving stack space for every thread
3) Stack frame sizes are known at compile time because... that's the way C did it? Maybe it allows for some compiler optimizations? Or something?
Am I missing something, or is this a vestigial idea? Wouldn't cache usage be way better if large/dynamic data structures could go on the stack? Has anyone explored this?
Take Linux, a giant user of the C language, as an example. Allocating a large chunk of memory on the stack is just not useful, even given an unlimited amount of memory. You note the most critical reason yourself in the post: "(assuming that data doesn't need to be returned outside of the current stack frame)". However, large data structures are almost always for others to use. Just consider sys_clone (where the kernel eventually builds the task structure for the new thread) or sys_mmap (where the kernel manipulates virtual memory area structures that live elsewhere). Allocating them on the stack would be pointless.