You need alloca in virtual machines to allocate local variable frames, if you have opted to use the native stack for that. These frames are dynamic; their size comes from some field pulled from a function object. Alloca is also useful in resuming stack-snapshot-based delimited continuations. These objects have to be planted into the stack to work properly, and they also have a variable size: whatever it took to capture them up to their delimiting prompt.
Basically, C function calls are doing the equivalent of alloca all the time. The size of stack needed by a function call is not known in advance. It may be a constant in that particular function, but which function is called depends on run-time tests and indirect calls. (*fptr)(foo) could call one of fifty possible functions, whose stack consumption ranges from 50 bytes to 50,000. That's kind of like calling alloca with a random number between 50 and 50,000. And, as with alloca, there is no detection. C is not robust against the problem of "no more room for the next stack frame"; it is undefined behavior.
The simplified idea that "all alloca is bad" comes from parochial, in-the-box thinking.
(Sure, alloca for, say, temporary storage for catenating strings that come from "read a line of any length" routines, whose inputs are controlled by a remote attacker, is undeniably bad. However, even without using alloca, we can run into the same problem whenever some attacker-controlled input determines the depth of some (non-tail-optimized) recursion.)
Interesting article, but I feel like the conclusion using `STACK_SIZE` doesn't make any sense.
If you are willing to publish that in a public header, you will break existing code whenever your stack struct changes size. Like, I get the point of using `alloca(stack_size())`... but I don't see why `alloca(STACK_SIZE)` is any better than `struct stack my_stack;`.
The value passed to alloca, in this case, will be a reasonably small, bounded integer that can reasonably be expected not to be controllable by a remote attacker when the application is running. (Or so it has to be painstakingly ensured.)
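A minimal sketch of that pattern, assuming a hypothetical opaque API (only `alloca(stack_size())` comes from the article under discussion; `stack_init` and the calling code are made up for illustration):

    #include <alloca.h>
    #include <stddef.h>

    /* Hypothetical opaque API: the struct layout stays private to the library,
       which only exports its current size at run time. */
    size_t stack_size(void);        /* returns sizeof(struct stack) as built */
    void   stack_init(void *mem);   /* constructs a stack object in mem */

    void caller(void)
    {
        /* The size is small, bounded, and fixed per library build, so this
           alloca is not attacker-controlled the way a length read from input
           would be. */
        void *s = alloca(stack_size());
        stack_init(s);
        /* ... use the stack object ... */
    }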
An alternative would be to use padded structs: when the struct is newly introduced, make it substantially larger than necessary. Then all the clients are reserving the room already. A transparent (or not) union between the real struct type and some array can be used, or padding members. Additional requirements may be that the unused space must be zero-initialized by everyone, or that some version field has to be initialized, or whatever.
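A rough sketch of the padded-struct idea, with invented names; the point is just that sizeof stays stable across future versions:

    #include <string.h>

    /* Version 1 of a public struct, deliberately padded so later versions can
       carve real fields out of the reserved space without changing sizeof
       (and hence without changing the ABI). */
    struct widget {
        int  kind;
        int  flags;
        char reserved[56];   /* unused for now; clients must zero it */
    };

    void example(void)
    {
        struct widget w;
        memset(&w, 0, sizeof w);   /* the "must be zero-initialized" rule */
        w.kind = 1;
        /* ... */
    }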
Versioned symbols are another solution. Binary clients that are allocating the smaller, older structure are routed to compatibility functions, at least for those functions where it matters. This approach is seen inside glibc for instance.
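A rough sketch of how that looks with GNU symbol versioning (the names and versions are invented; the shared library also needs a linker version script declaring MYLIB_1.0 and MYLIB_2.0):

    /* Two implementations of the same public name, bound to different ABI
       versions.  Old binaries linked against MYLIB_1.0 keep resolving
       stack_push to the compatibility shim; "@@" marks the default that
       newly linked code binds to. */
    struct stack_v1;   /* the old, smaller layout */
    struct stack_v2;   /* the current layout */

    int stack_push_v1(struct stack_v1 *s, int x) { (void)s; (void)x; return -1; }  /* compat shim */
    int stack_push_v2(struct stack_v2 *s, int x) { (void)s; (void)x; return 0; }   /* current code */

    __asm__(".symver stack_push_v1, stack_push@MYLIB_1.0");
    __asm__(".symver stack_push_v2, stack_push@@MYLIB_2.0");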
An approach found in numerous places in Microsoft's WIN32 is to store the structure's size, as known at compile time to the given client, into a dedicated size field. The API then knows it is called by older compiled clients when the size they are passing is smaller than the current sizeof (that_struct).
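Sketched with a made-up struct (real WIN32 structures do the same thing with a cbSize member), the caller-supplied size tells the library how much of the struct it may legally read:

    #include <stddef.h>

    /* Hypothetical API struct using the WIN32-style size-field convention. */
    struct params {
        size_t struct_size;   /* caller sets this to the sizeof it compiled against */
        int    alpha;         /* existed in v1 */
        int    beta;          /* added in v2 */
    };

    int do_call(const struct params *p)
    {
        int beta = 0;   /* default for old clients that predate beta */

        /* A client compiled against v1 passes a smaller struct_size, so the
           library knows not to read past the fields that caller actually has. */
        if (p->struct_size >= offsetof(struct params, beta) + sizeof p->beta)
            beta = p->beta;

        return p->alpha + beta;
    }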
The idea with an ABI (an Application Binary Interface rather than an API) is that there is no need to re-compile; existing binaries still work since the binary interface stays the same.
Is that still relevant though (or rather, is the cost/benefit ratio of ABI stability still worth it)? My laptop from last year is able to bootstrap and build a full yocto system from gcc to the kernel to glibc to X11 to Qt5 in ~15 hours.
Of course it's still relevant. As long as dynamic loading is relevant, stable ABIs are relevant.
If someone finds a security bug in OpenSSL (Heartbleed, for instance), you only need a single update to the OpenSSL libs to fix every program that dynamically links it. But that only works if you have a stable ABI: otherwise, every single program would have to be recompiled to fix the bug.
Not everyone has a top level computer, not everyone is an expert on rebuilding distributions from scratch, not everything is available as source code to build from, not every OS is a GNU/Linux clone.
> Not everyone has a top level computer, not everyone is an expert on rebuilding distributions from scratch
You aren't, but your distro maintainer should be.
> not everything is available as source code to build from,
and most software not available as source code actually does ship its dependencies, because it would be madness to assume that there will never be any ABI or API break at any point in the future
> not every OS is a GNU/Linux clone
GNU/Linux is actually the exception here - the two other big ones just ship and replace the whole OS on every major update (and I've heard that this is now the case for minor Windows updates too, though I'm not sure how true that is)
That's a very big cost for the end users, who'd have to download the equivalent of an official ISO release for each minor update to some base component. According to apt, thousands of packages depend on openssl.
> That's cute, but how long does it take to push that updated system out to your customers
I have never seen proprietary software that does not ship all its shared libraries along anyway - so, about the same time, I would say? E.g. look at all the Linux games that just come with the entire Ubuntu 12.04 or 14.04 userspace.
The real tragedy of the C language is that VLAs are not implemented safely in any known implementation. They are so natural to use! Any pair of malloc/free in the same scope could be cleanly replaced by a single VLA declaration.
All C compilers I know either have VLA or alloca.
In one of my projects, a bit of macro magic uses either VLA or alloca to generate a temporary array. Works great! I don't have further requirements here.
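A rough idea of what such macro magic can look like (TEMP_ARRAY and the exact feature tests here are my guesses, not the parent's actual code):

    /* Use a VLA where the compiler supports it, fall back to alloca otherwise. */
    #if defined(__STDC_NO_VLA__) || defined(_MSC_VER)
      #include <malloc.h>   /* _alloca on MSVC */
      #define TEMP_ARRAY(type, name, count) \
          type *name = _alloca((count) * sizeof(type))
    #else
      #define TEMP_ARRAY(type, name, count) \
          type name[(count)]
    #endif

    void demo(int n)
    {
        TEMP_ARRAY(double, scratch, n);   /* temporary array of n doubles */
        for (int i = 0; i < n; i++)
            scratch[i] = i * 0.5;
        (void)scratch;
    }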
Please explain what you mean by "safe" in this context.
I mean a way to control for stack overflow. I would like to allocate very large images as VLAs, where the size is only known at runtime. If I malloc too large an image, the system asks for more memory or malloc fails. If I declare a VLA that is too large, it fails silently in an unrelated part of my program due to stack corruption.
Correct implementations use stack probing (access one word for every page in the allocation) to guarantee that a stack overflow happens immediately rather than resulting in stack corruption.
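What probing amounts to, sketched by hand and assuming 4 KiB pages; in practice the compiler emits this itself (e.g. gcc's -fstack-clash-protection), you don't write it:

    #include <alloca.h>
    #include <stddef.h>

    void use_big_buffer(size_t n)
    {
        char *p = alloca(n);

        /* The stack grows downward on common platforms, so touch the pages
           from the highest address (nearest the old stack pointer) toward the
           lowest: the guard page then faults immediately instead of being
           skipped over, turning silent corruption into a clean crash. */
        for (size_t done = 0; done < n; done += 4096)
            p[n - 1 - done] = 0;

        /* ... use p ... */
    }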
The stack is not a concept belonging to the C language definition; it is just an implementation choice made by some compiler vendors (all that I know of). The semantics of VLAs are clear and very useful. I would certainly not expect the memory for a VLA to be implemented on the same stack as the function call stack. My complaint about C programming (what I called its "tragedy") is that no compiler provides a mechanism for safely using all available memory via VLAs.
Edit: you may answer: "use alloca then". But the syntax for alloca is not that convenient, and the semantics are very different. A common use of a VLA is to declare a temporary array inside the body of a loop; this does not work with alloca, whose memory is only released when the function returns (see the sketch below).
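A small sketch of that semantic difference (assuming n > 0): alloca'd memory piles up across iterations, while a VLA is released at the end of each one.

    #include <alloca.h>

    void per_iteration_buffers(int iters, int n)
    {
        for (int i = 0; i < iters; i++) {
            char *a = alloca(n);   /* stays until the function returns:
                                      iters * n bytes pile up            */
            char  b[n];            /* VLA: released when the block ends,
                                      so each iteration reuses the space */
            a[0] = b[0] = 0;
            /* ... */
        }
    }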
I used to work on a (now defunct) RTOS whose C runtime didn't support alloca. Functions were all compiled down to a portable bytecode representation that had to statically declare the stack usage, so there was no way to have a function with variable stack size. Not having a real alloca was occasionally an issue for porting C code to it (but less annoying than not having fork()).
> Your running program's char x[4096] could go spectacularly wrong.
This is a tiny size, and it would never pose a problem. I'm talking about allocating a few high-resolution video frames in a VLA for temporary computation. It is extremely natural to declare this temporary space inside a for loop as
    for (...)
    {
        float frame[3*width*height]; // temporary data
        ...
    }
but due to the "unsafety" of the VLA implementation it is not very portable. To make it safe on my Linux computer I just set "ulimit -s" to a huge value and it works. But it is not very practical to distribute such code to others.
VLAs were added to satisfy the FORTRAN crowd, who are used to efficiently instantiating huge arrays on the stack. AFAIU, you can't catch stack overflow in FORTRAN either.
That doesn't mean that they're not safe, so long as the program reliably terminates[1]. And you can catch stack overflow, it's just not easy, and made especially difficult from a software architecture perspective because neither C nor POSIX supports per-thread signal handlers, so catching stack overflow requires cooperation across components and libraries. (Fortunately, POSIX does provide per-thread signal stacks via sigaltstack(2). And I suppose it should be possible to write a shared library that interposes pthread_create and signal/sigaction to support per-thread signal handlers.)
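For what it's worth, a minimal sketch of the sigaltstack part (process-wide signal disposition, so it does not solve the per-thread-handler problem described above; error checking omitted):

    #include <signal.h>
    #include <unistd.h>

    /* SIGSEGV handler that runs on the alternate stack; only
       async-signal-safe calls belong here. */
    static void on_segv(int sig)
    {
        static const char msg[] = "probable stack overflow\n";
        (void)sig;
        (void)write(2, msg, sizeof msg - 1);
        _exit(1);
    }

    int main(void)
    {
        static char altstack[64 * 1024];   /* comfortably >= SIGSTKSZ */
        stack_t ss = { .ss_sp = altstack, .ss_size = sizeof altstack, .ss_flags = 0 };
        sigaltstack(&ss, NULL);

        struct sigaction sa;
        sa.sa_handler = on_segv;
        sa.sa_flags = SA_ONSTACK;          /* run the handler on altstack */
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);

        /* ... the rest of the program ... */
        return 0;
    }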
TL;DR: VLAs are fine if you're doing something you might otherwise be doing in FORTRAN. Otherwise, just don't. It's trivial to use GNU obstacks or to write your own, as I do. Just don't go down the rabbit hole of trying to solve all allocation needs; it's not a solvable problem.
[1] Sadly, before Stack Clash was published neither GCC nor clang/LLVM did this :(
> Accessing a char value as if it were a different type just isn’t allowed.
Is this still bad if you have the proper alignment (by attribute or implicit placement in a struct), or have it wrapped in a packed struct? There's a lot of code that does this, e.g. overloading the definition of struct sockaddr.
sockaddr is problematic. Semi-recently glibc had to add compiler magic (may_alias, IIRC) to prevent breakage (POSIX says it has to work, so it is up to the implementer to make it work).
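A hedged illustration of the attribute in question (a GCC/Clang extension; glibc applies it to the sockaddr types in its own way): may_alias only lifts the strict-aliasing restriction, it does not fix misalignment, so memcpy remains the fully portable route.

    #include <stdint.h>
    #include <string.h>

    /* Loads through this typedef are allowed to alias objects of any type,
       so the compiler won't optimize on the assumption that they can't. */
    typedef uint32_t __attribute__((__may_alias__)) u32_alias;

    uint32_t first_word_aliasing(const char *buf)
    {
        /* Fine as far as aliasing goes, but still assumes buf is suitably
           aligned for a 4-byte load. */
        return *(const u32_alias *)buf;
    }

    uint32_t first_word_portable(const char *buf)
    {
        /* memcpy sidesteps both the aliasing and the alignment questions. */
        uint32_t v;
        memcpy(&v, buf, sizeof v);
        return v;
    }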