
Okay. How do you tell the kernel that? Sure, the kernel will have put a guard page or more at the end of the stack, so that if you regularly push onto the stack, you will eventually hit a guard page and things will blow up appropriately.

But what if the length of your variable-length array is, say, gigabytes? You've blown way past the guard pages, and your pointer is now in non-stack kernel land.

You'd have to check the stack pointer all the time to be sure, and that's prohibitive performance-wise. Ironically, x86 kind of had that in hardware back when segmentation was still used.



I think the normal pattern is a stack probe every page or so when there's a sufficiently large allocation. There's no need to check the stack pointer all the time.

But that's not my point. If the compiler/runtime knows it will blow up if you have an allocation over 4KB or so, then it needs to do something to mitigate or reject allocations like that.


> I think the normal pattern is a stack probe every page or so when there's a sufficiently large allocation.

What exactly are you doing there, in kernel code?

> But that's not my point. If the compiler/runtime knows it will blow up if you have an allocation over 4KB or so, then it needs to do something to mitigate or reject allocations like that.

Do what exactly? Just reject stack allocations that are larger than the cluster of guard pages? And keep track of past allocations? A lot of that needs to happen at runtime, since the compiler doesn't know the size with VLAs.

It's not impossible and mitigations exist, but it is pretty "extra". gcc has -fstack-check that (I think) does something there.


> What exactly are you doing there, in kernel code?

In kernel code?

What you're doing is triggering the guard page over and over if the stack is pushing into new territory.

> Do what exactly? Just reject stack allocations that are larger than the cluster of guard pages? And keep book of past allocations? A lot of that needs to happen at runtime, since the compiler doesn't know the size with VLAs.

Just hit the guard pages. You don't need to know the stack size or have any bookkeeping to do that; you just prod a byte every page_size bytes. And you only need to do that for allocations that are very big. In normal code it's just a single not-taken branch for each VLA.


That seems to be what -fstack-check for gcc is doing:

"If neither of the above are true, GCC will generate code to periodically “probe” the stack pointer using the values of the macros defined below."[1]

I guess I'm wondering why this isn't always on if it solves the problem with negligible cost? Genuine question, not trying to make a point.

[1] https://gcc.gnu.org/onlinedocs/gccint/Stack-Checking.html


What I'm finding in a quick search is:

* It should be fast, but I haven't found a benchmark.

* There appear to be some issues with signals hitting at the wrong time vs. angering valgrind, depending on probe timing.

* Probes like this are mandatory on Windows to make sure the stack is allocated, so it can't be that bad.


I'm mostly interested in it for kernel code though, so the second point does not apply, at least not directly. Maybe there is something analogous when preempting kernel threads; I haven't thought it through at all. But interesting.


Because it's on by default in MSVC [0], and we all know that whatever technical decisions MS makes, they're superior to whatever technical decision the GNU people make. /s

Speaking seriously, I too would like an answer.

[0] https://docs.microsoft.com/en-us/windows/win32/devnotes/-win...


One decision Microsoft made was not to support VLAs at all, even after their newfound C love.


An attacker would first trigger a large VLA allocation that puts the stack pointer within a few bytes of the guard page. Then they would just have the kernel put a return address or two on the stack, and that would be enough to cause a page fault. The only way to guard against that would be to check that every CALL instruction has enough stack space, which is infeasible.


But that's the entire point of the guard page, it causes a page fault. That's not corruption.

Denial of service by trying to allocate something too big for the stack is obvious. I'm asking about how corruption is supposed to happen on a reasonable platform.


Perhaps they're trying to guard against introducing easy vulnerabilities on unreasonable platforms. With VLAs unskilled developers can perhaps more easily introduce this problem. It would be a case of bad platforms and bad developers ruining it for the rest.


An attacker could trigger a large VLA allocation that jumps over the guard page, and a write to that allocation. That write would start _below_ the guard page, so damage would be done before the page fault occurs. (Ideally for the attacker, the write wouldn't touch the guard page at all and there would be no page fault, but that's typically harder to arrange, since VLA allocations are usually intended to be fully used.)

Triggering use of the injected code may require another call timed precisely to hit the changed code before the page fault occurs.

Of course, the compiler could and should check for stack allocations that may jump over guard pages and abort the program (or, if in a syscall, the OS) or grow the stack when needed. Also, VLAs aren’t needed for this. If the programmer creates a multi-megabyte local array, this happens, too (and that can happen accidentally, for example when increasing a #define and recompiling)

The lesson is, though, that guard pages alone don’t fully protect against such attacks. The compiler must check total stack space allocated by a function, and, if it can’t determine that that’s under the size of your guard page, insert code to do additional runtime checks.
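One possible shape of such a runtime check, sketched under made-up assumptions (the `STACK_BUDGET` constant and `vla_size_ok` helper are hypothetical; a real implementation would derive the limit from the actual stack bounds and current depth):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-call stack budget, chosen to stay within the
   guarded region; not a real API. */
#define STACK_BUDGET (64 * 1024)

static bool vla_size_ok(size_t n) {
    return n <= STACK_BUDGET;
}

/* Reject oversized requests before the VLA exists, instead of letting
   the stack pointer jump past the guard page. */
int process(size_t n) {
    if (!vla_size_ok(n))
        return -1;        /* fail cleanly */
    char buf[n];          /* VLA, created only after the check passed */
    buf[0] = 0;
    buf[n - 1] = 0;
    return 0;
}
```

The key property is that the check runs before the stack pointer moves, so a hostile size never gets the chance to skip the guard page.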

I don’t see that as a reason to outright ban VLAs, though.


VLAs give the attacker an extra attack vector. The size of the VLA is runtime-determined and potentially controlled by user input. Thus, the only safe way to handle VLAs is to check that there is enough stack space for every VLA allocation. Which may be prohibitively expensive and even impossible on some embedded platforms. Stack overflows may happen for other reasons too, but letting programmers put dynamic allocations on the stack is just asking for trouble.


I don’t think “may be prohibitively expensive and even impossible on some embedded platforms” is a strong argument for not including it in C. There are many other features in C for which that holds, such as recursion, dynamic memory allocation, or even floating point.





