"I just want to publicly thank Qualys for working with the Open Source community so we (Linux and BSD) could all get this fixed properly. There was a lot of work from everyone involved and it all went pretty smoothly."
And just a sample from the Qualys announcement (out of many more), "local-root exploit against Exim", "local-root exploit against Sudo (Debian, Ubuntu, CentOS)", "local-root exploit against /bin/su", "local-root exploit against ld.so and most SUID-root binaries (Debian, Ubuntu, Fedora, CentOS)", "local-root exploit against /usr/bin/rsh (Solaris 11)", as well as proof of concepts for OpenBSD, FreeBSD and so on.
Yup, it is. Qualys did some amazing work, and I suspect there are still more problems like this. So if you find stuff like this, please let us (Red Hat / etc.) help you help us help everyone (I think that parses correctly =).
That is a crazy number of CVEs. At a quick glance I am seeing a lot of local root exploits. Generally speaking, if an attacker has an account on your system you are already hosed. But this doesn't bode well for future vulnerabilities of this nature that don't require a local account.
On the other hand, all of these are local exploits against userspace software, not against the kernel. So assuming your container's security model works (... which is a big assumption), the most an attacker can do is gain privileges within the container, if you have setuid-container-root binaries inside, or perhaps get arbitrary code execution or something inside the container. But they shouldn't be able to escape.
(That said, the same class of attack is in theory possible in the kernel, although the kernel tends to be way more disciplined about stack usage than userspace software.)
As we have seen with old SSL versions and Windows XP, it will be 10-15 years before the whole world is migrated onto containerized software, so this exploit should have a long, long active life.
Well, none of these exploits are container-specific. They're against setuid binaries, and it just so happens that setuid binaries in CLONE_NEWUSER containers only give you root in the container, and not real root. Containers that don't use CLONE_NEWUSER and have setuid-root binaries do give you real root, though, and I believe e.g. Docker doesn't do user namespaces by default.
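To make that concrete, here's a minimal sketch (error handling omitted) of a process entering a user namespace and becoming uid 0 there, while remaining an unprivileged user on the host - which is the only kind of "root" a setuid binary inside such a container can hand an attacker:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        unsigned outer = getuid();      /* unprivileged uid on the host */
        unshare(CLONE_NEWUSER);         /* enter a fresh user namespace */

        /* Map uid 0 inside the namespace to our unprivileged outer uid. */
        char map[64];
        int len = snprintf(map, sizeof map, "0 %u 1\n", outer);
        int fd = open("/proc/self/uid_map", O_WRONLY);
        write(fd, map, len);
        close(fd);

        /* We now appear as root - but only within this namespace. */
        printf("uid inside namespace: %u\n", (unsigned)getuid());
        return 0;
    }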
Another perfectly valid way of working around the exploits is to just have an installation with no setuid binaries or file capabilities. This is difficult in a general-purpose OS that needs to be backwards compatible with traditional UNIXy things, but usually totally achievable if you're building a machine to run some specific service. (And in particular it should be really easy within a container, such that these attacks would get you neither container root nor host root.)
A contained-process-to-actual-root exploit is basically either going to be an arbitrary code execution vulnerability in the kernel itself, which tends to be referred to as something more dramatic than "local root", or a logic flaw in the container security mechanisms of the kernel itself.
Well yeah, the intended reading would be "if you're not running in a (user-namespaced) container, container root is local root". Maybe "namespace root" is a better term, though, where it's clear what it means if you're running in the init namespace.
It seems to me that "local root" is quite commonly used for kernel code execution vulnerabilities. There's also the possibility of a flaw in a service running in the init namespace that is exposed to the container.
I mean, it's certainly some level of access if there's a local privilege escalation, but it's not "your box is already pwned". Proper isolation and correct permissions on the system will keep the damage isolated. I've seen attacks that were more-or-less nerfed because they couldn't find an open local privilege escalation.
I don't think these should be written off as "Well, you're screwed anyway" like we tend to do when discussing security against an attacker who has direct physical access to the hardware. Isolated user accounts are widely used as the model for running services because they provide significant (if non-total) benefit.
> Isn't that just saying "I don't believe in multiuser systems and/or their security models"...? If so, what specifically do you have against them?
[Core Services] + SSH is generally something you can harden effectively against attacks.
[2903429034902323094230 binaries] is something you generally struggle to maintain security patches/etc on.
The simple fact is, there is just too much attack surface on a vanilla Linux box, once an attacker has an account, for you to reliably do EVERYTHING you need to do to secure it 24/7/365.
> that all these people's hard work (e.g. OP) is all in vain...
Part of the reason people's hard work is in vain is because any time the topic of doing things better comes up, a cluster of developers will insist there's no point in improving e.g. the filesystem, because "once they're on the box, you're screwed." So it becomes a self-fulfilling prophecy.
In general there seems to be quite diverse opinions out there about "security", and a lot of the space seems occupied by "extreme pragmatics" (or even "anti-intellectuals"). E.g. lots of people feel it's warranted to peddle (simple) falsehoods instead of trying to understand (complex) problems. I can understand it may be the right approach from a day-to-day IT management perspective, but I'm not so sure it's the most viable path towards better security long-term.
> I can understand it may be the right approach from a day-to-day IT management perspective, but I'm not so sure it's the most viable path towards better security long-term.
Yeah, this is why I had the caveat:
> At least imho, given my time constraints/budget.
The "best" long term path is to have larger security budgets that allow for the objective you and the other folks who dislike my response want. The problem, frankly, is we just aren't there yet.
For instance, our budget for maintaining security is ~5% of the IT budget. A large portion of that goes to perimeter defense appliances (firewalls, barracuda antispam/antivirus filters, etc.) as well as making sure ublock, anti-malware, etc are installed on every machine. The other major chunk ends up in securing WAN-facing services that can be exploited remotely. The last major chunk is user training to get them to stop doing things like pay bills for services we never purchased, clicking on strange links, running strange attachments, etc.
After that, we have no resources to do more than run apt-get update && apt-get upgrade -y for protecting the attack surface once an account is breached. We've got a few things we had to recompile manually that break with that process, so we moved them out of the OS package manager. The applications we build internally also likely have exploitable vulnerabilities if attacked from a local account. Those items never have the budget to be maintained, and we certainly wouldn't survive someone taking over a local shell account.
I suspect, given that this is (roughly) the situation at every place I've worked, it's simply too common to be written off as an isolated issue.
The other one I see touted everywhere far too often is "false sense of security." Because improving the security of one thing can't possibly help when there could be a dozen other things that may yet be vulnerable. No, instead of helping, fixing that one thing lulls you into believing you are safe and that alone makes it all less secure. :-)
I'm trying my hardest to understand how this is a novel problem. Maybe somebody can help me?
If I control the stack pointer and can write to where it points, I can write to arbitrary in-process memory. Sure!
Is that just valuable as a ROP trick?
But if I have that, isn't just writing to the actual stack more valuable? Why does stack growth matter at all, besides being a complication where one cannot write to one specific page?
How does this get you a write to out-of-process memory?
Firstly, it's not exactly a novel problem - as the advisory points out, there were earlier public examples of the bug class in 2005 and 2010.
That aside, the issue here is that you can have a program that correctly writes only to properly allocated objects on the heap, and to properly allocated objects on the stack: but if you can get the stack to grow down into the heap without it being detected, now your properly-allocated objects on the heap alias part of the stack (and your properly-allocated objects on the stack alias part of the heap). So now the correct writes done by the program can write to unintended places, like return addresses on the stack or function pointers in the heap.
The core trick is the "without it being detected" part. What they've done is find places - some of them in library code - where the stack is grown by more than the size of the guard page in one go, and where writing to the guard page itself can be avoided (or, in the BSD case, they've found ways that the guard page itself can be disabled). There's also some other clever tricks around expanding the stack and the heap.
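The shape of the bug is easy to show in C - an illustrative sketch only, not a working exploit. A single oversized stack allocation can move the stack pointer past a 4 KB guard page in one step, and the subsequent write never faults if it happens to land in an adjacent mapping:

    #include <alloca.h>
    #include <stddef.h>

    void vulnerable(size_t attacker_controlled) {
        /* If attacker_controlled is larger than the guard page, buf can
           start below the guard entirely, inside whatever mapping (e.g.
           heap) happens to sit there. */
        char *buf = alloca(attacker_controlled);
        buf[0] = 'A';  /* no fault, no warning: the guard page was never touched */
    }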
> If I control the stack pointer and can write to where it points, I can write to arbitrary in process memory. Sure!
But you don't usually have that sort of control. Normally you'd use something like a buffer overflow on a stack allocated buffer.
This thing is a problem even if you have no buffer overflows on stack and do not have arbitrary write access to anywhere on stack.
What's happening here is that the program gets confused about how large its stack is, and keeps utilizing more memory than it should for the stack. But that memory is allocated to heap objects, so a simple write to one of those (not requiring any sort of buffer overflow or other such bug) can be used to smash the stack.
I'm very surprised. I was sure -fstack-check was on by default. The fact that it isn't secure without it has been known for years, and Windows compilers have had that check for years. The bug isn't in any particular executable; gcc and all other compilers should have -fstack-check on by default, with an optional disable. I'm even more surprised that people who are supposed to know what they are doing don't compile with it.
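For illustration, this is the kind of frame the check exists for; built with gcc -fstack-check, the compiler emits code that touches the intervening pages so the guard page can't simply be jumped over:

    void big_frame(void) {
        char buf[1 << 20];  /* 1 MB frame, far larger than a 4 KB guard page */
        buf[0] = 0;         /* without probing, this first touch can land past
                               the guard page, in some other mapping */
    }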
"The Stack Clash is a vulnerability in the memory management of several operating systems. (...) It can be exploited by attackers to corrupt memory and execute arbitrary code."
"If you are using Linux, OpenBSD, NetBSD, FreeBSD, or Solaris, on i386 or amd64, you are affected. Other operating systems and architectures may be vulnerable too, but we have not researched any of them yet: please refer to your vendor’s official statement about the Stack Clash for more information."
If you're looking for a background on the attack itself, I did a writeup of the basic attack some time ago at https://ldpreload.com/blog/stack-smashes-you . The new thing is that Qualys has developed this into a real class of exploits against lots of software; I think we all thought that it was relatively rare to run into this vulnerability.
On Linux and most Unices (at least on ix86), the heap starts at the bottom of the address space and its upper bound grows up as you allocate more memory, while the stack starts at the top of the address space and grows downward as you allocate more stack (stack allocation: mostly function calls and the odd alloca()). malloc(), which manages the heap, has no idea where the stack currently ends, and thus can allocate memory at an address that is also claimed by the stack. Writing to this address then permits smashing the stack without needing an actual buffer overflow in the attacked code.
There is good news: this looks way harder to attack on 64-bit systems (all the CVEs are on 32-bit OSes), possibly because the address space is so huge, and standard OS countermeasures like ASLR, the stack gap, etc. all help protect against the attack.
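A quick way to see the layout being described (exact numbers vary with ASLR, but the local variable typically sits far above the malloc'd block):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        int local;
        void *heap = malloc(16);
        printf("stack: %p\n", (void *)&local);  /* high in the address space */
        printf("heap:  %p\n", heap);            /* much lower */
        free(heap);
        return 0;
    }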
> malloc(), which manages the heap, has no idea where the stack currently ends, and thus can allocate memory at an address that is also claimed by the stack
The other way around. The kernel has already mapped pages for the stack, and would never hand such pages to malloc unless there's a serious bug in the vm subsystem.
However, code that uses the stack has no idea where the stack ends. And reaching out of the stack is only detected via page faults. If that access happens to land on a mapped page with the right permissions, there is no fault and the code can effectively grow the stack into heap region.
Right. The malloc heap has defined starting and ending points. (I say "points" because it's common to use mmap to get a bunch of new pages, which might be discontiguous from the existing heap, but you still know exactly where those pages are.) The stack, on the other hand, is just a pointer.
In theory, you can always decrement the stack pointer for your variables. If you find an unallocated page, the kernel will notice that you're right below the stack and give you more stack pages. There's no other way to request more stack memory, the way you can use brk() or mmap() to request more heap memory: you're supposed to page-fault and let the kernel come up with more stack.
In slightly less theory, you can decrement the stack pointer, and if you reach more memory than the kernel is willing to give you, the page fault will turn into an actual segfault, because you'll hit a specially-defined guard page that prevents you from infinitely growing the stack.
In practice, you can decrement the stack pointer by any arbitrary amount and now you just have a pointer somewhere and you have to hope it's either within the stack or in the guard page....
Slightly more specifically, if the stack is growing into space that the heap already owns, you don't get a page fault. So the OS assumes that the stack already owns that memory, since there was no page fault. Now if you write to the heap, you can corrupt the stack.
> Slightly more specifically, if the stack is growing into space that the heap already owns, you don't get a page fault.
Exactly. But the stack is only growing in the program's view of the world.
> So the OS assumes that the stack already owns that memory
Nah, the OS is blissfully unaware that the program has moved its stack pointer to point off the stack. If anything, the OS wrongly assumes the program is still operating within the stack space explicitly and rightly reserved for it.
And the program in turn wrongly assumes this memory newly referenced via the stack pointer has been reserved for its stack, because it wasn't killed for accessing it.
> Now if you write to the heap, you can corrupt the stack.
That is right. You will corrupt what the program believes to be stack.
The user-space stack of a process is automatically expanded by the kernel:
- if the stack-pointer (the esp register, on i386) reaches the start of the stack and the unmapped memory pages below (the stack grows down, on i386),
- then a "page-fault" exception is raised and caught by the kernel,
- and the page-fault handler transparently expands the user-space stack of the process (it decreases the start address of the stack),
- or it terminates the process with a SIGSEGV if the stack expansion fails (for example, if the RLIMIT_STACK is reached).
Unfortunately, this stack expansion mechanism is implicit and fragile: it relies on page-fault exceptions, but if another memory region is mapped directly below the stack, then the stack-pointer can move from the stack into the other memory region without raising a page-fault, and:
- the kernel cannot tell that the process needed more stack memory;
- the process cannot tell that its stack-pointer moved from the stack into another memory region.
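You can watch the benign version of this mechanism with a tiny demo (compile at -O0 so each call really gets its own frame): the stack grows in small page-fault-driven steps until RLIMIT_STACK is reached, at which point the process is killed with SIGSEGV, exactly as described above.

    static unsigned long depth;

    static void recurse(void) {
        char frame[1024];        /* ~1 KB per call: growth in small steps */
        frame[0] = (char)depth;  /* touch the frame so pages really fault in */
        depth++;
        recurse();
    }

    int main(void) {
        recurse();               /* eventually killed by SIGSEGV */
        return 0;
    }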
This is crazy. I remember thinking when I first heard about stack and heap growing towards each other: uh-oh. But the problem was so blindingly obvious that I just assumed any system written for anything beyond co-operative multitasking had a fix - because if not there was obviously no actual memory safety...
What would the fix be? The fact that you can point the stack pointer at arbitrary memory and the CPU will treat that memory as the stack is a feature, and an important one, of the ISA.
The real issue here is that writing programs in memory-unsafe languages is inherently difficult and risky, and fewer programs should be written that way.
It looks like the fix is going to involve adding some code to LLVM to probe each stack page when you make a large stack allocation, but once that happens, it's straightforward for clang to implement -fstack-check for C and C++ programs, too.
This is a weird emergent problem from the fact that stack and heap memory are part of the same address space and that the stack is designed to implicitly grow as needed. It's not clear that it's a language or compiler's fault for relying on the stack doing that, nor that it's the platform or kernel's fault for making that approach possible.
I'm not totally sure how you would design a language so that you don't have this problem. I guess you could forbid a function from using more than some small amount of stack, and make sure your stack guard area is at least that big, but that seems like an unfortunate restriction. Maybe something like the Stackless Python approach, where all variables are heap-allocated, would work?
Also, all that said, note that Windows gets this right: MSVC inserts calls to _chkstk() to do stack probing. I believe this entire class of vulnerability doesn't exist on Windows, and probably some people at MS who spend their entire lives in memory-unsafe languages are feeling very smug today.
You're really barking up the wrong tree here. The user of the memory-unsafe language isn't at fault for doing things the implementation will use the stack for. The language specification -- the contract between him and the implementation -- never stated or implied that he is responsible for making sure that these objects on the stack aren't accessed in larger-than-page-sized increments. The word "stack" never appears in the C specification.
At a source level, these programs may as well be 100% bug free.
This is nothing more than a quirk of implementation. And one that affects all languages that use the stack (and by which I mean the stack, not some heap-allocated structure the language provides stack-like operations on). Memory safety doesn't really enter the picture.
The design might be fundamentally broken, but if so you're saying that even with an mmu, we can only have co-operative multitasking. That's not the promise of a multi-user/multi-process system.
If "all programs are safe", we could just use the Amiga OS kernel and no longer need an mmu, or a similar design.
[ed: apparently windows NT takes steps to avoid this according to a sibling comment. Not clear, but I assume it implies a performance hit for certain heavy stack usage?]
I don't entirely follow what you're saying in your first sentence, but I'm going to try to respond to what I think you're saying. If I'm off base please let me know!
1. Memory-safe languages are about making sure that the programmer's intended behavior matches the actual behavior, that is, eliminating a class of bugs related to memory unsafety. They are a security scheme insofar as these bugs are security bugs, but they're not an interprocess security scheme. In particular, you can write a memory-safe debugger that goes and makes arbitrary modifications to other processes the OS gives it access to. You can even write a memory-safe program in Rust that goes and edits /proc/self/mem. But in these cases, the programmer is intending to mess with process memory directly, so the language isn't obligated to stop the programmer. It is obligated to stop the programmer from, say, overflowing a string and overwriting the return address.
2. It is certainly possible to design a memory-safe language that is usable for interprocess memory protection. Microsoft had a research OS called Singularity that did exactly this: https://www.microsoft.com/en-us/research/wp-content/uploads/... But it's another step on top of memory safety.
3. Preemptive multitasking and protected memory aren't inherently related (although, yes, in the market, most cooperatively multitasked OSes lacked memory protection, and most OSes with memory protections were preemptively multitasked). You can have a preemptively multitasked system with no MMU at all; you just need to respond to timer interrupts and switch tasks.
You're right, I leapt over some points, and landed slightly outside the discussion - I guess I think of growing the stack and allocating heap memory as something the kernel should be the arbiter of - and that the api should never allow you to grow into your own (or another process') memory.
I suppose it's fine to say malloc will return memory, but it's up to the process to check if there are any overlaps - but that sounds a little crazy?
That is basically how it works, with the caveat that if you want the kernel to arbitrate your stack expansion you must only expand by a page at a time (and that's what gcc's -fstack-check does).
If you decrement your stack pointer by a large value and then offset it - which is essentially what's happening in these cases - the kernel can't arbitrate that because if the access lands in otherwise allocated memory it doesn't fault and so the kernel never sees the access at all.
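Hand-written, the page-at-a-time discipline looks roughly like this (a sketch of what -fstack-check style instrumentation emits automatically, assuming a 4 KB guard page):

    #include <alloca.h>
    #include <stddef.h>

    #define PAGE_SIZE 4096

    void use_big_buffer(size_t n) {
        if (n == 0)
            return;
        char *buf = alloca(n);
        /* Probe one byte per page, starting from the end nearest the old
           stack and walking down, so any guard page in between takes a
           page fault instead of being silently jumped over. */
        size_t off = n;
        ((volatile char *)buf)[off - 1] = 0;
        while (off > PAGE_SIZE) {
            off -= PAGE_SIZE;
            ((volatile char *)buf)[off] = 0;
        }
        ((volatile char *)buf)[0] = 0;
        /* ... use buf as normal ... */
    }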
I meant that the equivalent for malloc would be that if you allocate a 1mb buffer and then a 2mb buffer, the kernel might return a 2mb buffer overlapping your earlier 1mb buffer - and be all like: "you asked for 1, you asked for 2 - and you've got 2 - if you wanted 3, you should've asked for 3". Afaik malloc doesn't work like that - it assumes that you want more memory (and can fail or succeed etc).
I can see how the current stack/heap thing evolved - but I still think it's crazy :-)
All that's happening here is that userspace is moving its stack pointer into the heap it had previously allocated. Note that "moving the stack pointer" is not a kernel-mediated operation.
No, of course, but the fact that you can "ask for more memory" by growing the stack onto your heap (rather than, say, having the two start somewhere together and grow apart) means that there's an asymmetry: malloc will give you more ram or fail; growing the stack can make your allocated memory overlap.
Your stack has to grow towards something. Sure, you can have it grow towards the bottom of the address space (which, due to wraparound, is also the top - where it will safely collide with the kernel addresses) but that only works for one stack - as soon as you create another thread, its stack has to grow towards something else.
The thing is that (as I mentioned upthread) "growing the stack" isn't a well-defined operation. What actually happens is that you overflow the stack by a very little bit, and the kernel says "Oh, I bet you want more stack pages" and maps some virtual memory for you. But the kernel is guessing; it has no way of knowing that you meant to grow the stack. Maybe you dereferenced a wild pointer that, by chance, happened to point right below the current stack.
Conversely, if you grow your stack by some value on the order of gigabytes, you're basically coming up with a pointer that appears to have no relation to the stack, and dereferencing it. So the platform is going to do exactly what it does if you were to dereference the same pointer value with no stack involved: read/write memory if it's mapped and segfault if not.
You could totally imagine a platform where growing the stack were a more well-defined operation. You want to avoid each function call and local allocation having the overhead of a system call, though: the nice thing about the current scheme is that it's zero-overhead if there's a mapped stack page. So the scheme was designed (or probably emerged more than was intentionally designed) for the case where syscalls are very slow, MMUs work fine, and perfect memory safety isn't the goal, i.e., the original UNIX target audience. :-)
You could keep a thread-local variable somewhere indicating the current stack limit, and make a system call when you need to increment it. That doesn't require an MMU at all: the userspace API is that you call some system routine when you need to expand the stack, and it says yes or no (or it either says yes or kills you with a segfault, or whatever). In an MMU-less system, you can just keep track of the amount of heap allocation, and have the system routine fail when you're too close to your heap.
Or you could do stack probing, which works but requires an MMU.
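A hypothetical sketch of that explicit scheme - none of these names exist, __stack_limit and grow_stack are made up - where the compiler would emit something like this in the prologue of any function with a large frame:

    #include <stdlib.h>

    /* Hypothetical: lowest address currently reserved for this thread's
       stack, maintained by the runtime. Not a real API. */
    extern __thread char *__stack_limit;
    extern int grow_stack(size_t extra);  /* hypothetical system routine */

    void prologue_check(char *sp, size_t frame_size) {
        if (sp - frame_size < __stack_limit) {
            /* Explicitly ask the system for more stack; no MMU required. */
            if (!grow_stack(frame_size))
                abort();  /* the "or it kills you" branch */
        }
    }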
There would be a performance hit, but that is likely to be insignificant. Apparently the stack check code touches every stack page.
If performance were a concern (and I don't really think it is), it should be possible to reduce the impact by assuming a sufficiently large stack guard that the majority of functions with fixed size stack frames could never leap over. Then these functions wouldn't need any runtime checks. The only checking you'd have to do is on code that uses VLAs, alloca, or such, along with the few outlier functions that use ridiculously large fixed size buffers. I don't see why you should need to touch every page.
I doubt it. The stack pointer is modified upon function entry, exit, and per function call (parameters pushed onto the stack, popped off afterwards). You could try to add code to every function to check if the stack pointer is between two addresses and fault if not, but then threading becomes more difficult ...
Yeah, it's basically inherently unsafe to allocate large buffers on the stack. grsecurity/PaX have apparently mitigated this somewhat by increasing the size of buffer required to exploit this, but there's no way for the kernel to fix the issue.
Qualys found some exploits, told parties responsible for fixing the problems, those parties fixed the problems.
These are remarkable because they are conceptually straightforward but the power of the exploits was potentially substantial.
edit: original first sentence was "told Red Hat, Red Hat fixed them." Apparently that was wrong and people (appropriately!) want credit attributed to the right parties.
Actually no, Qualys told Red Hat and SUSE initially (I asked for early access to confirm how bad it was, Red Hat and SUSE are capable of handling very sensitive embargoed material as we have enough engineers internally to do Kernel/glibc/etc stuff in house) and then we (Red Hat and SUSE) agreed that 1) this was as bad as Qualys said and 2) we need to get the entire community involved ASAP (via the distros list and CC's for people not on it like the Kernel people and so on).
It's not clear to me why compiling all userland code with -fstack-check would help. Couldn't you work around that by copying or creating an executable in assembly that doesn't write every 4 KB?
The issue isn't that a user can run their own code that can stack-smash; rather, a user can exploit the smash to run their code in a privileged context (set-uid binaries, e.g. sudo, su, etc.).
On 32bit x86, couldn't the SS segment selector be mapped to a completely different set of memory compared to CS/DS/ES and thus remove the possibility of the stack and the heap clashing?
What if you set up SS to mirror DS, but with a limited range so that any attempt to access memory outside the stack via SS: causes a page fault? Wouldn't any exploit running ESP down into the heap be thwarted by any stack-related instruction (push, pop, or an interrupt?)
The accesses causing issues here are not guaranteed to use SS - that only happens for effective addresses [ebp+...] and [esp+...]. If ESP is copied into another register first (which in practice will almost always be the case) then the access will use DS. PUSH will always use SS but that's not the issue here (that only moves ESP by 4 bytes so it'll always hit the guard page). And in modern OSes, interrupts don't use the user mode stack at all - the CPU will switch to kernel mode and use a kernel stack since the user mode stack isn't guaranteed to be valid.
Interesting. I was just curious if it would be impossible to write shellcode without triggering an SS:ESP access (via call,push,pop,ret) that would page fault due to protection/selector limits, because that seemed like a neat way to mitigate.
Quote from Kurt Seifried of Red Hat http://www.openwall.com/lists/oss-security/2017/06/19/2
"I just want to publicly thank Qualys for working with the Open Source community so we (Linux and BSD) could all get this fixed properly. There was a lot of work from everyone involved and it all went pretty smoothly."
Debian security advisories rapid fire:
glibc https://lists.debian.org/debian-security-announce/2017/msg00...
linux https://lists.debian.org/debian-security-announce/2017/msg00...
exim4 https://lists.debian.org/debian-security-announce/2017/msg00...
libffi https://lists.debian.org/debian-security-announce/2017/msg00...
And just a sample from the Qualys announcement (out of many more), "local-root exploit against Exim", "local-root exploit against Sudo (Debian, Ubuntu, CentOS)", "local-root exploit against /bin/su", "local-root exploit against ld.so and most SUID-root binaries (Debian, Ubuntu, Fedora, CentOS)", "local-root exploit against /usr/bin/rsh (Solaris 11)", as well as proof of concepts for OpenBSD, FreeBSD and so on.