LLVM merges SafeStack (github.com)
267 points by theraven on June 26, 2015 | 79 comments

Does this mean that it will no longer be possible to do things like return-oriented programming?

LE: indeed, it's quite clear from the mentioned article (http://dslab.epfl.ch/pubs/cpi.pdf). So this provides great exploit protection.

According to https://github.com/Microsoft/clang/blob/master/docs/SafeStac... safestack alone doesn't fully protect against ROP:

> With SafeStack alone, an attacker can overwrite a function pointer on the heap or the unsafe stack and cause a program to call arbitrary location, which in turn might enable stack pivoting and return-oriented programming.

And you need additional features (such as CPI from the paper you and the commit message link to) for full protection.

ROP is an exploit technique. Stack corruption is a class of vulnerabilities. There are other memory corruption techniques that can be exploited with ROP.

It is NOT clear from that article. ROP can occur on the heap and CPI is bypassable (although safestack is a great contribution, and frankly it's about time). There is great literature on this issue already (see many forms of the Control Flow Integrity defense), and many solutions that exist come close to full security without providing it (CPI only protects code pointers, and side-channel attacks that work through data pointers can still achieve arbitrary memory reads and writes). In particular, use-after-free vulnerabilities still exist. Without full memory safety, exploits of these types will always be possible.

What is return-oriented programming? Recursion?

Return-oriented programming is an exploit technique that relies on reusing snippets of existing code (called gadgets) in a program in order to carry out attacker code. Each gadget generally ends with a return instruction, which causes it to read the address of the next gadget off the stack and jump to it. In this way, arbitrarily complex code can be built up by chaining together sequences of gadgets controlled by an initial set of return addresses on the stack.

It's used as a way to defeat DEP (Data Execution Prevention); with DEP the attacker can no longer write code into memory and then execute it, so instead they just set up the stack cleverly so they can carry out a return-oriented payload (most commonly, these payloads just disable DEP and then move on to a more traditional second stage).

More info:

The paper that introduced the name ROP (though some would argue that the techniques existed before this paper): https://cseweb.ucsd.edu/~hovav/dist/geometry.pdf

Wikipedia: https://en.wikipedia.org/wiki/Return-oriented_programming

If you're interested in learning about exploit writing, you might want to check this page : https://www.corelan.be/index.php/articles/ .

It looks like it's an exploit technique where the stack is modified to set up malicious calls to functions: https://en.wikipedia.org/wiki/Return-oriented_programming

That's not really the essence of ROP, because other attack techniques often also need to manipulate the stack. The key novel idea in ROP is to use data in unintended ways. This is based on the insight that memory often contains short sequences of bytes (e.g. a .jpg image) that can be interpreted as machine instructions. For example, an mp3 file might contain a byte sequence that happens to decode to

    increment register 16
in the ambient machine language. Call that "dual use data". ROP searches the memory for sufficient "dual use data" and then builds an ad-hoc compiler with "dual use data" as the target language. The attack software then compiles to "dual use data" and runs the compiled code.

Of course one may ask: can we always find enough "dual use data" to build a Turing-complete set of instructions as a compilation target? It turns out that, with high probability, this is the case.

ROP gadgets are usually harvested from libraries loaded into the program, not MP3 files.

The key novel idea in ROP is to use instruction sequences in unintended ways. ROP is a refinement of ret2libc, improving on it by returning into arbitrary locations in functions rather than their entry points. That, and of chaining together gadgets with returns. Hence the name.

It is true that "ROP gadgets are usually harvested from libraries loaded into the program, not MP3 files", but that's not because there's something intrinsically wrong with mp3s as a source of gadgets; it's just that mp3s are usually not mapped executable. I emphasised mp3s and jpgs precisely to highlight what's novel about ROP, namely that any data can be used as machine language.

Yeah this just doesn't seem like an illuminating example in practice. In practice, gadgets for ROP chains are harvested from program text. It's for that reason that so much effort is expended in many exploits on memory leaks that reveal the locations of libraries loaded into memory.

Thanks. Is this because it's mostly programs that live in executable space on a well-maintained machine or because gadgets can be precomputed (at least in parts) which makes compilation easier? (Not that that is mutually exclusive!)

Both of those are true.

None of that should be marked executable, though. The real risk in ROP is using little bits of legitimate functions to bypass DEP.

That is true; I should have been clearer about this, but you don't use the "legitimate function"'s intended functionality, you only use the fact that it can be executed (and the byte string that is its code).

I used mp3s and jpgs as extreme examples of data that was never intended to be executed but can still be interpreted as code. In ROP, you don't care about the intended meaning of the bytes that make up "legitimate functions" (or any other data you may use), since they're unlikely to have the sought functionality as-is. Instead, you search them for "dual use data" too, and piece together the functionality you need.

Well, but you missed that data cannot be executed.

Unless you store your MP3s and jpegs in .text, the memory pages all that stuff is in are marked not executable and will only cause a crash if you jump to it. Regardless of whether the bytes make useful instructions.

Data can be executed if it is in the executable part of the memory. There is quite a lot of such data. In particular, code is data!

Why is this post downvoted? This idea of (mis)using code is part of the essence of ROP! Could somebody please explain where I'm wrong, so I can learn?

I didn't downvote it, but I feel like your comments have been technically correct but practically misleading.

It's possible to have executable data, but if you do, you generally have bigger problems: the exploit can simply write a complete first-stage into the data and execute it directly, and not bother going through return-oriented contortions.

The reality is that gadget harvesting is about analyzing program text --- actual binary machine instructions --- not about looking for ways to interpret JPEGs or MP3s or (I wrote DOCX and then PDF and then thought "huh bad examples") RTF files as instruction streams.

It's also true that you can exploit insane x86 encoding to synthesize unintended instructions, but that's (I think) less important than the simpler idea of taking whole assembled programs, harvesting very small subsequences, wiring them together with a forged series of stack frames, and achieving general computation.

Right. But in practice ROP targets the executable portions - any and all of them. If someone leaves something executable that they shouldn't, it'll use that. If only code is left executable, it's still often able to use that.

Remember, x86 can be parsed differently depending on offset. If you jump into the middle of a multibyte instruction, you get an entirely different instruction stream. And x86 doesn't have any real protection against that.

You may however be able to get different legitimate instructions than originally intended by jumping to an address in the middle of a multi-byte instruction that happens to decode into a useful series of operations. It follows that there are more usable "returns" in a body of code than just those written in the original source.

There are multiple usable "returns" in functions because "return" is just "pop" then "jump", so a block that has the effect of manipulating the stack and then transferring control can often be used to the same effect as a return.

On 32-bit x86 the RET instructions are 0xC3 and 0xCB. Any other instruction containing these bytes can be subverted into a return if you can make the processor read the preceding instructions from the wrong starting point.

Sure; the authors in the Roemer paper found a couple of those. The best place to get a sense of how this works is the "Gadget Catalog" section of that paper.

Don't forget the variants with a stack-adjusting immediate, 0xC2 and 0xCA. Those can also come in handy.

Accidentally downvoted you. My apologies.

If only there was a way to space those two tiny arrows further apart. Let's hope science comes up with a way some day..

HN is a mind hack to exercise your jQuery skills.

The best explanation I've heard for HN's lack of some features is that any HN power user is expected to find a solution for these problems. That doesn't necessarily mean making a solution, it may just entail searching for a third party solution, or discovering a bit of specific HN behavior that isn't well documented.

I imagine any user with over 1000 karma probably has at least a few tricks, and could mention off the top of their head a few things that help quite a bit when using the site. At the same time, I think it's best they aren't mentioned too often, as they offer a sort of natural advantage to those that have been around a while and contributed to discussions (it's the nature of many of these helpers that they actually help discussion), and putting those tools into the hands of a novice user may actually be detrimental to the site as a whole.

This sounds a lot like rationalising poor UI.

Upvoted for you.

To be clear: SafeStack does NOT prevent return oriented programming. It makes the bar much higher, and it should be lauded for that. But please don't for a second think that this is a solved problem: ROP can occur on the heap, for instance. CPI as a system also does not completely solve the problem: it is possible to break, for example (http://web.mit.edu/ha22286/www/papers/conference/Oakland15.p... ) and, despite the CPI authors' conclusions, produces high overheads for programs with large numbers of code pointers (C++ programs with vtables are good examples). Also not prevented are attacks that use data pointers (non-control-flow data attacks), an area that has seen little study.

Also see papers like BlindROP: http://www.scs.stanford.edu/~sorbo/brop/bittau-brop.pdf and Sigreturn oriented programming: https://www.cs.vu.nl/~herbertb/papers/srop_sp14.pdf to get a little bit more of the idea of how complicated ROP can actually get.

From the article: "The overhead of our implementation of the safe stack is very close to zero (0.01% on the Phoronix benchmarks)". That's quite awesome, congratulations.

Especially if you add the follow-up sentence:

"This is lower than the overhead of stack cookies, which are supported by LLVM and are commonly used today, yet the security guarantees of the safe stack are strictly stronger than stack cookies."

Win-win for everyone

The performance is impressive, considering that maintaining a second stack presumably requires another register exclusively dedicated to it. I'm surprised it makes so little difference. Or are there some cunning optimisations going on?

I was wondering the same thing. The linked article (http://dslab.epfl.ch/pubs/cpi.pdf) says that they don't use a dedicated register. The unsafe stack pointer is saved in the thread control block, accessible through a segment register. From there, they let LLVM choose a register (not necessarily the same one for each function). They also say that only 25% of functions need any unsafe stack access at all, which I guess is why this is faster than using a dedicated register.

Essentially, what they are doing is allocating all unsafe objects (i.e. arrays and objects whose pointer is passed to another function) in a dedicated region of memory (which they call the unsafe stack) that only holds these objects, so all (de-)allocation happens in LIFO order and can be implemented as a stack. As pointed out by evanpw, they keep a pointer to this region in thread-local storage.

Safe stack gets a dedicated register, unsafe stack does not. Then the problem is to keep as much as possible on safe stack.

From the paper:

> This is because objects that end up being moved to the regular (unsafe) stack are usually large arrays or variables that are used through multiple stack frames. Moving such objects away from the safe stack increases the locality of frequently accessed values on the stack, such as CPU register values temporarily stored on the stack, return addresses, and small local variables.

I wonder if this speedup effectively hides the performance overhead of SafeStack.

Why don't we have a CPU architecture with two stacks - one for stack data and another for return addresses?

That's what SafeStack (and IA-64) do.

Or write code in languages that, you know, don't encourage unwanted arbitrary code execution?

Apple, which has started distributing LLVM bitcode, will be able to apply it on all the new apps in the App Store transparently.

Google will be able to do likewise for NaCL apps.

>Google will be able to do likewise for NaCL apps

All 5 of them?

>Google will be able to do likewise for NaCL apps.

No they won't, because NaCl is native x86. Unless you mean PNaCl, which is a different technology with much less adoption than NaCl.

Should Google bother? The point of NaCL apps is that memory corruption flaws in them aren't supposed to matter.

Well, bother business-wise? I mostly included them for completeness. I'm a big fan of PNaCL, but it's not really taken off.

So technically bother, then? NaCL programs are just as susceptible to buffer overflows as conventional programs; it's just that they are better sandboxed. Exploit mitigation is a belt-and-braces thing, and I can't see why Google wouldn't be enabling this pass right now as we speak.

You have something compiled to NaCL, like the Flash plugin; it can control the camera and such, and you are hoping that an attacker can't feed it some malformed JPEG or something that makes it use the capabilities it has in a bad way.

Here's the only thread I could dig up on buffer overflows in NaCL: http://permalink.gmane.org/gmane.comp.security.nativeclient....

It's not that they're better sandboxed; it's that regardless of whether memory corruption exploits work, in the NaCL security model, if the sandbox doesn't work, you're fucked regardless. So why bother with hardened runtimes?

Your threat model seems to be that anyone who sends input to a sandboxed app should be assumed to have full control of it. But the apps frequently process input without wanting to give the sender of that input access to all its resources.

By your analysis, Gmail could implement a feature that lets a sender run arbitrary JavaScript in the recipient's browser, and this would have no security impact as long as the JavaScript sandbox was not escaped. But in reality this would be a huge breach, because there are valuable things inside the sandbox that attackers should not have access to.

Put another way, this wouldn't help defend Chrome from NaCl, but it would help defend the NaCl app from its clients. This would be in Google's interest to implement because it would make the platform more attractive to developers.

That seems like an uncommon scenario for the kinds of things NaCL gets used for. These are intrinsically server-controlled applications, when they're applications at all (my real sense of what NaCL is about is "another tool to migrate away from things like Flash and plugins", i.e. a platform for video codecs).

I see your point. I guess you're saying, there could be a photo editing app in which Alice can send pictures to Bob, and Mallory might send a malicious picture to Alice that coerces her client into betraying all its photos.

The buffer overflow in a NaCL extension that comes with Chrome, e.g. Flash, may be just a stepping stone in a Pwnium entry that escapes the sandbox through yet another WebGL validation bug or something.

Why would Google want to not bother applying a belt-and-braces exploit mitigation that costs 0% CPU?

You're not following my point. NaCL programs are content-controlled. Attackers start out with arbitrary code execution.

One NaCL use case is a plugin downloaded by the page.

But NaCL is also used to isolate "built in" embeddables, e.g. Flash, which I have used as an example of a NaCL plugin that comes with Chrome in both of the previous posts.

Another example is the voice activation plugin that got them into so much trouble recently...

Imagine you walk past your colleagues computers saying things like "let's google something naughty" loudly...

And now let's extend that to playing an audio snippet that invokes a stack smash? :)

I'm still not following how hardening Nacl against memory corruption bugs can help Chrome, given that Nacl-enabled Chrome already runs content-controlled Nacl code.

Are the pdf viewer or flash players 'content-controlled'?

Aren't Google-bundled plugins allowed to run even when the user doesn't allow third-party NaCL plugins to run?

It's belt and braces. Why wouldn't Chrome bother to enable a pass that costs 0% runtime performance? After all the money and time Google have sunk into other runtime checks like ASan, why wouldn't they also enable this?

It made me think: what other exploit mitigations does Google put into NaCL, even though it's sandboxed? So I quickly googled to see if they use ASLR inside NaCL, for example. Here's what I found:


So they added ASLR, and they made a nice point:

"for the threat model where a NaCl module might process untrusted input, it would be nice to provide ASLR for the NaCl module so untrusted input won't (easily) be able to take control of the NaCl module (if/when the NaCl module has bugs). (this is a different threat model that the usual one in NaCl, where the NaCl module is untrusted. here we are trying to make sure that a NaCl module, which is executing code on the behalf of some domain/web site, isn't easily pwned, even if the NaCl runtime is itself okay.)"

>>Google will be able to do likewise for NaCL apps.

They might be able to apply that over all of Android - and that will automatically apply to Java-based apps. From there it's just a matter of incentives to rapidly create change in the rest of the apps.

Apple is making its apps LLVM-based now? Doesn't that mean they could soon push for ARM CPUs in Mac OS, too?

Or, the other way around, for x86_64 in mobile devices.

They already can, technically (via fat binaries; they've already been through multiple architecture transitions). The interesting part is that they could now go through these transitions without developers having to be involved.

No, bitcode is too low-level for this, and I believe it's only used for iOS apps ATM.

Yes. Developers will be able to submit apps as LLVM bitcode, which means that Apple can switch to ARM for some new device and recompile the app without the developer having to resubmit it.

It also means the end of Universal binaries as Apple can thin the compiled app for each target device.

The PPC switch over was painful, as was the 32- versus 64-bit era. They want to avoid that in the future.

At the moment, LLVM bitcode generated by Clang is still architecture-dependent (e.g. sizeof() still generates code that depends on the architecture).

But modern CPUs are all converging on 64-bits, for example.

Use of SIMD intrinsics is a tougher nut, but I've actually been playing with them at the IR level for hobby stuff, and I declare it's not intractable.

Converging on 64-bit CPUs doesn't mean converging on standard `sizeof()` values (cf. the Windows x86_64 ABI is LLP64 but the Linux x86_64 ABI is LP64).

It's easy to lose sight of the fact that we are talking about 64-bit OS X here... The diff between IR that targets x86-64 and arm64 and so on is something Apple has some control over.

I know; I play with a hobby LLVM backend that retargets.

True - I didn't mean that it would solve a future 32- vs. 64-bit problem.

"the PPC switch over was painful"

How so? I recall it being amazingly painless.

It means exactly that.

Here's a nice article describing it:

> This means that apps can automatically “take advantage of new processor capabilities we might be adding in the future, without you re-submitting to the store.”


It doesn't. Bitcode is arch-specific and unportable. It allows some instruction-set-level optimizations, but not changing e.g. endianness (exactly what x64 and ARM differ in) or type sizes.

ARM cores, although technically bi-endian, are in practice little-endian.

I play at writing LLVM backends, and it's entirely viable to even do things like rewrite float80 and other assumptions that a front-end has made for a particular target.

But what about things like non-aligned memory access (allowed on x86, not allowed on ARM)? Porting code across architectures can't be as easy as just swapping LLVM backends, when not even recompiling the code is sufficient.

Modern ARMs are just fine with unaligned access.

There will be ISAs sufficiently different that bitcode generated when targeting one cannot be massaged to fit another, but they are not mainstream. The mainstream targets are all increasingly similar 64-bit machines.

It never occurred to me before to ask, but aren't Emscripten asm.js programs vulnerable to the same exploits C programs are? e.g. I could exploit a buffer overflow in some trusted js code to get some sensitive information from the site. If that's the case, would emcc with SafeStack mitigate that?

Interesting. I haven't fully digested the paper, but a few notes for context:

- Most real-world exploits these days are based on use-after-frees, heap buffer overflows, and other heap-related weirdness, rather than stack buffer overflows. It's nice that SafeStack mitigates that attack vector though (but if you disable stack canaries in favor of it, you actually reopen the door to exploit certain types of vulnerabilities...)

- A (the most?) common method to proceed from memory corruption to return-oriented programming is to redirect a virtual method call or other indirect jump to a stack pivot instruction. SafeStack alone does nothing to prevent this, so it doesn't prevent ROP.

- However, the general code-pointer indirection mechanisms described in the paper, of which SafeStack is an important component, could make ROP significantly harder, because you would only be able to jump to the starts of functions. This guarantee is similar to Windows's CFG (although the implementation is different), but SafeStack makes it harder to bypass by finding a pointer into the stack (either on the heap or via gadget).

- In practice, interoperation with unprotected OS libraries is likely to seriously compromise the security benefits of the combined scheme, because they will store pointers into the real stack, jump directly to code pointers on the heap, etc. JIT compilers are also likely to be problematic.

- In addition, there are more direct ways for an attacker to work around the protection, such as using, as gadgets, the starts of functions that do some small operation and then proceed to a virtual call on an argument. The larger the application, the more possibilities for bypass there are.

- Still, "harder" is pretty good.

Edit: By the way, the point about function start gadgets makes questionable the paper's claim that "CPI guarantees the impossibility of any control-flow hijack attack based on memory corruptions." Also, if you want to guarantee rsp isn't leaked, it isn't enough to keep all pointers near it out of regular memory: they also have to be kept out of the stack itself, because functions with many (or variable) arguments will read them from the stack - at least, I don't see a claim in the paper about moving them - so subverting an indirect call to go to a function that takes more arguments than actually provided (or just changing a printf format string to have a lot of arguments) will cause whatever data's on the stack to be treated as arguments. Ditto registers that either can be used for arguments or are callee-saved. That means frame pointers have to be disabled or munged, and any cases where LLVM automatically generates temporary pointers for stack stores - which I've seen it do before - have to be addressed.

If you do move non-register arguments to the safe stack then the situation is improved, but you still have to watch out for temporaries left in argument registers.

SafeStack is essentially an IA-64/SPARC-style two-stack model - there are no pointers to the %rsp stack.

The issue is preventing pointers to the real stack on the real stack. I'm pretty sure you can't do that reliably at the LLVM IR level, since as I said such pointers can be introduced during code generation. In fact, I just looked at the source to the merged pass, and it doesn't even try - it only checks if stack pointers are passed to calls, but e.g.

    int *p = cond ? &a : &b;   /* pointer into the stack */

    /* ...later enough that this isn't trivially optimized into two stores... */

    *p = 1;
will probably not be flagged (it depends on what optimization passes have run before the SafeStack pass), but will put the pointer in the stack or a register that may be saved.

Now if we can get the stack to grow upwards instead of downwards my life will be complete and I can die.

IA-64 has had this since it was created - nice to see it coming to x86.
