Morpheus Turns a CPU into a Rubik’s Cube to Defeat Hackers (ieee.org)
60 points by kaisix 28 days ago | 30 comments



> What is the overhead for Morpheus?

> Todd Austin: It’s about 10 percent slower on average.

If you accept that kind of overhead, why not just go the Burroughs B5000/Lisp Machine/etc. route and have a hardware instruction set which is essentially byte code for a safe machine, which cannot be tricked into reading and writing in arbitrary places.

Of course, you would still have the problem of actual security holes in buggy programs, which Morpheus effectively hides by obscuring them.


A 10% performance hit for "secure mode" doesn't seem that bad at all: run at last year's speed for an additional layer of security.

Something like this might actually provide some of the security that IoT devices are currently missing.


Do processing speeds really still increase 10% every year?


Thanks to AMD, it's actually a little better than that (for one specific year at least, though the entire Ryzen line has been roughly similar):

https://www.cpubenchmark.net/compare/AMD-Ryzen-5-5600X-vs-AM...


Even if they're not, the processing power at a given price point is increasing by 10% a year, which effectively works out to the same - especially if you use these in low-cost devices that are never updated (IoT seems a perfect fit).


I think your omission of what Todd Austin actually answered to this question in the article is close to insincere.

> Todd Austin: It’s about 10 percent slower on average. We could reduce those overheads if we had more time to optimize the system. You know, we’re just a couple of grad students and some faculty. If a company like Intel, AMD, or ARM built this, I would expect that you’d get the overheads down to a few percent.

Edit: Take into account current performance hits for e.g. Spectre, Meltdown patches.


> insincere

I was not criticizing the Morpheus CPU. I was using it to make a tangential point about other possible design choices.


Fair enough!


Are you talking about both the Burroughs B5000 and Lisp Machines having a tagged architecture[0]?

[0]: https://en.wikipedia.org/wiki/Tagged_architecture


Mostly, yes. But it’s not just tags, it’s the whole notion of a machine instruction set designed for post-von Neumann-architecture programming. I.e. the instruction set knows that there are such things as functions, objects, processes, users, etc. instead of the OS having to emulate all these things on top of an agnostic and uncaring machine code running on raw RAM.


I wish this were more of a technical paper. The interview tries to dumb it down to the point where you can't follow what they actually did, or what type of attacks are actually in scope.

Is this just encrypting pointers, where the encryption key changes very frequently?


There is a technical paper; it's just that this isn't it.


I'm a bit confused - how do you do pointer arithmetic if the CPU encrypts pointers for you automatically and doesn't let you know the actual address in memory that they point to?


I know nothing of the implementation, but hypothetically a compiler could translate:

    p++
to instructions that effectively do:

    p = enc(dec(p)+1)
Without exposing p outside the CPU. The implementation described seems to be new instructions for incrementing and dereferencing pointers.


In which case, can't you just use it in your shellcode? I guess the defense is that info leaks are useless, so you can't defeat ASLR(?)


(that should have read without disclosing dec(p))

But yeah, you'd have to end up with equivalents to *p = 123 which becomes *dec(p) = 123, etc. My guess is that, because of the Simon cipher, they're allocating full blocks whenever you do the safe version of malloc, and then doing a range check to ensure there is no overrun when storing/incrementing, etc.


This is, from what I gather from the paper, because the pointers are not individually encrypted via cipher. Rather, pointers in the same domain (domains being identified via a tagging scheme) are all displaced together by a common value, which comes from that encryption engine. So imagine that some pointers p and q are actually p + c and q + c under the hood. When the machine churns, c will change, and so all pointers have to be updated. There is some sort of moving barrier which indicates which parts of memory are "clean" (ready for use) and which are still using the old key.
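A toy software model of that displacement scheme, just to make it concrete (hide/reveal/churn are made-up names for illustration; the real hardware does all of this transparently, and presumably far more cleverly):

    #include <stdint.h>
    #include <stdio.h>
    #include <stddef.h>

    static uintptr_t c = 0x1000;                 /* current domain displacement */

    static uintptr_t hide(void *p)        { return (uintptr_t)p + c; }
    static void     *reveal(uintptr_t h)  { return (void *)(h - c); }

    /* "churn": pick a new displacement and re-base every live hidden pointer */
    static void churn(uintptr_t *hidden, size_t n, uintptr_t new_c) {
        for (size_t i = 0; i < n; i++)
            hidden[i] = hidden[i] - c + new_c;
        c = new_c;
    }

    int main(void) {
        int x = 7;
        uintptr_t h[1] = { hide(&x) };           /* the value an attacker could leak */
        churn(h, 1, 0xBEEF);                     /* after a churn, that leak is stale */
        printf("%d\n", *(int *)reveal(h[0]));    /* legit code still prints 7 */
        return 0;
    }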


> you change what it means to add a value to a pointer.

Morpheus encrypts pointers. How does it allow C to access an array or a struct while prohibiting out-of-bounds access to, say, a return address lying somewhere beyond that array or struct, I wonder.


Maybe it uses homomorphic encryption?


That would be a much more impressive breakthrough if they had done it with only 10% overhead.

But it also isn't a good match for the problem they are trying to solve (the attacker is not the CPU in this situation).


Perhaps it could have pointer arithmetic instructions, where "ptr + n" makes code that uses special instructions to decrypt "ptr", add "n", and encrypt the result before giving it back to you?


I don't see how that works. If such an instruction exists, why couldn't an exploit developer use it to defeat pointer encryption?


Homomorphic encryption does that, yes. But in the interview, he talks about it hypothetically.


That’s not really homomorphic encryption, IIUC. In homomorphic encryption, you don’t even need decryption and re-encryption steps.


Normal encryption does that. The entire point of homomorphic encryption is that it does not need a decrypt step.


This reminds me of Ghost in the Shell, the anime. There is a concept of a barrier maze that has to be solved before gaining access. There is also a concept of an attack barrier if one tries to force their way into a system.


>"IEEE Spectrum: So you can detect an attack just by looking at what’s being looked at?

Todd Austin: Yeah. Let me give you a classic example—the return stack. [When a program “calls” a subroutine into action, it stores the address the program should return to when the subroutine is over in the return stack.] This is a real popular place for attackers to manipulate to do what’s called a buffer overflow or a code injection. There’s only one part of the program that writes and reads from the return address, and that’s the beginning of a function and the end of a function. That’s really the semantics of the return address. But when you see a program attacking the return address to do code injection, for example, you’ll see writes from many different places in the program. So, these sort of unexpected, undefined accesses, normal programs never do. That’s a tell-tale sign that there is an attack going on. Now, some systems might try to detect that. And when they detect that, then they can kill the program."

PDS: Observation: An x86 emulator, such as Bochs, could be programmed so that it "knew" where any given program's/process's stack is -- and explicitly looked for, flagged, logged, and alerted on (and possibly stopped and dumped for further analysis) programs/processes whose stacks were accessed/changed in questionable ways...

Also, this leads to my next thought...

Why couldn't all processes' stacks -- have some form of hardware/CPU protection?

I mean, we have Ring 0 for OS code, and Ring 3 for user program code -- this leaves Rings 1 and 2 free for other purposes... well, why not put program/process stacks there, and guard access to them via the OS?

(Now, a better idea would be a custom CPU designed to know about and explicitly protect each process's stack memory -- but short of that, x86 has those extra rings...)

Yes, it would make programs slower.

Possibly much slower, depending on what's being done. And of course, the program wouldn't have immediate access to its stack variables, without OS intervention...

Maybe part of the solution would be in what Forth did a long time ago -- separate the call/return (control) stack from the local variables stack...

Like, maybe if the call/return stack -- is placed in a Ring that the OS controls (requiring an OS syscall -- which could also log/alert in the case of an anomaly), but the data (local variable) stack is still kept in a common area...

Yes, compilers would have to be modified to support that scheme, and yes, it would be slower... but could it work?
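As a toy illustration of the split (purely conceptual -- this still rides on the real CPU call stack underneath, and square/call here are made-up names -- but it shows how return entries and data can live in completely separate places):

    #include <stdio.h>

    #define DEPTH 64

    static int return_stack[DEPTH]; static int rsp;   /* control flow only       */
    static int data_stack[DEPTH];   static int dsp;   /* arguments / locals only */

    /* a tiny Forth-style "word": squares the value on top of the data stack */
    static void square(void) {
        int n = data_stack[--dsp];
        data_stack[dsp++] = n * n;
    }

    /* "call" a word through the explicit return stack instead of the CPU's;
       nothing a word does to the data stack can reach a return entry */
    static void call(void (*word)(void), int return_token) {
        return_stack[rsp++] = return_token;   /* stands in for a pushed EIP/RIP */
        word();
        rsp--;                                /* the "ret": pop the control entry */
    }

    int main(void) {
        data_stack[dsp++] = 5;
        call(square, 1);
        printf("%d\n", data_stack[--dsp]);    /* prints 25 */
        return 0;
    }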

I don't think it would prevent all kinds of attacks (I don't think anything could) -- but it might help mitigate some of the more common stack-based attacks...

Also, the idea of encrypted pointers -- both on the stack and not -- is a good idea.

It would be slow if not implemented in hardware, unless the encryption was a simple XOR with some known-only-at-runtime value (which would be a bit slower than without it) -- but that would be the fastest thing that could implement an approximation of that scheme in software...
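A rough software sketch of that XOR approximation (glibc does something similar with PTR_MANGLE for saved jump buffers; ptr_key, mangle, and demangle here are made-up names):

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static uintptr_t ptr_key;   /* known only at runtime */

    static void *mangle(void *p)   { return (void *)((uintptr_t)p ^ ptr_key); }
    static void *demangle(void *p) { return (void *)((uintptr_t)p ^ ptr_key); }

    int main(void) {
        /* a real implementation would want much better entropy than this */
        srand((unsigned)time(NULL));
        ptr_key = ((uintptr_t)rand() << 16) ^ (uintptr_t)rand();

        int x = 42;
        void *stored = mangle(&x);                 /* what actually sits in memory   */
        printf("%d\n", *(int *)demangle(stored));  /* 42; the raw &x is never stored */
        return 0;
    }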

Anyway, a lot of interesting ideas in this paper...


Most of these already exist

Rings are code privilege levels; they don't "protect" memory, page tables do that. And an implementation to protect the stack has been around for over a decade: DEP.

Call/return stack protection is very new; afaik only current-generation Intel processors have it, under the name CET (Control-flow Enforcement Technology). It works via a processor-level "shadow stack," which is pushed to on calls and popped from on returns, and which throws a fault if the actual stack would return to a different location.
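A rough software model of the shadow-stack idea, for anyone who hasn't seen it (CET does this in hardware with no instrumentation needed; the PROLOGUE/EPILOGUE macros are made up, and the builtin is GCC/Clang-specific):

    #include <stdio.h>
    #include <stdlib.h>

    static void *shadow[1024];   /* a separate copy of return addresses */
    static int   shadow_top;

    #define PROLOGUE() (shadow[shadow_top++] = __builtin_return_address(0))
    #define EPILOGUE() do {                                                \
            if (shadow[--shadow_top] != __builtin_return_address(0)) {    \
                fprintf(stderr, "return address tampered with\n");        \
                abort();           /* CET raises a hardware fault here */ \
            }                                                              \
        } while (0)

    static void victim(void) {
        PROLOGUE();
        /* ...function body; an overflow here could clobber the real return
           address on the stack, but not the shadow copy... */
        EPILOGUE();
    }

    int main(void) {
        victim();
        puts("returned cleanly");
        return 0;
    }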

Pointer encryption is available on ARM as PAC (pointer authentication). Afaik there's no equivalent on x86 processors.

Hardware mitigations are gaining more and more traction, so hopefully all of them will be available across all major architectures within the next decade or so.

EDIT: another huge mitigation that's starting to gain traction is process-level virtualization. Qubes is most known for it, but Windows has been building more support for it (currently only edge and office support it, under the name "Application Guard"). The idea is that if a malicious actor is able to exploit a program, they would also have to break out of the vm the app is in to gain access to the full system, which can be a much more difficult task.


>Most of these already exist

On a general level, I agree.

But, if we get really specific, "The Devil Is In The Details"... let me elaborate...

>Rings are code privilege levels, they don't "protect" memory, page tables do that,

Page tables describe aspects of blocks of memory. If I recall correctly, they describe such things as the physical memory a page is mapped to, whether the page is swapped in or out, and possibly the privilege level of the code that is supposed to be executing there.

But the idea that they "protect" memory is, well, an abstraction at best...

First off, the OS, running at ring 0, has access to all memory. That is because even if a page-based protection scheme is in place via CPU data structures, the OS is typically privileged enough to modify those data structures so that the memory in question can be accessed.

Second, it's an abstraction.

At the CPU level, the CPU knows the privilege level of the code it is running at any time, and it is supposed to (and that's a big "supposed to," given all of the complexity, the microcode, etc.) actually protect that memory under the hood.

For example, there was an old AMD "bug" (and it may have applied to Intel too) where if one of the "string" registers (ESI or EDI, I forget which) was set to a specific "magic" value, then any memory could be accessed regardless of protection level... so much for "page tables protecting memory"... (https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-a...)

Oh, I'm sure it's fixed now... but as the old expression goes, "fool me once, shame on you, fool me twice, shame on me..."

>and the implementation to protect the stack has been around for over a decade - DEP

DEP is basically marking a page in memory as "Non-Execute" (that is, as "Data" as opposed to "Code"). When this is done to the memory where a program's stack is located, then the CPU's instruction pointer (EIP, RIP, etc.) -- cannot be changed to an address there without triggering an exception. In other words, code, should it exist there, cannot be executed.

But... can data there, as data, be changed?

In other words, we're not debating whether DEP prevents the control-flow violation of changing the CPU's instruction pointer to a memory address which is on a program's stack.

We agree that DEP prevents that.
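At the page level, that is roughly all DEP is. A Linux-flavored sketch, nothing Morpheus-specific:

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void) {
        /* a page mapped WITHOUT PROT_EXEC: readable/writable data, like a DEP'd stack */
        unsigned char *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (page == MAP_FAILED)
            return 1;

        memset(page, 0xC3, 4096);                /* writing data is fine (0xC3 is x86 RET) */
        printf("wrote 0x%02x\n", page[0]);

        /* but jumping into it is not -- uncommenting the next line faults under DEP/NX: */
        /* ((void (*)(void))page)(); */

        return munmap(page, 4096);
    }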

But, you see, the way x86 programs typically work, they need to be written with CALL and RET instructions, because that's how higher-level languages compile procedure/function/method calls. CALL pushes the address of the next instruction (the return address, taken from the CPU's instruction pointer, EIP/RIP) onto the stack and changes the instruction pointer to the address of the procedure/function/method being called. Control resumes at the address in memory representing that called procedure/function/method. When it's time to return from it, RET does the opposite -- it pops that return address back into EIP/RIP -- and control resumes where it had left off, that is, just past the CALL in the earlier function that CALL'ed the later one.

Now, what's wrong with that?

Well, on the one hand, nothing, because that's the way x86 programs typically work.

On the other hand, everything (and this is where we have our debate!) -- because

CALL and RET instructions are permitted to change and modify the stack!

Other x86 instructions -- any x86 instruction that accesses memory (as long as it's running in that process) -- are also permitted to change and modify the stack (of that process!)!

That's because the stack -- is memory!

Now, it may very well be that later iterations (and if so, we're talking very late!) of Intel/AMD/? x86 processors spotted this, and restricted by whatever means the use of some of those instructions (but then, I'm not really sure what a program is supposed to do when passing local variables from one function to another, because in order to do so, modifying the memory of the stack is required, because how else do you pass those values?)

You see, it's possible for a buffer overflow attack -- one which gains the ability to execute code locally, in a local process (aka "program") -- to easily modify the stack of that process (because it appears as legitimate code to the CPU!), whether or not DEP is enabled!
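In miniature (deliberately unsafe; a default stack protector will usually abort this, so it typically needs -fno-stack-protector to show the raw behavior):

    #include <stdio.h>
    #include <string.h>

    static void greet(const char *name) {
        char buf[8];
        strcpy(buf, name);   /* no bounds check: anything past 7 characters spills
                                into the saved registers and the saved return
                                address -- which are all just writable data */
        printf("hello %s\n", buf);
    }

    int main(void) {
        greet("this string is far too long for an 8-byte buffer");
        return 0;            /* typically crashes in greet's RET instead */
    }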

What I proposed in my comment, above, is a radical departure from using stacks in the normal sense -- my proposal requires separating the control-flow portion (CALL/RET) of the stack from the data (local variables) portion, and moving the former into OS memory.

That requires an OS specially written for that purpose, and a compiler also specially written for that purpose (basically, CALL/RET or anything stack-control-flow oriented is replaced with OS calls). That makes the code slower, but you gain an increase (although not perfect protection) in security against worms and buffer overflows...

>call/return stack detection is very new, afaik only current generation intel processors have it under the name CET (control-flow enforcement technology).

But, this could be enabled on older processors if the stack is moved to OS control...

>This works by a processor-level "shadow stack," which is pushed to on calls and popped from on returns, and throws a fault if the actual stack is returning to a different location

A "shadow stack" is a good idea. While not perfect, it is a step in the right direction...

>Pointer encryption is available on ARM as PAC (pointer authentication). Afaik there's no equivalent on x86 processors.

I know I said that this was a good idea, and it is -- it's another step in the right direction -- but that being said, I wouldn't trust it (at least, not 100%), simply because hardware does not have the same transparency as easily readable source code... On an FPGA Soft CPU with no hidden cores, maybe (again, if the user has the HDL (VHDL/Verilog) code for the Soft CPU)...

>Hardware mitigations are gaining more and more traction, so hopefully all of them will be available across all major architectures within the next decade or so.

I assume zero-trust for anything I can't see, I can't modify, that I don't know the full workings of...

>EDIT: another huge mitigation that's starting to gain traction is process-level virtualization. Qubes is most known for it, but Windows has been building more support for it (currently only edge and office support it, under the name "Application Guard").

I do not trust Windows. Or Linux either for that matter. Or Qubes, or Virtual Machines... or any so-called "secure" OS.

I trust the vendor who says: "This is the worst written OS from a security perspective, and your best bet is not trusting us" -- because at least if they say that, they are being honest...

And that "vendor" -- is usually a simple, hobbyist-level, not-overly complex, well documented open source OS, which still might (and probably does!) have any number of security issues!

With respect to Qubes:

As soon as you are running on a CPU that supports nested page tables (nested virtualization) -- all bets are off, because your OS, running inside of your VM -- might be running on another VM, which might be running on another VM, which might be running on another VM, practically ad-infinitum...

You sure there isn't another VM -- a VM that you don't control -- a VM that you don't know about -- that your highest-level VM is running in?

Because I'm not...

The distinction between "VM" and "bare metal" and "VM that you don't know about appearing as bare metal" -- is starting to blur these days... read all you can about Rootkits -- they are the entry-level understanding to this...

Oh yes, anything that you or I have said about page tables or OS control or protection -- goes right out the window when a higher-level VM is present and operating... that's because higher-level VM's have control over everything,

including the page tables of lower-level VM's and OS's that they run...

What's the old expression?

"Eternal vigilance is the price of liberty?"

>The idea is that if a malicious actor is able to exploit a program, they would also have to break out of the vm the app is in to gain access to the full system, which can be a much more difficult task.

Do me a favor... Google "VM Escapes" -- and then spend about a year and a half reading academic papers about them...

In fact, don't Google it... go to Millionshort and search for it there (also, for future readers 3+ years in the future, don't go to Millionshort -- find your own not-yet-superpopular up and coming search engine, and search on it -- since all of them seem to get censored after a certain number of years -- but you can go to Millionshort on or around the date of this message, 2021/04/17)

https://millionshort.com/

Don't forget to remove the first, oh, hundred thousand or so search results! <g>

Also, if any of this offends you, well, that wasn't the intention...

I only pass what I believe to be truthful information that is coalesced from many sources, edited into an easily readable format...

If you dislike anything, you dislike the information, not the conduit...


Isn't this sort of what OpenBSD(?) does, masking return/entry pointers with additional bits that only the kernel knows? (My memory of the details is really vague, sorry.)



