Keep in mind: in a variable-length instruction set like x86, you can jump into the middle of an instruction and have it interpreted as another instruction. If you have
mov $0x123, %eax
you can think of it as a sequence of bytes, which (oversimplified) encodes to something like:
b8 23 01 00 00
If you jump to address 2, you're in the middle of your constant. If that constant happens to be a malicious opcode, you're screwed.
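A toy decoder makes the overlap concrete. Assuming the example is `mov $0x123, %eax` (which encodes as `b8 23 01 00 00`), the `01` byte inside the immediate is itself a valid `add` opcode. This sketch handles only those two opcodes, nothing like a real disassembler:

```python
CODE = bytes([0xB8, 0x23, 0x01, 0x00, 0x00])  # mov $0x123, %eax

def decode_one(buf, off):
    """Decode one instruction at `off`; return (mnemonic, length in bytes)."""
    op = buf[off]
    if op == 0xB8:  # mov eax, imm32
        imm = int.from_bytes(buf[off + 1:off + 5], "little")
        return (f"mov eax, {imm:#x}", 5)
    if op == 0x01:  # add r/m32, r32 (a ModRM byte follows)
        return ("add r/m32, r32", 2)
    raise ValueError(f"opcode {op:#x} not in this toy decoder")

print(decode_one(CODE, 0))  # ('mov eax, 0x123', 5) -- the intended instruction
print(decode_one(CODE, 2))  # ('add r/m32, r32', 2) -- jumped into the immediate
```

Same bytes, two completely different instruction streams depending on where the program counter lands.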
So you can't just decode instructions starting from the program entry point to find out whether the program could execute a malicious opcode. You need to find all possible paths, including all computed jumps, to make that decision.
It's particularly bad with x86 due to the enormous variety of encodings and prefixes the architecture has accumulated over the last half-century. For example, the two instructions:
Even better, there are multiple encodings of those instructions. And even better than that, there are Turing-complete subsets comprised only of printable ASCII characters. And even better than that, there are subsets of the x86 instruction set that can masquerade as readable English text: https://news.ycombinator.com/item?id=16312317
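To see why printable-ASCII shellcode is even possible: a number of useful single-byte x86 opcodes happen to fall in the printable ASCII range. A few real examples from the one-byte opcode map:

```python
# x86 opcodes whose byte values are printable ASCII characters.
# Chaining these (push/pop plus and/sub/xor on eax) is the classic
# basis of printable-ASCII shellcode.
PRINTABLE_OPCODES = {
    "P": "push eax",        # 0x50
    "X": "pop eax",         # 0x58
    "h": "push imm32",      # 0x68 (the imm32 can itself be printable)
    "j": "push imm8",       # 0x6a
    "%": "and eax, imm32",  # 0x25
    "-": "sub eax, imm32",  # 0x2d
    "5": "xor eax, imm32",  # 0x35
}

for ch, insn in PRINTABLE_OPCODES.items():
    print(f"{ch!r} = {ord(ch):#04x} -> {insn}")
```

So a string that looks like line noise in a text editor can be a perfectly valid instruction stream.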
If your program counter ever diverges off to start executing unsanitized user data on x86, you are potentially pwn'd.
What about CET and the ENDBR64/32 instructions? Isn’t there a “shadow stack” as well? Not sure if this is possible with a modern x86 OS, but it also wouldn’t surprise me if there were workarounds either.
> The ENDBRANCH (see Section 7.3 for details) is a new instruction that is used to mark valid jump target addresses of indirect calls and jumps in the program. This instruction opcode is selected to be one that is a NOP on legacy machines such that programs compiled with ENDBRANCH new instruction continue to function on old machines without the CET enforcement. On processors that support CET the ENDBRANCH is still a NOP and is primarily used as a marker instruction by the processor pipeline to detect control flow violations.
> The CPU implements a state machine that tracks indirect jmp and call instructions. When one of these instructions is seen, the state machine moves from IDLE to WAIT_FOR_ENDBRANCH state. In WAIT_FOR_ENDBRANCH state the next instruction in the program stream must be an ENDBRANCH. If an ENDBRANCH is not seen the processor causes a control protection exception (#CP), else the state machine moves back to IDLE state.
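The quoted state machine is simple enough to sketch. A minimal model in Python, with instructions represented as mnemonic strings (the real tracker of course lives in the pipeline, not in software):

```python
# Model of the CET indirect-branch tracker described above: after an
# indirect jmp/call, the very next instruction must be ENDBRANCH, or
# the CPU raises a control-protection exception (#CP).

IDLE, WAIT_FOR_ENDBRANCH = "IDLE", "WAIT_FOR_ENDBRANCH"

class ControlProtectionException(Exception):
    """Models the #CP fault."""

def run_tracker(instructions):
    state = IDLE
    for insn in instructions:
        if state == WAIT_FOR_ENDBRANCH:
            if insn != "endbr64":
                raise ControlProtectionException(f"#CP before {insn!r}")
            state = IDLE
        elif insn in ("jmp *%rax", "call *%rax"):  # indirect transfers
            state = WAIT_FOR_ENDBRANCH
    return state

run_tracker(["call *%rax", "endbr64", "ret"])  # landing pad present: fine
try:
    run_tracker(["call *%rax", "nop", "ret"])  # missing ENDBRANCH
except ControlProtectionException as e:
    print("faulted:", e)
```

The point being: an attacker who redirects an indirect branch into the middle of an instruction won't find an `endbr64` there, so the jump faults instead of executing the hidden opcode.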
Random comment that has a bunch of semi-interesting info in it: https://news.ycombinator.com/item?id=26061230 (the bit about the lack of Linux support is potentially(?) out of date)
I watched a super interesting Black Hat video on YouTube about discovering secret instructions on CPUs by iterating over the bits of candidate opcodes until you get an illegal instruction, thereby discovering whether an opcode is valid or not.
He set up a room full of PCs running his code and had hardware to auto-reset them when they crashed.
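The search itself is conceptually simple. Here's a toy single-byte version where the validity oracle is a stub over a made-up ISA; the real tool executed each candidate on live hardware and caught the resulting #UD (invalid opcode) fault, hence the auto-reset rigs:

```python
# Toy "secret instruction" sweep. KNOWN_VALID stands in for the
# documented opcode map; probe() stands in for executing the byte on
# real silicon and trapping the illegal-instruction fault.

KNOWN_VALID = {0x90, 0xC3, 0xB8}  # toy documented set: nop, ret, mov

def probe(opcode):
    # Stand-in oracle: pretend 0x0F executes fine despite being
    # absent from our toy documentation.
    return opcode in KNOWN_VALID or opcode == 0x0F

def find_undocumented():
    """Flag opcodes the CPU accepts but the documentation doesn't list."""
    return [op for op in range(256) if probe(op) and op not in KNOWN_VALID]

print([hex(op) for op in find_undocumented()])  # prints ['0xf']
```

The real search space is far larger (instructions are up to 15 bytes), which is why the actual tool used a depth-first "tunneling" strategy rather than brute force.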
That sounds like one of those tasks that's isomorphic to the halting problem. Replace the malicious opcode sequence with "HALT" and your static analyzer can determine whether an arbitrary program halts, something known to be impossible.
This is a really good distinction to make - people use the halting problem as an excuse to avoid doing any sort of analysis because it can't work in general and give a fully definitive answer - but most analysis solutions don't need to be fully general or fully definitive in order to be valuable.
Yes, but deciding whether any program can halt is exactly what's called for here. You're operating on x86 machine code, which is Turing-complete, and you don't have the option of restricting to a non-Turing-complete subset of x86 machine code without significantly shrinking your addressable market. (There's already been significant research on creating provably-secure or provably-terminating instruction sets, with some success in specific problem domains, but they tend to fail with general purpose personal computing because consumers won't use your product if it means SimCity won't work.)
Your argument can be seen to be invalid because it would apply equally to any Turing-complete architecture, when in fact the problem is more architecture-specific than that.
In an architecture that used fixed-length instructions and required the program counter to be aligned on them -- or had some other mechanism that disallowed overlapping instructions -- it would be easy to scan for an opcode like this. (Assuming you can distinguish the instructions from the data segment, anyway.) Such a solution doesn't run into any problems with the halting problem or Rice's theorem because it's not attempting to tell you with certainty whether the program will hit that opcode; it just gives you a one-sided test that says, aha, it could hit that opcode (because it's present) and therefore the program is suspicious. A negative result guarantees it won't hit the opcode, a positive result doesn't guarantee anything.
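A sketch of that one-sided test, assuming a made-up fixed-length ISA with 4-byte aligned instructions and a hypothetical forbidden encoding:

```python
# One-sided scan for a fixed-length ISA: with 4-byte instructions and
# an aligned PC, every instruction boundary is known in advance, so a
# stride-4 pass over the text segment covers all reachable decodings.

FORBIDDEN = 0xDEADC0DE  # hypothetical "malicious" instruction word

def may_hit_forbidden(text_segment: bytes) -> bool:
    """True if the forbidden word sits at any instruction boundary.
    False is a guarantee; True only means 'suspicious'."""
    for off in range(0, len(text_segment) - 3, 4):
        word = int.from_bytes(text_segment[off:off + 4], "little")
        if word == FORBIDDEN:
            return True
    return False

clean = (0x90909090).to_bytes(4, "little") * 4
dirty = clean + FORBIDDEN.to_bytes(4, "little")
print(may_hit_forbidden(clean), may_hit_forbidden(dirty))  # False True
```

Note the asymmetry: the scan never misses the opcode if it's present at a boundary, but a hit says nothing about whether that instruction is reachable.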
It's only x86's allowing of overlapping instructions that makes this problem hard. I mean, notionally you could scan for such an opcode at every offset, it's just that the false positive rate might be unacceptably high. (Whereas in the fixed-length instruction case, it wouldn't be, because a program using one of these opcodes at all truly is suspicious, regardless of whether it's hit. Although, again, I'm assuming you can somehow filter out the data segment; if you can't, then we have more of a problem.)
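The every-offset variant is just a substring search, and the false-positive problem shows up immediately when the forbidden bytes appear inside an immediate. Using `ud2`'s encoding `0f 0b` as the stand-in "bad" opcode:

```python
# Every-offset scan for a variable-length ISA: sound ("definitely
# absent" is trustworthy) but imprecise, since data bytes inside
# instructions trigger hits too.

FORBIDDEN = bytes([0x0F, 0x0B])  # ud2, standing in for a "bad" opcode

def may_hit_forbidden(code: bytes) -> bool:
    return FORBIDDEN in code  # checks every byte offset

# mov $0xb0f, %eax -- the immediate happens to contain the forbidden
# pair, even though it is never executed as an instruction.
innocuous = bytes([0xB8, 0x0F, 0x0B, 0x00, 0x00])
print(may_hit_forbidden(innocuous))  # True: a false positive
```

On a real binary, two-byte patterns like this would light up constantly in immediates, displacements, and data, which is exactly the unacceptable false-positive rate described above.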
Oh, wait -- there's a hole in my claim above. Self-modifying code. Well, it works if we also disallow self-modifying code; this still leaves things as Turing-complete and so still shows that your argument cannot be valid.
It's possible to prove that a given x86 binary does not have malicious opcodes. Trivial examples would be a zero-length program, or one containing only NOP instructions. You just scan the program and if the byte sequence representing malicious opcodes never appears, you know that you're in the clear.
That's not the problem being posed here, though. To be useful as a security feature, it needs to be able to work on arbitrary x86 programs, i.e. ones that can contain potentially malicious opcode sequences. And many of these programs will not in fact be malicious, i.e. they never jump into the middle of some other legit opcode to have it interpreted differently. A one-sided test that tells you it could hit that opcode isn't useful if it flags every useful program on the computer.
Yes, I don't disagree. My point is that your argument above for that claim is invalid, as your argument is based on purely the Turing-complete nature of x86, while other Turing-complete architectures do not have this problem.
I know that in the past legitimate programs did obfuscations like this for anti-piracy reasons. But do they still? Or can it nowadays be considered a red flag?
These instructions only work if the CPU is in a particular, unusual state ("Red Unlocked"); even then, they require a value to be written to a MSR, which is a privileged operation. It's highly unlikely that there is any way to activate this functionality from userspace. Indeed, it's not clear that it's possible to activate without physical intervention (e.g. JTAG access).
But in the paper they explicitly test the possibility that the undocumented instructions could be speculatively executed and have visible side effects.
They found that the instructions are actually executed, but they couldn't jump to arbitrary microcode (a side effect of one of the undocumented instructions) because of a single lfence ucode op. They proved that, at least on the Atom processor they could unlock, the speculative execution of the undocumented instruction calculated an offset and placed it in internal registers even though the instruction was executed from the "green" (normal) operating mode.
No way - x86 has no instruction alignment requirements, so the same bytes could be an immediate in some innocuous instruction. Also, the same bytes can be executed in two different ways on x86, and this is sometimes used deliberately. Plus, code can be created dynamically on almost all platforms except iOS.