Keep in mind: in a variable-length instruction set like x86, you can jump into the middle of an instruction and have it interpreted as another instruction. If you have
mov $0x123, %eax
you can think of it as a sequence of bytes, which (oversimplified) encodes to something like:
b8 23 01 00 00
If you jump to address 2, you're in the middle of your constant. If that constant happens to be a malicious opcode, you're screwed.
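A toy decoder makes the overlap concrete. Assuming the example is `mov $0x123, %eax` (which encodes as `b8 23 01 00 00`), the `01` byte inside the immediate is itself a valid `add` opcode. This sketch handles only those two opcodes, nothing like a real disassembler:

```python
CODE = bytes([0xB8, 0x23, 0x01, 0x00, 0x00])  # mov $0x123, %eax

def decode_one(buf, off):
    """Decode one instruction at `off`; return (mnemonic, length in bytes)."""
    op = buf[off]
    if op == 0xB8:  # mov eax, imm32
        imm = int.from_bytes(buf[off + 1:off + 5], "little")
        return (f"mov eax, {imm:#x}", 5)
    if op == 0x01:  # add r/m32, r32 (a ModRM byte follows)
        return ("add r/m32, r32", 2)
    raise ValueError(f"opcode {op:#x} not in this toy decoder")

print(decode_one(CODE, 0))  # ('mov eax, 0x123', 5) -- the intended instruction
print(decode_one(CODE, 2))  # ('add r/m32, r32', 2) -- jumped into the immediate
```

Same bytes, two completely different instruction streams depending on where the program counter lands.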
So you can't just decode instructions starting from the program entry point to find out whether the program could execute a malicious opcode. You need to find all possible paths, including all computed jumps, to make that decision.
It's particularly bad with x86 due to the enormous variety of encodings and prefixes the architecture has accumulated over the last half-century. For example, the two instructions:
Even better, there are multiple encodings of those instructions. And even better than that, there are Turing-complete subsets comprised only of printable ASCII characters. And even better than that, there are subsets of the x86 instruction set that can masquerade as readable English text: https://news.ycombinator.com/item?id=16312317
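To see why printable-ASCII shellcode is even possible: a number of useful single-byte x86 opcodes happen to fall in the printable ASCII range. A few real examples from the one-byte opcode map:

```python
# x86 opcodes whose byte values are printable ASCII characters.
# Chaining these (push/pop plus and/sub/xor on eax) is the classic
# basis of printable-ASCII shellcode.
PRINTABLE_OPCODES = {
    "P": "push eax",        # 0x50
    "X": "pop eax",         # 0x58
    "h": "push imm32",      # 0x68 (the imm32 can itself be printable)
    "j": "push imm8",       # 0x6a
    "%": "and eax, imm32",  # 0x25
    "-": "sub eax, imm32",  # 0x2d
    "5": "xor eax, imm32",  # 0x35
}

for ch, insn in PRINTABLE_OPCODES.items():
    print(f"{ch!r} = {ord(ch):#04x} -> {insn}")
```

So a string that looks like line noise in a text editor can be a perfectly valid instruction stream.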
If your program counter ever diverges off to start executing unsanitized user data on x86, you are potentially pwn'd.
What about CET and the ENDBR64/32 instructions? Isn’t there a “shadow stack” as well? Not sure if this is possible with a modern x86 OS, but it also wouldn’t surprise me if there were workarounds either.
> The ENDBRANCH (see Section 7.3 for details) is a new instruction that is used to mark valid jump target addresses of indirect calls and jumps in the program. This instruction opcode is selected to be one that is a NOP on legacy machines such that programs compiled with ENDBRANCH new instruction continue to function on old machines without the CET enforcement. On processors that support CET the ENDBRANCH is still a NOP and is primarily used as a marker instruction by the processor pipeline to detect control flow violations.
> The CPU implements a state machine that tracks indirect jmp and call instructions. When one of these instructions is seen, the state machine moves from IDLE to WAIT_FOR_ENDBRANCH state. In WAIT_FOR_ENDBRANCH state the next instruction in the program stream must be an ENDBRANCH. If an ENDBRANCH is not seen the processor causes a control protection exception (#CP), else the state machine moves back to IDLE state.
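The quoted state machine is simple enough to sketch. A minimal model in Python, with instructions represented as mnemonic strings (the real tracker of course lives in the pipeline, not in software):

```python
# Model of the CET indirect-branch tracker described above: after an
# indirect jmp/call, the very next instruction must be ENDBRANCH, or
# the CPU raises a control-protection exception (#CP).

IDLE, WAIT_FOR_ENDBRANCH = "IDLE", "WAIT_FOR_ENDBRANCH"

class ControlProtectionException(Exception):
    """Models the #CP fault."""

def run_tracker(instructions):
    state = IDLE
    for insn in instructions:
        if state == WAIT_FOR_ENDBRANCH:
            if insn != "endbr64":
                raise ControlProtectionException(f"#CP before {insn!r}")
            state = IDLE
        elif insn in ("jmp *%rax", "call *%rax"):  # indirect transfers
            state = WAIT_FOR_ENDBRANCH
    return state

run_tracker(["call *%rax", "endbr64", "ret"])  # landing pad present: fine
try:
    run_tracker(["call *%rax", "nop", "ret"])  # missing ENDBRANCH
except ControlProtectionException as e:
    print("faulted:", e)
```

The point being: an attacker who redirects an indirect branch into the middle of an instruction won't find an `endbr64` there, so the jump faults instead of executing the hidden opcode.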
Random comment that has a bunch of semi-interesting info in it: https://news.ycombinator.com/item?id=26061230 (the bit about the lack of Linux support is potentially(?) out of date)
I watched a super interesting Black Hat video on YouTube about discovering secret instructions on CPUs by iterating over the bits of candidate opcodes until you get an illegal instruction, thereby discovering whether an opcode is valid or not.
He set up a room full of PCs running his code and had hardware to auto-reset them when they crashed.
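The search itself is conceptually simple. Here's a toy single-byte version where the validity oracle is a stub over a made-up ISA; the real tool executed each candidate on live hardware and caught the resulting #UD (invalid opcode) fault, hence the auto-reset rigs:

```python
# Toy "secret instruction" sweep. KNOWN_VALID stands in for the
# documented opcode map; probe() stands in for executing the byte on
# real silicon and trapping the illegal-instruction fault.

KNOWN_VALID = {0x90, 0xC3, 0xB8}  # toy documented set: nop, ret, mov

def probe(opcode):
    # Stand-in oracle: pretend 0x0F executes fine despite being
    # absent from our toy documentation.
    return opcode in KNOWN_VALID or opcode == 0x0F

def find_undocumented():
    """Flag opcodes the CPU accepts but the documentation doesn't list."""
    return [op for op in range(256) if probe(op) and op not in KNOWN_VALID]

print([hex(op) for op in find_undocumented()])  # prints ['0xf']
```

The real search space is far larger (instructions are up to 15 bytes), which is why the actual tool used a depth-first "tunneling" strategy rather than brute force.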
That sounds like one of those tasks that's isomorphic to the halting problem. Replace the malicious opcode sequence with "HALT" and your static analyzer can determine whether an arbitrary program halts, something known to be impossible.
This is a really good distinction to make - people use the halting problem as an excuse to avoid doing any sort of analysis because it can't work in general and give a fully definitive answer - but most analysis solutions don't need to be fully general or fully definitive in order to be valuable.
Yes, but deciding whether any program can halt is exactly what's called for here. You're operating on x86 machine code, which is Turing-complete, and you don't have the option of restricting to a non-Turing-complete subset of x86 machine code without significantly shrinking your addressable market. (There's already been significant research on creating provably-secure or provably-terminating instruction sets, with some success in specific problem domains, but they tend to fail with general purpose personal computing because consumers won't use your product if it means SimCity won't work.)
Your argument can be seen to be invalid because it would apply equally to any Turing-complete architecture, when in fact the problem is more architecture-specific than that.
In an architecture that used fixed-length instructions and required the program counter to be aligned on them -- or had some other mechanism that disallowed overlapping instructions -- it would be easy to scan for an opcode like this. (Assuming you can distinguish the instructions from the data segment, anyway.) Such a solution doesn't run into any problems with the halting problem or Rice's theorem because it's not attempting to tell you with certainty whether the program will hit that opcode; it just gives you a one-sided test that says, aha, it could hit that opcode (because it's present) and therefore the program is suspicious. A negative result guarantees it won't hit the opcode, a positive result doesn't guarantee anything.
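A sketch of that one-sided test, assuming a made-up fixed-length ISA with 4-byte aligned instructions and a hypothetical forbidden encoding:

```python
# One-sided scan for a fixed-length ISA: with 4-byte instructions and
# an aligned PC, every instruction boundary is known in advance, so a
# stride-4 pass over the text segment covers all reachable decodings.

FORBIDDEN = 0xDEADC0DE  # hypothetical "malicious" instruction word

def may_hit_forbidden(text_segment: bytes) -> bool:
    """True if the forbidden word sits at any instruction boundary.
    False is a guarantee; True only means 'suspicious'."""
    for off in range(0, len(text_segment) - 3, 4):
        word = int.from_bytes(text_segment[off:off + 4], "little")
        if word == FORBIDDEN:
            return True
    return False

clean = (0x90909090).to_bytes(4, "little") * 4
dirty = clean + FORBIDDEN.to_bytes(4, "little")
print(may_hit_forbidden(clean), may_hit_forbidden(dirty))  # False True
```

Note the asymmetry: the scan never misses the opcode if it's present at a boundary, but a hit says nothing about whether that instruction is reachable.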
It's only x86's allowing of overlapping instructions that makes this problem hard. I mean, notionally you could scan for such an opcode at every offset, it's just that the false positive rate might be unacceptably high. (Whereas in the fixed-length instruction case, it wouldn't be, because a program using one of these opcodes at all truly is suspicious, regardless of whether it's hit. Although, again, I'm assuming you can somehow filter out the data segment; if you can't, then we have more of a problem.)
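The every-offset variant is just a substring search, and the false-positive problem shows up immediately when the forbidden bytes appear inside an immediate. Using `ud2`'s encoding `0f 0b` as the stand-in "bad" opcode:

```python
# Every-offset scan for a variable-length ISA: sound ("definitely
# absent" is trustworthy) but imprecise, since data bytes inside
# instructions trigger hits too.

FORBIDDEN = bytes([0x0F, 0x0B])  # ud2, standing in for a "bad" opcode

def may_hit_forbidden(code: bytes) -> bool:
    return FORBIDDEN in code  # checks every byte offset

# mov $0xb0f, %eax -- the immediate happens to contain the forbidden
# pair, even though it is never executed as an instruction.
innocuous = bytes([0xB8, 0x0F, 0x0B, 0x00, 0x00])
print(may_hit_forbidden(innocuous))  # True: a false positive
```

On a real binary, two-byte patterns like this would light up constantly in immediates, displacements, and data, which is exactly the unacceptable false-positive rate described above.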
Oh, wait -- there's a hole in my claim above. Self-modifying code. Well, it works if we also disallow self-modifying code; this still leaves things as Turing-complete and so still shows that your argument cannot be valid.
It's possible to prove that a given x86 binary does not have malicious opcodes. Trivial examples would be a zero-length program, or one containing only NOP instructions. You just scan the program and if the byte sequence representing malicious opcodes never appears, you know that you're in the clear.
That's not the problem being posed here, though. To be useful as a security feature, it needs to be able to work on arbitrary x86 programs, i.e. ones that can contain potentially malicious opcode sequences. And many of these programs will not in fact be malicious, i.e. they never jump into the middle of some other legit opcode to have it interpreted differently. A one-sided test that tells you it could hit that opcode isn't useful if it flags every useful program on the computer.
Yes, I don't disagree. My point is that your argument above for that claim is invalid, as your argument is based on purely the Turing-complete nature of x86, while other Turing-complete architectures do not have this problem.
I know that in the past legitimate programs did obfuscations like this for anti-piracy reasons. But do they still? Or can it nowadays be considered a red flag?
These instructions only work if the CPU is in a particular, unusual state ("Red Unlocked"); even then, they require a value to be written to a MSR, which is a privileged operation. It's highly unlikely that there is any way to activate this functionality from userspace. Indeed, it's not clear that it's possible to activate without physical intervention (e.g. JTAG access).
But in the paper they explicitly test the possibility that the undocumented instructions could be speculatively executed and have visible side effects.
They found that the instructions are actually executed, but they couldn't jump to arbitrary microcode (a side effect of one of the undocumented instructions) because of a single lfence ucode op. They proved that, at least on the Atom processor they could unlock, the speculative execution of the undocumented instruction calculated an offset and placed it in internal registers even though the instruction was executed from the "green" (normal) operating mode.
No way - x86 has no instruction alignment requirements, so the same bytes could be an immediate in some innocuous instruction. Also, the same bytes can be executed in two different ways on x86, and this is sometimes used deliberately. Plus, code can be created dynamically on almost all platforms except iOS.