Hacker News
Undocumented x86 instructions to control the CPU at the microarchitecture level [pdf] (githubusercontent.com)
144 points by ingve 23 days ago | 45 comments



Undocumented x86 instructions to control the CPU at the microarchitecture level [pdf] - https://news.ycombinator.com/item?id=27764806 - July 2021 (27 comments)


(Intel maintains a database with passwords of all manufactured chips.)

I find this... somewhat hard to believe, both because Intel's volumes are huge and it would mean something like the original Pentium III Processor Serial Number came back to haunt us.

There was recently a leak of Intel internal information, but based on the contents of this article, it seems that was really mundane (schematics, etc. --- which any companies buying and using their chips would easily have access to) in relation to the true depth of the secrets that lie within these processors... part of me really wishes there was far more that got leaked, because besides the security aspect, and the despicable practice of companies hiding information on devices from their actual owners and locking them out, this stuff is just intensely curiosity-satiating.

however, the existence of such instructions poses a security threat since there is publicly available PoC code [9] that can activate the Red Unlock mode on one of the modern Intel platforms.

Security threat, or maybe path to freedom...?

As an aside, the VAX microcode (and hardware schematics) can be easily found, for those interested in this stuff and looking for something less secretive.


Every security threat is a path to freedom: Security means you have a treasure, an attacker and a defender. In this case, the treasure is data inside the processor, the attacker is the current owner, the defender is a vague group of Hollywood, previous owners, Intel, governments, etc...

Freedom means the owner (laptop thief) can steal the data from the previous owner (victim). Freedom means the owner can steal movies from Hollywood. Freedom means Linus could create an OS without the blessing of Microsoft, etc... The problem is it is everything or nothing. So if you're a victim of theft, you want security. If you're pirating, you want freedom. If you want to clone the DVD your toddler keeps watching and destroying, you also want freedom. If you want to fully use all capabilities of your device, you want freedom.


This is an incredibly one dimensional view. There is "owner" in the sense of "who currently physically possesses it" and "owner" as in "rightfully, legally belonging to regardless of current physical location".

Freedom in the OP context means we actually own the things we buy rather than existing in a total corporate oligarchy where we only ever lease the temporary right to use a product but never actually own it in the legal sense.


You're correct there. The problem is that technology can't think in shades of gray. A TPM can only have an incredibly one dimensional view.

Take your own example of "owner" as in "rightfully, legally". There can be a huge difference between rightfully and legally. Think about all the things western nations took away legally from 'their' colonies. Or the completely legal civil forfeiture procedures in the US. Plenty of multidimensional trouble just by putting the word rightful after legal.

I want to actually own my hardware, hack my hardware, and I would love to see things like TPMs and DRM and locked phones banished from this planet, even if this encourages some people to steal hardware. But plenty of corporations love BitLocker and love to trade away the useless hackability of hardware. And after talking for a few scary minutes with a movie exec, I got the impression they see computers as nothing more than theft machines and the current copyright enforcement mechanisms as way too inadequate.


>As an aside, the VAX microcode (and hardware schematics) can be easily found, for those interested in this stuff and looking for something less secretive.

http://www.bitsavers.org/pdf/dec/vax/780/fiche/

http://bitsavers.trailing-edge.com/pdf/dec/vax/780/fiche/

Filenames for general search (in case the above links aren't found in the future):

"EP-ES0AA-DL-124_1of6_780uCode_Jan82.pdf" to

"EP-ES0AA-DL-124_6of6_780uCode_Jan82.pdf" (There are 6 of them, matching the same 'Xof6' pattern, in case that wasn't obvious...)

Related:

"First new VAX in 30 years?"

https://news.ycombinator.com/item?id=27758962

http://mail-index.netbsd.org/port-vax/2021/07/03/msg003899.h...


In case anybody doesn't feel comfortable downloading a PDF within the context of the title :)

https://github.com/chip-red-pill/udbgInstr/blob/main/paper/u...


Turns out that opening it in the browser downloads it too.


You can't view it if you don't have the data to show it!


Interesting, the conclusion only speculates that this can have security consequences. Could anyone with a better understanding provide an example of what would be possible to do using these instructions?


It should be possible to see if a given "secure" binary makes use of a given microcode instruction. If that is the case, it could be possible to use udbgrd/udbgwr to modify the microcode instruction to do something else, e.g. to store some register data in the URAM or the SRAM. By doing this you should be able to "spy" the data the "secure" binary is processing.


anything! if you can get them to execute. the authors found one way to enable them, but others may exist (we do not know). maybe another secret instruction enables these also. we do not know.



Should we be able to scan binaries for these instructions? I guess an exploit should be able to call these instructions.


Keep in mind: in a variable length instruction set like x86, you can jump into the middle of an instruction, and have it interpreted as another instruction. If you have

    mov $0x123, %eax
you can think of it as a sequence of bytes, which encodes to:

    0: 0xb8 # opcode for mov of a 32-bit constant into %eax
    1: 0x23 0x01 0x00 0x00 # constant, little-endian
if you jump to address 1, you land on the first byte of your constant. If that constant happens to be a malicious opcode, you're screwed.

So you can't just decode instructions starting from the program entry to find out if an opcode is trying to execute malicious code. You need to find all possible paths, including all computed jumps, to make that decision.
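A rough Python sketch of that point, not tied to any real tool: it builds the actual five-byte encoding of a 32-bit `mov` into `eax` (opcode `0xB8` followed by a little-endian immediate) and shows that a byte pattern chosen by whoever controls the constant sits at an offset that a linear decode from the entry point never treats as an instruction start. `PATTERN` is a made-up placeholder rather than a real opcode of interest, and `encode_mov_eax_imm32` and `find_offsets` are hypothetical helper names.

```python
import struct
from typing import List

# Placeholder bytes standing in for "an opcode you would not want executed".
# This value is an assumption for illustration only.
PATTERN = bytes([0xAA, 0xBB])


def encode_mov_eax_imm32(imm: int) -> bytes:
    """Encode `mov eax, imm32`: opcode 0xB8, then the constant in little-endian order."""
    return bytes([0xB8]) + struct.pack("<I", imm)


def find_offsets(code: bytes, pattern: bytes) -> List[int]:
    """Return every byte offset where `pattern` occurs, ignoring instruction boundaries."""
    return [i for i in range(len(code) - len(pattern) + 1)
            if code[i:i + len(pattern)] == pattern]


# Whoever controls the constant picks one whose low two bytes are the pattern.
code = encode_mov_eax_imm32(0x0000BBAA)
instruction_starts = [0]              # a linear decode from the entry sees one instruction
hits = find_offsets(code, PATTERN)    # the pattern begins inside the immediate
hidden = [h for h in hits if h not in instruction_starts]
```

Here `hits` is `[1]`, so the pattern is only reachable by jumping into the instruction, which is exactly the case a decode-from-entry scan misses.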


It's particularly bad with x86 due to the enormous variety of encodings and prefixes accumulated over more than four decades in the architecture. For example, the two instructions:

  mov rax, [table + rax*8] 
  mov [table + rax*8], rax
plus some lookup tables make a Turing-complete subset: https://drwho.virtadpt.net/files/mov.pdf (With jumping via conditional faulting done with an invalid MOV, of course.) There are many other subsets. And even C compilers targeting them: https://github.com/xoreaxeaxeax/movfuscator
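To give a feel for why loads and stores are enough, here is a small Python sketch of the comparison trick as I understand it from the linked mov paper: two stores and one load through computed addresses stand in for an equality test, and a table lookup stands in for a branch. Memory is modelled as a plain list, and `mov_equal` and `mov_select` are hypothetical names chosen for this illustration.

```python
def mov_equal(a: int, b: int, memory_size: int = 256) -> int:
    """Return 1 if a == b else 0, using only indexed stores and loads.

    Mirrors the three-instruction sequence:
        mov [a], 0
        mov [b], 1
        mov result, [a]
    If a and b name the same cell, the second store overwrites the first.
    """
    memory = [0] * memory_size
    memory[a] = 0        # mov [a], 0
    memory[b] = 1        # mov [b], 1
    return memory[a]     # mov result, [a]


def mov_select(flag: int, if_zero: int, if_one: int) -> int:
    """Pick one of two values with a table lookup instead of a branch."""
    table = [if_zero, if_one]   # two adjacent cells
    return table[flag]          # an indexed load, like mov rax, [table + rax*8]


# "if 3 == 3 then 20 else 10", with no comparison or branch in the helpers
result = mov_select(mov_equal(3, 3), 10, 20)
```

The arguments must be valid cell indices (below `memory_size`), the same way the real trick needs the values to be valid addresses.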

Even better, there are multiple encodings of those instructions. And even better than that, there are Turing-complete subsets composed only of printable ASCII characters. And even better than that, there are subsets of the x86 instruction set that can masquerade as readable English text: https://news.ycombinator.com/item?id=16312317

If your program counter ever diverges off to start executing unsanitized user data on x86, you are potentially pwn'd.


That last one (Tom 7's executable x86 paper) is incomplete without the video where he demonstrates running the paper (or running the paper yourself).

https://www.youtube.com/watch?v=LA_DrBwkiJA


The chance you're pwn'd is really low, as generating these kinds of binaries is nearly impossible.


What about CET and the ENDBR64/32 instructions? Isn’t there a “shadow stack” as well? Not sure if this is possible with a modern x86 OS, but it also wouldn’t surprise me if there were workarounds either.


CET = Control-flow Enforcement Technology

TL;DR:

> The ENDBRANCH (see Section 73 for details) is a new instruction that is used to mark valid jump target addresses of indirect calls and jumps in the program. This instruction opcode is selected to be one that is a NOP on legacy machines such that programs compiled with ENDBRANCH new instruction continue to function on old machines without the CET enforcement. On processors that support CET the ENDBRANCH is still a NOP and is primarily used as a marker instruction by the processor pipeline to detect control flow violations.

> The CPU implements a state machine that tracks indirect jmp and call instructions. When one of these instructions is seen, the state machine moves from IDLE to WAIT_FOR_ENDBRANCH state. In WAIT_FOR_ENDBRANCH state the next instruction in the program stream must be an ENDBRANCH. If an ENDBRANCH is not seen the processor causes a control protection exception (#CP), else the state machine moves back to IDLE state.
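The state machine in the quoted passage can be modelled in a few lines of Python. This is only a toy sketch of the two states described above, fed with a list of mnemonics in execution order. The mnemonic strings, `run_tracker`, and `ControlProtectionFault` are names made up for the illustration, and the real hardware behaviour has details this ignores.

```python
from enum import Enum


class State(Enum):
    IDLE = "IDLE"
    WAIT_FOR_ENDBRANCH = "WAIT_FOR_ENDBRANCH"


class ControlProtectionFault(Exception):
    """Stands in for the #CP exception in this toy model."""


# Hypothetical mnemonics for the indirect branches the tracker watches.
INDIRECT_BRANCHES = {"jmp_indirect", "call_indirect"}


def run_tracker(trace):
    """Feed executed instructions through the tracker and return the final state.

    `trace` lists mnemonics in execution order, so the entry after an
    indirect branch is the instruction at the branch target.
    """
    state = State.IDLE
    for insn in trace:
        if state is State.WAIT_FOR_ENDBRANCH:
            if insn != "endbranch":
                raise ControlProtectionFault(f"expected endbranch, got {insn}")
            state = State.IDLE
        # In IDLE an endbranch is simply a NOP, so nothing special happens.
        if insn in INDIRECT_BRANCHES:
            state = State.WAIT_FOR_ENDBRANCH
    return state
```

A trace like `["mov", "call_indirect", "endbranch", "ret"]` ends in `State.IDLE`, while `["jmp_indirect", "mov"]` raises the stand-in fault because the branch target is not marked.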

Refs:

Above text: https://binpwn.com/papers/control-flow-enforcement-technolog... (from page 8, page 15 also has info)

"How do prior processors execute these?": https://stackoverflow.com/questions/56120231/how-do-old-cpus...

Random example of someone trying to leverage this capability to harden an app: https://github.com/dotnet/runtime/issues/40100

Random comment that has a bunch of semi-interesting info in it: https://news.ycombinator.com/item?id=26061230 (the bit about the lack of Linux support is potentially(?) out of date)

LWN article: https://lwn.net/Articles/758245/


I watched a super interesting Black Hat video on youtube that talked about discovering secret instructions on CPUs by iterating over each bit of opcodes until you get an illegal instruction, and thereby discovering if an opcode is valid or not.

He set up a room full of PCs running his code and had hardware to auto-reset them when they crashed.

https://youtu.be/KrksBdWcZgQ


That sounds like one of those tasks that's isomorphic to the halting problem. Replace the malicious opcode sequence with "HALT" and your static analyzer can determine whether an arbitrary program can halt, something known to be impossible.


The halting problem concerns deciding whether any program will halt.

It's definitely possible to decide whether some programs will halt.


This is a really good distinction to make - people use the halting problem as an excuse to avoid doing any sort of analysis because it can't work in general and give a fully definitive answer - but most analysis solutions don't need to be fully general or fully definitive in order to be valuable.
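As a small illustration of an analysis that is useful without being fully general, here is a Python sketch of a three-valued checker for a toy register machine. It answers "halts" or "loops" when it can prove it and "unknown" otherwise. The instruction set (`inc`, `dec`, `jnz`, `halt`) and the name `analyze` are invented for this example and have nothing to do with x86.

```python
HALTS, LOOPS, UNKNOWN = "halts", "loops", "unknown"


def analyze(program, registers, max_steps=10_000):
    """Simulate a deterministic toy machine and classify the run.

    program: list of tuples such as ("inc", r), ("dec", r), ("jnz", r, target), ("halt",)
    registers: initial register values
    """
    regs = list(registers)
    pc = 0
    seen = set()
    for _ in range(max_steps):
        # Running off the end of the program counts as halting.
        if pc >= len(program) or program[pc][0] == "halt":
            return HALTS
        state = (pc, tuple(regs))
        if state in seen:
            # A deterministic machine that revisits a state repeats forever.
            return LOOPS
        seen.add(state)
        op = program[pc]
        if op[0] == "inc":
            regs[op[1]] += 1
            pc += 1
        elif op[0] == "dec":
            regs[op[1]] -= 1
            pc += 1
        elif op[0] == "jnz":
            pc = op[2] if regs[op[1]] != 0 else pc + 1
        else:
            raise ValueError(f"unknown instruction {op!r}")
    # Step budget exhausted without a proof either way.
    return UNKNOWN
```

A countdown loop is reported as halting, a loop that returns to an earlier state is reported as looping, and a counter that grows without bound exhausts the budget and comes back "unknown", which is the honest answer a practical analyzer gives instead of refusing to analyze anything.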


Yes, but deciding whether any program can halt is exactly what's called for here. You're operating on x86 machine code, which is Turing-complete, and you don't have the option of restricting to a non-Turing-complete subset of x86 machine code without significantly shrinking your addressable market. (There's already been significant research on creating provably-secure or provably-terminating instruction sets, with some success in specific problem domains, but they tend to fail with general purpose personal computing because consumers won't use your product if it means SimCity won't work.)


Your argument can be seen to be invalid because it attempts to apply to any Turing-complete architecture, when in fact the problem is more architecture-specific than that.

In an architecture that used fixed-length instructions and required the program counter to be aligned on them -- or had some other mechanism that disallowed overlapping instructions -- it would be easy to scan for an opcode like this. (Assuming you can distinguish the instructions from the data segment, anyway.) Such a solution doesn't run into any problems with the halting problem or Rice's theorem because it's not attempting to tell you with certainty whether the program will hit that opcode; it just gives you a one-sided test that says, aha, it could hit that opcode (because it's present) and therefore the program is suspicious. A negative result guarantees it won't hit the opcode, a positive result doesn't guarantee anything.

It's only x86's allowing of overlapping instructions that makes this problem hard. I mean, notionally you could scan for such an opcode at every offset, it's just that the false positive rate might be unacceptably high. (Whereas in the fixed-length instruction case, it wouldn't be, because a program using one of these opcodes at all truly is suspicious, regardless of whether it's hit. Although, again, I'm assuming you can somehow filter out the data segment; if you can't, then we have more of a problem.)
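A quick Python sketch of the difference, under made-up assumptions: a hypothetical ISA with 4-byte aligned instructions, and a placeholder 4-byte `SUSPECT` opcode that is not a real encoding of anything. `aligned_hits` is the one-sided test for the fixed-length case, and `any_offset_hits` is what you are forced into when instructions may overlap.

```python
from typing import List

WORD = 4                                  # hypothetical fixed instruction width in bytes
SUSPECT = bytes.fromhex("deadbeef")       # placeholder opcode, an assumption for illustration


def aligned_hits(code: bytes, opcode: bytes = SUSPECT) -> List[int]:
    """Offsets of aligned instruction slots that are exactly `opcode`."""
    return [i for i in range(0, len(code) - WORD + 1, WORD)
            if code[i:i + WORD] == opcode]


def any_offset_hits(code: bytes, opcode: bytes = SUSPECT) -> List[int]:
    """Offsets where `opcode` occurs at any byte position, aligned or not."""
    return [i for i in range(len(code) - len(opcode) + 1)
            if code[i:i + len(opcode)] == opcode]


# The pattern straddles two instruction slots: harmless if the PC must be aligned.
straddling = bytes.fromhex("0000deadbeef0000")
# The pattern occupies a whole slot: suspicious on either kind of machine.
direct = bytes.fromhex("00000000deadbeef")
```

For `straddling`, `aligned_hits` returns `[]` while `any_offset_hits` returns `[2]`. On the aligned machine that second hit would be a false positive, and on x86 it is a byte sequence the program counter could actually reach.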


Oh, wait -- there's a hole in my claim above. Self-modifying code. Well, it works if we also disallow self-modifying code; this still leaves things as Turing-complete and so still shows that your argument cannot be valid.


It's possible to prove that a given x86 binary does not have malicious opcodes. Trivial examples would be a zero-length program, or one containing only NOP instructions. You just scan the program and if the byte sequence representing malicious opcodes never appears, you know that you're in the clear.

That's not the problem being posed here, though. To be useful as a security feature, it needs to be able to work on arbitrary x86 programs, i.e. ones that can contain potentially malicious opcode sequences. And many of these programs will not in fact be malicious, i.e. they never jump into the middle of some other legit opcode to have it interpreted differently. A one-sided test that tells you it could hit that opcode isn't useful if it flags every useful program on the computer.


Yes, I don't disagree. My point is that your argument above for that claim is invalid, as your argument is based on purely the Turing-complete nature of x86, while other Turing-complete architectures do not have this problem.


Look up "Post's Correspondence Problem", which is almost like this problem and is probably equivalent to the halting problem.


I know that in the past legitimate programs did obfuscations like this for anti-piracy reasons. But do they still? Or can it nowadays be considered a red flag?


this is actually something various JITs have to deal with as it provides an attack vector for loading shell code


Regular compilers also deal with this now as well. They try to avoid generating an ENDBR as an argument when possible IIRC.
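For a concrete picture of what such a check might look for: ENDBR64 encodes as the bytes F3 0F 1E FA and ENDBR32 as F3 0F 1E FB, so a constant whose little-endian encoding contains either sequence would plant a valid landing pad inside an instruction. The Python sketch below only tests a single immediate, and `immediate_embeds_endbr` is a hypothetical helper name. How any particular compiler implements the avoidance is not shown here, and a pattern that straddles the immediate and neighbouring bytes would need a wider check.

```python
ENDBR64 = bytes.fromhex("f30f1efa")
ENDBR32 = bytes.fromhex("f30f1efb")


def immediate_embeds_endbr(value: int, width_bytes: int) -> bool:
    """True if the little-endian encoding of `value` contains an ENDBR byte sequence."""
    # Reduce modulo 2**(8*width) so negative constants are encoded as two's complement.
    encoded = (value % (1 << (8 * width_bytes))).to_bytes(width_bytes, "little")
    return ENDBR64 in encoded or ENDBR32 in encoded


# 0xFA1E0FF3 is stored in memory as f3 0f 1e fa, i.e. the ENDBR64 bytes.
risky = immediate_embeds_endbr(0xFA1E0FF3, 4)
harmless = immediate_embeds_endbr(0x123, 4)
```

A code generator that finds a risky constant could, for example, build the value in two steps so the four bytes never appear contiguously.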


These instructions only work if the CPU is in a particular, unusual state ("Red Unlocked"); even then, they require a value to be written to an MSR, which is a privileged operation. It's highly unlikely that there is any way to activate this functionality from userspace. Indeed, it's not clear that it's possible to activate without physical intervention (e.g. JTAG access).


But in the paper they explicitly test the possibility that the undocumented instructions could be speculatively executed and have visible side effects.

They found that the instructions are actually executed, but they couldn’t jump to arbitrary microcode (a side effect of one of the undocumented instructions) because of a single lfence ucode op. They proved that, at least on the Atom processor they could unlock, the speculative execution of the undocumented instruction calculated an offset and placed it in internal registers even though the instruction was executed from the “green” (normal) operating mode.


no way - x86 has no instruction alignment requirements, so the same bytes could be an immediate in some innocuous instruction. Also the same bytes can be run two ways in x86 and this is sometimes used. Plus, code can be created dynamically on almost all platforms except iOS.

so in short: no


That would help but self modifying code after a stack smash could still JIT them.



Except that these latest findings are actually undocumented instead of known debug instructions (albeit still technically impressive)


How do people find these instructions?


Perhaps someone accidentally left the doors open on the Atom CPU which allowed them to run a disassembler.

"In mid-2020, our team managed to extract microcode for modern Atom processors that are based on the Goldmont microarchitecture. It became possible to do this on Atom Goldmont systems-on-chip (SoCs) due to an arbitrary code execution vulnerability in Intel CSME (Intel-SA-00086)."


Probably using something like that: https://github.com/xoreaxeaxeax/sandsifter


yeah, it's cited in the paper:

[4] C. Domas. Breaking the x86 ISA. https://www.blackhat.com/docs/us-17/thursday/us-17-Domas-Bre..., Jul. 2017.


Reminded me of his cool setup: https://youtu.be/_eSAF_qT_FY?t=1972


They have been reverse engineering Atom for over half a decade (possibly longer) now.



