
Undocumented CPU behavior: analyzing undocumented opcodes on Intel x86-64 (2018) [pdf] - luu
https://www.cattius.com/images/undocumented-cpu-behavior.pdf
======
saagarjha
Associated GitHub repository with more information:
[https://github.com/cattius/opcodetester](https://github.com/cattius/opcodetester)

I know next to nothing about processors at this level, but I wonder if it
would be possible for a skilled engineer to try to find these instructions by
scrutinizing the actual physical instruction decoder on the chip and/or
inspecting the processor's microcode. Are these things possible to do? If they
are, is it feasible to reverse engineer them?

~~~
jfkebwjsbx
Theoretically possible, practically impossible.

For two reasons: the equipment needed (extremely expensive) and the complexity
of the task (transistors are placed by software, not humans anymore).

~~~
saagarjha
What kind of complexity are we looking at, roughly? Surely someone with deep
pockets and the necessary expertise would be interested in trying to find
these kinds of things, no?

~~~
alxlaz
A modern CPU has a transistor count in the billions/low tens of billions. I
haven't really thought about it but I'm tempted to say that looking at the
decoder stage(s) alone won't do. Undocumented operation doesn't have to be in
the form of an entire undocumented instruction. You could design the device so
that the "right thing" would happen simply by scheduling the right
instruction, with the right arguments, under the right conditions (the right
execution unit, the right amount of pipeline clog etc.). The whole thing is
significantly more complex than the "fetch-decode-execute" diagrams would have
you believe -- execution isn't strictly sequential, executing exactly the same
instruction won't cause the exact same transistors to "fire" each time, etc.

So the level of complexity is pretty daunting. IMHO if you want to find out
undocumented behaviour that was deliberately introduced, you're better off
looking at other methods, no matter how deep your pockets are.

~~~
dmitrygr
> A modern CPU has a transistor count in the billions/low tens of billions

A large percentage of that is simply 6T/cell SRAM L1/L2/etc cache though
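
To get a rough feel for the scale involved, here is a back-of-the-envelope sketch in Python. It counts only the 6T data cells; tag arrays, ECC bits, and decoder/sense circuitry are ignored. Both the 32 MiB cache size and the 10 billion transistor total are hypothetical figures picked for illustration, not numbers for any specific CPU.

```python
TRANSISTORS_PER_CELL = 6  # 6T SRAM: one cell stores one bit


def sram_data_transistors(cache_bytes: int) -> int:
    """Transistors in the data array alone (no tags, ECC, or periphery)."""
    return cache_bytes * 8 * TRANSISTORS_PER_CELL


# Hypothetical figures, for illustration only.
total_cache_bytes = 32 * 1024 * 1024   # assume 32 MiB of combined cache
total_transistors = 10_000_000_000     # assume a 10-billion-transistor die

cache_transistors = sram_data_transistors(total_cache_bytes)
share = cache_transistors / total_transistors

print(f"{cache_transistors:,} transistors in SRAM data cells")
print(f"{share:.1%} of the assumed total")
```

The real share depends heavily on the part in question, so treat the output as an order-of-magnitude estimate only.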

~~~
alxlaz
Certainly. But even those can be routed so that the WR signal for a particular
address also doubles as half of the AND input which causes a read from another
range to always return zero, for example. (That's not an "undocumented
opcode", of course, but it can be used maliciously). It's certainly not easy
to do this kind of meaningful obfuscation though, especially between different
blocks, since different blocks are usually in different clock domains, too.

Edit: sorry, my neurons got all jumbled and I was thinking of a far more
general case, i.e. undocumented behaviour in general, not just undocumented
instructions. Indeed, only a relatively small subset of all these transistors
is relevant in terms of undocumented instructions specifically.

------
kchoudhu
Related talk about this from Blackhat a few years ago:

[https://www.youtube.com/watch?v=KrksBdWcZgQ](https://www.youtube.com/watch?v=KrksBdWcZgQ)

~~~
anon73044
Unfortunately Chris works for Intel now so I don't think he'll be giving any
more of these talks in the future. (At least until his NDA expires)

~~~
kchoudhu
All good things get eaten by the majors eventually, it would seem.

------
userbinator
I believe the first "page" of opcodes (i.e. 1-byte opcodes, the ones that
don't start with 0F) has already been extensively researched and documented,
at least in 16 and 32-bit mode; the interesting things are all in the "second
page", the ones that begin with 0F and are relatively new instructions, and
the awkward and somewhat inconsistent way in which 64-bit mode was
implemented.
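
As a rough illustration of the "pages" mentioned above, here is a minimal Python sketch that names the opcode map selected by the leading opcode bytes. The function name is made up for this example, and it assumes the bytes start at the opcode itself, i.e. any legacy, REX, or VEX/EVEX prefixes have already been stripped. It also distinguishes the 0F 38 and 0F 3A escapes, which select the three-byte maps.

```python
def opcode_map(insn: bytes) -> str:
    """Name the opcode map selected by the leading opcode bytes.

    Simplification: assumes `insn` starts at the opcode, with no prefixes.
    """
    if not insn:
        raise ValueError("empty instruction")
    if insn[0] != 0x0F:
        return "one-byte map"
    if len(insn) < 2:
        raise ValueError("0F escape without a following opcode byte")
    if insn[1] in (0x38, 0x3A):
        return f"three-byte map (0F {insn[1]:02X})"
    return "two-byte map (0F)"


samples = (
    bytes([0x90]),                    # NOP
    bytes([0x0F, 0x0D, 0x10]),        # 0F 0D /2
    bytes([0x0F, 0x38, 0x00, 0xC1]),  # an 0F 38 encoding
)
for raw in samples:
    print(raw.hex(" "), "->", opcode_map(raw))
```

A real decoder would also have to handle prefixes and mode-dependent encodings, which is where much of the inconsistency in 64-bit mode shows up.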

Also, the fact that they're trying to test undocumented behaviour from within
a full OS was a bit unexpected; in the retrocomputing community, where CPUs
like the Z80 and 6502 have been studied extensively, the usual way of testing
undocumented behaviour is to boot into a very minimal environment whose only
purpose is to test that behaviour, so as to eliminate any other variables from
the process. Logic analysers/bus monitoring are also used sometimes, although
that might be harder with a modern high-speed CPU.

~~~
s_gourichon
"High speed" shouldn't be a concern, should it? By adjusting the clocks I
believe you can run the CPU as slow as you wish.

Complexity and the ratio of visible behavior to unobservable state are
astronomically worse than for an 8-bit CPU, and therefore still a concern.

~~~
userbinator
Due to things like dynamic logic and PLLs
([https://en.wikipedia.org/wiki/Phase-locked_loop](https://en.wikipedia.org/wiki/Phase-locked_loop)),
modern CPUs can't clock down into the tens of MHz range or lower. There's also
the issue of things like DRAM refresh.

------
kken
It would actually be interesting to see examples of actual undocumented
opcodes. There are none in the linked article.

~~~
guidedlight
But then they would be documented.

Think McFly!

~~~
kken
Think more! One would guess that the search for undocumented opcodes yields
results...?

~~~
cat_easdon
Author here - the main reason there are no examples is that I didn't have any
interesting ones to report at the time! I was trying to develop new detection
methods, but found only the (thousands of) undocumented software prefetches
which were previously reported by Domas in his Sandsifter project, e.g. 0f 0d
/2 and /3-7 on Intel CPUs (these are documented by AMD, but not Intel, and
opcode behavior varies more often between the two than you'd expect). Many of
the interesting undocumented x86 opcodes (e.g. icebp, salc, loadall) were
either only present in older CPUs or are now at least partially documented.
There are some much more interesting undocumented opcodes on other
architectures (which have architectural effects, e.g. changing register
values, halting the CPU), but that's still an ongoing project.

Edit: 0f 0d /2 is documented as prefetchwt1 but (allegedly) unsupported by the
CPUs I tested it on, so the fact it executes at all is undocumented.
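
For readers unfamiliar with the "/2" notation: in Intel's opcode notation, "/digit" means that the reg field (bits 5:3) of the ModRM byte following the opcode holds that digit as an opcode extension. Below is a minimal Python sketch of that decoding. The function names are hypothetical, chosen for this example, and it assumes the bytes start at the opcode with no prefixes.

```python
from typing import Tuple


def modrm_fields(modrm: int) -> Tuple[int, int, int]:
    """Split a ModRM byte into (mod, reg, rm)."""
    return (modrm >> 6) & 0b11, (modrm >> 3) & 0b111, modrm & 0b111


def matches(insn: bytes, opcode: bytes, digit: int) -> bool:
    """True if `insn` is `opcode` followed by a ModRM byte with reg == digit."""
    if len(insn) <= len(opcode) or not insn.startswith(opcode):
        return False
    _, reg, _ = modrm_fields(insn[len(opcode)])
    return reg == digit


sample = bytes([0x0F, 0x0D, 0x10])  # ModRM 0x10 = 00 010 000 in binary
print(modrm_fields(sample[2]))          # (mod, reg, rm)
print(matches(sample, b"\x0f\x0d", 2))  # is this an "0f 0d /2" encoding?
```

This only classifies an encoding; whether a given CPU executes it, faults, or treats it as a no-op is exactly what has to be tested on real hardware.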

------
smitty1e
Undocumented for you doesn't remove the possibility that someone, somewhere,
has a firm grasp of what that opcode does, and why.

~~~
cat_easdon
Author here - the aim of this project was to explore exactly why such opcodes
are problematic for security. Even if they're implemented with entirely
innocent intentions - e.g. for debug+verification purposes - they can lead to
vulnerabilities in operating systems, emulators, and hypervisors. They induce
edge cases which developers can't protect against if they don't know they
exist in the first place (due to the lack of any public documentation).
There's a more thorough writeup of the project here:
[https://github.com/cattius/opcodetester/blob/master/thesis.p...](https://github.com/cattius/opcodetester/blob/master/thesis.pdf).

~~~
smitty1e
I recall some years ago seeing a post on the OpenBSD mailing list about Intel
chip errata and thinking: "I love Big Brother, and Big Brother loves loving
me."

------
floatingatoll
The final thesis behind this slide deck is alongside it in the repo:
[https://github.com/cattius/opcodetester/](https://github.com/cattius/opcodetester/)

------
h2odragon
[https://en.wikipedia.org/wiki/LOADALL](https://en.wikipedia.org/wiki/LOADALL)

You could almost emulate an MMU on a 286

------
s_gourichon
Just thinking aloud (not the only one, obviously).

So is this the combined result of market mechanics? With Intel as the leader,
their top priority was to release the fastest chips at all costs, leaving
security/simplicity/sustainability behind? On top of that, complexity became
another barrier to competitors. This feels insane, unsustainable.

Whole parts of the industry have already switched to alternative
architectures. MIPS was prevalent in set-top box, then replaced with ARM in
the 2010s. ARM reigns on most mobile devices. RISC-V is on the rise.

Areas craving performance without concern about power consumption or
security still run on Intel. For how long?

Supercomputers get 10x more power from GPUs than CPUs, so a switch to an
alternative may come.

Could we imagine the gamer market switching to nVidia on ARM/RISC?

Is the Intel architecture a huge sinking ship?

~~~
s_gourichon
Well... For the motivation "New flaw in Intel chips lets attackers slip their
own data into secure enclave"
[https://news.ycombinator.com/item?id=22537216](https://news.ycombinator.com/item?id=22537216)
and for the predicted outcome "ARM-ed Mac: Not Again Or For Real This Time?"
[https://mondaynote.com/arm-ed-mac-not-again-or-for-real-this...](https://mondaynote.com/arm-ed-mac-not-again-or-for-real-this-time-a3548eece86)

Intel bashing becomes much too easy.

------
guerrilla
Will a video for this be uploaded?

~~~
cat_easdon
Sorry - the presentation was never recorded. There's more information in the
GitHub repo, however:
[https://github.com/cattius/opcodetester](https://github.com/cattius/opcodetester).

------
Jahak
nice document

------
stopads
When I learned to program (before 9/11) there was a big emphasis on assembly
language and using low level interfaces to communicate with other hardware.
The idea was that everyone studying computer science should understand every
aspect of the CPU down to the register and operation level, and then be able
to design logic gates to replicate that functionality if needed.

Now we have CPUs that are fundamentally undocumented, unknowable, and
untranslatable. The entire infrastructure of the network, the telecoms, and
the cpu design itself has all been subverted to the needs of the national
security complex or corporate advertising.

I'm not sure what computer science even means anymore. Everything I learned is
completely useless.

~~~
leggomylibro
I'm with you up to the last paragraph.

It's not useless. FPGAs have plummeted in cost and there are now open-source
toolchains for some of them. There's also a Free commercial-grade ISA that you
can use in your personal designs (RISC-V). These days, it is not expensive to
design your own well-understood computer which can run machine code generated by
commercial-grade compilation toolchains such as GCC. Even hardware production
is getting cheapER with shared wafer runs like MOSIS, although custom silicon
is still out of reach for hobbyists.

Chin up, buddy. The US is not the entire world, and the pendulum of our
generations' zeitgeist can still swing back towards the ideals of liberty and
equality of access which the mavens of computing once stood for. You can
already buy ARM application processors from vendors other than Intel/AMD, and
I would be surprised if we lived in a world where every new computer comes
with "management engine" spyware in its CPU for much longer.

~~~
zadokshi
I love your optimism, but I’m not sure I can see a path towards the public
voting for a government that would make the necessary adjustments to rein in
the ability of government powers to influence “management engine” code.

