
X86 MMU fault handling is Turing complete - mman
https://github.com/jbangert/trapcc
======
tptacek
This is more or less the greatest thing I've learned about in the last couple
years.

What's happening here is that they're getting computation _without executing
any instructions_ , simply through the process of using the MMU hardware to
"resolve addresses". The page directory system has been set up in such a way
that address resolution effects a virtual machine that they can code to.

This works because when you attempt to resolve an invalid address, the CPU
generates a trap (#PF), and the handling of that trap pushes information on
the "stack". Each time you push data to the stack, you decrement the stack
pointer. Eventually, the stack pointer underflows; when that happens, a
different trap (#DF) fires. Taken together, this mechanism gives you:

    
    
        if x < 4 { goto b } else { x = x - 4 ; goto a }
    

also known as "subtract and branch if less than or equal to zero", also known
as "an instruction adequate to construct a one-instruction computer".

The virtual machine "runs" by generating an unending series of traps: in the
"goto a" case, the result of translation is another address generating a trap.
And so on.
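A minimal sketch of the mechanism described above (not the authors' code): each loop iteration stands in for one #PF trap frame being pushed, and the `x < 4` case stands in for the stack-pointer underflow that raises #DF. The labels "a" and "b" are hypothetical branch targets.

```python
# Model of the trap-driven one-instruction computer described above.
# Each call to step() stands in for one #PF trap frame being pushed;
# the x < 4 case stands in for the stack underflow that raises #DF.

def step(x):
    """One 'instruction': if x < 4 goto b, else x = x - 4 and goto a."""
    if x < 4:
        return x, "b"   # pushing another frame would underflow: #DF fires
    return x - 4, "a"   # frame pushed, stack pointer decremented, re-trap

x, target = 10, "a"
while target == "a":    # the machine "runs" as an unending series of traps
    x, target = step(x)

print(x, target)        # 10 -> 6 -> 2, then the #DF branch to b
```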

The details of how this computer has "memory" and addresses instructions are
even headachier. They're using the x86 TSS as "memory" and for technical
reasons they get 16 slots (and thus instructions) to work with, but they have
a compiler that builds arbitrary programs into 16-colored graphs to use those
slots to express generic programs. Every emulator they could find crashes when
they abuse the hardware task switching system this way.

Here's it running Conway's Life:

[http://youtubedoubler.com/?video1=E2VCwBzGdPM&start1=0&#...]

Here's their talk from a few months back:

<http://www.youtube.com/watch?v=NGXvJ1GKBKM>

The talk is great, but if you're not super interested in X86/X64 memory
corruption countermeasures, you might want to skip the first 30 minutes.

~~~
0x0
The slides in the github repo (
[https://github.com/jbangert/trapcc/blob/master/slides/PFLA-shmoocon.pdf](https://github.com/jbangert/trapcc/blob/master/slides/PFLA-shmoocon.pdf)
) also have a few interesting points, like "No publicly
available simulator implements this correctly" (how did they record the
youtube video?) and a few vague hints about exploiting this for doing VM
escapes.

~~~
calt
> how did they record the youtube video?

By running it on a physical machine. Unless there is a requirement that the
processor not be multitasking that I am missing.

~~~
mikeash
The Life video pretty clearly shows it running in Bochs. I assume they fixed
it up.

~~~
jbangert
Author here: Actually, the slide 'no publicly available simulator implements
this correctly' is somewhat misleading (it had a follow-up slide that I cut
and replaced with verbal comments - I should re-add it to the PDF). What I
meant is that the entire mechanism (task switching, page fault handling,
etc.) is not implemented correctly on any simulator, i.e. you get subtly
different behavior. This made debugging quite a challenge in the first place,
so I actually wrote the code to work around any Bochs quirks I encountered --
just so that I had a debug environment.

------
jbangert
Author here: While it is true that with the current implementation, memory
access is extremely limited (essentially one DWORD per page, or about 0.1% of
the available physical RAM) that limitation can certainly be avoided. For one,
you could shift how the TSS is aligned (and align them differently for
different instructions), multiplying your address space by a factor of 10 or
so. Furthermore, you could also place another TSS somewhere in memory (only a
few of the variables need to actually contain sane values) with an invalid EIP
and use that as a 'load' instruction.
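A back-of-envelope check of that "about 0.1%" figure, assuming standard 4 KiB x86 pages and one usable DWORD of "memory" per page:

```python
PAGE_SIZE = 4096          # bytes per x86 page
USABLE = 4                # one DWORD of usable "memory" per page
fraction = USABLE / PAGE_SIZE
print(f"{fraction:.2%}")  # roughly 0.1% of physical RAM
```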

The easiest way however would be to use the TrapCC mechanism to transfer
control between bits of normal assembler code (perhaps repurposed from other
functions already in your kernel), doing something similar to ROP. Of course,
for additional fun, feel free to throw in BX's Brainfuck interpreter in ELF
and James Oakley's DWARF exception handler. We might drop a demo of this soon,
i.e. implementing a self-decrypting binary via page faults.

~~~
sounds
"memory access is extremely limited (essentially one DWORD per page" –
referring to non-code addresses, yes? In the current (simplest)
implementation, each instruction (a TSS) must be aligned across a page
boundary. You do comment below that altering alignment could increase the
available code space.

I'm wondering what method PFLA uses to read/write non-code addresses. Only one
address per page can be addressed? I'll take a look at the compiler.

By simply expanding the addressing capability, a very tiny program could
emulate an instruction stream from memory, overcoming the limited code space
(at the cost of execution speed).

Cheers!

------
networked
>Move, Branch if Zero, Decrement

This is basically the canonical instruction for OISCs (one instruction set
computers). Wikipedia describes it pretty well:
[https://en.wikipedia.org/wiki/One_instruction_set_computer#S...](https://en.wikipedia.org/wiki/One_instruction_set_computer#Subtract_and_branch_if_less_than_or_equal_to_zero).
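For concreteness, here is a toy subleq interpreter, a sketch of the canonical OISC (not trapcc's actual encoding): memory is a flat list of integers, each instruction is three cells `a, b, c`, and a negative branch target halts the machine.

```python
def subleq(mem, pc=0):
    """Run a subleq program: mem[b] -= mem[a]; branch to c if result <= 0."""
    while pc >= 0:
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]
        pc = c if mem[b] <= 0 else pc + 3
    return mem

# Hypothetical one-instruction program: clear cell 3 by subtracting it
# from itself; the result is <= 0, so the machine branches to -1 and halts.
result = subleq([3, 3, -1, 5])
print(result[3])  # 0
```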

------
majke
There was a talk on 29c3 about this. Abstract:

[https://events.ccc.de/congress/2012/Fahrplan/events/5265.en....](https://events.ccc.de/congress/2012/Fahrplan/events/5265.en.html)

video:

<https://www.youtube.com/watch?v=NGXvJ1GKBKM>

------
codex
Another place for root kits to hide.

~~~
tomrod
Could one write a preemptive rootkit that sniffs for other rootkits?

Would that slow things down incredibly?

~~~
simias
That's called an antivirus :)

------
ars
How fast (slow) is this relative to the host CPU?

~~~
simias
Probably incredibly slow, given the reduced instruction set and the fact that
it relies on context switches/pushing stuff on the stack to function.

------
rocky1138
This is really interesting. In a way, it's a form of computer self-
replication. Could the virtual machine created by the computer be considered
offspring?

Is there a way the virtual machine might spawn another virtual machine child
of its own?

------
ithkuil
If you like this kind of thing, there is also:

[http://www.cs.dartmouth.edu/~bx/elf-bf-tools/slides/ELF-berlinsides-0x3.pdf](http://www.cs.dartmouth.edu/~bx/elf-bf-tools/slides/ELF-berlinsides-0x3.pdf)

------
traxtech
That's the hardware version of the brainfuck philosophy.

~~~
switch33
Best explanation ever. I second this. lol

------
conductor
Expect this technique in future malware and software-protection DRM
systems, to make code analysis harder.

------
general_failure
somebody checked in vim backup files :-)

