

Clever ARM instruction validation for Chrome Native Client - unwiredben
http://www.chromium.org/nativeclient/reference/arm-overview

======
nickzoic
Not knowing anything about ARM, I didn't get how this bit is supposed to work:

    
    
        We enforce this rule by restricting the sorts of operations that 
        programs can use to alter sp.   Programs can alter sp by adding or 
        subtracting an immediate, as a side-effect of a load or store:
        ldr  rX,  [sp],  #4        ; loads from stack, then adds 4 to sp
    
        These are safe because, as we mentioned before, the largest 
        immediate available in a load or store is ±4095.  Even after adding or 
        subtracting 4095, the stack pointer will still be within the sandbox or 
        guard regions.
    

I get the idea of the guard protecting against the biggest immediate offset,
but what stops me doing an SP-updating LDR with a big offset multiple times,
pushing SP beyond my "safe" memory segment?

EDIT: I guess I might be taking:

    
    
        Any other operation that alters sp must be followed by a guard   
        instruction.
    

too precisely, and you could just follow every ldr which writes back SP with a
BIC too. Maybe I'm missing the point.

EDIT2: Wait, wait, I get it now. Once the stack pointer is in the guard area,
the CPU faults if you do another LDR. Don't mind me!
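
The BIC-guard idea mentioned above can be sketched in a few lines. This is a
model only, assuming a 1 GB sandbox based at address 0; the mask constant is
illustrative and not necessarily NaCl's actual value:

```python
# Model of a BIC-style guard instruction (e.g. `bic sp, sp, #0xC0000000`),
# assuming a 1 GB sandbox at the bottom of the address space.
# The mask value here is an illustrative assumption.
SANDBOX_MASK = 0xC0000000  # top address bits cleared by the guard

def bic_guard(sp: int) -> int:
    """Force sp back into the sandbox by clearing the masked bits."""
    return sp & ~SANDBOX_MASK

# An sp that wandered out of the sandbox is pulled back inside:
assert bic_guard(0x40001234) == 0x00001234
# An sp already in range passes through unchanged:
assert bic_guard(0x0FFFFFFC) == 0x0FFFFFFC
```

So any instruction sequence that clobbers sp arbitrarily is harmless as long
as a guard like this runs before the next dereference.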

~~~
jbri
Because you'll fault in the guard zone on one of those memory accesses, and
then your app will get killed.

EDIT: No, you don't need to check _after_ a stack memory access either. The
farthest you can get out of the sandbox is an access-then-adjust of 4095
bytes, followed by an adjust-then-access of 4095 bytes, which means you access
the very tail end (offset 8190) of the guard zone, fault, and die.
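
The worst-case escape distance above works out like this (a sketch of the
arithmetic, with sp measured as an offset from the sandbox's top edge):

```python
# Worst-case excursion past the sandbox boundary: one post-indexed access
# (access at old sp, then adjust) followed by one pre-indexed access
# (adjust, then access), each using the largest load/store immediate.
MAX_IMM = 4095  # largest immediate offset in an ARM load/store

sp = 0               # sp sitting exactly at the sandbox's top edge
sp += MAX_IMM        # post-index writeback: accessed offset 0, sp is now 4095
access = sp + MAX_IMM  # pre-index: the faulting access lands at offset 8190
assert access == 8190  # still well inside the guard zone
```

As long as the guard zone is at least 8 KiB, that final access faults inside
it rather than touching memory outside the sandbox.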

~~~
nickzoic
Thanks, I just realized that ... duh. I'd been imagining the guard regions as
a "passive guard", e.g. an area where nothing important is, rather than as an
"active guard", e.g. an area where you fault if you read/write.

------
pnp
I see a lot of interesting techniques here. What I couldn't figure out is how
writes to code areas in the sandbox are prevented. I'd guess they mark pages
with code-bundles as read-only, but I don't see any specific mention of it.

(The article does mention that the guard pages are set to no
read/write/execute)

~~~
pjscott
The trampolines are located in a segment of the address space which is marked
as read-only, presumably by the MMU.

~~~
jbri
The validator also needs to prevent modifications to the program's own code,
otherwise it could, say, remove the breakpoint instruction from the start of a
data bundle.

------
wmf
Turning every load or store from one instruction into two sounds slow; pNaCl
can't arrive soon enough.

~~~
pjscott
It sounds slow, but think about what's going on in the processor. Memory
access instructions are _far_ slower than simple arithmetic instructions. As
for the conditional nature of the load instructions, I'm sure that the
Cortex-A9 branch predictor will make the overhead from that pretty close to
negligible. And there's probably something similar on the Cortex-A8, though I
haven't checked.

In other words, this is a lot less slow than it sounds.

~~~
mansr
The branch predictor is not involved here since there is no branch. A
conditional non-branch instruction is, on most implementations, scheduled
identically to the unconditional base instruction. A load/store instruction
whose condition passes thus executes identically to an unconditional one. If
the condition fails, it schedules as though it had hit L1 cache.

The main problem I see here is significantly increased code size, which will
put additional pressure on the L1 I-cache.

