
W^X now mandatory in OpenBSD - fcambus
http://undeadly.org/cgi?action=article&sid=20160527203200
======
byuu
I've always been in favor of all OpenBSD security enhancements I've seen, but
I have to say, and please hear me out, this is an objectively terrible idea.

Yes, most programs should disallow W|X by default. But trying to banish the
entire practice with a mount flag, knowing full well few people will go that
far to run a W|X application, is bad practice. I'd rather see this as another
specialty chmod flag ala SUID, SGID, etc. Or something along those lines. One
shouldn't have to enable filesystem-wide W|X just to run one application.

The thing is, when you actually _do_ need W|X, there is no simple workaround.
Many emulators and JITs _need_ to be able to dynamically recompile
instructions to native machine code to achieve acceptable performance
(emulating a 3GHz processor is just not going to happen with an interpreter.)
For a particularly busy dynamic recompiler, having to constantly call mprotect
to toggle the page flags between W!X and X!W will impact performance too
greatly, since that is a syscall requiring a kernel-level transition.

We also have app stores banning the use of this technique as well. This is a
very troubling trend lately; it is throwing the baby out with the bathwater.

EDIT: tj responded to me on Twitter: "the per-mountpoint idea is just an
initial method; it'll be refined as time goes on. i think per-binary w^x is in
the pipeline." \-- that will not only resolve my concerns, but in fact would
be my ideal design to balance security and performance.

~~~
yoklov
> One shouldn't have to enable filesystem-wide W|X just to run one application

This is a very good point, and it seems like enabling it more granularly would
be worth it.

But, for your second point, I have trouble believing that switching between
W!X and X!W is going to happen frequently enough to be substantial for
performance. (It was very small in Firefox's JavaScript JIT, causing something
like a 1%-4% performance hit depending on system).

~~~
byuu
An emulator is a very different use case than Firefox's JIT.

Take Dolphin (the Gamecube / Wii emulator) for example: you have a 4GiB DVD,
it has a massive 80-hour game's entire code engine on there. You cannot
recompile the entire disc at startup. Even if you could (you really can't),
these games tend to push code into RAM to execute, which a static recompiler
cannot handle.

The way dynamic recompilers handle the tremendous burden is to recompile small
blocks at a time, and track which ones are hot and cold. When the buffers fill
up, they start dropping old, stale code.

It's hard to say exactly what the performance impact would be, and it'd
probably vary per game title. But it'd be a lot worse than a web browser
recompiling jQuery plus another 200KiB of custom Javascript once on page load.

Plus, I am sure there are many more use cases for W|X than just emulators and
JITs. It would be a shame to try and eradicate them all from existence.
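
To make the block-at-a-time scheme above concrete, here is a minimal sketch of a translation cache that keeps hot blocks and drops the coldest one when it fills up. It is a toy model, not Dolphin's code: `BlockCache`, `fetch`, and `fake_translate` are hypothetical names, and eviction is plain least-recently-used.

```python
from collections import OrderedDict


class BlockCache:
    """Toy translation cache: guest address -> translated block, LRU eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # least recently used entry comes first
        self.compiles = 0            # how many times a block had to be translated

    def fetch(self, guest_addr, translate):
        if guest_addr in self.blocks:
            self.blocks.move_to_end(guest_addr)  # mark the block as hot
            return self.blocks[guest_addr]
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)      # drop the coldest block
        self.compiles += 1
        block = translate(guest_addr)
        self.blocks[guest_addr] = block
        return block


def fake_translate(guest_addr):
    # Placeholder for real code generation (hypothetical).
    return f"native code for {guest_addr:#x}"


cache = BlockCache(capacity=2)
for addr in (0x100, 0x200, 0x100, 0x300, 0x200):
    cache.fetch(addr, fake_translate)
print(cache.compiles)  # 4: the block at 0x200 was evicted and translated again
```

Every increment of `compiles` is a write into the code buffer, so under W^X each one implies at least one protection change. That count is the number that differs so much between an emulator and a page-load JIT.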

~~~
lisivka
Map page two times at separate addresses: for executing and for writing. See
[http://nullprogram.com/blog/2016/04/10/](http://nullprogram.com/blog/2016/04/10/)
for details.

~~~
yoklov
Extremely clever. Wouldn't this still violate W^X though? I would have thought
the OS would see through a trick like this.

~~~
nanofortnight
It doesn't violate W^X, because the same virtual memory address isn't writable
and executable at the same time.
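
A minimal sketch of the double-mapping idea, assuming a Unix-like system. This is not the code from the linked article: `dual_map` is a hypothetical helper, and the second view is mapped read-only as a stand-in for the executable view so the example runs even where executable mappings are restricted.

```python
import mmap
import os
import tempfile


def dual_map(size=mmap.PAGESIZE):
    """Map one backing file twice: a writable view and a read-only view."""
    fd, path = tempfile.mkstemp()
    try:
        os.unlink(path)         # the open descriptor keeps the file alive
        os.ftruncate(fd, size)  # give the file one page of backing storage
        writer = mmap.mmap(fd, size, flags=mmap.MAP_SHARED,
                           prot=mmap.PROT_READ | mmap.PROT_WRITE)
        # A JIT would ask for PROT_READ | PROT_EXEC on this second view.
        reader = mmap.mmap(fd, size, flags=mmap.MAP_SHARED,
                           prot=mmap.PROT_READ)
    finally:
        os.close(fd)            # the mappings stay valid after the close
    return writer, reader


writer, reader = dual_map()
writer[:2] = b"\x90\xc3"        # bytes written through the writable view...
print(reader[:2] == b"\x90\xc3")  # ...are visible through the other view
```

The two views sit at different virtual addresses but share the same physical pages, so neither address is ever writable and executable at once.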

------
cranium
For those heading into the comments to know what this is about: W^X is a
protection policy on memory with the effect that every page in memory can
either be written or executed but not both simultaneously (Write XOR eXecute).
It can prevent, for example, some buffer overflow attacks.
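
As a sketch of the rule itself: the check below treats a page's protection as a bit mask and flags only the combination W^X forbids. The flag values and the `violates_wx` name are illustrative assumptions, not any kernel's actual code.

```python
# Illustrative protection bits; real values come from the platform headers.
PROT_READ, PROT_WRITE, PROT_EXEC = 1, 2, 4


def violates_wx(prot):
    """True only when a mapping is writable and executable at the same time."""
    return bool(prot & PROT_WRITE) and bool(prot & PROT_EXEC)


print(violates_wx(PROT_READ | PROT_WRITE))  # False: ordinary data page
print(violates_wx(PROT_READ | PROT_EXEC))   # False: ordinary code page
print(violates_wx(PROT_WRITE | PROT_EXEC))  # True: the forbidden combination
```

Despite the "XOR" in the name, a page that is neither writable nor executable (read-only data, say) is fine; only having both bits set is rejected.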

~~~
CharlesMerriam2
I guess I've never understood this. The W^X is a response to one of the
classic Multics paper attacks, but doesn't actually work well in a world where
JavaScript, JVM byte code, and other non-native code is "sort-of executable".
Why not put the effort into a great MMU so that every allocation is its own
"page" for the purpose of triggering page faults on overruns?

~~~
monocasa
> Why not put the effort into a great MMU so that every allocation is its own
> "page" for the purpose of triggering page faults on overruns?

We used to have that, they're called segments.

~~~
twic
Only one processor did it like it really meant it:

[https://en.wikipedia.org/wiki/Intel_iAPX_432#Object-
oriented...](https://en.wikipedia.org/wiki/Intel_iAPX_432#Object-
oriented_memory_and_capabilities)

~~~
nickpsecurity
Look up i960. Had most key parts with otherwise normal RISC.

~~~
twic
It seems to have been the i960MX and i960MC which had the funky object-
oriented memory; they have long since been discontinued, and i can't find any
documentation about them online, sadly.

EDIT: There's a passing mention in a book [1]

> The Intel i960 extended architecture processor used a tagged architecture
> with a bit on each memory word that marked the word as a "capability", not
> as an ordinary location for data or instructions. A capability controlled
> access to a variable-sized memory block or segment. The large number of
> possible tag values supported memory segments that ranged in size from 64 to
> 4 billion bytes, with a potential 2^256 different protection domains.

[1] [https://books.google.co.uk/books?id=O3VB-
zspJo4C&pg=PA189#v=...](https://books.google.co.uk/books?id=O3VB-
zspJo4C&pg=PA189#v=onepage&q&f=false)

~~~
nickpsecurity
Type i960 wikipedia into Google. It has a description plus a link to ISA
datasheet in references section. About enough info to recreate it.

EDIT: Now that I'm on a PC I've added the link here for you. Copy and paste of
PDF's is a b __ __on my mobile. ;)

[https://en.wikipedia.org/wiki/Intel_i960](https://en.wikipedia.org/wiki/Intel_i960)

[http://bitsavers.org/pdf/biin/BiiN_CPU_Architecture_Referenc...](http://bitsavers.org/pdf/biin/BiiN_CPU_Architecture_Reference_Man_Jul88.pdf)

~~~
twic
Oh, brilliant, thanks. I suspect this evaded my searches because it doesn't
say '960mx' in any machine-readable way!

There's some fascinating stuff in here. For example:

> 8.4 Object Lifetime

> To support the implicit deallocation of certain objects while preventing
> dangling references, the object lifetime concept is supported. The lifetime
> of an object can be local or global. "Local" objects have a lifetime that is
> tied to a particular program execution environment [...]. "Global" objects
> are not associated with a particular execution environment.

> Each job has a distinct set of local objects. No two jobs can have ADs
> [access descriptors] that reference the same local object. The processor
> does not allow an AD for a local object to be stored in a global object.
> Thus, when a job terminates, all the local objects associated with a job can
> be safely deallocated, and there cannot be any dangling pointers.

The machine was designed to run Ada programs. Does this line up with how
memory management works in Ada? It certainly resembles how it works in Rust!

~~~
nickpsecurity
Ada was designed for environments where dynamic allocation wasn't allowed. So,
it doesn't have a safe scheme for that. It offers regions, GC, or fast/unsafe
deallocation. This is one area where Rust has dramatic improvement. I agree
i960 is reminiscent of schemes like Rust. Very forward thinking.

Btw, since I'm on mobile w/out links, type these into Google: capability
computer systems book Levy; Army Secure Operating System ASOS. First is a great
book with many capability architectures like i432. Second is another HW and OS
combo designed as a secure foundation for embedded, Ada apps. It was
interesting.

------
sillysaurus3
This paper's thesis is that W^X does not work, and not because of any of the
reasons presented in this thread:
[https://cseweb.ucsd.edu/~hovav/dist/geometry.pdf](https://cseweb.ucsd.edu/~hovav/dist/geometry.pdf)

The paper says that to bypass W^X protection, you can simply scan an
executable for "the instruction you want to use, followed by a RET". The paper
calls these "gadgets."

You can write any function you want by using these gadgets: simply call them.
When you call a gadget, it executes the corresponding instruction, then
returns. This allows you to write arbitrary functions, since real-world
programs are large enough that they have a massive number of gadgets for you
to choose from.
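
A rough sketch of the scanning step, assuming x86, where a near `RET` is the single byte `0xC3`. The function name and the `max_prefix` parameter are made up for illustration. It only lists raw byte runs that end in `RET`; a real gadget finder would also disassemble each run to check that it decodes to useful instructions. In the paper the gadgets are chained by placing their addresses on the stack, so each `RET` jumps to the next one.

```python
RET = 0xC3  # x86 near-return opcode


def find_ret_candidates(code, max_prefix=3):
    """Return (start_offset, raw_bytes) for each byte run that ends in RET."""
    candidates = []
    for end, byte in enumerate(code):
        if byte != RET:
            continue
        for prefix_len in range(1, max_prefix + 1):
            start = end - prefix_len
            if start < 0:
                break
            candidates.append((start, code[start:end + 1]))
    return candidates


# 0x58 is "pop rax" and 0x5B is "pop rbx" on x86-64; 0x90 is a NOP.
sample = bytes([0x58, 0xC3, 0x90, 0x5B, 0xC3])
for offset, raw in find_ret_candidates(sample, max_prefix=1):
    print(offset, raw.hex())
```

Nothing here needs a writable and executable page: the bytes being scanned are code the program already maps as executable.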

Can someone provide a counterargument?

~~~
cosarara97
How do you scan the executable when the memory layout is randomised?

EDIT:
[https://news.ycombinator.com/item?id=11789653](https://news.ycombinator.com/item?id=11789653)

~~~
tptacek
With any program behavior or bug that leaks memory layout. They're common.

------
jtchang
Does this mean to successfully exploit a program I need to write to an area in
memory that the program will later turn the page in memory to "Execute"?

~~~
PeCaN
Yep! That's the idea here: it makes it less likely to accidentally treat data
as instructions. If you can figure out some bug in a JIT compiler to get it to
write your payload to a page that will get turned executable, then you still
have an exploit, but the attack surface is smaller.

You _can_ try return-oriented programming, but OpenBSD makes that hard: the
stack is never executable, everything is position independent, objects get
shuffled around inside libraries and executables¹, maybe more.

¹ At least this was proposed, and I _think_ it landed in one of the newer
releases, but I can't seem to find it for sure.

~~~
notaplumber
Yep: [http://marc.info/?l=openbsd-
cvs&m=146168291000757&w=2](http://marc.info/?l=openbsd-
cvs&m=146168291000757&w=2)

~~~
PeCaN
Thanks, that's exactly what I was thinking of. Does it re-link only libc on
startup, or other libraries as well?

Found the slide that I remembered:
[https://twitter.com/justincormack/status/577005049374601217](https://twitter.com/justincormack/status/577005049374601217)

Do you know if any of the rest of that is in-progress or complete?

~~~
notaplumber
My observations..

* binutils 2.17 (last GPLv2) - complete.

* all archs transitioned to pie - complete as of 5.7.

* /sbin/init static pie - complete.

* ROP gadgets are being hunted down systematically, many nop sleds changed to int3/illegal instr.

* BROP - many base daemons now fork + exec, not all.

* SROP - mitigation committed - complete?

* shuffle - libc objects get shuffled at boot by rc(8), stack object order shuffling as a gcc extension.

..

Progress continues.
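
To illustrate why the shuffle item in the list above matters for ROP, here is a toy model of linking a library's object files in random order. The object names and sizes are invented, and `shuffled_layout` is a hypothetical helper, not what rc(8) actually runs.

```python
import random


def shuffled_layout(object_sizes, rng):
    """Return {object_name: offset} after placing objects in a random order."""
    names = list(object_sizes)
    rng.shuffle(names)
    layout, offset = {}, 0
    for name in names:
        layout[name] = offset
        offset += object_sizes[name]
    return layout


# Invented object files and sizes, for illustration only.
sizes = {"memcpy.o": 0x120, "printf.o": 0x9A0, "malloc.o": 0x640}
print(shuffled_layout(sizes, random.Random()))
print(shuffled_layout(sizes, random.Random()))
```

Each run can put the same object at a different offset, so the distance between two gadgets found in one copy of the library need not hold on another machine or after the next boot.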

------
nightcracker
What about JIT compilation and other forms of code generation?

~~~
dchest
Firefox works fine with it: [http://jandemooij.nl/blog/2015/12/29/wx-jit-code-
enabled-in-...](http://jandemooij.nl/blog/2015/12/29/wx-jit-code-enabled-in-
firefox/)

 _After that, the performance overhead was pretty small on all benchmarks and
websites I tested: Kraken and Octane are less than 1% slower with W^X enabled.
On (ancient) SunSpider the overhead is bigger, because most tests finish in a
few milliseconds, so any compile-time overhead is measurable. Still, it's
less than 3% on Windows and Linux. On OS X it's less than 4% because mprotect
is slower there._

------
bch
NetBSD is going through some similar security moves currently (extending
PaX[0]), and iiuc, there are special considerations required for Java/jvm,
because of the bytecoding process. Does anybody know if my understanding is
correct (that a page will have to be both writable and executable) and if so,
what are OpenBSD's considerations for this?

[0] [http://mail-index.netbsd.org/current-
users/2016/05/15/msg029...](http://mail-index.netbsd.org/current-
users/2016/05/15/msg029374.html)

~~~
koenigdavidmj
Allocate the page write-only. Write write write. Change the page to
executable-only. Run.

Firefox already got this treatment for its JavaScript compiler.
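
The write-then-flip sequence above can be sketched as a small state machine. This is a pure-Python model that makes no system calls: `WxPage` and its methods are hypothetical, and each `flips` increment stands in for one `mprotect` call in a real JIT.

```python
class WxPage:
    """Toy model of one JIT page that is writable or executable, never both."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.state = "writable"
        self.flips = 0  # stands in for the number of mprotect calls

    def _set_state(self, state):
        if self.state != state:
            self.state = state
            self.flips += 1

    def make_writable(self):
        self._set_state("writable")

    def make_executable(self):
        self._set_state("executable")

    def write(self, offset, code):
        if self.state != "writable":
            raise PermissionError("page is executable; flip it first")
        if offset < 0 or offset + len(code) > len(self.data):
            raise ValueError("write does not fit in the page")
        self.data[offset:offset + len(code)] = code

    def run(self):
        if self.state != "executable":
            raise PermissionError("page is writable; flip it first")
        return bytes(self.data)  # a real JIT would jump into the page here


page = WxPage(16)
page.write(0, b"\x90\xc3")  # write write write
page.make_executable()      # change the page to executable
page.run()                  # run
print(page.flips)           # 1
```

A JIT that patches code often has to flip back and forth, and `flips` grows with every patch; that is the cost being argued about upthread.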

------
malkia
I dunno why, but this quote from Benjamin Franklin came to my mind - “Those who
surrender freedom for security will not have, nor do they deserve, either
one.”

i'm just kiddin ;)

------
fithisux
Can someone provide an introduction for dummies like me?

~~~
terminalcommand
I am not an expert on the field. But | means or and ^ means xor (exclusive or).

In the traditional W(rite)|(or)E(xecute) model, the processes can both write
and execute instructions in their address space in memory. I might be wrong
but I think this could result in security leaks, because you cannot determine
what the code evolves into at execution time.

W(rite)^(xor)E(xecute) model doesn't allow the processes to write AND execute
instructions at the same time.

The wikipedia page states:

    
    
      W^X requires using the CS code segment limit as a "line in the sand",
      a point in the address space above which execution is not permitted and data is located,
      and below which it is allowed and executable pages are placed.
    

This means using W^X is an improvement for security, but it's hard to adapt to
from a technical viewpoint and limits your computer's ability in meta-
programming.

For further information:

[https://en.wikipedia.org/wiki/W%5EX](https://en.wikipedia.org/wiki/W%5EX)

[http://marc.info/?l=openbsd-
misc&m=105056000801065](http://marc.info/?l=openbsd-misc&m=105056000801065)

Disclaimer: I myself am a beginner in the field. Apologies for
oversimplification and possible faulty assessments.

------
anfroid555
Anyone know if Erlang is good?

~~~
yellowapple
I haven't tried it on -current yet, but I don't recall ever hearing about BEAM
(Erlang's VM) being affected by this.

It probably helps that BEAM inherits a lot of Erlang semantics like
immutability and process isolation; I reckon this eliminates a lot of the need
for a given portion of memory to be both writable and executable at the same
time.

