
Writing a Self-Mutating x86_64 C Program (2013) - Cieplak
https://shanetully.com/2013/12/writing-a-self-mutating-x86_64-c-program/
======
simias
I've written "self-modifying" (really JITed) code for several architectures,
mainly ARM, and when I had to do it for amd64 I was very much surprised by
how straightforward it was.

On ARM you have to be very careful to handle the cache correctly when you
write self-modifying code, because when you access memory using a regular load
or store it's obviously treated like data and goes through the data cache
while the instructions are fetched through the instruction cache. So when you
write an opcode you have to be careful to flush it out of the data cache (at
least up to the point where the caches unify, typically L2 on ARM) and then
invalidate the icache to make sure that you get the opcode back.
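
A minimal sketch of that flush-then-invalidate step, assuming Linux and GCC or Clang. `__builtin___clear_cache` is the compiler builtin for this kind of cache maintenance; the helper name `run_generated_code` and the hand-assembled "return 42" bytes are illustrative assumptions, not taken from the article.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hand-assembled "return 42" for two targets (assumed encodings for this sketch). */
#if defined(__x86_64__)
static const unsigned char ret42_code[] = {
    0xB8, 0x2A, 0x00, 0x00, 0x00,   /* mov eax, 42 */
    0xC3                            /* ret */
};
#elif defined(__aarch64__)
static const unsigned char ret42_code[] = {
    0x40, 0x05, 0x80, 0x52,         /* mov w0, #42 */
    0xC0, 0x03, 0x5F, 0xD6          /* ret */
};
#else
#error "this sketch only covers x86-64 and AArch64"
#endif

/* Hypothetical helper: copy the code into an executable buffer, synchronise
   the caches, then call it. Returns -1 if the mapping cannot be created. */
int run_generated_code(void)
{
    size_t len = 4096;
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    /* These are ordinary stores, so the bytes land in the data cache. */
    memcpy(buf, ret42_code, sizeof ret42_code);

    /* On ARM this cleans the data cache and invalidates the instruction
       cache for the range. On x86-64 it is expected to emit no instructions. */
    __builtin___clear_cache((char *)buf, (char *)buf + sizeof ret42_code);

    int (*fn)(void) = (int (*)(void))buf;
    int result = fn();

    munmap(buf, len);
    return result;
}
```

Calling `run_generated_code()` should return 42 on either target. On ARM, dropping the builtin call is exactly the hazard described above: the core may fetch stale bytes from the icache.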

On modern x86-64 architectures, which typically have a very advanced cache
system, I expected to have to deal with that as well. As this article shows,
you don't. You just write whatever and you can execute it straight after. When
you think about it, it's a rather complicated thing for the hardware to
implement. I wonder why they do it that way instead of relying on the software
to issue flushes in the (relatively rare) situations where a hazard is
possible.
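
To make the x86-64 behaviour concrete, here is a rough sketch (not the article's code) that writes a tiny function, calls it, overwrites its immediate operand in place, and calls it again with no cache maintenance in between. It assumes Linux on x86-64 and a system that permits a read/write/execute mapping; the helper name `patched_result` is made up for illustration.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#if !defined(__x86_64__)
#error "this sketch is specific to x86-64"
#endif

/* Hypothetical helper: builds "mov eax, imm32; ret" with imm32 = first,
   runs it, patches imm32 to second, and runs it again.
   Returns the second result, or -1 if anything went wrong. */
int patched_result(int first, int second)
{
    /* mov eax, imm32 (opcode B8 + 4 immediate bytes), then ret (C3) */
    const unsigned char template_code[] = { 0xB8, 0, 0, 0, 0, 0xC3 };
    size_t len = 4096;

    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    memcpy(buf, template_code, sizeof template_code);
    memcpy(buf + 1, &first, sizeof first);     /* fill in the immediate */

    int (*fn)(void) = (int (*)(void))buf;
    int before = fn();

    /* Overwrite the immediate of code that has already executed.
       No explicit flush: the hardware keeps instruction fetch coherent. */
    memcpy(buf + 1, &second, sizeof second);
    int after = fn();

    munmap(buf, len);
    return (before == first) ? after : -1;
}
```

For example, `patched_result(7, 1234)` should return 1234: the second call sees the patched immediate even though the same bytes were executed a moment earlier.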

~~~
jakobdabo
Modern x86s may be Harvard architecture internally (with separate code and
data L1 caches), but they still present themselves to the developer as classic
von Neumann, which is easier to program.

~~~
simias
So did they keep it that way purely for historical reasons and back-compat? Few
people write self-modifying code (or even code loaders) these days, it seems
like a tricky feature to implement in hardware for relatively little gain.

~~~
jandrese
Depends if the self-modifying code in question is stuff like kernel drivers
for popular hardware on widely used operating systems. The x86 manufacturers
do bend over backwards for compatibility. You can run 16-bit code on the
latest i9 processor if you like, including MS-DOS if your motherboard still
supports legacy boot modes. That is a remarkable level of backwards
compatibility.

------
benj111
Are there any languages that could actually make use of self modifying code?

Machines wouldn't have the same problems reasoning about it as humans would.

Or is it a question of compilers not being good enough until processor tech
made the optimisation not worth it?

~~~
saagarjha
The real issue with self-modifying code is that it's basically a security bug
waiting to happen, plus you have other complications like having to flush the
instruction cache on architectures like ARM. Generally, the benefits are not
worth it except in very specific cases.

~~~
AnanasAttack
I've seen points like these made before, but never really understood how. Can
you give an example of self-modifying code becoming a security issue?

~~~
vbezhenar
There's a security feature called W^X [0] (also called DEP in Windows).
Basically you can use a special mode which prevents memory pages from being
writable and executable at the same time. That rules out self-modifying code,
but it also prevents exploits from modifying memory containing executable
code. OpenBSD uses it as well.

0: [https://en.wikipedia.org/wiki/W%5EX](https://en.wikipedia.org/wiki/W%5EX)

~~~
Avery3R
That's not what DEP is, but it is dependent on DEP. DEP is just another word
for the NX bit, that allows a page to be marked as non-executable. With DEP
you can still have a page that is RWX if you set its permissions that way.
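
For what it's worth, generated code can still live with a W^X policy if the page is never writable and executable at the same time. A minimal sketch, assuming Linux with GCC or Clang: map the page read/write, fill it, then flip it to read/execute with `mprotect` before calling it. The helper name `run_wx_friendly` and the "return 42" bytes are illustrative assumptions.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hand-assembled "return 42" (assumed encodings for this sketch). */
#if defined(__x86_64__)
static const unsigned char wx_ret42[] = {
    0xB8, 0x2A, 0x00, 0x00, 0x00,   /* mov eax, 42 */
    0xC3                            /* ret */
};
#elif defined(__aarch64__)
static const unsigned char wx_ret42[] = {
    0x40, 0x05, 0x80, 0x52,         /* mov w0, #42 */
    0xC0, 0x03, 0x5F, 0xD6          /* ret */
};
#else
#error "this sketch only covers x86-64 and AArch64"
#endif

/* Hypothetical helper: the page is writable OR executable, never both.
   Returns the generated function's result, or -1 on failure. */
int run_wx_friendly(void)
{
    size_t len = 4096;

    /* Step 1: writable but not executable. */
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    memcpy(buf, wx_ret42, sizeof wx_ret42);

    /* Step 2: drop write permission, gain execute permission. */
    if (mprotect(buf, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(buf, len);
        return -1;
    }

    /* Needed on ARM-style split caches; harmless on x86-64. */
    __builtin___clear_cache((char *)buf, (char *)buf + sizeof wx_ret42);

    int (*fn)(void) = (int (*)(void))buf;
    int result = fn();

    munmap(buf, len);
    return result;
}
```

Modifying the code again means flipping the page back to read/write first, so an attacker who can only write to memory never finds a page that is both writable and executable.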

------
rootbear
I once read about a clever use of self-modifying x86 code. The 8086 and 8088
are nearly identical chips, with the difference being that the 8086 has 16-bit
I/O and the 8088 has 8-bit. The only way for a program to know which chip it's
running on is to write a bit of self modifying code that takes advantage of
this difference in I/O size. Both chips use prefetch, but they prefetch words,
not bytes, and 8086 words are 16-bit, so it fetches twice as many bytes as the
8088. Thus, one can modify a location in RAM just after the current
instruction and that change will be seen on the 8088, but not the 8086, which
has already prefetched the previous value.

This is all from memory of something I read, probably on Usenet, ages ago. My
apologies in advance if I messed up the details.

I wonder if any of the various x86 emulators out there get this difference
right.

~~~
tyingq
I think you probably read it in Dr. Dobb's:
[http://www.drdobbs.com/embedded-systems/processor-detection-...](http://www.drdobbs.com/embedded-systems/processor-detection-schemes/184409011)

Not exactly what you described, but very similar:

 _" Differentiating between 8088s and 8086s is trickier. The easiest way I've
found to do it is to modify code that's five bytes ahead of IP. Since the
prefetch queue of an 8088 is four bytes and the prefetch queue of an 8086 is
six bytes, an instruction five bytes ahead of IP won't have any effect on an
8086 the first time around"_

~~~
vbezhenar
Interesting! Does this effect work on a modern CPU? I'd imagine that if the
CPU modifies memory that was already consumed by the prefetcher, it would have
to reset and reread everything, or something like that.

~~~
tyingq
I don't think modern prefetch is linear/FIFO.

------
mempko
This is super cool. Though makes me want to go back and write some lisp.

------
crimsonalucard
Isn't this technique used for viruses to escape detection?

------
Avery3R
I don't understand why people still use AT&T syntax asm. It's so much harder
to read than Intel.

------
snek
I was on board until `PROT_READ | PROT_WRITE | PROT_EXEC`

~~~
orclev
I mean, how else are you going to write self-modifying code? It's kind of
implicit in the name of the thing that it's going to need to at a minimum have
write and exec permission, and read is kind of important to make sure you're
actually modifying the right stuff. Yes it's a terrible idea to have write and
execute permissions on the same piece of memory, but that's part of why
literally the very first line of the article says it's a terrible idea to
write self-mutating code.

