Writing a Self-Mutating x86_64 C Program (2013)

simias · on Jan 11, 2019

I've written "self-modifying" (really JITed) code for several architectures, mainly ARM, and I when I had to do it for amd64 I was very much surprised by how straightforward it was.

On ARM you have to be very careful to handle the cache correctly when you write self-modifying code, because when you access memory using a regular load or store it's obviously treated like data and goes through the data cache while the instructions are fetched through the instruction cache. So when you write an opcode you have to be careful to flux it out of the data cache (at least up to the point where the caches unify, typically L2 on ARM) and then invalidate the icache to make sure that you get the opcode back.

On modern x86-64 architectures, which typically have a very advanced cache system, I expected to have to deal with that as well. As this article shows, you don't. You just write whatever and you can execute it straight after. When you think about it it's a rather complicated thing for the hardware to implement. I wonder why they do it that way instead of relying on the software to issue flushes in the (relatively rare) situations where a hazard is possible.

jakobdabo · on Jan 11, 2019

Modern x86s may be Harvard architecture internally (with separate code and data L1 caches), but they still present themselves to the developer as classic Von Neuman, which is easier to program.

simias · on Jan 11, 2019

So did they keep it that way purely for historical reason and back-compat? Few people write self-modifying code (or even code loaders) these days, it seems like a tricky feature to implement in hardware for relatively little gain.

jandrese · on Jan 11, 2019

Depends if the self-modifying code in question is stuff like kernel drivers for popular hardware on widely used operating systems. The x86 manufacturers do bend over backwards for compatibility. You can run 16 bit code on the latest i9 processor if you like, including MSDOS if your motherboard still supports legacy boot modes. That is a remarkable level of backwards compatibility.

kazinator · on Jan 12, 2019

> I expected to have to deal with that as well. As this article shows, you don't.

Because x86 has to be compatible with 80286 and 80386 forever.

If special flushing instructions were suddenly needed for self-modifying code to work right, all that ancient MS-DOS and Windows code would break.

benj111 · on Jan 11, 2019

Are there any languages that could actually make use of self modifying code?

Machines wouldn't have the same problems reasoning about it as humans would.

Or is it a question of compilers not being good enough until processor tech made the optimisation not worth it?

shakna · on Jan 11, 2019

Any _good_ reason? Probably not.

Any use? ... Yes.

Obfuscation. It might be a terrible practice, but has spawned entire languages, including some that have been used on occasion by industry, as well as the puzzle-solving community at large. (Say, integrate it into a `compile --release` flag.)

Right off the back of obfuscation, DRM. If the code modifies itself, especially in unexpected ways, then breaking it becomes harder. (`compile --protect`?)

JoachimS · on Jan 11, 2019

A lot of the later demo effects on the c64 is using so called "speedcode" which basically is code generating code on the fly to achieve things like dynamically copying data areas. In effect intelligent loop unrolling i asm.

http://codebase64.org/doku.php?id=base:speedcode

darkmighty · on Jan 11, 2019

Fascinating! I assume this kind of technique has been obsoleted by branch prediction, or are there imaginable modern applications?

shakna · on Jan 12, 2019

One of the reasons we needed extremely aggressive unrolling was the size/lack of a stack, making recursion or anything that might look like it difficult/limited/impossible, due to needing to store a location to jump back to.

Anything you could do to minimise the amount of things needed to store, and their size, wasn't wasted time.

And though it might very ocassionally be similarly necessary on some chips today, it's more likely you'll work in assembly on those.

saagarjha · on Jan 11, 2019

The real issue with self-modifying code is that it's basically a security bug waiting to happen, plus you have other complications like having to flush the instruction cache on architectures like ARM. Generally, the benefits are not worth it unless except for very specific cases.

AnanasAttack · on Jan 11, 2019

I've seen points like these made before, but never really understood how. Can you give an example of self-modifying code becoming a security issue?

saagarjha · on Jan 11, 2019

Self modifying code is not a security issue in and of itself, but it requires memory that is writable and executable (though, not necessarily at the same time). This is usually a bad idea because it makes arbitrary code execution much easier if your code has unrelated bugs in it, because it provides attackers a place to write shellcode and get it to run.

vbezhenar · on Jan 11, 2019

There's a security feature called W^X [0] (also called DEP in Windows). Basically you can use special mode which prevents memory pages to be writeable and executable at the same time, so self-modifying code is not allowed, but it prevents exploits from modifying memory containing executable code. OpenBSD uses it as well.

0: https://en.wikipedia.org/wiki/W%5EX

Avery3R · on Jan 11, 2019

That's not what DEP is, but it is dependent on DEP. DEP is just another word for the NX bit, that allows a page to be marked as non-executable. With DEP you can still have a page that is RWX if you set its permissions that way.

benj111 · on Jan 11, 2019

Some (most?) of the problems with c stem from not checking array bounds.

Now if you break out of an array bound in read only memory, you cant do much damage, but what happens if you could rewrite the code to do what you want?

Theres also the issue that you can have viruses that hide what they're doing until they actually run, so virus scanners cant pick them up.

I'm no expert. There maybe other classes of attack.

benj111 · on Jan 11, 2019

Yes, I was thinking historical examples. When such things were a concern.

Don't forget there can be code size gains also.

BTW I think all modern processors have instruction caches. In ARM it might not automatically flush?

Someone · on Jan 11, 2019

Before COUs supported stacks, quite a few compilers produced self-modifying code.

Instead of using a stack, many CPUs stored the address to return to in a specific register. If a function wanted to call other functions and return afterwards, it had to store that return address somewhere. Popular solutions where “directly before the start of the function” and “in the jump instruction at the end of the function”. The former is easier for the compiler writer, the latter leads to faster code (the return becomes a simple unconditional jump. With the other approach, you have to load the return address and then jump to it)

And yes, you can’t have reentrant code or even recursion that way. That’s one reason early COBOL and early FORTRAN didn’t support recursion.

benj111 · on Jan 11, 2019

Thanks for that.

Presumably you could have tail call recursion. Although if this is pre stacks, tail call detection is probably asking a bit much.

Someone · on Jan 13, 2019

You could, but I’m not sure the idea had been discovered yet. https://en.wikipedia.org/wiki/Tail_call#History gives 1977 as its first mention (Steele’s ”Debunking the 'Expensive Procedure Call' Myth“ https://dspace.mit.edu/handle/1721.1/5753#files-area)

Also, many tail-calling functions eventually have to return.

woliveirajr · on Jan 11, 2019

I bet it would be the next level (if it doesn't exists already) to be used by top-level agencies, or by bad guys.

Some code already found in the wild couldn't be decrypted (e.g., stuxnet), has obfuscated connection (e.g. https://news.ycombinator.com/item?id=18864895) and is updated as needed (any C&C that sends new modules to the infected machine). Add some capability to self modify and you can have hard times generating signatures ou behaviours that are traced by "antivirus", firewall, and so on.

pjc50 · on Jan 11, 2019

Self-modifying Lisp, perhaps? Do Turing tapes count?

Really the main use of these techniques (and the article has a telltale in its use of the word "shellcode") is injecting code into a program that was not originally intended to be modifiable. Usually (although not always) across a security boundary.

benj111 · on Jan 11, 2019

Turing tape, good point! Now where's my infinitely long piece of paper...

By self modifying lisp, I take it you mean modifying the lisp data (/code? (/data?)) structure itself? On a lisp machine would that count as self modifying???

pjc50 · on Jan 11, 2019

Yes, that was to some extent the point of using the same representation for code and data (homoiconicity) in Lisp - the ability to edit the code from inside itself. I'm not sure how widely this is used outside the macro system and development environments.

masak · on Jan 11, 2019

The (old) MIX programs in The Art of Computer Programming by Knuth were often self-modifying, in the sense that they modified jump addresses at the end of routines to return to the right caller. (That is, they set the address to right after the call.)

The book chose that way to do it because it was standard practice at the time. It quickly fell out of fashion, though. Nowadays, we do it with a call stack, and each stack frame holds a return address. (Which is better anyway, since it allows routines to be re-entrant.)

The newer MMIX architecture (in some of the newer books) don't rely on such self-modification.

benj111 · on Jan 11, 2019

I may have found an esolang that is selfmodifying. Although there are probably many many more.

Befunge:

https://esolangs.org/wiki/Befunge

AnanasAttack · on Jan 11, 2019

>Are there any languages that could actually make use of self modifying code? JIT languages already do this.

saagarjha · on Jan 11, 2019

JIT languages don't use self-modifying code, they generate new code at runtime and execute that.

bloak · on Jan 11, 2019

Sometimes they will also make small changes to code that was generated earlier, typically by changing a branch instruction to point somewhere else.

Dynamic linkers may do that, too, though glibc doesn't normally do it, as far as I know: it prefers to update a pointer to code: same result without needing memory that is both writable and executable and without having to invalidate the instruction cache.

wahern · on Jan 12, 2019

Since the advent of ROP glibc's approach isn't much more secure, though it is saner. A few years ago OpenBSD added the kbind syscall which provides the linker equivalent of W^X. kbind is a kernel-mediated memcpy operation that restricts the code permitted to write to a memory block--specifically, the last piece of code to write to it, which is invariably the linker.

C++ virtual functions are problematic for the same reasons. In C code I've started to avoid function pointers altogether in favor of switch-based dispatch, limiting an attacker to invoking a small, statically defined set of functions, not any arbitrary code in the address space. If I feel the problem demands heavily polymorphic code I'll pull in a scripting language like Lua.

benj111 · on Jan 11, 2019

From my delve into such things, in Linux at least the linker is a binary called before the program runs, so its a separate program potentially modifying the code. Interesting philosophical question whether that's still self modifying though.

benj111 · on Jan 11, 2019

I appear to be getting downvoted.

Assuming its because you think I'm wrong about the separate program. Look up the manpage for ld.so .

If you run the strings program on a dynamically linked program the first thing it spits out should be the path to ld.so

If you run that program without arguments, it even gives you a usage message.

wahern · on Jan 12, 2019

The OP's point, though, was that the glibc linker (ld.so) doesn't modify code; it largely limits itself to modifying tables of pointers. For access to shared symbols the compiler emits code that indirects through these tables, such as through the global offset table (GOT). The machine code itself is mostly generated to use relative offsets, so the linker doesn't usually need to change static opcode operands.

It does this because ELF is a newer, more abstract executable format. By contrast Windows and AIX have evolved an older dynamic linking strategy which depends more heavily on the linker patching address constants embedded in the code, presumably because of the better backward compatibility. I'm too young to have had first-hand experience with the details, but I do vaguely remember the Linux transition from a.out to ELF and it seemed rather disruptive (though it was all magic to me).

But the Windows approach isn't rightly self-modifying code, either. It's more like a delayed compilation stage. Self-modifying code implies code that rewrites itself dynamically during runtime. Runtime normally means in the normal course of regular program execution, as opposed to link time. From the perspective of the code, link time is a one- or two-time event--static linking and, optionally, dynamic linking--that initializes the application code prior to its first run.

pantalaimon · on Jan 11, 2019

> Are there any languages that could actually make use of self modifying code?

Kernel livepatching (and by that ftrace) would come to mind.

benj111 · on Jan 11, 2019

Would they not just write a new patched page, and swap out the original?

Seems less error prone?

nickpsecurity · on Jan 11, 2019

It's almost always a better idea to create something new, verify it was created successfully, do any syncing, switchover, optionally re-check for correct syncing, and then delete or take down original. That's pretty much how robust clustering works.

pantalaimon · on Jan 11, 2019

As I understand it they make the compiler add a few noops in the beginning of each non-static function.

Then to add tracing or change the behavior of the function, the noops are overwritten by a jump to the new code.

monocasa · on Jan 11, 2019

Microsoft has stubs for self modifying code in their libraries to patch over compat issues.

https://blogs.msdn.microsoft.com/oldnewthing/20110921-00/?p=...

zozbot123 · on Jan 11, 2019

It's not hard to reason about from a formal POV-- by "self-modifying" code, you're basically editing the continuation of the current function/statement. It's just not very useful-- by definition, the extra performance that you might gain from directly self-modifying the program's binary code can't be any higher than the overhead of a good interpreter/JIT. That's just not interesting.

SomeHacker44 · on Jan 11, 2019

LISP macros are compile time self modifying code. So are many other macro systems.

kazinator · on Jan 11, 2019

Lisp macros are not self-modifying code in any shape or form. They calculate new syntax tree fragments from existing syntax tree fragments, often using purely functional techniques (no mutation at all, not even of local variables in the macro).

The compiler internals of pretty much any programming language do similar tree to tree transformations: just not ones that the program itself can specify as part of its code. We don't call C self-modifying because a for(;;) loop structure was changed by the compiler into if/goto with generated labels, and then changed again into assembly code.

A macro that mutates the input source code on which it operates wouldn't pass a code review in any competent Lisp shop.

It can happen. For instance (setf (car '(1 2)) 2) is self-modifying code; it tries to replace the 1 with a 2, and that 1 is embedded in the program code itself, in a literal list. This is undefined behavior according to ANSI CL, very similarly to how "abc"[1]++ (modifying a string literal) is undefined behavior in C. Note that this code isn't implementing a macro.

I've never heard of compiled Lisp object code being mutated.

Lisp programs as such can be self-modifying; a common example of that is updating while running: loading new versions of existing modules, replacing old functions with new ones. That doesn't involve modifying code.

SomeHacker44 · on Jan 15, 2019

While your use case (creation of new syntax tree in a deterministic fashion) is definitely the most common use case of Common Lisp macros - it is not the only one. Common Lisp macros are literally code executed at compile time on the source AST. It needs not make a deterministic change to that AST, needs not be side-effect free, and needs not use only and solely the context of its input AST to output the final AST.

Doesn't mean I suggest doing this, but please do not sell unhygenic CL macros short. :) In fact, in CL one has to go to some lengths to be careful to make macro outputs "safe."

lispm · on Jan 15, 2019

If macros are running at compile time, then we don't think that this particular code is self modifying.

If we run code in a Lisp interpreter, then one can write self-modifying code. It's possible to get visible effects. It's also possible to use it during debugging code.

kazinator · on Jan 15, 2019

Regardless of any lack of referential hygiene, or reliance on side-effects such that the same piece of code is not expanded the same way twice, CL macros must not mutate the input AST, period. It is undefined behavior, and easily avoidable.

benj111 · on Jan 11, 2019

Yes, but so are C macros.

I mentioned lisp machines in a prior answer, I don't know enough about them to say whether they would allow self modifying lisp code though.

rootbear · on Jan 11, 2019

I once read about a clever use of self-modifying x86 code. The 8086 and 8088 are nearly identical chips, with the difference being that the 8086 has 16-bit I/O and the 8088 has 8-bit. The only way for a program to know which chip it's running on is to write a bit of self modifying code that takes advantage of this difference in I/O size. Both chips use prefetch, but they prefetch words, not bytes, and 8086 words are 16-bit, so it fetches twice as many bytes as the 8088. Thus, one can modify a location in RAM just after the current instruction and that change will be seen on the 8088, but not the 8086, which has already prefetched the previous value.

This is all from memory of something I read, probably on Usenet, ages ago. My apologies in advance if I messed up the details.

I wonder if any of the various x86 emulators out there get this difference right.

tyingq · on Jan 11, 2019

I think you probably read it in Dr Dobbs: http://www.drdobbs.com/embedded-systems/processor-detection-...

Not exactly what you described, but very similar:

"Differentiating between 8088s and 8086s is trickier. The easiest way I've found to do it is to modify code that's five bytes ahead of IP. Since the prefetch queue of an 8088 is four bytes and the prefetch queue of an 8086 is six bytes, an instruction five bytes ahead of IP won't have any effect on an 8086 the first time around"

vbezhenar · on Jan 11, 2019

Interesting! Does this effect work on modern CPU? I'd imagine that if CPU modifies memory which was consumed by prefetcher, it should reset and reread everything or something like this.

tyingq · on Jan 12, 2019

I don't think modern prefetch is linear/fifo.

rootbear · on Jan 11, 2019

Ah, thanks, you're probably right about where I got it from.

mempko · on Jan 11, 2019

This is super cool. Though makes me want to go back and write some lisp.

crimsonalucard · on Jan 11, 2019

Isn't this technique used for viruses to escape detection?

Avery3R · on Jan 11, 2019

I don't understand why people still use AT&T syntax asm. It's so much harder to read than Intel

snek · on Jan 11, 2019

I was on board until `PROT_READ | PROT_WRITE | PROT_EXEC`

orclev · on Jan 11, 2019

I mean, how else are you going to write self-modifying code? It's kind of implicit in the name of the thing that it's going to need to at a minimum have write and exec permission, and read is kind of important to make sure you're actually modifying the right stuff. Yes it's a terrible idea to have write and execute permissions on the same piece of memory, but that's part of why literally the very first line of the article says it's a terrible idea to write self-mutating code.