On ARM you have to be very careful to handle the cache correctly when you write self-modifying code. Memory accessed with a regular load or store is treated as data and goes through the data cache, while instructions are fetched through the instruction cache. So when you write an opcode you have to flush it out of the data cache (at least up to the point where the caches unify, typically L2 on ARM) and then invalidate the icache to make sure that you fetch the new opcode.
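For the curious, here's a minimal AArch64 Linux sketch of that sequence (the opcode words and the RWX mmap are illustrative; `__builtin___clear_cache` is a real GCC/Clang builtin that performs the dcache-clean and icache-invalidate for you):

```c
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void) {
    /* mov w0, #42 ; ret */
    static const unsigned int code[] = { 0x52800540, 0xD65F03C0 };

    unsigned char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) return 1;
    memcpy(buf, code, sizeof code);

    /* Clean the dcache and invalidate the icache for this range;
       without it the core may fetch stale instruction bytes. */
    __builtin___clear_cache((char *)buf, (char *)buf + sizeof code);

    int (*fn)(void) = (int (*)(void))buf;
    printf("%d\n", fn());   /* prints 42 */
    return 0;
}
```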
On modern x86-64 architectures, which typically have a very advanced cache system, I expected to have to deal with that as well. As this article shows, you don't: you just write whatever and you can execute it straight after. When you think about it, that's a rather complicated thing for the hardware to implement. I wonder why they do it that way instead of relying on the software to issue flushes in the (relatively rare) situations where a hazard is possible.
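Indeed, the x86-64 version of the same sketch works with the cache-maintenance line simply deleted (again assuming the OS will hand out a writable+executable mapping, which hardened systems may refuse):

```c
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void) {
    /* b8 2a 00 00 00   mov eax, 42
       c3               ret          */
    static const unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

    unsigned char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) return 1;
    memcpy(buf, code, sizeof code);

    /* No flush, no invalidate: the hardware snoops the write itself. */
    int (*fn)(void) = (int (*)(void))buf;
    printf("%d\n", fn());   /* prints 42 */
    return 0;
}
```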
Because x86 has to be compatible with 80286 and 80386 forever.
If special flushing instructions were suddenly needed for self-modifying code to work right, all that ancient MS-DOS and Windows code would break.
Machines wouldn't have the same problems reasoning about it as humans would.
Or is it a question of compilers not being good enough until processor tech made the optimisation not worth it?
Any use? ... Yes.
Obfuscation. It might be a terrible practice, but it has spawned entire languages, including some that have been used on occasion by industry, as well as by the puzzle-solving community at large. (Say, integrate it into a `compile --release` flag.)
Right off the back of obfuscation, DRM. If the code modifies itself, especially in unexpected ways, then breaking it becomes harder. (`compile --protect`?)
Anything you could do to minimise the number of things you needed to store, and their size, wasn't wasted time.
And though it might very occasionally be similarly necessary on some chips today, it's more likely you'll be working in assembly on those.
Now if you overrun an array bound while the code is in read-only memory, you can't do much damage, but what happens if you could rewrite the code to do what you want?
There's also the issue that you can have viruses that hide what they're doing until they actually run, so virus scanners can't pick them up.
I'm no expert; there may be other classes of attack.
Don't forget there can be code size gains also.
BTW I think all modern processors have instruction caches.
On ARM it might just not flush automatically?
Instead of using a stack, many CPUs stored the address to return to in a specific register. If a function wanted to call other functions and return afterwards, it had to store that return address somewhere. Popular solutions were “directly before the start of the function” and “in the jump instruction at the end of the function”. The former is easier for the compiler writer; the latter leads to faster code (the return becomes a simple unconditional jump, whereas with the other approach you have to load the return address and then jump to it).
And yes, you can’t have reentrant code or even recursion that way. That’s one reason early COBOL and early FORTRAN didn’t support recursion.
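The closest portable analogue in C is a single static slot per routine instead of a patched instruction (a sketch with made-up names, not how any real compiler did it). It also makes the recursion problem concrete: a nested call to the same routine would overwrite the one slot and lose the outer return address.

```c
#include <stdio.h>

static void after_call(void);

/* The one return-address slot belonging to sub(); the pre-stack
   machines kept this before the function or inside its final jump. */
static void (*sub_return)(void);

static void sub(void) {
    puts("in sub");
    sub_return();            /* "return" is a jump through the slot */
}

static void call_sub(void) {
    sub_return = after_call; /* store the return address in sub's slot */
    sub();
}

static void after_call(void) {
    puts("back in the caller");
}

int main(void) {
    call_sub();
    return 0;
}
```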
Presumably you could have tail-call recursion. Although if this is pre-stack, tail-call detection is probably asking a bit much.
Also, many tail-calling functions eventually have to return.
Some code already found in the wild couldn't be decrypted (e.g., Stuxnet), has obfuscated connections (e.g., https://news.ycombinator.com/item?id=18864895), and is updated as needed (any C&C that sends new modules to the infected machine). Add some capability to self-modify and you can have a hard time generating the signatures or behaviours that antivirus, firewalls, and so on rely on.
Really the main use of these techniques (and the article has a telltale in its use of the word "shellcode") is injecting code into a program that was not originally intended to be modifiable. Usually (although not always) across a security boundary.
By self modifying lisp, I take it you mean modifying the lisp data (/code? (/data?)) structure itself? On a lisp machine would that count as self modifying???
The book chose that way to do it because it was standard practice at the time. It quickly fell out of fashion, though. Nowadays, we do it with a call stack, and each stack frame holds a return address. (Which is better anyway, since it allows routines to be re-entrant.)
The newer MMIX architecture (used in some of the newer books) doesn't rely on such self-modification.
Dynamic linkers may do that, too, though glibc doesn't normally do it, as far as I know: it prefers to update a pointer to code, which gives the same result without needing memory that is both writable and executable, and without having to invalidate the instruction cache.
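A rough sketch of that pointer-update idea (all names hypothetical; this is the shape of GOT-style lazy binding, not glibc's actual code):

```c
#include <stdio.h>

static void real_impl(void) { puts("real implementation"); }

static void resolver(void);

/* The writable "GOT slot": every call goes through this pointer. */
static void (*impl_slot)(void) = resolver;

/* The first call lands here; it patches the slot and forwards the
   call. Only data is written, so no code page is made writable and
   no instruction cache needs invalidating. */
static void resolver(void) {
    impl_slot = real_impl;
    real_impl();
}

int main(void) {
    impl_slot();   /* resolves, then calls */
    impl_slot();   /* goes straight through the patched pointer */
    return 0;
}
```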
C++ virtual functions are problematic for the same reasons. In C code I've started to avoid function pointers altogether in favor of switch-based dispatch, limiting an attacker to invoking a small, statically defined set of functions, not any arbitrary code in the address space. If I feel the problem demands heavily polymorphic code I'll pull in a scripting language like Lua.
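A minimal sketch of that switch-based dispatch (the enum and handlers are hypothetical): an attacker who corrupts the op value can at most select one of these cases, not redirect control to an arbitrary address.

```c
#include <stdio.h>

enum op { OP_OPEN, OP_CLOSE, OP_READ };

static void do_open(void)  { puts("open");  }
static void do_close(void) { puts("close"); }
static void do_read(void)  { puts("read");  }

/* The entire set of reachable targets is visible right here. */
static void dispatch(enum op op) {
    switch (op) {
    case OP_OPEN:  do_open();  break;
    case OP_CLOSE: do_close(); break;
    case OP_READ:  do_read();  break;
    default:       break;      /* unknown ops are ignored */
    }
}

int main(void) {
    dispatch(OP_READ);
    return 0;
}
```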
Assuming it's because you think I'm wrong about the separate program: look up the manpage for ld.so.
If you run the strings program on a dynamically linked binary, the first thing it spits out should be the path to ld.so.
If you run that program without arguments, it even gives you a usage message.
It does this because ELF is a newer, more abstract executable format. By contrast, Windows and AIX evolved an older dynamic linking strategy which depends more heavily on the linker patching address constants embedded in the code, presumably for better backward compatibility. I'm too young to have had first-hand experience with the details, but I do vaguely remember the Linux transition from a.out to ELF, and it seemed rather disruptive (though it was all magic to me).
But the Windows approach isn't rightly self-modifying code, either. It's more like a delayed compilation stage. Self-modifying code implies code that rewrites itself dynamically during runtime. Runtime normally means in the normal course of regular program execution, as opposed to link time. From the perspective of the code, link time is a one- or two-time event--static linking and, optionally, dynamic linking--that initializes the application code prior to its first run.
Kernel live patching (and, underlying it, ftrace) comes to mind.
Seems less error prone?
Then, to add tracing or change the behavior of a function, the NOPs the compiler left at its entry are overwritten with a jump to the new code.
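A user-space sketch of the same trick on x86-64 Linux (`patchable_function_entry` is a real GCC/Clang attribute; the sizes, names, and the RWX mprotect are illustrative, and hardened systems may forbid writable text pages):

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Ask the compiler to pad the entry with 5 NOP bytes: room for
   a 5-byte "jmp rel32". noinline keeps calls out-of-line so the
   patch actually takes effect. */
__attribute__((patchable_function_entry(5, 0), noinline))
static void hook(void) { puts("original"); }

__attribute__((noinline))
static void replacement(void) { puts("patched"); }

int main(void) {
    hook();                                   /* prints "original" */

    uint8_t *site = (uint8_t *)hook;          /* non-ISO cast; fine on POSIX */
    long page = sysconf(_SC_PAGESIZE);
    uint8_t *start = (uint8_t *)((uintptr_t)site & ~(uintptr_t)(page - 1));

    /* Make the code page writable, replace the NOPs with
       e9 <rel32> (jmp replacement), then drop write access. */
    mprotect(start, 2 * page, PROT_READ | PROT_WRITE | PROT_EXEC);
    int32_t rel = (int32_t)((uint8_t *)replacement - (site + 5));
    site[0] = 0xE9;
    memcpy(site + 1, &rel, sizeof rel);
    mprotect(start, 2 * page, PROT_READ | PROT_EXEC);

    hook();                                   /* now prints "patched" */
    return 0;
}
```

Note that, as discussed above, no icache maintenance is needed on x86-64 after the patch; on ARM you'd add a `__builtin___clear_cache` call over the patched bytes.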
The compiler internals of pretty much any programming language do similar tree to tree transformations: just not ones that the program itself can specify as part of its code. We don't call C self-modifying because a for(;;) loop structure was changed by the compiler into if/goto with generated labels, and then changed again into assembly code.
A macro that mutates the input source code on which it operates wouldn't pass a code review in any competent Lisp shop.
It can happen. For instance (setf (car '(1 2)) 2) is self-modifying code; it tries to replace the 1 with a 2, and that 1 is embedded in the program code itself, in a literal list. This is undefined behavior according to ANSI CL, very similarly to how "abc"[0] = 'x' (modifying a string literal) is undefined behavior in C. Note that this code isn't implementing a macro.
I've never heard of compiled Lisp object code being mutated.
Lisp programs as such can be self-modifying; a common example of that is updating while running: loading new versions of existing modules, replacing old functions with new ones. That doesn't involve modifying existing code objects, though.
Doesn't mean I suggest doing this, but please do not sell unhygienic CL macros short. :) In fact, in CL one has to go to some lengths to make macro outputs "safe."
If you run code in a Lisp interpreter, you can write self-modifying code and get visible effects from it. It can even be useful while debugging.
I mentioned lisp machines in a prior answer, I don't know enough about them to say whether they would allow self modifying lisp code though.
This is all from memory of something I read, probably on Usenet, ages ago. My apologies in advance if I messed up the details.
I wonder if any of the various x86 emulators out there get this difference right.
Not exactly what you described, but very similar:
"Differentiating between 8088s and 8086s is trickier. The easiest way I've found to do it is to modify code that's five bytes ahead of IP. Since the prefetch queue of an 8088 is four bytes and the prefetch queue of an 8086 is six bytes, an instruction five bytes ahead of IP won't have any effect on an 8086 the first time around"