
Executing an array as a function - kahlonel
http://kahlonel.io/executing-array-as-a-function/
======
Someone
That can be done easier. You don’t (or at least, you didn’t back in 1984) even
have to write a function:

    
    
      short main[] = {
        277, 04735, -4129, 25, 0, 477, 1019, 0xbef, 0, 12800,
        -113, 21119, 0x52d7, -1006, -7151, 0, 0x4bc, 020004,
        14880, 10541, 2056, 04010, 4548, 3044, -6716, 0x9,
        4407, 6, 5568, 1, -30460, 0, 0x9, 5570, 512, -30419,
        0x7e82, 0760, 6, 0, 4, 02400, 15, 0, 4, 1280, 4, 0,
        4, 0, 0, 0, 0x8, 0, 4, 0, ',', 0, 12, 0, 4, 0, '#',
        0, 020, 0, 4, 0, 30, 0, 026, 0, 0x6176, 120, 25712,
        'p', 072163, 'r', 29303, 29801, 'e'
      };
    

(example is for VAX-11 or PDP-11, and courtesy of
[http://www.ioccc.org/1984/mullender/mullender.c](http://www.ioccc.org/1984/mullender/mullender.c))
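The mix of decimal, octal, hex and character constants in that array is only notation: every entry is a 16-bit word. As a minimal sketch, the snippet below copies the last nine words verbatim and prints each one low byte first, assuming the little-endian byte order of the PDP-11 and VAX. The helper name `decode_words` and the choice to show NUL bytes as `.` are illustrative and not part of the original entry.

```c
#include <stddef.h>

/* Last nine words of the array quoted above, copied verbatim. */
static const short tail[] = {
    0x6176, 120, 25712, 'p', 072163, 'r', 29303, 29801, 'e'
};

/* Hypothetical helper: write each 16-bit word as two bytes, low byte first
 * (little-endian order is an assumption), showing NUL bytes as '.'.
 * Returns the number of characters written, not counting the terminator. */
size_t decode_words(const short *words, size_t count, char *out, size_t cap)
{
    size_t n = 0;

    if (cap == 0)
        return 0;

    /* Stop early if there is no room for two more bytes plus the NUL. */
    for (size_t i = 0; i < count && n + 2 < cap; i++) {
        unsigned w = (unsigned short)words[i];
        unsigned char lo = (unsigned char)(w & 0xff);
        unsigned char hi = (unsigned char)((w >> 8) & 0xff);

        out[n++] = lo ? (char)lo : '.';
        out[n++] = hi ? (char)hi : '.';
    }
    out[n] = '\0';
    return n;
}
```

Calling `decode_words(tail, sizeof tail / sizeof tail[0], buf, sizeof buf)` with a 64-byte `buf` yields `vax.pdp.str.write.`, which suggests that the end of the array holds NUL-terminated strings rather than instructions.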

~~~
Animats
Would not have worked on the PDP 11/45, which had separate address spaces for
code and data.

------
lfowles
Very similar, here's main() as an array

[http://jroweboy.github.io/c/asm/2015/01/26/when-is-main-not-...](http://jroweboy.github.io/c/asm/2015/01/26/when-is-main-not-a-function.html)

------
Zarathust
Good find. I'll suggest "shellcode" as a name for this trick

~~~
zokier
Or less exploity "dlopen".

Sure, dlopen does tons more stuff behind the scenes, but ultimately it is
about loading a bunch of bytes into memory and executing those as functions.

~~~
ethnic_throw
> Sure, dlopen does tons more stuff behind the scenes, but ultimately it is
> about loading a bunch of bytes into memory and executing those as functions.

That's pretty much mprotect, though. dlopen _mostly_ does the other things,
and it does a lot.
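For reference, the "bytes into memory, then mprotect" step can be sketched roughly as follows. This is a minimal illustration, not what dlopen itself does: it assumes a POSIX system with `mmap`/`mprotect`, a GCC- or Clang-compatible compiler (for `__builtin___clear_cache`), and an x86-64 or AArch64 CPU. The function name `run_bytes_as_function` is made up for this example, and hardened systems may refuse the `mprotect` call, in which case the sketch returns -1.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on strict glibc builds */
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Machine code for "return 42"; only these two encodings are provided. */
#if defined(__x86_64__)
static const unsigned char code[] = {
    0xb8, 0x2a, 0x00, 0x00, 0x00,   /* mov eax, 42 */
    0xc3                            /* ret */
};
#elif defined(__aarch64__)
static const unsigned char code[] = {
    0x40, 0x05, 0x80, 0x52,         /* mov w0, #42 */
    0xc0, 0x03, 0x5f, 0xd6          /* ret */
};
#else
#define NO_CODE_FOR_THIS_CPU 1
#endif

/* Hypothetical helper: returns 42 if the bytes ran, or -1 if the CPU is
 * not covered above or the OS refused one of the memory calls. */
int run_bytes_as_function(void)
{
#ifdef NO_CODE_FOR_THIS_CPU
    return -1;
#else
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    /* 1. Get a writable, non-executable page. */
    void *mem = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    /* 2. Copy the bytes in, then swap write permission for execute. */
    memcpy(mem, code, sizeof code);
    if (mprotect(mem, (size_t)page, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, (size_t)page);
        return -1;
    }

    /* 3. Sync the instruction cache; effectively a no-op on x86. */
    __builtin___clear_cache((char *)mem, (char *)mem + sizeof code);

    /* 4. Call the page as a function (a POSIX-style pointer conversion). */
    int (*fn)(void) = (int (*)(void))mem;
    int result = fn();

    munmap(mem, (size_t)page);
    return result;
#endif
}
```

Keeping the page writable and executable at different times, rather than both at once, is a deliberate choice here: it avoids needing an executable stack or writable code pages.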

------
mattbierner
I once (ab)used this approach to express x86 assembly directly in C++:
[https://github.com/mattbierner/Template-Assembly](https://github.com/mattbierner/Template-Assembly)
The library uses standard C++ syntax to build up complex templated types that
represent the assembly code. These types are then passed through a simple
templated "assembler" at compile time, which spits out a C string of machine
code that you can invoke at runtime.

Now if only someone would write a compile-time C compiler in C++ templates...

------
flavio81
_> There was a section in the paper where a function apparently ran machine
code placed somewhere in data memory._

I have some home computer books from the late 70s/early 80s that use this
trick extensively.

In the era when home computers had only slow BASIC interpreters and no
assemblers (or compilers), the usual way to speed things up was to type in a
long sequence of numbers (or characters) that were actually a machine language
program.

So you have a line like:

    1000 DATA 100,32,65,12,44,32,52,11,255,12,55,22

and on and on, which holds a sequence of bytes (the machine language program).

Later, READ statements would read each byte of the machine language code and
POKE it into memory, that is, write it to a specific address in RAM.

Finally, you CALL that specific address, which basically instructs the BASIC
interpreter to "jump" to the machine language code at that location.

~~~
pvg
It's not quite the same trick since this was a fairly standard use of DATA and
you control the exact absolute location of the code. There are also no address
space protections of any kind.

------
daotoad
I prototyped this sort of thing for an 8051 based project I was working on
where we had more RAM and external EEPROM space than we really needed, but
were regularly having to go back and grovel through the code looking for
optimizations to make things fit in code memory.

The important thing to know is that 8051s have separate CODE and RAM address
spaces. (Actually the RAM is divided up into multiple flavors too: direct,
indirect, external and bit addressable.)

It turned out it wasn't worth the overhead. The interpreter and ancillary bits
took up too much space and slowed things to a crawl. It was generally easier
to rewrite sections of the code in a way that made the compiler and optimizer
happy. By various techniques I managed to reduce the footprint of the system
code by at least fourfold after a number of refactorings. The application kept
needing new features, so it always barely fit in 16 code address space (some
of which was already consumed by the bootloader).

------
RcouF1uZ4gsC
> -fno-stack-protector

Enough said.

~~~
kbenson
From the article:

 _Remember in our scenario, we need the file to compile with a gcc on a 64 bit
system, without any special modifications to the compiler flags, so that means
there is no special compile flags, nor can we include any custom linking steps
and we want to use GCC inline AT&T syntax._

Edit: whoops, that was from a link to a similar thing in the comments. I
apparently missed the real article, which I'm off to read now.

------
nemo1618
I wonder if the same trick is possible in Go. For starters, you can inspect
the guts of a function pointer (and dump them to a []byte, and re-execute the
function elsewhere):
[https://play.golang.org/p/kpSps1GC3e](https://play.golang.org/p/kpSps1GC3e)

This is a bit different from constructing the function "from scratch," though.
I tried to inspect the guts of the function body itself, but was met with lots
of segfaults.

~~~
zzzcpan
It is possible, but Go functions are not C functions:
[https://golang.org/s/go11func](https://golang.org/s/go11func)

~~~
nemo1618
neat resource, thanks

------
Animats
The executable bit is turned on for writable data space pages?

x86 will allow this if the OS does. There have been many CPUs which don't
allow it, such as PowerPC, which had separate instruction and data caches.
After loading code, the loader had to make the pages executable and cause a
cache flush before the code could run.

~~~
kahlonel
Yes, both gcc flags "-fno-stack-protector" and "-zexecstack" allow the stack
data to be executable. The stack is protected by default.

------
mpweiher
One of the reasons to love Postscript:

    
    
       [ (Hello World) /print cvx ] cvx exec 
    

Of course, procedures are just shorthand for executable arrays, so

    
    
       { (Hello World) print } exec 
    

does the same. Oh did someone say "C"/machine code? Never mind ;-)

------
israrkhan
Unless you are developing an exploit, such code should almost never be used in
a production system. Most modern operating systems will prevent execution from
data buffers or the stack. By having such "tricks" in your code you are opening
yourself to buffer overflow exploits.

------
eighthnate
I guess this is fine if he is new to this. But playing around with buffer
overflows and injecting a payload (with a healthy dose of NOPs) and having
that open an app (maybe even a shell) would be far better.

------
silur
What exceptionally new technique that we totally didn't know from like.... the
'80s. Welcome to shellcoding kiddo. -.-

------
pjc50
Sure you can do this, but (other than in the context of shellcode), why?

------
nopit
It's literally called shellcode

~~~
rurban
It's also called JIT. In this case a static JIT.

