
Manually Creating an ELF Executable - reversingftw
http://robinhoksbergen.com/papers/howto_elf.html
======
codezero
Very similar to several posts that have come up before:
[https://news.ycombinator.com/item?id=5508981](https://news.ycombinator.com/item?id=5508981)
and
[https://news.ycombinator.com/item?id=3158862](https://news.ycombinator.com/item?id=3158862)
as well as things mentioned therein:
[http://www.muppetlabs.com/~breadbox/software/tiny/teensy.htm...](http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html)
which is a 45 byte executable (it doesn't say Hello World), and
[http://asm.sourceforge.net/intro/hello.html](http://asm.sourceforge.net/intro/hello.html).

All the same, very cool, this is a great experiment for anyone to work on :)

~~~
breadbox
>
> _[http://www.muppetlabs.com/~breadbox/software/tiny/teensy.htm...](http://www.muppetlabs.com/~breadbox/software/tiny/teensy.htm..).
> which is a 45 byte executable (it doesn't say Hello World)_

True, I needed 62 bytes to get the greeting in there.

~~~
lelandbatey
Man, that's an extremely interesting and thorough exploration of the nitty-
gritty of how programs work, and I found it totally satisfying to read. I've
had many little questions I barely knew I had answered by reading this. Thanks
very much for making it!

------
sdegutis
Awesome. My wife and kids asked me if you can write computer programs using
just 1s and 0s and I told them that assembly was the lowest you could use,
apparently mistakenly. Gonna have to try this out!

~~~
adestefan
Here's the DOS COM version. Copy this into HELLO.COM with a hex editor of your
choice. It will run in dosbox (that's what I tested on) and should run in
Windows COMMAND.COM on a 32-bit system.

    
    
      b409ba0d01cd21b44cb05dcd2148454c4c4f20574f524c442124
    

Here's the breakdown:

    
    
      b409    MOV AH, 09h    ; OUTPUT string
      ba0d01  MOV DX, 010Dh  ; address of output buffer (remember we're loaded into 0100h)
      cd21    INT 21h
    
      b44c    MOV AH, 4Ch    ; TERMINATE with return code
      b05d    MOV AL, 5Dh    ; return code
      cd21    INT 21h
    
      48454c4c4f20574f524c442124 "HELLO WORLD!$" ; DOS strings are $ terminated
    

Back in the day we used to do this with the DEBUG.COM command. It's actually
not that bad to do. The thing that sucks is hand assembling and then going
back to fix up your addresses. It starts to get hairy if you need to flip
around the CS and DS registers to shift in different segments.
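
To make the address fixup concrete, here is a rough Python sketch of the bookkeeping. The helper name `build_com` and the zeroed placeholder operand are just for illustration; the bytes are the ones from the listing above, and the only rule it relies on is that a COM image is loaded at offset 0100h.

```python
# Hand-assembly bookkeeping for the HELLO.COM bytes quoted above.
COM_LOAD_OFFSET = 0x100  # COM images are loaded right after the 100h-byte PSP

code = bytes.fromhex(
    "b409"    # MOV AH, 09h
    "ba0000"  # MOV DX, <string offset>, placeholder patched below
    "cd21"    # INT 21h
    "b44c"    # MOV AH, 4Ch
    "b05d"    # MOV AL, 5Dh
    "cd21"    # INT 21h
)
message = b"HELLO WORLD!$"


def build_com(code: bytes, message: bytes) -> bytes:
    """Patch the 16-bit operand of MOV DX, then append the string."""
    # The string sits directly after the code, so its offset is load base + code size.
    string_offset = COM_LOAD_OFFSET + len(code)
    patch_at = code.index(b"\xba") + 1  # operand follows the first BA opcode
    patched = (
        code[:patch_at]
        + string_offset.to_bytes(2, "little")  # x86 stores the low byte first
        + code[patch_at + 2:]
    )
    return patched + message


image = build_com(code, message)
print(image.hex())
```

The code is 13 bytes long, so the operand comes out as 010Dh, stored as `0d 01`. Add or remove an instruction and the offset moves with it, which is exactly the part that gets tedious by hand.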

As an exercise for the reader, use the RET instruction instead of doing the
return code stuff. If you really want to get into it, you can look up how to
read from the console and then display that string back to the user.

~~~
drv
You can cheat a little bit and use RET rather than INT 21h to save a few
bytes.

The way this works is pretty clever - the COM loader places an INT 20h
instruction (the old-style terminate function) at offset 0 in the PSP (the
100h-byte-long structure loaded right before the contents of the COM file).
The loader also pushes a zero word onto the stack before starting the program,
so a RET pops 0 as the return address and execution lands on the INT 20h.
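
A sketch of what the HELLO.COM from the parent comment might look like with this trick, written as a small Python generator. The function name is made up and the result is untested on real DOS; the one thing to watch is that dropping the exit sequence shortens the code, so the string offset in MOV DX changes too.

```python
COM_LOAD_OFFSET = 0x100  # COM images are loaded right after the PSP


def hello_com_with_ret(message: bytes) -> bytes:
    """Build a COM image that prints `message` and exits via RET."""
    code_len = 2 + 3 + 2 + 1  # MOV AH / MOV DX / INT 21h / RET
    string_offset = COM_LOAD_OFFSET + code_len
    code = (
        b"\xb4\x09"                                       # MOV AH, 09h
        + b"\xba" + string_offset.to_bytes(2, "little")   # MOV DX, string offset
        + b"\xcd\x21"                                     # INT 21h
        + b"\xc3"                                         # RET, returns to PSP offset 0
    )
    assert len(code) == code_len
    return code + message


image = hello_com_with_ret(b"HELLO WORLD!$")
print(image.hex())
print(len(image), "bytes")
```

With the same 13-byte string this comes to 21 bytes instead of 26, and the string now starts at 0108h rather than 010Dh.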

~~~
adestefan
And the RET saves you one byte over the INT 20h. Oh what a different world we
live in now.

------
abbeyj
Is it explained where the values for ecx and edx get filled in? I'm confused
as to why edx=0x080480B1 instead of the length of the string (14).

~~~
arh68
You mean HWLENADDR? It's placed in the byte right after the string (not
inside it). Down in the final product the edx part ends up BA B1 80
04 08 (0x080480B1). It's a pointer to that length byte stored 13 bytes past the
start of the string ("hello, world\n" is 13 bytes). The ecx points to (0x..A4),
exactly 13 less than B1.

edit: You're probably thinking it should be placed as an immediate value. It's
hard to say exactly how many bytes it'd take up that way, so using an address
to an immediate value is a bit simpler.

~~~
abbeyj
I see where I was off by one byte (14 instead of 13) but that still doesn't
explain things. The code doesn't load edx with 13 but instead loads it with
0x080480B1. If you wanted to fetch a byte from that address and load it into
edx you'd need to use something like "movzx edx, byte ptr [0x080480B1]" but
the code on the page just does "mov edx, 0x080480B1". In fact when you run
this the code actually prints out not just "Hello, World\n" but also prints
out the 0x0D length byte and then the entire rest of the page (all 0x00). Of
course these aren't visible in a terminal. I'm guessing the kernel tries to
keep printing but it can't since the next page isn't mapped so it just gives
up.

~~~
adestefan
I agree. This is definitely a bug. The correct instruction is 8B 15 B1 80 04 08,
i.e. MOV EDX, [0x080480B1]. I hope I got my register number correct on that.
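
For anyone who wants to double-check the register number, here is a small Python sketch that builds both encodings from the 32-bit ModRM layout (mod in bits 7-6, reg in bits 5-3, r/m in bits 2-0). The helper names are made up for this example; the opcodes 8B and B8+reg and the mod=00, r/m=101 form for a bare 32-bit address are the standard ones.

```python
REG_EDX = 2  # 32-bit register numbers: EAX=0, ECX=1, EDX=2, EBX=3


def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack the three ModRM fields into one byte."""
    return (mod << 6) | (reg << 3) | rm


def mov_r32_from_abs(reg: int, addr: int) -> bytes:
    """MOV r32, [disp32]: opcode 8B, ModRM with mod=00 and r/m=101."""
    return bytes([0x8B, modrm(0b00, reg, 0b101)]) + addr.to_bytes(4, "little")


def mov_r32_imm(reg: int, value: int) -> bytes:
    """MOV r32, imm32: opcode B8+reg followed by the immediate."""
    return bytes([0xB8 + reg]) + value.to_bytes(4, "little")


addr = 0x080480B1
print(mov_r32_from_abs(REG_EDX, addr).hex(" "))  # load from memory
print(mov_r32_imm(REG_EDX, addr).hex(" "))       # load the address itself
```

The first line comes out as `8b 15 b1 80 04 08` and the second as `ba b1 80 04 08`, which is the sequence in the article. Note that the memory form reads four bytes starting at the length byte, so it only yields 13 because the bytes after it are zero.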

The movzx/movsx is only needed when you're doing something like movzx EAX, BX.
That way you're sure the CPU does what you expect when moving from a smaller
register to a larger one. Man do I hate x86.

------
alexhutcheson
I'll be more impressed when I see it done truly manually - with a magnetized
needle and a steady hand

~~~
robinh
Well... I'll see what I can do.

------
screeny05
this is.. Awesome! That's like the most amazing thing i've ever seen here on
HN

