
A Handmade Executable File (2015) - yinso
https://www.bigmessowires.com/2015/10/08/a-handmade-executable-file/
======
userbinator
I remember the previous record was 97 bytes:

[https://webserver2.tecgraf.puc-rio.br/~ismael/Cursos/YC++/ap...](https://webserver2.tecgraf.puc-rio.br/~ismael/Cursos/YC++/apostilas/win32_xcoff_pe/tyne-example/Tiny%20PE.htm)

But I'm not sure if those will run on anything newer than WinXP.

~~~
ChrisSD
It might work on 32bit Windows 10 but I've not tested it. The smallest of the
Tiny PE files that runs on 64bit is 296 bytes, which is close to the 268 bytes
mentioned in the article.

It falls down when removing the data directories so I guess modern 64bit
Windows is stricter about that.

~~~
ChrisSD
After some investigation I'm pretty sure 268 bytes is indeed a hard limit, for
reasons that are mostly explained on the Tiny PE page:

> The PE specification says that the number of data directories is specified
> in the NumberOfRvaAndSizes header field and the size of the PE optional
> header is variable. If we set NumberOfRvaAndSizes to 0 and decrease
> SizeOfOptionalHeader, we can remove the data directories from the file.

> Most functions that read the data directories check if NumberOfRvaAndSizes
> is large enough to avoid accessing invalid memory. The only exception is the
> Debug directory on Windows XP. If the size of the Debug directory is not 0,
> regardless of NumberOfRvaAndSizes, the loader will crash with an access
> violation in ntdll!LdrpCheckForSecuROMImage. We need to ensure that the
> dword at offset 0x94 from the beginning of the optional header is always 0.
> In our PE file this address is outside the memory mapped file and is zeroed
> by the OS.

In Windows 10 (and probably Vista & 7) this memory isn't "zeroed by the OS",
so we have to set it in the PE file. This appears to be a hard limit on the
minimum size.
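
For anyone wondering where the 0x94 in the quoted passage comes from, it can be reproduced from the PE32 optional header layout. This is only a sketch of the arithmetic: the constant names are mine, and the numbers apply to the 32-bit (PE32) header, not the 64-bit PE32+ variant.

```python
# Layout constants for the PE32 (32-bit) optional header.
# The names are illustrative; only the numeric values come from the PE format.
DATA_DIRECTORIES_OFFSET = 96     # first data directory entry (PE32 only)
DATA_DIRECTORY_ENTRY_SIZE = 8    # VirtualAddress (4 bytes) + Size (4 bytes)
IMAGE_DIRECTORY_ENTRY_DEBUG = 6  # index of the Debug directory


def debug_size_offset():
    """Offset of the Debug directory's Size field from the optional header start."""
    entry = (DATA_DIRECTORIES_OFFSET
             + IMAGE_DIRECTORY_ENTRY_DEBUG * DATA_DIRECTORY_ENTRY_SIZE)
    return entry + 4  # skip the 4-byte VirtualAddress to reach Size


print(hex(debug_size_offset()))  # 0x94
```

So the dword that has to be zero is the Size half of the seventh data directory entry, which is why it still matters after the directories themselves have been dropped from the file.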

------
rurban
I have a similar script which stuffs a minimal PE header in front of a perl
script which then executes said script.
[https://st.aticpan.org/source/RURBAN/C-DynaLib-0.61/script/p...](https://st.aticpan.org/source/RURBAN/C-DynaLib-0.61/script/pl2exe.pl)

------
Figs
Back when I was in high school and trying to wrap my head around assembly
language and how CPUs and operating systems worked and all that fun stuff, I
ended up learning _just enough_ x86 machine code to implement small DOS COM
programs directly in a hex editor.

I've long since forgotten how to do most of this, but just for fun, I went
back today and tried to work some of it out again. (So please forgive me if I
make a mistake in my write up! It's been ~15 years since I last did this...)

Here's a very short DOS COM program to print "Hello, world!" to the console --
annotations as to the assembly code and ASCII data are on the right. As you
can see, this is short enough to enter by hand into a hex editor without too
much trouble:

    
    
        B4 09                       mov AH, 09h
        BA 0B 01                    mov DX, 010Bh
        CD 21                       int 21h
        B4 4C                       mov AH, 4Ch
        CD 21                       int 21h
        48 65 6C 6C 6F 2C 20        Hello,<space>
        57 6F 72 6C 64 21           World!
        0D 0A 24                    \r\n$
    

DOS COM files are extremely simple. They contain just the machine code and
data, with no headers or other junk to worry about. Since a COM file doesn't
include any information about where the program expects to be loaded in
memory, it is always loaded at the fixed offset of 0x100. (i.e. 256 bytes into
its segment) DOS puts other information into the first 256 bytes -- like the
command line arguments.

This program uses DOS's general purpose interrupt, 21h (i.e. hex 21 -- which
is 33 in decimal) to perform two operations. When int 21h is executed, the CPU
hands control to DOS's interrupt handler to determine what to do. For
interrupt 21h, DOS looks at the high byte of the AX register (AH). For the
first operation, this is 09h, which indicates that we want DOS to write a
$-terminated string to the output console. (I have no idea why DOS used
$-terminated strings instead of null terminated strings.) The address of the
first byte we want copied to the output is stored in the DX register. If you
count the number of bytes from the start of the program until we get to the
'H' (hex 48) in "Hello, World!" you'll see that it is 11 bytes from the start
of the program (bear in mind, we start counting from 0). In hex, the decimal
number 11 is written as 0B. Since, as mentioned earlier, DOS COM programs are
loaded at 0x100, the actual address that the string ends up at is
consequently 0x100+0x0B = 0x010B. Intel processors are little-endian, which
means that we need to write the bytes of this 16-bit address _backwards_ in
the file (i.e. the "little end" comes first). Thus, the code "BA 0B 01" to
load 0x010B into the 16-bit DX register. (BA being the "MOV DX" part of the
instruction.)
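
If you'd rather not count bytes by hand, the same address arithmetic can be checked with a few lines of Python. This is just a sketch: `build_com` and `LOAD_OFFSET` are names I made up, and the opcodes are copied straight from the listing above.

```python
import struct

LOAD_OFFSET = 0x100  # COM images are loaded at offset 0x100 in their segment


def build_com(message):
    """Assemble the hello-world COM image from the hand-written opcodes."""
    code_len = 11  # 2 + 3 + 2 + 2 + 2 bytes of instructions before the string
    string_addr = LOAD_OFFSET + code_len  # 0x100 + 0x0B = 0x010B
    code = (
        b"\xB4\x09"                                 # mov AH, 09h
        + b"\xBA" + struct.pack("<H", string_addr)  # mov DX, 010Bh (little-endian)
        + b"\xCD\x21"                               # int 21h
        + b"\xB4\x4C"                               # mov AH, 4Ch
        + b"\xCD\x21"                               # int 21h
    )
    assert len(code) == code_len
    return code + message


image = build_com(b"Hello, World!\r\n$")
print(" ".join(f"{b:02X}" for b in image))
print(len(image))  # 27
```

`struct.pack("<H", ...)` is what does the byte swap: it emits the 16-bit value low byte first, giving `0B 01`. Writing `image` to a file opened in binary mode should give the same bytes as typing the listing into a hex editor.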

Ok -- short aside -- _technically_ what I wrote above is not quite the full
story, as the address that gets printed is actually DS:DX (DS being the "data
segment" register), not just the address in the DX register. Early x86 had
some unusual ideas about ways of addressing memory and used a segmented memory
model. Basically, the 8086 had 20 pins allocated for specifying a memory
address to future proof it up to the enormous memory size of _an entire
megabyte_ -- admittedly, that probably seemed outrageously huge in 1976 --
but the registers were only 16-bits wide. In order to address the additional
memory, the designers took a 16-bit number from one register (such as DS),
shifted it by four bits (so that it filled up the top-most 16 of the 20 pins),
and _added_ another 16-bit address to that to get the full 20-bit address
accessible by the chip. So, DS:DX would refer to the 20-bit address ((DS<<4) +
DX). This had some benefits for compiler writers at the time, but confused
_the hell_ out of a lot of other programmers since bytes were not uniquely
addressed (e.g. 0000:0010 and 0001:0000 both evaluate to 0x00010 in the 20-bit
linear address space). As memory sizes grew and grew, this didn't work out so
well, and has been abandoned in newer processor modes. Alright, that's enough
of an aside! The entire program here is small enough to fit in one segment
(it's only 27 bytes long!), so we don't need to worry about segments -- but I
thought it might be interesting to some people as a historical curiosity.
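
The segment arithmetic is easy to try out. Here is a minimal sketch (the function name `linear_address` is mine; the mask models the 8086's 20 address lines, so sums past 1 MB wrap around):

```python
def linear_address(segment, offset):
    """Translate a real-mode segment:offset pair to a 20-bit linear address."""
    # Shift the segment left by 4 bits, add the offset, keep only 20 bits.
    return ((segment << 4) + offset) & 0xFFFFF


# Two different segment:offset pairs that name the same byte.
print(hex(linear_address(0x0000, 0x0010)))  # 0x10
print(hex(linear_address(0x0001, 0x0000)))  # 0x10
```

Both calls land on the same linear address, which is exactly the aliasing that made segmented pointers awkward to compare.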

The second interrupt we use is 21h with AH set to 4Ch. Executing this
interrupt tells DOS that we want to end the program. Obviously, we need to
quit; otherwise, the CPU will continue on and try to execute the data string,
then whatever other garbage is in memory. Simple enough.

Well, that's about it. I don't know if this will still work on recent
incarnations of Windows natively -- I suspect the last version it would work
on is probably 32-bit Windows 7 -- but it can be run under DOSBox if not. Just
get a hex editor, and have at it. :D

Here are some links that might be interesting if you want to know more about
this stuff:

[https://en.wikipedia.org/wiki/X86_memory_segmentation](https://en.wikipedia.org/wiki/X86_memory_segmentation)

[https://en.wikipedia.org/wiki/Intel_8086](https://en.wikipedia.org/wiki/Intel_8086)

[https://en.wikipedia.org/wiki/Program_segment_prefix](https://en.wikipedia.org/wiki/Program_segment_prefix)
\-- i.e. what's in the first 256 bytes for COM programs on DOS

[https://en.wikipedia.org/wiki/COM_file](https://en.wikipedia.org/wiki/COM_file)

[http://spike.scu.edu.au/~barry/interrupts.html](http://spike.scu.edu.au/~barry/interrupts.html)
\-- list of DOS 21h interrupt commands

~~~
animal531
In the old days you could have just made an empty file and renamed it with a
.com extension, and it would have executed.

------
Izmaki
Why would it return 44?? What madness is this?!

~~~
jwilk
The code in the .TEXT section looks like this:

    
    
      push 44
      pop eax
      ret
    

So it assigns 44 to eax (in a slightly convoluted way) and returns.
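
My guess (an assumption, not something I checked against the article) is that the detour is there to save bytes. Comparing the standard 32-bit x86 encodings of the two forms, with variable names that are just for illustration:

```python
# push imm8 is 6A ib, pop eax is 58, ret is C3.
push_pop_form = bytes([0x6A, 44, 0x58, 0xC3])  # push 44 / pop eax / ret

# mov eax, imm32 is B8 followed by a 4-byte little-endian immediate.
mov_form = bytes([0xB8]) + (44).to_bytes(4, "little") + bytes([0xC3])  # mov eax, 44 / ret

print(push_pop_form.hex(), len(push_pop_form))  # 6a2c58c3 4
print(mov_form.hex(), len(mov_form))            # b82c000000c3 6
```

The push/pop version comes out two bytes shorter, which matters when the whole point is a minimal file.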

------
vadimberman
Is it organic and grass-fed though?

Anyway, I wonder if that's how the 1980s - 1990s viruses were replicating.

