Last November, I looked into this problem a bit more thoroughly, studying all the conditions that the Linux kernel actually checks. My conclusion was that the absolute smallest possible 64-bit ELF file in Linux was 80 bytes. (The smallest 32-bit ELF file was indeed 45 bytes, as in the original article.) This version was written to simply call _exit(0), but I recall that there are a few more opcode bytes to spare:
I wonder if this version could be modified to create a Hello World smaller than 105 bytes? Unfortunately, I can't find where I put all my notes for this, though I did keep the snippet I used to generate it:
The main annoyance is that I have to use a 7-byte instruction to load the RIP-relative address of the string into RSI, since the program can't be placed any lower than 0x000100000000 in virtual memory with this offset. There may yet be some way to shorten it that I haven't thought of, though. The four blocks of instructions are
Perhaps I should make a write-up about the exact constraints imposed by the kernel, and how they limit the possible offsets between the ELF header and program header.
I no longer think it's within my ability to get this any shorter, at least not without wackier tricks, like using the fixed header bytes within instructions, or interleaving instructions within each other, or pushing some wild code snippet onto the stack. (Any variation of pushing the string onto the stack takes too many bytes to pad the immediate operands and to put RSP into RSI; running any trivial snippets on the stack simply doesn't make the code any shorter; and in both the last variation and this one, there's no way I can find to put the string address 0x10000004d into RSI and jump to the next block in fewer than 9 instruction bytes. Note that 1 instruction byte in the first block can be saved in both of these variations by popping argc into RAX rather than moving a literal 1 into it, but that extra byte can't be used to save any instructions in later blocks.)
As it turns out, the idea of using the fixed header bytes in the instructions isn't so wacky after all! We can move the 0x00000001 at 0x18 into EDI, then use the next two bytes as a JMP to the next block. (Also, I'd totally forgotten that the SYSCALL instruction saves the return address in RCX, which lets us use a no-op read(2) to shave a byte off the address calculation and avoid the assumption that argc == 1.) This technique takes the program down to 86 bytes, the smallest it can become without breaking up the "Hello, world!\n" string:
I can't think of any possible trick that would allow the string to be broken up while also saving bytes overall. Perhaps there exists some wild self-modifying code that somehow produces the correct behavior, or perhaps this program truly is the shortest.
The guys at Be Inc created a compressed ELF format for BeIA (the internet appliance OS they created just before the company folded.) I believe this is the patent (there may be others) that covers the process: https://patents.google.com/patent/US6883087
I think the Patent is just a general method for creating "crushed" executable code... The BeIA ELF implementation used the magic number "CEL" rather than "ELF" and is used a common table to look up certain symbols - so the exe only had the non common parts in it (or something like that, I haven't got a good handle on how it functioned, just a developers view of what it was like to use.) The process was called "crushing" and a "crushed" exe was massively smaller than the "uncrushed" version. I went over a load of the info I have in this thread over on the Haiku OS discussion forums: https://discuss.haiku-os.org/t/imagining-beia-like-concept-i...
> I haven't been able to track down the original date, but it was already around in the early 2000's.
I remember reading the original article when working on our 4k intro Sesamstr [1] for a demoparty in September 1999. (The ELF Kickers [2] changelog lists July 1999 for the initial release.) We wanted to dynamically link libvga.so, which requires some more features of the ELF header to play with. It turns out that the linker hash table can overlap with other parts of the header, to shave off some bytes.
As an additional shot of nostalgia, I remember some other sceners (Wedge, Viznut, XeF4?) trying to avoid SVGAlib entirely, so that one could use the smaller "a.out" format. Because one could not simply use movl $0x13, %eax / int $0x10 in Linux, there were some crude attempts to set all the required VGA registers "by hand". I think that required a couple of hundred bytes to pull off. Back then, you could probably get away with not restoring text mode at the end :)
Nice. One thing I liked about the original MuppetLabs article is that it included a dl example with a call to libc.so (which adds a whole additional layer of complexity.) I'd be very interested in seeing a 'modern', 64 bit nasm example doing this in "raw" binary mode (nasm -f bin), tiny or not so tiny.
its not too bad, but you do need much more bytes for some additional elf structures like for example dynamic table entry for the library, dyn strings to go in there for the lib name and perhaps some relocation and things like GOT/PLT depending on if you want PIC code or not etc. fun excersize you can do yourself with this kind of thing as a base / template. i'd recommend to start with the non-overlapped-headers version :') saves some headaches. readelf and objdump can help a lot. (compare with normal compiled binary which runs 'printf' or some libc function...). once it works properly u can see what stuff u can overlap into places with bunches of 0s. (not a lot :P). everything u need is in elf specifications and x86_64 abi specification.
I agree with the pun potential, but surely on DOS the smallest binaries would be COM executables? So perhaps for exe we should restrict ourselves to modern Windows?
Ah interesting, thank you.
Agree with sibling "excersize" is a really good one.
I remember having a lot of fun excersizing with PE files under Windows95. More recent OS's, well, they are different.