
It's possible to make the code slightly smaller by relying on Linux zeroing the registers when the program starts. That behavior is part of the Linux ABI and couldn't be changed without breaking programs, so it's safe to rely on.

Reducing the size of the code allows embedding it in less of the header, giving more options for code layout.
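
As a rough illustration of where the savings come from, here is a small Python sketch that compares instruction encodings. The "full" forms are an assumption about what the code would need if the registers could not be trusted to start at zero; the byte strings are the standard x86-64 encodings of `mov r32, imm32` and `mov r8, imm8`. The name `bytes_saved` is made up for this example.

```python
# Each entry: (32-bit form that sets the whole register,
#              8-bit form that only works if the upper bytes are already zero)
ENCODINGS = {
    "rax = 1 (write)":   (bytes.fromhex("b801000000"), bytes.fromhex("b001")),  # mov eax, 1 / mov al, 1
    "rdx = 14 (length)": (bytes.fromhex("ba0e000000"), bytes.fromhex("b20e")),  # mov edx, 14 / mov dl, 14
    "rax = 60 (exit)":   (bytes.fromhex("b83c000000"), bytes.fromhex("b03c")),  # mov eax, 60 / mov al, 60
}

def bytes_saved(encodings):
    """Total bytes saved by using the byte-register forms."""
    return sum(len(full) - len(short) for full, short in encodings.values())

saved = bytes_saved(ENCODINGS)
print(saved)
```

The third pair does not depend on the initial register state: it works because `write` returns 14, which leaves the upper bytes of rax zero before `mov al, 60`.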




Using this, I managed to get the file down to 114 bytes, while still printing "Hello, world!\n" and returning 0:

    [bits 64]
    file_load_va: equ 4096 * 40
    
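    db 0x7f, 'E', 'L', 'F' ; e_ident: ELF magic
    db 2 ; EI_CLASS: 64-bit
    db 1 ; EI_DATA: little-endian
    db 1 ; EI_VERSION: 1
    ; The remaining 9 bytes of e_ident hold code.
    entry_point:
      mov al, 1 ; rax = 1 (write); relies on rax starting at zero
      mov esi, file_load_va + message ; rsi = address of the message
      jmp code_chunk_2
    dw 2 ; e_type: executable
    dw 0x3e ; e_machine: x86-64
    dd 1 ; e_version
    dq entry_point + file_load_va ; e_entry
    dq program_headers_start ; e_phoff
    ; The next 12 bytes of code sit in e_shoff (8 bytes) and e_flags (4 bytes).
    code_chunk_2:
      mov edi, eax ; rdi = 1 (stdout)
      mov dl, message_length ; rdx = 14; relies on rdx starting at zero
      syscall ; write(1, message, 14)
      mov al, 60 ; rax = 60 (exit); write returned 14, so the upper bytes are zero
      xor edi, edi ; rdi = 0 (exit status)
      syscall ; exit(0)
    db 0 ; usable
    db 0 ; usable
    dw 0x38 ; e_phentsize: 56
    dw 1 ; e_phnum: 1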
    ; We simply deleted the three two-byte fields that used to be here. The only
    ; one that mattered, the number of section headers, will still be zero due to
    ; the upper two bytes of the field at the start of the program header being
    ; zero.
    
    program_headers_start:
    ; These next two fields also serve as the final six bytes of the ELF header.
    dd 1 ; Program header type: must be 1 (loadable segment)
    dd 5 ; Program header flags: must be 5 (readable and executable)
    dq 0 ; Offset of loadable segment in the file
    dq file_load_va ; Address in memory to load the segment into ; could change
    message_length: equ 14
    message:
    db `Hello, w`
    ; size in file then size in memory; can be anything non-zero and equal
    last_bytes: equ `orld!\n`
    dq last_bytes
    dq last_bytes
    dq 0 ; alignment; usable

This assembles and runs, and it's 114 bytes:

    $ nasm -f bin hello.asm -o hello && chmod a+x hello && ./hello
    Hello, world!
    $ ls -l hello
    -rwxr-xr-x 1 josh josh 114 Oct 13 10:28 hello
Getting the file any smaller would require finding a way to overlap the program header further inside the ELF header. As the article observes, that seems challenging given the validation the kernel does.
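
To make the 114-byte figure concrete, here is a rough Python sketch of the layout arithmetic for the listing above. The field names and sizes follow the standard 64-bit ELF header layout; the helper `field_offsets` is made up for this example, and the sketch only does arithmetic (it does not read or build a file).

```python
# ELF64 header fields in file order: (name, size in bytes).
ELF_HEADER_FIELDS = [
    ("e_ident", 16), ("e_type", 2), ("e_machine", 2), ("e_version", 4),
    ("e_entry", 8), ("e_phoff", 8), ("e_shoff", 8), ("e_flags", 4),
    ("e_ehsize", 2), ("e_phentsize", 2), ("e_phnum", 2),
    ("e_shentsize", 2), ("e_shnum", 2), ("e_shstrndx", 2),
]
PROGRAM_HEADER_SIZE = 0x38  # one 64-bit program header entry, as in `dw 0x38`

def field_offsets(fields):
    """Return ({name: offset}, total_size) for fields laid out back to back."""
    offsets, position = {}, 0
    for name, size in fields:
        offsets[name] = position
        position += size
    return offsets, position

offsets, header_size = field_offsets(ELF_HEADER_FIELDS)

# In the 114-byte listing the program header begins where e_shentsize would be,
# so it shares the last six bytes of the ELF header.
phdr_start = offsets["e_shentsize"]
overlap = header_size - phdr_start
file_size = phdr_start + PROGRAM_HEADER_SIZE

print(header_size, phdr_start, overlap, file_size)
```

Without any overlap the two headers alone would take 64 + 56 = 120 bytes, so every byte of overlap comes straight off the file size.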


Managed to get it down to 105 bytes by further overlapping the program header into the ELF header:

    [bits 64]
    file_load_va: equ 4096 * 40
    
    db 0x7f, 'E', 'L', 'F'
    entry_point:
      inc al
      mov esi, file_load_va + message
    pass2:
      xor edi, 1
      jmp code_chunk_2
    dw 2
    dw 0x3e
    code_chunk_2:
      mov dl, message_length
      jmp code_chunk_3
    dq entry_point + file_load_va
    dq program_headers_start
    code_chunk_3:
      syscall
      mov al, 60
      jmp pass2
    db 0 ; usable
    db 0 ; usable
    db 0 ; usable
    program_headers_start:
    dd 1 ; Program header type: must be 1 (loadable segment)
    db 0x5 ; Program header flags: low bits must be 5 (readable and executable); high bytes don't matter
    dw 0x38
    dw 1
    ; High 7 bytes of offset of loadable segment
    db 0
    db 0
    db 0
    db 0
    db 0
    db 0
    db 0
    dq file_load_va ; Address in memory to load the segment into ; could change
    message_length: equ 14
    message:
    db `Hello, w`
    ; size in file then size in memory; can be anything non-zero and equal
    last_bytes: equ `orld!\n`
    dq last_bytes
    dq last_bytes
    dq 0 ; alignment; usable
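
A hedged sketch of why this overlap still validates: the same 16 bytes are read once as the start of the program header and once as the tail of the ELF header. The sketch assumes `program_headers_start` lands at file offset 49, which is what the listing works out to if NASM picks the short instruction encodings; the variable names are mine.

```python
import struct

PHDR_START = 49  # assumed file offset of program_headers_start

# Bytes emitted by: dd 1 / db 0x5 / dw 0x38 / dw 1 / seven db 0
region = struct.pack("<IBHH", 1, 0x5, 0x38, 1) + bytes(7)

# Read as the start of the program header: p_type, p_flags, p_offset.
p_type, p_flags, p_offset = struct.unpack_from("<IIQ", region, 0)

# Read as the tail of the ELF header, where e_phentsize sits at file offset 54.
tail_index = 54 - PHDR_START
e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx = struct.unpack_from(
    "<HHHHH", region, tail_index
)

print(p_type, hex(p_flags), p_offset)
print(e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx)
print(PHDR_START + 0x38)  # total file size
```

The high bytes of `p_flags` end up holding `0x38` and part of the `1`, which is why the listing's comment notes that only the low bits need to be 5.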


Better yet, another commenter [1] found that you can clobber the number of section header entries, as long as the size of a section header entry is 0. So, now the smallest size is two bytes shorter: 112 bytes for a full "Hello, world!", with an 8-byte "alignment" field to spare!

I'll need to update this article. The only annoying part will be scribbling over the hexdump output again.

[1] https://news.ycombinator.com/item?id=28849023


I wonder if there are any nice tools for "scribbling over a hexdump" and rendering pretty output from the result. That tends to be really helpful both when synthesizing/assembling a binary format and when debugging/decoding/disassembling an existing one (and then, ideally, when writing blog posts about it). I saw an annotation tool like this in one disassembler I tried once, but it wasn't great, and it didn't allow easy tweaking and moving of annotation groups after the output changed. I'm pretty sure reverse-engineering people do this very often, so I'd assume such tools are already popular; I just don't know how to find them. I know Wireshark also has a Lua API and support for dissecting many protocols, but I don't suppose it's easy to prototype and quickly iterate on new formats in it (?)

For some really beautiful hand-made annotated binary format hexdump, see e.g.: https://github.com/corkami/pics/blob/master/binary/DalvikEXe...

If someone knows of tools fitting more or less what I described above, I'd be super grateful for some recommendations!!
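
Not a pointer to an existing tool, but as a minimal sketch of the idea: annotations kept as plain `(offset, length, name)` tuples are easy to shift or regroup when the bytes change. Everything here (the function name `annotate_hexdump`, the tuple shape, the output format) is hypothetical and only meant to show the approach.

```python
def annotate_hexdump(data, annotations, width=16):
    """Return hexdump lines, each followed by the labels of fields starting on that row.

    annotations: iterable of (offset, length, name) tuples.
    """
    hex_width = width * 3 - 1  # "xx " per byte, minus the trailing space
    lines = []
    for row_start in range(0, len(data), width):
        chunk = data[row_start:row_start + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        labels = [
            f"{name}@{start:#x}+{length}"
            for start, length, name in annotations
            if row_start <= start < row_start + width
        ]
        label_text = ", ".join(labels)
        lines.append(f"{row_start:08x}  {hex_part:<{hex_width}}  {label_text}".rstrip())
    return lines

# First 20 bytes of a generic 64-bit ELF header, with e_ident padding zeroed.
sample = (
    b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)
    + (2).to_bytes(2, "little") + (0x3E).to_bytes(2, "little")
)
annotations = [
    (0, 4, "magic"), (4, 1, "EI_CLASS"),
    (16, 2, "e_type"), (18, 2, "e_machine"),
]

lines = annotate_hexdump(sample, annotations)
for line in lines:
    print(line)
```

Keeping the annotations separate from the bytes means a layout change only requires editing offsets, not redrawing the dump.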


chuckles did that daily in 1986-87 on 68020 asm



