How is a binary executable organized? Let’s explore it (jvns.ca)
163 points by jvns on Sept 6, 2014 | 31 comments

A nice treatise on a well-charted ocean.

Another great article along these lines, for those interested in assembly and execution environments, is of course the famous muppetlabs exploration of building the smallest Linux executable:


I kind of love what this author has been doing for the past several months: taking personal deep dives into systems technology and then just enthusiastically writing up what they learned. I wish more people would write this way. I know I have a hard time doing it; I want to know that anything I write is novel, which is a crippling and stupid limitation I put on myself.

The reality is that one great way to get to novel value is to apply your own take on something that people have done before; you may find a way to teachably express a concept better than others have before you.

Her next article should be on how to make a kernel module for a new executable file format. Bring it around town.

It probably has to do with heritage, but coming from a DOS/Windows background it feels like many things on *nix systems, including executable formats, are rather more complex than they really need to be; ELF is no exception to this, and the linker/compiler defaults tend to create far more sections than necessary.

But interestingly enough, the smallest (known) Win32 executable is actually still more than double the size of that Linux one: http://www.phreedom.org/research/tinype/

...and both of them don't do much more than exit with a defined value. I think for simplicity, nothing beats the DOS COM format - pure code, up to 64k, loaded in a single segment at offset 100h, entry point is right at the beginning. A "Hello world" is on the order of 20 bytes. (Of which 7 bytes are actual code and the rest is the message.)
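For anyone who wants to see that layout concretely, here is a minimal sketch that assembles such a COM image byte by byte, using Python only as a convenient byte-builder. The `build_com` name and the message text are made up for illustration, and this version spends one extra byte on a `ret`, so it has 8 bytes of code rather than 7.

```python
import struct

LOAD_OFFSET = 0x100  # DOS loads a .COM image at offset 100h of its segment


def build_com(message: bytes = b"Hello, world!") -> bytes:
    """Assemble a tiny DOS .COM image that prints `message` and returns.

    `build_com` is a hypothetical helper name, not part of any library.
    """
    code_len = 8  # 2 (mov ah) + 3 (mov dx) + 2 (int 21h) + 1 (ret)
    msg_addr = LOAD_OFFSET + code_len  # the message sits right after the code
    code = (
        b"\xB4\x09"                              # mov ah, 09h (print '$'-terminated string)
        + b"\xBA" + struct.pack("<H", msg_addr)  # mov dx, msg_addr
        + b"\xCD\x21"                            # int 21h
        + b"\xC3"                                # ret (lands on the int 20h at PSP offset 0)
    )
    return code + message + b"$"  # DOS function 09h stops printing at '$'


if __name__ == "__main__":
    image = build_com()
    print(len(image), image.hex())
```

There is no header at all: the first byte of the file is the first instruction, and the only address arithmetic is adding 100h to the message's position in the file.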

The sub-1k categories at Pouet (http://www.pouet.net/prodlist.php?type%5B0%5D=32b&type%5B1%5... ) are some of the more amusing things possible in a tiny executable.

This goes very well with some assembler knowledge (GAS, the GNU Assembler, i.e. Linux assembly), and there's a great free ebook for that!


Too bad it's AT&T syntax, which is pretty incompatible with the majority of x86 reference material out there (including Intel's own manuals)... something like fasm/nasm would probably be much better in a *nix environment. This gives some of the reasons why: http://x86asm.net/articles/what-i-dislike-about-gas/

It's also odd for that book to defer binary/hex and data representation until far beyond other more complex/higher-level material. The fact that the code listings have a "compiler-generated" feel to them, complete with redundant instructions, suggests that this isn't a great way to learn Asm; see https://news.ycombinator.com/item?id=8248172 for a more detailed explanation.

Also another good one:

"Creating Really Teensy ELF Executables for Linux"


I've read a number of jvns articles on HN [1], and each one impresses me with how she is able to balance technical discussion with readability. I really enjoy her writing, and look forward to her next post.

I applied to Hacker School solely on the basis of reading her blogs and being impressed with the work she was doing.

[1] Favourite: http://jvns.ca/blog/2013/10/16/day-11-how-does-gzip-work/

Wiki [1] and OSDev [2] both have pretty solid articles if you want to know how to load an ELF into memory.

I used both when implementing ELF for my hobby kernel [3]

[1] http://en.wikipedia.org/wiki/Executable_and_Linkable_Format

[2] http://wiki.osdev.org/ELF_Tutorial

[3] https://github.com/lexs/rust-os/blob/master/rost/exec/elf.rs
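As a taste of the first step any loader takes, here is a minimal sketch of decoding the fixed-size ELF file header. It is not taken from the linked kernel; the `ElfHeader` and `parse_elf_header` names are my own, and it only reads a handful of fields without validating them.

```python
import struct
from typing import NamedTuple


class ElfHeader(NamedTuple):
    # Hypothetical container for illustration; field meanings follow the ELF header.
    bits: int            # 32 or 64
    little_endian: bool
    e_type: int          # e.g. 2 = ET_EXEC, 3 = ET_DYN
    machine: int         # e.g. 62 = EM_X86_64
    entry: int           # virtual address of the entry point
    phoff: int           # file offset of the program header table
    phentsize: int       # size of one program header entry
    phnum: int           # number of program header entries


def parse_elf_header(data: bytes) -> ElfHeader:
    """Decode the ELF file header found at the start of `data`."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unknown ELF class or data encoding")
    endian = "<" if ei_data == 1 else ">"
    # After the 16-byte e_ident, address-sized fields are 4 bytes (ELF32) or 8 (ELF64).
    addr = "I" if ei_class == 1 else "Q"
    fmt = f"{endian}HHI{addr}{addr}{addr}IHHHHHH"
    fields = struct.unpack_from(fmt, data, 16)
    (e_type, machine, _version, entry, phoff,
     _shoff, _flags, _ehsize, phentsize, phnum) = fields[:10]
    return ElfHeader(32 if ei_class == 1 else 64, ei_data == 1,
                     e_type, machine, entry, phoff, phentsize, phnum)


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as f:
            print(parse_elf_header(f.read(64)))
```

A real loader would go on to walk the program headers at `phoff` and map each loadable segment, which is where the two articles above pick up.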

Do you think it would be possible to describe a binary file in such a way that some git extension of sorts could be made to understand the file so that only the minimum delta would have to be saved by git?

Probably, but what for? It doesn't seem feasible to support every common binary type, and are all common binaries even understood well enough to do this?
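For what it's worth, a format-agnostic delta doesn't need to understand the file at all, and git's packfiles already store deltas between similar objects regardless of type. Here is a rough sketch of the idea using Python's `difflib`; the `make_delta` and `apply_delta` names and the copy/insert op encoding are invented for illustration and are not git's actual delta format.

```python
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes) -> list:
    """Describe `new` as copies from `old` plus literal inserted bytes."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))        # reuse old[i1:i2]
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))  # store only the new bytes
        # "delete" needs no op: the dropped bytes are simply never copied
    return ops


def apply_delta(old: bytes, ops: list) -> bytes:
    """Rebuild the new file from the old file and a delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)
```

A format-aware tool would mostly help where a small source change shifts many offsets inside the binary, which a byte-level delta like this one sees as many unrelated edits.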

I wish this didn't have so many exclamation points in it! There are literally 41 (!!!) exclamation points in this short blog post! It is exhausting to read! Please consider not using so many! There should be ways to convey enthusiasm about a topic without peppering your writing with exclamation points!

We're discussing a pretty solid technical article, one deeper than the typical HN programming article, and this is a very superficial criticism. It's like you're proudly advertising that you're so disengaged with executable file formats that you can be derailed by idiosyncratic punctuation.

I'm actually very interested, which is why I bothered to comment in the first place. I had to download the HTML of the article and search and replace all exclamation points with periods to be able to read it.

That is a crippling disability you're grappling with and I am sorry I made light of it.

> pretty solid technical article, one deeper than the typical HN programming article

Wow. Seriously? I'm kinda surprised to hear it from you.

Why? It's true. 90% of programming articles lately are "Check out my port of <some marginally interesting library> to Go!" Julia does awesome work and is kind enough to break things down and share them with us.

Well, it's true that average HN article level may be somewhat low, but this one is just telling stuff that every 1-year CS student should know without actually going into details. "ELF is a file format like any other, and you can read it" — I hope everybody knew that, right? "There exist static and dynamic linking" should be quite widely known as well. But when it comes close to something actually interesting — "_start <..> does a bunch of Very Important Things that I don’t understand very well, <..> so I won’t explain them." What's the point then?

I'm not saying that the article is bad; actually, I believe it can be useful for somebody, because there are some who don't know what a computer program is even to this extent. I'm merely surprised that somebody presumably competent would call such a superficial article "deep".

2/3rds of all bad nerd programming message board comments seem, for some reason, to start with "every 1-year CS student should know...".

No, they don't. Thinking that they do suggests to me that you're still in school (at least emotionally), because technology is vast, everybody specializes somehow, and not everything you crammed for in CS 100 remains memory resident and actionable once you're actually working.

I work in software security, where file formats are especially relevant; I write a new debugger an average of once a year. This was a good post. And file formats are in fact arcana to most working programmers, as I've learned by actually having conversations with working programmers that touched on how executable loaders work.

First-year computer science sounds like a lot of work. I've been told, at various times, that every first-year CS student should know Java, Python, C, assembly, data structures, algorithms, operating systems, and compilers.

I guess the second through fourth years are spent sleeping, to make up for the sleep debt from that first year?

Anyway, CS students may very well be polymaths who shoot lightning from their fingertips, but those of us who spent our wayward college years solving the Schrödinger equation and eating pizza are happy to read a fun article about the executable formats that we've never really taken the time to look into.

I know that you work in software security; that's why I'm surprised in the first place. Otherwise I wouldn't have commented at all, because, as I said, I don't think the article is bad or good. It's superficial, but there's nothing new in that. It's OK to write superficial articles, because they are useful for somebody as well. The word "deep" is what worried me, not the article itself.

It's perfectly natural that not everyone knows exactly how executable loaders work. The same goes for what you called "file formats". The article isn't about that, because, as I said, all the interesting parts are skipped. The only facts it reveals are that a computer executes some files, and that these files are not arcane magic inside but can be read and decompiled (which isn't even always true, but the author doesn't mention that either, which is natural). Static and dynamic linking are terms everyone should know as well, because even if you use only interpreted languages like Python, once in a while you find some library that requires manual compilation and run into nasty problems like mismatched library versions. Or maybe you notice that the same Qt app compiled for Windows takes a lot more space on disk than it does when compiled for Linux, and ask yourself why. So it isn't something that only specialists know; it's a basic fact about how computers work.
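To make the static/dynamic distinction concrete: a typical dynamically linked ELF executable carries a PT_INTERP program header naming the dynamic loader, while a typical statically linked one does not. Here is a rough sketch of that check, assuming a 64-bit little-endian file; the `requests_dynamic_loader` name is made up, and this is a heuristic rather than a full classification.

```python
import struct

PT_INTERP = 3  # program header type that names the dynamic loader


def requests_dynamic_loader(data: bytes) -> bool:
    """Return True if a 64-bit little-endian ELF image has a PT_INTERP segment."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    # ELF64 header: e_phoff at byte 32, e_phentsize at 54, e_phnum at 56.
    (phoff,) = struct.unpack_from("<Q", data, 32)
    phentsize, phnum = struct.unpack_from("<HH", data, 54)
    for i in range(phnum):
        # p_type is the first 4-byte field of every program header entry.
        (p_type,) = struct.unpack_from("<I", data, phoff + i * phentsize)
        if p_type == PT_INTERP:
            return True
    return False


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as f:
            print(requests_dynamic_loader(f.read()))
```

Running it over a few binaries on your own system is a quick way to see which ones depend on a loader at startup.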

I don't know about today, but 15 years ago you couldn't be considered programmer without knowing that much at least. It's surprising to hear that it is considered deep.

I know what you mean. I think any person with proper CS knowledge should know these things, sure. What is nice about this article is it is 1) short, 2) gives people that know roughly how executables work knowledge about tools like readelf and really basic things like objdump. This is not a deep article, at all, not even in the ballpark. But it can give someone a short bite to see and decide if they want to explore deeper references, presumably linked to by the article.

> I don't know about today, but 15 years ago you couldn't be considered programmer without knowing that much at least. It's surprising to hear that it is considered deep.

60 years ago, you couldn't be a programmer without knowing the exact binary code of every opcode your machine executed, and how all of the peripherals worked at the lowest possible level.

Back then, of course, all programs were trivially small to fit in hilarious amounts of memory and mass storage, and graphical programming was a specialized topic, to put it mildly. Networking in the form we know it now flatly didn't exist.

I'm not convinced that the amount programmers know has changed, but the kind of knowledge surely has.

That makes me think of the 1986 vintage Mac SE sitting on one of my tables, more as a decoration than anything. It's new enough that it's almost sorta kinda possible to get it on the internet, yet also old enough that it's somewhere between an adventure and a PITA to do so, and pretty challenging to do anything vaguely useful with it once you do.

I don't know as much about the nitty gritty details as I'd like, but it's damn cool that I can now write 1 line of Ruby that fires off a query at a web server somewhere, gets a JSON reply, parses it into a Hash, and delivers it back to me, doing roughly a kazillion hugely complex things along the way. It lets us all spend a lot more time building things that are useful to customers instead of scrounging around with bits, fun as that can be sometimes.

> Well, it's true that average HN article level may be somewhat low, but this one is just telling stuff that every 1-year CS student should know without actually going into details.

Based on the hundreds of programmers I've met with CS degrees, I would say that 90% of them have probably never even written C; most write Java, which this still applies to, I suppose. And of the remaining 10%, I doubt any of them had the curiosity to use a tool like readelf to understand symbols and static linking. Given that, I highly doubt most first-year CS students have any idea what this article is even about, much less care.

If everyone in the world shared your attitude about sharing knowledge, the world would have one smart person and a bunch of morons. Get off your pedestal.

In my experience, no, you can't assume a first-year CS student knows that. You can't even assume CS grads who've been working for 10 years know that, or remember it if they did. You're right that it isn't rocket science, but it's more down in the weeds now than ever before.

As somebody who works on operating systems, I share your disappointment that it's not common knowledge, but it really is trivia insofar as it's related to the sorts of things that most working programmers do these days.

However, I do not share your disappointment that somebody had the nerve to write a perfectly fine article about something you already knew about. How dare they! How will you ever deal with the shame you feel on their behalf? These are truly tragic days we live in. :)

Actually, no. No first-year CS students should know about ELF. They should know about:

1. basic data structures (variables, arrays, simple binary trees)

2. static control flow (sequential execution and control structures)

3. dynamic control flow (the call stack and how exception handling works (in C: setjmp()/longjmp()))

4. basic program structuring and hygiene (functions, named constants, picking good names)

Focusing on that instead of the details of machine-level knowledge is what separates CS from IT; we need both, so we should not try to make our CS programs bad copies of our IT programs.

I think ideally a first-year CS student has been programming and learning prior to freshman year, is probably what the parent meant.

> I think ideally a first-year CS student has been programming and learning prior to freshman year, is probably what the parent meant.

It's nice when people come in to a class warm, as opposed to completely cold, but it's bad pedagogy to assume specialized knowledge beyond what's explicitly listed in the course prerequisites.

Incidentally, Julia's use of exclamation points in her blog actually inspired the name of a programming conference: http://bangbangcon.com/

I like them. I'd imagine that's how I'd feel as I hit those "a ha" moments while demystifying something. There are so many because the article is only covering those moments, which is another nice feature.
