> I show how ELF-like features can be safely retrofitted onto executable formats contemporary with ELF’s debut.
I am surprised at the naïveté of the author on this question as from his photo he probably lived through this transition in real time. The statement ignores the whole point, sort of like saying “all modern computers are basically Turing machines.” True, but not insightful.
It was no surprise to anyone at the time that various other required features could be jammed into other approaches (well, not a.out which is too simple) and often were in ad hoc and incompatible ways. In fact I designed the bfd library specifically with this in mind, to try to give some generality to object file generation and manipulation.
ELF was designed by committee, but it is not a camel. It addresses a number of complex issues in a standard and extensible way. Issues that didn’t arise on a time shared PDP-11 in the 1970s.
I worked with a.out on early Unixes and ELF on later ones. Not addressed here is the main reason for switching: ELF allowed you to lay pages out contiguously on disk so that they could be paged in directly (stuff can be page-aligned within the file at its natural offsets), while a.out was designed for swapped kernels where you would just read() data into a text section; address 0 might be 32 bytes from the start of a file (disk space was more of a concern back then).
There are also segmentation differences. Only one copy of the read-only data and code (aka text) segments is mapped into all programs which load the library. However, each program instance gets its own unique pages of read-write data, and those global data pages are always located at a fixed offset from their code segment. That way, modern CPUs can execute PC-relative instructions to generate their addresses.
See ARMv8 `adrp` (address of PC-relative page), RISC-V `auipc` (add upper immediate to program counter), and x86-64 PC-relative addressing for some modern examples.
Then go backwards in time and see how ARMv7 does it (literal pools) and how some earlier RISCs did it (Itanium and Alpha global pointer aka "gp", PowerPC table-of-contents register).
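To make the fixed code-to-data offset concrete, here is a minimal C sketch (the variable and function names are just illustrative); the comments describe the kind of PC-relative sequences a typical compiler emits, though the exact instructions depend on the target and on whether the access goes through the GOT.

```c
/* A global in the read-write data segment, referenced from code.
 * Because the data pages sit at a fixed offset from the code pages,
 * the compiler can reach the variable with PC-relative instructions
 * and the linker fills in the offset. */
int counter;

int bump(void) {
    /* Typical generated code: a RIP-relative mov on x86-64, an
     * adrp/add pair (or adrp/ldr via the GOT) on ARMv8, and an
     * auipc/addi pair on RISC-V. */
    return ++counter;
}
```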
You can load multiple pages in a single read operation, yes, and do less computation when trying to load/purge on a per-page basis (instead of a lookup table you just do a single addition)
That's what we did on the old swap-based systems. On modern paged systems we don't read pages in until there is a page fault; to do that efficiently we need to align page boundaries with disk block boundaries. ELF (and COFF before it) allows you to do this while a.out doesn't.
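A rough sketch of what that alignment buys a loader, assuming a 64-bit ELF and leaving out error handling, the translation of p_flags into PROT_* bits, and the zero-filled BSS tail:

```c
#include <elf.h>
#include <sys/mman.h>
#include <unistd.h>

/* If p_offset and p_vaddr are congruent modulo the page size (which
 * p_align guarantees for PT_LOAD segments), the segment can be mapped
 * straight from the file and paged in on fault, instead of being
 * read() into memory up front as with a.out. */
static void *map_segment(int fd, const Elf64_Phdr *ph) {
    long page = sysconf(_SC_PAGESIZE);
    /* Round the file offset down to a page boundary; the virtual
     * address would be rounded down by the same amount, thanks to the
     * congruence requirement. */
    off_t  off   = ph->p_offset & ~(off_t)(page - 1);
    size_t delta = ph->p_offset - off;
    size_t len   = ph->p_filesz + delta;

    /* MAP_FIXED and the real load bias are omitted to keep the sketch short. */
    return mmap(NULL, len, PROT_READ | PROT_EXEC, MAP_PRIVATE, fd, off);
}
```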
This is all well and good, but in reality everyone is using all the rich features of ELF -- dynamic linking obviously, but also symbol versioning, constructors, symbol interposition (probably the worst one for performance), symbol visibility, preload, etc. The question is how to make it fast in the common case, and actually the Linux and glibc authors are doing a pretty good job here.
I have looked quite closely into glibc in the past and there are a great many things that could be improved, and not using some of the fancy ELF features is probably one of them. For example, symbol versioning doesn't appear to have a good cost/benefit balance.
Symbol versioning is the feature which has kept libc at major ABI version 6 for all these years. The libc5 -> libc6 major version bump gave users trouble for many years afterwards.
Some of the features you mentioned are not really features of ELF itself but rather tacked on by tools and libraries.
For instance, symbol versioning is just an ad-hoc convention for embedding version numbers into the symbol strings themselves. And the convention assumes that the source language is C, which requires other languages to mangle their symbols to make them fit.
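For a concrete picture of that convention, here is a minimal GNU-toolchain sketch (the interface `foo` and the version names FOO_1.0/FOO_2.0 are made up, and the build also needs a linker version script defining those version nodes):

```c
/* Two implementations of the same interface kept in the library.
 * The .symver directives splice the version name into the symbol
 * string itself ("foo@FOO_1.0"), which is the ad-hoc convention
 * described above. */
int foo_v1(int x) { return x + 1; }   /* old behaviour, kept for compatibility */
int foo_v2(int x) { return x * 2; }   /* current behaviour */

/* Binaries linked against FOO_1.0 keep resolving to foo_v1;
 * new links bind to the default version FOO_2.0 (note the "@@"). */
__asm__(".symver foo_v1, foo@FOO_1.0");
__asm__(".symver foo_v2, foo@@FOO_2.0");
```

Running `readelf --dyn-syms` on the resulting library then shows the two exports as `foo@FOO_1.0` and `foo@@FOO_2.0`, i.e. the version really is part of the symbol string.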
Some features that systems rely on are not even officially documented. I've read accounts from a developer of a linker who had to read through code, blogs and old mailing lists to figure out the actual format for a feature.
One thing I saw in KDE applications (KDE has split its base libraries into a bazillion small libraries) is that looking up a symbol in a dynamic library is fast, but linearly trying all the libraries is costly - (IIRC) 20-40% of startup time. A possible solution would be to store with each symbol where it's expected to come from. Required libraries are listed as an array, so the "where" could be an index into the array. Symbol overriding would have priority over that lookup mechanism, but apart from overrides, lookup could be just one hash lookup in the already known library.
I am still hoping for someone else to implement that so I occasionally mention the idea :)
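A purely hypothetical sketch of what such a per-symbol record could look like (none of these names or structures exist in ELF or glibc; it only illustrates the idea of pointing each undefined symbol at one entry of the DT_NEEDED list):

```c
#include <stdint.h>

/* Hypothetical "direct binding" record, one per undefined symbol:
 * instead of walking every loaded object, the dynamic linker would
 * jump straight to the library named by needed_index and do a single
 * hash lookup there, unless an override/interposition applies. */
struct direct_binding {
    uint32_t symbol_index;  /* index into this object's dynamic symbol table */
    uint16_t needed_index;  /* index into the DT_NEEDED array (the "where") */
    uint16_t flags;         /* e.g. a bit allowing interposition to win */
};
```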
Michael Meeks did an implementation of Direct Binding for GNU binutils and glibc, but it was rejected for mainline with "prelink is more efficient". (<https://lwn.net/Articles/192082/>)
I haven't found anything about the Direct Binding in GNU that is more recent than 2006. Maybe time to revisit?
Just submitting part 2 of this as I stumbled across it after having played around with writing ELF files by hand. Never knew about the Hunk format and I think the author makes some interesting points.
There are OS internals books that document this kind of thing (e.g., various books on FreeBSD, Solaris, MacOS, etc.). You can also learn a lot just by reading the specifications for various object file formats, or the source code to tools like linkers and debuggers. There are also some dated books that are specifically on linkers and loaders (I've only leafed through them in a bookstore, don't know how good they are).
But on the whole, I'd say you learn on the job. Or on a project, anyway.
In days of yore these systems-level structures often arose out of a project's specific needs and the individual experience of the people on staff. For example, I sat next to the person who designed the GEMDOS executable format (we needed one, the old one in CP/M-68K was terrible), and I think it was done in a day or two. The engineer in question had maybe 15 years industry experience, including some time as a systems programmer at IBM; I think the format would have been different (maybe better, maybe worse, how would we know?) if a different engineer had decided to do that work.
I used a couple tricks from the GEMDOS executable format to do some rather nifty runtime work at Apple (it's not like the format was secret or anything). That's cross-pollination for you.
Off topic — does anyone know more about what happened to the Kestrel project? It sounds from the note in the archived GitHub project[0] like the maintainer shut it down because a company released a similar product with the same name. Seems odd (and sad).
Reminds me that I have an unfinished project to write a 68k ELF binary loader for the Atari ST. Now that I've quit my job maybe I can return to it.
System programming isn't a dark art, it's just programming with programmers as an audience. Application programming focuses on end-users who may not be programmers. I've done both.
I've used and developed on X11 since 1990. I'm well aware of its numerous limitations and I'm glad new technology such as Wayland is being developed to replace it. But it wasn't "a mistake".