Hacker News new | past | comments | ask | show | jobs | submit login
I wrote a linker everyone can understand (briancallahan.net)
208 points by ingve 14 days ago | hide | past | favorite | 55 comments

I wouldn't lead with symbols in an article focused on understandability.

A linker in first approximation just concatenates files.

In second approximation, it splits every input file into N parts (code and data, basically), then concatenates the corresponding parts in each input file, and writes all concatenated parts ("sections") to the output file.

It needs to fix up references at the end and it needs symbols for that, but that's kind of an implementation detail.

Even diagnosing duplicate symbols is a convenience feature if you look at things this way.

Symbols become more important if you want to write a "real" linker that supports weak symbols and symbols imported from dynamic libraries. But for the understanding the basics, it helps to deemphasize symbols, in my opinion.

The way I would describe a linker is backwards, kind of like what you did. As you say, you start with a linker as concatenating files, which themselves consist of multiple streams to be independently concatenated (i.e., sections). Then you point out that references between sections need to be resolved, and this is where you introduce relocations and symbols. Now you bring up the symbol resolution process, and the fact that the linker is going to (in some modes) choose to include object files only when it provides a necessary symbol. For bonus points, talk about segments, the dynamic loader, and/or debugging information.

This description is a reverse process of how the linker goes through and does all of it: first it loads all the input files, then resolves the symbols to build a link plan, then it fixes up necessary relocations, and finally it streams it all into the output file. I don't think it helps to omit any part of the process from the description--symbols are not "kind of an implementation detail," they are a core feature of what linkers provide.

Indeed, if I were to provide this as a blog post, I see no reason to not immediately reach for describing this using the full complexity of ELF and "real linkers" as the basis of description. You don't really gain anything from a simpler description, and building a linker that can build even relatively complex libraries such as glibc or Firefox's libxul is likely to be pretty simple--the really hard part of linkers is supporting things like linker scripts, which is not common operation.

Sounds like we'd describe a linker the same way :)

A simple linker doesn't need symbols for a link plan, or even a link plan (assuming you just load all input files and ignore static library indices). It doesn't even need a symbol table, it can do a full scan of all input files every time it needs a symbol during relocation processing. The symbol table is just a performance optimization.

Again, this changes for a real linker with weak symbols, comdats, section gc, etc.

I baselessly claim that a simple linker that doesn't need to read elf can be < 200 lines :) If you want to do elf, I think that becomes difficult.

The type of linker that was created is a symbolic linker whose very purpose is to resolve symbols. It's best not to approximate the functions of the linker, since it's purpose is so very straightforward.

Brian Callahan also wrote a very cool series of blog posts called "Demystifying programs that create programs" in which he builds an assembler (and a disassembler).

Part 1: https://briancallahan.net/blog/20210407.html

It’s an ambitious claim, but I must admit that, I read the whole article, and didn’t quite understand all of it.

As alternative, you can try to understand how Project Oberon linker works, given it is quite basic.

Chapter 6


Or the re-implementation of Turbo Pascal one for MS-DOS,


I found reading the code cleared up a lot because it's well commented and is really two functions, and a bit of boilerplate for reading/writing files.

'collect1' collects pairs of (name, address) and puts it in a table called 'collect'. Then 'process1' finds any names and substitutes the address found in the 'collect' table.

Smart linkers (and most linkers these days are smart linkers) are a lot like tracing, compacting garbage collectors.

A GC has a series of roots which point at blobs of memory. The GC needs to understand where further references are located in the pointed memory; a simple GC follows these references, avoiding cycles, until it has traced through the live set. It calculates new addresses for each blob of memory, then goes through the live set a second time, moving each object into its final position and adjusting all the references to point to their final locations.

A linker starts out with a main() function or equivalent which is a blob of code, contained in an object file produced by the compiler. The object file contains a table of references to all the external symbols used by the code, as well as the locations in the code that the references are made, classified by type of reference (e.g. absolute vs relative references). The references in the main() function are like the set of GC roots. The object file also contains a table of exports, all the symbols which it exposes externally.

The linker then locates the object code or data for each unresolved external reference (in other object files supplied on the command line, or in archive library files - indexed concatenations of object files). This process is followed recursively until the live set - the set of blobs of code and data - to be assembled into the final executable is determined.

Once all these blobs have been found, they can be copied into the final executable, and all the references to symbols can be filled in with appropriate offsets. There's details here: relative offsets may be baked in, but the locations of absolute offsets need to be marked in the final executable so that the operating system loader can adjust them if it needs to load the executable somewhere other than its preferred address.

The operating system loader is itself a linker. It, too, needs to follow a similar process, but instead of object files, it's dealing with shared objects (DLLs in Windows), and it's chasing down external references, loading things, and adjusting offsets as necessary (address space layout randomization (ASLR), load address conflicts (a lot more common in 32-bit address spaces)). The OS loader can also be lazy; instead of resolving a reference, it can fill it with a stub which is only resolved on first call. This delays linkage failures, but if the application tests for versions etc. before invoking, the developer gets the ergonomics of early bound linking with the features of late-bound linking that would otherwise need manual use of dlsym / GetProcAddress.

If you're interested in the topic, I highly recommend Linkers and Loaders by John Levine, long-time moderator of comp.compilers.

GC traverses graph, linker traverses graph. GC = linker. I disagree. Also if that's the description of a "smart linker", what is a "dumb linker" then?

Not just traversing a graph, but determining the location of blobs of data, and adjusting offsets inside those blobs of data to point at the relocated blobs of data. If your GC is simple, there's almost a 1:1 correspondence with every write to memory.

Smart linker relates to eliminating dead code. Depending on how symbols are packaged, or how well the compiler is integrated with the linker, there's a risk of including private (i.e. non-exported) code/data in the final executable which is never referenced but is simply co-located with code or data which is linked in. But it's not really a technical term, it's a marketing term from Delphi.

> But it's not really a technical term, it's a marketing term from Delphi.

Like 'optimising compiler' - are there really any non-niche compilers that do not optimise? Maybe some first-tier JITs.

Well, if the linker is very dumb, and just wholesale includes everything in an object file if there's any reference to it, the optimizing compiler can try and work around that by producing itty bitty object files, one for every symbol.

Linkers have more scope for smartness. The size of a block of code can be quite sensitive to the addressing mode (near vs far for e.g. 16-bit x86, relative offsets of various widths vs absolute etc.) used for each reference. An intelligent linker can adjust the code depending on where the target ends up, altering the addressing mode. The scope for optimization here depends on the ordering in the final executable, density of references and so on.

But doesn't this "smartness" require accurate information about internal symbol usage? For exmaple:

    static int foo(void) {
        return 4;

    int bar(void) {
        return 42 + foo();
The compiler, I believe, may emit no symbol entry for foo() and use a relative call inside bar() to call it, so if you only include "used" symbols in your resulting object file, bar() will end up calling into who knows where. That's why there are special flags for enabling "function-level linking" and all that jazz, right?

And this is where Delphi's smart linker is smart: it's always using function-level linking.

(I can't answer your question in detail because it's been a long time since I've been poking around in object files and I don't know what C/C++ compilers and linkers have been doing with them recently. I was a compiler engineer at Borland for more than 5 years, but it was years ago.)

The kind where the user has to chase dependencies.


> I felt enough rage about your comment

This isn't a normal response to a comment on the internet.

I thought LLVM's LD was pretty good, but I'll have to give this a read.

I don't think it's competing with lld as a general purpose linker. If you're interested in advances in those, I suggest looking at mold and the in-progress work in zig, which both seem to tackle link speed in different and interesting ways.

I always get tripped up with ld failing unless the object files are listed in the right order on the command line. I was curious if this simple linker would overcome this historic limitation, and was happy to see it does.

> Another downside is that you do have to arrange the object files in the command line in an order that makes sense for this linear approach. The first object file must be the entry point of the program. Different arrangements of object files may work for linking your executable but will produce an organizationally different executable. You may or may not care about this.

Good book on linkers (though a bit dated, and definitely over-priced - worth buying S/H): https://www.amazon.co.uk/Linkers-Kaufmann-Software-Engineeri...

One thing why I still like reading programming blog posts is because every now and then, I come across some way to implement things that I would never, ever could even think about. I understand how it works, it's basically equivalent to storing cross-references inside a text file like this:

    [chapter1]\s\e\e\ {chapter2}\ \f\o\r\ \m\o\r\e[chapter2]\g\o\ \b\a\c\k\ \t\o\ {chapter1}
no problem, it's just... I wouldn't even think to think about trying to store this information inline, and if pressed to, I would not think about quoting every single verbatim piece of data individually.

And arguably, it kinda makes more sense than trying to quote the { and [ instead: you've already lost the ability to just copy a blob from one file to another with a couple of places patched at known offsets, so you may as well simplify your life further.

Yeah I agree that the real insight here is firstly that three datatypes is enough to product a linker, and secondly that using this crazy serialization format makes life simpler.

It somehow hurts my brain to almost double your file size, but who cares? That's the computers problem, not yours.

As an iOS dev the hardest part for me to learn has definitely been how all the parts of the build chain work together: linking, symbolication, etc. Swift tries to be high-level but Xcode still relies on a bunch of low-level tools to build the app, and often you'll be diagnosing why `ld` failed for something. Pretty interesting stuff.

"can" can mean a lot

Doesn't he need to deal with relocation types? Otherwise you can't do a lot of interesting stuff like relative jumps, or loads computed from a base address. x86 has dozens of different relocation types.

Are you perhaps confusing it with SPARC? That one does have dozens of different relocation types because its instructions can have displacements of about 10 different widths.

As for interesting stuff... relative jumps are normally not used for jumping to a symbol defined in a different object, and base address computations are trivially supported by sticking a special symbol at the beginning of every object file.

x86-64 has 39 relocation types (not counting R_X86_64_NONE, which isn't really a relocation type, also not counting 3 deprecated relocation types).

"I wrote a linker everyone can understand" -> It doesn't work.

"I wrote a linker that works!" -> No-one can understand all of it.

I personally had no idea what a linker is, so I failed pretty quickly


Linkers are the lymph nodes of systems. Developers will wax rhapsodic about the virtues of Rust but know nothing about the (necessary) evils of linkers.

So the OP wrote a linker. But did they write one which understands linker script? Does it understand shared libraries, multiple architectures, multiple object file formats, ...? Is it 'smart'? Can it link itself? Does it support zero pages, ASLR, non-executable stacks+heaps? Is it fast?

Linkers are hard. Relocs and fixups are really hard. The Oracle® Solaris 11.4 Linkers and Libraries Guide is 500+ pages [1]. But if the development environment is done well, you won't have to worry about that, which is pretty cool.

[1] I always wanted a Linker Alien t-shirt.


> Using vim as the objectively correct editor.

We all know that scanned paper is the correct editor.

Paper, really? Rune stones still link us to our Viking ancestors, cuneiform tablets still link us to Gilgamesh, there is just no way paper can beat silicon (plus a few other elements).

But does that mean that silicon is the preferred editing environment, or merely the most reliable storage format?

Scanned paper can be mimicked with these newfangled tablets and pens. Your best bet for the one true editor is ed, the standard text editor!

I can't understand it.

You are not everyone, obviously.


The fact that the linker is written in D probably invalidates the claim that "everyone can understand it".

Can you elaborate? I find D quite readable, personally - and I've never written a line of D.

I agree, it was pretty clear. My one bit of confusion was the use of '~=' which seems to mean something like append.

On the "following the code in the logic" part, D makes it even easier for most than if it was in C or C++.

There is a style rule on the site that makes everything (including text) almost white. I am using Chrome 91 on Mac.

* { background: #fbfcff; }

Disabling that rule or using reader mode helps to read the article.

Are you sure you don't have an extension that tries to force dark mode or something?

Works fine for me, both in Firefox and Chromium.

I'm not an expert in any web technologies, so I'm interested in learning more about this. You quote this rule:

> * { background: #fbfcff; }

You also say:

> ... makes everything (including text) almost white.

Doesn't that rule just force the background to (nearly) white? Why do you say it forces everything including text to white?

There’s an issue with this comment that makes the text nearly the same colour as the background.

Even more annoying is that people downvote you making it unreadable so I have to upvote hopping I can read it.

This elitist UI thing of forcing deliberately blind schemes is trash

And then I get downvoted and can't read it, thanks anonymous loser!

Why would "everyone can understand this" be a valuable property for a specialist tool? I would much rather use tools that enable skilled professionals to do their best work than try to appease the layman.

It is useful educationally. People aren't born experts.

It's also a breath of fresh air in an industry whose practitioners can't distinguish technology from magic.

>It's also a breath of fresh air in an industry whose practitioners can't distinguish technology from magic.

And sadly often don't want to. My biggest pet peeve is helping someone figure out a bug, and when I see the problem and start explaining what's happening they give me some sort of "I don't want to know about all that, just tell me how to fix it!" response. If you want to spend my time to learn something so you can contribute to the project better - cool I'm in as along as it takes. If you just want to use my knowledge and time so you can work less - fuck off, I have my own deadlines for my own work.

>Why would "everyone can understand this" be a valuable property for a specialist tool? I would much rather use tools that enable skilled professionals to do their best work than try to appease the layman.

First, who said "everyone" here refers to the "layman"?

To me the odd question is why in a language with almost no popular use.. why JavaScript, fortran, basic or c?

How does this handle endian, word sizes, and dreaded alignment? Is packing enough?

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact