Everything You Never Wanted to Know About Linker Script (2021) (mcyoung.xyz)
118 points by thunderbong 7 days ago | 27 comments





I tried to learn this arcane art and use it in order to do some special things. I failed miserably. Long story short, I wanted to append some null program header entries to the otherwise standard linker output so that my tool could easily target them for patching. Discussed this a bit on GNU mailing lists but this approach just didn't seem to be going anywhere.

https://sourceware.org/pipermail/binutils/2023-November/1305...

In the end, I just bypassed the linker script. I wrote a dedicated ELF patching tool that does what I want, and suggested a linker feature to let users add spare program headers via command line option. The maintainer of the mold linker quickly accepted the feature request. The maintainers of GNU ld apparently didn't. The maintainers of LLVM lld probably never even saw it.

https://sourceware.org/pipermail/binutils/2023-November/1306...

To this day I wonder if it's possible to do this via linker script. I wonder if the author of this article would know.


Honestly for your use case, what if you made the interpreter executable a ZIP archive instead? Then any zip archiver could add new files/programs.

Many people suggested that I do that. I believe it's the standard way to embed files into executables. Open the executable and read backwards from the end until the zip archive metadata is found.
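
As a rough illustration of that backwards read, here is a minimal Python sketch that looks for the ZIP End of Central Directory record near the end of a byte string. The names `find_eocd` and `make_demo` are made up for this example, and the "executable" is just a fake prefix with an in-memory archive appended, not a real binary.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"  # End of Central Directory record marker
EOCD_MIN_SIZE = 22              # fixed part of the record, without the comment
MAX_COMMENT = 0xFFFF            # the comment length field is 16 bits wide


def find_eocd(data: bytes) -> int:
    """Return the offset of the End of Central Directory record, or -1."""
    # The record is the last thing in the file, followed only by an optional
    # comment, so only the final 22 + 65535 bytes need to be searched.
    start = max(0, len(data) - (EOCD_MIN_SIZE + MAX_COMMENT))
    pos = data.rfind(EOCD_SIGNATURE, start)
    while pos != -1:
        if pos + EOCD_MIN_SIZE <= len(data):
            (comment_len,) = struct.unpack_from("<H", data, pos + 20)
            # Accept the candidate only if its comment ends exactly at EOF.
            if pos + EOCD_MIN_SIZE + comment_len == len(data):
                return pos
        # Otherwise keep scanning backwards for an earlier signature.
        pos = data.rfind(EOCD_SIGNATURE, start, pos)
    return -1


def make_demo() -> bytes:
    """Build a fake 'executable': arbitrary prefix bytes plus a ZIP archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("main.ln", "(print 42)")  # hypothetical embedded file
    return b"\x7fELF" + bytes(60) + buf.getvalue()


if __name__ == "__main__":
    blob = make_demo()
    print("EOCD record found at offset", find_eocd(blob))
```

The total entry count sits 10 bytes into the record and the central directory offset 16 bytes in, which is how a reader gets from the tail of the file back to the file list.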

My goal was to do it in such a way that the Linux kernel memory mapped the embedded file section before the process had even started running. I explicitly wanted to avoid doing things like opening /proc/self/exe and reading its contents into memory. My interpreter is freestanding and has zero dependencies. This solution fits the minimalist spirit of the thing.

I wrote an article detailing how it works:

https://www.matheusmoreira.com/articles/self-contained-lone-...

Long story short: Linux passes a pointer to the ELF program header table to the process via the auxiliary vector which means the program segments are already reachable from inside the program, and I took advantage of that by adding custom load segments and making the interpreter search for them at runtime.
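
A hedged sketch of the search step, in Python for brevity. The real interpreter walks the table in memory through the pointer from the auxiliary vector; this sketch parses the same ELF64 structures out of a byte string instead. `PT_EMBEDDED` is a hypothetical marker value picked for the example (not the type lone actually uses), and `build_toy_image` fabricates a tiny header-only image rather than a runnable binary.

```python
import struct

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, little-endian
PHDR = struct.Struct("<IIQQQQQQ")          # Elf64_Phdr, little-endian
PT_LOAD = 1
PT_EMBEDDED = 0x60000001  # hypothetical marker type in the OS-specific range


def find_segments(image: bytes, wanted_type: int):
    """Yield (file_offset, file_size) for each program header of wanted_type."""
    fields = EHDR.unpack_from(image, 0)
    ident, phoff, phentsize, phnum = fields[0], fields[5], fields[9], fields[10]
    # ident[4] == 2 means 64-bit, ident[5] == 1 means little-endian.
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("not a little-endian ELF64 image")
    for i in range(phnum):
        entry = PHDR.unpack_from(image, phoff + i * phentsize)
        p_type, p_offset, p_filesz = entry[0], entry[2], entry[5]
        if p_type == wanted_type:
            yield p_offset, p_filesz


def build_toy_image(payload: bytes) -> bytes:
    """Fabricate an ELF64 header, two program headers, and a payload."""
    phoff = EHDR.size
    data_off = phoff + 2 * PHDR.size
    ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)
    ehdr = EHDR.pack(ident, 2, 62, 1, 0, phoff, 0, 0,
                     EHDR.size, PHDR.size, 2, 0, 0, 0)
    load = PHDR.pack(PT_LOAD, 5, 0, 0, 0, data_off, data_off, 0x1000)
    extra = PHDR.pack(PT_EMBEDDED, 4, data_off, 0, 0,
                      len(payload), len(payload), 1)
    return ehdr + load + extra + payload


if __name__ == "__main__":
    image = build_toy_image(b"(print 42)")
    for offset, size in find_segments(image, PT_EMBEDDED):
        print(image[offset:offset + size])
```

At runtime the loop is the same, only the table comes from the kernel-provided pointer and entry count, and the segment contents are read at their mapped virtual address instead of a file offset.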

This little mechanism allows embedding arbitrary data, including zip files. I made a little lisp S-expression file format which the interpreter simply parses but it could have been anything. Zip archivers should be able to edit the data in place, it should work fine provided the segments are updated with the new size of the zip archive.

The practical application of this is to enable the creation of a freestanding, self-executing application by making a copy of an existing interpreter and adding the desired code and data to it, even though the interpreter has already been compiled and linked:

  cp lone my-app
  lone-embed my-app my-app.ln
  ./my-app
The interpreter introspects, discovers there's an embedded segment inside itself and evaluates it. Unfortunately I had to write a tool to append those segments. Never figured out how to do it with linker script.

Plenty of assembly files do not use the C preprocessor. It turns out that assemblers have their own macro system.

Large projects like the Linux kernel, which are mostly C, use the preprocessor in .S files quite heavily.

Linux has hardly any .s files.

  ~/linux$ find . -name '*.s'
  ./arch/x86/kernel/asm-offsets.s
  ./kernel/bounds.s
  ./scripts/mod/devicetable-offsets.s
But there are plenty of .S files with #includes

  ~/linux$ find . -name '*.S' | xargs grep -l '#include' | wc -l
  1217
Edit: simplify command as suggested

FYI, `grep -l` will print just the matching filenames, no need for the sed/sort -u. `git grep` and `rg` also support this option. (I say this because I also used to overcomplicate this in my typical usage.)

Many such cases.

Folks might also find the book Advanced C and C++ Compiling by Milan Stevanovic useful here.

linkers are a hot spot for innovation in infrastructure now.

LTO (link time optimisation) is the hot topic for Apple’s new generation of linkers, enabling them to optimise across functions, etc.

All languages are trying FFI (foreign function interfaces) and need to bridge different calling conventions (mostly collapsing to the arch-dependent c convention).

Some AI Companies like Modular are combining pipeline services with optimisations at low levels. Python has legs because it can achieve targeted performance by lowering to c or GPU/TPU/NPU but it can be hard to combine them.

But these are hard and mostly hidden challenges. Most developers and leads will top out at understanding how to recognise and manage link time errors and bugs.

(My hope is that inter-op being mostly intractable could drive languages to agree on something more principled than legacy c conventions, but it would take a large player to shepherd agreement, and they have little interest in inter-op)


> LTO (link time optimisation) is the hot topic for Apple’s new generation of linkers, enabling them to optimise across functions, etc.

I was using LTO in 2013 on apple machines... what exactly is new about it now?


I like the look and feel of this page. Can someone spot a hint to the generator/stack or maybe even the page source?

It's Jekyll with the Hyde theme. Source: https://github.com/mcy/mcy.github.io

Linking is a huge topic: subtle nuances on the concept of topological sort represent the difference between success and failure for a build system that constructs an interesting link line. This is for lack of a better term “undergraduate link theory”.
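
To make the topological sort point concrete, here is a minimal Python sketch, assuming a traditional single-pass linker where an archive has to appear after whatever references it. The library names and the `link_order` helper are invented for illustration.

```python
from graphlib import TopologicalSorter


def link_order(deps: dict[str, set[str]]) -> list[str]:
    """Order link inputs so every input precedes the libraries it needs.

    deps maps each input to the set of libraries it references symbols from.
    """
    # static_order() emits dependencies before their dependents. A single-pass
    # linker wants the opposite, so the result is reversed.
    order = list(TopologicalSorter(deps).static_order())
    order.reverse()
    return order


if __name__ == "__main__":
    # Hypothetical dependency graph for a small program.
    deps = {
        "app.o": {"libhttp.a"},
        "libhttp.a": {"libtls.a", "libc.a"},
        "libtls.a": {"libc.a"},
        "libc.a": set(),
    }
    print(" ".join(link_order(deps)))
```

Mutually dependent archives make `TopologicalSorter` raise `CycleError`, and that is the case where a build system has to repeat libraries on the link line or wrap them in something like GNU ld's `--start-group`.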

But the real sweat equity starts at dynamic linking on a quasi-platform with all the best qualities of an identity crisis like an M Night Shyamalan story. Don’t take my word for it, some combination of Thompson and Pike and generally what’s left of the Labs legacy is a much better source than I am [1].

glibc is an amazing software artifact and has nonetheless become a liability in the default case. Everyone knows Drepper is an OG force of nature like they don’t make anymore, but copying the Win32 playbook has starved out musl and other strictly better ways to live. The consensus is to pour billions of good money after bad into the coffers of the Docker Industrial Complex to get a static link they had on a PDP-11.

For the first time in decades there’s a sane take on Python courtesy of the Astral folks and they have a whole page explaining the minimum viable technical debt risk landmine needed to use Torch with a better strategy than hope [2].

But really it’s the Nix people who have failed in their duty here: nix-ld is an awesome piece of software and it’s not the author’s fault, but crippling it on purpose via whitelisting rather than blacklisting eligible .so when they’re all already at the same privilege level could possibly kill Nix in the era of extreme performance vector compute via accelerator as the cover charge on being part of the software business [3]. I’m rolling Ubuntu boxes and patching it up with Homebrew when I don’t have a week and a PhD to get Triton talking to the driver these days and god damn I hate mutable state guess and check and pray on my science.

I hate to rant with no actionable proposal, but the action item here is a very big ask. Almost all people are reasonable if they don’t fear for what they regard as essential and existential. Vi/Emacs wars were old when I was young but the idea that anyone would prosper or perish IRL who took this stuff seriously couldn’t even be spelled in the alphabet at that time, and it’s not AI or offshoring, or any of that. Wildly unique mathematicians and engineers and scientists were grateful for what little they had and exceedingly cautious of standing next to a dissident in the USSR for decades when the stakes were nuclear hellfire.

That’s what central committees of insiders produce. You can call it communism or a16z. Unaccountable insiders who answer to no one but each other walks and talks and quacks like Thiel-world whatever you call it.

[1] https://harmful.cat-v.org/software/dynamic-linking/

[2] https://docs.astral.sh/uv/guides/integration/pytorch/

[3] https://github.com/nix-community/nix-ld


not gonna lie, you had me in the first half.

How did you get from the topic of linking to communism & a16z!?


Google search is no longer up to the task of making it quick on a mobile phone to find this, but the first time I read it laid out well was on Steve Yegge’s OG blog. I’m paraphrasing here but he basically made the point that spending your life mastering software doesn’t leave much scope for horizontal career mobility, and that getting heavily wired into a given programming language or operating system or whatever makes it existential to see it succeed. If I talk shit about a certain programming language here it’ll be mobbed by people who owe their paycheck to its adoption in under an hour, please don’t make me prove that.

I appreciate that I elided a lot of the conjectures on the way from “Drepper wants static glibc not to work” to “mafia capitalist RICO shit is the upstream cause”.

Scandal-ridden mafia capitalist shit is the upstream problem.


I mean, Unit 5 of Thiel’s startup 101 lectures at Stanford is titled “Competition is for Losers”. He leads off with making sure everyone is crystal clear that “attaining a monopoly” aka anti-trust fraud is his “idée fixe”.

This isn’t hosted on InfoWars. It’s hosted on YC’s YouTube channel.

It’s not a secret what these guys are doing and plan to do more of: https://youtu.be/3Fx5Q8xGU8k?si=I04EIvC-8GGV-LXf


For what it’s worth Gary, Khan’s FTC and Gensler’s SEC are looking pretty DOA in February, but a lot can happen before then and Google is under enough pressure that people are talking seriously about Chrome being broken off. The fine print about what constitutes a monopoly that damages consumer outcomes as narrowly construed bench legislating in Texas doesn’t matter to the optics.

Even if distancing YC from sama isn’t realistic on a dime, it wouldn’t hurt to step up to the same yard-line Nadella has on “we do business, we’re not married”.

And yanking the Thiel lectures and other “you will do nothing because you can do nothing” stuff from the official channel isn’t just ethically mandatory (IMHO), it’s also basic risk management.

You get to be a hothead on Twitter because you’re too big to fail, but I actually empathize because I lose my temper on a text sometimes too.

Any time doing the right thing is also +EV for the portfolio? That’s the move.


> it’ll mobbed by people who owe their paycheck to its adoption in under an hour, please don’t make me prove that.

I kneel.


HN fails to see the intersectionality between corpo power complex politics and marshalling LUTs in firmware space. Shame, but that's life. Rob's point is bang on albeit from a technical stance. This is also ultimately why Rob failed, but with a deep groaning sigh and not a bequeathed resignation (a true nerd is at the end of the day only known by their yawp). Power junkies will try to convince you otherwise because they benefit from it. But you have to live to believe it. The life of a kernel driver maintainer of something that a large corpo entity is financially vested in is one of a diplomat saint. Newly minted VP wants to leave a mark and be goody two shoes. Good luck. The news you hear on the internet are superficial because the map is not the territory.

I’ve known a couple of people to write in this style and I still like it! Antonio had a bit of this style in ~2009, Ribbon Farm’s contributors have been throwing stuff like this off for at least a decade now, and Nikhil Suresh (The Rightful Emperor of Mecatol Rex, natch) is maybe the most anti-biotic resistant evolved form of SARS-COV-FSCK-TEH-MAN to emerge recently.

Of course it could be a modern GPT with a context longer than my dick stuffed more full of better men than my ex-wife, so maybe I’m talking to a bot, but if it gets me a Jumbotron with Ana de Armas asking if I like real girls I’m still happy to talk to a machine.


This is a fairly high level overview of linkers, most of which you can find for yourself if you look up `ld` documentation [0] (mentioned in the article) and walk through it step by step in a tutorial fashion.

A more comprehensive look would be, at the very least one would hope, to have a reference manual of your favorite MCU and write the corresponding linker script for the memory sections it expects and provide stubs/thunks and other things of that nature. It's the praxis and application that brings the scary things out.

[0] https://sourceware.org/binutils/docs/ld.pdf (nb - after you've gone through the page count of some IC and MCU datasheets, 164 pages of ld will seem like a walk in the park)


I don't really find this argument convincing. You don't need to know how to build a car from first principles to drive one. If you're into it, sure, it can be fun and teach you plenty about its failure modes, the scary things as you put it, but most people really don't need nor want to go into that detail.

This is nowhere close, nowhere near close, not even within the same 100 mile radius close, of first principles.

This is literally starting from `man ld` and going one step further. The manual literally starts with a `man ld` printout and then goes to describe it from concepts onwards.

If you want first principles we can try to maybe start the discussion at maxwell's equations and take it a handful of hundred steps from there until we arrive at how logic gates, diodes and transistors work.


Except in reality 99% of people will just take the example (or auto-generated) linker scripts from the MCU manufacturer without understanding a thing about them and call it a day.

And all will work fine.


This worked fine for me for a little while, until I needed to do things that the manufacturer's example script didn't do, such as place certain globals into specific sections of memory. I had to piece the syntax of the linker file together myself, which wasn't too difficult, but a straightforward introduction to the file format would have been appreciated.

Implying that you can just read that documentation and "walk through it in a tutorial fashion" is a peak HN comment probably only rivaled by the famous Dropbox comment.

Not everyone is an expert in these things, and I'd appreciate if there were more articles like this.


This is what a lot of us have had to do early on because we had no other resources than first-hand manuals (mostly only printed; Hackers scene with books was painfully true), and maybe someone on a BBS or Usenet (later IRC) who could answer a question or two if they felt generous enough with their time to not heckle us for being dumb. If you're looking into `ld`, you've crossed a point of no return.

And the `ld` manual is great and there is nothing wrong with it.

The dropbox comment is people who are completely misguided as to the market value of a product that solves a real pain point for users and the recommended alternatives of using rsync was comically out of touch with the technical acumen of an everyday person. (see also the great "Less space than a nomad. Lame." comment from that other site)

Using `ld` is on the opposite swing of that, and that discussion is very much a de facto technical one with no punches held back. This is what the "Hacker" in "Hacker News" actually stands for.



