
Beginner's Guide to Linkers (2010) - antognini
http://www.lurklurk.org/linkers/linkers.html
======
briansteffens
One thing I found really interesting while playing around with gdb on a
program with dynamic linking is that functions are linked lazily, when they're
first called. So rather than hooking up calls to printf/fopen when the program
starts, those calls go through an intermediate table that initially jumps
into the dynamic linker's resolution code. The first time each
dynamically linked function is called, the linker looks up the symbol. Once it
finds the actual address of the called function, it writes that address into
the function's slot in the intermediate table, so subsequent calls jump
straight there without repeating the lookup.

Pretty cool to watch it happen. The lookup code in the dynamic linker
involves a bunch of strcmp calls, which makes sense, but I still found it
surprising for some reason.

You can set the environment variable LD_BIND_NOW (to any non-empty value) to
make it do all these lookups at startup time.
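Here's a minimal simulation of that lazy binding in plain C. It is not how glibc actually does it: classically on x86, the GOT slot initially points back into the PLT stub, which pushes a relocation index and jumps into the resolver. This sketch swaps that for a NULL check. The names `call_via_plt`, `resolve`, `bind_now`, `lookup_count`, and the `lib_*` functions are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef int (*binop_fn)(int, int);

/* Hypothetical "shared library" functions, standing in for printf/fopen. */
static int lib_add(int a, int b) { return a + b; }
static int lib_mul(int a, int b) { return a * b; }

/* Stand-in for the library's exported symbol table (name -> address). */
struct exported_sym { const char *name; binop_fn addr; };
static const struct exported_sym exports[] = {
    { "lib_add", lib_add },
    { "lib_mul", lib_mul },
};

#define NSLOTS 2
/* GOT-like table: one slot per imported function, NULL until resolved. */
static binop_fn got[NSLOTS];
static const char *const got_names[NSLOTS] = { "lib_add", "lib_mul" };

/* Number of symbol lookups performed by the toy "dynamic linker". */
int lookup_count = 0;

/* Linear search by name with strcmp, like the lookup seen in gdb. */
static binop_fn resolve(const char *name) {
    lookup_count++;
    for (size_t i = 0; i < sizeof exports / sizeof exports[0]; i++)
        if (strcmp(exports[i].name, name) == 0)
            return exports[i].addr;
    return NULL; /* unresolved symbol */
}

/* Plays the PLT stub: resolve on the first call, patch the slot, jump through it. */
int call_via_plt(int slot, int a, int b) {
    if (got[slot] == NULL)
        got[slot] = resolve(got_names[slot]);
    return got[slot](a, b);
}

/* Rough analogue of LD_BIND_NOW: resolve every remaining slot eagerly. */
void bind_now(void) {
    for (int i = 0; i < NSLOTS; i++)
        if (got[i] == NULL)
            got[i] = resolve(got_names[i]);
}
```

Calling `call_via_plt(0, 2, 3)` twice performs only one lookup; calling `bind_now()` before anything else moves all the lookups to startup instead.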

~~~
bogomipz
Indeed. You are referring to the PLT and the GOT, the procedure linkage table
and the global offset table. This is basically how PIC works. I have also seen
it referred to in the literature as a "thunk" or a "trampoline". In case anyone
is curious, this is a good post on it:

[http://eli.thegreenplace.net/2011/11/03/position-independent...](http://eli.thegreenplace.net/2011/11/03/position-independent-code-pic-in-shared-libraries/)

~~~
throwaway91111
Yup, a thunk is a good term. It's also arguably the main primitive of
Haskell: the entire program is a tree of thunks that are continually allocated
and evaluated.

------
CalChris
There was a comment more than ten years back on _gcc-help_ by Ian Lance
Taylor, author of gold and a longtime GNU binutils developer:

      Passing commands directly to the linker requires
      that you know what you are doing.

I added this to my makefiles. It doesn't make the linker work any better. But
it does make me think a little harder about what the linker is doing with my
incantations.

Linker scripts [1] are a particular pain to get right, and if you want to do
something complicated on Linux or BSD, you'll be using linker scripts. They
are a GNU linker thing; the OS X linker doesn't have an equivalent. Indeed,
each platform has its own linker, or several:

      GNU ld
      OS X ld
      gold (Google)
      lld (LLVM)
      whatever Windows uses because I don't know
      ...

I'm not embarrassed to admit this: I get my linker command lines and scripts
working and that's about it. But linkers should be easier to understand and
use. There is even a security bug that relies on our not understanding them
[2].

I can't sugarcoat this: _linkers are a pain_. Taylor is right when he says
_Passing commands directly to the linker requires that you know what you are
doing._

[1]
[https://sourceware.org/binutils/docs/ld/Scripts.html](https://sourceware.org/binutils/docs/ld/Scripts.html)

[2]
[http://www.cse.psu.edu/~trj1/papers/ndss17.pdf](http://www.cse.psu.edu/~trj1/papers/ndss17.pdf)

------
dugmartin
One of the most fun work projects I had early in my career was writing a
linker for PE files for an embedded PLC. Another team was writing some gcc
compiler extensions and an emulator for the PLC that ran on Windows PCs, with
all the emulator code stored in DLLs.

The idea was pretty cool: you could write and test ladder logic in a Windows
GUI and then click a button to generate an .exe file and upload it to the PLC.
The PLC's RTOS would then call my linker code which would link up the symbol
tables for the DLLs with the embedded code in the PLC and do some other minor
relocations. By putting all the ladder logic code in the loaded .exe they were
able to optimize for size/speed (using gcc and heuristics of the ladder logic
diagram) instead of running a ladder logic interpreter in the PLC.
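The two jobs described here, resolving symbols against a table and patching relocations, are the heart of any linker. Below is a toy version over a hypothetical, simplified object format (not PE's real relocation records); `struct symbol`, `struct reloc`, and `apply_relocs` are illustrative names. The two relocation formulas mirror the x86-64 ELF definitions of R_X86_64_64 (S + A) and R_X86_64_PC32 (S + A - P), where S is the symbol's address, A the addend, and P the address being patched. Overflow checking for the 32-bit case is omitted.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified object-file structures (not a real format). */
enum reloc_kind { RELOC_ABS64, RELOC_PC32 };

struct symbol { const char *name; uint64_t addr; };
struct reloc  { size_t offset; const char *sym; enum reloc_kind kind; int64_t addend; };

/* Write the low n bytes of v into buf[off..off+n) in little-endian order. */
static void put_le(uint8_t *buf, size_t off, uint64_t v, int n) {
    for (int i = 0; i < n; i++)
        buf[off + i] = (uint8_t)(v >> (8 * i));
}

/* Symbol resolution: find a definition by name. */
static const struct symbol *find_symbol(const struct symbol *tab, size_t n,
                                        const char *name) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return &tab[i];
    return NULL;
}

/*
 * Patch `code`, which will be loaded at address `base`, in place.
 *   RELOC_ABS64: S + A      (absolute 64-bit address)
 *   RELOC_PC32:  S + A - P  (32-bit offset relative to the patched location)
 * Returns how many relocations referred to an undefined symbol (left unpatched).
 */
int apply_relocs(uint8_t *code, uint64_t base,
                 const struct reloc *rels, size_t nrels,
                 const struct symbol *syms, size_t nsyms) {
    int undefined = 0;
    for (size_t i = 0; i < nrels; i++) {
        const struct symbol *s = find_symbol(syms, nsyms, rels[i].sym);
        if (s == NULL) {
            undefined++;
            continue;
        }
        uint64_t S = s->addr;
        uint64_t A = (uint64_t)rels[i].addend; /* wraps for negative addends */
        uint64_t P = base + rels[i].offset;
        if (rels[i].kind == RELOC_ABS64)
            put_le(code, rels[i].offset, S + A, 8);
        else
            put_le(code, rels[i].offset, S + A - P, 4); /* keeps the low 32 bits */
    }
    return undefined;
}
```

A real linker would report the undefined symbols as errors rather than just counting them.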

If you ever get a chance to play around with a linker I'd suggest you do. It's
one of those "aha" moments in CS about the userland/OS boundary.

------
pjmlp
The "Turbo Pascal Compiler Internals" site is also a nice source, if you want
an alternative view that isn't C-based.

Beware that the content is not related to the original Turbo Pascal.

[http://turbopascal.org/linker](http://turbopascal.org/linker)

Or Oberon's dynamic linker, based on strongly typed modules; see chapter 6 of
[http://www.inf.ethz.ch/personal/wirth/ProjectOberon/PO.Syste...](http://www.inf.ethz.ch/personal/wirth/ProjectOberon/PO.System.pdf)

~~~
carussell
Fair warning for anyone about to spend much time looking into Wirth's personal
material: much of it is unfortunately neither comprehensive nor up to date. If
you start to dig into the Oberon code, you'll find (non-trivial) gaps in the
documentation and stuff that's downright untrue.

------
gbrown_
Was going to throw in the obligatory mention of John Levine's Linkers and
Loaders, but I see the author lists it as an additional reference along with
some other interesting nuggets.

------
TokenDiversity
Can someone explain to me what the addresses in the output of ld mean? Each
process would load the program into its own randomized address space, no?
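One way to see the answer: the addresses ld prints are link-time virtual addresses. A non-PIE executable is loaded at exactly those addresses; a position-independent executable or shared library is mapped at a per-process base (randomized under ASLR), and every link-time address is shifted by that load bias. A Linux/glibc sketch using `dl_iterate_phdr`, whose `dlpi_addr` field is that bias and which reports the main program first; the helper names `main_load_bias` and `in_main_program` are hypothetical:

```c
#define _GNU_SOURCE
#include <link.h>
#include <stddef.h>
#include <stdint.h>

struct query { uintptr_t addr; uintptr_t bias; int found; };

/* Callback: inspect the first object reported (the main program), then stop. */
static int inspect_main(struct dl_phdr_info *info, size_t size, void *data) {
    struct query *q = data;
    (void)size;
    q->bias = (uintptr_t)info->dlpi_addr; /* 0 for non-PIE, load base for PIE */
    for (size_t i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_LOAD)
            continue;
        /* Runtime address = load bias + link-time address (p_vaddr). */
        uintptr_t start = q->bias + (uintptr_t)ph->p_vaddr;
        if (q->addr >= start && q->addr - start < (uintptr_t)ph->p_memsz)
            q->found = 1;
    }
    return 1; /* nonzero stops the iteration after the main program */
}

/* Load bias of the main executable in this process. */
uintptr_t main_load_bias(void) {
    struct query q = { 0, 0, 0 };
    dl_iterate_phdr(inspect_main, &q);
    return q.bias;
}

/* Nonzero if addr lies in one of the main program's relocated PT_LOAD segments. */
int in_main_program(uintptr_t addr) {
    struct query q = { addr, 0, 0 };
    dl_iterate_phdr(inspect_main, &q);
    return q.found;
}
```

Printing `main_load_bias()` from a PIE build on separate runs should show a different value each time under ASLR, while the offsets ld reported stay the same.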

