
Inside the Linker (2012) - bear_child
https://opensource.apple.com/source/ld64/ld64-136/doc/design/linker.html
======
userbinator
Replace "atom" with "section" throughout this article, and you'll have the
same thing as what other linkers can do and have done for a very long time,
without the additional obfuscatory language:

[https://elinux.org/Function_sections](https://elinux.org/Function_sections)

[https://docs.microsoft.com/en-us/cpp/build/reference/opt-opt...](https://docs.microsoft.com/en-us/cpp/build/reference/opt-optimizations)

[http://www.drdobbs.com/cpp/the-most-underused-compiler-switc...](http://www.drdobbs.com/cpp/the-most-underused-compiler-switches-in/240166599?pgno=6)

~~~
CalChris
You'll need some more explanation for your claim. Second sentence:

    
    
      It is not "section" based like traditional linkers
      which mostly just interlace sections from multiple
      object files into the output file.

~~~
comex
As input to a linker, Mach-O and ELF files look pretty similar: they both
split their contents into named "sections", and then there's a symbol table,
which is basically a list of (name, address) pairs.

In C code, each symbol generally represents (the start of) a separate function
or variable. All references between them are explicitly marked as relocations;
thus, the linker should be free to reorder them, and any function/variable
that isn't explicitly referenced is unused and can be removed (unless you're
linking a shared library and it's meant to be publicly exported from that).

On the other hand, in assembly code, symbols are not necessarily independent.
You can have things like a function that "falls through" into another
function. For example, here's a hypothetical assembly file that implements
both bzero() and memset(), and has the former fall through into the latter:

    
    
        // void bzero(void *s, size_t n)
        // (zeroes memory)
        _bzero:
            // Set up arguments for memset
            mov r2, r1 // 3rd argument to memset = 2nd argument to bzero
            mov r1, #0 // 2nd argument to memset is 0
            // (1st argument to memset = 1st argument to bzero, no move needed)
            // Fall through into memset
        // void *memset(void *b, int c, size_t len);
        _memset:
            ...memset implementation...
    

As for sections: traditionally, all code gets put into the same section (named
".text" for ELF, "__text" for Mach-O); all data gets put into the same section
(".data" / "__data"); etc.

The Darwin (Mach-O) linker is optimized for the properties of C code, and
implicitly treats the data between each symbol and the symbol following it as
a separate unit ("atom" or "subsection"). Or, more specifically, it does this
if the SUBSECTIONS_VIA_SYMBOLS flag is set in the Mach-O header, which is
always the case for object files compiled from C (as of 2005 or so). Thus, it
always has the ability to remove unused functions/variables, though it doesn't
actually bother to do so unless you pass -dead_strip.

ELF linkers are more traditional and treat each section as an indivisible
unit; symbols aren't taken into consideration at all. So, by default, it's not
possible to strip unused functions and variables, nor to reorder multiple
symbols that came from a single object file. However, you can pass
"-ffunction-sections -fdata-sections" to GCC (the compiler, not the linker) to
make it put every single function and variable, respectively, in its own
section in the .o file. For example, a function named "foo" would appear in a
section called ".text.foo". Then the linker will coalesce all the ".text.*"
sections back into a single ".text" output, and similarly for other types of
sections. But first it can strip unused sections (if you pass --gc-sections) –
which is equivalent to stripping unused functions/variables, since each
section contains only a single function/variable.

These are basically two different ways to accomplish the same thing, which
probably explains what userbinator said. Both approaches feel kind of hacky to
me. On the ELF side, object files with a bazillion sections are annoying to
look at (if you examine them with readelf or other tools), and putting
everything in its own section is not really how sections were originally
intended to work. On the Mach-O side, well, the symbol table wasn't originally
meant to be used to split up the input data; in particular, unlike with
ELF, Mach-O symbols don't have a size field (which is why the atom implicitly
lasts until the next symbol). And it feels wrong that object files compiled
from assembly have to be treated differently from everything else (they don't
have SUBSECTIONS_VIA_SYMBOLS, unless you explicitly ask for it in the assembly
file).

Personally I prefer the Mach-O approach just because it requires fewer flags
to enable stripping of unused functions/data. Heck, I don't understand why
-dead_strip isn't enabled by default.

But if you were to design a new object file format from scratch, it could
probably handle this much more elegantly than either ELF or Mach-O.

~~~
johncolanduoni
> Heck, I don't understand why -dead_strip isn't enabled by default.

My guess would be to facilitate debugging, where you might want to invoke a
symbol that wouldn't otherwise be used at runtime (e.g. some sort of debug
print).

~~~
jacobush
Could it also be for when you dynamically load a symbol and call it? Like for
making a .so?

~~~
userbinator
Yes, as someone with a Windows background I think this is one of the most
unusual aspects of the dynamic linking system on Unix-likes --- everything is
"exported" by default and the basic system has no concept of linking to
symbols from a specific module; it only looks at the symbol name itself.

In Windows you have to explicitly specify which symbols to export, and only
those exported ones can be imported.

~~~
klodolph
On Windows this has the consequence that if you have a DLL, it might use a
different malloc than your application. As a result, you can’t safely free
objects that were created in a DLL unless you go to all sorts of trouble. This
can mean passing objects back into a DLL to free them or using some other
mechanism to ensure that you use the same allocator everywhere. This is
especially problematic with C++ because of inlining.

The “everything is exported” default can be fixed with a flag; you then set
symbol visibility with attributes like __declspec or use a linker script. This
doesn’t change the calling convention, unlike with DLLs.

~~~
comex
What you said about malloc on Windows is true, but I wouldn’t blame it on
symbols not being exported by default – but rather on Windows’ choice to ship
N different C runtime libraries, one for each MSVC version, as separate DLLs.
Most applications and libraries link to one of those DLLs, and if two images
link to the same DLL, they do share C runtime state, including the malloc heap
[1]. But that’s only possible if they were compiled with the same MSVC
version. (There is also an option, -MT, to fully statically link the C
runtime, but it’s less commonly used.)

In contrast, both Linux and Darwin have only one C runtime library for the
system, in libc.so.6 (typically) or libSystem.dylib respectively, which
maintains backwards compatibility over a longer time period. Sometimes there’s
a need to make changes to libc that would normally be ABI-breaking, e.g. when
off_t was changed to be 64 bits to support >4GB file sizes on 32-bit
platforms. But to avoid breakage, special mechanisms are used to provide two
different versions of the same symbol within the same library, for each
affected symbol; existing binaries will use the old version, while programs
compiled and linked on a newer system will automatically use the new version.
(On Linux, a complicated mechanism called symbol versioning is used for this,
while Darwin just renames symbols with the asm() directive in header files.)

On Linux, it _is_ possible to statically link any of various libcs, but
programs that do that typically can’t use shared libraries _at all_, so the
issue of malloc across library boundaries doesn’t come up.

Oh, and for the record, apparently Windows 10 has a new “universal” CRT DLL
for new code that will be maintained in place going forward, but it’s still
separate from all the pre-existing versioned DLLs, and there’s a debug variant
that’s a separate DLL from the normal one and probably doesn’t share state
with it.

[1] [https://msdn.microsoft.com/en-us/library/ms235460.aspx](https://msdn.microsoft.com/en-us/library/ms235460.aspx)

~~~
userbinator
There is one CRT that has existed since Win95 and is still there in Win10,
it's called MSVCRT.DLL and getting versions of MSVC other than 6 to link to it
is possible and has been done (although not trivial). MS rather strongly
discourages this, but as evidenced by all the apps out there that do it and
continue to work, it's pretty much the only way to get a single small
dynamically-linked binary that'll run on every 32-bit version of Windows ever
released.

In contrast the "universal" CRT is not very universal at all, and a horrible
bloated mess. But certainly not unusual of MS...

------
ma2rten
I work on C++ code at a large tech company. My workflow is such that I compile
and run tests frequently after making incremental changes. I often feel like
I'm waiting ages for the linker.

I am wondering if linking could be sped up. I feel like a lot of the steps
mentioned here could be cached for instance.

~~~
pjmlp
Depends on which C++ compiler you use.

Visual C++ supports incremental compilation and linking.

[https://blogs.msdn.microsoft.com/vcblog/2018/03/14/build-tim...](https://blogs.msdn.microsoft.com/vcblog/2018/03/14/build-time-improvement-recommendation-turn-off-map-use-pdbs/)

[https://blogs.msdn.microsoft.com/vcblog/2014/11/12/speeding-...](https://blogs.msdn.microsoft.com/vcblog/2014/11/12/speeding-up-the-incremental-developer-build-scenario/)

[https://blogs.msdn.microsoft.com/vcblog/2018/01/04/visual-st...](https://blogs.msdn.microsoft.com/vcblog/2018/01/04/visual-studio-2017-throughput-improvements-and-advice/)

[https://blogs.msdn.microsoft.com/vcblog/2016/10/05/faster-c-...](https://blogs.msdn.microsoft.com/vcblog/2016/10/05/faster-c-build-cycle-in-vs-15-with-debugfastlink/)

------
AceJohnny2
This is the default linker on macOS, LD64.

Its "atom"-based concept carried over into LLVM's experimental ATOM-based ld:
[https://lld.llvm.org/AtomLLD.html](https://lld.llvm.org/AtomLLD.html)

However as AndyKelley notes elsewhere in these comments, that variant of LLD
hasn't seen progress in years.

That said, AtomLLD appears to be a side project of LLD, LLVM's linker, which
appears well-supported on Linux (ELF) and experimental on Windows (PE/COFF).

I wonder what Apple's plans are.

------
User23
This is really beautiful HTML

~~~
ljcn
Indeed, retro. Not quite valid though unfortunately. Apart from the annoying
character encoding, doctype, etc. problems, there's a </p> without a <p>, and
a <ul> missing a preceding <li>.

------
gok
Needs a (2012), this was written with the Xcode 4.4 release.

~~~
comex
And these days ld64 is on the way toward being obsoleted by LLVM’s lld,
although the latter still uses the same basic model.

~~~
AndyKelley
I wish. In reality the Mach-O LLD code is unmaintained and cannot compile a
simple hello-world program without asserting.
[https://bugs.llvm.org/show_bug.cgi?id=32254](https://bugs.llvm.org/show_bug.cgi?id=32254)

