
LLD is included in the upcoming LLVM 4.0 release - AndyKelley
https://reviews.llvm.org/D29539
======
mioelnir
I'm just going to shamelessly quote the FreeBSD Quarterly Status Report for
Q4/2016, which came out a few days ago.

    
    
       LLD developers made significant progress over the last quarter. With
       changes committed to both LLD and FreeBSD we reached a major milestone:
       it is now possible to link the entire FreeBSD/amd64 base system (kernel
       and userland) with LLD.
    
       Now that the base system links with LLD, we have started investigating
       linking applications in the ports tree with LLD. Through this process
       we are identifying limitations or bugs in both LLD and a number of
       FreeBSD ports. With a few work-in-progress patches we can link
       approximately 95% of the ports collection with LLD on amd64.
    

That is 95% of ~27,000 ports, which means that for very many things, lld as
the linker will just work once those patches have been applied in the
appropriate places.

[1]: [https://lists.freebsd.org/pipermail/freebsd-announce/2017-February/001781.html](https://lists.freebsd.org/pipermail/freebsd-announce/2017-February/001781.html)

------
AndyKelley
Posting this news article is my way of getting the attention of Debian, Arch,
Gentoo, Homebrew, NixOS, and other package maintainers. As an upstream
compiler developer [1], I want to depend on LLD instead of the system linker,
and the sooner LLD becomes ubiquitous in the various package managers, the
sooner I can depend on it. (Of course, I could ask my users to compile it from
source, but that increases the overhead for people getting started.)

Further evidence that it is time for LLD to be distributed along with LLVM:
[http://lists.llvm.org/pipermail/llvm-dev/2017-February/110302.html](http://lists.llvm.org/pipermail/llvm-dev/2017-February/110302.html)

[1]: [http://ziglang.org/](http://ziglang.org/)

~~~
umanwizard
Package maintainers aren't like the President. You don't need news media to
get their attention. You can just e-mail them.

~~~
AndyKelley
That sounds like an O(N) solution.

More seriously, being a package maintainer myself I know that the goal is to
serve the users and create packages that users expect. So, the best way to get
package maintainers to include LLD is to get users to expect it.

------
ktta
Link to page explaining LLD

[http://lld.llvm.org/NewLLD.html](http://lld.llvm.org/NewLLD.html)

~~~
haberman
Wow, this is claiming 1.2x to 2x faster than gold at 10% of the code size.
That is a remarkable achievement. Gold was already faster than GNU ld.

I am really surprised that there was this much room for optimization. Ian
Lance Taylor, who wrote Gold, is a really smart guy, and speed was one of its
primary goals.

~~~
jblow
Linkers are mind-bogglingly slow. I don't understand why they are so slow.

lld is still slow; it is just less slow than the other linkers.

This is not to disparage anyone working on linkers or to say they are not
smart. I think they just don't tend to be performance-oriented programmers,
and culturally some kind of ingrained acceptance has developed of how much
time it is okay for a linker to take.

~~~
comex
Linking is also very easy to do incrementally, but for some reason incremental
linking is not popular in the Unix world. GNU ld and Apple's ld64 can't do it.
GNU gold can do it, but only if you pass a special linker flag
(--incremental), which typical build systems don't. LLD can't do it, despite
being the spiffy new thing.

So you end up with big projects where most of the time taken by incremental
debug builds is spent linking - relinking the same object files to each other
over and over and over. Awful. I don't use Windows, but I hear Visual Studio
does the right thing and links debug builds incrementally by default. Wish the
rest of the world would catch on.

~~~
chisophugis
Every incremental linking technique I'm aware of involves overwriting the
output file and does not guarantee that identical input files and command line
lead to identical (bit-exact) output files.

Incremental linking is not so easy under that constraint, since the output
depends on the previous output file (which may not even be there).

(and considering the previous output file to be an "input file" follows the
letter of the requirement but not the spirit; the idea is that the program
invocation is a "pure function" of the inputs, which enables caching and
eliminates a source of unpredictable behavior)

We have had to reject certain parallelization strategies in LLD as well
because even though the result would always be a semantically identical
executable, it would not be bit-identical. See e.g. the discussions
surrounding parallel string merging:
[https://reviews.llvm.org/D27146](https://reviews.llvm.org/D27146) <-- fastest
technique, but non-deterministic output

[https://reviews.llvm.org/D27152](https://reviews.llvm.org/D27152) <-- slower
but deterministic technique

[https://reviews.llvm.org/D27155](https://reviews.llvm.org/D27155) <-- really
cool technique that relies on a linearly probed hash table (and sorting just
runs of full buckets instead of the entire array) to guarantee deterministic
output despite concurrent hash table insertion.
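
To sketch the D27155 idea concretely (this is my own toy illustration, not the
actual LLD code; all names are invented): with linear probing, the set of
occupied buckets comes out the same regardless of how the threads interleave;
only which string lands in which bucket varies. So after concurrent insertion
you can sort each maximal run of full buckets and get bit-identical output on
every run.

    // Hypothetical sketch of deterministic concurrent string dedup.
    // Assumes a power-of-two table large enough that inserts never fail,
    // and ignores probe runs that wrap around the end of the table.
    #include <algorithm>
    #include <atomic>
    #include <functional>
    #include <string>
    #include <vector>
    
    struct DedupTable {
      std::vector<std::atomic<const std::string *>> slots;
      explicit DedupTable(size_t n) : slots(n) {}
    
      void insert(const std::string *s) {  // callable from many threads
        size_t mask = slots.size() - 1;
        for (size_t i = std::hash<std::string>{}(*s) & mask;; i = (i + 1) & mask) {
          const std::string *expected = nullptr;
          if (slots[i].compare_exchange_strong(expected, s))
            return;  // claimed an empty bucket
          if (*expected == *s)
            return;  // equal string already inserted by someone else
        }
      }
    
      void makeDeterministic() {  // single-threaded, after all inserts
        std::vector<const std::string *> run;
        for (size_t i = 0, n = slots.size(); i < n;) {
          if (!slots[i].load()) { ++i; continue; }
          size_t j = i;  // [i, j) will be a maximal run of full buckets
          run.clear();
          while (j < n && slots[j].load()) run.push_back(slots[j++].load());
          std::sort(run.begin(), run.end(),
                    [](const std::string *a, const std::string *b) { return *a < *b; });
          for (size_t k = 0; k < run.size(); ++k) slots[i + k].store(run[k]);
          i = j;
        }
      }
    };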

~~~
comex
As I said in a different reply, I think nondeterminism is an acceptable
sacrifice for development builds, which is where incremental linking would be
most useful. That said, it's definitely possible to get some speedup from
incrementality while keeping the output deterministic; you'd have to move
symbols around, which of course requires relocating everything that points to
them, but (with the help of a cache file that stores where the relocations
ended up in the output binary) this could probably be performed significantly
more quickly than re-reading all the .o files and doing name lookups. But
admittedly this would significantly reduce the benefit.

~~~
chisophugis
I agree. It's definitely possible. It's just that the actual benefit is far
from reducing link time to "O(changes in the input)" and it would introduce
significant complexity into the linker (and keeping LLD simple and easy to
follow is a high priority). It's definitely an open research area.

> That said, it's definitely possible to get some speedup from incrementality
> while keeping the output deterministic; you'd have to move symbols around,
> which of course requires relocating everything that points to them, but
> (with the help of a cache file that stores where the relocations ended up in
> the output binary) this could probably be performed significantly more
> quickly than re-reading all the .o files and doing name lookups. But
> admittedly this would significantly reduce the benefit.

Yeah. It's not clear if that would be better in practice than a conservative
padding scheme + a patching-based approach.

"move symbols around, which of course requires relocating everything that
points to them" sounds a lot like what the linker already spends most of its
time doing (in its fastest mode).

In its fastest mode, LLD actually spends most of its time memcpy'ing into the
output file and applying relocations. This happens after symbol resolution and
does not touch the input .o files except to read the data being copied into
the output file. Applying a relocation requires a bare minimum of pointer
chasing (only 2 serially dependent cache misses, last I looked), does no hash
table lookups into the symbol table, and never looks at a symbol name string.
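
To make that concrete, here is a hypothetical sketch of that hot loop (my
illustration; the structure names and the specific relocation kind are made
up, not LLD's internals):

    // Section contents have already been memcpy'd into the mmap'd output.
    // Patching one relocation touches the reloc record (1st cache miss)
    // and its resolved symbol (2nd cache miss) -- no name strings, no hashing.
    #include <cstdint>
    #include <cstring>
    #include <vector>
    
    struct Symbol { uint64_t outputVA; };  // filled in during symbol resolution
    
    struct Reloc {
      uint64_t offset;      // where to patch, relative to the output section
      int64_t addend;
      const Symbol *sym;
    };
    
    void applyRelocs(uint8_t *sectionOut, uint64_t sectionVA,
                     const std::vector<Reloc> &relocs) {
      for (const Reloc &r : relocs) {
        // A PC-relative 32-bit relocation, e.g. R_X86_64_PC32: S + A - P.
        int32_t value = static_cast<int32_t>(
            r.sym->outputVA + r.addend - (sectionVA + r.offset));
        std::memcpy(sectionOut + r.offset, &value, sizeof(value));
      }
    }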

~~~
comex
> It's just that the actual benefit is far from reducing link time to
> "O(changes in the input)"

Not sure exactly what you mean by this. If you give up determinism, it can be
O(changes) - except for time spent statting the input files, which, at least
in theory, should be possible to avoid by getting the info from the build
system somehow. I can understand if LLD doesn't want to trade off determinism,
but I personally think it should :)

One practical problem I can think of is ensuring that the binary isn't still
running when the linker tries to overwrite bits of it. Windows denies file
writes in that case anyway… On Unix that's traditionally the job of ETXTBSY,
which I think Linux supports, but xnu doesn't. I guess it should be possible
to fake it with APFS snapshots.

> In its fastest mode, LLD actually spends most of its time memcpy'ing into
> the output file and applying relocations. This happens after symbol
> resolution and does not touch the input .o files except to read the data
> being copied into the output file.

Interesting. What is this mode? How does it work if it's not incremental and
it doesn't read the symbols at all?

~~~
chisophugis
> Not sure exactly what you mean by this. If you give up determinism, it can
> be O(changes) - except for time spent statting the input files which, at
> least in theory, should be possible to avoid by getting the info from the
> build system somehow. I can understand if LLD doesn't want to trade off
> determinism, but I personally think it should :)

Not quite. For example, a change in the symbols of a single object file can
cause different archive members to be fetched from archives later on the
command line. A link can be constructed where a change in a single file causes
O(all inputs) changes.

Even though a practical link won't hit that pathological case, you still have
to do the appropriate checking to ensure that it doesn't happen, which is an
annoying transitive-closure/reachability type problem. (If you need a
refresher on archive semantics, see the description here:
[http://llvm.org/devmtg/2016-03/Presentations/EuroLLVM%202016...](http://llvm.org/devmtg/2016-03/Presentations/EuroLLVM%202016-%20New%20LLD%20linker%20for%20ELF.pdf)
Even with the ELF LLD using the Windows link.exe archive semantics (which are
in practice compatible with traditional Unix archive semantics), the problem
still remains.)
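
To illustrate why, here is a toy model (invented names, nothing from LLD) of
the classic archive rule: a member is fetched only if it defines a symbol that
is undefined at the moment the archive is visited, and each fetch can
introduce new undefined symbols.

    #include <algorithm>
    #include <set>
    #include <string>
    #include <vector>
    
    struct Member { std::vector<std::string> defs, undefs; };
    struct Archive { std::vector<Member> members; };
    
    // Fetch members that resolve currently-undefined symbols; each fetch can
    // create new undefined symbols, so iterate to a fixed point.
    void fetchFromArchive(const Archive &a,
                          std::set<std::string> &undefined,
                          std::set<std::string> &defined,
                          std::vector<const Member *> &fetched) {
      bool progress = true;
      while (progress) {
        progress = false;
        for (const Member &m : a.members) {
          if (std::find(fetched.begin(), fetched.end(), &m) != fetched.end())
            continue;  // already pulled into the link
          bool wanted = false;
          for (const std::string &d : m.defs)
            if (undefined.count(d)) { wanted = true; break; }
          if (!wanted) continue;
          fetched.push_back(&m);
          for (const std::string &d : m.defs) {
            defined.insert(d);
            undefined.erase(d);
          }
          for (const std::string &u : m.undefs)
            if (!defined.count(u)) undefined.insert(u);
          progress = true;
        }
      }
    }

If an early object file starts or stops referencing even one symbol,
`undefined` differs by the time the archive is visited, and a different
transitive set of members can get fetched.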

In practice, with the current archive semantics, any change to symbol
resolution would likely be best served by bailing out from an incremental link
in order to ensure correct output.

Note: some common things that one does during development actually do change
the symbol table. E.g. printf debugging is going to add calls to printf where
there were none (and I think "better printf debugging" is one of the main use
cases for faster link times). Or if you use C++ streams: while printf-
debugging you may have had `output_stream << "foo: " << foo << "\n"` where
`foo` is a string, but if you then change it to also output `bar`, which is an
int, you're still changing the symbol table of the object file (due to
different overloads).
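
A concrete illustration of that overload point (my own example, using
std::cout):

    #include <iostream>
    #include <string>
    
    // Before: references only the string/char* stream-insertion overloads.
    void trace(const std::string &foo) {
      std::cout << "foo: " << foo << "\n";
    }
    
    // After: additionally references std::ostream::operator<<(int), so the
    // .o file's undefined-symbol table changes -- symbol resolution has to
    // be redone even though this looks like a "trivial" edit.
    void trace(const std::string &foo, int bar) {
      std::cout << "foo: " << foo << " bar: " << bar << "\n";
    }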

> Interesting. What is this mode? How does it work if it's not incremental and
> it doesn't read the symbols at all?

Compared to the default, mostly it just skips string merging, which is what
the linker spends most of its time on otherwise for typical debug links (debug
info contains tons of identical strings; e.g. file names of common headers).
[1]

To clarify, there are two separate things:

- the fastest mode, which is mostly about skipping string merging. It's just
like the default linking mode; it just skips some optional stuff that is
expensive.

- the part of the linker profile that the linker spends most of its time
doing in its fastest mode (memcpy + relocate); for example, I've measured this
as 60% of the profile. This happens after symbol resolution and some
preprocessing of the relocations.

Sorry for any confusion.

[1] The linker has "-O<n>" flags ( _totally_ different from the "-O<n>" family
of flags passed to the compiler). Basically, higher -O numbers (from -O0 to
-O3, just like the compiler's, confusingly) cause the linker to do more "fancy
stuff" like string deduplication, string tail merging, and identical code
folding. Mostly these things just reduce binary size somewhat, at a fairly
significant link-time cost versus "just spit out a working binary".

------
AceJohnny2
This is not the "atom-based" LLD
[http://lld.llvm.org/AtomLLD.html](http://lld.llvm.org/AtomLLD.html)

While conventional linkers work at the compilation-unit level (one source
file, usually), placing that whole source file's functions adjacently in
memory [1], an atom-based linker is able to take the smallest linkable units
(individual functions, each static/global variable, ...) and arrange those
optimally.
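
A small example of the granularity difference (using the standard
-ffunction-sections/--gc-sections flags, which are the usual way to
approximate per-function atoms with traditional ELF linkers):

    // granularity.cc -- with the default section-per-file layout, linking in
    // used() also drags in unused(). With -ffunction-sections each function
    // gets its own section, so a traditional linker invoked with
    // --gc-sections can drop unused() individually, much like an atom-based
    // linker treating each function as its own smallest linkable unit.
    //
    //   c++ -c -ffunction-sections granularity.cc
    //   c++ granularity.o -Wl,--gc-sections -o prog
    int used() { return 1; }
    int unused() { return 2; }
    int main() { return used(); }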

As I recall, the OS X ld is based on this model. However, it remains more
limited: it doesn't support GNU ld's linker scripts and accepts only limited
command-line parameters, so it doesn't expose all the power of the flexibility
this model would provide.

As far as I know, AtomLLD remains an experimental project with only one or two
people working part-time on it.

[1] although modern linkers also add LTO (Link-Time Optimization) to rearrange
things after everything's been integrated.

~~~
floatboth
Heh, speaking of linking individual functions, GHC Haskell has a flag to emit
one object file per function and do that with a traditional linker.

It's horribly slow.

But it produces smaller binaries; I've gotten 2x smaller binaries on my
projects with that option.

~~~
ThisIs_MyName
Doesn't LTO produce even smaller/faster binaries anyway?

~~~
floatboth
I'm not sure what the state of LTO in GHC is…

By the way: using LLD helped _massively_ with linking all these small Haskell
function objects. It's really much faster than even gold!

------
smlacy
lld is the LLVM linker. More details here:

[https://lld.llvm.org/](https://lld.llvm.org/)

------
wyldfire
Of note is the fact that George Rimar, Dmitry Golovin, et al. are working on
getting Linux to boot when linked with LLD. They're making a lot of progress
-- when they started, it didn't even successfully link.

------
fithisux
Even though I don't care about a usable Linux linker, I care about a
replacement for Microsoft's linker on Windows. LDC is an example where you
need the closed-source link.exe in order to link.

Hopefully all the LLVM projects (lldb, libc++, ...) will mature on Windows so
that there is a replacement for MSVC's tools. Until then (and beyond), the GNU
tools gcc + ld will be my go-to tools when MSVC compatibility is not a
constraint. They work perfectly well with Golang. But it happens that people
release MSVC link libraries. Even dmd for windows-x64 needs the Microsoft
linker to work. Unilink is an option, but it's closed source. And
openwatcom-v2 will take time to get there.

------
alvarop
"The ELF support is in progress and is able to link large programs such as
Clang or LLD itself. Unless your program depends on linker scripts, you can
expect it to be linkable with LLD."

:( The one feature I need most for embedded development

~~~
rui314
The document is outdated and needs updating. Linker script support has
improved a lot since 3.9, and that's why it can now link the entire FreeBSD
base system, including the kernel. It's still in active development (e.g. it
can't link the Linux kernel yet), but it's not that bad now.

~~~
alvarop
Nice! I mostly want to use it on Cortex-M0/3/4 microcontrollers, which are
much less complex.

------
ThisIs_MyName
Woohoo, I've been using lld instead of GNU ld and gold for a while now and I'd
love to have everyone who builds my code do the same.

------
nneonneo
The LLD webpage touts a big speed improvement over gold, which is extremely
commendable.

What I'm curious about, though, is memory usage. On relatively constrained
systems, linking large projects can take ages due to swapping - anecdotally,
linking is responsible for upwards of 80% of the compilation time I see for a
certain very large software project on my developer machine (which, at 16GB of
RAM, isn't huge but is fairly typical). Worse, memory pressure from linking
makes the process rather non-parallelizable, which also hurts throughput.
Having a memory-efficient linker could speed up compilation in such
environments by significantly more than 2x.

~~~
rui314
It shouldn't be bad, although I haven't actually measured its heap usage.
Allocating memory and copying data is slow, so in LLD we mmap input object
files and use the file contents directly as much as possible. This increases
the process's virtual memory size (because large files are kept mapped into
memory), but the actual memory usage is much smaller than that.
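
A minimal sketch of that approach (my illustration using POSIX mmap, not
LLD's actual code): map the input read-only and use the bytes in place; the
mapping counts toward virtual size, but pages become resident only when
touched.

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>
    #include <cstddef>
    #include <cstdint>
    
    struct MappedFile {
      const uint8_t *data = nullptr;
      size_t size = 0;
    };
    
    MappedFile mapInput(const char *path) {
      int fd = open(path, O_RDONLY);
      if (fd < 0) return {};
      struct stat st;
      if (fstat(fd, &st) != 0) { close(fd); return {}; }
      void *p = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
      close(fd);  // the mapping stays valid after closing the descriptor
      if (p == MAP_FAILED) return {};
      return {static_cast<const uint8_t *>(p), static_cast<size_t>(st.st_size)};
    }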

------
nightcracker
This cannot come too soon. Looking forward to it.

------
gigatexal
For the uninitiated, what is LLD? What does it provide / what are its
benefits?

