Using the mold linker for fun and 3x-8x link time speedups (productive-cpp.com)
123 points by ingve on June 3, 2022 | 41 comments



mold seems very cool! The design notes [1] are fabulous, exactly the kind of documentation I look for for stuff like this. This part, in particular, is genius:

> As we aim to the 1-second goal for Chromium, every millisecond counts. We can't ignore the latency of process exit. If we mmap a lot of files, _exit(2) is not instantaneous but takes a few hundred milliseconds because the kernel has to clean up a lot of resources. As a workaround, we should organize the linker command as two processes; the first process forks the second process, and the second process does the actual work. As soon as the second process writes a result file to a filesystem, it notifies the first process, and the first process exits. The second process can take time to exit, because it is not an interactive process.

Never heard about this trick before, but it's the rare combination of "genius" and "obvious in retrospect".

[1]: https://github.com/rui314/mold/blob/main/docs/design.md
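For anyone who wants to see the shape of that trick, here is a minimal sketch in Python (Unix only, since it uses `os.fork`). It is not mold's actual code: the function name `link_in_child`, the placeholder output, and the `time.sleep` standing in for slow teardown of many mmaps are all assumptions made for illustration.

```python
import os
import time


def link_in_child(output_path, slow_cleanup_seconds=0.5):
    """Do the 'link' in a forked child; return once the output is on disk.

    Hypothetical helper: a real linker's parent process would exit here
    instead of returning.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: does the actual work.
        os.close(read_fd)
        with open(output_path, "w") as f:
            f.write("linked output\n")  # placeholder for the real result
        os.write(write_fd, b"1")  # tell the parent the result file is written
        os.close(write_fd)
        time.sleep(slow_cleanup_seconds)  # stand-in for slow resource cleanup
        os._exit(0)
    # Parent: wait only for the notification, not for the child to exit.
    os.close(write_fd)
    done = os.read(read_fd, 1) == b"1"
    os.close(read_fd)
    return done
```

The caller gets control back as soon as the byte arrives on the pipe, while the child is still busy cleaning up. If the child dies before writing, the read hits end-of-file and the function returns `False`.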


I wonder why the cleaning up of those mmaps is synchronous


I would imagine it is just because it would require more complexity in the kernel. One would need to create a queue of tasks to be performed during cleanup and have the path through the exit system call know to push work onto that queue. It sounds like there is an opportunity for someone to submit a diff to the Linux kernel for improving exit userspace latency.


I’m guessing a linker like this is also a VERY extreme case when it comes to mmap’ing a huge number of files. For most software, it’s just not that big of an issue.


But given how universally the linker is required, it might be worth looking into optimizing.


Using all cores to link is a very welcome change to the linker world.

I wish the Blender folks would try switching to mold; linking Blender is the single biggest bottleneck to dev speed when writing Blender code.


It looks like you can already use mold to build blender.

https://github.com/blender/blender/commit/8b3d798374a2c6b502...


Is it not a drop-in replacement?


It's a drop-in replacement, but you still have to tell the build system to have it use an alternative linker instead of the default one, unless you replace /usr/bin/ld with mold entirely.


Rui, thank you! mold is a life saver! My compile debug cycle improved a lot since I started using it!

(I tend to prefix PATH with the folder I build it into for the specific project I need it for)


It got to 1.0 recently and still is in dev.

But off the top - No LTO support, missing some flags that we use when building drivers, valgrind (cachegrind etc) fails to load symbols, some linker stuff is case sensitive etc.

Can also add it to PATH, instead of hardcoding it like op did here.

But it's well worth it in some projects - In my case it shaved over a minute from incremental builds, resulting in sub 10 second builds!

(Also note you can cache indexes in gdb by default)


It's improving pretty quickly. It now has LTO support, and I also believe that Valgrind issue has been resolved.


Oh great! I'll fetch and try a new build, Thank you! :)


Mold is excellent. I would like to point out that if you (or your employer) would like to support the work, then you can sponsor Rui on Github https://github.com/rui314


Err, isn't he getting a top tier salary from Google?

What the fresh hell is this, giving him more money. You should be contributing to Zig or serenityOS


I left Google two years ago and have been working on my open source project full time with no salary.


From mold I learned this amazing trick of intercepting the exec() family through LD_PRELOAD. I was experimenting with it in the Spack package manager to force a uniform compiler and linker, and let it inject flags. Useful since packages sometimes override CC/CXX variables or don't propagate -fuse-ld=... (and that flag is not always supported).
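A rough sketch of the launcher half of that trick, in Python. It only shows how the preload gets wired into a child process's environment. The interposing shared library itself (which would define the exec-family functions and forward to the originals) is not shown, and the helper name `env_with_preload` and the `.so` path in the usage note are hypothetical.

```python
import os


def env_with_preload(wrapper_so, base_env=None):
    """Return a copy of the environment with wrapper_so first in LD_PRELOAD.

    Hypothetical helper; wrapper_so is assumed to be a shared library that
    interposes the exec() family.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # Keep any libraries that were already preloaded, but load ours first.
    env["LD_PRELOAD"] = wrapper_so if not existing else wrapper_so + ":" + existing
    return env
```

Usage would look something like `subprocess.run(["make"], env=env_with_preload("/path/to/exec-wrapper.so"))`. Child processes inherit the variable unless something in the build clears it, which is how the interception reaches the compiler driver even when a package ignores CC/CXX.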


Perhaps this is a ridiculous question but is it possible to implement a language twice: both as compiled and as interpreted or JITted, the latter giving you a quicker feedback loop at the cost of runtime performance, size, etc. ?


GHC (with GHCI) gives you this for Haskell. After moving from Haskell to C++ work, I really miss the interactive feedback loop (among many other things).


Have you tried https://root.cern/cling/ ? It works great in a jupyter notebook for quick prototyping of C++.


If you do a good compiler you can get this for free almost trivially.

The important thing is to have people using both so you make sure the jit path can be dumb enough to be fast.


I'm woefully rusty on my compiler knowledge. Would the direction matter? That is, would it be easier or harder to build an interpreter for C or Rust versus compiling Python or JS to machine code? Or would it all be the same?


Of course it is! These guys made an interpreter and JIT for C: https://developers.redhat.com/blog/2021/04/27/the-mir-c-inte...


Yes. See Dart as an example of a language that has done this in practice.


Clojure is also basically interpreted but can also be compiled ahead of time for reasons including performance.


I believe Clojure is always compiled.


Start a REPL and you can type code in that is immediately executed, and has no side-effects in terms of classfiles being generated or otherwise. How is that not interpreted?


Generating class files is an optional compilation mode in Clojure. You can also load bytecode and define classes without involving files on the JVM, which is what eval (used by the repl) does

https://clojure.org/reference/compilation

(also see the eval definition at https://github.com/clojure/clojure/blob/master/src/clj/cloju... - it calls into the compiler).

The JVM could still just interpret the JVM bytecode instead of JIT compiling first of course.

Python is also technically compiled like this
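To illustrate the Python side of that claim, a small sketch using only built-ins: the source string is compiled to a code object (bytecode) in memory and then executed, roughly what a REPL does for each input. The variable names and the `"<repl>"` filename label are arbitrary choices for the example.

```python
import types

source = "answer = 6 * 7"

# Compile the source text to a code object; compile() itself writes no file.
code = compile(source, "<repl>", "exec")
assert isinstance(code, types.CodeType)

# Execute the compiled bytecode in a fresh namespace.
namespace = {}
exec(code, namespace)
print(namespace["answer"])  # 42
```

The raw bytecode is available as `code.co_code`, and `dis.dis(code)` will print a readable listing of it.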


Clojure uses a custom classloader to compile and load bytecode at runtime.


I wonder if it would be a good idea for Rust to switch to mold for debug compilation.


For reference, it's trivial to configure it on one's projects:

    mkdir -p .cargo
    cat > .cargo/config.toml <<EOF
    [target.x86_64-unknown-linux-gnu]
    linker = "/usr/bin/clang"
    rustflags = ["-Clink-arg=-fuse-ld=/usr/bin/mold"]
    EOF

(the above assumes a Linux target, and that the binaries are under `/usr/bin`)

It's certainly true that the default linker is sloooooooow; I wonder what the challenges are in making mold/lld the default.


This can also be set user-wide (and thus apply to all projects automatically) by placing the `config.toml` into `~/.cargo`.


Which version of clang should I have for this?


Good question! I can't answer, unfortunately. v13 and v14 surely work, but I think 12 will work as well (at that time I was using lld, which mold is compatible with).


For more on switching Rust to mold, see https://github.com/rust-lang/rust/issues/94347


I don't completely understand how LTO works - is LTO 100% done in compiler plugins? Or is some of it done in ld/gold/lld which mold doesn't have implemented yet?


With LTO when you "compile" a source file it really only processes it into an intermediate language (IL) format and stores that in the object (.o) files. When you invoke the linker on those object files it runs the compiler again with all the IL from the inputs and generates a temporary object file with the actual executable object code, which is then linked into the final executable binary.

I would guess that mold doesn't have the necessary support for the second step, running the compiler again on the IL during linking.


Thank you for the info.

It does have support for linker plugins now (I was following the GitHub issue somewhat). I just wasn't sure whether I should bother testing it to see if I can use it without performance regressions, because I didn't understand how it works.

Might give it a shot one of these days.


I keep seeing people trying to use different linkers than ld with gcc but I was under the impression that it doesn't work as well as they think it does. GCC is built with a bunch of options that assume ld and it's very unlikely that these other linkers support all of that. So I'm not sure what benefits you're actually getting.


GNU gold, LLVM’s LLD, and now mold all work just fine with GCC for mainstream Linux development for mainstream targets, and reasonably well for secondary targets.

Dozens of companies use them. Google has an extremely demanding linking environment and developed and used gold for a long time as its primary linker. It has since moved on to LLD. It is unlikely to adopt mold at the moment due to licensing issues.

All the compatibility issues were solved a long time ago with LLD and gold. Probably only a matter of time for mold.

The more obscure the target OS and architecture, the more likely GNU ld is the only linker capable of doing the job. But most users won’t notice. GNU ld has the best linker script support by a wide margin too, so if you are doing really really wacky things with your layout where you need 100% total control, you also have to use GNU ld.

But those are obscure edge cases these days, and very uncommon in normal practice.


Mold is a Big Freakin' Deal.



