mold seems very cool! The design notes [1] are fabulous, exactly the kind of documentation I look for for stuff like this. This part, in particular, is genius:
> As we aim to the 1-second goal for Chromium, every millisecond counts. We can't ignore the latency of process exit. If we mmap a lot of files, _exit(2) is not instantaneous but takes a few hundred milliseconds because the kernel has to clean up a lot of resources. As a workaround, we should organize the linker command as two processes; the first process forks the second process, and the second process does the actual work. As soon as the second process writes a result file to a filesystem, it notifies the first process, and the first process exits. The second process can take time to exit, because it is not an interactive process.
Never heard about this trick before, but it's the rare combination of "genius" and "obvious in retrospect".
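For anyone who wants to see the shape of the trick, here is a minimal sketch in Python rather than mold's actual C++ code. The names `run_with_early_exit`, `do_work`, and `cleanup_seconds` are made up for illustration, and the pipe is just one plausible way for the second process to notify the first; the design notes don't say which mechanism is used.

```python
import os
import time


def run_with_early_exit(do_work, output_path, cleanup_seconds=0.0):
    """Fork a worker and return as soon as it reports that the output is written.

    Hypothetical helper: do_work(output_path) stands in for the real linking
    job, and cleanup_seconds simulates a slow process teardown.
    Returns True if the worker signalled success, False otherwise.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()

    if pid == 0:
        # Second process: does the actual work.
        os.close(read_fd)
        status = 0
        try:
            do_work(output_path)
            # Notify the first process that the result file is complete.
            os.write(write_fd, b"1")
        except BaseException:
            status = 1
        finally:
            # Closing without writing makes the parent's read() return b"".
            os.close(write_fd)
        # Anything slow from here on (simulated by the sleep) no longer
        # delays the user, because the first process has already returned.
        time.sleep(cleanup_seconds)
        os._exit(status)

    # First process: wait only for the notification, not for the child's exit.
    os.close(write_fd)
    ok = os.read(read_fd, 1) == b"1"
    os.close(read_fd)
    return ok
```

In a real command-line tool the first process would call `sys.exit()` right after the notification arrives, leaving the second process to be reaped by init once its teardown finishes.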
I would imagine it is just because it would require more complexity in the kernel. One would need to create a queue of tasks to be performed during cleanup and have the path through the exit system call know to push work onto that queue. It sounds like there is an opportunity for someone to submit a diff to the Linux kernel for improving exit userspace latency.
I’m guessing a linker like this is also a VERY extreme case when it comes to mmap’ing a huge number of files. For most software, it’s just not that big of an issue.
It's a drop-in replacement, but you still have to tell the build system to have it use an alternative linker instead of the default one, unless you replace /usr/bin/ld with mold entirely.
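As a rough sketch of what "telling the build system" can look like in a hand-rolled build script: the helper names `linker_flags` and `link_command` below are hypothetical, and the sketch assumes a compiler driver that accepts `-fuse-ld=mold`, which not every compiler version does.

```python
import shutil


def linker_flags(preferred="mold", which=shutil.which):
    """Return extra driver flags selecting `preferred` if it is on PATH.

    Assumption: the compiler driver understands -fuse-ld=<name>.
    Falls back to the default linker (no extra flags) otherwise.
    """
    if which(preferred) is None:
        return []
    return [f"-fuse-ld={preferred}"]


def link_command(objects, output, cc="cc", which=shutil.which):
    """Build the argv for the link step, leaving the default ld untouched."""
    return [cc, *objects, "-o", output, *linker_flags(which=which)]
```

The resulting list can be passed to `subprocess.run`. If mold is not installed, the command degrades to an ordinary link with the system linker.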
But off the top -
No LTO support, missing some flags that we use when building drivers, valgrind (cachegrind etc.) fails to load symbols, some linker stuff is case sensitive, etc.
Can also add it to PATH, instead of hardcoding it like op did here.
But it's well worth it in some projects - in my case it shaved over a minute from incremental builds, resulting in sub-10-second builds!
(Also note you can cache indexes in gdb by default)
Mold is excellent. I would like to point out that if you (or your employer) would like to support the work, you can sponsor Rui on GitHub: https://github.com/rui314
From mold I learned this amazing trick of intercepting the exec() family through LD_PRELOAD. I was experimenting with it in the Spack package manager to force a uniform compiler and linker, and let it inject flags. Useful since packages sometimes override CC/CXX variables or don't propagate -fuse-ld=... (and that flag is not always supported).
Perhaps this is a ridiculous question, but is it possible to implement a language twice: both as compiled and as interpreted or JITted, the latter giving you a quicker feedback loop at the cost of runtime performance, size, etc.?
GHC (with GHCI) gives you this for Haskell. After moving from Haskell to C++ work, I really miss the interactive feedback loop (among many other things).
I'm woefully rusty on my compiler knowledge. Would the direction matter? That is, would it be easier or harder to build an interpreter for C or Rust versus compiling Python or JS to machine code? Or would it all be the same?
Start a REPL and you can type code in that is immediately executed, and has no side-effects in terms of classfiles being generated or otherwise. How is that not interpreted?
Generating class files is an optional compilation mode in Clojure. You can also load bytecode and define classes without involving files on the JVM, which is what eval (used by the repl) does
Good question! I can't answer, unfortunately. v13 and v14 surely work, but I think 12 will work as well (at that time, I was using lld, with which mold is compatible).
I don't completely understand how LTO works - is LTO 100% done in compiler plugins? Or is some of it done in ld/gold/lld which mold doesn't have implemented yet?
With LTO when you "compile" a source file it really only processes it into an intermediate language (IL) format and stores that in the object (.o) files. When you invoke the linker on those object files it runs the compiler again with all the IL from the inputs and generates a temporary object file with the actual executable object code, which is then linked into the final executable binary.
I would guess that mold doesn't have the necessary support for the second step, running the compiler again on the IL during linking.
It does have support for linker plugins now (I was following the GitHub issue somewhat). I just wasn't sure whether it was worth testing to see if I can use it without performance regressions, because I didn't understand how it works.
I keep seeing people trying to use different linkers than ld with gcc but I was under the impression that it doesn't work as well as they think it does. GCC is built with a bunch of options that assume ld and it's very unlikely that these other linkers support all of that. So I'm not sure what benefits you're actually getting.
GNU gold, LLVM’s LLD, and now mold all work just fine with GCC for mainstream Linux development for mainstream targets, and reasonably well for secondary targets.
Dozens of companies use them. Google has an extremely demanding linking environment, and it developed gold and used it as its primary linker for a long time. It has since moved on to LLD. It is unlikely to adopt mold at the moment due to licensing issues.
All the compatibility issues were solved a long time ago with LLD and gold. Probably only a matter of time for mold.
The more obscure the target OS and architecture, the more likely GNU ld is the only linker capable of doing the job. But most users won’t notice. GNU ld has the best linker script support by a wide margin too, so if you are doing really really wacky things with your layout where you need 100% total control, you also have to use GNU ld.
But those are obscure edge cases these days, and very uncommon in normal practice.
[1]: https://github.com/rui314/mold/blob/main/docs/design.md