The section on relaxation should be triggering any software engineer's "overcomplicated" alarm. Forget static vs dynamic linking; it is insane that in 2022 we are using linkers at all. They are an ugly hack to make a kind of incremental compilation work for C and related languages.
Relaxation basically means having the linker redo a bunch of work the assembler otherwise would have done. Having them separate is then counterproductive.
Being able to link pre-compiled object files to a new program is certainly a bit more than "a kind of incremental compilation", in my opinion. It's also a fairly clean solution to the problem of not recompiling code that hasn't changed.
The problem is of course that it's kind of a fairy tale. The compiler (or assembler) can generate object code, but it obviously needs placeholders for function calls. Depending on where exactly the called function ends up, the call might take more or fewer instructions. That means that, depending on what exact call sequence is required, all the other code in the function has to be shifted around, since all the jump targets shift too. Even worse, the length of the instruction sequence needed for a relative jump depends on how far the jump is, so the intra-function jumps may also grow or shrink based on which sequences the inter-function calls end up needing.
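To make that concrete, here's a minimal C sketch; RISC-V is just a convenient example (the ranges and relocation name come from that ISA, not from the article), with the details in the comments:

    /* callsite.c -- the callee lives in some other object file, so the
     * compiler cannot know how far away it will end up. */
    extern void helper(void);

    void caller(void) {
        /* On RISC-V the assembler typically emits the worst-case pair
         *     auipc ra, 0        # upper bits, patched via an R_RISCV_CALL reloc
         *     jalr  ra, 0(ra)    # lower bits
         * because a single `jal` only reaches about +/-1 MiB. If the linker
         * later sees that helper() landed close enough, relaxation rewrites
         * the pair into one `jal`, everything after this point in the section
         * moves up by 4 bytes, and all the offsets have to be fixed up --
         * exactly the shifting-around described above. */
        helper();
    }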
Linking sounds like a clean solution, but it really, really isn't. It's messy as all hell.
I think maybe, if I were to design a solution from scratch, I'd make object files basically a binary encoding of assembly code rather than true machine code; notably, I would keep jumps as abstract "jump to this label" instructions rather than proper machine code. This would effectively move the assembler into the linker, where we actually have enough information to do instruction selection, rather than leaving it as a separate earlier stage where we don't.
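A rough sketch of what such a format's records could look like, purely hypothetical -- none of these names come from any real object-file format:

    /* Hypothetical object-file records: raw machine code for everything whose
     * encoding doesn't depend on final layout, symbolic records where it does.
     * All names are made up for illustration. */
    #include <stddef.h>
    #include <stdint.h>

    enum record_kind {
        REC_RAW_BYTES,      /* literal machine code, layout-independent           */
        REC_DEFINE_LABEL,   /* marks a jump target                                */
        REC_JUMP_TO_LABEL,  /* "jump to label N"; encoding picked at link time    */
        REC_CALL_SYMBOL     /* "call symbol N"; near/far form picked at link time */
    };

    struct record {
        enum record_kind kind;
        uint32_t         id;    /* label or symbol index, if applicable */
        size_t           len;   /* number of bytes for REC_RAW_BYTES    */
        const uint8_t   *bytes; /* the raw machine code itself          */
    };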
We could also just put straight-up compiler IR in the object files, which puts a whole compiler back-end in the linker. This opens the door for LTO. Obviously, there are performance implications of this; if most of the hard work happens in the linker, linking can't be a stupid single-threaded serial operation anymore. But we could just make multi-threaded linkers and we get back our parallelism. We do lose some of the incrementalness of incremental compilation though.
> Depending on where exactly the called function ends up, the call might take more or fewer instructions. That means that, depending on what exact call sequence is required, all the other code in the function has to be shifted around, since all the jump targets shift too.
Linkers actually don't change instruction lengths. If they need to jump further than the instruction allows, they add trampolines: small chunks of code containing the appropriate long-jump instruction sequence, placed within range of the original short jump.
Edit: well, except in the proposed case of relaxation, where they'd want to shorten jump instructions.
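To illustrate why nothing else has to move, the linker only has to decide, per call site, whether the target is in range. A rough sketch of that decision, not taken from any real linker; AArch64 is used as an example, where `bl` reaches +/-128 MiB:

    #include <stdbool.h>
    #include <stdint.h>

    #define BL_RANGE ((int64_t)128 * 1024 * 1024)

    /* If the call at `site` can reach `target`, the `bl` is patched to branch
     * there directly. Otherwise the linker emits a trampoline (a "veneer")
     * nearby -- a few instructions that load the full target address and
     * branch to it -- and points the `bl` at the trampoline instead. Either
     * way the `bl` itself stays 4 bytes, so nothing else has to move. */
    static bool reaches_directly(uint64_t site, uint64_t target) {
        int64_t delta = (int64_t)(target - site);
        return delta >= -BL_RANGE && delta < BL_RANGE;
    }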
My understanding is that ThinLTO (and other existing forms of LTO) does put IR in the object file and a compiler back-end in the linker, yes. But as far as I understand, the linker relaxations mentioned in the linked article are about when you emit traditional object files with machine code + relocation tables, and that's what I described as messy.
Depends on the definition of "linking". Having multiple source files each compiled to LLVM bitcode and not re-doing this if another file changes is fine. Turning the bitcode into native code is where traditionally the compiler back-end, assembler and linker are involved, and these are steps you want to re-do to enable global optimization (e.g. inlining functions from another source file, followed by dead code elimination -- these two steps give you no-cost "runtime" parameter checks, for example). The traditional 3-step split is just becoming cumbersome there, and linker relaxation is treating that symptom.
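To spell out that parameter-check example (a toy sketch; the file names and the function are made up, and -flto is just the usual way to get IR into the object files with GCC or Clang):

    /* lib.c -- built separately; with -flto the object file carries IR, so
     * the "back-end in the linker" can still see inside this function. */
    int checked_div(int num, int den) {
        if (den == 0)          /* the "runtime" parameter check */
            return 0;
        return num / den;
    }

    /* main.c -- a different translation unit. */
    extern int checked_div(int num, int den);

    int scale(int x) {
        /* Under link-time optimization, checked_div() can be inlined here,
         * the constant 16 propagates into `den`, and dead code elimination
         * deletes the den == 0 branch: the check costs nothing at runtime.
         * Without LTO the call crosses an object-file boundary and the
         * check survives. */
        return checked_div(x, 16);
    }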
This seems like a radical position. On the web, Webpack is a sort of linker with affordances for streaming late-loaded code segments over the network, and it's indispensable. I have trouble seeing how doing away with linkers altogether would be feasible or desirable.
The related argument there is relocations: tables of data the assembler emits so the linker can recreate enough context to combine object files.
The assembler is often spliced into the back end of the compiler to avoid the round trip through text on the way to machine code.
My conviction is that lowering to machine code in pieces then stitching them together is no longer necessary or advisable. Emit files containing some richer format - most obviously a compiler IR - and combine those before lowering the single blob to machine code.
I am not a compiler expert, but this notion that there are optimisations to be had by working in the more expressive, higher-level compiler IR and delaying the emission of machine code rings true. I would be grateful were anyone able to direct me to some papers etc. on the subject.
A compiler is a pipeline with different representations between each stage. Each representation makes different trade offs, e.g. source tries to be human readable. Translation between them is usually lossy, and that's sometimes a feature - discarding information later stages don't need makes them faster.
Relocations are instructions executed by the linker (or loader): write the address of this symbol to this offset from the start of the data section, stuff like that. Relaxations cover things like a branch target turning out to be closer than it might have been, so the instruction sequence can be changed to a smaller/faster one.
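Concretely, on ELF targets each of those "instructions" is a small record in a .rela section; a minimal sketch of reading one, assuming a glibc-style <elf.h>:

    #include <elf.h>
    #include <stdio.h>

    static void describe_reloc(const Elf64_Rela *r) {
        /* A real linker/loader would patch the section at r_offset with the
         * resolved symbol address plus r_addend, in whatever encoding the
         * relocation type (e.g. R_X86_64_PC32) dictates. */
        printf("patch offset 0x%llx: symbol #%llu, type %llu, addend %lld\n",
               (unsigned long long)r->r_offset,
               (unsigned long long)ELF64_R_SYM(r->r_info),
               (unsigned long long)ELF64_R_TYPE(r->r_info),
               (long long)r->r_addend);
    }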
So if you don't eagerly convert to machine code, you don't need the link time relocation data, so the assembler and linker don't need the code to deal with it.
Of course if you stay in a friendly IR you can also do optimisations easily at link time, so most articles on this approach call it 'link time optimisation'.
Also, see "Global Register Allocation at Link Time" by David W. Wall. An amazing paper, but you can't help but notice that it would have been much, much easier if he hadn't insisted on producing valid object files that could be linked together into a working program by a "standard" linker; but he does, and so has to resort to linker relaxation too.
Just curious: how would a high-level IR processor know beforehand whether a certain piece of code was longer or shorter than 4kB/1MB, if it does not try to compile it in pieces first?
(Not the GP, but) I use GCC for nested functions. Also, for not being an obvious ploy by Apple etc. to replace a GPL project with a permissively licensed one.
Actually, both of them have moved away from their clang involvement, hence why clang has slowed down in ISO C++ compliance, as compiler vendors using clang forks don't seem that keen on helping upstream.
If only it stayed permissive, it wouldn’t be a problem. But hardware companies making proprietary compilers based on LLVM is a problem if the users of the hardware then have no free compiler available. If there was no LLVM to fork and take proprietary, the users would have free GCC available to them, since the hardware company would have used GCC, which requires the company to release the hardware support. They would release it, since it does not affect their primary source of income, which is hardware. But if LLVM is available, they will get dollar signs in their eyes, and have visions of selling expensive proprietary compilers to a captive market (i.e. owners of their hardware).
> But if LLVM is available, they will get dollar signs in their eyes, and have visions of selling expensive proprietary compilers to a captive market (i.e. owners of their hardware).
This is literally not what is happening in the real world though. Counter-examples: NEC SX Aurora (literally developed in the open, https://github.com/sx-aurora-dev/llvm-project), Fujitsu A64FX (free LLVM-based compiler alongside a $$$$ proprietary home-grown compiler), AMD with hipSYCL. The reason for this is that if there was no LLVM available, the compiler would be totally proprietary. Such a from-scratch compiler would have been more expensive to make, leading to it not being available for free but instead licensed at huge cost. There has been no uptake of GCC for this role since its inception. We both know why.
And even if a $$$$ compiler using LLVM is made, I would prefer that over a fully custom compiler. I can link against it like any LLVM compiler, and you can learn a lot about an unknown platform from how LLVM compiles for it, making reverse engineering a lot easier compared to a fully proprietary compiler.
> This is literally not what is happening in the real world though.
Give it time. You know what happened with GCC on the NeXT computer; the only reason the NeXT compiler became free is because NeXT had to release it, since they based it on GCC.
> There has been no uptake of GCC for this role since its inception. We both know why.
We do not. I was under the impression that the GCC stance on modularity has mellowed in recent times.