LLVM 3.9 Release Notes (llvm.org)
87 points by okket on Sept 2, 2016 | 23 comments



Of these, I found ThinLTO [0] the most interesting. It's an effort to solve the long build times and high memory consumption associated with link-time optimization. Early benchmarks show runtime performance almost on par with full LTO, with compile times near plain old -O2.

LTO is the compiler feature I'm waiting to see become mainstream. Without it, you have to compromise on translation units (putting more stuff in headers than you'd want to) to get the performance you're after; see the sketch below.

[0] http://blog.llvm.org/2016/06/thinlto-scalable-and-incrementa...
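
To make the translation-unit point concrete, here's a minimal two-file sketch (file names and flags are just an illustration): without LTO, add() can't be inlined into main() because its definition lives in a separate module.

    /* lib.c */
    int add(int a, int b) { return a + b; }

    /* main.c */
    int add(int a, int b);
    int main(void) { return add(1, 2); }

    /* Plain -O2 must keep the call across the module boundary:
     *   clang -O2 -c lib.c main.c && clang lib.o main.o
     * With LTO the linker sees both modules' IR and can inline add():
     *   clang -O2 -flto -c lib.c main.c && clang -flto lib.o main.o
     */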


Oh man, LTCG on MSVC was brutal on build times. I remember we had to use it extensively in games, since we'd usually be within 3-5 MB of system limits on a release console.

If LTO is similar that sounds like a massive win and something to look forward to.

[edit] Looks like LTO is more about performance than perf+binary size. Still sounds like awesome stuff.


Just to clarify: LTO is the name most other compilers (including LLVM) use for what MSVC calls LTCG. ThinLTO is a very different LTO model from "standard LTO". The ThinLTO slides go over the various LTO models that exist.

Previously, before Google built ThinLTO, we built LIPO (and WHOPR) for GCC (https://gcc.gnu.org/wiki/LightweightIpo). Google isn't going to use GCC as its compiler anymore, and in fact has mostly moved off it. We needed to replace LIPO, so we built ThinLTO.

The problem with LTO models in general (as the ThinLTO slides go over) is how to make them scalable. Part of that is memory, part is parallelization, and part is optimization/analysis speed.

The memory issue is how memory-efficient the IR is and how demand-driven the IPO is, i.e. whether your whole-program analysis can go function by function without taking a lot of time and memory to import individual functions (and throw them away if you need to).

Most LTO is slow because it tries to keep large parts of the program in memory at one time, and has no good way of being as lazy as possible about pieces of the program.

The parallelization issue is "what is being serialized to make LTO work". Some compilers parallelize the backend code generation, some parallelize nothing. There is often still a lot of serial stuff (global analysis, etc).

The CPU time issue is how fast you perform analysis and can generate code.

ThinLTO is meant to let you parallelize basically every step, and to import exactly the parts of the program you want at a given time in a quick and memory-efficient way.
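
For reference, usage (per the ThinLTO blog post linked upthread) is just a flag; this assumes a linker with ThinLTO support, e.g. gold with the LLVM plugin or lld:

    clang -O2 -flto=thin -c a.c b.c     # per-TU compiles emit bitcode + summaries
    clang -flto=thin a.o b.o -o a.out   # thin link, then parallel backend jobs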


Thanks for the clarification and detailed breakdown. Makes a lot of sense.

Back when we'd use LTCG, it was a ~30 minute link-time penalty, which kept it out of all but the final release builds. Seeing this stuff move forward and become more practical to use is a great improvement.


> Clang can now self-host itself using LLD on AArch64.

So I've been having some "fun" with LLVM targeting AArch64, and was rather, er, disappointed when I discovered the toolchain didn't use GNU ld, but Apple's ld64.

Now, in theory ld64 is a next-gen linker that can link code not at the "module" (one compiled C file) level, but at the individual function and (global) variable level. Cool! More granularity! Except it doesn't support anything equivalent to GNU ld's extremely useful linker scripts, the stuff that says "put this code in SRAM, that data in Flash", etc. (see the sketch below), and its command line is... poorly documented. Also, it only works with Mach-O binaries, which is incompatible with objdump, that swiss-army knife of object code [1].
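
For contrast, the kind of GNU ld script being referred to is tiny. A sketch, with a hypothetical memory map:

    MEMORY
    {
      FLASH (rx)  : ORIGIN = 0x08000000, LENGTH = 512K
      SRAM  (rwx) : ORIGIN = 0x20000000, LENGTH = 128K
    }
    SECTIONS
    {
      .text : { *(.text*) } > FLASH           /* code in flash */
      .data : { *(.data*) } > SRAM AT> FLASH  /* initialized data: runs in RAM, stored in flash */
      .bss  : { *(.bss*)  } > SRAM            /* zero-initialized data */
    }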

And anyhow, what does LLVM refer to as LLD? Because there are actually two separate projects under that umbrella: The ELF/COFF one (http://lld.llvm.org/NewLLD.html) and the ATOM-based one (http://lld.llvm.org/AtomLLD.html)

As I understand it, the ATOM-based linker is still experimental, so they're referring to the ELF/COFF one?

TL;DR: LLVM doesn't hold a candle to binutils

[1] I recommend trying out using objcopy to turn your app data into object files you can link directly into your application. This is what modern toolchains call "resources", but it's really interesting to try it at the C/"bare-metal" level to see how it works. Bonus: you get to really learn the difference between pointers and arrays ;)
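
A minimal version of that exercise (file names and target hypothetical; objcopy derives the _binary_*_start/_end symbol names from the input file name):

    objcopy -I binary -O elf64-littleaarch64 -B aarch64 logo.png logo.o

    /* main.c -- the symbols are link-time addresses, not pointer
     * variables, which is where the pointers-vs-arrays lesson bites. */
    #include <stdio.h>
    extern const unsigned char _binary_logo_png_start[];
    extern const unsigned char _binary_logo_png_end[];

    int main(void) {
        printf("embedded %td bytes\n",
               _binary_logo_png_end - _binary_logo_png_start);
        return 0;
    }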


LLD is really three different linkers combined. You specify which type you want with the "flavor" command line option [1] (example invocations below), but there are also versions of the linker with the flavor hard-coded (these vary depending on the platform).

My understanding is that the darwin flavor uses the atom-based architecture, while the gnu and windows/link flavors use a section-based architecture. There was a thread on the llvm-dev mailing list that explains this in more detail. [2]

I don't know much about the linker script side of things, but I was under the impression that they were working on that.

[1] http://lld.llvm.org/Driver.html [2] http://lists.llvm.org/pipermail/llvm-dev/2015-May/085088.htm...
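
Concretely, the same binary can be driven as any of the three (flavor names per the driver doc above; the hard-coded entry-point names may vary by platform):

    lld -flavor gnu foo.o -o foo      # ELF, section-based
    lld -flavor link foo.obj          # COFF, section-based
    lld -flavor darwin foo.o -o foo   # Mach-O, atom-based
    ld.lld foo.o -o foo               # shorthand for -flavor gnu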


> when I discovered the toolchain didn't use GNU ld, but Apple's ld64

You can of course use GNU ld with Clang/LLVM if you so choose; this is the default on FreeBSD/arm64 today. The new point with 3.9 is that the full Clang/LLVM + lld toolchain can self-host (using the lld ELF support). ELF lld supports a significant subset of the linker script syntax, and additional functionality is being added as actual uses are found. (It's sufficient to link the FreeBSD kernel today.)


I wish somebody would just make binutils multi-target...


Assuming you're not being sarcastic: they already are; the purpose of libbfd [1] was exactly that. After all, they already support a.out, ELF, COFF, PE/COFF, Linux on various HW targets, Windows...

...but not Mach-O, Apple's format.

[1] https://en.wikipedia.org/wiki/Binary_File_Descriptor_library

https://sourceware.org/binutils/binutils-porting-guide.txt

https://sourceware.org/binutils/docs-2.27/bfd/index.html


I mean no longer having to pick a single target at build time. It looks like one still needs to?


What do you mean by "target" in this case? To me, "target" means the machine you're compiling for, in which case there is no reason to want ld or the other binutils tools to generate output for multiple targets at a time.

And if you do need to build for multiple targets, you just run the binutils tools multiple times with the right options for each, because they're never identical.


I mean target in the autotools sense. LLVM, for example, is multi-target: LLVM binaries by default support every platform LLVM supports.

Compiling per-target toolchains is an annoying chore, and binutils is the last holdover bugging me with this when I work with Rust.


> And if you do need to build for multiple targets, you just run the binutils tools multiple times

The question here is whether you can run the same tool (the same binary) for two different targets, or if you have to run a different version of e.g. ld built specifically for each target.
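
For comparison, this is what multi-target looks like on the clang side: one binary, any supported triple (the triples here are just examples), versus binutils tools that are typically configured per target:

    clang --target=aarch64-linux-gnu -c foo.c -o foo-arm64.o
    clang --target=x86_64-pc-windows-msvc -c foo.c -o foo-win.o

    aarch64-linux-gnu-ld foo-arm64.o -o foo   # per-target binutils build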


ThinLTO sounds good in theory, but is still too hard to use in practice compared to gcc -flto=4.

While clang -flto can produce binaries up to ~15% faster than gcc's, it's still not usable for bigger projects that put a lot of their API into a shared library.

Some of those functions in a shared library are eventually inlined, but with clang -flto the exported copies of the functions are optimized away, whereas gcc keeps the copies in addition to the inlined variants. So you can try to keep those inlined API calls in the shared library with __attribute__((used)) (see the sketch below), but then they are not inlined anymore, with performance regressions of up to 50%.

Not fixed in clang 3.8, and not in 3.9. I hope it becomes usable in 4.0, as the performance benefits would be dramatic and the implementation effort to keep the ((used)), i.e. exported, copy is minimal.
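
A sketch of the pattern being described, with hypothetical names: an exported shared-library function whose call sites you also want inlined.

    /* api.c -- e.g. clang -O2 -flto -fPIC -shared api.c -o libapi.so */
    __attribute__((used))      /* keeps the exported copy under -flto,    */
    int api_compute(int x) {   /* but, per the above, call sites are then */
        return x * x + 1;      /* no longer inlined                       */
    }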



I don't understand this fixed bug: https://llvm.org/bugs/show_bug.cgi?id=26774

How could foo() have ever printed "X"?


Both the t0 and t1 loads are atomic, meaning you'll always read a whole, untorn value. But there's no synchronization between them, and another thread can preempt execution and change the value at %ptr right after the first load.
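
Roughly, in IR terms (a hand-written sketch, not the exact test case from the bug report):

    %t0 = load atomic i32, i32* %ptr monotonic, align 4
    %t1 = load atomic i32, i32* %ptr monotonic, align 4
    %ne = icmp ne i32 %t0, %t1            ; can be true: another thread may
    br i1 %ne, label %print_x, label %end ; store to %ptr between the loads
    ; CSE that folds %t1 into %t0 makes %ne always false and
    ; deletes the printf("X") in %print_x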


Then the CSE optimization that removed the printf would be just plain wrong. Why create a more complicated scenario just to point that out?


And what is %ptr? I'm having trouble googling it.


I assume %ptr is a generic "variable" name[0] for a pointer reference, i.e. the address of the value being loaded atomically. You can substitute %ptr in your head for any atomic address you might want to load from.

The code itself isn't well-formed because %ptr isn't actually defined anywhere—just pretend that it was. ;)

[0] Technically, LLVM does not have variables (as in, names you can assign to more than once), since it's in SSA form, but I don't know what a better name would be here. :)


Aight, I'll need to read their docs exhaustively; I had to anyway. Thanks nonetheless.


"Swift calling convention was ported to ARM."

Edit: simplify


LDC got mentioned out of nowhere.

Looks like D is gaining some traction recently :)



