
LLVM 3.9 Release Notes - okket
http://llvm.org/releases/3.9.0/docs/ReleaseNotes.html
======
exDM69
Out of these, I found the ThinLTO [0] most interesting. It's an effort to
solve issues with long build times and high memory consumption associated with
link time optimization. Early benchmarks show perf results that are almost on
par with full LTO and compile times near plain old -O2.

LTO is _the_ compiler feature I'm waiting to become mainstream. Without it you
have to make compromises with translation units (put more stuff in headers
than you'd want to) to get the performance you want.

[0] [http://blog.llvm.org/2016/06/thinlto-scalable-and-
incrementa...](http://blog.llvm.org/2016/06/thinlto-scalable-and-incremental-
lto.html?m=1)

~~~
vvanders
Oh man LTCG on MSVC was brutal on build times. I remember we had to use it
extensively in games since we'd usually be within 3-5mb of system limits on a
release console.

If LTO is similar that sounds like a massive win and something to look forward
to.

[edit] Looks like LTO is more about performance than perf+binary size. Still
sounds like awesome stuff.

~~~
DannyBee
Just to clarify: LTO is what most compilers (including LLVM) call the feature
that MSVC calls LTCG. ThinLTO is a very different LTO model from "standard LTO".
The ThinLTO slides go over the various LTO models that exist.

Before Google built ThinLTO, we built LIPO (and WHOPR) for GCC:
[https://gcc.gnu.org/wiki/LightweightIpo](https://gcc.gnu.org/wiki/LightweightIpo)
Google isn't going to use GCC as its compiler anymore and, in fact, has mostly
moved off it. We needed to replace LIPO, so we built ThinLTO.

The problem with LTO models in general (as the thinlto slides go over) is how
to make it scalable. Part of that is memory, part of it is parallelization,
part of it is optimization/analysis speed.

The memory issues are about how memory-efficient the IR is and how demand-driven
the IPO is, i.e. whether your whole-program analysis can go function by
function without taking a lot of time and memory to import individual functions
(and throw them away if you need to).

Most LTO is slow because it tries to keep large parts of the program in memory
at one time, and has no good way of being as lazy as possible about pieces of
the program.

The parallelization issue is "what is being serialized to make LTO work". Some
compilers parallelize the backend code generation, some parallelize nothing.
There is often still a lot of serial stuff (global analysis, etc).

The CPU time issue is how fast you perform analysis and can generate code.

ThinLTO is meant to allow you to parallelize basically every step, import
exactly the parts of the program you want at a given time in a quick and
memory efficient way, etc.

~~~
vvanders
Thanks for the clarification and detailed breakdown. Makes a lot of sense.

Back when we used LTCG it added a ~30 minute link-time penalty, which ruled it
out for all but the final release builds. Seeing this stuff move forward and
become more practical to use is a great improvement.

------
AceJohnny2
> _Clang can now self-host itself using LLD on AArch64._

So I've been having some "fun" with LLVM targeting AArch64, and was rather,
er, disappointed when I discovered the toolchain didn't use GNU ld, but Mach's
ld64.

Now in theory, ld64 is a next-gen linker that can link code not at the
"module" (one compiled C file) level, but at the individual function and
(global) variable level. Cool! More granularity! Except it doesn't support
anything equivalent to GNU ld's extremely useful linker scripts, the stuff
that says "put this code in SRAM, that data in Flash", etc., and its command-
line interface is... poorly documented. Also, it only works with Mach-O
binaries, which is incompatible with objdump, that swiss-army knife of object
code [1].
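For readers who haven't met them, a hedged sketch of what such a GNU ld linker script looks like (memory origins, lengths, and region names below are invented for illustration):

```
MEMORY
{
  FLASH (rx)  : ORIGIN = 0x08000000, LENGTH = 512K
  SRAM  (rwx) : ORIGIN = 0x20000000, LENGTH = 128K
}

SECTIONS
{
  /* code and read-only data live in Flash */
  .text : { *(.text*) *(.rodata*) } > FLASH

  /* initialized data: stored in Flash, copied to SRAM at startup */
  .data : { *(.data*) } > SRAM AT > FLASH

  /* zero-initialized data lives only in SRAM */
  .bss : { *(.bss*) } > SRAM
}
```

This is exactly the "put this code in SRAM, that data in Flash" control that ld64 has no equivalent for.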

And anyhow, what does LLVM refer to as LLD? Because there are actually two
separate projects under that umbrella: The ELF/COFF one
([http://lld.llvm.org/NewLLD.html](http://lld.llvm.org/NewLLD.html)) and the
ATOM-based one
([http://lld.llvm.org/AtomLLD.html](http://lld.llvm.org/AtomLLD.html))

As I understand, the ATOM-based linker is still experimental, so they're
referring to the ELF/COFF one?

TL;DR: LLVM doesn't hold a candle to binutils

[1] I recommend trying out using objcopy to turn your app data into object
data you can link directly into your application. This is what modern
toolchains call "resources", but it's really interesting to try it out at the
C/"bare-metal" level to see how it works. Bonus: you get to really learn the
difference between pointers and arrays ;)

~~~
Ericson2314
I wish somebody would just make binutils multi-target...

~~~
AceJohnny2
Assuming you're not being sarcastic: they already are; that was exactly the
purpose of libbfd [1]. After all, they already support a.out, ELF, COFF,
PE/COFF, Linux on various HW targets, Windows...

...but not Mach-O, Apple's format.

[1]
[https://en.wikipedia.org/wiki/Binary_File_Descriptor_library](https://en.wikipedia.org/wiki/Binary_File_Descriptor_library)

[https://sourceware.org/binutils/binutils-porting-
guide.txt](https://sourceware.org/binutils/binutils-porting-guide.txt)

[https://sourceware.org/binutils/docs-2.27/bfd/index.html](https://sourceware.org/binutils/docs-2.27/bfd/index.html)

~~~
Ericson2314
I mean no longer having to pick a single target at compile time. It looks like
one still needs to?

~~~
AceJohnny2
What do you mean by "target" in this case? For me, "target" means the machine
you're compiling for. In which case, there is no reason to want ld/other
binutils tools to generate output for multiple targets at a time.

And if you do need to build for multiple targets, you just run the binutils
tools multiple times with the right options for each, because they're never
identical.

~~~
Ericson2314
I mean target in the sense of autotools. LLVM, for example, is multi-target:
LLVM binaries by default support every platform LLVM supports.

Compiling per-target toolchains is an annoying chore, and binutils is the last
holdover bugging me with this when I work with Rust.

------
rurban
ThinLTO sounds good in theory, but is still too hard to use in practice
compared to gcc -flto=4.

While clang -flto can produce up to ~15% faster binaries than with gcc, it's
still not usable for bigger projects which put a lot of their API into a
shared library.

Some of those functions in a shared library are eventually inlined, but with
clang -flto the exported copies of the functions are optimized away, whereas
gcc keeps the copies in addition to the inlined variants. You can try to keep
those inlined API calls in the shared library with __attribute__((used)), but
then they are not inlined anymore, with performance regressions of up to 50%.

This is not fixed in clang 3.8, nor in 3.9. I hope it becomes usable in 4.0,
as the performance benefits would be dramatic and the implementation effort to
keep the ((used)), i.e. exported, copy is minimal.

------
kev009
A couple more interesting components:

* [http://llvm.org/releases/3.9.0/tools/clang/docs/ReleaseNotes...](http://llvm.org/releases/3.9.0/tools/clang/docs/ReleaseNotes.html)

* [http://llvm.org/releases/3.9.0/tools/clang/tools/extra/docs/...](http://llvm.org/releases/3.9.0/tools/clang/tools/extra/docs/ReleaseNotes.html)

* [http://llvm.org/releases/3.9.0/tools/lld/docs/ReleaseNotes.h...](http://llvm.org/releases/3.9.0/tools/lld/docs/ReleaseNotes.html)

------
euyyn
I don't understand this fixed bug:
[https://llvm.org/bugs/show_bug.cgi?id=26774](https://llvm.org/bugs/show_bug.cgi?id=26774)

How could foo() have ever printed "X"?

~~~
yurymik
Both the t0 and t1 loads are atomic, meaning you'll always get the whole
object. But there's no synchronization between them, and another thread can
preempt the execution and change the value at %ptr right after the first load.

~~~
agumonkey
And what is %ptr? I'm having trouble googling it.

~~~
erichocean
I assume %ptr is a generic "variable" name[0] for a pointer reference, i.e.
the address of the value being loaded atomically. You can substitute %ptr in
your head for any atomic address you might want to load from.

The code itself isn't well-formed because %ptr isn't actually defined
anywhere—just pretend that it was. ;)

[0] Technically, LLVM does not have variables (as in, names you can assign to
more than once), since it's in SSA form, but I don't know what a better name
would be here. :)
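To illustrate, a tiny hand-written fragment in LLVM IR's textual syntax (function and value names invented): `%`-prefixed names are local SSA values, each assigned exactly once.

```llvm
define i32 @foo(i32* %ptr) {
entry:
  ; %ptr is just a local name for the incoming pointer argument
  %t0 = load atomic i32, i32* %ptr unordered, align 4
  %t1 = load atomic i32, i32* %ptr unordered, align 4
  %eq = icmp eq i32 %t0, %t1
  %r  = zext i1 %eq to i32
  ret i32 %r
}
```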

~~~
agumonkey
Aight, I'll need to read their docs exhaustively, I had to anyway. Thanks
nonetheless.

------
augb
"Swift calling convention was ported to ARM."

Edit: simplify

------
nialv7
LDC got mentioned out of nowhere.

Looks like D is gaining some traction recently :)

