What difference does that make for an end user? You have two copies of the same OS on two PCs (same hardware), but one compiled with GCC and the other with LLVM/Clang. How will this affect me? What differences will I notice? Better performance? Like real-world performance?
You could theoretically see better real-world performance. As someone who is currently working on very high-performance C++, I have noticed Clang being much more 'confident' in optimising code than its counterparts (especially compared to MSVC++).
That being said, in the grand scheme of things any performance improvement would probably only show up on benchmarks.
The only benefit I can think of is increased security now or over time. Most people developing compiler-based mitigations work on LLVM. Especially the practical ones. HardenedBSD is an example of a project making use of them. SVA is an example of one that might be applied to Linux:
More research-style projects are indeed based on LLVM, but GCC takes security extremely seriously as well (and contains more Linux-specific optimizations and code).
For the end user, it comes down to this: do you care about copyleft and what the FSF/Stallman are advocating for (GCC), or do you just want free (as in beer) code (LLVM)?
I'd argue that as a user, the GPL cares more about your freedom, so unless you have a specific reason not to use GCC, go with that over LLVM (yes, I'm aware of the exception that binaries compiled with GCC don't have to be GPL themselves).
This is possibly a silly question but... when it comes to the threat of backdoor vulnerabilities inserted in the compiler itself, which would you say is the safer option? From what I can tell, LLVM has a simpler codebase, so that's a point in its favor, but I think GCC, being a GNU project, is less likely to have developers who could be pressured to insert malicious code.
Am I crazy in worrying about this? I know initiatives like reproducible builds are supposed to help solve that kind of threat, but it's still not clear to me how it all fits together.
"when it comes to the threat of backdoor vulnerabilities inserted in the compiler itself, which would you say is the safer option?"
Neither. They're equally unsafe, though the risk of this specific attack is relatively low outside of, maybe, distribution. Although it's a clever idea, AsyncAwait's reasoning doesn't work, since spies will simply pose as those developers. The good news is that the Karger compiler-compiler attack has only happened two or three times that I know of.
What's burned projects many more times, and is most worth worrying about, are security-related compiler errors: they transform your code in a way that removes safety/security checks or adds a security problem (e.g. a timing channel). So the real problem is compiler correctness more than anything. That requires so-called certifying compilers that put lots of effort into ensuring each step or pass is correct. CompCert and CakeML are probably the champions there, with formal verification. You could also do rigorous, SQLite-style testing of each aspect of the compiler on top of using a memory-safe language. If you restrict features, then the bootstrapping can be done in an interpreter written and tested the same way before being ported by hand to designed-for-readability assembly.
It didn't stop there, though. Recent work in the verification camp is doing compilation designed to be secure despite using multiple abstraction levels (source, assembly, or mixed languages). Here's a nice survey on that:
"What's burned projects many more times, and is most worth worrying about, are security-related compiler errors: they transform your code in a way that removes safety/security checks or adds a security problem (e.g. a timing channel)."
I'm not sure the removal of safety or security checks was caused by compiler correctness issues rather than by a misunderstanding of the language's semantics or memory model.
If we take the removal of a memset (to erase sensitive data) or of erroneous integer-overflow checks (because they relied on undefined behavior) as examples, these stem from language or programmer error rather than compiler errors. These issues should be fixed at the language level so that, first, a programmer can express their intentions more easily and, second, it is hard to write code that doesn't align with the programmer's intention.
One of the main ones that can do it without undefined behavior, or at least what I was told was undefined behavior, is optimizations getting rid of "dead" code. It doesn't have to be memset: just an assignment. That assignment would sometimes get removed because the compiler thought nothing would be done with the assigned data. I never read whether that was in the C specification, since it seemed to be a common problem in optimizations. Here's a recent solution just in case you find it interesting:
"These issues should be fixed at the language level so that, first, a programmer can express their intentions more easily and, second, it is hard to write code that doesn't align with the programmer's intention."
Being a fan of Ada, SPARK, and Rust, I can't agree with you more. The problem is legacy code, especially useful FOSS, that isn't getting ported any time soon. The OSes, web browsers, and media players come to mind. We need ways to analyze, test, and compile them that mitigate risks. Hence all these projects targeting things like C.
LLVM's sources are at least as complicated as GCC's, if not more so. To make matters worse, LLVM's functionality is broken up into dozens of libraries, so there's a fair bit more tracing to be done to understand what's going on in Clang vs. GCC.
Several *BSD projects have been using the Clang toolchain by default for many years, and have been a massive driving force in getting these fixes upstreamed so that other systems, including Linux distributions, can benefit from a greater choice of compilers.
FreeBSD since 10.x on i386/amd64, not sure about the status on other platforms.
OpenBSD since 6.1 for arm64 and 6.2 for i386/amd64. This is the default both for the base system (kernel and userland) and for the ports tree, which compiles third-party packages; very few ports still depend on GCC.
And while not yet the default system compiler, LLVM/Clang is compiled and installed on macppc/sparc64 and mips64 systems.
That’s neat, but the big differentiator seems to be PGO+LTO. As pointed out below, both of Google’s distributions of Linux are optimized that way (actually with AutoFDO/SamplePGO + ThinLTO), but I don’t think there is a community distribution that is properly optimized. It could be significantly better.
That’s interesting but kinda highlights the difficulty of shipping an entire OS with profile-guided optimizations. What they need is very broad sample coverage and SamplePGO instead of instrumented FDO. This is what ChromeOS does with Quipper: they collect perf data samples from the entire fleet of customer devices and they build the distribution with AutoFDO/SamplePGO.
Really great work by the OpenMandriva team. I've spoken to some of their contributors before and even looked into one of the bugs they reported. Now that I think about it, I need to send that patch for fixing asm goto detection in glib.
Android and ChromeOS are also built with Clang. I'm curious about the distinction of "first." Does anyone know the timelines here?
Not sure how widely Apple has adopted it, or how much of a 'distro' you could call their operating systems, but much of what's publicly visible is built using LLVM with Clang.
> Python has been updated to 3.7.3, and we have successfully removed dependencies on Python 2.x from the main install image (for now, Python 2 continues to be available in the repositories for people who need legacy applications);
This is cool. I wish Ubuntu/Debian would move towards this.
That seems like a mistake to turn on, given that clang also has options to error on uses of potentially uninitialized variables. Zero is often just as wrong as any other value, so this flag will hide real bugs.
In fact, I'd rather have a flag that clobbers values with random data, to make sure that uses of uninitialized values are caught as soon as possible.
> error on uses of potentially uninitialized variables.
This is absolutely the best option. That said,
> I'd rather have a flag that clobbers values with random data.
That's roughly what happens in practice as is, isn't it? Barring the first option, I'd rather have an option that fails predictably and reproducibly. An arbitrary but deterministic garbage number, maybe? Like --set-uninitialized=0xdeadbeef. That might be getting too elaborate, haha.
If you can, I would at least advise running tests with MemorySanitizer, which is also built into newer Clang versions. It's much more precise, but only catches problems occurring at runtime. Also AddressSanitizer, for out-of-bounds accesses, use-after-free bugs, memory leaks, etc.
Am I correct in assuming you are connected to the project? If so, I was wondering which packages you employ PGO on. Firefox, Chromium, and x64 are examples of applications with built-in compile support; are you using PGO on other packages as well?
Userland has been built with Clang since 7.0. The kernels of some Android devices (Pixel 2/3, for example) are built with Clang, but it seems like most still use GCC.
> Python has been updated to 3.7.3, and we have successfully removed dependencies on Python 2.x from the main install image (for now, Python 2 continues to be available in the repositories for people who need legacy applications);
I wasn't saying there was a disadvantage; I'm just giving a potential look into why it took longer than, say, Arch. YUM is a huge Python project, and it took a while to comb through it all.
It wasn't YUM that held back OpenMandriva's switch (OpenMandriva never used YUM), but some of the build infrastructure tools that wound up being replaced as part of the migration from urpmi to DNF.
Those legacy tools were never updated for Python 3 because they had no maintainers or developers. When the distribution switched to DNF, it was able to adopt actively maintained replacement software that had been ported to Python 3.
See, when a daddy loves a mommy very much, they usually get married. However, sometimes the daddy meets another woman, who is faster, uses less memory, with a significantly less complicated code base, and then the daddy decides to compile his Linux with her instead.
Daddy is a dick; he and mommy grew up together. They share their complex internals with each other and were made for each other. They even share their philosophical stance on code freedom. How can you turn your back on that? Why jump to another woman just because she is thinner and more in demand with researchers?
> They share even their philosophical stance on code freedom
arguable.
BSD and the commercial Unices used PCC and PCC derivatives for much of their history; by this token, GCC is the 'other woman', and this is ignoring the clear differences in philosophy between MIT/BSD and GPL licensing.
This is probably not true (because clang is in C++ it can't be less complicated than anything), and gcc is complicated in some parts because it uses much better algorithms (the LLVM register allocator is not as good as LRA). LLVM also has some very ugly DSLs like the .td files.
But GCC's codebase does have lots of added complexity from the extremely weird GNU coding style where they want you to pretend you're writing Lisp and all commits have to update a changelog file. Plus terrible GNU software like autotools and recursive make.
Clang has some code base and speed advantages, but the big reason the large players like Apple are grabbing onto it is licensing. A lot of companies really want to move away from any GPL stuff. It's sad since so much of what we have in the Linux ecosystem came from GNU.
Which is always ironic, given that without Stallman's GNU concepts, Linux would never have happened.
And most likely, given the BSD state back then, it would mean we would just keep using either commercial UNIXes, or Windows would have won the UNIX wars.
you mean being persecuted by overbearing commercial Unices
(e.g. SVR4 and AT&T)?
let's not confuse the issues to our personal ends - the argument is just as valid that without BSD UNIX, Stallman would not have had a system to base a clone on.
GNU attempts to redefine the existing cultural status quo of open-source software dating from the dawn of computing to its own personal ends
Meanwhile, FreeBSD (on amd64, i386, armv6/7, aarch64) has been buildable with Clang since some point in 9.x (2012-13), has come with Clang only since 10.0 (January 2014), and since 12.0 the bootstrap linker on amd64/i386/armv7 is LLD (which was the case on aarch64 from the beginning, IIRC).
Really? I would prefer more things were GPLv3/AGPLv3. Open source today is just a bunch of middleware, but few end products. People today use open source software to build closed source solutions. It's a far cry from what a lot of us envisioned back in the 90s. I wrote about this before:
Compiling things with clang is faster and uses less memory.
Linux distros, package repositories, etc. are basically giant compilation farms, compiling packages and making sure they work well together so that you don't have to.
So switching to clang might impact their resource usage.
---
For you, the user, the performance of binaries compiled with GCC or clang is pretty much on par. Some binaries are a bit faster with clang, others are a bit faster with GCC, often in negligible ways.
If you are doing something very resource-intensive, recompiling that software yourself, tuned to your use case, will probably have a much larger impact on resource usage than whether the shipped package was compiled with GCC or Clang.
How does choosing one compiler over another "eliminate use of vendor extensions and UB"? Whatever compiler is chosen you'll have as much UB and as many extensions to deal with, as if the other one had been chosen.
No, GCC has serious shortcomings in certain areas.
-flto=thin is better, AutoFDO/BOLT is better, the JIT is better, but the most important point is C constexpr, which Clang implements as in C++; with GCC you cannot decide at compile time whether an expression is constant, so it misses out on many optimizations. GCC only has _Static_assert but no usable __builtin_constant_p: with GCC it errors at compile time, while with Clang it returns 0.
clang also has diagnose_if, e.g. to match user-defined compile-time warnings with user-defined run-time warnings.
e.g. the Clang memcpy can be 100x faster than the GCC memcpy when the size and alignment are known.
And gcc-9 introduced serious regressions on some platforms, to the point that you need to blacklist it; gcc-10 is probably no better.
I have to disagree: -flto=thin is faster at compile time, but I get better runtime performance with -flto=n in GCC.
Also with FDO I get better results with GCC over Clang/LLVM, my main test subjects are rendering (Blender), archivers, encoders and emulation.
However with straight up -O2/-O3 I very often get better performance with Clang/LLVM. I haven't benchmarked on ARM though, my results may be very different there.
but I'm not really sure. I guess it fits if you want a recent kernel and a KDE Plasma based distro, and you work with LLVM/clang.
I really don't know where it fits with Mageia/PCLinuxOS and other Mandrake descendants.
History IIRC -- Mandrake was RH Linux with KDE; Mandriva was a continuation of that which split; OpenMandriva were devs from that split that took ROSA Linux (still doing KDE4 I think) and then continued their project from that base.
Working with LLVM/clang is pretty much the same on a distro compiled with clang as on one compiled with gcc. Even the C++ ABIs are largely compatible if you use libstdc++ instead of libc++.
It doesn’t mention whether the kernel builds fine with clang; that would be interesting, as that used to be tricky. (I’m probably out of date; maybe it’s fine nowadays?)
X86_64 required the implementation of asm goto, which we just shipped. You'll need to build Clang from source, but the feature will be in the clang-9.0 release. Other arches should build with clang-8 (technically, x86_64 will build pre-4.19 kernels), but we shipped Pixel 2 kernels with clang-4.0, so older Clangs may work depending on your target arch/tree/configs.
The kernel _almost_ builds fine with Clang; the major stumbling blocks were the usage of VLAIS (which I think have all been removed, given that they amount to pure insanity) and `asm goto`, which has now been implemented by LLVM (I think it will become available in a stable release when 9.0 ships this autumn).
The kernel is not built with clang as that only works with ARM architectures right now (the Android kernel supports it, but the mainline kernel does not).
2. What version of Linux are you trying to build? Mainline, stable, next?
3. What configs are you trying?
4. What version of clang are you using?
For example, pixel 2 kernel is arm64, 4.4 stable kernel, limited configs, and clang-4.
Things for the most part are pretty green with released versions of clang. There are some long tail configs or combos of the above, but it's pretty minimal and we have a good handle on them.
I have OpenBSD in a VM, sadly, I have many monitors and TVs I need to interface with wirelessly to do presentations, and OpenBSD does not really support that.
"On the AMD side, the Clang vs. GCC performance has reached the stage that in many instances they now deliver similar performance... But in select instances, GCC still was faster: GCC was about 2% faster on the FX-8370E system and just a hair faster on the Threadripper 2990WX but with Clang 8.0 and GCC 9.0 coming just shy of their stable predecessors. These new compiler releases didn't offer any breakthrough performance changes overall for the AMD Bulldozer to Zen processors benchmarked.
On the Intel side, the Core i5 2500K interestingly had slightly better performance on Clang over GCC. With Haswell and Ivy Bridge era systems the GCC vs. Clang performance was the same. With the newer Intel CPUs like the Xeon Silver 4108, Core i7 8700K, and Core i9 7980XE, these newer Intel CPUs were siding with the GCC 8/9 compilers over Clang for a few percent better performance."