For starters, Apple's money is not the main driver of LLVM (in fact, going by the public contribution data, Apple is no longer the #1 contributor).
Second, "but merely the fact that compiler technology has advanced significantly in ways that GCC is not well positioned to exploit. " is simply false
In fact, that's exactly the problem for GCC: compiler technology has not really advanced at all.
GCC caught up to everyone else for the same reason.
Time for a history lesson.
About 14 years ago, a group of folks including Diego Novillo, Jeff Law, Richard Henderson, Andrew MacLeod, me, and Sebastian Pop (along with bug fixes/changes from a lot of others) sat around and built a "middle end" for GCC.
Prior to that, GCC had a frontend and a backend. The frontend was very high level (and had no real common AST between the frontends); the backend was very low level.
There was nothing in between.
We cherry-picked the state of the art in compilers and research, and built a production-quality IR and optimizer out of it.
This research has not really changed that much in about 10-15 years. Most of the research these days focuses not on straight compiler opts, but on things like serious loop transforms, helping runtimes (GPU, GC, etc.), or dynamic languages.
You can see all the tree ssa work here: https://github.com/gcc-mirror/gcc/blob/master/gcc/ChangeLog.... That log covers only until the branch was merged. At that point, it was "not a piece of crap", but this was before people added all the stuff on top of this architecture.
On top of that architecture, it took another few years to get good, and a few years after that to get really good.
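To make "middle end IR" a bit more concrete: the core idea we picked up from the research of the time is SSA form, where every variable gets exactly one assignment and control-flow joins are resolved with phi nodes. Here's a rough, hand-written sketch (in C comments) of what a tree-SSA dump of a trivial function looks like; the exact GIMPLE syntax and dump flags vary by GCC version, so treat it as illustrative rather than an exact dump.

    /* sum.c -- something like `gcc -O1 -fdump-tree-ssa sum.c` asks GCC to
       dump its SSA-form middle-end IR (flag names and output format are
       version-dependent; this is a sketch, not an exact dump).  */
    int sum_to(int n)
    {
        int total = 0;
        for (int i = 0; i < n; i++)
            total += i;
        return total;
    }

    /* Roughly what the SSA form looks like: each variable is split into
       versions assigned exactly once, and loop/branch joins pick the
       right version with PHI nodes:

         <bb 2>:  total_1 = 0;  i_2 = 0;  goto <bb 4>;

         <bb 3>:  total_3 = total_5 + i_6;
                  i_4 = i_6 + 1;

         <bb 4>:  total_5 = PHI <total_1(2), total_3(3)>;
                  i_6     = PHI <i_2(2), i_4(3)>;
                  if (i_6 < n_7) goto <bb 3>; else return total_5;

       Constant propagation, value numbering, dead code elimination and
       friends are all phrased over this single-assignment form.  */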
Bringing us to today.
LLVM was started around the same time, but had fewer contributors back then.
Essentially, you could view it as "instead of building something in between two really old parts, what could we do if we just redid it all?" People thought it was a waste of time for the most part, but Chris Lattner persevered, found a bunch of crazy people to help him over the years, and here we are.
Because you see, it turns out compiler technology has not really changed at all. So, algorithmically, LLVM and GCC implement the same optimization techniques in the middle of their compilers, because there is nothing better to do. There are just slightly different engineering tradeoffs. To put it another way: outside of loop transforms, static-language compilers targeting CPU architectures are essentially solved. We know how to do everything we want to do, and do it well. It just has to be implemented.
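As a small, hand-picked illustration of that claim (not a real compiler dump): both compilers apply the same textbook SSA-based cleanups -- constant propagation, dead code elimination, branch folding -- and converge on essentially the same result for simple code.

    /* Illustrative only: the kind of reduction both GCC and Clang/LLVM
       perform at -O2 using the same well-known techniques.  */
    int before(int x)
    {
        int a = 4;
        int b = a * 2;      /* constant propagation: b is 8          */
        int unused = x + a; /* dead code: the result is never used   */
        if (b > 0)          /* branch folding: provably always taken */
            return x + b;
        return 0;
    }

    /* Either compiler reduces this to, in effect:

         int after(int x) { return x + 8; }

       The differences that matter are engineering ones -- IR design,
       pass ordering, compile time -- not the algorithms themselves.  */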
So given enough time/effort, LLVM and GCC will produce code that is as good as each other's there. The question becomes "will they keep up with each other as engineering/tuning happens" and "who can generate great code faster".
The problem for GCC on this front is threefold:
1. The backend, despite being pretty heroic at this point, really needs a complete rewrite, but people value portability over fast code.
LLVM, having started completely from scratch, has a modern, usable backend. They are not afraid to throw stuff away.
2. For any given thing you can implement, it's a lot easier to do it in LLVM than GCC, so, given time, LLVM will produce faster code because it takes less work to make it do so than it does to make GCC do so.
3. Because it was architected differently and more modernly, clang/LLVM are significantly faster at compiling than GCC. GCC can remove most if not all of the middle-end time (and does), but it's still slow in other places, and that's really, really hard to fix without fundamental changes (see #1).
There are still plenty of open problems in compilers. For instance, writing a program to effectively use all four of my CPU cores is pretty tedious. It would be awfully nice if my compiler could automatically parallelize operations, do effective register allocation across cores, distribute data for best use of L1 cache, etc.
Certainly researchers who are working on this sort of thing today are doing it in LLVM or some custom framework. I can't imagine GCC has any significant traction at least.
The only open problem here is the one I stated: "serious loop transforms". Parallelization is not even "hard", it's just hard for languages like C++. Fortran compilers have been parallelizing for 20+ years.
For any interesting type of vectorization we want to do, the problem is not "can we figure out what to vectorize", it's "how long do we want to spend vectorizing code" :)
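To make the C++-vs-Fortran point concrete, here's a hedged sketch. The loop below is trivially parallel in principle, but with plain C pointers the compiler has to prove (or check at runtime) that the arrays don't overlap before it can split or vectorize the iterations; Fortran dummy array arguments can't alias by language rule, which is why Fortran compilers have been doing this for decades. The GCC flags shown do exist, but exact behavior depends on version and cost models.

    /* axpy.c -- an embarrassingly parallel loop.  Without `restrict`,
       out[] might overlap a[] or b[], so the compiler cannot blindly
       vectorize or parallelize; with `restrict` (or a runtime overlap
       check, which the cost model has to pay for) the obstacle is gone. */
    void axpy(int n, double alpha,
              const double *restrict a,
              const double *restrict b,
              double *restrict out)
    {
        for (int i = 0; i < n; i++)
            out[i] = alpha * a[i] + b[i];
    }

    /* Illustrative, version-dependent invocations:
         gcc -O3 -ftree-vectorize           axpy.c   (SIMD within a core)
         gcc -O3 -ftree-parallelize-loops=4 axpy.c   (split across threads)
       Deciding *what* to transform here is routine; the real question is
       how much compile time to spend proving it safe and profitable.  */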