The cmake (configure) step was actually run by symlinking vanilla clang, and the symlink was swapped back before the make step. The asciicasts were recorded manually, which is why you will notice one starts slightly before the other; they are only roughly in sync. I do really like your attention to detail though :)
"trusting trust" is a very interesting topic for a cloud compiler for sure. Funnily enough I was at the llvm developer meetup this month getting feedback on this exact topic.
There are many approaches to this. The easiest win that avoids it is using this for build bots: you use the service to speed up PR approval and discover breakages much faster, then build locally for the final release version you share with anyone.
For companies that need control over the service themselves for that reason, we can provide enterprise accounts where they use their own cloud provider.
In general we download prebuilt compiler toolchains every day and use them without knowing whether they are compromised. I don't see this as any different, and I would take the same precautions with an important product that needs protection from that problem before making a release version.
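For what it's worth, the precaution I have in mind is nothing exotic. Here is a minimal sketch of checking a downloaded toolchain archive against a published SHA-256 digest before trusting it; the archive name and expected digest are placeholders, not real values:

    import hashlib
    import sys

    # Hypothetical values -- substitute the archive you actually downloaded
    # and the digest published alongside it by the toolchain vendor.
    ARCHIVE_PATH = "clang+llvm-x86_64-linux-gnu.tar.xz"
    EXPECTED_SHA256 = "0" * 64

    def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
        """Stream the file through SHA-256 so large archives need not fit in memory."""
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    if __name__ == "__main__":
        actual = sha256_of(ARCHIVE_PATH)
        if actual != EXPECTED_SHA256:
            sys.exit(f"checksum mismatch: got {actual}, expected {EXPECTED_SHA256}")
        print("archive matches the published checksum")

The same workflow applies whether the binaries come from a vendor mirror or a cloud build service; signature verification on top of the checksum is the natural next step for a release build.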
It actually scales better the more source files you have.
The bottleneck currently is link job dependencies.
LLVM is a good example of that: it has 2,000 tasks to complete, and it hits a point about 360 tasks in where it has to link a tool that is used to generate code for a later stage.
The fastest build would be all source files with no linking. That could be done in about 30 seconds.
The same would apply whether you had 1k or 10k source files: still about 30 seconds.
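To make that concrete, here is a rough sketch of why wall time is driven by the longest dependency chain rather than the number of source files. The task names and durations are made up for illustration, not measurements from LLVM's build: with unlimited workers, a thousand independent compile jobs finish in the time of the slowest one, but a link step that later compiles depend on adds its full duration to the critical path.

    from functools import lru_cache

    # Hypothetical task graph: name -> (duration_seconds, dependencies).
    # Compile jobs in a wave are independent; linking the code-generation
    # tool gates a second wave, which is the kind of chain that sets wall time.
    TASKS = {
        **{f"compile_a{i}": (30, []) for i in range(1000)},                     # first wave
        "link_codegen_tool": (45, [f"compile_a{i}" for i in range(1000)]),
        **{f"compile_b{i}": (30, ["link_codegen_tool"]) for i in range(1000)},  # gated wave
        "link_final": (60, [f"compile_b{i}" for i in range(1000)]),
    }

    @lru_cache(maxsize=None)
    def finish_time(task: str) -> int:
        """Earliest finish time with unlimited parallel workers: a task can
        start only after all of its dependencies have finished."""
        duration, deps = TASKS[task]
        return duration + max((finish_time(d) for d in deps), default=0)

    if __name__ == "__main__":
        wall = max(finish_time(t) for t in TASKS)
        print(f"tasks: {len(TASKS)}, critical-path wall time: {wall}s")
        # Prints 165s (30 + 45 + 30 + 60): adding more independent compile
        # jobs does not change this, but every link on the chain adds its duration.

Doubling the number of compile jobs in either wave leaves the result unchanged, which is the sense in which more source files scale better while link dependencies do not.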
I picked LLVM because it is the toughest benchmark for this; I want real-world use cases and expectations.
I wonder how one would deal with "trusting trust" in such a system.