> Wouldn't adding a preprocessor directive like "#include_once <header.h>" solve this?
No -- the point is still that you recompile each header for each source file that uses it. Running the compiler N times (once for each source file) costs M x N, not M + N.
> Why is this?
I agree with you here; I think this dismissal of precompiled headers needs a more thorough explanation (though perhaps it was given orally in the talk).
To get any benefit from the use of precompiled headers at all, you need to include practically every single header in your project in your precompiled header. Doing this is the opposite of modular: it tightly couples each of your N compilation units with each of the M modules (changing any one of the modules bundled in the pch will force you to recompile all compilation units using the pch).
> changing any one of the modules bundled in the pch will force you to recompile all compilation units using the pch
Is there a fundamental reason the compilation can't be optimized away?
For example, given a header file A.h, couldn't you precompile it and generate a bloom filter of all preprocessor tokens contained therein (except ones defined in files that A.h includes)? Then if B.h (which includes A.h) changes, see if any of the preprocessor macros that are defined at the point of inclusion are in A's bloom filter. If not, you can safely skip the recompilation of A.h. If so, you should probably change A.h anyway to directly include the definition of any macros that are affecting its compilation.
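Roughly this kind of check, as a sketch (the type and function names and the filter size are mine for illustration; no real compiler works exactly this way):

    // Hypothetical Bloom-filter check for reusing a precompiled A.h.
    #include <bitset>
    #include <functional>
    #include <string>
    #include <vector>

    struct HeaderFingerprint {
        // Bloom filter over the preprocessor tokens A.h refers to
        // (excluding ones defined by files A.h itself includes).
        std::bitset<1 << 16> bits;

        void add(const std::string& token) {
            // Two independent hash probes keep the false-positive rate down.
            bits.set(std::hash<std::string>{}(token) % bits.size());
            bits.set(std::hash<std::string>{}("salt:" + token) % bits.size());
        }

        bool mayContain(const std::string& token) const {
            return bits.test(std::hash<std::string>{}(token) % bits.size()) &&
                   bits.test(std::hash<std::string>{}("salt:" + token) % bits.size());
        }
    };

    // Reuse A.h's precompiled result only if none of the macros defined at
    // the point of inclusion could possibly affect it. A false positive
    // just means a harmless extra recompile.
    bool canReusePrecompiledHeader(const HeaderFingerprint& a,
                                   const std::vector<std::string>& macrosInScope) {
        for (const auto& macro : macrosInScope) {
            if (a.mayContain(macro)) {
                return false;  // possible interaction -> recompile A.h
            }
        }
        return true;  // provably unaffected -> skip recompiling A.h
    }
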
I wonder how necessary this actually is. I think using non-command-line defines to change the behavior of header files is rare. From what I can tell, ccache just ignores the issue: https://ccache.samba.org/manual.html#_how_ccache_works
I'm not sure how it gets away with this and works as well as it does. Maybe there's a behind-the-scenes check that I don't know about? In any case, it might be a good framework to add the caching you describe.
ccache doesn't cache the results of compiling each individual header. Other than the shared cache, ccache is conceptually just a workaround for deficiencies in make.
This is an implementation detail that is fixable without language changes: it is often the case that two header files are semantically disjoint, and the compiler (partly the preprocessor: you'd have to update the compile more incrementally after each include) could keep track of these relationships so it could safely merge results later; it is certainly no harder than a "merge" comparison of two sorted token lists. Their goals do go past this (their concerns about "resiliency"), but it seems disingenuous to include these performance issues when there are less drastic solutions.
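To spell out that "merge" comparison: if the compiler kept, per header, a sorted list of the symbols it defines and uses, a single linear pass would tell it whether two headers can interact at all. A sketch (names are illustrative, not taken from any actual compiler):

    #include <string>
    #include <vector>

    // Returns true if the two sorted symbol lists share no element, i.e. the
    // headers are (by this approximation) semantically disjoint and their
    // compiled results can be combined without re-running either.
    bool disjoint(const std::vector<std::string>& a,
                  const std::vector<std::string>& b) {
        std::size_t i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            if (a[i] < b[j])      ++i;
            else if (b[j] < a[i]) ++j;
            else return false;    // shared symbol -> they may interact
        }
        return true;
    }
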
I don't follow. I'd think that for any given header, you would only have to recompile it if the #include dependencies listed in that file (or their subdependencies) changed. I'd hope that for most projects this is far from every header. When would this happen?
It also seems like this would be a win even if only used for rarely-changing system headers. As shown in the slides, for small projects the lines of unchanging standard includes dwarf the project-specific code. Wouldn't it be a big win to avoid parsing and compiling all of these?
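For the system-header case, compilers can already do roughly this today. A minimal sketch of how it might look with GCC-style precompiled headers (the file name and the header list here are made up):

    // stdinc.h -- a made-up "system headers only" precompiled header.
    // With GCC you can precompile it once, e.g.:
    //   g++ -x c++-header stdinc.h     (produces stdinc.h.gch)
    // and any translation unit whose first include is "stdinc.h" picks up
    // the .gch instead of reparsing these headers. Other compilers have
    // different (and mutually incompatible) mechanisms for the same thing.
    #include <algorithm>
    #include <map>
    #include <string>
    #include <vector>
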
> No -- the point is still that you recompile each header for each source file that uses it.
You're right, I was thinking about this wrong. I was thinking that the main cost is not the direct includes (M x N) but that each header further includes other headers (more like M x N^O). One-time includes prevent that exponential explosion, but you are still left with M x N. But I tend to use ccache to avoid unnecessary recompilations, so I rarely feel the brunt of this.