He ended up creating his own build system, tup (http://gittup.org/tup/), based on it. It also has the property this article asks for: "No-op builds should be O(1) and instantaneous, and most other builds should be O(WhateverChanged)."
Interesting quote: "When a single C file is changed in the 100,000 file case, make can take over half an hour to figure out which file needs to be compiled and linked."
We ran some tests where I work, on a pretty good-sized code base, and found that we were CPU-bound, even on an 8-core system.
There is an exception: if you are using a networked file system of some sort (especially ClearCase dynamic views), then you are almost certainly I/O-bound.
This was years ago, so who knows.
[Update: he says he thinks it was on the D mailing list years ago, and was probably related to the fact that every source file in a nontrivial C program includes dozens or hundreds of headers, especially with a naive compiler.]
[Update 2: TinyCC http://bellard.org/tcc/ compiles at 30 MB/s. It's not hard to imagine a conventional drive on a dual-core machine failing to feed a compiler that fast enough source for the build to be CPU-bound.]
The reason that parallel make helps so much (especially with 2x more jobs than processors) is that you aren't idling while blocked for I/O, and are piling up the I/O queue higher so the scheduler can de-randomize some I/O (elevator algorithm FTW).
Even over the network this is true. My old lab had homedirs all on NFS (gigabit I think, but that's not too important) and all our builds were CPU-bound, even without raising the -j argument from cores+1. The same lab (before I was there) even wrote a paper on why the "compile" test is a horrible one for measuring filesystem performance; it just doesn't do enough I/O.
I'd be interested to see what happens if you were to compile something large written in assembly (or close to it), but I'd put money that just the sequential optimizer would be slower than I/O.
EDIT: found the paper and an article about it: http://www.linux-mag.com/cache/7464/1.html and http://www.fsl.cs.sunysb.edu/project-fsbench.html
There is clearly a point where, if your filesystem is slow enough, the process does become I/O-bound. I have no experience with ClearCase, but from what I've heard it's molasses-slow and might be past that point.
My current main environment is Visual Studio for C#, and there you get a lot of errors detected while editing, but not all of them, so you have to continually press the rebuild button, and the time of that rebuild just keeps growing...
Then again, I recently bought a SSD, that helps a bit with build times. :-)
I swear, I read "press the rebuild button" and wanted to cry.
Edit: the sarcasm-free version of the above amounts to this: there was a time when the idea of "writing software to help you write software" was a standard notion, something that everyone did. It seems like the modern world is training a generation of programmers who don't understand this, and who see the "end result" software as the only software worth writing. The idea of tool creation and integration is alien to them. That's what the IDE buttons are for.
However, it doesn't change the fact that you still have to perform the rebuild. If you change a file that a lot of other files depend on, all of those have to be rebuilt, unlike when you're sitting in Eclipse coding Java, where you don't have to.
If you define every function in terms of a macro, and change that macro, though, of course you have to rebuild the whole project.
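To make that concrete, here is a hypothetical header (names invented for illustration) where every declaration goes through a single macro; touch the macro and every translation unit that includes the header goes stale:

    /* api.h -- hypothetical example, not from any real project. */
    #ifndef API_H
    #define API_H

    /* Change this macro (say, to add an attribute to every function)
     * and every .c file that includes api.h must be recompiled, even
     * though no function body changed. */
    #define API_FUNC(ret, name, args) ret name args

    API_FUNC(int,  parse_config, (const char *path));
    API_FUNC(void, shutdown_server, (void));

    #endif /* API_H */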
I don't know Java so I'm assuming it's similar to other compilers. Please correct me if I'm wrong.
Changing the type is an interesting one. If you change the backing class from one type to another, but you were using an interface to access it, then the Java compiler doesn't need to recompile that code -- the compiled InvokeInterface bytecode for that method invocation doesn't change. However -- I feel like I may have read this somewhere -- there are optimizations which might cause it to replace InvokeInterface with static invocations when it can determine at compile time what class is used. If that's the case, then it would have to recompile the client class too.
Of course, you could configure Eclipse to run ant or make or something else to compile your code, but the editor is still driving the integrated incremental compiler to mark up all your syntax errors.
JDT is a compiler in the same sense of the term as javac is a compiler.
It's just a different compiler, with some different features.
A minor nit on "most other builds should be O(WhateverChanged)": consider a C++ header file used by 10 C++ files. All 10 would need to be recompiled. The agony there is that you wish for some finer-than-file dependency analysis, so that only the files that actually use the constant you just changed need to recompile (a small sketch of the situation follows).
On the whole, though, it's a good start on those massive builds you have.
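To make the header nit concrete, here is a hypothetical constants.h (names invented) included by all ten .c files; make's recorded dependency is on the header as a whole, not on the individual symbols inside it:

    /* constants.h -- hypothetical header included by ten .c files. */
    #ifndef CONSTANTS_H
    #define CONSTANTS_H

    /* Only net.c actually uses this constant... */
    #define MAX_CONNECTIONS 128

    /* ...and only log.c uses this one, but changing either line forces
     * make to recompile all ten files, because the dependency is
     * "file includes constants.h", nothing finer-grained. */
    #define LOG_BUFFER_SIZE 4096

    #endif /* CONSTANTS_H */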
But what if it were simplified to just a live ls-style diff based on when a command was last run, i.e. a timestamp? If Git or Make could hook into something like that, it'd be really useful.
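Something like this hypothetical C sketch is what I have in mind (the last-run timestamp is faked here; a real tool would persist it, e.g. as the mtime of a stamp file):

    /* Report files whose mtime is newer than the last run of a command. */
    #include <stdio.h>
    #include <time.h>
    #include <sys/stat.h>

    /* Returns 1 if path was modified after last_run (or can't be read). */
    static int changed_since(const char *path, time_t last_run)
    {
        struct stat st;
        if (stat(path, &st) != 0)
            return 1;                 /* missing/unreadable: assume changed */
        return st.st_mtime > last_run;
    }

    int main(void)
    {
        time_t last_run = time(NULL) - 3600;   /* pretend it ran an hour ago */
        const char *files[] = { "main.c", "util.c", "util.h" };

        for (size_t i = 0; i < sizeof files / sizeof files[0]; i++)
            if (changed_since(files[i], last_run))
                printf("%s changed since last run\n", files[i]);
        return 0;
    }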
And ibb is a proof of concept, showing that O(1) is possible, desirable and useful. That said, I've been using it daily, because we have > 200 MB of PHP/JS/CSS/HTML at IMVU.
That said, build systems and VCS aren't the only ways to optimize the process. For example, a system using scripts processed at runtime lets you reduce the amount of compiled code and iterate quickly on one section by pressing a "reload script" button, or, even more conveniently, by monitoring the source file and auto-reloading when changes are saved. If you're daring enough, you could probably even do this with machine-compiled languages by abusing dynamic linking, though the ease of doing so would ultimately depend on the runtime linker's capabilities.
Doing this kind of data-driven thing will cost runtime performance since less can be compiled and optimized statically, but if you treat it as the "scaffolding" work that it is and also include ways to reclaim some of the static factors for release builds, you can get a better overall result than you would if you were just suffering through long builds.
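For the dynamic-linking variant, here is a minimal sketch on a POSIX system, assuming the reloadable part is built separately as module.so exporting a run() function (all names are illustrative, not anyone's actual API):

    /* Reload a compiled module in place via the POSIX dynamic linker. */
    #include <dlfcn.h>
    #include <stdio.h>

    typedef void (*run_fn)(void);

    int main(void)
    {
        for (;;) {
            void *handle = dlopen("./module.so", RTLD_NOW);
            if (!handle) {
                fprintf(stderr, "dlopen: %s\n", dlerror());
                return 1;
            }
            run_fn run = (run_fn)dlsym(handle, "run");
            if (run)
                run();           /* execute the freshly loaded version */
            dlclose(handle);     /* unload so the next pass sees a rebuilt .so */

            printf("rebuild module.so, then press Enter to reload...\n");
            if (getchar() == EOF)    /* stand-in for a file watcher or button */
                break;
        }
        return 0;
    }

Build the module with something like cc -fPIC -shared -o module.so module.c (and link the host with -ldl where needed); a real engine would add file watching and carry state across reloads, but the core trick is just dlopen/dlsym/dlclose.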
Since the dynamic views into the source were implemented as a custom file system, they could track every file access that went into building each object. They supplied a custom version of make that pulled that info out to construct complete dependency descriptions. If you had the toolchain in ClearCase as well, that would include all the compiler binaries, libraries and includes.
Once that info was cached, and since they had code at the filesystem level, it could easily be invalidated as files were touched.
Another product of this mechanism was that if you tried to build something that someone else had already built (with all the same inputs), it would just grab the result over the network. In practice this meant that the nightly build would cache the bulk of the object files.
Either way is ridiculous.
There may well be something wrong with the Mozilla build, but I swear, it's not the metadata reads from make.
Regardless, even if it's not 21 minutes, my experience building Mozilla with make was definitely slow.
# one of GNU make's built-in implicit rules, for CWEB sources (a .w file plus a .ch change file yield a .c via ctangle):
%.c: %.w %.ch
Of course you can only do this if you do not rely on these implicit rules.
Once my blog posts started reaching Hacker News I thought "Oh, I'll just move the site out of the apartment and into the cloud!" and bought an account at prgmr.com (which I highly recommend, by the way).
However, I serve WordPress via Apache on a 256 MB VM, which clearly thrashes under load.
Tonight I will purchase an upgrade to 512 MB of RAM and play with nginx.
I'm sorry for the inconvenience.
p.s. I do have WP-Supercache enabled and a PHP bytecode cache. I _could_ just host at wordpress.com, but I might as well learn nginx while I'm at it...