
Termite: A generic distributed compilation system in Go - fogus
https://github.com/hanwen/termite
======
jgrahamc
The big problem with distributed compilation is not the distribution of jobs
(there are tons of ways to do that), but resolving the dependency information
correctly. I founded Electric Cloud (<http://www.electric-cloud.com/>) to
solve this problem and we ended up needing to:

1\. Completely understand the build process from the build script (e.g.
Makefile)

2\. Write a parallel, distributed, versioned file system so that we could
automatically spot the dependencies that were not mentioned in the build
script.

Once we'd done that, we were able to get very high parallelism from unmodified
build scripts, which was impossible with tools like make -j, distcc, etc. The
reason is that almost all build scripts assume serial builds and thus do
things like reuse a temporary file name, which breaks parallelism. Or you
have nightmares like a shared precompiled header file that everyone is
updating.
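
To make the idea of spotting unmentioned dependencies concrete, here is a minimal sketch in Go. It is not Electric Cloud's or Termite's implementation; the `Job`, `Conflict`, and `FindUndeclared` names are hypothetical, and it assumes file accesses have already been recorded per job (for example by a monitoring file system). It walks jobs in the order a serial build would run them and flags any job that touches a file last written by an earlier job it does not declare as a prerequisite, which covers both a missing header dependency and a reused temporary file name.

```go
package main

import "fmt"

// Job is a hypothetical record of one build step and the files it touched.
type Job struct {
	Name     string
	Declared []string // jobs listed as direct prerequisites in the build script
	Reads    []string // files read while the job ran
	Writes   []string // files written while the job ran
}

// Conflict says that Later touched File, which Earlier wrote, without
// declaring Earlier as a prerequisite.
type Conflict struct {
	File    string
	Earlier string
	Later   string
}

// FindUndeclared scans jobs in serial order and reports undeclared ordering
// constraints. Assumption: only direct prerequisites are consulted; a real
// system would also follow transitive dependencies.
func FindUndeclared(jobs []Job) []Conflict {
	lastWriter := map[string]string{} // file -> most recent job that wrote it
	var conflicts []Conflict
	for _, j := range jobs {
		declared := map[string]bool{}
		for _, d := range j.Declared {
			declared[d] = true
		}
		// Check every file the job touched, once each.
		touched := append(append([]string{}, j.Reads...), j.Writes...)
		seen := map[string]bool{}
		for _, f := range touched {
			if seen[f] {
				continue
			}
			seen[f] = true
			if w, ok := lastWriter[f]; ok && w != j.Name && !declared[w] {
				conflicts = append(conflicts, Conflict{File: f, Earlier: w, Later: j.Name})
			}
		}
		for _, f := range j.Writes {
			lastWriter[f] = j.Name
		}
	}
	return conflicts
}

func main() {
	jobs := []Job{
		{Name: "gen", Writes: []string{"gen.h"}},
		// "a" reads gen.h but never declares "gen".
		{Name: "a", Reads: []string{"a.c", "gen.h"}, Writes: []string{"tmp.s", "a.o"}},
		// "b" declares "gen" but reuses the temporary file name tmp.s.
		{Name: "b", Declared: []string{"gen"}, Reads: []string{"b.c", "gen.h"}, Writes: []string{"tmp.s", "b.o"}},
	}
	for _, c := range FindUndeclared(jobs) {
		fmt.Printf("%s must run after %s (shared file %s)\n", c.Later, c.Earlier, c.File)
	}
}
```

In this sketch, each reported pair is an ordering constraint that a parallel scheduler would have to respect even though the build script never mentions it.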

Once you crank up the parallelism, you can start worrying about disk
latency, network latency, etc. That led us to do everything in RAM on the
worker nodes, our own binary protocol with compression, and peer-to-peer
communication inside the worker node cluster to maximize switch utilization
and minimize master load.

------
draven
When I scanned the frontpage and saw "termite" and "distributed" in the title
I thought it would be about Termite the distributed Scheme. See
<http://code.google.com/p/termite/>

It's been around for quite some time.

------
robfig
Very cool. Great work, Hanwen!

I'm intrigued that he has enough Go code to compile for distribution to be
useful. After all, a Go screencast shows the entire standard
library getting compiled in seconds on a MacBook.

~~~
hanwen
Hi Rob! Long time no see; how are you?

Termite is not intended for compiling Go, but rather for large projects that
use C++, Java, etc., and especially for projects that cannot use distcc. In
particular, a large project that uses DEX comes to mind, but I imagine there
are others.

It's also a spin-off of my FUSE library in Go: I was looking for something
that really needs FUSE to be effective, and a language that supports
concurrency and networking as easily as Go does is awesome for building a
distributed system.

I'm surprised at the attention it got (25 followers on GitHub); I had intended
to write an official announcement only after I reached a serious milestone
(e.g., compiling the Linux kernel in a few seconds).

As for the name, I was thinking of a social insect in Brazil; maybe I should
change the name to Cupim.

------
huhtenberg
Sounds very similar to _distcc_ (which has been around for ages).

<http://www.distcc.org>

~~~
durin42
It's similar, but uses some tricks so that you don't have to heavily modify
the build tools to use it.

~~~
icefox
What about Icecream, which has also been around for ages?

<http://en.opensuse.org/Icecream>

