
Incremental Compilation - dwaxe
http://blog.rust-lang.org/2016/09/08/incremental.html
======
guessmyname
Off topic — I am surprised to find this statement [1] in the blog post,
considering how negative my teachers in university were about the "trial and
error" approach to solving a problem. They always said that you should design
a solution before implementing it, and I agree that makes sense, but there
are cases where you need to put in a print and exit to debug something; and
if you think about it, that is what a debugger does.

Nowadays — and anyone who has done interviews will understand — people expect
you to write flawless code from scratch using a basic code editor (looking at
you, HackerRank) with no time for tests. Sixty minutes to write solutions for
four challenges, without autocomplete, without tests. Even the genius
co-worker who comes up with an optimized solution for every problem would
need more than fifteen minutes to solve those crazy/random challenges, which
in the end measure your ability to solve _those_ specific questions, not
problem-solving skills in general.

So is trial-and-error bad or not?

[1] Much of a programmer’s time is spent in an edit-compile-debug workflow.

~~~
Manishearth
Designing a solution is still far from having code that works. You have
designed a solution, and now you need to test it. In significantly complicated
codebases, there are often all kinds of obscure issues that can crop up and
make your (manual or automated) testing fail. These are not things you can
think of beforehand in your design because it is not possible for you to have
known about all of these issues.

In fact, later in your comment you allude to this yourself -- "with no time
for tests". You're worried about interviews not having time for tests. But
tests _are_ an edit-compile-debug feedback loop. If a test fails, you need to
debug it, fix the issue, and recompile. Often you have to do this many times.
This is precisely what the blog post is talking about.

You're generalizing the term "trial and error" to mean something more than
what your teachers probably meant.

(I do agree that those interview practices are silly. But the blog post is not
talking about stuff like that.)

------
skybrian
Nice to see it. Incremental compilation is one of those things that production
compilers have but research languages usually don't. You only need it when you
have a lot of code, but then you really need it.

~~~
pcwalton
Production compilers often don't have it either.

(Having to split up your code manually into separate .cpp [or what have you]
files doesn't qualify as incremental compilation—that's just separate
compilation. Incremental compilation is when the compiler automatically
figures out what needs to be recompiled, even when all you hand it is a big
blob of code that hasn't been manually split up by a human in any way.)

~~~
barrkel
The Delphi compiler has incremental compilation, but by your definition it
wouldn't qualify.

Delphi uses the source to figure out dependencies. That is, the compiler has
make logic, and traces through the 'uses' declarations to recursively discover
all the source and object files. It compares timestamps to discover which
object files are out of date and need to be recompiled. Thus, if you touch a
single file and do a compile, only a couple of files will be compiled - the
main program source and the modified file. This dependency logic is in the
compiler, not a separate make program.
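
That make-style logic can be sketched as follows — a minimal illustration of
timestamp-based dependency checking, with made-up module names and timestamps,
not Delphi's actual implementation:

```rust
use std::collections::HashMap;

// A unit (module) with the times its source and object file were last
// written, plus the units it depends on (discovered from `uses` clauses).
struct Unit {
    source_mtime: u64,
    object_mtime: u64,
    uses: Vec<&'static str>,
}

// A unit must be recompiled if its source is newer than its object file,
// or if any unit it transitively depends on must be recompiled.
fn needs_rebuild(name: &str, units: &HashMap<&str, Unit>) -> bool {
    let u = &units[name];
    u.source_mtime > u.object_mtime
        || u.uses.iter().any(|dep| needs_rebuild(dep, units))
}

fn main() {
    let mut units = HashMap::new();
    // "main" uses "utils"; only "utils" was touched after its last compile.
    units.insert("utils", Unit { source_mtime: 200, object_mtime: 100, uses: vec![] });
    units.insert("main", Unit { source_mtime: 50, object_mtime: 100, uses: vec!["utils"] });

    assert!(needs_rebuild("utils", &units)); // source newer than object
    assert!(needs_rebuild("main", &units));  // depends on a stale unit
    println!("both units need recompiling");
}
```

The key point is only that the compiler itself walks the dependency edges and
compares timestamps; no external make program is involved.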

If this isn't the compiler doing incremental compilation, I think we have a
terminology problem. Because this is what is meant by incremental compilation
as the term is applied to almost all production compilers, and if you persist
in making a distinction, you may be confusing more than clarifying. ISTM that
you're memoizing more parts of the compilation process than has historically
been done in incremental compilers. It may be clearer to use a more qualified
phrase to express this, rather than shift the generally accepted meaning of a
term.

~~~
Manishearth
I don't see where pcwalton's comment implies that the Delphi compiler's
approach to incremental compilation doesn't match his definition.

He explicitly says "Incremental compilation is when the compiler automatically
figures out what needs to be recompiled".

(He does say that production compilers don't have incremental compilation, but
that's not a matter of definition, and I read that to mean "usually don't")

~~~
coldtea
> _He explicitly says "Incremental compilation is when the compiler
> automatically figures out what needs to be recompiled"._

He also says that merely using the separate files to deduce this (which the
parent says is what Delphi does) is not incremental compilation.

~~~
Manishearth
It doesn't "merely use the separate files", it traverses the use statements.

There's no manual splitting of .cpp and .h files. No manual management of
dependencies.

~~~
barrkel
Rust is going down the path where recompilation is done on a finer granularity
than files. I was objecting to setting that as _the_ bar to meet for
incremental compilation. It's technically impressive, but it will only be a
win if the average compilation unit takes a noticeable amount of time to
compile (this was rarely the case with Delphi). It may well be the case for
Rust+LLVM.

~~~
Manishearth
Fair. I think Delphi counts, and I don't think that was the bar being set (but
it's easy to see how it could be read that way). The C++ header file model is
quite different in the amount of burden placed on the programmer over the
Delphi use-statement model.

------
harveywi
Does anyone know if there has been any work in formalizing incremental
compilation? It would be great to have some sort of theoretical framework to
use as a guide when implementing various incremental compilation strategies.

There is some (ongoing?) work on low-cost incremental computation by
differentiating lambda calculi [1] [2] [3], and it seems like an incremental
compiler could maybe be a good use case for it.

[1] http://www.informatik.uni-marburg.de/~pgiarrusso/ILC/

[2] http://bentnib.org/posts/2015-04-23-incremental-lambda-calculus-and-parametricity.html

[3] http://ps.informatik.uni-tuebingen.de/research/ilc/

------
Raphael_Amiard
Great blog post! One thing I wonder as a compiler head, though: what is the
granularity of this stuff? Is it per-file or per-function, or even per-block?
If it's finer than per-file, how do you do the tree diff? Do you parse whole
files and then do a node-by-node diff? Do you have an incremental parser?

If I were to implement incremental compilation, I'd start per-module, I
guess, because it's the atomic unit for a compiler, so it would be a lot
simpler. That's why I'm curious.

~~~
finnyspade
I can actually help you out on this one. I spent the summer hacking on the
Rust compiler for my internship. If I'm understanding the code correctly, this
is the enum of elements represented in the dependency graph.

https://manishearth.github.io/rust-internals-docs/rustc/dep_graph/enum.DepNode.html

Note that as per:

https://manishearth.github.io/rust-internals-docs/rustc/dep_graph/struct.DepGraph.html

The DepGraph that we actually care about is specialized to talk about DefIds.
In rustc, DefIds are attached to the following things:

https://manishearth.github.io/rust-internals-docs/rustc/hir/def/enum.Def.html

This means dependencies are analyzed down to local variables and uses. Even
beyond that, it tracks things such as borrow checks, lints, and more!
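
A heavily simplified sketch of how such a graph can drive invalidation — toy
node kinds and a hand-built edge list, nothing like rustc's real DepNode or
its red-green algorithm:

```rust
use std::collections::{HashMap, HashSet};

// Toy node kinds, loosely inspired by rustc's DepNode (not the real enum).
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum DepNode {
    Hir(&'static str),         // the parsed body of an item
    TypeCheck(&'static str),   // type-checking result for an item
    BorrowCheck(&'static str), // borrow-checking result for an item
}

// edges[n] = the nodes whose results were read while computing n.
// A node is dirty if it is the changed node or transitively depends on it;
// iterate to a fixed point over the edge list.
fn dirty_nodes(
    changed: DepNode,
    edges: &HashMap<DepNode, Vec<DepNode>>,
) -> HashSet<DepNode> {
    let mut dirty: HashSet<DepNode> = HashSet::new();
    dirty.insert(changed);
    loop {
        let before = dirty.len();
        for (&node, deps) in edges {
            if deps.iter().any(|d| dirty.contains(d)) {
                dirty.insert(node);
            }
        }
        if dirty.len() == before {
            return dirty;
        }
    }
}

fn main() {
    let mut edges = HashMap::new();
    edges.insert(DepNode::TypeCheck("foo"), vec![DepNode::Hir("foo")]);
    edges.insert(DepNode::BorrowCheck("foo"), vec![DepNode::TypeCheck("foo")]);
    edges.insert(DepNode::TypeCheck("bar"), vec![DepNode::Hir("bar")]);

    // Editing foo's body invalidates foo's analyses but leaves bar's cached.
    let dirty = dirty_nodes(DepNode::Hir("foo"), &edges);
    assert!(dirty.contains(&DepNode::BorrowCheck("foo")));
    assert!(!dirty.contains(&DepNode::TypeCheck("bar")));
    println!("{} nodes dirty", dirty.len()); // 3 nodes dirty
}
```

Everything not reachable from the changed node keeps its cached result, which
is exactly what makes the rebuild incremental.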

------
bla2
From an uninformed distance this looks a bit underwhelming: initial builds are
slower, incremental builds are not _that_ much faster, and build outputs are
no longer deterministic functions of just the build inputs (which is useful
for example for distributed build systems).

~~~
mtanski
The way Rust works today, you compile a whole crate together (all the files).
The side effect is that the compiler can optimize across the whole crate
rather than just what's in one file. You get an effect analogous to LTO (as
in C/C++). Conversely, if you now go to incremental builds, you lose the
opportunity for LTO-style optimizations. Thus you end up with worse runtime
performance with incremental compilation.

Caveats and fine details are missing all over the place; I'm focused on the
general point.

~~~
ngrilly
> Thus you end up with a worse runtime performance in incremental compilation.

Are you sure about this (in the current implementation)?

~~~
mtanski
The article itself says "[...] This is because incremental compilation splits
the code into smaller optimization units than a regular compilation session,
resulting in less time optimizing, but also in less efficient runtime code."

~~~
ngrilly
Thanks, you're right. I missed that.

------
_RPM
Since Rust compiles to native machine code (bytes), how does it calculate the
starting address of the code? For example, if there is a JMP instruction, JMP
most likely takes an address as an operand — doesn't the kernel determine the
starting address, or is the starting address the address returned by mmap?

~~~
fulafel
Modern amd64 code is position-independent, so the JMP address is relative.

On other architectures, it's the linker's job to convert symbolic addresses,
and it can choose all the addresses in one go, as it's the final stage
producing the executable.

~~~
_RPM
Relative to what, the start address? Does rust determine the start address
from the return value of `mmap`?

~~~
fulafel
Relative to the current instruction pointer aka program counter. You say
things like "JMP -100 bytes".
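
In other words, the encoded operand is a signed displacement; the CPU adds it
to the instruction pointer, so the absolute load address never appears in the
instruction. A sketch of the arithmetic (illustrative addresses only):

```rust
// A relative jump encodes a signed displacement; the CPU adds it to the
// address of the next instruction (x86 semantics) to get the target.
fn jump_target(next_instruction_addr: u64, displacement: i32) -> u64 {
    next_instruction_addr.wrapping_add(displacement as i64 as u64)
}

fn main() {
    // Wherever the code happens to be mapped, the same encoded bytes work:
    assert_eq!(jump_target(0x1000, -100), 0x1000 - 100);
    assert_eq!(jump_target(0x7f00_0000_1000, -100), 0x7f00_0000_1000 - 100);
    println!("relative jumps are load-address independent");
}
```

That load-address independence is why the kernel (or mmap) can place the code
anywhere without the instructions needing to be patched.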

------
flukus
How does this differ from what make files have been doing for decades?

~~~
pcwalton
You don't have to split up your code manually. You just write one Rust crate,
splitting your code into files on whatever granularity you like (with mutual
recursion fully supported), and the Rust compiler automatically does the rest.
You don't have to write tedious header files; the compiler figures out all the
dependencies automatically.
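
As a concrete example, mutually recursive modules within one crate need no
forward declarations at all (in a real project these would usually live in
separate files under `src/`, but the structure is the same):

```rust
// Two modules in one crate that call each other; no headers or forward
// declarations are needed -- the compiler resolves the dependencies itself.
mod even {
    pub fn is_even(n: u32) -> bool {
        if n == 0 { true } else { crate::odd::is_odd(n - 1) }
    }
}

mod odd {
    pub fn is_odd(n: u32) -> bool {
        if n == 0 { false } else { crate::even::is_even(n - 1) }
    }
}

fn main() {
    assert!(even::is_even(10));
    assert!(odd::is_odd(7));
    println!("mutual recursion across modules works");
}
```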

~~~
flukus
I don't split up my code manually for the compiler, I do it for the humans,
being compiler friendly is just a pleasant side effect. You can build java
files the same way without header files, worst case scenario is you have some
fat compilation units.

~~~
pcwalton
> I don't split up my code manually for the compiler, I do it for the humans,
> being compiler friendly is just a pleasant side effect.

There is no human benefit to having to copy and paste function signatures—and
worse, the bodies of functions you want to be inlined, including all
templates—into header files. There's also no benefit to having the compiler
parse hundreds of KB of header files over and over and over again.

> You can build java files the same way without header files, worst case
> scenario is you have some fat compilation units.

Java (production implementations used in practice) performs whole program
optimization via its JIT. It's nice for Java, but that isn't how Rust works.

~~~
flukus
I don't disagree on the header files, no sane language designed today would
include that.

> Java (production implementations used in practice) performs whole program
> optimization via its JIT. It's nice for Java, but that isn't how Rust works.

Whole program optimization isn't generally something you want on dev builds,
which is where incremental compilation is useful. For a production build why
wouldn't you do a full rebuild anyway?

~~~
pcwalton
> Whole program optimization isn't generally something you want on dev builds,
> which is where incremental compilation is useful.

By "whole program optimization" I also include things like generic/template
instantiation. Whenever you use, say, a HashMap, you have to recompile the
implementation of the HashMap specialized to the size, alignment, destructor,
etc. of types you're using with it.
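
Concretely — an illustrative sketch, not the compiler's internals — each
concrete type parameter produces its own specialized copy of the generic
code, and those copies have to be regenerated when their inputs change:

```rust
use std::collections::HashMap;
use std::mem::size_of_val;

// Each distinct (K, V) instantiation makes the compiler generate a separate,
// specialized copy of HashMap's code, tailored to the size, alignment, and
// drop glue of those exact types.
fn main() {
    let mut small: HashMap<u8, u8> = HashMap::new();
    let mut big: HashMap<u64, String> = HashMap::new();
    small.insert(1, 2);
    big.insert(1, "hello".to_string());

    // The entries the specialized code shuffles around differ in layout:
    assert_eq!(size_of_val(&(1u8, 2u8)), 2);
    assert!(size_of_val(&(1u64, String::new())) > 2);
    println!("two instantiations, two layouts");
}
```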

> For a production build why wouldn't you do a full rebuild anyway?

When I profile apps written in Rust, it's important for me to be able to get
good turnaround time on optimized builds.

~~~
enitihas
Do Java generics work like that? I thought the JVM did not know about the
generics as the types are erased at compile time.

~~~
eonwe
The JVM doesn't know about generic types at runtime. There's no
specialization of HashMaps; it's just arrays of pointers to objects all
around.

With project Valhalla:
https://en.wikipedia.org/wiki/Project_Valhalla_(Java_language)
there's been talk about specialization for value types.

~~~
pjmlp
It is already available for those that want to play with it.

https://adoptopenjdk.gitbooks.io/adoptopenjdk-getting-started-kit/content/en/openjdk-projects/valhalla.html

------
adelarsq
Up to 15% faster or more?! =)

~~~
sho_hn
There's still a bunch of compiler phases they're not caching yet, so it's a
very very early result.

~~~
kibwen
If you click on the asterisk next to that heading, you'll see that the 15%
figure is a joke reference to an XKCD. :P In practice, it wouldn't make sense
for us to say "incremental compilation makes us X% faster" because the benefit
will vary wildly by workload.

