
Ask HN: Why is compiling C/C++ so slow compared to Java? - karussell
Every big C/C++ project I've seen takes ages to compile. At least regarding build time Java seems to be a lot faster. What is the exact reason?
======
wglb
One of the reasons is that any reasonably interesting C++ program has lots of
classes, and quite frequently, each class is in a separate file. Which
actually means two files, one for the .cpp and another for the .h.

One implication of this is that if one of the header files is changed, any cpp
file that includes that header, directly or through another header, is going
to need recompilation.
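
As a rough illustration of that cascade, here is a minimal sketch that walks an include graph and collects every .cpp file affected by one changed header. The `IncludeGraph` type, the `filesToRecompile` name and the file names are made up for the example; a real build tool tracks this from compiler-generated dependency information rather than a hand-written map.

```cpp
#include <cstddef>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Maps each file to the files it directly #includes (hypothetical layout).
typedef std::map<std::string, std::vector<std::string> > IncludeGraph;

// Returns every .cpp file that includes `changed` directly or through a
// chain of headers, i.e. the files a build tool would have to recompile.
std::set<std::string> filesToRecompile(const IncludeGraph& includes,
                                       const std::string& changed) {
    // Invert the graph: for each file, which files include it?
    std::map<std::string, std::vector<std::string> > includedBy;
    for (IncludeGraph::const_iterator it = includes.begin();
         it != includes.end(); ++it) {
        for (std::size_t i = 0; i < it->second.size(); ++i) {
            includedBy[it->second[i]].push_back(it->first);
        }
    }

    // Breadth-first walk from the changed file to everything that sees it.
    std::set<std::string> visited;
    std::queue<std::string> pending;
    visited.insert(changed);
    pending.push(changed);
    while (!pending.empty()) {
        const std::string current = pending.front();
        pending.pop();
        const std::vector<std::string>& users = includedBy[current];
        for (std::size_t i = 0; i < users.size(); ++i) {
            if (visited.insert(users[i]).second) {
                pending.push(users[i]);
            }
        }
    }

    // Only translation units (.cpp files) are compiled; headers are not.
    std::set<std::string> result;
    for (std::set<std::string>::const_iterator it = visited.begin();
         it != visited.end(); ++it) {
        if (it->size() >= 4 && it->compare(it->size() - 4, 4, ".cpp") == 0) {
            result.insert(*it);
        }
    }
    return result;
}
```

With a.cpp including a.h, and b.cpp including b.h which in turn includes a.h, a change to a.h puts both a.cpp and b.cpp in the result, even though b.cpp never mentions a.h itself.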

Further, lots more is happening during compile for C++ as compared to Java,
where more happens at run-time. Depending on their extent of usage, templates
can consume measurable compile time. Extensive optimization can cause a
measurable increase in compile time.

Also contrast the compile time you get with go, which is ridiculously faster
than C++ when measured by the edit-compile-link-run cycle.

I am now of the opinion that the C++ compilation model is broken, or more
charitably, has been shown to not have learned from the compilation models of
the following languages:

* python
* ruby
* go
* java and other jvm uses such as clojure
* and many others

despite what Herb says about how modern c++ is.

~~~
lunixbochs
Every language you listed (except Go, which tries very hard for the fast
compilation times) just compiles to bytecode and runs on a virtual machine
later. This can be considered a much simpler task than compiling to machine
language directly.

When you use a virtual machine, much of the machine code required to execute
the resulting program has already been compiled. Some languages/runtimes also
use a JIT to compile code at runtime instead of build-time.

~~~
chc
Why is it necessarily much simpler to generate bytecode than machine code?
Aren't they basically the same thing except one is run by a virtual machine
and the other by a real machine?

~~~
karussell
Yeah, would be interesting for me as well :) !!

------
dmlorenzetti
I'm going to assume you mean that when you update a project, it takes a long
time to recompile-- as opposed to the time it takes to compile from scratch
(although I would still be surprised if Java is faster on a comparable-sized
project).

You may have an include file that gets touched/changed a lot, and that shows
up (directly or indirectly) in most of your project files. Have a look at your
dependency tree and see if anything jumps out at you. Alternately, you can
"touch" a few files at a time manually, re-build, and try to track down a
culprit that way.

Also, take a look at John Lakos' "Large-Scale C++ Software Design", which
offers metrics and design guides for reducing dependencies among files in a
project.

Assuming you have no way to get around each source file including lots of
headers, many C/C++ build tools offer some kind of "pre-compiled header"
option, that you might look into.

Finally, check to see whether you're doing a very aggressive "release" build,
with lots of inter-procedural optimizations enabled. I find IPO at the link
stage takes a very long time. So compare your debug build options against your
release build options, and think about what the timing differences tell you.

~~~
karussell
> although I would still be surprised if Java is faster on a comparable-sized
> project

Presumably I should really do a side-by-side comparison for a project of the
same LOC, but I always had the subjective feeling that Java was faster.

> You may have an include file that gets touched/changed a lot

I meant more an initial clean & compile step

> "pre-compiled header" option

what do you mean here?

> Finally, check to see whether you're doing a very aggressive "release"
> build,

ok, this could be one cause, yes

~~~
dmlorenzetti
_what do you mean here? ("pre-compiled header")_

I don't know all the details, but consider that the compiler has to read the
header file, parse it, make a map of its contents (its declarations and
#defines and type definitions and so forth). And it has to do that over and
over, for every source file that includes that header.

So if the compiler stashes away a binary representation of the contents of
that header, then it can skip the slow bits of reading the text file and
figuring out its contents and all. It only has to do that the next time you
change the header.

 _edit: See <http://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html>_
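
To make that concrete, here is a small sketch of how it might look with GCC. The file names `common.h` and `words.cpp` and the `countWords` function are invented for the example, and the command in the comments follows the GCC page linked above; other compilers have their own switches for this.

```cpp
// common.h (hypothetical name): large, rarely edited headers in one place.
#ifndef COMMON_H
#define COMMON_H
#include <map>
#include <sstream>
#include <string>
#endif  // COMMON_H

// With GCC the header can be compiled once on its own:
//   g++ -x c++-header common.h -o common.h.gch
// A source file that starts with  #include "common.h"  can then pick up
// common.h.gch from the same directory instead of re-parsing the text.
// GCC only uses the .gch when certain conditions hold (see the linked docs).

// Ordinary code from a hypothetical words.cpp that relies on those headers.
std::map<std::string, int> countWords(const std::string& text) {
    std::map<std::string, int> counts;
    std::istringstream in(text);
    std::string word;
    while (in >> word) {
        ++counts[word];  // a missing key starts at 0
    }
    return counts;
}
```

The source code itself does not change; only the build step does, which is why this is cheap to try on an existing project.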

Java, I believe, essentially does this already, because it turns each .java
file into a .class file (or whatever, it's been a while). And that .class file
contains all the information that other java sources need in order to know
what's in that java file.

Fortran-90, a language that gets no love on Hacker News, has a nice module
facility that also enables this. Module information can get stored in a binary
form that other sources can look into without having to re-parse the original
text file.

Going back to C/C++, though-- if your root problem is that you have big header
files that appear everywhere and slow down the compilation of every module,
then you may want to think about reorganizing your code to reduce
dependencies. Pre-compiling the headers may remove one of your symptoms, but
not the real cause of the problem.
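
One common way to cut such dependencies, not something specific to this thread, is to keep implementation details out of the header with a forward-declared implementation struct (often called pimpl). A minimal sketch, assuming a made-up `Logger` class and showing what would normally be two files in a single listing:

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>  // in a real split, only logger.cpp would need this

// --- what would live in a hypothetical logger.h ---
class Logger {
public:
    Logger();
    ~Logger();  // defined in the .cpp, where Impl is a complete type
    void log(const std::string& message);
    std::size_t count() const;

private:
    struct Impl;                  // forward declaration only
    std::unique_ptr<Impl> impl_;  // users never see what is inside
};

// --- what would live in a hypothetical logger.cpp ---
struct Logger::Impl {
    std::vector<std::string> lines;
};

Logger::Logger() : impl_(new Impl) {}
Logger::~Logger() = default;

void Logger::log(const std::string& message) {
    impl_->lines.push_back(message);
}

std::size_t Logger::count() const {
    return impl_->lines.size();
}
```

Changing `Impl`, for example swapping the vector for another container, only touches logger.cpp, so files that include logger.h would not need to be recompiled. The price is an extra allocation and an extra pointer hop per call.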

~~~
karussell
Thanks for your explanation!

------
igorsyl
My experience is that slow-compiling projects tend to have bloated header
files. The C/C++ pre-processor needs to apply #define's and friends before it
touches source files. Java compilers don't suffer from this problem because
the Java spec doesn't have support for header files.

------
advisedwang
There are a few factors:

* Massive includes. Every compilation unit ends up including many header files. Most of these header files include more header files, often ones already included. Even if a header file has previously been included, it must be processed again, as C++ allows the meaning of including a file to differ on each inclusion. This results in a vast amount of necessary preprocessing.

* Compiling the same code multiple times. Every compilation unit must generate code for every template it uses, even if other compilation units are generating the same code. Only at link time are the multiple instantiations of the same code collapsed. As the STL is a massive blob of templates this is a frequent issue!
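
For the second point, C++11 lets you declare an explicit instantiation so that other compilation units are asked not to generate the code themselves. A minimal sketch, with made-up names (`sum`, `totalScore`) and with three hypothetical files shown in one listing:

```cpp
#include <vector>

// --- hypothetical sum.h ---
template <typename T>
T sum(const std::vector<T>& values) {
    T total = T();
    for (const T& v : values) {
        total += v;
    }
    return total;
}

// Explicit instantiation declaration: files that include this header are
// asked not to instantiate sum<int> themselves.
extern template int sum<int>(const std::vector<int>&);

// --- hypothetical sum.cpp ---
// Explicit instantiation definition: the one place sum<int> is generated.
template int sum<int>(const std::vector<int>&);

// --- hypothetical user.cpp ---
int totalScore(const std::vector<int>& scores) {
    return sum(scores);  // uses the instantiation from sum.cpp
}
```

This only helps for instantiations you can name in advance, such as `sum<int>` here, and the compiler still has to parse the template in every file that includes the header.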

------
st3fan
The problem is simply that C++ is a MUCH more complicated language than Java.
Just compare the grammars and look at all the crazy ambiguities of C++.

