
The C++ Build Process Explained - green7ea
https://github.com/green7ea/cpp-compilation/blob/master/README.md
======
wahern
> You could put a function's definition in every source file that needs it but
> that's a terrible idea since the definition has to be the same everywhere if
> you want anything to work. Instead of having the same definition everywhere,
> we put the definition in a common file and include it where it is necessary.
> This common file is what we known as a header.

In C and C++ parlance "definition" should be "declaration" and
"implementation" should be "definition".[1] The terminology is important if
you don't want to get confused when learning more about C and C++. This is
compounded by the fact that some languages describe these roles in the
author's original terms. (Perhaps the author's terminology reflects his own
confusion in this regard?)

[1] This is indisputable given the surrounding context, but I didn't want to
paste 3-4 whole paragraphs.

~~~
kccqzy
Yes, indeed.

In extremely old code, I sometimes see people preferring to manually write
declarations of functions (even libc functions) in every source file instead
of including a header.

To add to the confusion, in certain cases it is possible to put a function's
definition in a header file (for example if it's a function template, or it's
in an anonymous namespace, or the static keyword is used to indicate internal
linkage). In those cases it is possible to write the function definition
manually in every translation unit.

Otherwise, the ODR rule requires functions to be defined exactly once.

> One and only one definition of every non-inline function or variable that is
> odr-used is required to appear in the entire program (including any standard
> and user-defined libraries). The compiler is not required to diagnose this
> violation, but the behavior of the program that violates it is undefined.
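
For illustration, a minimal sketch of the header-definition cases mentioned
above (the header and function names are made up):

    // math_utils.h -- hypothetical header; these definitions may legally
    // appear in every translation unit that includes it.

    // Function template: each TU may instantiate it; duplicates are merged.
    template <typename T>
    T add(T a, T b) { return a + b; }

    // `static` gives internal linkage, so every TU gets its own private copy.
    static int next_id() {
        static int counter = 0;
        return ++counter;
    }

    // An anonymous namespace is the C++ spelling of internal linkage.
    namespace {
        int helper() { return 42; }
    }

    // By contrast, a plain non-inline, non-template definition here would
    // violate the ODR as soon as two .cpp files included this header.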

~~~
cm2187
As a non-C++ programmer, what I don't understand is why people need to write
header files by hand. Surely they could be auto-generated from the source
code?

~~~
bluGill
Remember you are writing your comment in 2018; what was feasible in 1975 was
very different. Programmers back then were smart enough to pull the job off,
but the computers they had were limited enough that it wasn't worth it. First,
because the computers were limited, programs had to be smaller, so the benefit
wasn't as great. Second, computers were slower, so it was worth spending extra
human effort once to save a lot of computer effort.

C++ has been working on modules to fix this for years now. It turns out to be
harder than you would think. Everyone agrees on the basic problem statement,
but there is disagreement on the details. All sides have good points in favor
of their approach, and there are some places where the approaches are
incompatible. Progress is being made (and a lot of the incompatibilities
turned out to be solvable, but it took years of thinking to come up with how).
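
For reference, a rough sketch of what the merged design ended up looking like
in C++20 modules (module and function names are made up, and the exact build
steps vary by compiler):

    // math.cppm -- module interface unit: one file plays the role of both
    // header and source, with no textual inclusion involved.
    export module math;

    export int add(int a, int b) {
        return a + b;
    }

    // main.cpp -- consumer: no include guards, no preprocessor state leaking
    // in from whatever happened to be included before.
    import math;

    int main() {
        return add(1, 2);
    }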

------
aidenn0
If anybody is curious about how templates work, there are many ways, but the
two most historically popular are:

Prelinker:

1\. Compile each file, noting which templates are needed in a section in the
object file

2\. Have a special program called the "prelinker" run before the linker that
reads each .o file and then somehow instantiates each template (which usually,
but not always, requires reparsing the C++ file)

Weak Symbols:

1\. When compiling, instantiate every needed template, but mark each
instantiation in the object file as being weak, so that the linker only keeps
one copy for each definition.

The prelinker used to be more popular: if you e.g. instantiate the same
template in every single file, your compiler does _tons_ more work with the
weak-symbols approach. _But_ now weak symbols are popular, both because they
are _much_ simpler to implement and because compilation is usually parallel
while linking typically is not, which means that wall-clock times may even be
faster.
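
A minimal sketch of the weak-symbols approach (file and template names
invented); every TU that uses the template emits its own copy, marked so the
linker keeps just one:

    // twice.h -- hypothetical header with a template
    template <typename T>
    T twice(T x) { return x + x; }

    // a.cpp
    //   #include "twice.h"
    //   int use_a() { return twice(1); }
    //
    // b.cpp
    //   #include "twice.h"
    //   int use_b() { return twice(2); }

    // Both a.o and b.o now contain an instantiation of twice<int>. With
    // GCC/Clang on Linux it shows up as a weak symbol, roughly:
    //   $ nm a.o | grep twice
    //   0000000000000000 W _Z5twiceIiET_S0_
    // The 'W' marks it weak, so the linker folds the duplicates into one.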

~~~
int_19h
Who actually used the prelinking implementation, aside from those very few
compilers that valiantly tried to support "export template"?

~~~
Quekid5
Nobody, AFAIK. Even with an explicit "export template", it's basically
impossible because of the interaction of all the features of the C, C++ and
CPP parts of the language. (Precompiled headers are/were a thing, but they're
very brittle. Of course, one assumes you already know this; I'm just trying to
provide additional exposition.)

People started to use templates for metaprogramming, and from that point on
the scope for "reuse" of templates isn't really there. (Reusing parsing
_might_ be plausible, but it's really difficult, because parsing is extremely
context-sensitive due to SFINAE, #defines, etc.)

Some might comment that "modules" is "export template" all over again, but
this time there are actually 2-3 implementations of 2-3 of the _proposals_ and
everyone is confident that the remaining minor problems can be resolved
satisfactorily... and they're all exchanging experiences to help each other!

~~~
roel_v
"Precompiled headers are/were are thing, but they're very brittle."

In compilers other than Visual Studio, yes (or so I'm mostly told - I don't
have all that much experience with them), but msvc has had them since at least
VS6 (late 1990's) when I first started using them, and they work very well and
have saved me many, many hours since then. Maybe once or twice I had to delete
the pch file in all that time, and that was most likely more of an issue of
the GUI mangling the saved internal state than of the actual compiler.

I've heard pushback against precompiled headers from Unix land for two
decades, and I'm not really sure where it comes from. I have the impression
it's mostly cognitive dissonance - 'msvc has it and gcc doesn't, therefore it
must be bad because gcc is 'better' than msvc'. It's similar to #pragma once -
in use, it's objectively better in every possible way than include guards are,
and gcc fanboys still dismissed it back when gcc didn't have it.

~~~
gpderetta
FWIW, gcc has had pragma once for ages. On the other hand, we recently had
issues with MSVC not recognizing that a header and its symlink were the same
file. GCC and clang had no problems.

There is a reason pragma once is not standardized: defining when two include
lines refer to the same file is extremely hard.

------
tiagoma
Watch this:
[https://www.youtube.com/watch?v=dOfucXtyEsU](https://www.youtube.com/watch?v=dOfucXtyEsU)

------
amelius
Aside, C++ would have been so much nicer without the need for header files ...

~~~
tjoff
In principle I agree. An IDE should be able to automatically generate
something like a header file regardless of language.

But still, no IDE or tool I've seen does this better than plain header files
do. A header file gives you a very nice overview, and keeping them up to date
really isn't any hassle to speak of.

It takes getting used to (as with everything when starting out with a new
language), but before you know it you might even start to miss them in other
languages.

~~~
pcwalton
Header files absolutely are a hassle to keep up to date. Besides, they're a
terrible experience. You have to pay attention to include order! You have to
write forward declarations! You have to write include guards, in 2018!

They're also bad as an overview. They may have been good in the 1980s, but
nowadays a proper documentation generator gives you better formatting, search
features, and cross-references. Inline documentation is particularly obnoxious
in header files: I like seeing function-level documentation alongside
interface _and_ implementation, but if I put the documentation in both the
.cpp and the .h file it's duplicated in two places and easily gets out of
date.

~~~
westmeal
Does #pragma once count as an include guard in your book?

~~~
jchb
#pragma once is not in any C++ standard, although basically all modern
compilers support it. The reason it is not in the standard is that its
behaviour in the presence of symlinks may differ depending on the filesystem
and compiler. Is it the unique combination of filename + content that should
be included once or is it the file path? etc.
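
For comparison, a minimal sketch of both idioms (header and macro names
invented):

    // widget.h -- classic include guard: portable and standard, but the macro
    // name has to be unique across the whole build, hence project prefixes.
    #ifndef MYPROJECT_WIDGET_H
    #define MYPROJECT_WIDGET_H

    class Widget {};

    #endif // MYPROJECT_WIDGET_H

    // gadget.h -- #pragma once: no macro to keep unique, but non-standard,
    // and it leaves "are these two paths the same file?" up to the compiler.
    #pragma once

    class Gadget {};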

~~~
josefx
> The reason it is not in the standard is that its behaviour in the presence
> of symlinks may differ depending on the filesystem and compiler

I have the feeling that the difference matters to almost nobody and should be
avoidable by the few people it actually affects. I had more issues with
colliding include guards than I ever had with symlinks, and I still end up
replacing CLASSNAME_H with PROJECT_CLASSNAME_H in our own headers every now
and then, since the autogenerated guards are too naive.

------
crumbshot
The section describing how a function call is made appears to be slightly
incorrect.

The return value of the `add` function, in most ABI definitions, would be
stored in a register. After that, the `main` function may then copy that value
to its own space it has reserved on the stack.

This is at odds with the description in the article, which seems to describe
`add` passing its return value to `main` via the stack.

(This is assuming no optimizations - all this would most likely be inlined
anyway, with no function call.)
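
A hypothetical reconstruction of the kind of example being discussed (the
article's actual code may differ); on x86-64 System V, the result travels
back in a register rather than on the stack:

    int add(int a, int b) {
        return a + b;          // the result is left in EAX for the caller
    }

    int main() {
        int sum = add(1, 2);   // main copies EAX into the stack slot for `sum`
        return sum;
    }

    // Unoptimized, the call site in main looks roughly like:
    //   call add(int, int)
    //   mov  DWORD PTR [rbp-4], eax   ; register result -> main's stack slot
    // With optimizations on, the whole call would most likely be inlined away.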

------
joker3
I've been looking for something like this to send to some C++ newbies. This is
almost what I need but not exactly. Is there something similar that explains
how libraries work?

~~~
aidenn0
Libraries are just a bundle of .o files (e.g. run "ar t /usr/lib/x86_64-linux-
gnu/libpython2.7.a" on an ubuntu with devtools installed and you'll see all
the .o files that are in libpython2.7).

The special thing about them though is that the linker will not include any .o
files for which no symbols are referenced.

You can think of the typical linker algorithm as follows:

If I am missing a symbol X, scan through all the not-yet-used objects in each
library until you find it, then include that .o file. Now check if any symbols
are still missing and repeat until done.

~~~
colanderman
> Now check if any symbols are missing again and repeat until done.

That properly describes linking using --start-group/--end-group. Without
those flags, the process is closer to "look in the first .a file for
definitions of all currently undefined symbols; then look in the next .a file
for definitions of remaining undefined symbols; etc.". The difference becomes
apparent if you link chains of libraries in the wrong order, or if you have
cyclic dependencies between libraries; normally they will not be resolved
unless you use the grouping flags. (But really, you should avoid making such
cycles in the first place!)
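
A tiny sketch of the ordering effect (all file and library names invented):

    // b.cpp -> libb.a : provides b_func
    void b_func() {}

    // a.cpp -> liba.a : needs b_func
    void b_func();
    void a_func() { b_func(); }

    // main.cpp : needs a_func
    void a_func();
    int main() { a_func(); }

    // Hypothetical build with GNU ld semantics:
    //   g++ -c a.cpp b.cpp main.cpp
    //   ar rcs liba.a a.o
    //   ar rcs libb.a b.o
    //
    //   g++ main.o -L. -la -lb   # works: liba.a pulls in a.o, which leaves
    //                            # b_func undefined; libb.a then satisfies it
    //   g++ main.o -L. -lb -la   # fails: when libb.a is scanned, nothing
    //                            # needs b_func yet, so b.o is skipped
    //   g++ main.o -L. -Wl,--start-group -lb -la -Wl,--end-group  # either order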

~~~
aidenn0
Thanks for the correction. I was posting that from my phone without any
references in front of me.

------
roel_v
"Basically, the compiler has a state which can be modified by these
directives. Since every _.c file is treated independently, every_.c file that
is being compiled has its own state. The headers that are included modify that
file's state. The pre-processor works at a string level and replaces the tags
in the source file by the result of basic functions based on the state of the
compiler."

I almost sort of get what the author means here, but then I don't really. I
mean, there is no 'state' for the compiler that is modified by preprocessor
directives, so this is probably an analogy or simplification he's making here,
but I don't really understand how he gets to the mental image of 'compiler
state'. Why not just say it like it is: the preprocessor generates long-assed
.i (or whatever) 'files' in memory before the actual compiler compiles them,
and their content can differ between compilation units, because preprocessor
conditions might vary between compilation units?

~~~
mbel
It's neither an analogy nor a simplification. When he speaks about "compiler
state" he means "preprocessor symbol table state". When the preprocessor
processes a file, its state is mutated -- symbols get defined, redefined or
undefined.

What you propose as a replacement (an in-memory file) does not provide any
insight into why the same file preprocessed twice may end up looking
different, or why the order of included files matters.

~~~
roel_v
Well in that case, I guess it's a definition thing. When I teach C++, I find
it much more useful to make a clear separation between 'preprocessor' and
'compiler', and not make the preprocessor part of the compiler and then make
the... uh... 'actual compiler' also part of the compiler.

When you take the preprocessed state of a compilation unit, by having the
preprocessor write it out to disk, and show someone the effects of passing one
or another -D flag, or of changing the order of includes, that directly and
concretely shows what is going on. And then this preprocessed file is passed
on to the actual compiler. There is a clear separation between stages, easy to
understand, and useful to boot when the time comes that you have to debug an
issue related to it and want to look at the preprocessed file to see what's
going on.
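
Something along these lines is what I mean (hypothetical header and names,
made up for the example):

    // log.h -- header whose expansion depends on preprocessor state
    #ifdef ENABLE_LOGGING
    #define LOG(msg) log_message(msg)
    #else
    #define LOG(msg) ((void)0)
    #endif

    // widget.cpp
    #include "log.h"
    void log_message(const char *);
    void frob() { LOG("frobbing"); }

    // Dump the preprocessed translation unit and compare:
    //   g++ -E widget.cpp                  > plain.i    # LOG -> ((void)0)
    //   g++ -E -DENABLE_LOGGING widget.cpp > logging.i  # LOG -> log_message(...)
    // Same source text, different output, purely because of the state the
    // preprocessor was in when it reached the macro.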

~~~
mbel
> I find it much more useful to make a clear separation between 'preprocessor'
> and 'compiler'

Yes, I absolutely agree. And to be honest, I cannot imagine explaining how the
preprocessor works without describing it as a separate entity.

------
marco_craveiro
Bit of a side-question, but somewhat related. Is anyone working on "whole
program compilation"? I don't mean whole program optimisation, I mean an
attempt to read _all_ files for a given target in memory at the same time and
then generate all translation units in one go (all in memory? and maybe
linking them in memory too?). Clearly, there would be caveats (strange header
inclusion techniques relying on macros to modify text of include files would
break, gigantic use of memory and so forth), but for those willing to take the
risk, presumably this should result in faster builds right?

In fact, ideally you'd even generate _all_ binaries for a project in one go
but that may be taking it a step too far :-)

At any rate, I searched google scholar for any experience reports on this and
found nothing. It must have a more technical name I guess...

~~~
Khoth
The terms you're looking for are "single compilation unit" or "unity build".

It's used sometimes, I think mostly to help the compiler optimise better.

Build times for a full rebuild may be faster, but may not, since traditional
builds can use many CPU cores. However, it stops incremental builds from
working - if you modify one source file, you have to recompile everything.
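
A minimal sketch of what a unity build looks like in practice (file names
invented):

    // unity.cpp -- the only file handed to the compiler; it textually pulls
    // in every other source file, so the whole program is one translation
    // unit.
    #include "lexer.cpp"
    #include "parser.cpp"
    #include "codegen.cpp"
    #include "main.cpp"

    // Built with a single invocation, e.g.:
    //   g++ -O2 unity.cpp -o app
    // Touching any one of the included .cpp files means recompiling them all.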

~~~
eequah9L
Or a compile server / incremental compilation. Tom Tromey worked on supporting
this in GCC years ago, and blogged about the roadblocks that he met on the
way. I don't remember the details, but eventually the project was abandoned.

It might still be interesting to read through this stuff--throw "tromey gcc
compile server" at a search engine and see what comes up.

------
Const-me
I don’t think that article’s accurate. At least not anymore. Modern C++
compilers do less while compiling, and much more while linking. This allows
them to inline more stuff and apply some other optimizations.

VC++ calls that thing “Link-time Code Generation”:
[https://docs.microsoft.com/en-us/cpp/build/reference/ltcg-link-time-code-generation?view=vs-2017](https://docs.microsoft.com/en-us/cpp/build/reference/ltcg-link-time-code-generation?view=vs-2017)

LLVM calls it “Link Time Optimization”, pretty similar:
[http://llvm.org/docs/LinkTimeOptimization.html](http://llvm.org/docs/LinkTimeOptimization.html)
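
As a rough illustration of how that is driven from the command line with
GCC/Clang (file names invented; see the linked docs for the real details):

    // a.cpp
    int square(int x) { return x * x; }

    // main.cpp
    int square(int x);
    int main() { return square(3); }

    // Traditional build: the compiler never sees both files at once, so
    // square() cannot be inlined into main().
    //   g++ -O2 -c a.cpp main.cpp && g++ a.o main.o
    //
    // LTO build: each .o also carries the compiler's intermediate
    // representation, and cross-file inlining happens at link time.
    //   g++ -O2 -flto -c a.cpp main.cpp && g++ -O2 -flto a.o main.o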

~~~
gpderetta
LTO is still an opt-in thing though. I suspect that most projects still don't
use it.

------
leni536
I just watched Matt Godbolt's recent talk about the linking process[1]. It's a
pretty good talk.

[1]
[https://www.youtube.com/watch?v=dOfucXtyEsU](https://www.youtube.com/watch?v=dOfucXtyEsU)

------
IloveHN84
I hope someday the build and linking process could be standardized, but I
don't believe it will happen, because many members of the committee come from
Microsoft, Google and other tech giants who want to sell their compilers (or
give them away for free, but still).

There are too many interests at stake, and standardization would kill many of
them.

