
Are Headers in C++ the Problem? - bibyte
https://buckaroo.pm/posts/are-headers-really-the-problem/
======
msla
C++'s refusal to maintain compatibility with C while scrupulously maintaining
compatibility with some C _practices_ is the problem.

They modified the type system so that good, idiomatic C code won't compile,
but they refuse to make writing unsafe code any safer in any meaningful way.

They added new syntax and reserved words, again breaking C compatibility, but
they refuse to clean up C syntax even to the extent of removing the bizarre
octal notation which makes 0100 do something deeply, deeply unexpected to
anyone who doesn't know some obscure bits of lore.
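
For anyone who hasn't been bitten yet, a minimal sketch of the trap:

    #include <iostream>

    int main() {
        int n = 0100;           // a leading zero means octal, so this is 64...
        std::cout << n << "\n"; // ...and this prints 64, not 100
    }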

Finally, they moved a bunch of real complexity to the compile phase, something
C largely doesn't do, but they refuse to replace the file-based system they
inherited from C with anything which would make C++ compiles even a little bit
faster.

~~~
htfy96
Your stance reminds me of this post:

C++ Modules - a chance to clean up the language?
https://www.reddit.com/r/cpp/comments/agcw7d/c_modules_a_chance_to_clean_up_the_language/

In short, C++ is a company-driven language instead of a community-driven one
like Python and Rust. Some C practices are maintained because voters from the
committee agree that _these_ practices are necessary (for their code) and it's
impossible to make a completely backward-incompatible change when they have
code bases of over 1M LOC.

~~~
msla
I think there's an underlying concept here.

C is three things:

1. A mid-level language, by which I mean a language where you do all of the
mechanics but you abstract most of the machine-specific details. Memory
management is manual, but you don't know or care about how malloc() works
behind the scenes, and you can't even ask. You get integers with defined
semantics up to things like overflow/wraparound, but you don't know or care if
they're implemented in terms of multiple machine words or even if there's such
a thing as a carry flag. It's midway between a macro assembler and, say,
Python, which does nontrivial magic behind the scenes.

2. An unsafe language, with nontrivial undefined behavior (the
overflow/wraparound stuff I mentioned above) and no guardrails like the ones
Java provides, where integer behavior is fully defined and things like
out-of-bounds accesses raise an exception. In C, very little checking code of
that type is emitted or specified. (See the sketch after this list.)

3. A systems programming language, where programmers do unsafe things, like
writing values to DMA registers, things the language can't abstract away
because... well... you have to write an OS kernel in _something_ , after all.
This is unsafe by design, as opposed to the above, where things are unsafe
because of a safety/speed trade-off.
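
To make point 2 concrete, a minimal sketch of that kind of undefined behavior:

    // Signed overflow is undefined behavior in C and C++, so the compiler may
    // assume it never happens. With optimizations on, compilers like GCC and
    // Clang typically compile this whole function to "return false;", even
    // though x + 1 wraps at runtime when x == INT_MAX.
    bool will_wrap(int x) {
        return x + 1 < x;
    }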

Rust proponents want to separate 2 from 1 and 3, and make a language where you
can do things by hand and do unsafe things on purpose, but with more
guardrails to prevent you from doing unsafe crap by accident.

C++ apparently wants to separate 1 from 2 and 3, to move more high-level and
get more "language magic" (templates, iteration stuff... ) without making the
language any safer in _any_ respect. That's just an uncomfortable position for
a language to be in.

My point is, it could move Rust-ward _and_ high-level-ward if it ditched some
C-isms from the language... but you gave a good explanation of why it won't.

~~~
floatingatoll
The interest in making _languages_ safer, rather than in making _code written
in them_ safer, seems to have developed into a bigger factor recently than it
was in the 90s. I can’t personally think of any non-Rust languages that
prioritize operating safety over ease of coding. Perl lets you rewrite the
global interpreter at runtime, PHP keeps decades-old code running, Apple BASIC
offered PEEK and POKE for reading and writing arbitrary bytes anywhere the
memory controller could address. I’m sure some exist,
but my point is that we have been favoring expanded _capabilities_ for decades
and only now are we starting to favor expanded _safeties_. When taken in that
light, it makes perfect sense that C++ has always wanted to expand the
capabilities of C without caring about safeties. What else could have resulted
from that history?

~~~
steveklabnik
In some sense. I’m not sure when “langsec” started, but it’s certainly older
than Rust.

~~~
floatingatoll
While it is certainly true that langsec has been around since we invented
programming, Rust is the only viable language I’ve encountered in practical
use anywhere that prioritizes langsec over ease-of-delivery. The focus of C++
on features rather than safeties is echoed by almost every language invented
prior to Rust.

(I don’t know why Rust was the agent of change in that priority, but I’m very
glad for it!)

EDIT: COBOL is good at this, or tries to be. Lots of loopholes but it’s clear
from this article anyways that they _tried_ to make it a safe language. But
they prioritized features like “call arbitrary C functions” over safety, and
you can just write unsafe code without even a hint that you’re doomed. That’s
perhaps the essence of what I see as the difference: Rust forces you to
declare your unsafe intentions to do unsafe things.

https://www.kiuwan.com/blog/security-business-oriented-languages-cobol-rpg/

~~~
renox
Strange claim, have you ever heard about Ada?

~~~
floatingatoll
Yes.

------
sqrt17
The problem may not be headers themselves, but that there are so many ways to
split compilation and build artefacts in a project.

Java and (Turbo/Free) Pascal try to keep the number of ways of dividing the
material small: Java has class files (and Jar files, and modules), Turbo
Pascal has Units. Both make a certain namespace correspond to some file the
compiler can find and compile, after which you use the definitions from the
compiled file.

Python and JavaScript/TypeScript have modules as namespaces that correspond to
one source file - if you import a symbol from a module "bar/foo", the compiler
will look in "bar/foo.js" (or a dozen places in node_modules, or in a dozen
eggs), and if foo.js doesn't have that symbol you know something is wrong
instead of wondering if there are other files that contribute to that
namespace.

Lisp and C/C++ have a tradition that goes back far enough that compilation
units and namespaces (packages in Common Lisp) are disjoint things. C++ is on
the path to making this _more_ complicated with the addition of modules, which
create yet another partitioning of the compiled artefacts without forcing the
others (headers, namespaces, source files) to agree with it.
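
A minimal sketch of that disjointness, with hypothetical file names:

    // a.cpp -- contributes to namespace util
    namespace util { int parse(const char* s) { return s[0]; } }

    // b.cpp -- a different translation unit, same namespace
    namespace util { int format(int n) { return n + 1; } }

    // Nothing ties "namespace util" to any file: given a call to
    // util::format, the compiler has no way to deduce which header or
    // source file defines it.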

So headers are not the problem - the problem is that for a given entity in the
source code (namespace, class, function, whatever), it's not automatically
clear to the compiler where to look for it. That leads to the funky ball of
dependencies between source files being variously specified by hand in (auto-,
C-)make files or extracted automatically, and still being error-prone.

~~~
Waterluvian
I think you hit on what I've felt about learning Python and then trying to
learn C++.

In python, all code paths are "statically discoverable". Meaning if a symbol
exists in a file, I can find where that symbol comes from in that file. And if
it's an import, the import tells me where to look for it in another library or
relative path.

In C++, when I was trying to learn, it was a lot of "okay, so why is 'foo'
available here?" "Because it's imported by the 'bar' library that you're
importing at the top of the header."

~~~
beagle3
> In python, all code paths are "statically discoverable".

That's not exactly true. Modules can monkey-patch themselves and other
modules; they can override the import mechanism to do all sorts of things.
It's usually considered bad form, but... it's possible, and some people like
it that way.

~~~
Waterluvian
Yes, you're right. But in my first five years of learning I never ran into
that in conventional code, and where I did, it was made painfully obvious by
comments.

Practically speaking, when it comes to learning by reading code, it's
invaluable.

------
Quekid5
The problem with headers (in C++) isn't #include. It's the bleed-through of
macros (and pragmas, I guess) from headers into subsequent headers. Without
that, precompiled headers would always be an obvious win.

Macros are side effects and there's really no sane way to constrain them
(currently), but C++ modules are an attempt to do so.
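
A minimal sketch of that bleed-through (the min/max macros from windows.h are
the classic offenders):

    // evil.h
    #define min(a, b) ((a) < (b) ? (a) : (b))

    // user.cpp -- whether this compiles depends on include order
    #include "evil.h"
    #include <algorithm>

    int smallest(int x, int y) {
        // The preprocessor sees "min(" and rewrites this call into
        // std::((x) < (y) ? (x) : (y)), a syntax error, because function-like
        // macros ignore namespaces. evil.h bleeds into every later header
        // in the translation unit the same way.
        return std::min(x, y);
    }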

~~~
usefulcat
In my experience, precompiled headers (as implemented by g++ at least) don't
help as much as one might hope. I suspect that they only cache the parsed AST,
as opposed to generated code. I wouldn't be surprised to find that generating
optimized code is a lot more computationally intensive than parsing.

~~~
oppositelock
For complex projects, precompiled headers are massive, and sometimes slow you
down!

In several big C++ source bases that I've worked with, we reversed the process
- at compile time, concatenate all the source files together, include
everything you need just once. It's massively faster than precompiled headers,
at the cost of sometimes running out of compiler heap, and inability to
parallelize builds. The loss of parallelism was offset by the much faster
compilation, though.
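
The approach looks roughly like this minimal sketch (hypothetical file names):

    // unity.cpp -- the only file handed to the compiler. Every header pulled
    // in by the sources below is parsed once, instead of once per translation
    // unit, at the cost of one giant serial compile.
    #include "lexer.cpp"
    #include "parser.cpp"
    #include "codegen.cpp"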

------
adgasf
Fixing this is one of the killer features of the "Blaze" family of build
systems (Buck, Bazel, Pants, Please), since they do not defer the actual build
execution to another program (such as Make or Ninja).

Response from a CMake dev:

> Well, by the time CMake could discover -MM flags, the build has already been
> written and CMake (the program) is out of the picture. Linking to a CMake
> target is also not just "add this library to your link line" either, so a
> simple response file written somewhere during the build for the linker to
> use is not sufficient (nevermind that this file may be updated by any TU
> compilation rule in a library target, something build tools tend not to like
> too much). I guess combination configure/generate build tools can do this,
> but CMake is a build generator and does not execute the build at all.

https://www.reddit.com/r/cpp/comments/auyl07/are_headers_really_the_problem/ehdm36y/

~~~
yongjik
...Is it? Maybe blaze/bazel changed a lot in the past several years, but I
remember it didn't even auto-import headers. You have to manually specify
_every header file_ in your dependency list. (To be precise, for every header
you include, your dependency must include some BUILD target that contains that
header.)

So you don't ever get "undefined symbol"... you instead get "cannot include
header"! Not sure if that's an improvement. :/

~~~
adgasf
The header lists are specified either as a list of directories (Bazel) or
using globs (Buck). You must have been using quite an old version!

Yes, it is a big improvement. The error message tells you exactly what you
failed to do. If a target does not export all of its headers correctly, then
fixing that fixes it for everyone consuming that target, which scales well to
large code-bases.

It is not a coincidence that so many companies (Google, Facebook, Amazon,
Twitter, Thought Machine) have converged on this design.

------
pjc50
> If you include header X then you must also link to the library target that X
> belongs to

Visual Studio's solution: #pragma comment(lib, "foo"). That's that problem
solved. This is not the big problem.

The big problem is really a mix of three entangled problems:

- Textual substitution by macros with cross-file scope: the behaviour of an
included file depends entirely on the files that came before it.

- Object memory layout and API are unnecessarily bound together, so you can't
change so much as a single function argument without completely recompiling
all dependent code.

- A number of things (mostly templates, but some strings too) are duplicated
hugely because they're defined in headers, and then de-duplicated at link
time. So compilation ends up disk-bound while the compiler writes out lots of
data that will be thrown away.

So, you can change one byte in a header high up the dependency chain and
recompile most of your code. Slowly.
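
A minimal sketch of the second problem, with a hypothetical class:

    // widget.h
    class Widget {
    public:
        void draw();   // the API callers actually depend on
    private:
        int cache_;    // an implementation detail, yet adding or removing it
                       // changes sizeof(Widget), so every translation unit
                       // that includes widget.h has to be recompiled
    };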

~~~
JoshTriplett
> Visual Studio's solution: #pragma comment(lib, "foo"). That's that problem
> solved.

Not even close to that easy; platform differences make it a pain to find
libraries consistently. (Windows tends to make it most difficult, with Apple
close behind for a few libraries.)

~~~
pjc50
Yeah, that gets specified "outside the language" in the compiler arguments for
link path and include path.

------
misnome
Patient: "Doctor Doctor, it hurts when I do this" Doctor: "Don't do that"

This is arguing that because there are (caveated, external, third-party) ways
to manage a problem, there isn't a problem. Yes, there are ways to mitigate
some of the problems with headers, but all that bookkeeping is part of the
criticism.

And there are still duplicate definitions in separate places to keep in sync,
you end up having to move all your code into headers if you want to make
things generic (with compile-time repercussions), and it's easy to mismatch a
header's version against the library it belongs to.

Linking to the libX library for X.h is the least problematic of the header
problems/criticisms.

------
jacinabox
IMO what is needed is a preprocessor whose reference semantics match the
modern notion of modules, i.e. one that considers a reference defined when
it's defined in a module imported by the current one, instead of "before this
point in the translation unit." Then it would be enough to store the list of
defined symbols in a precompiled header and reuse it wherever that module is
imported by another.

------
keithnz
I think headers aren't "the" problem, and I kind of want to say they are "a"
problem, but they aren't really even that. I'll settle for saying they can be
really annoying. They are quite a powerful mechanism, but they can be
difficult to understand when you have complex header hierarchies (as often
found in embedded systems), and they introduce a bit more friction when
coding.

------
zwieback
I think they are a good fit for C and C++. Headers allow more fine-grained
control of module inter-dependency and are great for targeting different
processor architectures. Java, C#, etc. don't have to deal with that to the
same degree, so there's no direct comparison.
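
A sketch of the architecture-targeting use, with hypothetical file names:

    // atomics.h -- one include, with the header mechanism selecting the
    // architecture-specific implementation at compile time
    #if defined(__aarch64__)
      #include "atomics_arm64.h"
    #elif defined(__x86_64__)
      #include "atomics_x86_64.h"
    #else
      #include "atomics_generic.h"
    #endif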

~~~
mikepurvis
Java has the Foo/IFoo pattern. It's not a direct equivalence, but if you want
to separate interface from implementation, you have that option.

~~~
chrisseaton
> Java has the Foo/IFoo pattern.

Are you thinking of C#?

~~~
Quekid5
What? There's no real difference except spelling (aka syntax and convention).

------
kccqzy
Well, there are plenty of header-only libraries that put their entire code in
headers, so you don't need to link to anything. You can do this in C too, but
it's much more popular in C++ because of templates. I'm surprised the author
doesn't mention this.
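
A minimal sketch of the pattern, with a hypothetical utility:

    // clamp.h -- a header-only "library": the entire implementation lives in
    // the header, so there is no separate .cpp to compile and no library to
    // link against.
    #pragma once

    template <typename T>
    T clamp_to(T value, T lo, T hi) {
        return value < lo ? lo : (hi < value ? hi : value);
    }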

~~~
adgasf
This blows up compilation times, so it's not really an option for larger
projects.

------
coldtea
> _Headers are not necessarily evil._

This doesn't demonstrate that headers are not bad, just that there's an
involved, error-prone, half-manual workaround to "automate" their discovery
and use.

------
blackflame7000
Idk, if you inline some functions in the header and then rely on the linker
to automatically pick the right version, you could be in for some
hard-to-find bugs.
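
Presumably that's the one-definition-rule trap; a minimal sketch, with
hypothetical names:

    // scale.h -- included by a.cpp (compiled with -DUSE_FAST_PATH) and by
    // b.cpp (compiled without). The two translation units now contain
    // different definitions of the same inline function, violating the
    // one-definition rule. The linker silently keeps one copy, so calls
    // from b.cpp may end up running the a.cpp version, with no error.
    inline int scale(int x) {
    #ifdef USE_FAST_PATH
        return x * 3;
    #else
        return x * 2;
    #endif
    }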

------
jokoon
Aren't modules supposed to improve the situation?

------
ycombonator
> In summary: Headers are not necessarily evil.

The OP's headline seems like clickbait.

------
warmwaffles
Going to invoke Betteridge's Law here and say no.

~~~
wtetzner
I don't know if they're _the_ problem, but they're _a_ problem.

