
Killing the C++ Modules Technical Specification - jahnu
https://izzys.casa/posts/millennials-are-killing-the-modules-ts.html
======
wyldfire
> Effectively, without support for build tools, modules are effectively dead
> in the water.

...

> "We'll make the build systems understand".

This seems to be a critical point that I agree with: you can't divorce these
two; their fates must be bound to one another. Hopefully the build systems can
be easily modified to accommodate.

Rust community folks might be wrong about C++, but they're right about Rust.
Rust had the luxury of not being bound to any legacy specifications, and it
wasn't unreasonable for them to write their own build + dependency/package
management tool. The net effect of having done that is that, independent of
all the other great things Rust brings, much of the tedium of developing with
C/C++ is gone.

Make no mistake, C++ standard committee: the bar is high and Rust has set the
bar.

~~~
pjmlp
> the bar is high and Rust has set the bar.

Modules were introduced as a concept in the 1970s, with Mesa being one of the
first languages to use them, followed by CLU and ML....

The issue with the lack of modules in C++ was that Bjarne made the compromise
that C++ should fit into the C toolchains used by AT&T, and later by all C
shops.

EDIT: I get that Rust is quite cool, but maybe a bit of programming language
research is also welcome.

~~~
humanrebar
I think "the bar" is talking about languages a developer might write her next
application in. Mesa and CLU aren't on the list. OCaml _might_ be, depending.

~~~
pjmlp
Except that Rust modules (which are the subject here) are anything but
innovative; plenty of languages since Mesa, CLU and ML have come into the
world with equally good or better module systems.

~~~
humanrebar
Who invented modules first is moot when you're picking the language to write
your next project in. Just like who invented the assembly line is moot when
you are shopping for your next car. "The bar" for a car isn't Ford because
Henry Ford was innovative. People don't really care about any of that when
evaluating their next choice.

~~~
pjmlp
So what has Rust brought to modules to set the bar that isn't already
available in other systems programming languages, with the exception of C and
C++?

~~~
jacobush
Ok, I am a bit harsh here, but for me there ARE no other systems languages
except C, C++ and recently maybe Rust. I am also told that in distant lands
with strange and foreign customs, Ada is a systems language. (I don't know -
no Ada missionaries have yet set foot on our shares and preached it to us.)

~~~
singingboyo
>no Ada missionaries have yet set foot on our shares

I'm not sure if 'shares' is a typo or not, but it fits great with my
impression of why companies choose the languages they do sometimes: everyone
else uses it (it's the local culture/custom), and nothing/no one has forced
them to change (by lowering their profits/share value).

~~~
jacobush
It was a typo, should have been "shores". However, it was a brilliant typo -
the powers that be have decided on Greenhill and ClearCase. And the people who
decide these things seem to be closer to the share holders than they are to us
coders, so Ada missionaries on shares is maybe what it takes.

------
mattnewport
I'm very familiar with C++, passingly familiar with the modules proposal, and
unfamiliar with Rust, and this article leaves me puzzled about exactly what
the author expects/wants a modules proposal to do instead.

My perspective as a C++ developer has been that modules have one primary
purpose, which is to improve build times by eliminating the need to parse huge
amounts of C++ in recursively included header files when compiling a
compilation unit. If that's all a modules proposal accomplishes I'm fine with
that. A modules proposal that doesn't accomplish that is useless.

Build tools generally already parse .cpp files for includes to figure out the
header file dependencies of a .cpp file, and they have to combine the
extracted filenames with information from the build system on include search
paths to turn those into dependencies on specific concrete files. That
involves information that lives in the build system and not in the code.
Modules will require something similar from build tools, but as with parsing
includes, the rules about imports (they must come before actual code and must
be alone on a line) mean that parsing out the imports is trivial and cheap
compared to recursively parsing the full code of header dependencies.
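
For context, a rough sketch of what the Modules TS syntax looks like (the file
names here are arbitrary, which is part of the debate; nothing in the TS ties
the module name to the file name):

    // math.cxx -- a module interface unit (any file name works)
    export module math;
    export int add(int a, int b) { return a + b; }
    
    // main.cpp -- a consumer; the import appears before other declarations
    import math;
    int main() { return add(2, 3); }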

So I guess I don't quite understand the problem. Modules are an improvement
over headers for build times but they are not going to do everything that
module systems perhaps do in some other languages because they have to be
adopted incrementally into the many different existing build systems used in
C++ code bases (which live outside the standard). Perhaps my ignorance of Rust
means I'm missing what the author wants in C++ that's different from the
current proposal.

~~~
zach43
I’m not very familiar with C++ build tools which parse source code for header
dependencies, but I know that C++ compilers (the preprocessor specifically) do
this. From what I know of cmake and gmake, neither of these build tools
automatically parses C++ files to find dependent header files; both require
manual specification of the include directories and include paths.

~~~
comex
They require manual specification of include paths, but not of what .cpp files
depend on what .h files. However, at least on Unix, this is _not_ typically
done by having the build tool parse C++ files. Rather, while compiling a given
.cpp file, the compiler is instructed (using -MD or other variants starting
with -M) to output a .d file listing all the header files it encounters as it
goes, which are then treated as dependencies. The output format is a subset of
the Makefile language with just file dependencies (like “a.cpp : b.h”) rather
than compilation rules; thus, anything that uses make as a backend (including
autotools, cmake in that mode, etc.) can just have the Makefile ‘include’ all
.d files, and make will do the right thing. (Alternative backend build tools,
such as ninja, have to parse the .d files themselves, typically supporting
only that subset rather than reimplementing all of make!) You don’t have them
for the first compile, but that’s okay: the dependencies are only used to skip
recompiling files if none of their dependencies changed.
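
For concreteness, a minimal sketch of that flow (the header names are made
up):

    $ g++ -MMD -MF main.d -c main.cpp -o main.o
    $ cat main.d
    main.o: main.cpp widget.h util.h
    
    # in the Makefile, pull in whatever .d files exist from previous builds
    -include $(OBJS:.o=.d)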

This design replaced an earlier one where the user had to manually run ‘make
depend’, which _would_ pre-scan for dependencies using a separate tool,
makedepend(1). First ‘gcc -M’ was added as a straight replacement of
makedepend, then -MD and variants were added to write dependency files while
compiling each object file rather than requiring a separate invocation (which
is slower).

But it sounds like modules will complicate things, since it could be mandatory
to gather dependency info upfront in order to compile modules before files
that depend on them...

------
gpderetta
"The compiler must not become a build system" does not prevent the compiler
generating local dependency info. In fact this is already possible today as
many compilers (preprocessors really) are capable of generating header
dependency info.

You will need a similar mode where the compiler does only minimal parsing of
the source file and generates a dep file containing the list of modules
defined in this file and the list of modules imported by this file.

The build system can then collect all these dep files and reconstruct the full
dependency graph. Given the right format, you could probably feed them
directly to make.
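
Purely hypothetically, such a dep file might look something like this (the
module names, file names, and .bmi extension are all invented; no such format
is specified anywhere):

    # emitted by a fast scan-only pass over engine.cpp
    graphics.engine.bmi: engine.cpp              # module defined in this file
    engine.o: graphics.core.bmi net.sockets.bmi  # modules imported by this file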

~~~
SAHChandler
(Hi, author of the article)

The issue is, "how do you find these dependent modules?". As of right now,
there is no way to find a module. It is not tied to a location in a
subdirectory, nor is the name of a module tied to the file itself. I'm fine if
we can get the dependency information, but how can we do that if there is no
guarantee for finding the actual dependencies? Leaving these decisions
unspecified by the C++ standard is dangerous. Having an important thing such
as "how does the name of a module map to the name of the translation unit it
contains?" be none of "undefined", "ill-formed", or "implementation defined"
is asking for trouble. And to be quite honest, I don't see how running the
compiler in a minimal parsing mode (once for each module), followed by running
the compiler a second time for its full IFC (or whatever file format is used)
and possible object file generation can be a good idea.

~~~
gpderetta
Running the compiler twice (first in preprocessor mode, then in actual
compilation mode) is exactly what is done now by many build systems. The first
pass, in addition to building dependencies, also generates the mapping from
files to modules. You will need to list all the files you need to 'preprocess'
in your build script to generate the dependencies, but that's what is done
today already, and I don't see a way out nor even a need to change. Modules
are not supposed to enforce a specific build strategy, nor should they.

The standard currently imposes very few requirements on the actual build
process; headers and source files do not even need to exist as actual files on
a system. So it would be a lot of work to actually introduce these concepts
into the standard, and it would risk delaying modules further.

That doesn't mean it might not be valuable to standardize that, but it is
another battle, and many (I, for example) will argue that a strict mapping
from file names to module names is wrong.

~~~
SAHChandler
Forgive me, but I can't think of a single build system where the compiler is
run on a source file twice. All of the ones I've seen either parse the .d file
generated by the compiler or parse /showIncludes output if using MSVC. If you
have any examples of build systems doing this, I'd love to see them so I can
avoid them.

~~~
dcohenp
I've seen at least one, but it's proprietary and internal to my bigco. I'm
willing to bet there's quite a few more at other bigcos. So lucky for you if
you can avoid them, but I'm pretty sure there's many of us who don't get that
choice.

------
humanrebar
> Granted, the Rust community is typically wrong about nearly everything when
> comparing their language to C++...

What percentage of the Rust community is a professional C++ developer, either
now or recently in the past? I was under the impression that "nearly all" was
a fair estimate.

If that's true, does that mean that the C++ community itself is "typically
wrong" about their language?

~~~
flohofwoe
It seems most Rust programmers are switching over from Python (but C and C++
are next, followed by Java and JavaScript; see near the end of the survey):
[https://blog.rust-lang.org/2016/06/30/State-of-Rust-Survey-2016.html](https://blog.rust-lang.org/2016/06/30/State-of-Rust-Survey-2016.html)

There is no "right" or "wrong" C++, there's also no single "C++ community".
I'd say that C++ isn't even a programming language, it's more like a meta-
language to build your own language.

Whether that's good or bad is up for discussion, but (a) it gives a lot of
freedom, (b) it invites tinkering, (c) it wastes a massive amount of time when
trying to communicate with C++ coders from other confessions, and (c) it makes
it hard to integrate C++ libraries into C++ projects.

edit: confession => 'denomination' seems to be the correct English term

~~~
Koshkin
> _a meta-language to build your own language_

All (good) programming languages are like that. In general, any API, any
library that you develop _is_ a language. I agree that C++ provides powerful
tools of abstraction that make these languages easier to use (compared to what
can be accomplished in C or Go, for example).

~~~
rtpg
My feeling has been that C++ has a special degree of complexity beyond most
languages. Many terms come up in the "day-to-day" that wouldn't come up in
other languages (lvalue/rvalue, move semantics), and so much can be overloaded
through operators.

I think the only other language with a similar feel is Scala. It's the
programming language equivalent of Magic: The Gathering: half the fun is just
in the mechanics of it all.

~~~
Koshkin
> _complexity beyond most languages_

I think the complexity of C++ is merely a (partial) reflection of the
complexity of programming. For example, the move semantics you mentioned are
not specific to C++, and I find it nice when a language offers an explicit
formalism for them.

------
simias
I wonder if C++ is headed for COBOL-ification. This article is very
interesting but doesn't really give any hope for success in implementing a
modern module system in C++.

I feel like the only way to have truly useful, simple and intuitive modules in
C++ would involve breaking a certain amount of backward compatibility and
forcing a few constraints (project layout, file naming scheme, build system,
...) on the user. But clearly that's not really in C++'s DNA, and you risk
ending up with something that's not quite C++ while at the same time not
having the simplicity and elegance of modern languages that have been designed
from scratch without all the baggage C++ carries.

C++ won't go away any time soon, that's for sure. But when I read articles
like TFA or, for instance,
[https://bitbashing.io/std-visit.html](https://bitbashing.io/std-visit.html),
my gut reaction is always that maybe, if you get frustrated with C++'s extreme
complexity, historical baggage and user-unfriendliness, you should consider
moving to something else instead of trying to bolt even more exotic features
onto a language which is already crumbling under the weight of all the
features and programming styles it already supports. I know I did.

As an example, I stumbled upon this std::visit link by perusing the top posts
of reddit's /r/cpp (linked by TFA); here's the discussion:
[https://www.reddit.com/r/cpp/comments/703k9k/stdvisit_is_eve...](https://www.reddit.com/r/cpp/comments/703k9k/stdvisit_is_everything_wrong_with_modern_c/)

There's an interesting thread about how:

    
    
        variant<string, int, bool> mySetting = "Hello!";
    

Doesn't do what one might think it should do. The variant ends up holding a
`true` boolean instead of a string. Why?

> char const* to bool is a standard conversion, but to std::string is a
> user-defined conversion. Standard conversion wins.

If you know C++'s history and its C heritage it makes sense but it sure as
hell doesn't make me want to use C++ for my next project.

~~~
humanrebar
C++ is pretty close to treating unqualified string literals the same as native
pointers. That is, they're there for backwards compatibility and for rare edge
cases, but generally discouraged. I predict that, given a few years, we'll
start adding checks to linters and static analyzers that warn about using them
because of the ambiguity and implicit conversion issues.

There is already a std::string literal. Better:

    
    
        MyVariant mySetting = "Hello!"s;
    

But std::string_view more closely models a string literal. std::string is more
of a string buffer. So even better:

    
    
        MyVariant mySetting = "Hello!"sv;
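
To spell the whole thing out (a sketch; the exact alternatives of MyVariant
are my assumption, and the suffixes need a using-declaration to be visible):

    #include <string>
    #include <string_view>
    #include <variant>
    using namespace std::literals;  // enables the "s" and "sv" suffixes
    
    using MyVariant = std::variant<std::string, std::string_view, int, bool>;
    
    MyVariant a = "Hello!"s;   // holds std::string
    MyVariant b = "Hello!"sv;  // holds std::string_view
    MyVariant c = "Hello!";    // still holds bool under the rules discussed above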

~~~
simias
That's good to know but it kind of proves my point, doesn't it? Do you know
many languages where literal strings come with a big warning sign saying
"probably not what you want, use this (rather opaque) alternative syntax
instead"?

The fact that pointers automatically coerce to bools makes some sense in C but
is an aberration for modern C++. It's just an example of a legacy "feature"
rearing its ugly head to sabotage a modern C++ construct.

If you look at the C++ guidelines out there, a lot of them are about what part
of C++ _not_ to use. Don't use raw pointers, don't use raw arrays, use
exceptions, don't use exceptions, maybe use exceptions but only in some
cases...

And a lot of the time the most intuitive notation, the one that goes all the
way back to C, is also the one you don't want to use. Don't use "Hello,
world!", use "Hello, world!"sv. What does the sv do exactly? Uh, we'll talk
about that in chapter 27 when we talk about user-defined literals. This is
right between the chapter about suffix return types and the one about variadic
templates.

~~~
smitherfield
_> The fact that pointers automatically coerce into bools makes some sense in
C but is an aberration for modern C++. It's just an example of a legacy
"feature" pointing its ugly head to sabotage a modern C++ construct._

This is a rather odd complaint; I can't think of any programming language
which has first-class support for nullable types, and where they aren't
truthy/falsy, at least in conditional contexts. It's a pretty straightforward
boilerplate-reducing idiom.

Did you mean to say builtin arrays coercing to pointers? (I'd agree that's
probably the biggest problem with C and, by extension, C++).

Or did you mean `NULL` being an `intptr_t` in C++? That's the very rare C++
misfeature that _doesn't_ come from C (where it's a `void*`), but at least
C++11 `nullptr` fixes that.

~~~
kibwen
Java is a language where nullable types aren't truthy/falsy. The following
code won't compile:

    
    
      String foo = null;
      if (foo) {  // error: incompatible types: String cannot be converted to boolean
          System.out.println("was true");
      } else {
          System.out.println("was false");
      }
    

If one changes `if (foo)` to `if (foo != null)`, then the code will compile.

~~~
smitherfield
Right. I haven't written any Java in a while so it slipped my mind.

~~~
pjmlp
The same applies to any language derived from the Algol lineage.

------
twoodfin
I love strongly worded technical arguments.

I am not deeply familiar with the Modules TS, but couldn't each environment
solve the module name <-> translation unit mapping problem in its own way? For
systems that can't keep reliable track of changes to the underlying
representation (i.e., files) that would mean a fairly strict naming scheme.

The mapping doesn't have to be maintained by the compiler. C/C++ already
depend on a suite of more-or-less independent tools to produce runnable
software. I don't see how this problem conceptually is much worse than what a
linker has to do.

~~~
humanrebar
> I don't see how this problem conceptually is much worse...

I'm asking the same question to myself. There's a bit in the article about it
being _slower_ than the status quo, but I'd like to see actual benchmarks
before making pronouncements. If that hypothesis is true, then surely the TS
will fail before it makes it into C++20 proper.

------
slavik81
All I know is that C++ build times are painfully slow for large projects. Even
if you spend a lot of time and effort on making them as fast as possible,
they're still not good.

#include just results in a tremendous amount of code in each and every source
file. Even looking at a 6 line program, we see that after the preprocessor
runs, the compiler is given 18,162 lines to process.

    
    
        $ cat > main.cpp <<EOF
        #include <iostream>
        int main() {
          std::cout << "Hello World!" << std::endl;
          return 0;
        }
        EOF
        $ gcc -E main.cpp | wc -l
        18162
    

That is still nearly instantaneous, but it only gets worse from there, and it
happens to every source file in the project. I just took a look at a 1,848
line source file I'm working with, and it expands out to 147,054 lines after
preprocessing. The current state of things is awful and modules are by far the
most important feature for the future of C++. We _need_ module support in our
build tools and we need them to be _fast_.

Should the TS be delayed to see if the Clang idea is better? I'm curious to
know what they're doing, but that's a lot to ask for. The TS is effectively a
beta, and pushing back its start means less time in beta before the C++20
standard is finalized. There's a lot of work that's ultimately going to be
built on top of this so getting it right is vital, but the clock is ticking.

~~~
pjmlp
CppCon 2016 has some Google presentations about how they use clang modules
(aka module maps).

"Deploying C++ modules to 100s of millions of lines of code"

[https://www.youtube.com/watch?v=dHFNpBfemDI](https://www.youtube.com/watch?v=dHFNpBfemDI)

"There and Back Again: An Incremental C++ Modules Design"

[https://www.youtube.com/watch?v=h1E-XyxqJRE&t=922s](https://www.youtube.com/watch?v=h1E-XyxqJRE&t=922s)

------
makecheck
Perhaps it’s made complex by searching for perfection when really you just
need something that makes a high percentage of builds better.

You’d be crazy to create a final release of something without starting from a
“clean” state, which means they DO NOT need a module implementation with 100%
accuracy in all of C++’s asinine corner cases. I wouldn’t trust that accuracy
even if they claimed it, I’d “clean” anyway.

This means that compilation speedup just has to ensure that _MOST_ incremental
rebuilds perform better without adding insane development overhead (such as
having to respecify things).

If 60% of the time my incremental rebuilds are faster, I’d say that is more
than enough to justify a C++1x release. They need to just move forward.

------
cousin_it
Forget C++ for a moment. In your dream language, if one source file uses
functionality from another, where should that fact be specified?

1) In the source file

2) In the makefile

3) In both

It seems to me that the only sane option is (1) with a project-wide or system-
wide set of "roots". Why is anyone in favor of (2) or (3)?

~~~
Koshkin
> _dream language... source file... makefile..._

I am dreaming of a programming system that takes advantage of modern developer
workstations, which are very powerful and thus can support more sophisticated
design and build tools than a text editor and make. (IDEs provide only an
incremental improvement on top of these.)

~~~
majewsky
Please _do_ design it. Whenever someone does "visual programming" or something
like that, it usually fails for anything beyond toy examples.

The only counterexample that I can think of is LabVIEW, which is pretty
successful with (or rather, despite) its visual editing. (Its main selling
point is device compatibility, AFAIK.)

So if you have a design for a visual (or, more generally, "beyond-text")
programming system in your head that's any good, please let it out. I'd like
to see it. (And until then, I'm sticking to vim.)

~~~
Koshkin
No, I am not a huge believer in visual programming, either. Text is (still) a
better medium for formal specifications, which is evident from the wide use of
HDLs rather than visual tools in hardware design.

That said, nothing prevents anyone from having (admittedly rather vague) ideas
about using a computer as something more than a glorified typewriter.

------
rwmj
They should really look at how OCaml modules work, since they solve all the
same problems already, with desirable properties like separate compilation and
easily discoverable mapping of module name to source file.

The drawback of OCaml modules is that all dependencies must form an acyclic
graph, which can be a pain sometimes although there are well understood
workarounds.

~~~
DoofusOfDeath
> The drawback of OCaml modules is that all dependencies must form an acyclic
> graph, which can be a pain sometimes although there are well understood
> workarounds.

Can you elaborate on that?

It's hard to imagine a sane build system where the dependency graph contains
cycles.

~~~
lomnakkus
Usually, you just declare the module interface separately from the
implementation. I don't recall there being any way to "work around" e.g.
declaring mutually recursive _types_ in two different modules[1]. (And I don't
think that's in any way sane, FWIW.)

[1] Of course you can parametrize one of the modules with a signature declared
elsewhere and have the other module implement that signature, but that's
really just regular old dependency-breaking, so I'm not sure that counts.

------
jmgao
This seems reminiscent of export templates, which were a resounding success:
[http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2003/n1426.pdf](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2003/n1426.pdf)

------
opk
I'm glad to see this point being made, because it covers exactly what worried
me when I first saw this C++ modules specification. At work, I maintain the
build system for some mixed C++ and Fortran 90 code, and preprocessor includes
really are a case of worse-is-better in my view. Generating build dependencies
for Fortran 90 modules is a horrible mess and can easily get broken. And I
can't really see that modules solve many real problems.

------
Const-me
I’m not aware of the most recent developments, but MS has had very module-like
features in their version of C++ for decades already.

For classic C++ they have the #import directive [1]. Recently, for Windows
Store/UWP platforms, C++ can consume Windows Runtime components [2].

Two things that allow these features to work are a standardized ABI and a
standardized type info format.

I don’t think high-level modules are possible in C++ before ABI and type info
are both standardized in a compiler-neutral way. Better yet, in a
language-neutral way, like MS did: you can implement a COM object in a script
language as a WSC file, and consume it from C++ using #import.
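
Roughly, usage looks like this (the component and interface names here are
entirely made up; #import generates the wrapper headers and smart-pointer
typedefs at compile time):

    // MSVC-specific: generates .tlh/.tli wrappers from the type library
    #import "MyComponent.tlb" no_namespace
    #include <objbase.h>
    
    int main() {
        CoInitialize(nullptr);
        IMyThingPtr thing(__uuidof(MyThing));  // smart pointer from #import
        thing->DoSomething();
        CoUninitialize();
        return 0;
    }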

[1] [https://msdn.microsoft.com/en-us/library/8etzzkb6.aspx](https://msdn.microsoft.com/en-us/library/8etzzkb6.aspx)

[2] [https://docs.microsoft.com/en-us/cpp/windows/how-to-activate-and-use-a-windows-runtime-component-using-wrl](https://docs.microsoft.com/en-us/cpp/windows/how-to-activate-and-use-a-windows-runtime-component-using-wrl)

------
danblick
I'm confused; isn't the solution to this problem "gcc -M"?

------
slezyr
> Build tools will parse source files over time and keep track of which
> modules are where and which translation unit needs what.

Isn't this what cmake does? IIRC it already parses the headers.

~~~
humanrebar
In vanilla cmake, you declare an ordered list of include directories.

But the point about "how does it work now?" is fair. I'm confused why a module
search path is any worse than an include search path.

~~~
lultimouomo
This is not related to the search path, but to determining which files need to
be rebuilt when a file changes.

The idea is that when you change a module, you might need to recompile all the
modules that import it, since its interface might have changed.

What I don't get in this criticism is that you have this exact problem right
now with headers, and it has been solved by the preprocessor outputting
dependency information (based on #include directives) for the build system to
read. Why can't the same be done with modules (of course it might not be the
preprocessor that does it, but the compiler or a specific tool)?

~~~
humanrebar
I think I'm catching on. With a '#include', you give a file name. With an
'import', you give a module name, which doesn't necessarily map to a file
name.

So the problem could be solved by saying 'import foobar;' maps to a file (on a
search path, maybe) named exactly "foobar". But then we need to work out how
subdirectories and module names map to each other. How do you import
"foo/bar.ixx"? "foo.bar"? "<foo/bar>"?

Though all that seems technically solvable to me. Maybe there's just
difficulty in designing this in the context of a committee.

------
namelost
I admit I know nothing about C++ modules proposals, but the post talks about
an "acyclic graph of translation units". Does this imply that modules can't
import one another?

What happens when two .cpp files want to do the equivalent of including each
other's header files?

------
yahna
dumb title.

nitpicking.

equating innocent, if terse, comments with threats.

Why am I reading what this person has to say?

------
_pmf_
Why do C++ developers make it so, so, so, so hard on themselves and the
unfortunate poor souls like me who have to interact with their products?

~~~
Koshkin
Perhaps they just want to share with everyone their experience of interacting
with the language of their choice?

(I am joking, but the complexity of the language may, in fact, be reflected in
the relative complexity of the APIs, say.)

