Killing the C++ Modules Technical Specification (izzys.casa)
128 points by jahnu 10 months ago | 132 comments

> Effectively, without support for build tools, modules are effectively dead in the water.


> "We'll make the build systems understand".

This seems to be a critical point that I agree with: you can't divorce these two, their fates must be bound to one another. Hopefully the build systems can be easily modified to accommodate.

Rust community folks might be wrong about C++, but they're right about Rust. Rust had the luxury of not being bound to any legacy specifications, and it wasn't unreasonable for them to write their own build and dependency/package management tool. The net effect of their having done that is that, independent of all the other great things Rust brings, much of the tedium of developing with C/C++ is gone.

Make no mistake, C++ standard committee: the bar is high and Rust has set the bar.

> the bar is high and Rust has set the bar.

Modules were introduced as a concept in the '70s, with Mesa being one of the first languages to use them, followed by CLU and ML...

The issue with the lack of modules in C++ was that Bjarne accepted the compromise that C++ should fit into the C toolchains used by AT&T, and later by all C shops.

EDIT: I get that Rust is quite cool, but maybe a bit of programming language research is also welcome.

I think "the bar" is talking about languages a developer might write her next application in. Mesa and CLU aren't on the list. OCaml might be, depending.

Except that Rust modules (which are the subject here) are anything but innovative; plenty of languages that came along after Mesa, CLU and ML have equally good or better module systems.

Who invented modules first is moot when you're picking the language to write your next project in. Just like who invented the assembly line is moot when you are shopping for your next car. "The bar" for a car isn't Ford because Henry Ford was innovative. People don't really care about any of that when evaluating their next choice.

So what has Rust brought to modules to set the bar that isn't already available in other systems programming languages, with the exception of C and C++?

Ok, I am being a bit harsh here, but for me there ARE no other systems languages except C, C++ and recently maybe Rust. I am also told that in distant lands with strange and foreign customs, Ada is a systems language. (I don't know - no Ada missionaries have yet set foot on our shares and preached it to us.)

>no Ada missionaries have yet set foot on our shares

I'm not sure if 'shares' is a typo or not, but it fits great with my impression of why companies choose the languages they do sometimes: everyone else uses it (it's the local culture/custom), and nothing/no one has forced them to change (by lowering their profits/share value).

It was a typo; it should have been "shores". However, it was a brilliant typo: the powers that be have decided on Green Hills and ClearCase. And the people who decide these things seem to be closer to the shareholders than they are to us coders, so Ada missionaries on shares is maybe what it takes.

You would do yourself a favour by brushing up your CS knowledge of the history of computing; you might get to learn some interesting stuff.

There are YouTube videos....

A really damn good execution.

Which in my mind matters more than all this pedantry.

In what sense do Rust modules provide a better execution over Ada packages?

If we define "the bar" as "people's expectations for what a usable module system looks like", you can set it without being novel at all.

Note that I didn't mention innovation or novelty or originality. I just said that Rust "set the bar." To respond to your objection I'd just clarify by qualifying it better and refer to the "systems programming" or whatever domain C/C++ and Rust both belong to.

So then check modules in Modula-2, Modula-2+, Modula-3, Ada, Oberon, Oberon-2, Oberon-07, Active Oberon, Component Pascal, Turbo Pascal.

All used in the context of systems programming.

Rust modules don't add anything new to them.

It adds an excellent packaging system and developer experience in the form of cargo, which is also a package manager for external modules. "The bar", I think, is taken to mean the combination of all of these features and their execution, not the idea of modules.

A packaging system is orthogonal to modules.

Those ideas are also present in Maven, Gradle, NuGet. All older than cargo.

And all trace back to CPAN as general concept.

I am still waiting for cargo to support binary libraries.

Rust has not brought much new in terms of modules. It just has decent modules support according to established best-practice. The innovations are primarily around the borrow-checker.

That I agree with.

There are plenty of widely-used languages that have modules and came well before rust.

You seem to be responding to a claim that Rust invented modules, but I think nobody said that.

Nothing in Rust is original from PL theory perspective. It's deliberately not a research language, but a practical one. Rust set the bar here merely by showing up and delivering a decent implementation (in a language that you can actually use today).

Rust sets the bar with the borrow checker and safer concurrency.

Everything in terms of safety and modules was present in languages like Ada and Modula-3.

The word "module" has very different meanings. When I read about 1ML [0], I stumbled upon this:

> A definition like this, where the choice of an implementation is dependent on dynamics, is entirely natural in object-oriented languages. Yet, it is not expressible with ordinary ML modules.

Here, ML modules are compared to classes (not to packages or modules as defined in Java, for example). This makes me wonder how often people talk past each other without noticing when they discuss modules.

[0] https://people.mpi-sws.org/~rossberg/1ml/1ml.pdf

True, but there is an actual definition.


There are plenty of CS books which offer a similar definition to Wikipedia.

For Java, class, package and (Java 9 Jigsaw) module all fit the Wikipedia definition of module.

We might have consensus about the definition, but it does not seem to be a useful, precise one. The C++ working group seems to have a similar problem. Things like "compilation unit" have a precise meaning, but they cannot find consensus on whether it should relate to their "module" or not.

The basic concept of module is having language support for code separation, with data/code hiding, compiler type checking across modules and optimally a binary format.

I think the issue is more with some C++ devs who never used other languages trying to grasp how modules fit into the translation unit + PCH model they know.

> the bar is high and Rust has set the bar

I don't think Rust deserves much praise for its module system. It is widely regarded as being extremely confusing even for pros. There are efforts underway to make it saner:



> It is widely regarded as being extremely confusing even for pros.

No, it's mostly considered to be unintuitive, not confusing for people who know it. You're unlikely to "just get" how it works, but once you learn it it's perfectly reasonable.

Edit: Note that the RFCs don't actually change how the module system works, just the syntax

I would like modules to fix the issues with the header inclusion model and translation units that are hundreds of thousands of lines long.

Anything else is mission creep at this point and risks delaying the effort further. Let's fix one problem at a time.

Weren't modules created by Herb Sutter at Microsoft to improve the VS201x IDE and compiler? If this is the case, it seems a big player in the tooling world is already on board.

No, Apple designed them for Objective-C initially, so some support was added to clang for all C languages, based on module maps.

Google improved that implementation for C++ and has a private implementation of it in use.

Gabriel dos Reis picked up some ideas from earlier work he had done together with Bjarne, for MSVC++.

The ongoing C++ Modules TS is to merge the ideas from both sides into what will become C++ Modules.

Hmm, I suppose I was talking about the specific work to integrate modules into the C++ standard. I recall hearing from Bjarne at a STAC talk he gave a year (or so) ago that he and Herb had been coordinating effort between MS and MSFT to use Modules in the Microsoft toolset to improve compiler performance.

The history of modules you provided is helpful though, thanks.

C++ has taken new features like auto and static if from D, not from Rust.

I would say for the C++ standard committee the bar is high and D has already set the bar.

I'm very familiar with C++, passingly familiar with the modules proposal and unfamiliar with Rust, and this article leaves me puzzled about exactly what the author expects / wants a modules proposal to do instead.

My perspective as a C++ developer has been that modules have one primary purpose, which is to improve build times by eliminating the need to parse huge amounts of C++ in recursively included header files when compiling a compilation unit. If that's all a modules proposal accomplishes I'm fine with that. A modules proposal that doesn't accomplish that is useless.

Build tools generally already parse .cpp files for includes to figure out a .cpp file's header dependencies, and they have to combine the extracted filenames with the build system's include search paths to turn them into dependencies on specific, concrete files. That involves information that lives in the build system, not in the code. Modules will require something similar from build tools, but, as with includes, the rules about imports (they must come before actual code and must be alone on a line) make parsing out the imports trivial and cheap compared to recursively parsing the full code of header dependencies.

So I guess I don't quite understand the problem. Modules are an improvement over headers for build times but they are not going to do everything that module systems perhaps do in some other languages because they have to be adopted incrementally into the many different existing build systems used in C++ code bases (which live outside the standard). Perhaps my ignorance of Rust means I'm missing what the author wants in C++ that's different from the current proposal.

The author is just trying to prove that C++ modules cannot be like Java or Python modules, which apparently some people don't get. My view is that C++ programmers already know this, since the language will not change suddenly just because modules have been introduced. The idea that the build system will take care of that is perfectly fine for normal C++ users.

>The author is just trying to prove that C++ modules cannot be like Java or Python modules, which apparently some people don't get.

I don't think he tries to do that, and I don't think it's true. The committee doesn't want to do that, but I don't see why it wouldn't be possible by extending the language.

Introduce a module keyword that works pretty much like the namespace keyword. Force the file name and path to match the module name, à la Java. Introduce a mechanism à la "using namespace" that imports the module's namespace (like #include header.h does) and tells the build system what to build. Neither the compiler nor the build system needs to become the other to do this, as long as we forbid preprocessor directives and if constexprs around that. Allow attributes to specify visibility.
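To make the "file name and path match the module name" idea concrete, here is a minimal sketch of such a mapping. The convention (dots become directory separators, a hypothetical `.cppm` extension, an ordered list of search roots) and the helper names are assumptions for illustration; neither the Modules TS nor any compiler mandates them. The set of existing files stands in for a real filesystem check.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical convention: module "net.http.client" lives in
// "<root>/net/http/client.cppm".
std::string module_to_relative_path(const std::string& module_name) {
    std::string path;
    for (char c : module_name) path += (c == '.') ? '/' : c;
    return path + ".cppm";
}

// Resolve a module against an ordered list of roots, the way include paths
// are searched today. `existing_files` stands in for the filesystem.
std::optional<std::string> resolve_module(const std::string& module_name,
                                          const std::vector<std::string>& roots,
                                          const std::set<std::string>& existing_files) {
    const std::string rel = module_to_relative_path(module_name);
    for (const auto& root : roots) {
        const std::string candidate = root + "/" + rel;
        if (existing_files.count(candidate)) return candidate;
    }
    return std::nullopt;  // not found: a build tool would report an error here
}
```

Resolution then mirrors include-path search: the first root containing the expected file wins.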

Of course it's going to be way more complicated, but I can't see how it would be impossible.

I think C++ devs just don't want that.

I’m not very familiar with C++ build tools which parse source code for header dependencies, but I know that C++ compilers (the preprocessor specifically) do this. From what I know of cmake and gmake, neither of these build tools automatically parses C++ files to find dependent header files; both require manual specification of the include directories and include paths.

They require manual specification of include paths, but not of which .cpp files depend on which .h files. However, at least on Unix, this is not typically done by having the build tool parse C++ files. Rather, while compiling a given .cpp file, the compiler is instructed (using -MD or other variants starting with -M) to output a .d file listing all the header files it encounters as it goes, which are then treated as dependencies. The output format is a subset of the Makefile language with just file dependencies (like “a.o: a.cpp b.h”, i.e. the object file depends on its source and headers) rather than compilation rules; thus, anything that uses make as a backend (including autotools, cmake in that mode, etc.) can just have the Makefile ‘include’ all .d files, and make will do the right thing. (Alternative backend build tools, such as ninja, have to parse the .d files themselves, typically supporting only that subset rather than reimplementing all of make!) You don’t have them for the first compile, but that’s okay: the dependencies are only used to skip recompiling files if none of their dependencies changed.
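The .d files are simple enough that a non-make backend can parse the subset it needs without a Makefile engine. The sketch below handles one rule per file with backslash-newline continuations and assumes Unix paths without spaces or colons; `DepRule` and `parse_depfile` are hypothetical names, not ninja's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one dependency rule.
struct DepRule {
    std::string target;                // e.g. "a.o"
    std::vector<std::string> prereqs;  // e.g. "a.cpp", "b.h"
};

// Parse the first rule of a gcc-style .d file, e.g. "a.o: a.cpp b.h c.h",
// where a long prerequisite list may be split across lines by a trailing
// backslash. Assumes Unix paths without spaces or colons.
DepRule parse_depfile(const std::string& text) {
    // Join continuation lines: each backslash-newline pair becomes a space.
    std::string joined;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\\' && i + 1 < text.size() && text[i + 1] == '\n') {
            joined += ' ';
            ++i;  // also skip the newline
        } else {
            joined += text[i];
        }
    }
    // Keep only the first logical line; later ones (such as the phony
    // "b.h:" rules that -MP adds) are ignored in this sketch.
    joined = joined.substr(0, joined.find('\n'));

    DepRule rule;
    const auto colon = joined.find(':');
    if (colon == std::string::npos) return rule;  // not a rule
    std::istringstream head(joined.substr(0, colon));
    head >> rule.target;
    std::istringstream tail(joined.substr(colon + 1));
    for (std::string dep; tail >> dep;) rule.prereqs.push_back(dep);
    return rule;
}
```

A backend would then rebuild `rule.target` whenever any file in `rule.prereqs` is newer than it.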

This design replaced an earlier one where the user had to manually run ‘make depend’, which would pre-scan for dependencies using a separate tool, makedepend(1). First ‘gcc -M’ was added as a straight replacement of makedepend, then -MD and variants were added to write dependency files while compiling each object file rather than requiring a separate invocation (which is slower).

But it sounds like modules will complicate things, since it could be mandatory to gather dependency info upfront in order to compile modules before files that depend on them...

Build pipelines I'm familiar with typically parse the includes from a .cpp file (and potentially recursively through the included headers) and resolve the includes to actual files based on include paths specified as part of the build configuration. Sometimes this information is cached in a .dep file or by some other mechanism. This information is then used to determine what actually needs to get built by an incremental build when a header changes.

The compiler is usually passed include paths to resolve includes when invoked. The build tool needs to track this just to enable minimal incremental builds.

"The compiler must not become a build system" does not prevent the compiler generating local dependency info. In fact this is already possible today as many compilers (preprocessors really) are capable of generating header dependency info.

You will need a similar mode where the compiler does only minimal parsing of the source file and generates a dep file containing the list of modules defined in this file and the list of modules imported by this file.

The build system can then collect all these dep files and reconstruct the full dependency graph. Given the right format, you could probably feed them directly to make.
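Assuming each dep file boils down to two lists, the modules a file defines and the modules it imports, reconstructing the graph and deriving a valid build order is a plain topological sort. The `ModuleDeps` record and `build_order` function are hypothetical; this sketch uses Kahn's algorithm and reports an unknown module or an import cycle instead of guessing.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// What a hypothetical minimal-parse pass might emit for one source file.
struct ModuleDeps {
    std::string file;
    std::vector<std::string> defines;  // modules this file provides
    std::vector<std::string> imports;  // modules this file needs
};

// Return the files in an order where every module is built before its
// importers (Kahn's algorithm). Throws on unknown modules or cycles.
std::vector<std::string> build_order(const std::vector<ModuleDeps>& units) {
    std::map<std::string, std::string> provider;  // module name -> file
    for (const auto& u : units)
        for (const auto& m : u.defines) provider[m] = u.file;

    std::map<std::string, std::set<std::string>> dependents;  // file -> files importing from it
    std::map<std::string, int> pending;                        // file -> unbuilt providers
    for (const auto& u : units) pending[u.file];               // every file gets an entry
    for (const auto& u : units) {
        for (const auto& m : u.imports) {
            const auto it = provider.find(m);
            if (it == provider.end()) throw std::runtime_error("unknown module: " + m);
            // Count each provider file once, even if several of its modules are imported.
            if (dependents[it->second].insert(u.file).second) ++pending[u.file];
        }
    }

    std::queue<std::string> ready;
    for (const auto& [file, count] : pending)
        if (count == 0) ready.push(file);

    std::vector<std::string> order;
    while (!ready.empty()) {
        const std::string f = ready.front();
        ready.pop();
        order.push_back(f);
        for (const auto& d : dependents[f])
            if (--pending[d] == 0) ready.push(d);
    }
    if (order.size() != pending.size()) throw std::runtime_error("import cycle detected");
    return order;
}
```

A build tool would run the minimal-parse pass over every listed file first, then compile module interfaces and their importers in this order.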

(Hi, author of the article)

The issue is, "how do you find these dependent modules?". As of right now, there is no way to find a module. It is not tied to a location in a subdirectory, nor is the name of a module tied to the file itself. I'm fine if we can get the dependency information, but how can we do that if there is no guarantee for finding the actual dependencies? Leaving these decisions unspecified by the C++ standard is dangerous. Having an important thing such as "how does the name of a module map to the name of the translation unit it contains?" be none of "undefined", "ill-formed", or "implementation defined" is asking for trouble. And to be quite honest, I don't see how running the compiler in a minimal parsing mode (once for each module), followed by running the compiler a second time for its full IFC (or whatever file format is used) and possible object file generation can be a good idea.

Running the compiler twice, first in preprocessor mode and then in actual compilation mode, is exactly what many build systems do now. The first pass, in addition to building dependencies, would also generate the mapping from files to modules. You will need to list all the files you need to 'preprocess' in your build script to generate the dependencies, but that's what is done today already, and I don't see a way out nor even a need to change. Modules are not supposed to enforce a specific build strategy, nor should they.

The standard currently imposes very few requirements on the actual build process (headers and source files do not even need to exist as actual files on a system), so it would be a lot of work to actually introduce these concepts into the standard, and it would risk delaying modules further.

That doesn't mean it might not be valuable to standardize that, but it is another battle, and many (I, for example) will argue that a strict mapping from file names to module names is wrong.

Forgive me, but I can't think of a single build system where the compiler is run on a source file twice. All of the ones I've seen support either parsing the .d file generated by a compiler, or parse /showIncludes if using MSVC. If you have any examples of build systems doing this, I'd love to see them so I can avoid them.

I've seen at least one, but it's proprietary and internal to my bigco. I'm willing to bet there's quite a few more at other bigcos. So lucky for you if you can avoid them, but I'm pretty sure there's many of us who don't get that choice.

> The build system can then collect all these dep files and reconstruct the full dependency graph.

And you actually want some tools to do a version of this, in particular IDEs and smarter text editors, and, at scale, maybe even automated build systems. Though all of these tools (which are neither compilers nor build scripts) will likely want to do things incrementally, if that's possible and beneficial enough.

This reminds me a little bit of the C++ compilers (HP's?) that would build a database of instantiated STL types to avoid rebuilding std::basic_string<char> for every translation unit.

In fact, starting with C++11, you can do this in your projects. Make a common header file with contents like this:

    extern template class std::basic_string<char>;
    extern template class std::vector<int>;
    // etc...

and a corresponding .cc file like this:

    template class std::basic_string<char>;
    template class std::vector<int>;
    // etc...

The template instantiations you list will be compiled in that .cc's object file. Anyone who includes the header will assume those templates were instantiated elsewhere. Net effect is the listed templates are only compiled once.

(Of course this doesn't alleviate the problem of parsing the template header files over and over.)

See the section on explicit instantiation here: http://en.cppreference.com/w/cpp/language/class_template
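A single-file sketch of the same technique, using a hypothetical program-defined template `Box` rather than standard containers. (Strictly speaking, the standard only lets a program explicitly instantiate a standard-library template when the declaration depends on a user-defined type, so `std::vector<MyType>` is fine while `std::vector<int>` technically is not, even though common implementations accept it.) In a real project the `extern template` lines would live in the shared header and the explicit instantiation definitions in exactly one .cc file; here both sit in one translation unit, which is allowed as long as the definition follows the declaration.

```cpp
#include <cassert>
#include <string>

// Hypothetical program-defined class template.
template <typename T>
struct Box {
    T value{};
    T get() const { return value; }  // defined in class, so implicitly inline
    void set(T v);                   // defined out of class, not inline
};

template <typename T>
void Box<T>::set(T v) { value = v; }

// --- would go in the shared header ---
// Explicit instantiation declarations: translation units that see these rely
// on a definition elsewhere instead of instantiating the non-inline members
// themselves (inline members may still be instantiated for inlining).
extern template struct Box<int>;
extern template struct Box<std::string>;

// --- would go in exactly one .cc file ---
// Explicit instantiation definitions: the code for these members is emitted here.
template struct Box<int>;
template struct Box<std::string>;
```

Every other translation unit that includes the header then links against the single copy of `Box<int>::set` and `Box<std::string>::set` emitted by that one .cc file.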

Now I'm wondering what is a good way to measure my project's top 100 redundantly instantiated templates.

  find build/ -name \*.o -exec nm -C \{} \; | egrep ' W .*<' | sort | uniq -c | sort -n

is what I use. This lists all templated, implicitly instantiated, non-inlined symbols in increasing order of number of uses. Note that templates internal to libstdc++ will also show up and you may want to instantiate these too.

Note that explicit instantiation won't decrease the size of your object files, only the compilation time. (By default, the linker throws away duplicate instantiations.)

gcc has -ftime-report which is kinda opaque but still interesting.

Doesn't this prevent a lot of inlining? Or perhaps the compiler is smart enough in most cases?

No change to inlining. The definitions are still in the headers, and the compiler will still eagerly inline them if they are defined in the class declaration (as is typical) or defined outside the class declaration and explicitly marked inline (less typical).

(Whether inlining of non-trivial methods is desirable is - and in fact recently was - a discussion for another HN story.)

> Granted, the Rust community is typically wrong about nearly everything when comparing their language to C++...

What percentage of the Rust community are professional C++ developers, either now or in the recent past? I was under the impression that "nearly all" was a fair estimate.

If that's true, does that mean that the C++ community itself is "typically wrong" about their language?

It seems most Rust programmers are switching over from Python (but C and C++ are next, followed by Java and Javascript, look near the end): https://blog.rust-lang.org/2016/06/30/State-of-Rust-Survey-2...

There is no "right" or "wrong" C++, there's also no single "C++ community". I'd say that C++ isn't even a programming language, it's more like a meta-language to build your own language.

Whether that's good or bad is up for discussion, but (a) it gives a lot of freedom, (b) it invites tinkering, (c) it wastes a massive amount of time when trying to communicate with C++ coders from other confessions, and (d) it makes it hard to integrate C++ libraries into C++ projects.

edit: confession => 'denomination' seems to be the correct English term

> a meta-language to build your own language

All (good) programming languages are like that. In general, any API, any library that you develop is a language. I agree that C++ provides powerful tools of abstraction that make these languages easier to use (compared to what can be accomplished in C or Go, for example).

My feeling has been that C++ has a special degree of complexity beyond most languages. Many terms come up in the "day-to-day" that wouldn't come up in other languages (lvalue/rvalue, move semantics), and so much can be redefined through operator overloading.

I think the only other language with a similar feel is Scala. It's the programming-language equivalent of Magic: The Gathering; half the fun is just in the mechanics of it all.

> complexity beyond most languages

I think complexity of C++ is merely a (partial) reflection of complexity of programming. For example, move semantics that you have mentioned is not specific to C++, and I find it nice when a language offers an explicit formalism for it.

Just like any other multi-paradigm language.

More like half. A large fraction of the Rust community is folks coming from non-systems languages who saw Rust as a gateway to systems.

But we have both, and most of the folks doing comparisons with C++ do tend to be actual C++ devs (because if you're coming from Python or Ruby and haven't done C++, you really... can't? compare with C++?). At least within the community, folks care pretty strongly about not misrepresenting other languages, so if stuff is inaccurate it gets called out; in most "Rust vs foo" blog posts I've only seen glaring mistakes a few times. Small mistakes are common, but those are common to any kind of post.

I took the comment in the post as a bit of hyperbole and probably referring to subjective bits that Rust+C++ & only-C++ programmers tend to disagree on.

In general most of the vocal Rust folks who talk about C++ talk about C++ because they have lived it.


It can get more subtle than that too, though. For example, see this recent discussion I had on this topic: https://www.reddit.com/r/programming/comments/72p472/why_did...

That post in particular was a subject of discussion at CppCon for a hot second and is partially why I mentioned the Rust thing. There are other things as well, but I don't want to rake people over the coals if it isn't necessary. I pay attention to the Rust subreddit, and I see enough misconceptions about C++ and how it works in various comments that it helped reinforce my choice to include that snarky statement. I do understand why so many C++ programmers at CppCon thought that Rust can be dismissed as a fad because of its community's understanding of C++. That said, this sort of parroting about how C++ works and why it is "bad" is everywhere, and you can even see it in the comments on this very submission (such as the mention of std::variant and implicit conversion to bool. That specific issue is easily remedied but wasn't part of the specification. It's been noted as a defect and will be fixed "soon"?)

I still chuckle when randos make arguments like "C++ can't be parsed with an LALR parser" or "virtual functions are more expensive than function pointers", as if the former were the end of the world and the latter a checkmate that will suddenly make my knees weak, arms heavy. There's code in my editor, and it's just spaghetti.

> I still chuckle when randos make arguments like "C++ can't be parsed with an LALR parser"

I'm not an expert on parsers, but I don't think an LALR parser can evaluate constexpr C++ functions, which is required for parsing C++.

E.g. in the following code, expression `A<f()>::a * u` will parse either as a variable declaration (int* u) or a multiplication (5 * 5) depending on the value returned by constexpr function f:

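    template<bool b> class A {};
    template<> struct A<true> { typedef int a; };            // A<true>::a names a type
    template<> struct A<false> { static const int a = 5; };  // A<false>::a names a value

    constexpr bool f() { return true; }  // flip to false to change the parse

    const int u = 5;

    // f() == true:  declares a local `int* u`.
    // f() == false: evaluates (and discards) 5 * 5.
    int main() { A<f()>::a * u; }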
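A self-contained way to observe both parses, reusing the `A` specializations from the example above with two hypothetical constexpr helpers, `yes()` and `no()`. Inside each function the same token sequence either declares a local `int* u` or evaluates a discarded `5 * 5`, and `decltype(u)` reveals which one the compiler chose. Expect unused-value and unused-variable warnings.

```cpp
#include <cassert>
#include <type_traits>

template <bool b> struct A {};
template <> struct A<true> { typedef int a; };           // A<true>::a is a type
template <> struct A<false> { static const int a = 5; }; // A<false>::a is a value

constexpr bool yes() { return true; }
constexpr bool no() { return false; }

const int u = 5;

bool parses_as_declaration_when_true() {
    A<yes()>::a * u;  // declares a local `int* u` that shadows the global
    return std::is_pointer<decltype(u)>::value;
}

bool parses_as_declaration_when_false() {
    A<no()>::a * u;   // multiplies 5 * 5 and discards the result
    return std::is_pointer<decltype(u)>::value;  // still the global const int
}
```

The parser cannot decide between a declaration and an expression until it has evaluated `yes()` or `no()` and instantiated the matching specialization, which is the point being made about LALR parsing.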

I think the OP's point is that it doesn't matter much.

Interesting, thanks. I asked in that thread because I legitimately wanted to know; I very much know that my C++ knowledge is ancient, and that many things are very different now. Always learning!

One cause of this phenomenon is both sides sometimes oversimplifying a complex situation in different directions.

When people talk about "C++", they may mean:

- "C++ as it theoretically could be when C++20 or so is used pervasively"

- "C++ as it can be when certain C++17 features are used pervasively"

- "C++ as it is often used in practice today"

(and this is still oversimplifying more nuanced realities).

It's very common for people to make observations about "C++" which are wrong in one sense and right in another.

This abrasive comment from the author really put me off the piece. It doesn't help that I first read berium's Reddit post, which disagrees [1]. He seems to be involved in build2, which has implemented support for Modules, and that earns more of my trust in this conversation.

Coming from the C++ world and being fairly knowledgeable about the language (not a full language lawyer but someone people usually turn to) and learning Rust, I find it rare for there to be bad comments from the Rust folks. I don't understand where this is coming from.

Now for my own judgemental comment. I see a trend of people in /r/cpp and other places ignoring the differences (like safety) or misrepresenting Rust.

To be clear, I fully accept people having differences in priority (how much safety needs to be compiler-enforced). It's people who don't recognize the difference that bother me.

Rust is also foreign enough that a cursory glance can be misleading, like assuming exceptions are the equivalent of panics because someone got caught up on them having a similar implementation (stack unwinding by default), when in reality Result is more what people should look at (it transports error information, the ?-operator propagates errors to the caller, etc.).

[1] https://www.reddit.com/r/cpp/comments/75eqal/millennials_are...

It does not follow. It could be that the subset of C++ programmers that are "typically wrong" about C++ are more likely to use rust.

I think the rest of the article elaborates on why I asked that. If only a subset of the C++ community is "typically wrong", that subset involves quite a lot of people, including those on the standards committee, at CppCon, and on /r/cpp.

> Granted, the Rust community is typically wrong about nearly everything when comparing their language to C++...

Well, where they're right is in having a modest and non-condescending attitude, which is something that C++ pundits should consider adopting.

You're generalizing about a group (C++ pundits) in the same way that the author generalizes about another group (the Rust community).

There are C++ experts who display modesty, just as there are members of the Rust community who display modesty. There is no need to try to retaliate against one person's words by hurling generalized insults against an entire group.

I wonder if C++ is headed for COBOL-ification. This article is very interesting but doesn't really give any hope for success in implementing a modern module system in C++.

I feel like the only way to have truly useful, simple and intuitive modules in C++ would involve breaking a certain amount of backward compatibility and force a few constraints (project layout, file naming scheme, build system, ...) on the user. But clearly that's not really in C++'s DNA and you risk ending up with something that's not quite C++ while at the same time not having the simplicity and elegance of modern languages that have been designed from scratch without all the baggage C++ carries.

C++ won't go away any time soon, that's for sure. But when I read articles like TFA or, for instance, https://bitbashing.io/std-visit.html, my gut reaction is always that maybe if you get frustrated with C++'s extreme complexity, historical baggage and user-unfriendliness, you should consider moving to something else instead of trying to bolt even more exotic features onto a language which is already crumbling under the weight of all the features and programming styles it already supports. I know I did.

As an example, I stumbled upon this std::visit link by perusing the top posts of Reddit's /r/cpp (linked by TFA); here's the discussion: https://www.reddit.com/r/cpp/comments/703k9k/stdvisit_is_eve...

There's an interesting thread about how:

    variant<string, int, bool> mySetting = "Hello!";

Doesn't do what one might think it should do. The variant ends up holding a `true` boolean instead of a string. Why?

>char const* to bool is a standard conversion, but to std::string is a user-defined conversion. Standard conversion wins.

If you know C++'s history and its C heritage it makes sense but it sure as hell doesn't make me want to use C++ for my next project.

C++ is pretty close to treating unqualified string literals the same as native pointers. That is, there for backwards compatibility and for rare edge cases, but generally discouraged. I predict that, given a few years, we'll start adding checks to linters and static analyzers to warn about using them because of the ambiguity and implicit conversion issues.

There is already a std::string literal. Better:

    MyVariant mySetting = "Hello!"s;

But std::string_view more closely models a string literal. std::string is more of a string buffer. So even better:

    MyVariant mySetting = "Hello!"sv;
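A small sketch of the suffix approach, assuming two hypothetical aliases: `StringSetting` holds `std::string`, and `ViewSetting` holds `std::string_view`. With a `std::string` alternative, the `sv` literal alone would not compile, because `std::string`'s constructor from a `std::string_view` is explicit, so the "even better" version presumes a variant that holds `std::string_view`. Also, as I understand it, the "holds `bool`" surprise is C++17 as originally specified; the converting-constructor fix adopted for C++20 (P0608) makes a plain `"Hello!"` select `std::string`, so the sketch avoids relying on that case.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <variant>

using namespace std::literals;  // brings in the "s" and "sv" suffixes

// Hypothetical setting types for illustration.
using StringSetting = std::variant<std::string, int, bool>;
using ViewSetting = std::variant<std::string_view, int, bool>;

// "Hello!"s is already a std::string, so the exact-match alternative wins.
StringSetting make_string_setting() { return "Hello!"s; }

// "Hello!"sv is a std::string_view over the literal's static storage,
// so nothing dangles after the function returns.
ViewSetting make_view_setting() { return "Hello!"sv; }
```

Either way, the suffix makes the intended alternative an exact match, so the pointer-to-`bool` conversion never gets a chance to compete.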

That's good to know but it kind of proves my point, doesn't it? Do you know many languages where literal strings come with a big warning sign saying "probably not what you want, use this (rather opaque) alternative syntax instead"?

The fact that pointers automatically coerce into bools makes some sense in C but is an aberration for modern C++. It's just an example of a legacy "feature" rearing its ugly head to sabotage a modern C++ construct.

If you look at C++ guidelines out there, a lot of it is about which parts of C++ not to use. Don't use raw pointers, don't use raw arrays, use exceptions, don't use exceptions, maybe use exceptions but only in some cases...

And a lot of the time the most intuitive notation, the one that goes all the way back to C, is also the one you don't want to use. Don't use "Hello, world!", use "Hello, world!"sv. What does the sv do exactly? Uh, we'll talk about that in chapter 27 when we talk about user-defined literals. This is right between the chapter about suffix return types and the one about variadic templates.

I agree with you, but as Bjarne points out in "The Design and Evolution of C++", without its 99% copy-paste compatibility with C, C with Classes would never have taken off inside AT&T.

It was his solution for not having to touch such a low level language, after being forced to exchange Simula for BCPL.

So yeah C++ is plagued with C compatibility, but it does provide enough tooling for anyone that cares about strong typing and safety.

Better alternatives are needed, but they will only get wide-scale adoption when a platform owner pushes them full speed ahead regardless of what devs think, like Apple is doing with Swift.

Hence why, for me, even if it isn't ideal, Java / .NET languages + C++ for low-level stuff is already kind of a sweet spot.

> Do you know many languages where literal strings come with a big warning sign saying "probably not what you want, use this (rather opaque) alternative syntax instead"?

Haskell? Python 2?

(Also, C/C++ string literals are perfectly safe to use: they're guaranteed-null-terminated immutable arrays that implicitly convert to `string` and `string_view`. It's more that it's safer in modern C++ for function parameters to be `string_view`/`string`/`const char(&)[N]` instead of `const char*`.)

In Haskell there is only one form of string literal. It looks like "this".

Yes, but to get a reasonable string implementation you need

  {-# LANGUAGE OverloadedStrings #-}

plus lots of other ceremony.

(There's only one form of string literal in C++ too; the "s" and "sv" suffixes shown above just call the standard library's literal operators, operator""s and operator""sv, rather than being a different kind of literal.)

Adding to your point, there are two string types you should use in Haskell (ByteString and Text) and neither of them is the type you get from an unadorned string literal (absent OverloadedStrings and some expectations).

What do you mean by "lots of other ceremony"?

> The fact that pointers automatically coerce into bools makes some sense in C but is an aberration for modern C++. It's just an example of a legacy "feature" pointing its ugly head to sabotage a modern C++ construct.

This is a rather odd complaint; I can't think of any programming language which has first-class support for nullable types, and where they aren't truthy/falsy, at least in conditional contexts. It's a pretty straightforward boilerplate-reducing idiom.

Did you mean to say builtin arrays coercing to pointers? (I'd agree that's probably the biggest problem with C and, by extension, C++).

Or did you mean `NULL` being an integer constant (typically `0` or `0L`) in C++? That's the very rare C++ misfeature that doesn't come from C (where it's usually defined as `((void*)0)`), but at least C++11 `nullptr` fixes that.

Java is a language where nullable types aren't truthy/falsy. The following code won't compile:

  String foo = null;
  if (foo) {  // error: incompatible types: String cannot be converted to boolean
      System.out.println("was true");
  } else {
      System.out.println("was false");
  }

If one changes `if (foo)` to `if (foo != null)`, then the code will compile.

Right. I haven't written any Java in a while so it slipped my mind.

Same applies to any language of the Algol lineage.

Haskell has this with `Maybe`, although you could write the Bool instance yourself pretty easily if you liked.

I'd say for strings it's entirely domain dependent. Many problem domains won't ever use `std::variant` but may require that all strings live in ROM, so for those domains the default behavior makes perfect sense. If you want to compare C++ to modern languages, I get it, it's obtuse, but C++ is also used in instances where other languages can't be and so has to at least support doing things that may be unintuitive in other domains. That being said, it would be nice if we could deprecate things with wholly better replacements, like raw pointers, raw arrays, etc.

I love strongly worded technical arguments.

I am not deeply familiar with the Modules TS, but couldn't each environment solve the module name <-> translation unit mapping problem in its own way? For systems that can't keep reliable track of changes to the underlying representation (i.e., files) that would mean a fairly strict naming scheme.

The mapping doesn't have to be maintained by the compiler. C/C++ already depend on a suite of more-or-less independent tools to produce runnable software. I don't see how this problem conceptually is much worse than what a linker has to do.

> I don't see how this problem conceptually is much worse...

I'm asking myself the same question. There's a bit in the article about it being slower than the status quo, but I'd like to see actual benchmarks before making pronouncements. If that hypothesis is true, then surely the TS will fail before it makes it into C++20 proper.

All I know is that C++ build times are painfully slow for large projects. Even if you spend a lot of time and effort on making them as fast as possible, they're still not good.

#include just results in a tremendous amount of code in each and every source file. Even looking at a 6-line program, we see that after the preprocessor runs, the compiler is given 18,162 lines to process.

    $ cat > main.cpp <<EOF
    #include <iostream>
    int main() {
      std::cout << "Hello World!" << std::endl;
      return 0;
    }
    EOF
    $ gcc -E main.cpp | wc -l

That is still nearly instantaneous, but it only gets worse from there, and it happens to every source file in the project. I just took a look at a 1,848-line source file I'm working with, and it expands to 147,054 lines after preprocessing. The current state of things is awful, and modules are by far the most important feature for the future of C++. We need module support in our build tools and we need it to be fast.

Should the TS be delayed to see if the Clang idea is better? I'm curious to know what they're doing, but that's a lot to ask for. The TS is effectively a beta, and pushing back its start means less time in beta before the C++20 standard is finalized. There's a lot of work that's ultimately going to be built on top of this so getting it right is vital, but the clock is ticking.

CppCon 2016 has some Google presentations about how they use clang modules (aka module maps).

"Deploying C++ modules to 100s of millions of lines of code"


"There and Back Again: An Incremental C++ Modules Design"


Perhaps it’s made complex by searching for perfection when really you just need something that makes a high percentage of builds better.

You’d be crazy to create a final release of something without starting from a “clean” state, which means they DO NOT need a module implementation with 100% accuracy in all of C++’s asinine corner cases. I wouldn’t trust that accuracy even if they claimed it, I’d “clean” anyway.

This means that compilation speedup just has to ensure that MOST incremental rebuilds perform better without adding insane development overhead (such as having to respecify things).

If 60% of the time my incremental rebuilds are faster, I’d say that is more than enough to justify a C++1x release. They need to just move forward.

Forget C++ for a moment. In your dream language, if one source file uses functionality from another, where should that fact be specified?

1) In the source file

2) In the makefile

3) In both

It seems to me that the only sane option is (1) with a project-wide or system-wide set of "roots". Why is anyone in favor of (2) or (3)?

4) In the source file where the two are composed.

In general, modules should not import their dependencies, but have them provided to them by their common parent. See Newspeak's module system as an example.

Radical! But doesn't that mean the first module can't be typechecked in isolation? I see Newspeak is dynamically typed so maybe it's not a problem there, but what's the right solution for statically typed languages then?

Newspeak is intended to be statically typed, presumably optionally/gradually. I think the implementation is missing right now, though the syntax already allows type annotations.

As the other commenter wrote, the answer is that modules can never "import" concrete modules, only interfaces. And that means some extra effort, though having type-inference assist with the construction of that module should help. Definitely something that needs figuring out.

IMHO, explicit composition is something we really need to figure out in general, and it should solve the weird mix of linguistic and extra-linguistic mechanisms we have now.

Specify an interface.

Alternatively, infer the interface and type check that in the composing module.

I think D uses 3. You import a module in the source file, but if you need to include outside source files, then they would have to be listed when invoking the compiler. D also has a package system so that this isn't needed if you have sub-modules within a larger project.

So sort of like the -I flag in C++ compilers?

Yes, but D also follows the convention of module names being mapped into the filesystem, or package name if it is a directory with a package.d file in it.

D also has the -I flag. Still needed for libraries in other folders.

There are many kinds of dependencies. If X needs Y to compile I would expect the dependencies to be specified in the code.

If X needs one of (Y1, Y2, ...) to run - I could understand some kind of external configuration.

It's especially useful for frameworks and game engines. Kinda strange to edit the source code of Tomcat or Unreal Engine to configure it to use a particular regexp implementation.

Another aspect is when the linking happens. When building the application? When running the application (plugins)? You mention "to compile" and "to run", but that is an orthogonal design dimension.

So, currently we have four ways to resolve dependencies:

1. Specific at compile time (import in source file via compiler)

2. Unspecific at compile time (via build system)

3. Specific at run time (shared library via OS)

4. Unspecific at run time (via plugin system)

> dream language... source file... makefile...

I am dreaming of a programming system that takes advantage of the modern developer workstations which are very powerful and thus can support more sophisticated design and build tools than a text editor and a make. (IDEs provide only an incremental improvement on top of these.)

Please do design it. Whenever someone does "visual programming" or something like that, it usually fails for anything beyond toy examples.

The only counterexample that I can think of is LabVIEW, which is pretty successful with (or rather, despite) its visual editing. (Its main selling point is device compatibility, AFAIK.)

So if you have a design for a visual (or, more generally, "beyond-text") programming system in your head that's any good, please let it out. I'd like to see it. (And until then, I'm sticking to vim.)

No, I am not a huge believer in visual programming, either. Text is (still) a better medium for formal specifications, which is evident from the wide use of HDLs rather than visual tools in hardware design.

That said, nothing prevents anyone from having (admittedly rather vague) ideas about using a computer as something more than a glorified typewriter.

I cannot imagine any fundamental improvements. Care to share your dream?

> I cannot imagine

I know, it's hard - people could not imagine a car looking different from a horseless carriage back in the day...

(Speaking of the "dream", it is something along the lines of modern environments for Smalltalk. When you model a system, you do not necessarily want to be repeatedly going through the rigid edit/compile/debug cycle; the environment could in theory give you more "immediate" experience modeling the system you are trying to develop.)

Ah, the Bret-Victor-dream. There are IDEs which provide this. For example, Eclipse can do hot code replace for Java. The Unreal Editor can also hot reload dlls. Many web frameworks come with auto-update functionality.

Why would my dream language have a makefile?

They should really look at how OCaml modules work, since they solve all the same problems already, with desirable properties like separate compilation and easily discoverable mapping of module name to source file.

The drawback of OCaml modules is that all dependencies must form an acyclic graph, which can be a pain sometimes although there are well understood workarounds.

> The drawback of OCaml modules is that all dependencies must form an acyclic graph, which can be a pain sometimes although there are well understood workarounds.

Can you elaborate on that?

It's hard to imagine a sane build system where the dependency graph contains cycles.

Usually, you just declare the module interface separately from the implementation. I don't recall there being any way to "work around" e.g. declaring mutually recursive types in two different modules[1]. (And I don't think that's in any way sane, FWIW.)

[1] Of course you can parametrize one of the modules with a signature declared elsewhere and the other module implement that signature, but that's really just regular old dependency-breaking, so I'm not sure that counts.

the other huge problem is the poor support for hierarchical namespaces at the package level. nothing like linking your code with a library and having things break because you have a module name clash

Yes, if you give all your modules a generic "toplevel" name like "Utils". However if you use underscores to namespace things you get something a lot like a hierarchy. You just need to have the discipline to call your modules "Myproject_Utils".

Now I didn't say that OCaml has a perfect module system, although some of the first-class module features are stunning, but my point is that it solves a lot of the problems that this C++ guy is complaining about.

right, that's the standard workaround, but it really seems like something the compiler should be doing for you. i find it a very surprising wart in an in general very well-designed module system.

This seems reminiscent of export templates, which were a resounding success: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2003/n142...

I'm glad to see this point being made because it covers exactly what worried me when I first saw this C++ modules specification. At work, I maintain the build system for some mixed C++ and Fortran 90 code, and preprocessor includes really are a case of worse is better in my view. Generating build dependencies for Fortran 90 modules is a horrible mess and can easily get broken. And I can't really see that modules solve many real problems.

I’m unaware of recent developments, but MS has had very module-like features in its version of C++ for decades already.

For classic C++ they have the #import directive [1]. Recently, for Windows Store/UWP platforms, C++ can consume Windows Runtime components [2].

Two things that allow these features to work are standardized ABI, and standardized type info format.

I don’t think high-level modules are possible in C++ until ABI and type info are both standardized in a compiler-neutral way. Better yet, in a language-neutral way, like MS did: you can implement a COM object in a scripting language as a WSC file, and consume it from C++ using #import.

[1] https://msdn.microsoft.com/en-us/library/8etzzkb6.aspx

[2] https://docs.microsoft.com/en-us/cpp/windows/how-to-activate...

I'm confused; isn't the solution to this problem "gcc -M"?

> Build tools will parse source files over time and keep track of which modules are where and which translation unit needs what.

Isn't this what CMake does? IIRC it already parses the headers.

An alternative approach to parsing headers is to observe which files the compiler reads when compiling a certain source file, and recording these as dependencies of the source file. At least tup(1) can do that: http://gittup.org/tup/

I don't think you recall correctly. CMake is not a build tool in that sense (one that invokes the compiler/linker); it's a build-system generator. The build tool that actually has to track headers is whatever runs on the output of CMake.

In vanilla cmake, you declare an ordered list of include directories.

But the point about "how does it work now?" is fair. I'm confused why a module search path is any worse than an include search path.

I think it's worse because even with a list of directories to look in, you still don't know what _file_ contains the module you're looking for. Literally, you need to open up each file in each directory of the specified search path and possibly even _run it through the preprocessor_ in order to determine if it's the module you're looking for. And assuming you don't want to do all this every time, then you need to maintain a cache somewhere, and all of this logic has to be part of the compiler rather than the build system.

Anyway, Java dealt with a bunch of this by enforcing that the file naming and directory hierarchy would match the class hierarchy.

This is not related to the search path, but to determining which files need to be rebuilt when a file changes.

The idea is that when you change a module, you might need to recompile all the modules that include it, since its interface might have changed.

What I don't get in this criticism is that you have this exact problem right now with headers, and it has been solved by the preprocessor outputting dependency information (based on #include directives) for the build system to read. Why can't the same be done with modules (of course it might not be the preprocessor that does it, but the compiler or a specific tool)?

I think I'm catching on. With a '#include', you give a file name. With an 'import', you give a module name, which doesn't necessarily map to a file name.

So the problem could be solved by saying 'import foobar;' maps to a file (on a search path, maybe) named exactly "foobar". But then we need to work out how subdirectories and module names map to each other. How do you import "foo/bar.ixx"? "foo.bar"? "<foo/bar>"?

Though all that seems technically solvable to me. Maybe there's just difficulty in designing this in the context of a committee.

I think cmake invokes the compiler to get header file dependencies generated in make format, e.g. with gcc's -MM flag.

I've never heard that before. Do you maybe have a source for that? The Visual Studio compiler is way too slow for that to work on a big code base, so I highly doubt that this is true.

To clarify, cmake doesn't directly invoke the compiler. It generates a makefile that will ask gcc to generate make-dependencies when it's compiling (the first clean build doesn't need dependency info as everything is rebuilt anyway so this works). These dependencies are then included in the makefiles in subsequent runs. Cmake itself doesn't need to know about them.

When using cmake to generate VS projects it probably does something equivalent, unless msbuild has dependency tracking built in out of the box; I don't know msbuild very well.

I admit I know nothing about C++ modules proposals, but the post talks about an "acyclic graph of translation units". Does this imply that modules can't import one another?

What happens when two .cpp files want to do the equivalent of including each other's header files?

dumb title.


equating innocent, if terse, comments with threats.

Why am I reading what this person has to say?

Why do C++ developers make it so, so, so, so hard on themselves and the unfortunate poor souls like me who have to interact with their products?

Perhaps, they just want to share with everyone their experience interacting with the language of their choice?

(I am joking, but the complexity of the language may, in fact, reflect in the relative complexity of the APIs, say.)
