Apple's Module proposal to replace headers for C-based languages [pdf] (llvm.org)
385 points by _djo_ on Nov 26, 2012 | hide | past | favorite | 177 comments


Overall I like it. I like how they are treating both C and C++ as first-class citizens of this new feature (instead of, for example, inextricably tying its design to classes and namespaces). I like that they have a plausible migration story for how to interoperate with existing header files. And the overall design really looks like something that would fit into all of the C and C++ work that I do without getting in the way.

Sure it's non-standard and no one who cares about portability will use this (yet). But this is exactly the way that good ideas get refined and eventually standardized. You surely wouldn't want to standardize a module system that hadn't been already implemented and tested -- that would just leave you with surprises when theory meets reality.

C and C++ are here to stay -- we should be open to improvements in them.

They don't explicitly mention this, but I'm sure that they have no plans to remove existing #include functionality -- it is a near certainty that someone, somewhere depends on having the preprocessor state affect how an include file is processed. There are probably even cases where you can look at the design rationale for this choice and say "yep, that really is the best solution for what you are trying to do."


> Sure it's non-standard

Doug chairs the study group that's evaluating a module system for c++ ("""Sutter announces there is a Study Group for modules and Doug Gregor is the chair.""" http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n338...)

Doug happens to be employed by Apple, but calling this "Apple's proposal" is somewhat misleading I think.


Have you ever worked at a company that develops compilers? Do you know anyone on the clang team at Apple? Do you know Douglas Gregor?

dgregor is clearly heavily involved and likely the principal architect of this proposal, but he did it on Apple's behalf, to solve their problems, as one of their employees, with clear input from a number of Apple's internal stakeholders. Yes, he is the chair of the study group, but his participation in WG21 is funded by and at the behest of Apple.

The germ of every idea starts with one person, but many (most?) ideas require the involvement of many people to make them happen. Where credit lies is complicated, but I think it is completely accurate to refer to this as either Doug's proposal or Apple's proposal.


I do work at a company (not Apple), developing compilers -- specifically Clang. I know most of the folks working on Clang at Apple, and I know Douglas Gregor quite well. I'm on Doug's study group working on modules.

So, I think I have some insight into what's going on. The design and implementation of the modules support in Clang is absolutely being driven by Doug at Apple. Anyone can see that. =] However, there are some other aspects to this effort that were not the focus of an LLVM dev meeting talk. Daveed Vandevoorde has written the proposals to the committee thus far[1], and is continuing to work on the proposal and language-design side[2]. Doug is currently the one driving the implementation forward, but Clang and this implementation is completely open source, and the intent (to my knowledge thus far) is absolutely to converge with the proposed standardized feature.

[1]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n334... [2]: http://youtu.be/8SOCYQ033K8

I expect lots of others will end up contributing ideas and implementation effort long-term, even though Doug has charted the course on the implementation side and Daveed on the proposal side thus far. I don't think this is at all likely to become a vendor-specific extension with no standards support. This is something the committee is really actively pursuing, with broad interest across organizations and representatives.


I'm wondering: all modules proposals so far have come from Daveed Vandevoorde, including the last one [1].

As far as I can see, Clang is just the first compiler implementing this proposal, which is a very important step for standardization, because prior implementation experience ends up influencing the process heavily.

[1] : www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3347.pdf


> instead of, for example, inextricably tying its design to classes and namespaces

Considering it's Apple-designed and Apple works mostly with C (and Obj-C), it's not really surprising that they designed it for C and working with C++ rather than design it for C++.


Apple uses C++, even on mobile. The iBooks acknowledgements list Boost and Google Protobufs.


That Apple uses C++ (you could also have mentioned LLVM) does not change the fact that their production code and APIs are mostly C or Obj-C, and thus their proposals are usually for C. A previous example was blocks: they created those for C and Obj-C; they can be used from C++ but are completely incompatible with C++11 lambdas as far as I know.


You can now implicitly cast lambdas to blocks, although they're otherwise completely separate (and must be - C++'s approach wouldn't easily translate to C).


They can be cast the other way, too. No?


Casting to a C++11 lambda doesn't really make any sense; each lambda function is a unique anonymous type. There's also no need to do any casting to use an Obj-C block anywhere that you can use a C++11 lambda, since they obviously already implement the Callable concept.


Also WebKit. Can't they just treat lambdas as types (class/struct) with operator() methods?


Probably in other areas, but a major one is in IO-Kit (their device driver framework) which is written in and uses a specialized subset of C++ called Embedded-C++, though the userspace API for it has a C wrapper.


A lot of Apple's userland C APIs are also wrappers for C++ implementations, but it varies by project.


And also like... the XNU kernel?


Having a limited C++ background (I started with C++ in college), I have a half-remembered idea of how #include, linking, and so on work. When I started working with Obj-C, I found the mix of #include and #import. This proposal looks like Apple is migrating their* import (module) system to C and C++.

As for removing #include, at least in Obj-C, #include is still there to allow working with C libraries. I would assume the same for C & C++.

* Not really sure what is Apple's Obj-C versus a standardized Obj-C


#include and #import are nearly identical. Both simply take the contents of the file you point them at and blindly copy that text into your source file at that location. The only difference is that #import will ignore subsequent attempts to insert the same file, while #include will obediently insert the same contents over and over again.

Sadly, there's no such thing as "Standardized Obj-C", and Apple's version is about as close as it comes. They basically own the language.


Hmmm...I thought there was a bit more to #import than that, but good to know.


With regards to backward compatibility with #include: modules will still be preprocessed before they are assembled. This is necessary to drive much of the preprocessor metaprogramming that is around in various libraries and which will not go away even with variadic templates, for performance reasons.


This looks promising, aside from being long overdue. Header files have always been one of the more annoying parts of C/C++/Obj-C development.

The important bit is that the proposal's ideas for making the transition easier are good and make it seem like this may get traction where similar efforts have failed before. That Doug Gregor and other LLVM/Clang/LLDB developers are already working on the Clang implementation is even better. At the very least we may see this in Objective-C.


I think it is promising and long overdue, too. It also is clear to me that this will win, because Apple pushes it into LLVM, and has a head start at it.

On the other hand: if someone would do the equivalent to their browser, people would call it fragmentation.

It will be interesting to see how gcc reacts to this. If this decreases compilation times significantly, I think they will have to follow suit.


> On the other hand: if someone would do the equivalent to their browser, people would call it fragmentation.

This happens all of the time in browsers. See: Dart, vendor prefixes, JavaScript, etc, etc.

This is also absolutely nothing new in compilers. GCC has had a bucket load of its own C extensions for years (decades), as have many other compilers from many vendors.

Vendor-specific extensions are par for the course. In fact, they're a good thing! The first step in moving a standardized language forward is to have the vendors designing and adding non-standard extensions so that they can experiment with ways to "scratch the itch" they're feeling. Good extensions get taken up in committee, and if they can be made palatable to all involved, they get standardized. Bad extensions die on the table.

Having tried-and-tested features drive standardization allows hindsight and experience to strengthen resulting standards. We call the opposite, where the standards body invents a feature out of whole cloth with no example implementation having been tested in the real world, "Design by Committee". This strategy does not have a high reputation for quality and success.


There is almost no way this won't decrease compilation times, particularly with non-aggressive optimizations. With optimizations off, the vast majority of compilation time in C++ is processing the standard header files over and over again.


> On the other hand: if someone would do the equivalent to their browser, people would call it fragmentation.

I haven’t seen many people calling Dart, Pepper, Native Client or the Chrome Web Store “fragmentation”.


With "the equivalent" I wasn't thinking about creating entirely new ecosystems; I was thinking of tweaking the existing ecosystem to favor one's own tool chain.

The moment somebody writes the first module, the world sees the first 'best compiled with clang' code. Once that code uses #import to improve compilation speed, that could become 'must be compiled with clang'. That, IMO, is similar to adding an extra tag to your browser's HTML dialect and then promoting its use.

I expect Apple is aware of the risk and wants to prevent this (the fact that they plan a 100% compatible middle step is an indication of that), but the risk is there.

Also, people have argued that the effort spent on Native Client should be spent on improving JavaScript performance because of the fragmentation issue. Responding to another comment: people also have complained about the gcc-isms present in lots of open source libraries.


The ecosystem is the www, not Javascript and HTML, so those are all tweaking and extending the ecosystem.

The moment somebody writes the first Dart or NaCl based website or the first Pepper plugin, the world sees yet another (not the first) "must be viewed with Chrome".


Websites written in Dart work just fine in any standards-compliant modern browser. You just compile it to JS the same way you do CoffeeScript.

This makes sense because no widely deployed browser, not even Chrome, has the native Dart VM in it. You can download a build of Chromium with the Dart VM in it, but Chrome itself doesn't have it (though we hope it will when the time is right).

Check out api.dartlang.org. If the site works for you then your browser supports Dart just fine. :)


Right, I forgot about that, thanks.


Objective-C is certainly important, but C++ and Objective-C++ are where I think this is super important to see. I don't care which solution wins (yet; I might after more inspection/convincing), we just need A solution, and one that works for some older projects with little work.


It's been a while since I did large-scale C++, but I remember precision with header file inclusion was a big deal in C++ projects, to the point where you'd mangle your class structure if it'd keep a cascade of header dependencies out of a set of source files. It looks like this module proposal would work at cross purposes to that effort.


I don't see why this is at cross purposes. The point of this is to do the following transformation to the big-O numbers:

M x N -> M + N

with M being the number of included headers and N being the number of files including those headers.

I think that would be solving that mangling of stuff, not working at cross purposes.

The point is to make it so you go over each file once, rather than multiplicatively many as you do now.


That's what the previous best practices (impl pointers and such) did with C++ before; the goal of some of those practices was to minimize the constant factors. I may be misreading how "modules" work, but they seem to accept that sprawling dependencies will get pulled in (if only once).


This proposal supports "import std.stdio;" type syntax as well. That seems to greatly help fix the constant part of the equation as well.

(In the section called "Selective Import".) We can't save the world from horrible coders, but selective import would drastically drop the constant costs.


I think the submodule system will help address this issue. People will often import the entire standard library just for one small feature. If submodules are implemented correctly we can reduce that.

I'm a little wary at such a major change to C, though.


While LLVM authors probably know best, I don't understand some of his criticisms on the "Inherently Non-Scalable" slide.

  • M headers with N source files ->  M x N build cost
It's only MxN if there is no use of the "#ifndef _HEADER_H" workaround that he mentioned earlier. Wouldn't adding a preprocessor directive like "#include_once <header.h>" solve this? Alternatively, these guards could be added to the headers themselves without changing the preprocessor. This probably should be a parallel set of headers (#include <std/stdio.h>) to avoid breaking the rare cases that depend on multiple inclusions, but creating that set would be a simple mechanical translation.

  • C++ templates exacerbate the problem
I'm mostly a C programmer, so I have no argument here.

  • Precompiled headers are a terrible solution
Why is this? It likely would break the ability of headers to be conditional on previous #define statements, but since the proposal does this anyway it doesn't seem insurmountable. Along those lines, how does this proposal handle cases where one needs/wants conditional behavior in the header, such as "#ifdef WINDOWS" or the like? And is caching headers during the same compilation also "terrible"?


Instead of "#ifndef _SOME_HOPEFULLY_UNIQUE_NAME_H" I've started using "#pragma once" -- it's likely supported by every compiler you care about.

http://en.wikipedia.org/wiki/Pragma_once


That can cause problems if a header is accessible by two different paths (which is not uncommon in a large project). Using both gives you maximum efficiency without sacrificing correctness, and it works everywhere.


Thanks, I wasn't aware that it was that well supported. Reading more about this, I also learned that GCC supports the Obj-C once only #import in C and C++ as well.


> Wouldn't adding a preprocessor directive like "#include_once <header.h>" solve this?

No -- the point is still that you recompile each header for each source file that uses it. Running the compiler N times (once for each source file) costs M x N, not M + N.

> Why is this?

I agree with you here, I think this dismissal of precompiled headers needs a more thorough explanation (but perhaps it was given orally in the talk).


> Why is this?

To get any benefit from the use of precompiled headers at all, you need to include practically every single header in your project in your precompiled header. Doing this is the opposite of modular: it tightly couples each of your N compilation units with each of the M modules (changing any one of the modules bundled in the PCH will force you to recompile all compilation units using the PCH).


> changing any one of the modules bundled in the pch will force you to recompile all compilation units using the pch

Is there a fundamental reason the compilation can't be optimized away?

For example, given a header file A.h, couldn't you precompile it and generate a bloom filter of all preprocessor tokens contained therein (except ones defined in files that A.h includes)? Then if B.h (which includes A.h) changes, see if any of the preprocessor macros that are defined at the point of inclusion are in A's bloom filter. If not, you can safely skip the recompilation of A.h. If so, you should probably change A.h anyway to directly include the definition of any macros that are affecting its compilation.


I wonder how necessary this actually is. I think using non-command-line defines to change the behavior of header files is rare. From what I can tell, ccache just ignores the issue: https://ccache.samba.org/manual.html#_how_ccache_works

I'm not sure how it gets away with this and works as well as it does. Maybe there is a behind the scenes check that I don't know about? In any case, it might be a good framework to add the caching you describe.


ccache doesn't cache the results of compiling each individual header. Other than the shared cache, ccache is conceptually just a workaround for deficiencies in make.


This is an implementation detail that is fixable without language changes: it is often the case that two header files are semantically disjoint, and the compiler (partly the preprocessor: you'd have to more incrementally update the compile after each include) could keep track of these relationships so it could safely merge results later; it is certainly no harder than a "merge" comparison of two sorted token lists. Their goals do go past this (their issues about "resiliency"), but it seems disingenuous to include these performance issues when there are less drastic solutions.


I don't follow. I'd think that for any given header, you would only have to recompile it if the #include dependencies listed in that file (or their subdependencies) changed. I'd hope that for most projects this is far from every header. When would this happen?

It also seems like this would be a win even if only used for rarely-changing system headers. As shown in the slides, for small projects the lines of unchanging standard includes dwarf the project specific code. Wouldn't it be a big win to avoid parsing and compiling all of these?


> No -- the point is still that you recompile each header for each source file that uses it.

You're right, I was thinking about this wrong. I was thinking that the main cost was not the direct includes (M x N), but that each header further includes other headers (M x N^O). One-time includes prevent the exponential explosion, but you are still left with M x N. But I tend to use ccache to avoid unnecessary recompilations, so I rarely feel the brunt of this.


> Wouldn't adding a preprocessor directive like "#include_once <header.h>" solve this?

No, that means that for foo.c a given header is only added once. It doesn't prevent bar.c from importing and recompiling that same header.


>> Precompiled headers are a terrible solution
> Why is this?

Think about it: the header potentially depends on any #define value defined before the header is processed. If we had modules that define interfaces that don't depend on macros, we could process all the definitions that make up the module only once for all the object files that depend on it.

You typically don't need conditionals on whether another module is used or not. You need conditionals to compile the whole application or the dynamic library some specific way, but that also happens only once per compilation of the whole application.

There really is code out there that includes the same header multiple times to define different types, but those are extreme use cases; most libraries would gain if they were defined as "modules" whose definitions can be compiled only once per application build.


> Wouldn't adding a preprocessor directive like "#include_once <header.h>" solve this?

Obj-C already has this: #import http://en.wikipedia.org/wiki/Objective-C#.23import

It’s just #include with built-in guards, but headers still have to be compiled once per compilation unit.


Perhaps "terrible" is a bit strong, but precompiled headers are basically a hack. They can hardly compete with a proper redesign.

I'm not sure what you'd do about the conditional behaviour, though - seems like something of an omission. I've found it handy, and it is used a lot by the Windows headers. (I don't recall seeing it much outside Windows, though - perhaps a case of "Unix doesn't use it, OS X doesn't use it, RISC-OS doesn't use it, AmigaOS doesn't use it, Windows DOES use it - OK, four to one, it's not important"? ;)


Re: Precompiled headers

It may be possible to do precompiled headers right. However, compiler vendors have been trying to do so since the mid 90s and haven't succeeded yet.

The reason I personally like this proposal better than PCH is that it requires explicit use. You can slowly convert libraries to the import syntax while still #including other headers that require conditional stuff. Seriously, just converting all #include <anything from C++ STL or Boost> to this will save 10s of gigabytes of source from being compiled on some projects I've seen.


> It's only MxN if there is no use of the "#ifndef _HEADER_H" workaround that he mentioned earlier.

There is still a processing cost for evaluating the directive.


For GCC this is highly optimized:

"The preprocessor notices such header files, so that if the header file appears in a subsequent #include directive and FOO is defined, then it is ignored and it doesn't preprocess or even re-open the file a second time. This is referred to as the multiple include optimization."

http://gcc.gnu.org/onlinedocs/cppinternals/Guard-Macros.html


Except that the contents of foo.h might be different depending on where it is being included.


Overall this seems good. The lack of modules in C/C++ is a huge pain. The "link" section seems like a really leaky abstraction, though.

  module ClangAST {
    umbrella header "AST/AST.h"
    module * { }
    link "-lclangAST"
  }
This hardcodes an implementation-specific syntax and yet says nothing meaningful. Drop the "-l" and you're just restating the name of the module. What value is there in baking a command-line flag into the module definition?

P.S. I also find it strange that something intended to blend with C/C++ doesn't use semi-colons. This is just stylistic, though.


I think I'm the only person who likes headers. I'm not overly concerned with compilation times and big-O notation. Computers can compile things really fast nowadays.

I'm more concerned with the developer usability benefits & drawbacks of the feature. As somebody who is a polyglot, but spends a large amount of time writing Objective-C, I have come to absolutely love header files.

I see header files almost as documentation. To me, a header file is a description of everything that's public about an API. My header files tend to be very well commented, and very sparse, only containing public methods and typedefs.

When the need arises to make internally-includable headers (say I'm writing a static library, and have methods that are private to the library, but public to other classes within the library), I will usually write a `MyApi+Internal.h` header for internal use, which doesn't ship with the library.

A developer should never have to dig into implementation files, or into documentation, in order to use a library. Its headers ought to be sufficient. Things like private instance variables, or anything private, do not belong in a header file.

FWIW, here's the public header for the library I spend most of my time working on:

https://gist.github.com/e83169d2c3984c6f077c


The headers still exist. This proposal is not about eliminating them. All this proposal does is bundle groups of headers together and give you a simplified way of including them.

If you have a look at the proposed ".map" files, the headers are still fully enumerated. I assume the IDE/debugger will still be able to find definitions in the original header files via the .map files.


> Computers can compile things really fast nowadays.

Not if you're working on a large project. Firefox takes about 15 minutes to build from scratch on my fast Linux box. Getting that down by a factor of 10 or more would be fantastic.


No, you're not alone. I view headers as a wonderful way of documentation as well. Usually I've got a header file open on the left side of the screen while working on the implementation on the right side. It's extremely convenient to work this way.

I'm aware of the performance issues headers create at compile time but I still like them quite a lot. On the other hand I haven't really been too fond of any module system I've seen so far. Maybe I'm just extremely old fashioned.


> Computers can compile things really fast nowadays.

Compiling Qt or LLVM from source will quickly put to rest these unwarranted assertions.


> Computers can compile things really fast nowadays.

You are free to take care of our >1 hour build times.


Most of the time header compilation time isn't a big deal to me either, but then I started working on a project that uses Boost extensively and noticed that even small files were taking several seconds to compile because the headers were so enormous.


> Computers can compile things really fast nowadays.

Tell that to my fecking codebase.


It sounds like you'd really like OCaml's module interface files (.mli). Or, more generally, all of OCaml's module system. It's very well designed and definitely worth checking out.


At the risk of starting a fight, I really don't want this. I'm quite happy with headers and know how to effectively manage them without shooting myself in the foot.

Granted there is some compiler overhead for importing large header files but I don't really notice it at all.

Also, we already have an Apple/Next non-standard C extension (objective-C). I don't think we want anything else added without proper standardisation regardless of the motivation. I'd rather they forked the language.


> Also, we already have an Apple/Next non-standard C extension (objective-C). I don't think we want anything else added without proper standardisation regardless of the motivation. I'd rather they forked the language.

This is confusing. Surely Objective-C, which adds a hell of a lot that C does not address and many syntax and runtime changes to support it, would fall under the definition of "fork of the language", much as C++ does, rather than simply a "non-standard C extension" (surely that description better applies to the many GNU C extensions in GCC?).

Re: "adding things without proper standardization", the role of standards committees is to reach consensus among vendors so that they can standardize non-standard extensions that they have variously implemented and tested in the real world first. To argue for the opposite, that the vendors must do nothing until the committee hands down the One True Way from On High, untested outside of their heads, is the height of Design By Committee.


> Surely Objective-C, which adds a hell of a lot that C does not address and many syntax and runtime changes to support it, would fall under the definition of "fork of the language"

It's not even a fork, it's a different language source-compatible with C. It has its own semantics, its own syntax and its own runtime.


Well, Objective-C was a preprocessor extension, as this proposal is, so it's one and the same. Both are extensions.

Yes we all know where vendors that do that got us:

   -moz-gradient:
   -ms-gradient:
   gradient:
   -webkit-gradient:
Oh and Microsoft with their C++ CLR extensions and middle finger to C99.


> Well Objective C was a preprocessor extension as this proposal is so it's one and the same. Both are extensions.

This is daft. Objective-C has been implemented using a full-fledged compiler for decades. Is C++ "just an extension to C" because it was once a pre-processor on top of C? Is Common Lisp "an extension to C" because ECL transforms it into C?

Utter lunacy.

> Yes we all know where vendors that do that got us

Yes; they got us tried and tested ways of implementing things like gradients, so that the standards body had something in the real world to base a successful standard off of. The alternative gets us things like C++98's export templates: unimplementable garbage that no vendors could support because they were invented on paper and any attempt to actually build them in the real world caused more problems than they solved.

And just so we're clear what's going on here:

The linked slides are about LLVM's implementation of Daveed Vandevoorde's proposal to the C++ committee (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n334...). The C++ Standards Committee held off on adopting that proposal for C++11 because there were no implementations of it, and asked that vendors try implementing it, adapting it as necessary to make it feasible, and provide feedback so that informed decisions based on experience could be made in the final draft of C++17.

Let me repeat that: LLVM is doing exactly what the standards committee asked them to do.


Thanks for linking to Daveed Vandevoorde's proposal. I had no idea that he had originated this concept, else I would have worded the title differently.


I guess you haven't tried to compile a big project like WebKit to experience the pain of importing large header files.

Well-defined modules, if standardized, would be a huge boost to C and C++ compilation speed. Remember the proposal does not take anything away, you can always use #include, but adds a new method for those wanting to take advantage of it. Why the resistance there?


Doug Gregor, the author of this proposal, is the chair of the modules study group of the C++ standards committee (source: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n338...). And this proposal looks very similar to the C++ modules proposal by Daveed Vandevoorde http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n334...

So it looks like Apple is working towards making this a standard, not a non-standard extension.


WRT compiler overhead, this is an asymptotic improvement, not just a constant factor. If this can get my 30-second to 20-minute compiles (depending on what's changed since the last make) down to the compile-duration range of Go (which uses this kind of dependency management for compiler performance reasons) that completely changes the workflows that are possible.

But yeah - the standardisation issue is worrying, although at least it looks like they've thought about the issue with their transitional proposal.


Compile time is usually fine if you build your code so that dependencies are simple, as intermediate .o files are only rebuilt along dependency chains. Link time is much higher on larger projects, which this still fails to solve.

Knowing your shit gets you further than language changes here.

For ref, I've been writing c since 1986 and I've seen proposals like this come and go lots and every one doesn't end up with an improvement.


I haven't used C++ recently, and never in truly large projects, but what I do remember is that that "build your code so that dependencies are simple" is not something you get for free, certainly not with C++.

I also think that Apple wants every compilation speed improvement it can find, so that it can improve syntax coloring and error highlighting in (almost) real time while you are editing.

Finally, I don't think this really is a proposal. Apple shows its intent earlier than they did with WebKit, but read that last slide: "clang implementation underway".

I expect they will listen to feedback and change this if people propose real improvements, but I think it is a given that this will be in clang soon, and also that it will be used in clang, LLVM, and WebKit. From there, chances are it will spread, either via tools that use WebKit or LLVM, or because of superior compilation speed.

How soon that 'soon' will be, I don't know. Implementation may not be as simple as it looks.


I've used Oberon, which has a module system quite similar to this, quite heavily, and the way to diminish link time is to not link, instead use dynamic loading: leaf modules are loadable and unloadable, nonleaf modules can be made leaf by unloading all their descendants. Gives you the possibility for immediacy/flow, that a good REPL does. For many cases, you get edit/compile/run cycles measured in (often single-digit) seconds.


Agree with this completely.


Almost like the way the D programming language handles it

http://dlang.org/module.html


For completeness, here's how Rust handles modules:

http://dl.rust-lang.org/doc/tutorial.html#modules-and-crates

Browsed golang.org for a bit and didn't find any info on how Go handles modules/packages. All I know is that it has something to do with the folder hierarchy of your project (links appreciated).


in Go, you declare a package for your file. Exported symbols start with an upper case letter.


Okay, so the `package` keyword in Go is like the `export` keyword in this proposal? Does Go allow submodules?


Packages are given paths or URLs so they might be a subpath but it's not really a subpackage of any sort other than by name.


Here's a talk I gave about Go's package system: http://nf.id.au/my-linuxconfau-go-talk-mostly-about-packages


When I saw import std.stdio; I could've sworn I was looking at D for a second. So clearly I'm all for this proposal. Anything to make C++ more like D is a win in my books.


also looks like ada to me

with Ada.Strings.Unbounded;

but "withing" ada, you'd get the parent package(s). eg you'd get Ada.Strings and Ada.Strings.Unbounded but not Ada.Strings.Fixed


"Apple's Module proposal..." Is Apple really who deserves credit here? Is there something I missed about Apple's management driving this, and not Gregor or the C++ standards committee?


First slide. It seems like he's employed by Apple and being paid to work on this. So yes, I think calling it Apple's solution is an apt description.


Herb Sutter heads the C++ standards committee and credits his employer Microsoft [1]. Is C++11 or C++17 "Microsoft's"?

[1] http://isocpp.org/std/the-committee


The standards don't belong to anybody. Particular proposals are credited to the person or organization that propose them.

This particular proposal is from Apple.

When Herb Sutter/Microsoft make proposals for C++, it's phrased as "Microsoft's proposal for C++0x" or whatever.


If you want a little more precision, credit it to the Apple Compiler Group - they seem to operate with quite a bit more autonomy than, say, whoever's responsible for security there.


Everyone wants modules but nobody can agree on how they should behave. This is why they weren't included in C(++)11.


Speaking as someone who doesn't follow the development of those specs, can you elaborate? What behaviors are contentious?


I actually haven't been following modules at all so I couldn't tell you exactly what's the problem. Watching from 30:40 of the following video may be insightful though:

http://channel9.msdn.com/Events/GoingNative/GoingNative-2012...


Apple has a lot of pushing power, probably enough to get this added. At least partially.


long overdue indeed, reminds me a lot of google go?


Definitely seems like Go has led the way here. Not that there wasn't a lot of pain before, but Go's lightning compilation seems to be an impetus for the "let's finally do this" change.

I have to think that this is a minor repartee between Apple and Google language groups. Google did something clever that breaks with tradition, and now Apple is doing something similarly clever yet backwards-compatible.

I like this dynamic. I certainly appreciate this move as an OS X/iOS programmer, since getting a functional version of Go on iOS is a pipe dream for strategy tax reasons.


I don't intend to downplay the importance of Go (it's definitely helped people realize how nice fast compiles can be when you don't have to fight the language to get it), but Clang (whose leaders are making the proposal here) has had an explicit focus on fast compile times since the very beginning:

http://clang.llvm.org/features.html#performance

Note that "Fast compiles and Low Memory Use" is listed more prominently than any other feature (in primary position, with more text and pretty graphics).


Object Pascal did the exact same thing 20 years ago, and it was a language much more influential than Go at its time. So, I don't think Apple programmers had even to look at Go to design this feature.


Yeah. Compile times there were ridiculously fast, bordering on interactive. Go follows the same path.

Java has very fast compilation as well, due in no small part to tossing the preprocessor and having fast binary imports.


Part of the advantage of Pascal at the time is that it was created explicitly to allow for fast parsing. C and especially C++ are much harder to parse.


The main advantage was still the possibility of having binary modules, without parsing the same imports multiple times during "make world" compilation.

Just as a side note, ISO Pascal eventually got the changes that were available in Turbo Pascal and Mac Pascal (known as ISO Extended Pascal), but by then most people considered Turbo Pascal the _de facto_ standard.


If I remember correctly, it's a single-pass compiler, which is why you generally end up writing your code upside down (definitions before uses).


Also due to the Java compiler doing very little work (it's a really dumb compiler)


It does all the same work as a C++ compiler, except the optimization phase. Yet any C++ compiler with optimization turned off is still an order of magnitude slower. The key differences are:

- Java has a much simpler grammar to parse.

- Everything is compiled at most once.

- Encapsulation is stronger: private class members are not exposed in the public class ABI, and external code doesn't depend on them (this also dramatically improves recompilation times: no need to recompile half of the project because you changed a private method).


> Definitely seems like Go has led the way here. Not that there wasn't a lot of pain before, but Go's lightning compilation seems to be an impetus for the "let's finally do this" change.

Except that this is pure marketing. Languages with modules were already compiling as fast as Go does, back in the 80's.

Modula-2, Turbo Pascal, just to name two of many.


Yes, the idea of using modules is nothing new. It's old-school and well-proven. Many people have argued about adding pascal-like module support to C at some point or another at the coffee machine. But having enough (political) momentum to actually change C(++), and overcome extreme inertia and calcification is new. LLVM is the greatest thing to happen to compilers in a long time, it really opened up compiler development like GCC somehow never did (see http://www.drdobbs.com/architecture-and-design/the-design-of... for a likely explanation). And having a giant like Apple behind it helps, of course.


Oh yes, I am all for having modules in C and C++.

I was just calling attention to the fact that some Go devotees seem to think their compile times are something new.


Golang should get a kickass debugger before C++ gets modules and steals their thunder.


Genuinely curious: what are you looking for in a "kickass debugger" for Golang that isn't already provided by GDB's golang support?


The ability to recompile code in the debugger and continue execution?


And some would argue generics...


Don't forget generics


I found the Go module (aka package) system subtle to understand, but it seems to work well. The subtlety being that what you name when you write "import" is not a module, but a directory on the filesystem (under a known root). Actual module names that act as the namespace for a module's public identifiers do not have to be the same as (or derived from) the import path; that is just the convention. For example, importing "utils" could conceivably bring in bobsfunkystuff.SomeFunc with there being no such module as utils and hence utils.SomeFunc.

It's also my understanding that there are no "submodules" in Go. If you want what's in the import path "foo/bar", importing "foo" is unrelated. In fact, there might not be anything in "foo", making it an invalid import path, despite "foo/bar" being a valid one.

And then there's the stuff with being able to directly import code from github, code.google.com, etc, but that dovetails into the same mechanism after downloading.


Since these are not in C++11, we'll have to wait 10 years...


Actually you'll have to wait from 2 to 5 years. Have a look at Herb Sutter's talk about "The Future of C++":

http://channel9.msdn.com/Events/Build/2012/2-005


Only one major feature is planned to make its way into C++17, and modules are considered a major feature.

And there are lots of people interested in other sets of major features.

So who knows if C++17 will include it, and even if it does, how long until one can use it in production.

Until 2017, many native developers might have moved to Rust, D, Go, or whatever comes along.

Just look how computing used to be 5 years ago.


Do you have a source for "Only one major feature is planned to make its way through C++17"? Wikipedia tells me (without citations) that C++14 and C++17 are minor and major revisions respectively, but doesn't indicate that "major revision" stipulates only a single major change at most.


<quote>

Bjarne believes that for C++17, we only have resource for 1 major feature, 2 medium features, and a dozen minor features.

</quote>

https://www.ibm.com/developerworks/mydeveloperworks/blogs/58...


Practically Apple will probably add it to Objective C next year.


Anyone know where I can read more detail about this proposal? It looks really interesting, but there are a couple of things I'm not clear on from the pdf:

How do you get away from creating a header file for a closed source module? Without a header, how would users of your module know what they can call? Can you perform reflection on a module to inspect it? Is there some kind of tool proposed, like javadoc or pydoc, to generate documentation for a module?

How does this work with C++ templates? If you don't know in advance what types the template will be instantiated with, how can you pre-compile the code?

I'm sure the authors have thought through all these issues and more; I'd love to read about their solutions.


I know nothing about this proposal, but I can make educated guesses on all of these:

Creating header: the module needs only the public information for a closed source module. You could generate it from a .h file if you are a consumer of a closed source module, or if you are the producer of the module, you could potentially ship either pre-generated modules or all the information needed to generate a module.

javadoc, etc. It sounds like you could generate the information directly from the C/C++ source since there is now information about what is public and private in the source

Compiling this:

    template <typename Type> Type max(Type a, Type b) { return a > b ? a : b; }

generates no code, but it does some work in the compiler that can be cached. This is what would be stored in the module.


Thanks, I think you're probably right - the modules would have to have enough information in their compiled form that your compiler could introspect them without needing the source. The proposal wouldn't be much use if that weren't true.

Re: templates, I thought the expensive part of template processing was code generation rather than parsing. Would caching the AST really be that much of a saving? I admit I've never tried profiling it to find out...


#include <iostream>

You've just parsed ~6MB of code. You might instantiate 2 or 3 templates from it. The parsing truly is non-trivial.


Your first stop should be the latest revision of the proposal by Vandevoorde [1]. He also presented the paper at C++Now! [2].

To answer some of your questions: instead of compiling directly to object files/libraries and distributing headers with them, you will be able to distribute module files. Those will be preprocessed and already transformed to some vendor-specific format.

Templates require some (actually a lot) processing before they can be instantiated. This can be done without knowing any of the types used for instantiation and this can be used to speed-up compilation.

[1]: www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3347.pdf [2]: https://github.com/boostcon/cppnow_presentations_2012/blob/m...


Exactly what I was after - thanks a lot!


It looks like the old #include style will still be available where you can't use packages, and it seemed to me that you only needed the .h files for the generation of the modules, not the source .c files.


Maybe all the things they figured out in pascal aren't that bad.


I didn't see any mention of the recursive-usage problem. How do they plan on handling co-dependent files?

For example, class A's code makes use of class B. Class B's code makes use of class A.

a.h looks like:

  class B;
  class A {
  public: void foo (B *);
  };
a.cc looks like:

  #include "a.h"
  #include "b.h"
  void A::foo (B *b) { b->narf (); }
and b.h and b.cc use A in the same way.


I don't see what the issue is. In some sense, the proposal is to essentially do away with inserted code and instead do what Java is doing with its import statements: a public API. There's no recursion issue in Java.

I think you may be mixing this up with C's #include, which does have recursion issues, hence the need for its absurd #ifdef guards. But Apple did away with that years ago with #import, which is basically #include with built-in guards.


I'm wondering how it accomplishes this as a practical matter. Does it compile a file twice? Once to create the interface info, and a second time to compile the implementation? Can they even parse enough of a C++ file to generate the interface in all cases?

That they chose not to mention the issue makes me nervous that they think they can just ignore the problem. Many previous languages have ignored the problem and said "don't do that." I hope this proposal does not go in that direction.


Sounds interesting, but I've got to admit my initial response is cautious.

How will this avoid the LD_LIBRARY_PATH hell of trying to depend on a local module (ie. conflicts)?

How will it work at all with local submodules inside a single project? (ie. I have 200 local c files, each with a header. Now what? A module each? How do we handle simple dependencies between classes and functions?)

How will we import actual macros?

My guess is that the answers are:

1) include path style --module-path=blah

Seems fair, but this is going to be as messy as include paths already are.

2, 3) don't use modules except at a system level.

That's a shame as far as I'm concerned, but perhaps I'm wrong. Can anyone else see how these might work?


Anyone saying 'inherently unscalable' about a feature of a language that's used in as many places as C needs to really think about things...

I understand some of the objections, and the import mechanism doesn't sound like a bad thing, though some of the objections are weird -

Import only imports the public API and everything else can be hidden - who wasn't doing this for libraries or large code modules anyway? Have static functions at the code module level, 'private' headers for sharing functions within a larger logical module and public headers at the logical module or libary level. Is this too cumbersome?


The problem with implementation hiding is that it is impossible to do well with templates. You will always make all implementation details available to a user of your header, and sometimes the implementation details are a public API themselves (libc++ vector brings in almost all of utility, a bunch of type_traits, and so on).


I may come off as a bit nutty here, but do we really want to add another mechanism to C/C++ for something that has been worked with for decades?

Most of these issues are things you learn really fast how to avoid in production systems, using things like #pragma once, include guards, and proper symbol and header exposure when writing C libraries.

This doesn't really seem like much other than feature bloat for something that works, works well enough, and which probably won't be implemented in a timely fashion by at least 1 major compiler vendor (Hey folks! There's life outside clang and gcc and icc!).


I once thought about an automatic bullet-proof precompiled-header system. I think this is actually not impossible to implement.

When first parsing a header, the parser can keep track of all the macros in the preprocessor state that it depends on. E.g. a very common macro is the one it reads very first, like `#ifndef __MYFILE_H`.

Then, including a header becomes a function (<macro1, macro2, ...>) -> (parser state update, i.e. list of added stuff). This can be cached.


My concern is portability, because that is the largest benefit of C, at least. C++ is less portable, but still more portable than most languages.


This is cool, should make Rusty Russell's CCAN [0] a whole lot more interesting. Instead of just snippets of useful code, it could contain full modules like CPAN/PyPI/PEAR/CRAN/CTAN and various other repositories for other languages.

[0] - http://ccodearchive.net/index.html


I like the idea, but hopefully they'll shorten the std submodules to `std.io`, `std.lib`, etc.


I thought one of the points of the header system was that you could use the code without having to slog through all the source.

Thinking about my trips to /usr/include, those headers weren't that useful for coding with but you could get constants and function names at least.


C# together with visual studio solves this in an amazing way. You can browse .NET DLL-files as if they were source files, with only public members being visible and without the function bodies. Good for finding function names and constants.

I can't believe nobody else is using the same system, using "go to definition" on a library function is just as natural as using it on one of your own functions. Only difference is in the second case you see the actual code of the function instead of just the function header.

(Replace .NET DLL-files with these new "modules" to make this applicable to C)

Ps. even if you do have the source, in most editors you can collapse the code to hide everything so you don't have to "slog through all the source".


Most (all?) Java IDEs do this, as does Xcode (Obj-C) and probably many other things. This isn't the point here though...


But Eclipse does stupid stuff like disabling Ctrl-F full-text search when viewing a .class file, even though the method names are plainly visible as text.


Looks like the core issue is a poor preprocessor implementation. It's a good idea in principle; however, we would be adding new features to address shortcomings in existing features instead of fixing the problems in the existing code.


No, the issue is not just poor implementation. Problems like O(M*N) bloat and scope pollution are fundamental design flaws with the preprocessor header system.

Almost every widespread modern language (all except JavaScript? [1]) has a built-in module system. Adding modules to C is not just some sort of hack to work around preprocessor limitations. Instead I'd say that the C preprocessor is a hack to work around lack of modules.

[1] and modules will probably be added to JS in the next edition of the ECMAScript standard: http://wiki.ecmascript.org/doku.php?id=harmony:modules


You can add some functionality to limit scope and implement a caching system (which is also proposed here) to deal with M*N issues with a focus on keeping backwards compatibility. E.g. the caching is done automatically, the scope spamming control is a best effort solution based on static analysis (for example).

Moving from headers to modules is a fundamental change in how the language operates and is guaranteed to further break compatibility between compilers. I worry that jumping to add these features to the language standard is premature and we should instead look further to optimizations within the preprocessor and linker to see if we can improve performance first.


> You can add some functionality to limit scope and implement a caching system (which is also proposed here) to deal with M*N issues with a focus on keeping backwards compatibility.

You can't cache the AST of a #include'd file without breaking the standardized semantics of #include. ccache (which is what you're basically proposing) gets away with it by just ignoring the standard and shrugging if some legal programs break horribly when it's used. The Standards Committee doesn't have that luxury.

You can't change existing functionality while "keeping backwards compatibility". The draft Modules proposal does a much better job of backwards compatibility than what you're proposing, because it allows #include to continue to have the same semantics it has always had, and even provides a clean way forward for using both #include and modules in the same translation unit.

> Moving from headers to modules is a fundamental change in how the language operates and is guaranteed to further break compatibility between compilers further.

How would a standardized module system "break compatibility" between standards compliant compilers? The title of this story is completely misleading: this isn't "Apple's" proposal, this is about LLVM's implementation of the Standard Committee's Module Working Group's draft proposal.

> I worry that jumping to add these features to the language standard is premature

The committee was worried about that too; that's why the draft proposal was held out of C++11 so that vendors could try out implementing it and see how it fared in the real world.

> we should instead look further to optimizations within the preprocessor and linker to see if we can improve performance first.

It's not as though nobody has ever tried to improve pre-processor or linker performance; people have been looking at those issues since the beginning of C. There is, fundamentally, no way to improve the performance of the current system without breaking semantic compatibility with existing programs.


Backwards compatibility is a Big Deal for these communities.


You can maintain backwards compatibility while fixing bugs. It's equally a Big Deal to have discipline while working on language architecture to minimize feature creep/bloat.


I don't buy the performance argument: NxM -> N + M only works if every one of the N .c files is including every one of the M .h files.

If you're spamming #includes like that, you need to fix your #includes, not redefine the language.


If a .c file includes U x M .h files on average, where U is a number from 0 to 1, then the work we expect to do at compilation time is proportional to U x (N x M). This is still O(N x M).


True, but in my experience U tends to decrease sharply as M increases.


They mention:

> "‘import’ ignores preprocessor state within the source file"

I wonder if that would break those specific use-cases where you wouldn't want the import to ignore the preprocessor state within the source file?

Overall, I like it!


You can still use #include if you want preprocessor state to leak into a specific header file.


It breaks "there should be one-- and preferably only one --obvious way to do it". And quite a few others.

But as a stand-alone feature, stateless preprocessor includes could be a nice feature to have.


Is C bound by Zen?


Zen wisdom crosses all boundaries :)


Is D not doing this sort of thing already or am I wrong?


It's possible I don't understand the proposal and I'm probably going to get egg on my face, but I'm not sure I really want this. If I wanted Objective C or C# or Java or Python I'd use Objective C or C# or Java or Python.

I actually like the preprocessor. I like that I can write code like this

    #ifdef DEBUG
      #define DEBUG_BLOCK(code) code
    #else
      #define DEBUG_BLOCK(code)
    #endif

    void SomeFunction(int a, float b) {
      DEBUG_BLOCK({
        LOG_IF_ENABLED("Called SomeFunction(%d, %f)\n", a, b);
      });
      ... do whatever it was SomeFunction does ..
    }
In other languages that I'm used to there's no way to selectively compile stuff in/out.

I like that I can change the behavior of a file for a single include unit

   -- foo.cc --
   #include "mylib.h"

   -- bar.cc --
   #include "mylib.h"

   -- baz.cc --
   #define MYLIB_ENABLED_EXPENSIVE_DEBUGGING_STUFF 1
   #include "mylib.h"
because enabling it globally would run too slow

I like that I can code generate

    // --command.h--
    #define COMMAND_LIST \
       COMMAND_OP(Stand) \
       COMMAND_OP(Walk)  \
       COMMAND_OP(Run)   \
       COMMAND_OP(Hide)  \
       COMMAND_OP(Jump)

    // make enum for commands
    #define COMMAND_OP(id) k##id,
    enum CommandId {
        COMMAND_LIST
        kLastCommandId,
    };
    #undef COMMAND_OP

    // --command.cc--
    // Make command strings
    const char* GetCommandString(CommandId id) {
      static const char* command_names[] = {
        #define COMMAND_OP(id) #id,
        COMMAND_LIST
        #undef COMMAND_OP
      };
      return command_names[id];
    }

    // make a jump table for the commands
    typedef bool (*CommandFunc)(Context*);
    bool FunctionDispatch(CommandId id, Context* ctx) {
      static CommandFunc s_command_table[] = {
        #define COMMAND_OP(id) id##Proc,
        COMMAND_LIST
        #undef COMMAND_OP
      };
      return s_command_table[id](ctx);
    }
Or this

    class Thing {
    public:
      void DoSomething();

    private:
      #ifdef USE_SLOW_LEGACY_FEATURE
      // needs access to Thing's internals.
      void EmulateOldSlowLegacyFeature();
      #endif
    };
Yes, I can try to hide the implementation but again, the reason I'm using C++ is because I want the optimal code. Not a doubly-indirected pimpl. If I wanted the indirection I'd be using another language.

I love C/C++ and its quirks. I use its quirks to make my life easier in ways some other languages don't allow. Modules seem to be ignoring some of what makes C/C++ unique and trying to turn it into Java/C#.

People saying the preprocessor has issues are ignoring the benefits. I miss the preprocessor in languages that don't have one because I miss those benefits.

You could say, "well, just don't use this feature then" but I can easily see once a project goes down this path, all those benefits of the preprocessor will be lost. You can't easily switch your code between module and include, especially if it's a large project like WebKit, Chrome, Linux, etc.

Leave my C++ alone! Get off my lawn!


> You could say, "well, just don't use this feature then" but I can easily see once a project goes down this path, all those benefits of the preprocessor will be lost. You can't easily switch your code between module and include, especially if it's a large project like WebKit, Chrome, Linux, etc.

It seems to me that that's the genius of this proposal: You can continue to use #include and the preprocessor where you feel it best serves you, because you can freely mix the two in the same compilation unit. It's fairly obvious when you're building things like your hypothetical command.h that you're going to base them on the preprocessor from the start, so the cost of moving from modules to pre-processor based solutions is irrelevant. And note that this proposal is not about removing macros, or removing the preprocessor, so your other two examples will continue to work in either scenario.

Liking macros and the preprocessor is fine and dandy, but there's no need to force template-heavy code to continue to pay insanely high re-re-re-re-re-parsing costs ad infinitum simply to retain those features.


On a closer read of this proposal, I believe the key is in these lines:

    A module is a package describing a library
    • Interface of the library (API)
    • Implementation of the library
That is, this will never replace the way you write applications.

All it means is that there is an alternative way to depend on 3rd party libraries that improves compile time if that library supports it.

If you're writing a library, you may choose to support it.

I agree, if this was a proposal to remove #include, #define and #if, it would be doomed as a non-starter; that's simply not practical. I don't believe that it is though.

Edit: Yes, I am saying that I believe this proposal is only practical for getting rid of public header files, the sort you'd find in /include/, and will have no impact on header files inside a single code base.


They're not ditching the preprocessor, they're just making it so the preprocessor state is isolated for modules. You can still #define all you want, but it won't cross the boundary between your code and a module.


Which is pretty much how linking to a shared library works, your #defines and #includes have no effect.


Well, no. If you make a #define it will affect code in the #include. This is why you see puts declared as

    extern int puts(__const char *__s);
The reason for the double underscore is so the declaration won't be affected if 's' is a macro.


D has static-ifs as a part of the language. It enables the exact same kind of conditional compilation, but without using a pre-processor: http://dlang.org/cpptod.html#metatemplates

The latest version of D is compelling. It's mostly a reasonable cleanup of C++. It retains all of the power, but the design learns from the decades of experience. The problem is that it's mostly a reasonable cleanup of C++ - it's not different enough to stand out, I think.


Reading this PDF made my day! I can't wait for headerless c/c++. When will this be implemented???


I hear D is backwards compatible with C (and C++?). They already have modules: http://dlang.org/module.html . I should use D more often.


D is not backwards compatible with C or C++; it doesn't even have a preprocessor. I think what you mean is that D code can link against C code if you use the C standard libraries.


Yes, I was referring to ABI compatibility. You are probably referring to source-level compatibility. If it's compatible at the binary level, it's compatible.


good idea, but it is not useful for me; i like the traditional C/C++ style of programming, and my server machines are strong enough


He's basically just describing a clunkier version of the Go package model


Looks like they are trying to rewrite Python.


the one problem i agree with him on is performance - from what i can see his proposal does something to potentially improve that, but it's not clear. i worry that caching pre-processed files is a red herring - is it really faster than re-including? what about preprocessor states? what about macros in include files? etc.

i feel that the preprocessor ultimately ends up with the same amount of work, just an extra pass for each included header to build a version to be cached… not to mention the complexity required to handle the multiplicity of pre-processor states required for this. maybe i am being dim and missing the obvious.

tbh, i'd rather they made their compiler work properly, like respecting alignment on copies with optimisation turned on, or implementing the full C++11, before adding language features to fix problems that nobody really has.


> i worry that caching pre-processed files is a red herring - its it really faster than re-including?

For future reference: In large C++ projects, it's not at all unusual for greater than 90% of the compile time to be spent parsing (and re-parsing, and re-parsing, and re-parsing, ad nauseam) header files.

> what about macros in include files?

The AST of the header is persisted. Since the parser can parse macro definitions, macros will continue to work normally. The only case where you would need to fall back to #include is those rare times when you want defining something in the source file to alter the parsing of the header (and even in most of those, you should just define the constant in the call to the compiler, e.g. "clang foo.c -DWITH_FEATURE_X").

> i feel that the preprocessor ultimately ends up with the same amount of work, just an extra pass for each included header to build a version to be cached… not to mention the complexity required to handle the multiplicity of pre-processor states required for this. maybe i am being dim and missing the obvious.

The pre-processor merely slurps the text of the included file into the including file; the compiler then parses the entire gigantic soup of <text of source file> + <text of all files included by source file>. The semantics of this require that every included header be re-parsed once per compilation unit. Imagine you have 5 .cpp files, each containing 200 characters, and each #including iostreams (which weighs in at roughly 1 million characters). Each .cpp file, post-preprocessor phase, will be 1,000,200 characters long. A full compilation of the project will require parsing 5,001,000 characters. Any subsequent full build will require parsing the full 5,001,000 characters. Changing one .cpp file will result in the need to parse 1,000,200 characters.

In this proposal, by contrast, an included file need only ever be parsed once; its AST can then be persisted and referenced eternally. In our above example, the iostreams header will be parsed once, and each .cpp file will be parsed once. This means a full build, the very first time iostreams is ever referenced in any compilation on the system, will require parsing 1,001,000 characters. Any subsequent full build will require parsing merely 1,000 characters. Changing one .cpp file will require parsing merely 200 characters.


you have completely missed my point. i know full well how much time is spent parsing these things and how the mechanism works - i don't believe this proposal will actually improve that. i also don't believe your answers address the point i was trying to make... namely that whatever preprocessed import module thing is created, it still has to be included into the compilation unit somehow. even if there is some kind of linkage-type solution going on with a lightweight interface, that feels functionally equivalent to what most of the standard library headers already are - so i don't understand what could possibly be that much faster or better about it.

not to mention that this is not a problem if you encapsulate your use of standard libraries properly... maybe 10-20 compilation units have to use it if you like to split your stuff into files a lot.

standard headers are poorly written/designed, including so much crap everywhere. why can't i have specific, per-function headers which include minimal stuff?

fix the headers, not the preprocessor.


> i know full well how much time is spent parsing these things and how the mechanism works - i don't believe this proposal will actually improve that.

Well, then you're pretty much 100% wrong in most C++ projects.

I honestly don't know what to tell you here. That persisting header ASTs between translation units is faster than re-parsing should be trivially obvious, and if it isn't trivially obvious, then the mere fact that precompiled headers and ccache dramatically speed up builds ought to make it empirically obvious.

The facts just aren't on your side.

> namely that whatever preprocessed import module thing is created, it still has to be included into the compilation unit somehow

Well, yes, obviously. In the current model the compiler slurps the header into the source file and parses the entire combination, resulting in the parse tree of the header + the parse tree of the rest of the file. In the proposed model the compiler pulls the parse tree of the header out of cache and just builds the parse tree of the file. Since in C++ header parse trees are often quite expensive to build (template declarations have to live in the headers, and their parse trees are incredibly expensive to build), this ought to be a blindingly obvious win.


[deleted]


> The article is disingenuous. Alongside its oh-so-sassy table of file sizes for helloworld, it needs a table of runtimes for helloworld. Turning stdio into an API instead of preprocessor soup is going to blow that up, unless the guy means something very unusual by "API".

You seem to be deeply confused here. This is simply a proposal for persisting the AST of a header (e.g. stdio) across compiler invocations instead of re-parsing it on every textual substitution of a #include, and for isolating header ASTs from being erroneously corrupted by source-file state. It has no runtime implications because the outputs of the linking stage will be identical.

stdio's preprocessor soup is its API. You seem to be the one with an unusual and much narrower meaning of "API".


How is this new thing different from precompiled headers, which Apple has had for 10 years?



