
Apple's Module proposal to replace headers for C-based languages [pdf] - _djo_
http://llvm.org/devmtg/2012-11/Gregor-Modules.pdf?=submit
======
haberman
Overall I like it. I like how they are treating both C and C++ as first-class
citizens of this new feature (instead of, for example, inextricably tying its
design to classes and namespaces). I like that they have a plausible migration
story for how to interoperate with existing header files. And the overall
design really looks like something that would fit into all of the C and C++
work that I do without getting in the way.

Sure it's non-standard and no one who cares about portability will use this
(yet). But this is exactly the way that good ideas get refined and eventually
standardized. You surely wouldn't want to standardize a module system that
_hadn't_ been already implemented and tested -- that would just leave you with
surprises when theory meets reality.

C and C++ are here to stay -- we should be open to improvements in them.

They don't explicitly mention this, but I'm sure that they have no plans to
remove existing #include functionality -- it is a near certainty that someone,
somewhere depends on having the preprocessor state affect how an include file
is processed. There are probably even cases where you can look at the design
rationale for this choice and say "yep, that really is the best solution for
what you are trying to do."
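
The classic example of that dependence is <assert.h> (my example, not one
from the slides), whose expansion is deliberately controlled by whatever
NDEBUG happens to be at the point of inclusion:

    
    
      #define NDEBUG
      #include <assert.h>   /* assert(x) now expands to a no-op */
    
      #undef NDEBUG
      #include <assert.h>   /* legal: assert.h is designed to be re-included,
                               re-reading the current NDEBUG each time */
    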

~~~
bla2
> Sure it's non-standard

Doug chairs the study group that's evaluating a module system for C++
("""Sutter announces there is a Study Group for modules and Doug Gregor is the
chair.""" <http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3380.pdf>)

Doug happens to be employed by Apple, but calling this "Apple's proposal" is
somewhat misleading I think.

~~~
lgg
Have you ever worked at a company that develops compilers? Do you know anyone
on the clang team at Apple? Do you know Douglas Gregor?

dgregor is clearly heavily involved and likely the principal architect of this
proposal, but he did it on Apple's behalf, to solve their problems, with clear
input from a number of Apple's internal stakeholders. Yes, he is the chair of
the study group, but his participation in WG21 is funded by and at the behest
of Apple.

The germ of every idea starts with one person, but many (most?) ideas require
the involvement of many people to make them happen. Where credit lies is
complicated, but I think it is completely accurate to refer to this as either
Doug's proposal or Apple's proposal.

~~~
chandlerc1024
I do work at a company ( _not_ Apple), developing compilers -- specifically
Clang. I know most of the folks working on Clang at Apple, and I know Douglas
Gregor quite well. I'm on Doug's study group working on modules.

So, I think I have some insight into what's going on. The design and
implementation of the modules support in Clang is absolutely being driven by
Doug at Apple. Anyone can see that. =] However, there are some other aspects
to this effort that were not the focus of an LLVM dev meeting talk. Daveed
Vandevoorde has written the proposals to the committee thus far[1], and is
continuing to work on the proposal and language-design side[2]. Doug is
currently the one driving the implementation forward, but Clang and this
implementation are completely open source, and the intent (to my knowledge
thus far) is absolutely to converge with the proposed standardized feature.

[1]: <http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3347.pdf>

[2]: <http://youtu.be/8SOCYQ033K8>

I expect lots of others will end up contributing ideas and implementation
effort long-term, even though Doug has charted the course on the
implementation side and Daveed on the proposal side thus far. I don't think
this is at all likely to become a vendor-specific extension with no standards
support. This is something the committee is really actively pursuing with
broad interest across organizations and representatives.

------
_djo_
This looks promising, aside from being long overdue. Header files have always
been one of the more annoying parts of C/C++/Obj-C development.

The important bit is that the proposal's ideas for making the transition
easier are good and make it seem like this may get traction where similar
efforts have failed before. That Doug Gregor and other LLVM/Clang/LLDB
developers are already working on the Clang implementation is even better. At
the very least we may see this in Objective-C.

~~~
Someone
I think it is promising and long overdue, too. It also is clear to me that
this will win, because Apple is pushing it into LLVM and has a head start on it.

On the other hand: if someone did the equivalent to their browser, people
would call it fragmentation.

It will be interesting to see how gcc reacts to this. If this decreases
compilation times significantly, I think they will have to follow suit.

~~~
micampe
_> On the other hand: if someone did the equivalent to their browser, people
would call it fragmentation._

I haven’t seen many people calling Dart, Pepper, Native Client or the Chrome
Web Store “fragmentation”.

~~~
Someone
With "the equivalent" I wasn't thinking about creating entirely new
ecosystems; I was thinking of tweaking the existing ecosystem to favor one's
own tool chain.

The moment somebody writes the first module, the world sees the first 'best
compiled with clang' code. Once that code uses #import to improve compilation
speed, that could become 'must be compiled with clang'. That, IMO, is similar
to adding an extra tag to your browser's HTML dialect and then promoting its
use.

I expect Apple is aware of the risk and wants to prevent this (the fact that
they plan a 100% compatible middle step is an indication of that), but the
risk is there.

Also, people have argued that the effort spent on Native Client should be
spent on improving JavaScript performance because of the fragmentation issue.
Responding to another comment: people also have complained about the gcc-isms
present in lots of open source libraries.

~~~
micampe
The ecosystem is the www, not JavaScript and HTML, so those are all tweaking
and extending the ecosystem.

The moment somebody writes the first Dart or NaCl based website or the first
Pepper plugin, the world sees yet another (not the first) "must be viewed with
Chrome".

~~~
munificent
Websites written in Dart work just fine in any standards-compliant modern
browser. You just compile it to JS the same way you do CoffeeScript.

This makes sense because _no_ widely deployed browser, not even Chrome, has
the native Dart VM in it. You can download a build of Chromium with the Dart
VM in it, but Chrome itself doesn't have it (though we hope it will when the
time is right).

Check out api.dartlang.org. If the site works for you then your browser
supports Dart just fine. :)

~~~
micampe
Right, I forgot about that, thanks.

------
nkurz
While LLVM authors probably know best, I don't understand some of his
criticisms on the "Inherently Non-Scalable" slide.

    
    
      • M headers with N source files -> M x N build cost
    

It's only MxN if there is no use of the "#ifndef _HEADER_H" workaround that he
mentioned earlier. Wouldn't adding a preprocessor directive like
"#include_once <header.h>" solve this? Alternatively, these guards could be
added to the headers themselves without changing the preprocessor. This
probably should be a parallel set of headers (#include <std/stdio.h>) to avoid
breaking the rare cases that depend on multiple inclusions, but creating that
set would be a simple mechanical translation.
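
As a sketch, each header in that parallel set would just be a mechanical
guarded wrapper around the real one (hypothetical path and guard name):

    
    
      /* std/stdio.h -- wrapper that adds the missing guard */
      #ifndef STD_STDIO_H
      #define STD_STDIO_H
      #include <stdio.h>
      #endif
    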

    
    
      • C++ templates exacerbate the problem
    

I'm mostly a C programmer, so I have no argument here.

    
    
      • Precompiled headers are a terrible solution
    

Why is this? It likely would break the ability of headers to be conditional on
previous #define statements, but since the proposal does this anyway it doesn't
seem insurmountable. Along those lines, how does this proposal handle cases
where one needs/wants conditional behavior in the header, such as "#ifdef
WINDOWS" or the like? And is caching headers during the same compilation also
"terrible"?

~~~
sigil
Instead of "#ifndef _SOME_HOPEFULLY_UNIQUE_NAME_H" I've started using "#pragma
once" -- it's likely supported by every compiler you care about.

<http://en.wikipedia.org/wiki/Pragma_once>

~~~
zb
That can cause problems if a header is accessible by two different paths
(which is not uncommon in a large project). Using both gives you maximum
efficiency without sacrificing correctness, and it works everywhere.
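
For reference, the belt-and-suspenders version looks something like this
(names are placeholders): the pragma lets the compiler skip re-opening the
file, while the guard keeps things correct even if the same header is reached
via two different paths.

    
    
      #ifndef MYLIB_WIDGET_H
      #define MYLIB_WIDGET_H
      #pragma once
    
      void widget_init(void);
    
      #endif /* MYLIB_WIDGET_H */
    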

------
dpark
Overall this seems good. The lack of modules in C/C++ is a huge pain. The
"link" section seems like a really leaky abstraction, though.

    
    
      module ClangAST {
        umbrella header “AST/AST.h”
        module * { }
        link “-lclangAST”
      }
    

This hardcodes an implementation-specific syntax and yet says nothing
meaningful. Drop the "-l" and you're just restating the name of the module.
What value is there in baking a command-line flag into the module definition?

P.S. I also find it strange that something intended to blend with C/C++
doesn't use semi-colons. This is just stylistic, though.

------
SeoxyS
I think I'm the only person who likes headers. I'm not overly concerned with
compilation times and big-o notation. Computers can compile things really fast
nowadays.

I'm more concerned with the developer usability benefits & drawbacks of the
feature. As somebody who is a polyglot, but spends a large amount of time
writing Objective-C, I have come to absolutely love header files.

I see header files almost as documentation. To me, a header file is a
description of everything that's public about an API. My header files tend to
be very well commented, and very sparse, only containing public methods and
typedefs.

When the need arises to make internally-includable headers (say I'm writing a
static library, and have methods that are private to the library, but public
to other classes within the library), I will usually write a
`MyApi+Internal.h` header for internal use, which doesn't ship with the
library.

A developer should never have to dig into implementation files, or into
documentation, in order to use a library. Its headers ought to be sufficient.
Things like private instance variables, or anything private, do not belong in
a header file.
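
In plain C terms the split looks something like this (hypothetical names, just
to illustrate the pattern):

    
    
      /* MyApi.h -- ships with the library: public, documented API only */
      typedef struct MyApi MyApi;         /* opaque; no ivars exposed */
      MyApi *MyApiCreate(void);
      void   MyApiDoWork(MyApi *api);
    
      /* MyApi+Internal.h -- used while building the library, never shipped */
      #include "MyApi.h"
      void MyApiResetCache(MyApi *api);   /* private to the library */
    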

FWIW, here's the public header for the library I spend most of my time working
on:

<https://gist.github.com/e83169d2c3984c6f077c>

~~~
gilgoomesh
The headers still exist. This proposal is not about eliminating them. All this
proposal does is bundle groups of headers together and give you a simplified
way of including them.

If you have a look at the proposed ".map" files, the headers are still fully
enumerated. I assume the IDE/debugger will still be able to find definitions
in the original header files via the .map files.

------
meaty
At the risk of starting a fight, I really don't want this. I'm quite happy
with headers and know how to effectively manage them without shooting myself.

Granted there is some compiler overhead for importing large header files but I
don't really notice it at all.

Also, we already have an Apple/NeXT non-standard C extension (Objective-C). I
don't think we want anything else added without proper standardisation,
regardless of the motivation. I'd rather they forked the language.

~~~
msbarnett
> Also, we already have an Apple/NeXT non-standard C extension (Objective-C).
> I don't think we want anything else added without proper standardisation,
> regardless of the motivation. I'd rather they forked the language.

This is confusing. Surely Objective-C, which adds a hell of a lot that C does
not address, and many syntax and runtime changes to support it, would fall
under the definition of "fork of the language", much as C++ does, rather than
simply a "non-standard C extension" (surely that description better applies to
the many GNU C extensions in GCC?).

Re: "adding things without proper standardization", the role of standards
committees _is_ to reach consensus among vendors so that they can standardize
non-standard extensions that they have variously implemented and tested _in
the real world_ first. To argue for the opposite, that the vendors must do
nothing until the committee hands down the One True Way from On High, untested
outside of their heads, is the height of Design By Committee.

~~~
meaty
Well, Objective-C was a preprocessor extension, as this proposal is, so it's
one and the same. Both are extensions.

Yes we all know where vendors that do that got us:

    
    
       -moz-gradient:
       -ms-gradient:
       gradient:
       -webkit-gradient:
    

Oh and Microsoft with their C++ CLR extensions and middle finger to C99.

~~~
msbarnett
> Well, Objective-C was a preprocessor extension, as this proposal is, so it's
> one and the same. Both are extensions.

This is daft. Objective-C has been implemented using a full-fledged compiler
for decades. Is C++ "just an extension to C" because it was once a pre-
processor on top of C? Is Common Lisp "an extension to C" because ECL
transforms it into C?

Utter lunacy.

> Yes we all know where vendors that do that got us

Yes; they got us tried and tested ways of implementing things like gradients,
so that the standards body had something in the real world to base a
successful standard off of. The alternative gets us things like C++98's export
templates: unimplementable garbage that no vendors could support because they
were invented on paper and any attempt to actually build them in the real
world caused more problems than they solved.

And just so we're clear what's going on here:

The linked slides are about LLVM's implementation of Daveed Vandevoorde's
proposal to the C++ committee
(<http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3347.pdf>). The
C++ Standards
Committee _held off on adopting that proposal_ for C++11 because there were no
implementations of it, and asked that vendors try implementing it, adapting it
as necessary to make it feasible, and provide feedback so that informed
decisions based on experience could be made in the final draft of C++17.

Let me repeat that: _LLVM is doing exactly what the standards committee asked
them to do_.

~~~
_djo_
Thanks for linking to Daveed Vandevoorde's proposal. I had no idea that he had
originated this concept, else I would have worded the title differently.

------
crncosta
Almost like the way the D programming language handles it:

<http://dlang.org/module.html>

~~~
kibwen
For completeness, here's how Rust handles modules:

<http://dl.rust-lang.org/doc/tutorial.html#modules-and-crates>

Browsed golang.org for a bit and didn't find any info on how Go handles
modules/packages. All I know is that it has something to do with the folder
hierarchy of your project (links appreciated).

~~~
ori_b
In Go, you declare a package for your file. Exported symbols start with an
upper-case letter.

~~~
kibwen
Okay, so the `package` keyword in Go is like the `export` keyword in this
proposal? Does Go allow submodules?

~~~
strmpnk
Packages are given paths or URLs, so a package might live at a subpath, but
it's not really a subpackage in any sense other than by name.

------
thwest
"Apple's Module proposal..." Is Apple really who deserves credit here? Is
there something I missed about Apple's management driving this, and not Gregor
or the C++ standards committee?

~~~
strmpnk
First slide. It seems like he's employed by Apple and being paid to work on
this. So yes, I think calling it Apple's solution is an apt description.

~~~
thwest
Herb Sutter heads the C++ standards committee and credits his employer
Microsoft [1]. Is C++11 or C++17 "Microsoft's"?

[1] <http://isocpp.org/std/the-committee>

~~~
jlarocco
The standards don't belong to anybody. Particular proposals are credited to
the person or organization that proposes them.

This particular proposal is from Apple.

When Herb Sutter/Microsoft make proposals for C++, it's phrased as
"Microsoft's proposal for C++0x" or whatever.

------
pubby
Everyone wants modules but nobody can agree on how they should behave. This is
why they weren't included in C(++)11.

~~~
kibwen
Speaking as someone who doesn't follow the development of those specs, can you
elaborate? What behaviors are contentious?

~~~
pubby
I actually haven't been following modules at all so I couldn't tell you
exactly what's the problem. Watching from 30:40 of the following video may be
insightful though:

<http://channel9.msdn.com/Events/GoingNative/GoingNative-2012/Interactive-Panel-Ask-Us-Anything->

------
rjzzleep
long overdue indeed, reminds me a lot of google go?

~~~
zach
Definitely seems like Go has led the way here. Not that there wasn't a lot of
pain before, but Go's lightning compilation seems to be an impetus for the
"let's finally do this" change.

I have to think that this is a minor repartee between Apple and Google
language groups. Google did something clever that breaks with tradition, and
now Apple is doing something similarly clever yet backwards-compatible.

I like this dynamic. I certainly appreciate this move as an OS X/iOS
programmer, since getting a functional version of Go on iOS is a pipe dream
for strategy tax reasons.

~~~
coliveira
Object Pascal did the exact same thing 20 years ago, and it was a language
much more influential than Go in its time. So I don't think Apple programmers
even had to look at Go to design this feature.

~~~
rossjudson
Yeah. Compile times there were ridiculously fast, bordering on interactive. Go
follows the same path.

Java has very fast compilation as well, due in no small part to tossing the
preprocessor and having fast binary imports.

~~~
coliveira
Part of the advantage of Pascal at the time was that it was created explicitly
to allow for fast parsing. C and especially C++ are much harder to parse.

~~~
pjmlp
The main advantage was still the possibility of having binary modules, without
parsing the same imports multiple times during "make world" compilation.

As an aside, ISO Pascal eventually got the changes that were available in
Turbo Pascal and Mac Pascal (known as ISO Extended Pascal), but by then most
people considered Turbo Pascal the _de facto_ standard.

------
jeremyx
Since these are not in C++11, we'll have to wait 10 years...

~~~
octopus
Actually you'll have to wait 2 to 5 years. Have a look at Herb Sutter's talk
about "The Future of C++":

<http://channel9.msdn.com/Events/Build/2012/2-005>

~~~
pjmlp
Only one major feature is planned to make its way into C++17, and modules are
considered a major feature.

And there are lots of people interested in other sets of major features.

So who knows if C++17 will include it, and even if it does, how long it will
be before one can use it in production.

By 2017 many native developers might have moved to Rust, D, Go or whatever
comes along.

Just look at how computing was 5 years ago.

~~~
kibwen
Do you have a source for "Only one major feature is planned to make its way
through C++17"? Wikipedia tells me (without citations) that C++14 and C++17
are minor and major revisions respectively, but doesn't indicate that "major
revision" stipulates only a single major change at most.

~~~
pjmlp
<quote>

Bjarne believes that for C++17, we only have resource for 1 major feature, 2
medium features, and a dozen minor features.

</quote>

<https://www.ibm.com/developerworks/mydeveloperworks/blogs/5894415f-be62-4bc0-81c5-3956e82276f3/entry/the_view_from_c_standard_meeting_oct_2012196?lang=en>

------
vilya
Anyone know where I can read more detail about this proposal? It looks really
interesting, but there are a couple of things I'm not clear on from the pdf:

How do you get away from creating a header file for a closed source module?
Without a header, how would users of your module know what they can call? Can
you perform reflection on a module to inspect it? Is there some kind of tool
proposed, like javadoc or pydoc, to generate documentation for a module?

How does this work with C++ templates? If you don't know in advance what types
the template will be instantiated with, how can you pre-compile the code?

I'm sure the authors have thought through all these issues and more; I'd love
to read about their solutions.

~~~
aidenn0
I know nothing about this proposal, but I can make educated guesses on all of
these:

Creating a header: the module needs to contain only the public information of
a closed-source module. You could generate it from a .h file if you are a
consumer of a closed-source module, or, if you are the producer of the module,
you could ship either pre-generated modules or all the information needed to
generate a module.

javadoc, etc.: It sounds like you could generate the information directly from
the C/C++ source, since there is now information about what is public and
private in the source.

Compiling this:

    
    
      template <typename Type>
      Type max(Type a, Type b) { return a > b ? a : b; }
    

generates no code, but it does do some work in the compiler that can be
cached. This is what would be stored in the module.

~~~
vilya
Thanks, I think you're probably right - the modules would have to have enough
information in their compiled form that your compiler could introspect them
without needing the source. The proposal wouldn't be much use if that weren't
true.

Re: templates, I thought the expensive part of template processing was code
generation rather than parsing. Would caching the AST really be that much of a
saving? I admit I've never tried profiling it to find out...

~~~
aidenn0
    
    
      #include <iostream>
    

You've just parsed ~6MB of code. You might instantiate 2 or 3 templates from
it. The parsing truly is non-trivial.

------
rootedbox
Maybe all the things they figured out in Pascal aren't that bad.

------
cjensen
I didn't see any mention of the recursive-usage problem. How do they plan on
handling co-dependent files?

For example, class A's code makes use of class B. Class B's code makes use of
class A.

a.h looks like:

    
    
      class B;
      class A {
      public: void foo (B *);
      };
    

a.cc looks like:

    
    
      #include "a.h"
      #include "b.h"
      void A::foo (B *) { b->narf (); }
    

and b.h and b.cc use A in the same way.

~~~
SeanLuke
I don't see what the issue is. In some sense, the proposal is essentially to
do away with inserted code and instead do what Java does with its import
statements: import a public API. There's no recursion issue in Java.

I think you may be mixing this up with C's #include, which _does_ have
recursion issues, hence the need for its absurd #ifdef guards. But Apple did
away with that years ago with #import, which is basically #include with built-
in guards.

~~~
cjensen
I'm wondering how it accomplishes this as a practical matter. Does it compile
a file twice? Once to create the interface info, and a second time to compile
the implementation? Can they even parse enough of a C++ file to generate the
interface in all cases?

That they chose not to mention the issue makes me nervous that they think they
can just ignore the problem. Many previous languages have ignored the problem
and said "don't do that." I hope this proposal does not go in that direction.

------
shadowmint
Sounds interesting, but I've got to admit my initial response is cautious.

How will this avoid the LD_LIBRARY_PATH hell of trying to depend on a local
module (i.e. conflicts)?

How will it work _at all_ with local submodules inside a single project? (i.e.
I have 200 local C files, each with a header. Now what? A module each? How do
we handle simple dependencies between classes and functions?)

How will we import actual macros?

My guess is that the answers are:

1) include path style --module-path=blah

Seems fair, but this is going to be as messy as include paths already are.

2, 3) don't use modules except at a system level.

That's a shame as far as I'm concerned, but perhaps I'm wrong. Can anyone else
see how these might work?

------
Nursie
Anyone saying 'inherently unscalable' about a feature of a language that's
used in as many places as C needs to really think about things...

I understand some of the objections, and the import mechanism doesn't sound
like a bad thing, though some of the objections are weird -

Import only imports the public API and everything else can be hidden - who
wasn't doing this for libraries or large code modules anyway? Have static
functions at the code-module level, 'private' headers for sharing functions
within a larger logical module, and public headers at the logical module or
library level. Is this too cumbersome?

~~~
pmr_
The problem with implementation hiding is that it is impossible to do well
with templates. You will always make all implementation details available to
a user of your header, and sometimes the implementation details are a public
API themselves (libc++'s vector brings in almost all of utility, a bunch of
type_traits, and so on).
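
A quick sketch of why (my example, not libc++'s actual code): the "detail"
helper is nominally private, but it must be fully visible in the header
because the compiler needs its complete definition to instantiate the
template.

    
    
      namespace detail {
        // an "implementation detail" every user of the header can see
        template <class T>
        void destroy_range(T* first, T* last) {
          for (; first != last; ++first) first->~T();
        }
      }
    
      template <class T>
      class vector {
      public:
        ~vector() { detail::destroy_range(first_, last_); }
      private:
        T* first_ = nullptr;
        T* last_ = nullptr;
      };
    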

------
angersock
I may come off as a bit nutty here, but do we really want to add _another_
mechanism to C/C++ for something that people have worked with for decades?

Most of these issues are things you learn to avoid really fast in production
systems, using things like #pragma once, include guards, and proper symbol and
header exposure when writing C libraries.

This doesn't really seem like much other than feature bloat for something that
works, works well enough, and which probably won't be implemented in a timely
fashion by at least 1 major compiler vendor (Hey folks! There's life outside
clang and gcc and icc!).

------
albertzeyer
I once thought about an automatic bullet-proof precompiled-header system. I
think this is actually not impossible to implement.

When first parsing a header, the parser can keep track of all the macros in
the preprocessor state that the header depends on. E.g. a very common such
macro is the very first one it reads, like `#ifndef __MYFILE_H`.

Then, including a header becomes a function (<macro1, macro2, ...>) -> (parser
state update, i.e. list of added stuff). This can be cached.
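
A rough sketch of that function-and-cache shape (hypothetical types and names;
a real implementation would also record macros that were checked but
undefined, and would cache AST deltas rather than strings):

    
    
      #include <map>
      #include <string>
      #include <vector>
    
      using MacroEnv = std::map<std::string, std::string>;
    
      struct CachedParse {
        MacroEnv observed;               // macros this parse actually read
        std::vector<std::string> decls;  // stand-in for the parser state update
      };
    
      std::map<std::string, std::vector<CachedParse>> cache;  // keyed by path
    
      // Reuse a cached parse iff every macro it depended on has the same value
      // now; on a miss the caller parses for real and records a new entry.
      const std::vector<std::string>* lookup(const std::string& path,
                                             const MacroEnv& env) {
        for (const CachedParse& p : cache[path]) {
          bool match = true;
          for (const auto& kv : p.observed)
            if (env.count(kv.first) == 0 || env.at(kv.first) != kv.second) {
              match = false;
              break;
            }
          if (match) return &p.decls;
        }
        return nullptr;  // cache miss
      }
    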

------
vinayan3
My concern is portability, because that is the largest benefit of C, at least.
That is less true of C++, but it is still more portable than most languages.

------
pserwylo
This is cool; it should make Rusty Russell's CCAN [0] a whole lot more
interesting. Instead of just snippets of useful code, it could contain full
modules like CPAN/PyPI/PEAR/CRAN/CTAN and various other repositories for other
languages.

[0] - <http://ccodearchive.net/index.html>

------
jonhohle
I like the idea, but hopefully they'll shorten the std submodules to `std.io`,
`std.lib`, etc.

------
acomjean
I thought one of the points of the header system was that you could use the
code without having to slog through all the source.

Thinking about my trips to /usr/include, those headers weren't that useful for
coding with but you could get constants and function names at least.

~~~
Too
C# together with Visual Studio solves this in an amazing way. You can browse
.NET DLL-files as if they were source files, with only public members being
visible and without the function bodies. Good for finding function names and
constants.

I can't believe nobody else is using the same system; using "go to definition"
on a library function is just as natural as using it on one of your own
functions. The only difference is that in the second case you see the actual
code of the function instead of just the function header.

(Replace .NET DLL-files with these new "modules" to make this applicable to C)

P.S. Even if you do have the source, in most editors you can collapse the code
to hide everything so you don't have to "slog through all the source".

~~~
richy_rich
Most (all?) Java IDEs do this, as does Xcode (Obj-C) and probably many other
things. This isn't the point here though...

~~~
Evbn
But Eclipse does stupid stuff like disabling Ctrl-F full-text search when
viewing a .class file, even though the method names are plainly visible as
text.

------
mtdev
Looks like the core issue is a poor preprocessor implementation. It's a good
idea in principle; however, we would be adding new features to address
shortcomings in existing features instead of fixing the problems in the
existing code.

~~~
mbrubeck
No, the issue is not just poor implementation. Problems like O(M*N) bloat and
scope pollution are fundamental design flaws with the preprocessor header
system.

Almost every widespread modern language (all except JavaScript? [1]) has a
built-in module system. Adding modules to C is not just some sort of hack to
work around preprocessor limitations. Instead I'd say that the C preprocessor
is a hack to work around lack of modules.

[1] and modules will probably be added to JS in the next edition of the
ECMAScript standard: <http://wiki.ecmascript.org/doku.php?id=harmony:modules>

~~~
mtdev
You can add some functionality to limit scope and implement a caching system
(which is also proposed here) to deal with M*N issues, with a focus on keeping
backwards compatibility. E.g. the caching is done automatically, and the scope
spamming control is a best-effort solution based on static analysis (for
example).

Moving from headers to modules is a fundamental change in how the language
operates and is guaranteed to further break compatibility between compilers. I
worry that jumping to add these features to the language standard is
premature, and we should instead look to optimizations within the preprocessor
and linker to see if we can improve performance first.

~~~
msbarnett
> You can add some functionality to limit scope and implement a caching system
> (which is also proposed here) to deal with M*N issues, with a focus on
> keeping backwards compatibility.

You can't cache the AST of a #include'd file without breaking the standardized
semantics of #include. ccache (which is what you're basically proposing) gets
away with it by just ignoring the standard and shrugging if some legal
programs break horribly when it's used. The Standards Committee doesn't have
that luxury.

You can't change existing functionality while "keeping backwards
compatibility". The draft Modules proposal does a much better job of backwards
compatibility than what you're proposing, because it allows #include to
continue to have the same semantics it has always had, and even provides a
clean way forward for using both #include and modules in the same translation
unit.

> Moving from headers to modules is a fundamental change in how the language
> operates and is guaranteed to further break compatibility between compilers.

How would a standardized module system "break compatibility" between standards
compliant compilers? The title of this story is completely misleading: this
isn't "Apple's" proposal, this is about LLVM's implementation of the Standard
Committee's Module Working Group's draft proposal.

> I worry that jumping to add these features to the language standard is
> premature

The committee was worried about that too; that's why the draft proposal was
held out of C++11 so that vendors could try out implementing it and see how it
fared in the real world.

> we should instead look to optimizations within the preprocessor and
> linker to see if we can improve performance first.

It's not as though nobody has ever tried to improve preprocessor or linker
performance; people have been looking at those issues since the beginning of
C. There is, fundamentally, no way to improve the performance of the current
system without breaking semantic compatibility with existing programs.

------
cperciva
I don't buy the performance argument: NxM -> N + M only works if every one of
the N .c files is including every one of the M .h files.

If you're spamming #includes like that, you need to fix your #includes, not
redefine the language.

~~~
swift
If a .c file includes U x M .h files on average, where U is a number from 0 to
1, then the work we expect to do at compilation time is proportional to U x (N
x M). This is still O(N x M).

~~~
cperciva
True, but in my experience U tends to decrease sharply as M increases.

------
sev
They mention:

> "‘import’ ignores preprocessor state within the source ﬁle"

I wonder if that would break the specific use cases where you _don't_ want the
import to ignore the state of the preprocessor within the source file?

Overall, I like it!

~~~
cpeterso
You can still use #include if you want preprocessor state to leak into a
specific header file.
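
Roughly (the import line below uses the slides' proposed syntax, not anything
standard, and the header names are made up):

    
    
      #define FOO_ENABLE_EXTRAS 1
      #include "foo.h"   // foo.h sees FOO_ENABLE_EXTRAS and can react to it
    
      #define BAR_ENABLE_EXTRAS 1
      import bar;        // per the slides, bar is built in isolation and
                         // never sees BAR_ENABLE_EXTRAS
    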

------
dchichkov
It breaks "there should be one-- and preferably only one --obvious way to do
it". And quite a few others.

But as a stand-alone feature, stateless preprocessor includes could be a nice
feature to have.

~~~
dangayle
Is C bound by Zen?

~~~
dchichkov
Zen wisdom crosses all boundaries :)

------
georgeg
Is D not doing this sort of thing already or am I wrong?

------
greggman
It's possible I don't understand the proposal and I'm probably going to get
egg on my face, but I'm not sure I really want this. If I wanted Objective-C
or C# or Java or Python I'd use Objective-C or C# or Java or Python.

I actually like the preprocessor. I like that I can write code like this

    
    
        #ifdef DEBUG
          #define DEBUG_BLOCK(code) code
        #else
          #define DEBUG_BLOCK(code)
        #endif
    
        void SomeFunction(int a, float b) {
          DEBUG_BLOCK({
            LOG_IF_ENABLED("Called SomeFunction(%d, %f)\n", a, b);
          });
          ... do whatever it was SomeFunction does ..
        }
    

In other languages that I'm used to there's no way to selectively compile
stuff in/out.

I like that I can change the behavior of an include file for a single
compilation unit

    
    
       -- foo.cc --
       #include "mylib.h"
    
       -- bar.cc --
       #include "mylib.h"
    
       -- baz.cc --
       #define MYLIB_ENABLED_EXPENSIVE_DEBUGGING_STUFF 1
       #include "mylib.h"
    

because enabling it globally would be too slow.

I like that I can generate code

    
    
        // --command.h--
        #define COMMAND_LIST \
           COMMAND_OP(Stand) \
           COMMAND_OP(Walk)  \
           COMMAND_OP(Run)   \
           COMMAND_OP(Hide)  \
           COMMAND_OP(Jump)
    
        // make enum for commands
        #define COMMAND_OP(id) k##id,
        enum CommandId {
            COMMAND_LIST
            kLastCommandId,
        };
        #undef COMMAND_OP
    
        // --command.cc--
        // Make command strings
        const char* GetCommandString(CommandId id) {
          static const char* command_names[] = {
            #define COMMAND_OP(id) #id,
            COMMAND_LIST
            #undef COMMAND_OP
          };
          return command_names[id];
        }
    
        // make a jump table for the commands
        typedef bool (*CommandFunc)(Context*);
        bool FunctionDispatch(CommandId id, Context* ctx) {
          static CommandFunc s_command_table[] = {
            #define COMMAND_OP(id) id##Proc,
            COMMAND_LIST
            #undef COMMAND_OP
          };
          return s_command_table[id](ctx);
        }
    

Or this

    
    
        class Thing {
        public:
          void DoSomething();
    
        private:
          #ifdef USE_SLOW_LEGACY_FEATURE
          // needs access to Thing's internals.
          void EmulateOldSlowLegacyFeature();
          #endif
        };
    

Yes, I can try to hide the implementation but again, the reason I'm using C++
is because I want the optimal code, not a doubly-indirected pimpl. If I wanted
the indirection I'd be using another language.

I love C/C++ and its quirks. I use its quirks to make my life easier in ways
some other languages don't. Modules seem like they're ignoring some of what
makes C/C++ unique and trying to turn it into Java/C#.

People saying the preprocessor has issues are ignoring the benefits. I miss
the preprocessor in languages that don't have one because I miss those
benefits.

You could say, "well, just don't use this feature then" but I can easily see
once a project goes down this path, all those benefits of the preprocessor
will be lost. You can't easily switch your code between module and include,
especially if it's a large project like WebKit, Chrome, Linux, etc.

Leave my C++ alone! Get off my lawn!

~~~
dietrichepp
They're not ditching the preprocessor, they're just making it so the
preprocessor state is isolated for modules. You can still #define all you
want, but it won't cross the boundary between your code and a module.

~~~
fleitz
Which is pretty much how linking to a shared library works: your #defines and
#includes have no effect.

~~~
dietrichepp
Well, no. If you make a #define it will affect code in the #include. This is
why you see puts defined as

    
    
        extern int puts(__const char *__s);
    

The reason for the double underscore is so the definitions won't be affected
if 's' is a macro.
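
A contrived demonstration of the hazard the underscores avoid:

    
    
      #define s +          // user code happens to define a macro named 's'
      #include <stdio.h>   // a declaration spelled
                           //   extern int puts(const char *s);
                           // would now expand to the unparseable
                           //   extern int puts(const char *+);
    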

------
Executor
Reading this PDF made my day! I can't wait for headerless C/C++. When will
this be implemented???

------
optymizer
I hear D is backwards compatible with C (and C++?). They already have modules:
<http://dlang.org/module.html> . I should use D more often.

~~~
kevinnk
D is not backwards compatible with C or C++; it doesn't even have a
preprocessor. I think what you mean is that D code can link against C code if
you use the C standard libraries.

~~~
optymizer
Yes, I was referring to ABI compatibility. You are probably referring to
source-level compatibility. If it's compatible at the binary level, it's
compatible.

------
raowarrior
Good idea, but it is not useful for me. I like the traditional style of C/C++
programming, and my server machines are strong enough.

------
jfaucett
He's basically just describing a clunkier version of the Go package model

------
zopticity
Looks like they are trying to rewrite Python.

------
jheriko
the one problem i agree with him on is performance - from what i can see his
proposal does something to potentially improve that, but it's not clear. i
worry that caching pre-processed files is a red herring - is it really faster
than re-including? what about preprocessor states? what about macros in
include files? etc.

i feel that the preprocessor ultimately ends up with the same amount of work,
just an extra pass for each included header to build a version to be cached…
not to mention the complexity required to handle the multiplicity of pre-
processor states required for this. maybe i am being dim and missing the
obvious.

tbh, i'd rather they made their compiler work properly, like respecting
alignment on copies with optimisation turned on, or implementing the full
C++11, before adding language features to fix problems that nobody really has.

~~~
msbarnett
> i worry that caching pre-processed files is a red herring - is it really
> faster than re-including?

For future reference: In large C++ projects, it's not at all unusual for
greater than 90% of the compile time to be spent parsing (and re-parsing, and
re-parsing, and re-parsing, _ad nauseam_ ) header files.

> what about macros in include files?

The AST of the header is persisted. If the parser can parse macro definitions,
macros will continue to work normally. The only case where you would need to
fall back to #include is those rare times in which you _want_ defining
something in the source file to alter the parsing of the header (and even in
most of those, you should just define the constant in the call to the
compiler, e.g. "clang foo.c -DWITH_FEATURE_X").

> i feel that the preprocessor ultimately ends up with the same amount of
> work, just an extra pass for each included header to build a version to be
> cached… not to mention the complexity required to handle the multiplicity of
> pre-processor states required for this. maybe i am being dim and missing the
> obvious.

The pre-processor merely slurps the text of the included file into the
including file; the compiler then parses the entire gigantic soup of <text of
source file> + <text of all files included by source file>. The semantics of
this require that every included header be re-parsed once per compilation
unit. Imagine you have 5 .cpp files, each containing 200 characters, and each
#including iostreams (which weighs in at roughly 1 million characters). Each
.cpp file, post-preprocessor phase, will be 1 million, 200 characters long. A
full compilation of the project will require the parsing of 5 million, 1
thousand characters. Any subsequent full build will require parsing the full 5
million, 1 thousand characters. Changing one .cpp file will result in the need
to parse 1 million, 200 characters.

In this proposal, by contrast, an included file need only ever be parsed once;
its AST can then be persisted and referenced eternally. In our above example,
the iostreams header will be parsed once, and each .cpp file will be parsed
once. This means a full build, the very first time iostreams is ever
referenced in any compilation on the system, will require parsing 1 million, 1
thousand characters. Any subsequent full build will require parsing merely 1
thousand characters. Changing one .cpp file will require parsing merely 200
characters.

~~~
jheriko
you have completely missed my point. i know full well how much time is spent
parsing these things and how the mechanism works - i don't believe this
proposal will actually improve that. i also don't believe your answers address
the point i was trying to make either... namely that whatever preprocessed
import module thing is created, it still has to be included into the
compilation unit somehow... even if there is some kind of linkage type
solution going on with a lightweight interface - that feels functionally
equivalent to what most of the standard library headers already are - so i
don't understand what could possibly be that much faster or better about it.

not to mention that this is not a problem if you encapsulate your use of
standard libraries properly... maybe 10-20 compilation units have to use it if
you like to split your stuff into files a lot.

standard headers are poorly written/designed by including so much crap
everywhere. why can't i have specific - per function headers which include
minimal stuff?

fix the headers, not the preprocessor.

~~~
msbarnett
> i know full well how much time is spent parsing these things and how the
> mechanism works - i don't believe this proposal will actually improve that.

Well, then you're pretty much 100% wrong in most C++ projects.

I honestly don't know what to tell you here. That persisting header ASTs
between translation units is faster than re-parsing should be trivially
obvious, and if it isn't trivially obvious, then the mere fact that
precompiled headers and ccache dramatically speed up builds ought to make it
_empirically_ obvious.

The facts just aren't on your side.

> namely that whatever preprocessed import module thing is created, it still
> has to be included into the compilation unit somehow

Well, yes, obviously. In the current model the compiler slurps the header into
the source file and parses the entire combination, resulting in the parse tree
of the header + the parse tree of the rest of the file. In the proposed model
the compiler pulls the parse tree of the header out of cache and just builds
the parse tree of the file. Since in C++ header parse trees are often quite
expensive to build (since template declarations have to live in the headers
and their parse trees are incredibly expensive to build), this ought to be a
blindingly obvious win.

