They modified the type system so that good, idiomatic C code won't compile, but they refuse to make writing unsafe code any safer in any meaningful way.
They added new syntax and reserved words, again breaking C compatibility, but they refuse to clean up C syntax even to the extent of removing the bizarre octal notation which makes 0100 do something deeply, deeply unexpected to anyone who doesn't know some obscure bits of lore.
Finally, they moved a bunch of real complexity to the compile phase, something C largely doesn't do, but they refuse to replace the file-based system they inherited from C with anything which would make C++ compiles even a little bit faster.
C++ Modules - a chance to clean up the language? https://www.reddit.com/r/cpp/comments/agcw7d/c_modules_a_cha...
In short, C++ is a company-driven language instead of a community-driven one like Python and Rust. Some C practices are maintained because voters on the committee agree that these practices are necessary (for their code), and it's impossible to make a completely backward-incompatible change when they have >1M LOC code bases.
The C++ team is up against huge corporations that actively lobby for and against any of their changes - and to top it all, they do it for free. Some obscure C feature is bound to get trampled on.
With each rev of the dialect, put all new functions and variables into a new namespace like "std2019:...".
Old code that doesn't know about new features doesn't run into them at all.
Keep the old namespaces untouched: old code finds everything it expects.
There is no bullshit like reserved keywords: all symbols are namespaced in packages. New syntax is just macros named by symbols.
The only kind of breakage left is system requirements (new tools don't target old hardware well due to larger footprints; new dev environments don't run on old dev hardware; implementations drop support for unpopular platforms).
What about changing core semantics though?
Core semantics can be wrapped in an operator that is in some namespace. We develop (lazy ...) and everything in it is lazily evaled.
File-level annotations are possible. Emacs Lisp introduced lexical scope instead of dynamic scope as a file-level option.
There is quite a non-trivial difference between ZetaLisp and Symbolics Common Lisp: both are very large Lisp dialects. ZetaLisp is dynamically scoped, has base 8 integers, the old flavors system is widely used, Fexprs, ...
All these Lisp dialects are provided in the same system, and one switches the reader/printer and the packages when changing the language: one can set the language context in a listener and also per file.
I totally understand why you included this, but it's a shame that you felt it to be necessary.
Refusing to make any breaking changes is how we end up with... well, PHP. There's nothing inherently wrong with breaking changes. Too much of them can annoy developers, sure. And sometimes it has to hurt a little bit so progress can be made (see Python). But making little to no breaking changes ever will eventually cause problems and/or headaches.
That said, I wonder how many people moved to (for example) Rust simply because C++ moves slowly. Probably not that many, but surely it's nonzero.
I guffawed and then threw up in my mouth a little bit at this. C++ is riddled with dozens of horrible design flaws and kludges that make me think your definition of engineering is just plain wonky. Build time is one. Because of an inherent O(n^2) build system complexity, many large C++ codebases require massive, massive compile farms to achieve even reasonable turnaround times. They produce huge build artifacts that take practically forever to link and are still only barely debuggable. E.g. a recent audit of the V8 build time revealed that it compiles at an effective overall throughput of 186 lines of code per second. How do we deal with that? Distributed build system and 50+ core workstations.
This is a travesty of engineering.
C is three things:
1. A mid-level language, by which I mean a language where you do all of the mechanics but you abstract most of the machine-specific details. Memory management is manual, but you don't know or care about how malloc() works behind the scenes, and you can't even ask. You get integers with defined semantics up to things like overflow/wraparound, but you don't know or care if they're implemented in terms of multiple machine words or even if there's such a thing as a carry flag. It's midway between a macro assembler and, say, Python, which does nontrivial magic behind the scenes.
2. An unsafe language, with nontrivial undefined behavior (the overflow/wraparound stuff I mentioned above) and none of the guardrails Java provides, where overflow is at least well-defined and an out-of-bounds access raises an error. In C, very little checking code of that type is emitted or specified.
3. A systems programming language, where programmers do unsafe things, like writing values to DMA registers, things the language can't abstract away because... well... you have to write an OS kernel in something, after all. This is unsafe by design, as opposed to the above, where things are unsafe because of a safety/speed trade-off.
Rust proponents want to separate 2 from 1 and 3, and make a language where you can do things by hand and do unsafe things on purpose, but the language has more guide rails to prevent you from doing unsafe crap by accident.
C++ apparently wants to separate 1 from 2 and 3, to move more high-level and get more "language magic" (templates, iteration stuff... ) without making the language any safer in any respect. That's just an uncomfortable position for a language to be in.
My point is, it could move Rust-ward and high-level-ward if it ditched some C-isms from the language... but you gave a good explanation of why it won't.
The first systems programming language to introduce this concept was ESPOL, created in 1961, which already had the notion of unsafe code blocks.
Even better, according to surviving manuals, binaries with unsafe blocks were tainted and required enabling execution by the admin user.
(I don’t know why Rust was the agent of change in that priority, but I’m very glad for it!)
EDIT: COBOL is good at this, or tries to be. Lots of loopholes but it’s clear from this article anyways that they tried to make it a safe language. But they prioritized features like “call arbitrary C functions” over safety, and you can just write unsafe code without even a hint that you’re doomed. That’s perhaps the essence of what I see as the difference: Rust forces you to declare your unsafe intentions to do unsafe things.
Mozilla did an analysis of security bugs in Firefox and many of them were memory safety issues. That’s part why they sponsored it, at least.
How so? Take std::unique_ptr for example, which has existed since C++11. It facilitates using the language in a safer manner, and at a higher level of abstraction (you no longer have to manually pair malloc with free or new with delete - you just have the concept of scoped ownership), while at the same time not adding any "behind the scenes" magic (e.g. garbage collection) so as not to leave room for any high-level language to be more performant than it is - that's the real motto of C++, to my understanding.
Stuff the compiler does statically is still magic. Anything which makes it more difficult to see the assembly language through the code is magic.
Did I break a taboo? Am I not supposed to notice that the + operator in C++ programs can represent vastly different amounts of CPU work and memory usage in different contexts? Or that RAII does substantial amounts of work which is hidden from the source code?
Seems like magic is magic regardless of whether it happens at compile time or runtime.
Almost any actual potential leaps in improvement get watered down and neutered way before they have a chance to get near the language.
bond yields in finance
Well the community has more than 1 MLoC of code and is quite reluctant to break compatibility: look at how hard it has been to get Python to really migrate from v2 to v3.
If anything c++ is more willing to make breaking changes (e.g. abandoning the terrible auto_ptr) because of the better tooling (willingness of compiler vendors to put in multi-standard support and appropriate warning flags).
Slept through Python 2 vs 3.
As someone with 30 years of C experience, maintaining a 60,000 LOC code base that compiles as both C and C++, I don't agree, not even slightly.
Each one of the things diagnosed in C++ but not in C is a good idea.
> bizarre octal notation which makes 0100
That would be a serious incompatibility. POSIX code like chmod(file, 0644) stops working. Under no circumstances would it be acceptable to just treat 0644 like 644; it would have to be diagnosed.
Octal notation is needed often enough here that, if it were removed, programmers would resort to macros like OCT(6,4,4).
What's strange in C is octal character constants, which must be exactly one, two or three octal digits. So \0000 is \000 (same as \0) followed by the literal character 0.
There is no leading zero; there are only octal constants: \123 is octal, not a hundred twenty-three.
0100 == 64
077 == 63
078 == 78
The C compilers at least don't accept 078.
This is some IOCCC level stuff, not distinguishable in all fonts.
I'd rather borrow a couple of ideas from Verilog, including the cosmetic underscore and the ability to represent binary, e.g. 8'b0101_1100
Also, seriously, O and 0 are commonly mistaken in passwords.
If you standardized this, the leading zero would become somewhat unnecessary. But of course it's probably too late for C. You would have to deprecate the old notation for decades before it could be removed.
It's worth noting that such warnings would undoubtedly point out a few errors in existing code where people zero-padded a number they expected to be decimal. Many would be harmless, 00001 for example, but a few could have been causing unintended side effects (incorrect initialization values, for example).
Well which is it?
Java and (Turbo/Free) Pascal try to have few ways of dividing the materials: Java has class files (and Jar files, and modules), Turbo Pascal has Units. Both involve having a certain namespace correspond to some file the compiler can find and compile, and where you then use the definition from the compiled file.
Lisp and C/C++ have a tradition that goes back far enough that compilation units and namespaces (packages in Common Lisp) are disjoint things. C++ is on the path to making this more complicated with the addition of modules, which create yet another partitioning of the compiled artifacts without forcing the others (headers, namespaces, source files) to agree with it.
So, headers are not the problem - it's that for a given entity in the source code (namespace, class, function, whatever) it's not automatically clear to the compiler where to look for it. Which effectively leads to the funky ball of dependencies between source files being variously manually specified in (auto-, C-)make files as well as extracted automatically and still being error-prone.
In Python, all code paths are "statically discoverable", meaning that if a symbol exists in a file, I can find where that symbol comes from in that file. And if it's an import, the import tells me where to look for it in another library or relative path.
In c++ when I was trying to learn it was a lot of, "okay so why is 'foo' available here?" "Because it's imported by the 'bar' library that you're importing at the top of the header."
That's not exactly true: modules can monkey-patch themselves and other modules; they can override the import mechanism to do all sorts of things. It's usually considered bad form, but... it's possible, and some people like it that way.
Practically speaking, when it comes to learning by reading code, it's invaluable.
Macros are side effects and there's really no sane way to constrain them (currently), but C++ modules are an attempt to do so.
In several big C++ source bases that I've worked with, we reversed the process - at compile time, concatenate all the source files together, include everything you need just once. It's massively faster than precompiled headers, at the cost of sometimes running out of compiler heap, and inability to parallelize builds. The loss of parallelism was offset by the much faster compilation, though.
You need use-site "caching" to be able to reuse template instantiations beyond the parsing step. I think linkers do a little bit of this, but that's after compile time.
Even just caching the parse will probably be a little win in the case of complex C++ headers (esp. system headers with lots of platform defines, etc), but it's unlikely to be a panacea for compile times.
I think the interesting part is what kind of tech can be built on top of the "headers are isolated" parts of the C++ modules proposal. The compiler people are nothing if not inventive!
Anyway, yeah, modules should sidestep this challenge.
Response from a CMake dev:
> Well, by the time CMake could discover -MM flags, the build has already been written and CMake (the program) is out of the picture. Linking to a CMake target is also not just "add this library to your link line" either, so a simple response file written somewhere during the build for the linker to use is not sufficient (nevermind that this file may be updated by any TU compilation rule in a library target, something build tools tend not to like too much). I guess combination configure/generate build tools can do this, but CMake is a build generator and does not execute the build at all.
So you don't ever get "undefined symbol"... you instead get "cannot include header"! Not sure if that's an improvement.
Yes, it is a big improvement. The error message tells you exactly what you failed to do. If a target does not export all of its headers correctly, then fixing that fixes it for everyone consuming that target, which scales well to large code-bases.
It is not a coincidence that so many companies (Google, Facebook, Amazon, Twitter, Thought Machine) have converged on this design.
Visual Studio's solution: #pragma comment(lib, "foo"). That's that problem solved. This is not the big problem.
The big problems are a mix of 3 entangled problems:
- textual substitution by macros with cross-file scope: the behaviour of an included file depends entirely on previous files.
- object memory layout and API are unnecessarily bound together, so you can't change so much as a single function argument without completely recompiling all dependent code.
- a number of things (mostly templates but some strings) are duplicated hugely because they're defined in the headers, and then de-duplicated at link time. So compilation ends up disk-bound while the compiler writes out lots of data that will be thrown away.
So, you can change one byte in a header high up the dependency chain and recompile most of your code. Slowly.
Not even close to that easy; platform differences make it a pain to find libraries consistently. (Windows tends to make it most difficult, with Apple close behind for a few libraries.)
These can readily be put in different headers. People often don't, but...
This is arguing that there is a solution to manage a problem, and that because there are (caveated, external, third party) ways to manage it, there isn't a problem. Yes, there are ways to mitigate some of the problems of headers, but all the bookkeeping is part of the criticism.
And there are still duplicate declarations in separate places to keep in sync, you end up having to move all your code into headers if you want to make things generic (with compile-time repercussions), and it's easy to end up with a header whose version doesn't match the library built for the same file.
Linking to the libX library for X.h is the least problematic of the header problems/criticisms.
Are you thinking of C#?
Haven't demonstrated that headers are not bad. Just that there's an involved, error-prone, half-manual workaround to "automate" their discovery and use.
The headline by OP seems like clickbait.