C++ Pattern Matching Proposal [pdf] (open-std.org)
192 points by je42 on Jan 3, 2020 | 212 comments

Reading over the examples, there seems to be a lot of good stuff here, though I wondered whether it would be possible to extend switch with some of this capability instead of adding a new keyword. It looks like this possibility was considered and is addressed in §7.2. Basically, because case labels and break statements can appear inside control structures, switch has all kinds of chaotic corner cases (Duff's Device and friends) that are incompatible with the proposed semantics for inspect, and there's no obvious way to signal from the source which behavior you want.

Seems like `inspect` would turn `switch` into a deprecated practice, which is preferable imho. This way, it's easier to write clang-tidy rules for conversion and to raise red flags in PRs.

Yeah, it makes sense. It's just a bummer when switch feels like a much better-named keyword for the functionality, in the long term.

You almost wonder if they could do something like how `enum` got fixed up with proper scoping as `enum class`. Like, could `switch` retain the legacy functionality, with the new stuff triggered when you write `switch from`, `switch if`, or similar? Or there could be a new keyword added after the parenthesis, something like:

    switch(<thing>) cases {
       <opt1> : ...;
       <opt2> : ...;
       __ : ...;
    }
This has a clear connection back to the legacy command in terms of a switch always having either case labels or a cases structure. And zero chance of breaking existing code, given that the `cases` keyword would only be valid in that position and could still be used as an identifier elsewhere (though obviously that would be discouraged).

Or even just have a rule that if a switch uses patterns, it can't have any of the weird legacy misfeatures.

I think it's just hard because they really are completely different in behaviour. A conventional switch is just selecting an integer value: under the hood it can be a jump table, a binary search, or something else entirely. This, on the other hand, is always going to be an if-cascade.

That's an implementation question, which seems orthogonal to questions of syntax and semantics.

As for efficiency, we have decades of experience compiling pattern matching, so I would be surprised if it were always just an if-else ladder. The authors of the proposal to add it to Java think that, compared to explicit if-elses, "it is more optimizable too; in this case we are more likely to be able to do the dispatch in O(1) time".


I imagine this is through approaches which convert the cases to integers, and then do a traditional jump table etc.

"inspect" is weirdly apropos, because it hearkens back to Simula-67 (which used it for what we'd call type switch today - so, a subset of pattern matching). The original C++ syntax borrowed a lot from Simula.

switch is an extremely useful control structure, though needed rarely. It's basically a kind of computed goto, something not found very often outside C/C++ anymore.

If there's a real inspect, people will be able to stop treating switch as a deficient approximation to one, and maybe appreciate it for what it is.

Yeah, it can compile to a jump table, which makes dispatch basically an O(1) operation in assembly rather than a worst case of O(n) comparisons, one per case.

However, this means switch _only_ works with integers (and enums), which is rather limited and confuses newcomers. For instance, this does not compile:

   void f(std::string const &str) {
     switch (str) {  // error: std::string is not an integral or enum type
       case "option_1":
         std::cout << "First option\n";
         break;
       case "option_2":
         std::cout << "Second option\n";
         break;
       default:
         std::cout << "None of the above\n";
     }
   }

The usual workaround is constexpr hashing:

   void f(std::string const &str) {
     switch (hash(str)) {  // hash must be a constexpr function returning an integer
       case hash("option_1"):
         std::cout << "First option\n";
         break;
       case hash("option_2"):
         std::cout << "Second option\n";
         break;
       default:
         std::cout << "None of the above\n";
     }
   }
The hash could be as simple as converting the final 8 bytes of the string to a uint64_t and using that. In the example above that happens to be the entire string, but there are plenty of other choices, such as minimal perfect hashing. None of this is part of the language, though.

Can you guarantee at compile time that any hash you switch on is unique? Otherwise this is really dangerous unless you have a perfect hash for the values you switch on.

Switch is used pretty often when UI needs to look different based on certain application states (consider any 3+ state button). I was curious if I used it recently, and, in fact, here it is in some code I wrote a few months ago[1]. It's also used all the time (also happens to be fairly idiomatic) when dealing with Redux, MobX, etc.

[1] https://github.com/dvx/lofi/blob/master/src/renderer/compone...

But that, like most uses of switch, is essentially uninteresting as a control-flow construct. It's just a stylistic point to use a switch here instead of an if or a match. JS's switch has no extra power other than fallthrough anyway.

Here's a poor-man's coroutine (with serializable state!) where switch plays a more interesting role.

    void move_robot(state* s) {
      switch (s->resume_at) {
      case 0:
    #define SUSPEND(resume_pt) s->resume_at = resume_pt; return; case resume_pt:;
    #define PAUSE(resume_pt) s->pause_time = 20; while (s->pause_time) { SUSPEND(resume_pt) }

      for (;;) {
        /* three knight moves, then forward */
        for (s->i = 0; s->i != 3; s->i++) {
          /* ... */
        }
      }
      }
    }

Switches are quite useful when you have a finite set of options you want to match against.

C# has been adding pattern matching capabilities to the switch statement, and I think it's created a mess, both syntax-wise and in how bug-prone it is. Switch went from choosing a distinct case to matching multiple cases. I'm sure Mads knows more than me though, so it's probably a safe change.

Just skimming section 4, this really just looks like even more syntax being added to the language without enough benefit to justify the additional complexity. At this point, I think I've made the decision to just stick with C++11 indefinitely and backport the rare useful feature. I don't want to have to memorize all of this extra stuff.

What we need is a compiler flag that says, "give me the good parts of C++11/14, but leave out the bad stuff from C++98, and fix some of the inconsistency quirks from trying to be compatible with C89".

I think the Core Guidelines [0] are aiming at that. Some of it is already supported by tools like clang-tidy, but as far as I understand, the ultimate goal is to have some sort of flag (e.g. --sane) that would only accept a 'reasonable' subset of C++ and reject legacy stuff.

However, this document is a bit raw, with lots of TODOs, so I'm not sure how close we are to such a flag.

[0] https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines

I think the only good way to do this is off of the back of C++ Modules. Having it as a compiler flag means that all files in a project, including libraries, must be compliant.

With C++ modules putting an end to the naive #include, translation units could specify the C++ specification they are compliant with, and hence gracefully remove or add new features on a translation-unit level.

This is the idea for the 'epochs' proposal https://vittorioromeo.info/index/blog/fixing_cpp_with_epochs...

This works so long as entire build is your own code. Once you get to libraries written by someone else who happens to have a different “wish list”, I would guess this flags idea wouldn’t work well.

It's just another standard option. Do libraries that are compiled with -std=c++14 work with application code compiled with -std=c++11? What about the other direction? (I don't actually know, because I build everything from source.) You would basically just be adding a new `-std=good` option.

I have seen with clang++ that if I mix and match .o files built with -std=c++11 and -std=c++17 [something I have occasionally done by accident] and they reference the same classes, sometimes they fail to link, not due to standard library types differing but ones I defined myself that can compile with either standard.

The problem with C++ is that many libraries are mostly header files, and they are combined with your code basically by cut and paste, so you can't really separate libraries from your own code usefully.

Could you please explain more about the process you're using? Are you compiling external dependencies as an integral part of building your own code?

This is the only sane way of using C++. All C++ dependencies in the same repo, because there is no standard C++ ABI.

I am compiling my C++ codebase with gcc, clang, emscripten to wasm, visual studio. Impossible if all dependencies are not present in a "third-party" folder.

Google's Abseil library is intended to be used this way.

Otherwise, use the C ABI between dependencies.

> standard C++ ABI.

All major OS have a standard, very stable C++ ABI. Some of them also guarantee a stable ABI for the library components.

Of course you cannot use a library built for, say, Windows on Linux but that's true for all languages.

Assuming everything uses the same RTTI and exception settings and the same C++ standard library (libstdc++ vs libc++).

And then there are other flags: for instance, when gcc/libstdc++ broke std::string to switch from a refcounted string to a non-refcounted one with small string optimisation, in order to be C++11 compatible...

Sure, there are system defaults, but things vary and often recompiling everything and statically linking avoids those issues.

For windows that is a very recent change. The ABI used to change with every compiler rev.

Furthermore, without buy-in from the standard itself, there's no guarantee a breaking change won't be forced by the standard at some point.

> The ABI used to change with every compiler rev.

That's only because they changed the standard library. The ABI itself didn't change much if at all, I believe you can still get the latest compiler to link with the far more compatible MSVCRT.DLL that's been there since the first Win32 OS (Win95).

I guess it matters whether you consider name mangling to be a part of the ABI. Name mangling used to change quite frequently.

Starting with VS 2015 Microsoft promised to make these changes much less frequent but AFAIK didn't promise to never make changes here. MSVCRT.DLL has kept stability but Microsoft makes no official promises about it. It's considered undocumented.

Source: https://docs.microsoft.com/en-us/cpp/build/reference/decorat...

I wonder which marketing genius decided to rename "name mangling" to "name decoration", and who they thought that lie would fool.

  Undecorated name                   Decorated name
  int a(char){int i=3;return i;};     ?a@@YAHD@Z
  void __stdcall b::c(float){};       ?c@b@@AAGXM@Z
OK, Microsoft.

> MSVCRT.DLL has kept stability but Microsoft makes no official promises about it.

That's because it doesn't need to be said. Countless existing applications would break if it changed, and MS still cares enough about backward compatibility, and knows that many huge and important customers depend on it, not to do that.

This is really moot, since MSVCRT is a C runtime lib, with MSVCPRT being the C++ one. Again, no stability guarantees are made for any of this.

Microsoft promises backward compatibility but not forward compatibility. There is no promise you can link new code with old DLLs. Just that older binaries will still execute unmodified. Very recently with VS2015 they promised "some" forward compatibility and to make fewer changes.

In practice there were things you could do to increase the chances of this working. But to call this "stable" would render the term meaningless.

Has MSVC really changed mangling in the last, say, 20 years? I know that the compiler has had a conformance issue forever, because it mangles struct and class names differently, and they do not want to change that because it would break the ABI.

Yes, although I don't have an exhaustive list. I know it both from personal experience and from the link I gave above:

> The decorated naming conventions have changed in various versions of Visual Studio

Not for libraries exposed as DLL or COM.

Although one might argue that is only a C++ subset.

COM is based off of the C ABI not C++. This was done specifically because C++ does not have a stable ABI.

There is no such thing as a C ABI, only the Windows ABI, in which you can even select among several calling conventions, including a Pascal-based one.

COM is based on the way Visual C++ lays out the VMT, and only masochists use it from bare bones C.

C ABI is basically structs + functions + function pointers. On Win32, functions and function pointers includes calling conventions, but that's not really important.

COM is a C ABI in a sense that its spec defines everything in these terms. Yes, in practice it translates nicely to C++ vtables, but the stability guarantees are defined on the lower level.

From that point of view, it is also a Pascal ABI.

Sure, and several dozen other languages. That's really what it is - the lowest common denominator that almost every language can work with.

As the OS ABI should be.

This is a kind of chicken-and-egg thing: it is the OS ABI because everybody supports it, and everybody supports it because it's the OS ABI. I think the fact that it has the features it does is mostly a coincidence of timing, and of which languages were popular at the time on platforms that were popular enough. If it had happened 15 years later, the feature set might have been larger.

Anything in your headers is (more or less equivalent to being) copy-pasted into your own code. Many dependencies are header-only (which is the only option I'm aware of if the code is templated).

They’re using some header-only libraries, presumably. Which, after all, is one of the only unique benefits of C++ that would lead you to choose it for a green-field project over, say, Rust. Header-only template libs + link-time WPO = inlining of template-instantiated monomorphized methods to their type-strict callers. This is the feature that leads e.g. LLVM to be written in C++.

Rust doesn’t have headers, but can still do this.

Even given the inevitability of disagreements about which are the "good parts", I'm pretty sure this would be my "standard version" of choice if it was available.

The "C++ Core" guidelines get pretty close to standardizing the 'good parts' of C++.

I'm aware of the guidelines and apply them, but that's only a small fraction of the problem. What I need is a compiler flag that enforces the guidelines so that I can write it into my software management docs in place of "-std=c++11".

From my observations of working as a developer for a long time, it seems the majority of developers can be classified in two groups. One group will really advocate for adding lots of features to a language. They are also the ones who don't mind or even seek out constant change for the sake of change. They like the new and shiny, and don't care that much about anything else. They want to rewrite code even if it is perfectly working and the problem it solves does not change, just to make it more "modern". They don't think anything of burdening everyone else with the additional complexity added by their demands, as long as it makes their specific use-case easier. Their attitude can be summed up as "it's not my problem".

On the other hand, the other group values stability and not feature proliferation. They don't want to have a constantly changing foundation, but something stable they can build on. They want a small language which they can more easily master and then use to solve problems with. They think code should not ever need to change again as long as the problem it solves is the same. Their attitude can be summed up as "do what you can with what you have."

The C++ community, and those of a lot of newer programming languages (especially web stuff), seem to be composed mainly of people from the former group, which is much larger than the latter, but I think we really need more of the latter group in a lot of the tech industry. My go-to language is still C89 and I don't need anything more, because it gets the job done and it's simple enough to be very portable.

I like to consider programming as an art form with many schools of thought. As with art in other mediums, developers enjoy creating and interacting with their art in different ways.

Conflict builds when developers exist in the same ecosystem with different opinions on what this art means to them. It feels like everyone is accomplishing different goals because we really are.

Some people want to continually iterate on their code until it matches a perfect vision in their head; it's less (if at all) about the functionality and more about how we got there. That obviously frustrates people who care about the functionality far more.

The issue with this way of thinking is that programming lives in a small universe of grammars and not in the real world where the artist can choose any medium to convey his work. So small is this universe that you don't even need art to design the most optimal program. You don't even need science either... you can use logic to calculate the best program design.

Thinking of programming as an art is entirely the wrong direction. Pattern matching, especially exhaustive polymorphic type checking will categorically reduce the amount of logic errors possible with the program. This is logic over art at work.

The main issue for me with c++ is that its overall design has largely been artistic over logical. So tacking on pattern matching makes it an even bigger mess than it already is...

I think you're striking a false dichotomy between "logic" and "art" here. Programming languages are abstractions created by abstract creatures (/hello/). You simply can't solve a programming language, because the parameters for success will be just as invented and unmoored from absolutes as the subject in question.

There is no false dichotomy. There very well is a clear dichotomy.

Another person had a similar notion to you and my response to him is more in depth: https://news.ycombinator.com/item?id=21952737

In this case the language we designed is assembly. The parameter for success is: less possibility for bugs. Exhaustive type matching is a definitive and logical improvement because it shrinks the codomain of possible erroneous states, and thus yields fewer bugs.

There are certainly more mediums immediately available in a real-world artist's arsenal, but the depth of programming goes beyond the grammar: We do have choices in language (Rust vs. C++), program architecture (APIs exposed, code organization, how it works with the rest of a system), and balancing what "most optimal" even means. With exceptions and constraints of course, I don't think any of these have a universally logical answer.

My favorite example is the Requests library for Python. It dubs itself "HTTP for Humans" -- I find that quite nice from a human perspective, but it may not be the most logical networking choice in all cases. It's an artistic choice by the developer to make it more human-friendly.

At any rate I think the original discussion is, unfortunately, not the greatest example to make this comment on. Comparing pattern matching to switch-cases is like comparing a high-quality brush to a low-quality one.

APIs exposed is a user-defined requirement. Most optimal is also user-defined.

Whenever something is "user defined" that means you make it up because you're actually exiting the world of the simple program. These things are akin to business requirements or UX/UI.

A popular word that's often used to figure out these things is the word "design" as opposed to "calculate." When we "calculate" something we are performing a mechanical operation designed to solve a problem with the most "optimal" solution. When we "design" something we are pulling a solution out of our imagination with no way to verify that it is the most "optimal" solution. Note that the word "optimal" is something we make up and define ourselves.

For example: What is the optimal way to get from point A to point B? One definition of "optimal" in mathematics (a small universe similar to programming) is the shortest distance. For this we have a calculation: a line. Another definition of "optimal" in the reality we occupy as humans (a large universe much bigger than math or the simulation in our computer systems) is the shortest time it takes for a human to move from A to B. The solution to this is "designed": we have several options to choose from, but to fulfill the requirement of shortest time, we currently tend to choose a plane, as it is the fastest vehicle available. However, we have no way of knowing whether the plane we took is the best possible solution humanity has ever come up with. The number of possibilities here is so large that we can't calculate a solution.

Programming within the bounds of "design" requirements and user-specified definitions of "optimal" lives in a small universe similar to a mathematical universe of axioms and theorems, meaning that we can very well calculate the most optimal programs rather than design a suboptimal one. There is much research on this topic; I'd look into Prolog, category theory, dependent types, formal methods and that kind of thing to learn more. It's very deep and is basically a whole different topic.

So given a portion of the definition of "optimal" that most people can agree on: "less bugs at zero performance cost," pattern matching is a calculated improvement over if statements and switch cases. Thanks to exhaustive matching you must handle every possible instantiation of a type or the program cannot compile. This is a definitive and logical improvement over all other case handling methodologies.

There is no need for an artistic analogy to illustrate a point, the improvement is definitive and logical.

Whenever a human turns to "art" to solve a problem it literally means they don't have the knowledge on how to find the most "optimal" solution. Also very likely they don't even have a clear definition of what they want as "optimal."

This is fascinating and indeed quite deep. You've given me a lot of great perspective, thank you, there is clearly much I can learn here.

> They don't want to have a constantly changing foundation, but something stable they can build on

For how many generations?

That's an extremely negative characterization of people who want more features in a language. It's not "change for the sake of change". It's change for the sake of making everyone's programs better.

C++ is a good demonstration of both the costs and benefits of adding new features. The costs are obvious: the language and standard library are absolutely full of duplicate features, where the old thing is deprecated in favor of the new thing, but the old thing still has to be supported for compatibility's sake.

But so are the benefits. C++ hasn't always moved so quickly, after all. For 13 years, from its initial standardization in 1998 to its first major revision in 2011, the language was effectively frozen. It was never a small language, but it was stable, not a constantly changing foundation.

It was also absolutely horrid.

This is how you iterate over a std::vector in C++98:

    for (std::vector<int>::iterator it = vec.begin();
         it != vec.end();
         ++it) {
        int number = *it;
        // use `number`
    }
One of the features added in C++11 is the range-based for loop, which simplifies that to just:

    for (int number : vec) {
        // use `number`
    }
The old style was perfectly working, and still works. The problem it solves has not changed. It's just that it is, and has always been, a terrible solution! The noisiness hurts not only ease of use (obviously), but also robustness and understandability: it tends to obscure the actual logic the programmer intended, making code harder to read.

Adding the range-based for loop did burden everyone with additional complexity – but it was clearly worth it. The new style is not just "more 'modern'", it's straight-out better. Sure, code that uses the old style does not need to change. Often it's best to leave it alone. But if so, that doesn't mean the existing code is perfect, just that the benefits of refactoring it don't outweigh the costs (such as the risk of introducing bugs). When writing new code, on the other hand, there's no reason not to use the new style.

Not all features are such a slam dunk. Usually the benefits are not so clear, and often the amount of complexity added is higher. The result is a difficult tradeoff, a competition between different people with different interests, just as you say. People are justified in thinking the benefits are not worth the cost.

But there are benefits, almost always. This pattern matching proposal, for example, adds a ton of syntax and duplicates a feature (the switch statement), and overall it seems more complex than it needs to be. I'm not a fan: I'm not sure whether C++ needs pattern matching at all, and if it does get it then I'd prefer a different design. But if this design were accepted, I'd still enjoy using it in my code! After all, pattern matching itself has decades of history in functional programming languages, and has proven to be an elegant and expressive way to write certain kinds of algorithms and use certain kinds of data structures – things that in today's C++ are rather awkward.

> it tends to obscure the actual logic the programmer intended, making code harder to read

    for(int i = 0; i < vec.size(); i++) {
        // do something with vec[i]
    }
That's even simpler and more in line with what vectors are usually used for, and when it comes time to debug, you don't have to deal with the template hell that is the standard library iterator framework. The range-based for may look simpler at first glance, but it quickly turns into a nightmare when you're trying to do some deep debugging (and if you haven't had to debug by reading the asm, I think you haven't done enough C++...) because it obscures a significant amount of complexity which really shouldn't be necessary anyway.

Your simpler code is at least theoretically less efficient: you get an int to size_t conversion (an unsigned 64-bit type on most 64-bit platforms) twice, once when comparing with size() and once when indexing the vector, and you compare against size() on every iteration. This looks like it is irrelevant in release mode, but it may end up affecting your debug performance, which is important in many domains.

You would have to write

    for(std::size_t i = 0, N = vec.size(); i < N; i++) {
        // do something with vec[i]
    }
instead, if you still insist on raw loops.

But if you look at https://gcc.godbolt.org/z/8JwJbw, you will notice that the range-based loop is actually the one which will produce the smaller code which may translate into more efficiency, and is also more readable.

Change the vector to std::list and you lose the [i] operator. Now how would you rather iterate that? 5 lines of begin() end() *it++, or a 2-line foreach?

We haven't even brought const- and reference-correctness into the picture here. That makes the number of potential errors in the it-example even worse. There are also iterators where the size is not known up front, streaming input for example.

The point is, some features in a language can reduce complexity and improve readability (like foreach), while others can increase it (move constructors and move assignment operators?). This is only looking at it from a developer's point of view though; compiler authors might disagree.

The consequences of using an inspect statement are local to the statement. So the syntax is a thing to learn, but probably not a harbinger of language complexity. It's a leaf node.

Interesting. To me there’s very little totally new syntax, and the benefits are considerable. It’s basically just being able to match on the existing structured binding patterns, as well as on type tags when the expression is polymorphic.

> the benefits are considerable

Exactly. Admittedly with enough modern C++ experience and having used pattern matching I find the syntax pretty readable and clear. But even if it weren't I'd probably still be +1 pattern matching.

If they added proper metaprogramming, people could make their own pattern matching and other compile-time goodies, and they wouldn't have to keep shoving cruft down our throats every 3 years.

It seems all languages are just copying each other at this point. I know C++ hates to be left out of anything, so maybe this is appropriate, but to me it's really destroying the ecosystem.

It was great to have different languages having different paradigms but now you can do everything in everything and code bases I'm working on are a confusing mess.

But in this case, pattern matching seems way easier to follow and evaluate mentally than the verbose visitor approach already in the language, or something based on switches/if statements. That seems to be a good way to reduce the amount of confusing mess, at least for new developments.

This logic doesn't make sense. If something increases productivity and code quality, why wouldn't languages copy it? Ideally languages should also remove unnecessary features, and failing to do so is actually what makes C++ a confusing mess: its prioritization of backwards compatibility over keeping complexity low.

> languages should also remove unnecessary features

Languages are effectively immutable. Unless you can reach out and edit every single piece of code that breaks when you remove a feature, you can't remove a feature from a language. You can only create a new, almost identical language, and then spend a decade migrating people. This is what happened to Python2/3.

Very few things have been successfully deprecated in C or C++. Even massive security holes in the standard library.

> You can only create a new, almost identical language,

That's demonstrably false. Python was a loud transition, but most of the language warts and syntax are still there. Same with the PHP and Perl transitions, and the wacky ES6 or Java ones. The list goes on.

> most of the language warts and syntax is still there.

> > new, almost identical language,

Emphasis added.

>> new, almost identical language,

Either it means it's a different language or it's a new language. The ambiguity in interpretation leads to a statement of non-meaningful change (tautology) or a statement of important change. The gracious interpretation is the only thing worth responding to, not quibbling over the possible non-statement.

> This is what happened to Python2/3.

Nitpick (and also proving your point): there is no language Python2; you mean Python vs Python3.

> it's prioritization of backwards compatibility over keeping complexity low.

There are other languages on less-shaky foundations, but people don't use them for various legitimate and illegitimate reasons.

It's like human languages. They adopt useful words from each other all the time too. Why shouldn't programming languages do the same?

Restated, nobody under the sun can write "The last, final language anyone could ever need."

Think of programming languages as a seminar where designers are crafting that Ur-language between themselves, and folks like me in the peanut gallery look on.

C++ was always the language for constructing paradigms you choose with underlying mechanism you want.

Just like PL/I, Object Pascal, Ada, Lisp,...

C++ wasn't the first multi-paradigm language, and it won't be the last.

It’s probably the most popular one, though.

I said multi-paradigm plus the ability to comfortably do the implementation to lowest level.

AFAIK Lisp is not that low-level; Object Pascal is okay, but its generics/codegen arrived 20 years too late, for example; and PL/I was by design a pile of every feature known at the time, but it's hard to tell now.

What’s the downside of copying the good stuff? I think the examples aren’t confusing at all and are perfectly readable by anybody.

Honestly I’ll be deeply ashamed as a Python programmer writing cascading ifs when even C++ gets pattern matching. (Yeah, I know all the arguments against it.)

What is destroyed exactly by adding a feature to a language that could easily improve readability (which said language reaaaally needs), which you can also opt not to use if you don't like?

You can't opt not to use it if you have to work with someone else's code which does use it.

True, I should have been more precise.

What I meant is the language does not force this on you as The Right Way To Do Things™ and thus it mostly boils down to organizing with peers and agreeing on a consistent set of guidelines (which goes well beyond language features...)

Yes, it affects the ecosystem.

There's a very good answer to that question: simplicity, minimalism, and stability.

C++ is a huge, complex language. It's probably its greatest downside. Making it even more complex, should only be done with very good reason.

If you doubt this, consider the continued popularity of C, a thoroughly anaemic language by today's standards, with no clear advantages over C++ except for its simplicity, minimalism, and that the language changes very slowly. Well, that and its existing adoption levels. It's not quite the case that every feature C has is also in C++, but it's very close, and I can only name one exception: variable-length arrays (an unpopular addition to C) are not officially supported in C++.

C++'s complexity means that:

* It's very difficult to learn. It's extremely difficult to learn well. It's just about impossible to learn in its entirety. This isn't an exaggeration. (Andrei Alexandrescu might be the closest we have to someone who knows all of C++. He's a world famous C++ expert. Mortals don't stand a chance.)

* Different C++ programmers know (and write in) different subsets of the language. Good C programmers know essentially all of C. (I'll admit I'm weak on C's bitfields, and I couldn't tell you every subtlety of its memory model, but when it comes to C++, there may be areas of the language I've never even heard of.)

* C++ style guides (such as Google's one, or LLVM's one) are long and complex documents, by necessity

* We will never have a fully complete C++ compiler that truly matches the language spec. This undermines the spec; the language you're really using depends on your C++ compiler. To put that another way: portability is harmed because different C++ compilers cannot be relied upon to support the same language features.

* C++ compilers are more prone to arcane bugs than C compilers

* Different C++ compilers have different arcane bugs, harming portability

* It's far easier to develop tools for C than for C++ (compilers, IDEs, etc)

* There are more C compilers out there than C++ compilers, especially for targets like PIC, or for very obscure platforms, or for particular needs such as safety-critical work. There's even a formally verified C compiler ('CompCert') with near-complete support for C99's features. I doubt there will ever be a formally verified C++ compiler.

* C is easier to mechanically reason about; there are more static-analysis tools for C than for C++

* If C++ were simpler it might have given rise to stable ABIs, the way C has. Instead, even different versions of the same C++ compiler might not be interoperable.

* It's less predictable regarding performance. Template metaprogramming can bloat your binaries for seemingly no reason. In C however, all features of the language map naturally to assembly; the programmer can generally predict roughly what assembly will be generated. This matters to those working with operating systems, graphics, high-performance programming, or where side-channel attacks are a security concern.

C++ is also faster-moving than C, meaning:

* It takes more work to maintain your skills for reading other people's code

* Code can age. Old code looks different from new code, unless it's actively maintained, which means work and risks new bugs. If the language rarely changes, this problem goes away.

* If you're developing a compiler, you'd rather have a stable language like C, so that you don't have to make a career out of keeping up with the latest additions to the language. Keeping up with the additions to C++ is more than any one compiler-engineer could hope to do.

There are very few 'conservative' languages like C (there are also Scheme and Forth), but it can be a language's greatest strength. Zig is hoping to be another such language, but we'll have to see if it succeeds.

With all of that said, I tend to favour C++ over C, and I really like pattern-matching. It's something that cannot really be 'faked' with templates or macros. (Another personal favourite feature of mine, named arguments, is similar in that regard.) I think it's rather silly that the major OOP languages have until recently completely ignored this brilliant language feature from the functional programming world, as if it adds nothing over switch/case. Even D, an extremely feature-rich language, still lacks pattern matching.
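On the point that pattern matching can't really be faked: the closest library approximation in standard C++17 is probably `std::variant` plus `std::visit` with the common "overloaded lambdas" idiom. A minimal sketch follows; the `Circle`/`Rect` types and the `area`/`describe` functions are hypothetical, made up for illustration. It dispatches on the active alternative, but nested destructuring and guards still have to be written by hand inside each lambda.

```cpp
#include <string>
#include <variant>

// The well-known "overloaded" helper: inherits operator() from every lambda.
template <class... Fs>
struct overloaded : Fs... {
  using Fs::operator()...;
};
// Deduction guide (needed in C++17, implicit in C++20).
template <class... Fs>
overloaded(Fs...) -> overloaded<Fs...>;

// Hypothetical shape types, for illustration only.
struct Circle { double r; };
struct Rect { double w, h; };
using Shape = std::variant<Circle, Rect>;

// Dispatch on the active alternative. If a lambda for some alternative is
// missing, std::visit fails to compile, a crude form of exhaustiveness checking.
double area(const Shape& s) {
  return std::visit(overloaded{
      [](const Circle& c) { return 3.141592653589793 * c.r * c.r; },
      [](const Rect& r) { return r.w * r.h; },
  }, s);
}

// A "guard" such as "a Rect whose width equals its height" has no dedicated
// syntax here; it becomes an ordinary conditional inside the lambda.
std::string describe(const Shape& s) {
  return std::visit(overloaded{
      [](const Circle&) { return std::string{"circle"}; },
      [](const Rect& r) {
        return r.w == r.h ? std::string{"square"} : std::string{"rectangle"};
      },
  }, s);
}
```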

> There are very few 'conservative' languages like C

Go is a simple language, no?

It's higher-level than C, but it has no historical baggage, so it probably comes out to about the same complexity to fully understand.

Interesting suggestion, unfortunately I don't know Go so I can't really respond.

You absolutely can add pattern matching to a language with powerful macros. One example that comes to mind is the optima library for Common Lisp.

I was referring to C's macros.

Oops, I meant C++'s macros. StackOverflow tells me there are subtle differences.

> all languages are just copying each other at this point. I know C++ hates to be left out of anything so maybe this is appropriate, but to me its really destroying the ecosystem. It was great to have different languages having different paradigms but now you can do everything in everything

I don't think you can do everything in everything. Try working with lazy immutable data structures in Rust, for one.

Or look at Dart - now has "non-nullable types," monadic error handling (sort of), but of course the whole thing is on shaky foundations so what's it worth?

> Try working with lazy immutable data structures in Rust.

I don’t think this is that hard; you “just” need to construct the data structures as elements of an object-graph data structure, and then retrieve your data through an explicit thunk of the object-graph itself.

In lazy languages the object-graph data structure is implicit, but that doesn’t mean that the Rust version of the call site code needs to be any more verbose. It’s just the definitions of the data structures themselves that would be more unwieldy. (And you could probably build some generic lazy container types and mostly work with those.)

Think: what Objective-C does with autorelease-pool objects.

Here’s what immutable data structures look like in Rust: https://docs.rs/im/14.1.0/im/

None are lazy, however.

> Try working with lazy immutable data structures in Rust, for one.

Why is that an issue?

Well, it doesn't have the runtime to support laziness properly.

Why would lazy data structures need a runtime? Some kind of GC is necessary for immutable data structures, but wouldn't iterators and generators be enough for implementing laziness?

> wouldn't iterators and generators be enough for implementing laziness?

If you can implement lazy finger trees with iterators I'd love to see it :)

Sure, it doesn't have much of a runtime at all, but Rust does give a firm foundation to build on. Using an `Option` or the `once_cell` library is pretty ergonomic. At worst you use a function call to get a reference to the value instead of using it directly.

Agreed, Rust can certainly handle lazy evaluation. All the iterator types for example are lazy, hence the .collect() method which eagerly evaluates a lazy collection.

Hiding the fact that a lazy value needs to be mutated to initialise it is a little more work (since a nice API lets the user treat the value as if it doesn’t need to be mutated), but is certainly possible.
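The same idea carries over to C++, the language of the proposal. As a hedged sketch (the `Lazy` class is hypothetical, not a standard type), a memoized thunk can hide its one-time mutation behind a `const` interface by keeping the cache `mutable`, which plays roughly the role that interior mutability plays in Rust's `once_cell`-style types.

```cpp
#include <functional>
#include <optional>
#include <utility>

// Hypothetical Lazy<T>: computes its value on first access and caches it.
// The cache is `mutable`, so callers can treat a const Lazy as an ordinary
// immutable value even though the first get() writes to it.
// Not thread-safe; a real version would need std::call_once or a mutex.
template <class T>
class Lazy {
 public:
  explicit Lazy(std::function<T()> thunk) : thunk_(std::move(thunk)) {}

  const T& get() const {
    if (!value_) {
      value_.emplace(thunk_());  // run the thunk once, then cache the result
    }
    return *value_;
  }

 private:
  std::function<T()> thunk_;
  mutable std::optional<T> value_;
};
```

Usage: `const Lazy<int> x([] { return 42; });` does no work until the first `x.get()`, and later calls return the cached value.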

> All the iterator types for example are lazy,

It's not proper laziness as defined in Okasaki's thesis/book. One needs more complex things like lazy finger trees.

The iterator is lazy.

The language is not.

We design our languages, which then design us. It's a variant of Churchill's quote about rebuilding the parliament chamber exactly as before.

Maybe think of language design as a search through an n-dimensional problem space.

There's the perennial trilemma of functional (LISP), imperative (APL), and object oriented (Simula), where your new language lands somewhere within that triangular design space.

Then add extra dimensions for type systems. Nominal vs structural vs whatever.

Then add some more for ideas swiped from declarative (SQL, VRML, LINQ), stack-based (Forth), REPLs (Logo), constraints (Prolog), and whatever else people cook up.

Ya, the churn is nutty making. But it's also awesome at the same time.

> each other

By "each other" you mean the ML world, right? :p

Functional programming seems to be taking on the world (J inspiring NumPy, ML features being used everywhere).

Functional languages are nowhere to be seen compared to the rest.

Some subset of features inspired in them are used in some languages, but that has nothing to do with functional programming taking over.

Honestly, even if we're far from FP being mainstream (which I don't even wish for), I wouldn't have bet a dollar that so many features from Lisp/ML would cross over this fast (I mean destructuring, lambdas, rest args, immutable let by default, option chaining...).

> option chaining

Where'd that come from? I thought that was monadic-do from Haskell.

Well, I include Haskell in the ML family (I admit it's stretching the term).

With Hooks, React has gone full FP, so, basically, the war is lost. Try finding a bootcamp programmer that wasn't taught React.

Sure, you can still build object components, but it's informally frowned upon.

How is React or hooks "full FP"?

I think we have wildy different definitions of FP.

Yes, haskellers will soon be celebrities.

They are trying to convince people to use C++ instead of Rust for new projects.

No one with old C++ code is going to switch to these future hipster standards, let alone even C++14.

I think this standard bloat is digging its own grave. It will have the opposite effect. People who want the new shiny stuff will use the new shiny thing (Rust). The old timers will stay on C++11 all the way to 2070 when a man or machine finally rewrites everything in $(new lang).

That is not my experience from real codebases at all. Generally, developers start using new features as soon as they become available in the standard (or minimum supported) compiler for the codebase.

I think it's great. No need to switch languages when you just need some feature here and there.

This is a codebase problem, not a language one.

If there's a good idea, why not use it?

Almost all languages are Turing complete. The only question is the syntax: the convenience of writing it, and how fast it runs on the machine (compile time, run time, or both).

When you have a syntax that you like for whatever reasons (including because you have a lot of legacy code written in it) it makes sense to extend the syntax where possible to make it more convenient if you can do so within the constraints of performance.

Love this proposal! C++ is getting better in leaps and bounds.

It's amazing what credible competition can do. If only C had the same kind of pressure.

Don't you even dare. C is near-perfect in the simplicity of its core. You want bells and whistles, you hook up a library.

Typesafe generics via a better void* would make me super happy. There are definitely other quality of life improvements that could be added or reworked that wouldn't affect the simplicity too much.

I'd say Rust is the improvements to C that I've always been wanting: better type safety, real generics, first-class closures, and OOP without inheritance, only using structs to structure data, leaving code execution to just functions. It's what I hoped Go would become. (Take with grain of salt, I'm just starting to learn Rust.)

I really don't like the monomorphization approach to generics (I think that's the concept?), where the function/struct essentially gets duplicated for each type. It seems to mess with linkage, increase binary sizes, and increase compile times.

Other than that, Rust does seem to be an improvement and less... stressful to program in.

You can always roll dynamic dispatch yourself.

Dynamic dispatch is in the language too.

    use core::fmt::Display;

    fn show_monomorphic<T: Display>(first: T, second: T) {
        println!("{} then {}", first, second);
    }

    fn show_polymorphic(first: &dyn Display, second: &dyn Display) {
        println!("{} then {}", first, second);
    }

    pub fn main() {
        show_monomorphic(17, 23);
        show_monomorphic("fnord", "slack");
        show_polymorphic(&42, &"quirkafleeg"); // mixed types!
    }

There are things you can do with each that you can't do with the other, but they are often both viable choices.
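The same trade-off exists on the C++ side of the thread: templates are instantiated once per type (like Rust generics), while virtual functions give a single compiled function that dispatches through a vtable. A rough C++ sketch of the Rust example above; `Displayable`, `Int`, and `Text` are hypothetical types invented to stand in for `&dyn Display`.

```cpp
#include <sstream>
#include <string>
#include <utility>

// Static ("monomorphic") dispatch: the compiler stamps out one copy of this
// function per T it is called with.
template <class T>
std::string show_monomorphic(const T& first, const T& second) {
  std::ostringstream out;
  out << first << " then " << second;
  return out.str();
}

// Dynamic dispatch: one compiled function; calls go through a vtable.
// Displayable is a hypothetical interface standing in for &dyn Display.
struct Displayable {
  virtual ~Displayable() = default;
  virtual std::string to_string() const = 0;
};

struct Int : Displayable {
  int v;
  explicit Int(int v) : v(v) {}
  std::string to_string() const override { return std::to_string(v); }
};

struct Text : Displayable {
  std::string s;
  explicit Text(std::string s) : s(std::move(s)) {}
  std::string to_string() const override { return s; }
};

// Accepts mixed concrete types, like show_polymorphic in the Rust version.
std::string show_polymorphic(const Displayable& first, const Displayable& second) {
  return first.to_string() + " then " + second.to_string();
}
```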

If you're doing dynamic dispatch, there is still code living somewhere for each specialization. It sort of solves the linking problem, because symbols don't have to be generated, and the compile-time problem, because you hand-write the code.

To be fair, I'm only considering basic data structures and algorithms where you can get away with something like foo(void *ptr, size_t size). Maybe generics is too broad a term for that.

C has many rough spots too, especially in organising large and complex programs. There is always scope for innovation and improvements.

>>> There is always scope for innovation and improvements.

but not in C because C has no scope

Could you refer me to a good/standards-compliant library for typesafe generic lists, hashmaps, pattern matching, closures, etc?

You are missing the main strength of C - the transparency and predictability of what machine code gets generated from the source code.

When you write "a + b", you won't end up with kilobytes of machine code just because someone in some header overloaded the + operator.

Ditto for template-style generics and overloaded functions. If you call a function and it doesn't exist, it won't get auto-generated for you on the fly. You get an error. You want a function, you define that function. That's not the flaw of the language. That's its strength.

Ditto for switch-case constructs.

Ditto for closures.

All of these constructs, should they be added to the language, will result in a compiler generating heaps of code from a simple code snippet. But this will no longer be C, because in C what you see is what you get.

> the transparency and predictability of what machine code gets generated from the source code.

This gets said a lot, but unless you're writing C code for a microcontroller (or a PDP11), that isn't even close to being true.

> When you write "a + b", you won't end up with kilobytes of machine code just because someone in some header overloaded the + operator.

You can't overload "+" for integers or floats. If you add two numbers together, you get what you expect, always. If they're not numbers, and somehow you expected `a + b` to work without an overloaded operator, then I don't know what to say.

> You want a function, you define that function.

Then you define it again for another type, then again, then again...

Then you write a macro and now your colleagues hate you.

> All of these constructs, should they be added to the language, will result in a compiler generating heaps of code from a simple code snippet

Not necessarily.

>> You can't overload "+" for integers or floats. If you add two numbers together, you get what you expect, always. If they're not numbers, and somehow you expected `a + b` to work without an overloaded operator, then I don't know what to say.

You can't overload the + operator in C. If you see a+b in code you know it is adding two numbers (or a compile error). In C++ it could be doing anything. In C you may question the size or type (fp vs int) of the variables, but not the operator. + does addition. Always.

> This gets said a lot,

No, this actually doesn't get said a lot, however the simplicity of C compilation semantics is one of its biggest strengths.

> ... but unless you're writing C code for a microcontroller (or a PDP11), that isn't even close to being true.

Do humor us with an example of C code that compiles into something that is "not even close" to what you'd reasonably expect.

How is basic loop unrolling "not even close" to the expected result when you explicitly ask for optimization with -O3?

This isn’t basic loop unrolling; it’s replacing an entire loop with a closed form mathematical formula.

It's a form of loop unrolling. Regardless of the term, this is still very much in the ballpark of what you'd expect when asking for optimization.
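The kind of rewrite being argued about looks roughly like this (an illustrative sketch, not the example from the thread): a summation loop that an optimizer can replace with a closed-form expression, so that no loop remains in the generated code. Clang is known to do this for simple loops like this one; whether a particular compiler and flag combination does it is worth checking on Compiler Explorer rather than assuming.

```cpp
#include <cstdint>

// A plain loop summing 0 + 1 + ... + (n - 1).
std::uint64_t sum_loop(std::uint64_t n) {
  std::uint64_t total = 0;
  for (std::uint64_t i = 0; i < n; ++i) {
    total += i;
  }
  return total;
}

// The closed form an optimizer may substitute: n * (n - 1) / 2.
// Compilers emit an overflow-safe variant; this version is only meant for
// moderate n, where n * (n - 1) fits in 64 bits.
std::uint64_t sum_closed_form(std::uint64_t n) {
  return n == 0 ? 0 : n * (n - 1) / 2;
}
```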

I'm not missing anything, I'm responding to a specific claim--that libraries allow C to compensate for missing language features. This is patently false irrespective of the merits of the features in question.

That said, I think your entire criteria for determining whether a language or feature is good boils down to its intuitiveness to experienced C programmers, which seems like a particularly poor, subjective criteria and it probably doesn't actually even hold for C considering all of the security issues and undefined behaviors that are introduced by and continue to surprise experienced C programmers.

For my money, closures, pattern matching, and templates (not any implementation in particular) are easy enough to reason about and much better than the corresponding bugs they address, but to each his own.

> I think your entire criteria for determining whether a language or feature is good boils down to its intuitiveness to experienced C programmers ...

I see this tired argument over and over. It is especially thrown out when some esoteric language syntax is criticized.

In this case it seems to miss the author's point entirely. My understanding of his argument is that the C language maps reasonably well to machine code. Adding new features to the language could hamper that intuitive mapping between the language syntax and the actual machine code generated.

A valid criticism (which others have made) is that the actual mapping between C code and machine code on modern machines is far less intuitive than one might expect. Another valid response would be a demonstration that "closures, pattern matching, and templates" can be intuitively mapped directly to machine code.

I would be happy if I never see the "you are too blinded by your own language to understand" argument ever again. It borders on ad-hominem and extends no benefit of doubt to the original author.

>For my money, closures, pattern matching, and templates (not any implementation in particular) are easy enough to reason about and much better than the corresponding bugs they address

Then write C++ only using closures, pattern matching and templates outside of the C feature set.

Oh no, did my previous comments read as an exhaustive list of issues with C?

> If you call a function and it doesn't exist, it won't get auto-generated for you on the fly.

People want these features and then end up making them (badly) using macros. Also, as to "a + b", it's either trivial enough to be inlined immediately by the compiler or obvious enough that it's your fault.

> If you call a function and it doesn't exist, it won't get auto-generated for you on the fly

As opposed to you writing the function manually every time and then it guaranteeing your binary gets cluttered?

C is also not portable assembler, as some people claim; modern compilers are way too complicated for it to be that simple (unless you're working on microcontrollers).

If you call a function and it doesn't exist, you get a warning "implicit declaration of function", unless you use additional compiler flags.

Not since C99.

But even regardless of that you still get a linking error.

No, you write macros (duck and hide).

> If only C had the same kind of pressure.

ATS :p

Who is using ATS in production?

When I learned it I wondered why would somebody use it instead of Rust or Idris.

I really like the proposal for matching values (integral, strings, tuples). It's very lispy. Explicit representation (intrinsics?) of a map used as an expression.

Though I would prefer the keyword 'choose' over 'inspect'.


I have noob questions about pattern matching on (polymorphic) types. Please humor me:

It's syntactic sugar for std::is_base_of<> (Java's instanceof), right?

This is a consequence of C++ (and Java's) nominal type system, right?

So is pattern matching on types necessary (useful) for structural type systems?

I ask because a friend has been advocating structural typing. He's more of a language theorist whereas I'm more of a language mechanic (I don't even rate as a language engineer). While I'm totally on board with whatever he says, I just hope to better grasp the changes.
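On the first question, one distinction that may help: `std::is_base_of` is a compile-time trait about static types, whereas Java's `instanceof` inspects an object's dynamic type at run time, which in today's C++ is closer to `dynamic_cast`. A minimal sketch with hypothetical `Animal`/`Dog`/`Cat` types; how the proposal itself specifies matching on polymorphic types is best checked in the paper.

```cpp
#include <type_traits>

// Hypothetical class hierarchy; the virtual destructor makes Animal
// polymorphic, which dynamic_cast requires.
struct Animal { virtual ~Animal() = default; };
struct Dog : Animal {};
struct Cat : Animal {};

// Compile-time question about *static* types: is Dog derived from Animal?
static_assert(std::is_base_of_v<Animal, Dog>);
static_assert(!std::is_base_of_v<Dog, Cat>);

// Run-time question about the *dynamic* type of an object, which is what
// Java's instanceof answers.
bool is_dog(const Animal& a) {
  return dynamic_cast<const Dog*>(&a) != nullptr;
}
```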

compare this proposal to the pattern matching you get from Haskell or OCaml and weep.

for example (OCaml):

    let imply = function 
       | (true,false) -> false
       |   _          -> true

Pattern matching across multiple function heads (ML, Erlang) is incredibly liberating. I hear Haskell has it but it's not idiomatic, and I think Elm doesn't support it at all. Frustrating.

I've found half-hearted pattern matching solutions like Scala quite disappointing.

OCaml, F#, and ReasonML don't support that; I think it's unique to SML in the ML family.

Interesting, thanks. SML is my only direct exposure to the family, and I recall a conference speaker mentioning that ML (perhaps SML) programmers weren't fond of the feature.

Can you give an example of the pattern matching in ML and Erlang that is not idiomatic in Haskell?

It's the multiple function heads which (again, I've been informed, no direct knowledge) is discouraged.

e.g. https://github.com/macintux/advent-of-code-2019/blob/master/...

You mean a piecewise definition of a function? I don't think this is unidiomatic or discouraged in Haskell at all.

There's some indirect discussion here that seems to contradict that, but I certainly could be wrong.


That's in the context of Elm; Evan gives his reasons there, but in the context of Haskell such multi-line equations are very common and normal Haskell style.

Translating that into Haskell, I get:

  check_each :: Int -> [Int] -> Bool -> Int -> (Bool,Bool,Int)
  check_each x (y:_) _ last | y==x && x==last = (True,False,x)
  check_each x (y:_) _ last | y>x && x==last = (True,True,x)
  check_each x (y:_) found _ | y>x = (True,found,x)
  check_each x _ found _ = (False,found,x)
  is_valid :: (Bool,Bool,Int) -> [Int] -> Bool
  is_valid (False,_,_) _ = False
  is_valid (True,True,-1) [_] = False
  is_valid (True,True,_) [_] = True
  is_valid (True,found,shortDup) (h:t) = is_valid (check_each h t found shortDup) t

which is ugly, certainly, but compiles fine (ghc). The ugliness mainly comes from the data structures, and I don't know enough about the domain to suggest better ones (there might not be better ones).

Well, it was code I threw together quickly for Advent of Code, so try not to be too hard on it.

Eh, like I said, that might be the best you can do for a messy problem domain. I'm still not sure what you meant by multiple function heads though.

No amount of new features will ever vindicate C++. The language is so complex and so flawed on every level that it can only be deprecated and migrated away from. The only way to save developer time and woes is not to produce any new lines of C++.

Move away from to what? People use C++ because they want templates. There's a very small number of languages with templates.

it would be interesting to see a fork of C++ that only removed features every three years

anything in here about exhaustiveness checking? In my mind that's half the value of pattern matching.

From the Exhaustiveness and Usefulness section:

"inspect can be declared [[strict]] for implementation-defined exhaustiveness and usefulness checking."

"having a __: case makes any inspect statement exhaustive."

"any case that comes after a __: case would be useless"

Why wouldn't they make it [[strict]] by default?

Because it's C++, and the defaults must suck (std::vector::at anyone?)
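The `std::vector::at` jab refers to the fact that the short, natural `v[i]` is unchecked (an out-of-range index is undefined behaviour), while the checked access needs the longer `v.at(i)`, which throws `std::out_of_range`. A small sketch; `at_throws` is a hypothetical helper for illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if the bounds-checked accessor rejects index i.
// (Calling v[i] with the same out-of-range i would be undefined behaviour,
// so there is no equivalent helper for operator[].)
bool at_throws(const std::vector<int>& v, std::size_t i) {
  try {
    (void)v.at(i);
    return false;
  } catch (const std::out_of_range&) {
    return true;
  }
}
```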

Why call it inspect instead of match?

Inspect breaks less existing code.

They ought to use "_Inspect".

Because of regex.

The regex header doesn't seem to define anything called "match". It is probably a common variable name though. To me "inspect" implies the wrong action, so it would be nice if there was a better keyword.

This seems so much more clear to read, with less typing.

I would wish for it to work with concepts and state. Then you could write a simple production system in the spirit of Prolog, Erlang, et al. The multi-argument generics is partway down that path, so why not go all the way.

More syntax, more semantics - because that's exactly what C++ needed. At this point I think they may be competing in the most complex languages championship. When are the next Olympic Games?

And it's not like we could just abstain from using new parts of the language. For every feature there is going to be a person who will be excited to use it in an open source project which your company depends on, and sooner or later you're going to have to pay for every inch of complexity introduced to the language. But the people in charge seem to think that there is no cost to it.

To my understanding, this capability set is roughly the same as what C# 8.0 supports and what Java also targets. Just with a different keyword.

Does the D language have pattern matching?


But it can be done as a library. Granted, it's not the same as 1st class support, but it's pretty good:


Does this proposal have any support?

At first it seems nice, but is it realistic to add so much new syntax to the language?

Structured bindings need to get rid of the mandatory dummy variables in case of the unused parts of the structure. So something has to be added.

Something has to be added but it could just be an empty comma, which IIRC parallels initialization: use `auto& [a, , c]` to match the first and third elements of a tuple.
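For context, these are the two workarounds available in C++17 today (a sketch using a hypothetical `make_record` function). The `auto& [a, , c]` spelling above is a suggestion, not valid C++.

```cpp
#include <string>
#include <tuple>

// Hypothetical function returning a three-element record.
std::tuple<int, std::string, double> make_record() {
  return {1, "unused", 2.5};
}

// Structured bindings must name every element, so the unused middle one
// needs a dummy name; [[maybe_unused]] silences the unused-variable warning.
int first_plus_third_binding() {
  [[maybe_unused]] auto [a, dummy, c] = make_record();
  return a + static_cast<int>(c);
}

// The older std::tie form can skip elements with std::ignore, at the cost
// of declaring (and default-initializing) the variables up front.
int first_plus_third_tie() {
  int a = 0;
  double c = 0.0;
  std::tie(a, std::ignore, c) = make_record();
  return a + static_cast<int>(c);
}
```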

How does this work compared to ML's signatures?

Their example #4.3 won't always produce the same result as the "if" version, because x==x and y==y could return false due to operator overloading.

There is no x==x in example 4.3. This is like structured binding where x binds to the first element of the tuple.

If you overload the == operator such that x==x is false then I call UB. Reading and understanding such a codebase would be a nightmare.

NaN == NaN is always false (at least in other languages I know).

This will be the case in any language that conforms to IEEE 754 (so, hopefully, all of them (because the only thing more confusing than properly implementing IEEE 754 is improperly implementing your own bespoke subset)). This property of NaN is required by the spec (and is, for example, why Rust's default HashMap refuses to accept floating point numbers as keys).

I can confirm that this is true in C++ (or at least the common implementations), as required by IEEE. In particular, this means that sorting a container including NaNs is undefined behavior (since typical sorting algorithms rely on the property that for any x and y, exactly one of x < y, y < x, or x = y is true).
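A short C++ illustration of the NaN behaviour described above, plus one hedged way to keep `std::sort` well-defined when NaNs may be present: a comparator (the hypothetical `nan_last_less`) that treats all NaNs as equivalent and orders them after every number, which restores a strict weak ordering.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

// IEEE 754: NaN compares unequal to everything, including itself.
bool nan_self_equal() {
  double nan = std::numeric_limits<double>::quiet_NaN();
  return nan == nan;
}

// Hypothetical comparator: ordinary < for numbers, all NaNs equivalent to
// each other and greater than every number.
bool nan_last_less(double a, double b) {
  if (std::isnan(a)) return false;  // NaN is never less than anything
  if (std::isnan(b)) return true;   // every number sorts before NaN
  return a < b;
}

// Sorting with the plain < operator would violate std::sort's requirements
// if NaNs are present; with nan_last_less it is well-defined.
std::vector<double> sorted_nan_last(std::vector<double> v) {
  std::sort(v.begin(), v.end(), nan_last_less);
  return v;
}
```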

Is it actually UB or just stupid? I've never tried to dig in and understand C++ at the standards level, so I don't know, but you don't get to just "call" UB; it has to be UB by the standard.

By itself it is just stupid. The standard defines the "EqualityComparable" requirement and which parts of the STL require it to be satisfied. There are even some places where it notes that the EqualityComparable requirement isn't needed. It is much easier to get UB by having a bad < operator.

"It must be our great task, gentlemen, to keep the monkeys away from the typewriters."

-- H. L. Mencken

I know this is different than the regex library but just thought I'd mention...

Declaring regexes like in JavaScript would be cool.

    var myre = /^hello\w*/;
    std::regex myre("^hello\\w*");

You can add a literal operator:

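    #include <cstddef>
    #include <regex>

    // String-literal operators must take std::size_t for the length.
    auto operator""_r(char const* str, std::size_t len) {
      return std::regex(str, len);
    }

To avoid all the backslashes, you can use a raw string literal:

    int main() {
      auto myre = R"(^hello\w*)"_r;
    }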

Use raw strings.

    std::regex myre(R"(^hello\w*)");

I don't understand why some people don't like C++.

It almost seems like they forget what it means to compile to machine code.

I have some sort of emotional attachment to Python, but I can't deny that C++ is just a more capable language; at least I can admit that I'm not always competent enough to write C++.

Like most people say "just use the parts of C++ you need".

Thankfully C++ is far from the only language that compiles to native.

Many others don't even have the same insane "features".

To clarify (because usually no one ever does, and the conversation goes around in circles): "compile to machine code" (in the context of language design, rather than implementation details) means, idiomatically, compiling specific parts of the program to specific and (allegedly) predictable pieces of machine code.

This is standard in C (and allegedly C++ and rust), awkward but somewhat doable in Java, C#, and some of the weaker LISPs, damn near impossible in Haskell, Javascript, etc, and completely impossible in "real" LISPs (with first-class macros and/or fexprs).

It's obviously always possible to "compile" an (interpreter, somewhat-preprocessed-program) tuple for any language (give or take Turing-hard/easy-ness), and you can usually inline/specialize/optimise the interpreter a lot for one particular program (Haskell/GHC is a good example of this), but that's not what people mean when they talk about designing a language to compile to machine code.

There are not that many languages with good support that compile to native.

Java (yes it does, since around 2000), .NET (NGEN since v1.0, Mono AOT, IL2CPP, .NET Native), D, Eiffel, FreePascal, Delphi, RemPascal, Swift, OCaml, Haskell, Lisp, Ada, Scheme, Basic, Go, Fortran.

Just a couple of possible languages, there are many more if we take out the good support constraint.

The majority of those are garbage collected, and the ones which aren't are obscure.

So what? They also compile to native code

Garbage collected languages are not really native languages.

I don't see good enough alternatives to C++, which have enough support, not a steep learning curve, good backward compatibility with existing ecosystems, not owned by a single company, etc.

Depending on your definition of "enough support", Rust might be what you want.

> not a steep learning curve

well, it's steeper than most languages, but less so than C++ IMO, especially if you're already familiar with C++ and therefore understand the problems Rust is intended to solve.

If you already know C, C++ is easier. You can still pick what you like in C++.

Still, there aren't that many languages that compile to native.

Yeah the list is indeed quite small.


When you cross that list with language usage, support etc, you don't have a lot left.

My point is that maybe most of those languages are not suitable, and that nobody really likes those languages. Just cross all the arguments I gave about those languages, how they score, etc.: C/C++ are the most used languages that compile to native, and they get criticized.

"There are languages people complain about, and languages nobody uses." Meaning we only complain about the languages we use.

Says who? They also compile to native code just like C++.

If having a GC makes a language not native, then C++ isn't a native language as well.

ISO C++11 introduced the GC API, C++/CLI and C++/CX dialects use a GC, Unreal C++ has a GC, regular C++ with Boehm GC is another option, C++ Builder has also a GC.

ISO C++ is basically driven by Microsoft, Google, Apple, and IBM, given the number of their employees who go to ISO meetings and submit papers.

While it isn't a single corporation, it isn't a community driven process either.

Whoever can put money into ISO gets the features.

I love C++, just would rip off the copy-paste compatibility with C89 if a genie would give me a wish about what to take out from the language.
