The D language compiler uses a technique I call "poisoning" which has greatly reduced cascading error messages. The idea is whenever an error is found in an AST node, the AST node is replaced with an "error" node. Any combination of an error node with another node is replaced with an error node. Error messages for error nodes are suppressed.
It works far better than attempting to repair the AST into some plausible state.
It's analogous to the propagation of NaN values in floating point code.
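To make the NaN analogy concrete, here is a minimal C++ sketch (the function name `poisoned_sum` is made up for illustration). Once a NaN enters an arithmetic expression, every later result is NaN too, much as an error node swallows whatever it is combined with.

```cpp
#include <cmath>
#include <limits>

// Hypothetical demo function: starts from a quiet NaN and keeps computing.
// Every arithmetic step that touches the NaN yields NaN again.
double poisoned_sum() {
    double nan = std::numeric_limits<double>::quiet_NaN();
    return (nan + 1.0) * 2.0 - 3.0;
}
```

Calling `std::isnan(poisoned_sum())` should report true: the "poison" survives all three operations.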
I've got a compiler for a hobby language that uses this technique too (I probably got it from you, unless I heard it from someone else). It's really really nice. Super easy to implement and really does cut down on cascaded errors.
I also use it during type checking. There is a special "error" type. If an expression produces a type error, the yielded type of that expression becomes "error". Surrounding expressions that consume that type will see the error type and suppress any other type errors they might otherwise produce. That way, you only see the original type error.
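A rough sketch of what such an error type could look like, in C++. This is not taken from D or from any real compiler; `Type`, `Expr`, `lit`, `add`, and `check` are all hypothetical names, and the "language" only has int and bool literals plus `+`.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical mini type checker; every name here is illustrative only.
enum class Type { Int, Bool, Error };

struct Expr {
    enum Kind { IntLit, BoolLit, Add } kind;
    std::unique_ptr<Expr> lhs, rhs;  // only used by Add
};

std::unique_ptr<Expr> lit(Expr::Kind k) {
    auto e = std::make_unique<Expr>();
    e->kind = k;
    return e;
}

std::unique_ptr<Expr> add(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Add;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Returns the type of e, appending at most one diagnostic per root cause.
Type check(const Expr& e, std::vector<std::string>& diags) {
    switch (e.kind) {
    case Expr::IntLit:  return Type::Int;
    case Expr::BoolLit: return Type::Bool;
    case Expr::Add: {
        Type l = check(*e.lhs, diags);
        Type r = check(*e.rhs, diags);
        // A poisoned operand was already reported: stay silent and propagate.
        if (l == Type::Error || r == Type::Error) return Type::Error;
        if (l != Type::Int || r != Type::Int) {
            diags.push_back("operands of '+' must be int");
            return Type::Error;  // poison this node
        }
        return Type::Int;
    }
    }
    return Type::Error;  // unreachable for valid kinds
}
```

For a tree like `((1 + true) + 2) + 3`, built as `add(add(add(lit(Expr::IntLit), lit(Expr::BoolLit)), lit(Expr::IntLit)), lit(Expr::IntLit))`, `check` returns `Type::Error` and records exactly one diagnostic, for the innermost `+`. The two outer additions see the error type and say nothing.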
> Surrounding expressions that consume that type will see the error type and suppress any other type errors they might otherwise produce.
We added a slightly-cursed version of this to clang. The goal was: include more broken code in the AST instead of dropping it on the floor, without adding noisy error cascades.
The problem is, adding a special case to all "surrounding expressions that consume that type" is literally thousands of places. It's often unclear exactly what to do, because "consume" means so many things in C++ (think overload resolution and argument-dependent lookup) and because certain type errors are used in metaprogramming (thanks, SFINAE). So this would cost a lot of complexity, and it's too late to redesign clang around it.
But C++ already has a mechanism to suppress typechecking! Inside a template, most analysis of code that depends on a template parameter is deferred until instantiation. The implementation of this is hugely complicated and expensive to maintain, but that cost is sunk. So we piggy-backed on this mechanism: clang's error type is `<dependent type>`. The type of this expression depends on how the programmer fixes their error :-)
And that's the story of how C gained dependent types (https://godbolt.org/z/szGdeGhrr), because why should C++ have all the fun?
(This leaves out a bunch of nuance, of course the truth is always more complicated)
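The deferral described above is easy to see in ordinary C++. In this small sketch (`call_missing`, `HasIt`, and `demo` are made-up names), the call inside the template depends on `T`, so the compiler cannot check whether the member exists until the template is instantiated with a concrete type.

```cpp
// Hypothetical example of deferred checking of dependent code.
template <typename T>
int call_missing(T t) {
    // Depends on T: whether this member exists is only checked at instantiation.
    return t.no_such_member();
}

struct HasIt {
    int no_such_member() { return 42; }
};

// Instantiating with a type that does have the member compiles and runs.
int demo() { return call_missing(HasIt{}); }
```

If `call_missing` were never instantiated, or only with types like `HasIt`, the template would never produce an error about `no_such_member`. That is the existing "don't typecheck this yet" machinery the comment says clang reuses for broken expressions.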
In D, all semantic analysis of a template waits until instantiation time. This is because D is designed so that the syntax parsing does not need a symbol table.
In C++, I solved this problem by simply matching { } in the template body, and accumulating a list of tokens within the { }. Then, when instantiated, the template parameter values were known, and the template syntax could then be semantically analyzed. It was simple and effective.
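A minimal sketch of that brace-matching idea, assuming tokens are plain strings. This is not the original compiler's code; `capture_body` and its signature are hypothetical, and a real lexer would carry source locations and token kinds.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper. `pos` must index an opening "{" in `toks`.
// Returns the tokens strictly inside the matching braces and leaves `pos`
// just past the closing "}". If the braces never balance, it returns
// whatever was collected and leaves `pos` at toks.size().
std::vector<std::string> capture_body(const std::vector<std::string>& toks,
                                      std::size_t& pos) {
    std::vector<std::string> body;
    if (pos >= toks.size() || toks[pos] != "{") return body;

    int depth = 0;
    for (; pos < toks.size(); ++pos) {
        const std::string& t = toks[pos];
        if (t == "{") {
            // The outermost "{" is not part of the body; nested ones are.
            if (depth > 0) body.push_back(t);
            ++depth;
        } else if (t == "}") {
            --depth;
            if (depth == 0) {  // matched the outermost brace
                ++pos;
                break;
            }
            body.push_back(t);
        } else {
            body.push_back(t);
        }
    }
    return body;
}
```

The saved token list can then be replayed through the normal parser at instantiation time, once the template arguments are known.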
But I was informed that C++ required the syntax parsing and semantics for non-dependent types without instantiation. I asked why, and the answer was "to check for errors without needing to instantiate it." I responded with "of what use is checking it if it is never used or tested?" And that was the end of that.
> The implementation of this is hugely complicated and expensive to maintain
I quietly revolted and refused to implement that disaster. AFAIK there was never a problem with deferring parsing/semantic until instantiation.
Yes (at least approximately, I'm fuzzy on the details).
These days it supports both. (IIRC the default is legacy/nonstandard, you select the standard behavior with /permissive-, and VS adds /permissive- to newly generated projects)
Yeah, that trade-off makes a lot of sense. It's the logical conclusion of the "templates are textual" model. But C++ loves to have its cake and eat it regardless of the complexity, so it's non-conforming.
(I expect it's possible to construct cases where this difference is observable)
I think checking templates in isolation has value. We use statically typed languages in part to make more error classes locally-verifiable. But bolting that into a mostly-textual system is a mess.
(Checking templates in isolation is particularly valuable in IDEs, which tend to share logic with compiler frontends. IDEs only need that much power to do a passable job because the language is so complex, so I don't know which way this argument points)
I think this is very commonly used and possibly independently discovered a lot of times---I had done so for example---because it is a very logical conclusion. Clang has `RecoveryExpr` [1] for this purpose (I couldn't find a GCC equivalent).
`getFoo()` is missing an argument, but you still want to complete Foo's members. If you type `getFoo().getBar()` you want go-to-definition to work on `getBar`.
In clang, we use heuristics to preserve the type in high-confidence cases. (here overload resolution for `getFoo` failed, but there was only one candidate function). This means you can get cascading errors in those cases (but often that's a good thing, especially in an IDE - tradeoffs).
I like to increment the most significant digit of my terminal history size every time I run into a compiler error that exceeds the previous value. On Saturday, it went up from 40k to 50k.
I tutored a C++ beginners course on Linux a few times. I have had people ask me for help because the error messages were so long, their terminal history wasn't large enough to scroll back to the first error. Even if you have basic bash skills, just piping the compiler output to less or head also doesn't work, because the errors go to stderr. It's an incredible source of frustration for beginners.
I know about that mechanism but I use it so infrequently that I have to look it up every time; the syntax doesn't really match anything else I know. It's also hard to look up because even the right search terms fetch a lot of nearby-concept results so I have to dig for the exact idea.
There used to be a search engine called SymbolHound whose entire shtick was to not ignore special characters in the query. Unfortunately it looks like it's gone now, seemingly without so much as a farewell. Their posts just stopped.
Yes, but if someone only just learned about less and piping, they probably won't know about fd 2 being stderr or about fd-based redirection, and hence won't understand it at all.
For my beginner programming class, I have them use a script `compile` which handles compilation and linking, while also enabling all the warnings/errors I think are helpful, and also doing the above redirection. An absolute beginner doesn't gain anything from invoking `g++` directly.
> I tutored a C++ beginners course on Linux a few times. I have had people ask me for help because the error messages were so long, their terminal history wasn't large enough to scroll back to the first error.
A non-problem if you use any IDE which has been able to group errors under foldable menus (such things have existed on Linux for longer than some of the people you are teaching may have been on this planet)
If only there was some sort of invention that could be used to replace a virtual teletypewriter that could somehow parse the error messages and put them into a convenient list that could be navigated using simple visual paradigms instead of bits and pieces of a shell scripting engine… /s
You must have a much more advanced relationship to errors than me. I don’t think I could tell the difference between 100k or 40k lines worth of errors, and it was only halfway through writing this message that I realized you might actually be talking about the size of an error? Oh no.
I suspect you'd notice simply because scrolling to the top of your history isn't sufficient anymore to find the first error. (Assuming one clears their shell history before compilation, which is very nearly required in this context...)
Some of the category descriptions are great, eg for "Most lifelike":
Suppose you are given a task of adding some new functionality to an existing code base. You have been told that the guy who wrote it was “really smart” and that his code is of “enterprise quality”. You check out the code and open a random file in an editor. It appears on the screen. After just one microsecond of looking at the code you have lost your will to live and want nothing more than to beat your head against the table until you lose consciousness.
Creative Computing magazine in the 1970's had this kind of competition: most errors out of the smallest program, in any language.
Back in the day, it was possible because parsers tried to repair bad input to try to keep going, in hopes of diagnosing as many real errors as possible, so as to reduce the number of iterations. Iterations used to be expensive: punching corrections onto cards, etc.
If the parser repairs bad syntax, it can cause more errors. If a repair involves insertion, there is a risk of getting into an infinite loop of diagnostics, even.
It's kind of anachronistic to have people playing this with C++.
Why is this 2014? Did that error explosion competition die out?
Seems that it died out. There is a note from 2015: "However, we received only a very small number of entries. Because of this we have decided to cancel the competition. Due to this lack of interest we suspect that the competition will not be run again next year."
https://tgceec.tumblr.com/post/110094856488/results-of-tgcee...
I don't know what compiler & flags they ran it on, but with gcc 12's preprocessor it looks like much more than that: by default, include files nest to a level of 200 before an error. With `-fmax-include-depth=10`, about 200kB of error is printed, and each subsequent level doubles that, so you'd be in for about 3e62 bytes of error output by my calculation...
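Spelling out that estimate, assuming the output really does double with each level from depth 10 up to the default depth of 200 (that is, 190 doublings starting from roughly 200kB):

$$
2 \times 10^{5}\ \text{bytes} \times 2^{200-10} \approx 2 \times 10^{5} \times 1.57 \times 10^{57} \approx 3 \times 10^{62}\ \text{bytes}
$$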
clang 14 terminates preprocessing as soon as any header exceeds maximum nesting, so only ~6kB of errors are produced by it.
Generally speaking, clang produces much clearer C++ compile errors than gcc. I can't give any specific examples off the top of my head, but I've seen GCC emit hundreds of lines of inscrutable errors where clang spits out one line that tells you exactly what is wrong.
Clang++ has about 6.5 times as many characters in the error message as g++ here.
Though looking at the actual error messages, that seems to be down to a deeper default maximum instantiation depth on clang++ (1024 there vs 900 in g++).
GCC also has (had?) an error limit. But because C++ compilers like to show you actual backtraces of how template instantiations led to the error condition, each single error message can easily have dozens of lines of context. That's how I got GCC to output 1MB of text for a single typo and the following parser confusion.
I've mostly become adept at traversing six mile long compile error crawls to find the bit in my code causing the STL/Thrust complaint. clang is more legible for this than gcc. Still, I keep learning new patterns.
It might be interesting to see how much expansion could be obtained with Rust. The diagnostics can be verbose, and perhaps there's some kind of self-reference that could produce a similar blowup, though I'm not familiar enough with the corners to say.
Did you mean the page not showing the generated error messages? The actual error messages would be many gigabytes worth of text, and aren't as interesting as the entries that produced those error messages.
We can come up with compression schemes that do even better than Deflate. For example, here's the compressed version of the output of the first program, in just a handful of bytes:
I tried the first entry 'Biggest error, category Anything' and was disappointed when the error message was really short. But I had made the 'mistake' of using clang++, with g++ I get the crazy long error as promised.
The difference between an experienced C++ programmer and a C++ novice is that the novice regularly believes that his code is broken beyond repair for eternity and has accidentally destroyed the compiler and most of the hard drive, while the former just yawns at 5,000 lines of error messages and adds the missing semicolon.