How Should Compilers Explain Problems to Developers? (2018) [pdf] (barik.net)
52 points by zdw on March 27, 2023 | 41 comments



I thought this paper was very interesting on the topic: "Mind Your Language: On Novices' Interactions with Error Messages".

One of the things that comes up is that if the error message suggests a change, there is a risk that the edit makes the error go away while the code is still incorrect. A concrete suggestion tempts the programmer to just apply it without thinking.

This is really interesting to me because it makes sense, and goes against many common ideas about how to make good error messages.


> if the error message suggests a change, one of the risks is that such an edit can make the error go away, but the code will not be correct. [...]

"Error messages: diagnostic is preferable to prescriptive" [1] is a riff on a similar theme, though the author of that blog post is focusing on error messages for Microsoft's C# compiler, which are more likely to be intended for experienced professional developers, so he is coming at the problem from a different perspective than the student-focused one of the paper you mentioned.

[1] https://web.archive.org/web/20060720004516/http://blogs.msdn...


Today I was again amazed by how terse a C++ compiler/linker's description of a linking failure was. It complained about an undefined reference without stating where that reference resided. Compiler makers must hate us developers, or think we are intelligent enough to make do with half a word.


In general, the linker is written by people different to the people who write the compiler.

I don't hate you I just don't control the whole stack.

Also linkers can tell you where the symbol was referenced if that place has a location in the code. This isn't universal but it's practically always been there for me.


Thanks for not hating me :p

What I also find hard to understand is that linkers are generally "single pass", i.e. you need to feed them objects and libs in the correct order (callers before callees) for them to resolve all links. Not sure if this requirement is still always there.


That requirement has been gone for decades, if it ever existed, though static library semantics look a bit like that if you don't pass extra flags around them.

Linkers are multiple pass if doing compiler optimisation things. Otherwise they're pretty close to cat + patch some pieces, single pass seems fair.

It's not immediately obvious to me why linkers don't say which functions were making the undefined symbol call. A suspicion is that the linker may not know what the function name is for the corresponding instruction without digging it out of the debug info and the linker probably doesn't have the corresponding parser ready to hand. It probably could narrow it down to a source object file, though libraries make that difficult.
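
For what it's worth, the static library behaviour mentioned above can be sketched as a toy model. This is only an illustration of the traditional left-to-right archive rule, not how any particular linker is implemented; `Obj`, `link` and the symbol names are made up for the example. It also records which object file referenced each unresolved symbol, which is the "narrow it down to a source object file" part.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    """Hypothetical stand-in for an object file: what it defines and what it needs."""
    name: str
    defines: set = field(default_factory=set)
    needs: set = field(default_factory=set)


def link(inputs):
    """Toy left-to-right link.

    inputs is a list whose items are either an Obj (always linked in) or a
    list of Obj (a static archive). Returns a dict mapping each unresolved
    symbol to the sorted names of the objects that reference it.
    """
    defined = set()
    wanted = {}  # symbol -> names of objects that reference it

    def add(obj):
        defined.update(obj.defines)
        for sym in obj.needs:
            wanted.setdefault(sym, set()).add(obj.name)

    def unresolved():
        return {sym for sym in wanted if sym not in defined}

    for item in inputs:
        if isinstance(item, Obj):
            add(item)
            continue
        # Archive: pull in only members that define a symbol that is
        # undefined *right now*, and rescan until nothing new is pulled.
        # Earlier archives are never revisited.
        remaining = list(item)
        progress = True
        while progress:
            progress = False
            for member in list(remaining):
                if member.defines & unresolved():
                    add(member)
                    remaining.remove(member)
                    progress = True

    return {sym: sorted(wanted[sym]) for sym in sorted(unresolved())}


main_o = Obj("main.o", defines={"main"}, needs={"helper"})
libhelper = [Obj("helper.o", defines={"helper"})]

print(link([main_o, libhelper]))  # caller first: everything resolves
print(link([libhelper, main_o]))  # archive first: 'helper' stays undefined
```

With the archive listed first, nothing needs `helper` yet when the archive is scanned, so its member is skipped and the later reference from `main.o` is reported as unresolved.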


I've written linkers. This one pass thing was gone before the 1980s.


I saw it being a problem maybe 10 years ago, but I'm happy if it is fully a thing of the past.


>Compiler makers must hate us developers

I think that C and C++ were just written by people who never gave a single damn about UX until competition appeared, and then things started improving.


C was created by people that didn't have any extra resources to spend on UX. Just the other day there was an article here about emulating the computer C was written for on an RP2040 (the Raspberry Pi "competitor" for the Arduino), and yes, it has plenty of space for that.

C++ had more of a choice about it.


> C was created by people that didn't have any extra resource to spend on UX

This is irrelevant. The design of a language has little bearing on the quality of the error messages from any particular implementation. In particular, the constraints on C implementations in the '70s are irrelevant to the constraints on implementations in 2023. There's absolutely nothing preventing the GCC/LLVM/ICC/MSVC developers from adding better error messages to their compilers, and the design of the language certainly isn't an obstacle.


> The design of a language has little bearing on the quality of the error messages

Actually, it has a lot to do with it.

https://www.digitalmars.com/articles/b47.html


D putting some container types in the language instead of the library is really interesting in this respect. The usual C++ experience of getting compiler failures in some iterator nested within unordered_map is thus avoidable.


Currently writing a compiler that I want to be production strength, so I read your article. Thank you!

It's interesting that you specifically call out that putting source lines in is not good, when most modern compilers seem to be moving in the other direction and putting them in.

Do you think it could be a preference thing? Would it be good to have the option for the user to turn it on?

Also, thanks for the tip on the spell checker. That is something I'll add. You said in another comment that you just used the current symbol table as a dictionary; that is a great idea too.


The only point in that article that supports this claim appears to be "semicolons in the syntax of a language can improve error messages". I don't think that generalizes, especially because this comment thread is about C, which already has semicolons, and about compilers explaining semantic problems to developers, as opposed to syntactic problems.


>C was created by people that didn't have any extra resource to spend on UX

How many years have passed since that time?


Oh, enough to improve things several times (to answer: 5 decades). People did make some improvements, but clearly didn't keep up with the times (and degraded some things too, UB being the elephant in the room).

But large things almost never keep up with the times.


Something else must be at play here, and I'm pretty sure most of the annoyances did not get solved, mostly because of backward compatibility guarantees. It is really a pity that C/C++ did not get pushed away by competing languages that do/did not have so much of a legacy of sub-par development tooling. Friendlier and better designed languages like C#, Rust or even Python still seem to have some disadvantages that prevented that from happening.


C++ essentially killed C. Counterexamples exist but they're hard to find and sometimes involve there being no C++ compiler on the target.

D didn't replace C++. I wonder if that was related to the single proprietary implementation. The language seems more sensible to me.

Rust is aimed at replacing C and C++. There are others in the same field in various stages of development. Zig and Carbon come to mind.

There's an uphill battle where lots of existing code is in C++ and people fear changing it. C was partly dismantled by C++ compilers promising better error diagnostics and being able to compile most C with minimal patching.

I personally think the end is in sight for C++ but the consensus seems to be it'll live forever.


D didn't bring a lot to replace C, only C++. Compared to C it adds a lot of high-level stuff, for niches where C has been dead for decades anyway, and not much more for the things C was still good at.

But Rust brings you OOP-like structures without the C++ hidden machinery, generics (the two main reasons people adopted C++), and a lot of code verification. And the language isn't much more complex than C. That combination is very enticing.

I do believe both languages are about to die (C and C++). It's just that when languages die, nobody notices for a decade or so. Developers even keep getting more numerous for some time, and then their numbers slowly decline over decades.


> I personally think the end is in sight for C++ but the consensus seems to be it'll live forever.

As the office Rust shill, I'm actually pretty excited about a lot of Herb Sutter's work [1]; I think there's a reasonable argument to be made that a hypothetical C++29 will be as ergonomic as unsafe Rust, with the not inconsiderable benefit of being able to speak the C++ ABI. I don't think large projects like LLVM or Unreal would dream of rewriting to Rust in that time, so people will still be writing it in some form or another.

[1]: My top 3 would probably be cppfront's argument passing semantics, metaclasses, and deferred_heap, in that order. The latter two, IMO, are improvements on what Rust provides in the area.


D is 100% Boost licensed.


These days yes. Has LLVM and GCC front ends too, very easy to get going with it.

D wasn't Boost licensed when I looked at it as a replacement for C++ about a decade ago. Wiki thinks the backend has been available since 2017. It looks like the source may have been available for longer than that but I missed the distinction between source available and open source if it was. That's how I ended up on the C++ path instead of the D one anyway, seems a credible risk others have the same experience.


The front end was open sourced around 2005 or so, and it was also paired with gcc to have a fully open sourced GNU D compiler, gdc. There's also the fully open sourced ldc compiler based on LLVM.

We couldn't get the dmd backend open sourced until 2017, but everything else was open sourced.


(GC)C isn't that old, and in fact was part of a generation of tools that shunned ye olde' "line too long" type errors


I discovered back in the early 80's that it was easier to allocate bigger buffers than to have "line too long" messages and recovery.


The abysmal compile time for these languages may also have encouraged this fairly bare-bones UX philosophy.


>fairly bare-bones UX philosophy.

What do you mean?


I like what I see in the comparison table between Jikes and OpenJDK. Elm does a lot of this in its error messages, including hinting at possible typos in field names. What this lacks is a suggested next action for the developer.


Explaining compiler error messages should really be the first application of LLMs like GPT-4. In fact you can take all the general-knowledge crap out and just train it on programming info from gcc source code/tests & StackOverflow/GitHub, and then have it explain what went wrong. This would be especially useful if you are dealing with template instances and the user explains in a few sentences what they are trying to achieve.

I have multi-decade experience in coding C/C++, but it's shameful how much time still gets wasted on these stupid compiler errors.


Or just fix the error message and add a hyperlink to an explanation...

You don't need an AI you just need the right abstractions in the code.

The AI could help you learn if you really don't understand, but for just telling you what went wrong you don't need it. In fact the compiler could already fix a lot of these things for you if its developers bothered to implement it (see fix-its in clang).
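
To make the fix-it idea concrete, here is a minimal sketch of a diagnostic that carries a machine-applicable edit alongside its message. This is not clang's actual data model or API; `FixIt`, `Diagnostic`, `check_missing_semicolon` and `apply_fixit` are hypothetical names, and the check itself is deliberately naive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FixIt:
    """A suggested edit: replace source[start:end] with `replacement`."""
    start: int
    end: int
    replacement: str


@dataclass
class Diagnostic:
    """An error message, optionally paired with an edit a tool can apply."""
    message: str
    fixit: Optional[FixIt] = None


def check_missing_semicolon(line: str) -> Optional[Diagnostic]:
    """Toy check: flag a non-empty line that does not end in ';', '{' or '}'."""
    stripped = line.rstrip()
    if stripped and not stripped.endswith((";", "{", "}")):
        pos = len(stripped)
        # Zero-width range: insert ';' right after the last character.
        return Diagnostic("expected ';' at end of statement",
                          FixIt(pos, pos, ";"))
    return None


def apply_fixit(source: str, fix: FixIt) -> str:
    """Return `source` with the suggested edit applied."""
    return source[:fix.start] + fix.replacement + source[fix.end:]


diag = check_missing_semicolon("int x = 1")
if diag is not None and diag.fixit is not None:
    print(diag.message)
    print(apply_fixit("int x = 1", diag.fixit))
```

Keeping the edit as data rather than text in the message means an editor can offer it as a one-click action, though that is exactly the kind of suggestion that can silence an error without making the code correct.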


Exactly. Compiler developers wrote the logic for the errors you see, they (mostly) have the knowledge to explain the error, why do we need a language model trained on all this derived information, when it could be provided upfront together with context from your own source code.


Having a link to a detailed explanation would be amazing. Trying to even find the error message in a sea of C++ template notes is challenging... Then trying to actually parse it while examining the types is... painful.


I think RQ3 is interesting. It is the only proposition in the paper that references crowd-sourcing. I find that interesting because that immediately lends itself to ... god i hate myself ... ChatGPT. We've been asking GPT to write code, but has anyone asked it to debug code?


Interesting that this came up; I was reading https://arxiv.org/abs/2301.02308, on Rust's usability, earlier today. One of the interesting conclusions from that paper was that while Rust's error messages are generally high-quality, they can sometimes be too local, especially for helping get new Rust programmers used to the ownership model. Key quote:

"rustc error messages not only describe the error in the code, but for certain error patterns, suggest edits that may fix the problem. However, these edits are always local and don’t provide any high-level design feedback which may be helpful in making the mindshift."


I think Rust needs to show me a decision tree for why something on the lefthand side of a function call ends up being whatever vscode + lsp shows me. Because the compiler and rust-analyzer are going through various traits and auto-traits to ultimately land on one decision but if I want to understand how Rust works I need to know how it arrives at these conclusions. I don't want to have to jump deep down into language constructs and read the docs just to find out several levels deep that a trait was implemented by some other trait that then had an auto trait impl and that's why it ends up requiring me to write &. Rust being complex is fine but only if they go far beyond what has been standard tooling.


The most useful advance in error messages is the advent of spell checkers for undefined symbols. The one I wrote used the existing symbol table as the dictionary. It works so well (and is so simple) it evokes the "why didn't I do that decades ago!" thought.
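
A minimal sketch of that idea, assuming plain Levenshtein distance and a cut-off of two edits; this is not the actual implementation described above, and `edit_distance`, `suggest` and `undefined_symbol_message` are names invented for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with two rows of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def suggest(unknown: str, symbol_table, max_distance: int = 2):
    """Return the closest name in scope, or None if nothing is close enough.

    max_distance is an assumed threshold; a real compiler would tune it.
    """
    best = None
    best_dist = max_distance + 1
    for name in sorted(symbol_table):  # sorted so ties resolve predictably
        dist = edit_distance(unknown, name)
        if dist < best_dist:
            best, best_dist = name, dist
    return best


def undefined_symbol_message(unknown: str, symbol_table) -> str:
    """Format the error, appending a hint when a near match exists."""
    msg = f"error: undefined identifier '{unknown}'"
    hint = suggest(unknown, symbol_table)
    if hint is not None:
        msg += f"; did you mean '{hint}'?"
    return msg


in_scope = {"length", "width", "height"}
print(undefined_symbol_message("lenth", in_scope))
```

The dictionary is just whatever names are visible at the point of the error, so there is nothing extra to maintain.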


> it evokes the "why didn't I do that decades ago!" thought.

In my experience, that thought is a bright neon sign telling you that you have a potentially killer product on your hands.


Languages do vary. PHP has some oddities, but its error messages mostly make good sense. It's much easier to debug than, for example, JavaScript.


I suppose the oddities you mention include T_PAAMAYIM_NEKUDOTAYIM


Having good error messages is one of the important priorities for me in my compiler, so I made the commitment early and am using codespan_reporting[0] to report the errors, which is going well so far.

0: https://github.com/brendanzab/codespan



