What's in a good error message? (morling.dev)
112 points by kiyanwang on Feb 6, 2022 | 78 comments



Some anecdotes where error messages truly delighted me:

When I accidentally misresolved a conflict after a git pull, the C# compiler, upon encountering the >>>>> markers in the source code, noted that there’s an unresolved merge conflict in that line instead of complaining about bitshifts.

When I tried to bitwise invert a value in Rust with “~x”, which is wrong, instead of complaining about an unknown character, the compiler helpfully explained that the operator to use in Rust is “!x”.

When I was in the Python interpreter and entered “exit”, which doesn’t work because exit is not a keyword, I got a message that told me how to actually exit.


Apple’s MPW C compiler had some funny error messages, sometimes also useful. Examples:

- This label is the target of a goto from outside of the block containing this label AND this block has an automatic variable with an initializer AND your window wasn't wide enough to read this whole error message

- String literal too long (I let you have 512 characters, that's 3 more than ANSI said I should)

- Too many errors on one line (make fewer)

Whenever I read these again (e.g. on https://www.cs.cmu.edu/~jasonh/personal/humor/compile.html), I wonder what made ANSI pick 509 as the limit for the length of string literals.


> - Too many errors on one line (make fewer)

Thank god it wasn't a C++ compiler...

I have a love/hate relationship with C++, but some of those error messages are just awful. When it comes to template errors in C++, I almost never try to understand any of the gibberish and just look for the file and line where the error was originally reported. Which is a task in itself, since most of the time the file you actually care about is neither at the bottom nor at the top of the pile of files.

On the other hand, error messages other than template-related errors have improved in C++ lately.


“Unexpected friend” was one of my favorites in C++. It seemed fortuitous; it’s good to have friends, right? (Usually the actual error was a missing semicolon.)


The best advice I've gotten on template error messages is that it's either at the top of the output or the bottom...then you've just gotta hope that your terminal has enough of a buffer to see the top[0] :)

[0] yes i know terminals support quasi-infinite scrollback by writing to a file, or i could redirect. but that makes for much less amusing anecdote.


Sometimes I gaze into the stars and wonder how many programming careers were ruined by nebulous "stack overflow", "null pointer exception" and "segmentation fault: core dumped" messages.


Talking about StackOverflow, shouldn’t all messages have a permalink to their StackOverflow unique identifier where people will discuss them ;)


A really evil programming language would give you an error code and a link to create a question on Stack Overflow asking what the hell it means.


Visual Studio already does almost this: if you hover over an IntelliSense or build error, it will give you a link to search for it on Bing. Which is extremely helpful when the error is "Linker error".

https://docs.microsoft.com/en-us/visualstudio/ide/find-and-f...


And the all time favourite: "This programm has violated system integrity and must be terminated"


Bus error. (i miss those pizza boxes)


> I wonder what made ANSI pick 509 as the limit for the length of string literals.

Just a guess: In a 512 byte block, a 509 bytes long string literal would leave room for a NUL terminator and a 16-bit length field (prefix).


Well, you have to define a buffer size for the lexer, both to avoid dynamic allocations and to avoid blowing your memory. A 512-byte buffer can hold 509 ASCII characters, two quotes and a zero terminator.

I don't know why they did it, but the size makes a lot of sense.


But why would the lexer include the delimiting quotes in the token for a string literal?


You probably wouldn't. But a standard should allow the most possible kinds of usage, and there are some situations where it makes sense (like in transpilers).


> When I was in the Python interpreter and entered “exit”, which doesn’t work because exit is not a keyword, I got a message that told me how to actually exit.

This was something that one of those "Zen of Ruby" style articles brought up as a negative and although I'm not a Ruby person, I tend to agree. exit is not a keyword, but it is reserved so that you can get this message. Nobody can (or at least should) name a symbol exit, and alternative Python interpreters just turn "exit" into exit(). So why can't the normal interpreter do what I meant instead of being pedantic? It is generally user friendly in other cases (the new error messages in 3.10, for instance).


> exit is not a keyword, but it is reserved so that you can get this message.

`exit` is not reserved. It's a callable implemented in pure Python [1] injected into the local scope when in interactive mode. There's no special functionality associated with it—The error message is implemented using the standard `__repr__` behavior. You are free to override the value of `exit` if you so choose.

"Special cases aren't special enough to break the rules."

[1] https://github.com/python/cpython/blob/06b8f1615b09099fae5c5...
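
To make the `__repr__` trick concrete, here is a minimal sketch of such a callable. It is not the CPython source; the class name, constructor arguments and hint text are assumptions chosen for illustration.

```python
class Quitter:
    """Sketch of an exit helper whose repr doubles as a hint."""

    def __init__(self, name, eof):
        self.name = name
        self.eof = eof

    def __repr__(self):
        # The REPL prints repr() of any bare expression, so typing the name
        # without parentheses shows this text instead of quitting.
        return f"Use {self.name}() or {self.eof} to exit"

    def __call__(self, code=None):
        # Calling the object is what actually leaves the interpreter.
        raise SystemExit(code)


# Hypothetical instance; a real interpreter would bind this to the name `exit`.
exit_hint = Quitter("exit", "Ctrl-D (i.e. EOF)")
print(repr(exit_hint))
```

Because it is an ordinary object bound to an ordinary name, nothing stops a program from rebinding that name to something else.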


Learn something new every day. I hadn't thought to check, just figured it was reserved since iPython used it fine.

Thanks for the correction.


The context is good error messages; what you’re saying is that DWYM (do what you mean) is even better than a good error message. That’s sometimes true and can help beginners. But it also means more special cases and magic, which can make a language hard to learn in a different way. Python and Ruby are almost at the opposite ends of that trade-off, so it shouldn’t be surprising that a Ruby person doesn’t like Python’s solution.

Also, and this is really off-topic, I also completely disagree that the names of built-in functions are off limits in a program, or should effectively be keywords. I'm not saying it's particularly wise to name a variable "print" or something. But should I learn by heart the names of all built in functions before naming any variable? Should I refactor my program whenever a new one is added? Resolving those kinds of conflicts is what scoping is for, and it works.


> The context is good error messages; what you’re saying is that DWYM (do what you mean) is even better than a good error message.

I'm not sure that's true. "Just do what I mean" is an understandable response when you get an error message that hints that the program knew exactly what you meant. It feels like a human responding to "Can I have some water?" with "I don't know CAN you?", but it ignores the fact that the program doesn't really know what the user meant. Sure whoever authored the Python interpreter guessed that 99% of the time if you type `exit` you want to leave, but if they just assume that, then what happens if I'm actually trying to see what's in a variable `ext` (short for extension) and my fingers just auto completed `exit` because it's something I type so frequently (this kind of typo happens to me all the time). Suddenly the interpreter is dead and I have to start over on whatever I was doing.

I'd actually rather the program give me instructions on how to do what it thinks I want to do, and let me decide if it's really what I mean.


Because the built-in console doesn't support any special keywords or syntax. Other consoles already support special keywords and syntax, so it's no problem to interpret "exit" as one of those.


> When I was in the Python interpreter and entered “exit”, which doesn’t work because exit is not a keyword, I got a message that told me how to actually exit.

This happened to me too once, and I was also delighted. It simultaneously teaches you how to supply an EOF to anything that's reading your input off the command line.

But I've seen people scream about how obnoxious it is that the python shell knows what you want to do, and then lectures you about how to do it properly instead of just doing it for you. I don't understand those people.


I still harbor a certain fondness for "'long long long' is too long for GCC", which is superficially a bit silly while also subtly acknowledging that an ISO-conformant C compiler could support "long long long" as an implementation-defined type.


I think clang got support for detecting unresolved merge conflicts around a decade ago too.


I work on a consumer-facing mobile app and have found great success in including a short unique code along with any error messages. This has been helpful to diagnose errors where the same message can appear in a variety of different scenarios, and has significantly improved the user's ability to accurately report errors.

When looking at user complaints, I find that users often report error messages by paraphrasing them. An error message like "XYZ failed to initialize" often ends up being reported as "the app is not initializing" (which naturally could mean a number of things). By modifying this error message to include a code such as "XYZ failed to initialize (CODE-ABCD)", I've found that user behaviour shifts to including the code as opposed to the error message itself. Users instead say things like "I'm receiving CODE-ABCD" which is infinitely more helpful as a developer or consumer facing customer service personnel.

This reminds me of PHP's "T_PAAMAYIM_NEKUDOTAYIM" error which, despite being completely unintelligible, was very quick to resolve due to its uniqueness.
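
A rough sketch of the short-code idea, assuming hand-assigned codes kept in one place. The enum members, the code values and both helper functions are hypothetical.

```python
from enum import Enum


class AppError(Enum):
    # Hypothetical failure sites that share one user-facing sentence.
    CONFIG_MISSING = "CODE-A101"
    DB_UNREACHABLE = "CODE-A102"


def user_message(text: str, error: AppError) -> str:
    """Append the unique code so identical texts can be told apart."""
    return f"{text} ({error.value})"


def find_error(report: str):
    """Return the first known code found in a user's free-form report."""
    for error in AppError:
        if error.value in report:
            return error
    return None


print(user_message("XYZ failed to initialize", AppError.DB_UNREACHABLE))
print(find_error("I'm receiving CODE-A102 when I open the app"))
```

A paraphrased report such as "the app is not initializing" matches nothing, while a report that quotes the code maps straight back to one failure site.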


As someone who speaks hebrew and doesn't do PHP at all this was very strange to discover :D


As someone who does neither, this piqued my curiosity!

> in PHP T_PAAMAYIM_NEKUDOTAYIM is the token name for ::, the static separator. It’s Hebrew for double colon.

The rest of the article is an interesting read, but I have no background or opinions… it seems the author and others do not agree it is a good [part of an] error message ;)

https://phil.tech/2013/wtf-is-t-paamayim-nekudotayim/


Some tips I would like to add, talking specifically about logging:

1, Make sure you have a git hash attached, either in the filename, at the start of the file, or when you throw an exception.

This helps massively when you can switch your local env to exactly the code being run when it crashed.

2, Log format should be standardised and have the following:

Timestamp (in utc), log level, guid of the data being processed[1], log message, filename and line of the message that created it.

[1] When dealing with distributed systems, multiprocessing systems or complex dataflows, starting a logging message with "... %guid%: some log message about this ..." can be a massive time saver. The guid could be a literal guid or some serial attached to the data with a type identifier.

3, Try and make your error message unique across the codebase. The filename and line log format helps to track down where the message comes from, but if you're just given the text "%id%: Failed to locate TPM in existing KYC lookup", being able to ctrl-f that exact message across the codebase and get started debugging saves time.

4, Debug logging isn't helpful when you need to debug something if it isn't turned on. Roll your logging files by the hour if you really have to keep the file size down. But you're going to need those debug messages when you're debugging. If you're generating gigs of log files per day, allow me to introduce you to the concept "cost of doing business"

5, Don't do stupid stuff that makes your log files harder to read. Binary encoded log files that you have to run through a tool to get the data out, archiving after some arbitrary time period into zip files that collect logs over a different time period, anything that is going to put friction between ops and getting the info they need when they're stressed and rushing to fix stuff.

Probably some more stuff but that will do for now.
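
A minimal sketch of the format from point 2, using Python's standard `logging` module. The field name `item_id`, the logger name and the sample id `order-42` are made up for illustration, and the git hash from point 1 is left out.

```python
import io
import logging
import time


class UtcFormatter(logging.Formatter):
    # Render %(asctime)s in UTC instead of local time.
    converter = time.gmtime


# Timestamp (UTC), level, id of the data being processed, message, file and line.
FORMAT = "%(asctime)sZ %(levelname)s %(item_id)s: %(message)s [%(filename)s:%(lineno)d]"

stream = io.StringIO()  # stands in for a log file
handler = logging.StreamHandler(stream)
handler.setFormatter(UtcFormatter(FORMAT, datefmt="%Y-%m-%dT%H:%M:%S"))

log = logging.getLogger("kyc_example")
log.setLevel(logging.DEBUG)
log.addHandler(handler)
log.propagate = False

# `extra` supplies the hypothetical item_id field for this record.
log.error("Failed to locate TPM in existing KYC lookup", extra={"item_id": "order-42"})
line = stream.getvalue()
print(line, end="")
```

Every record logged through this format has to supply `item_id`; a `logging.LoggerAdapter` is one way to avoid repeating the `extra` argument.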


> But you're going to need those debug messages when you're debugging

I second this so much. Clean up your debug diagnostics mess into something readable and useful instead of turning it off. Otherwise you’re deliberately throwing valuable info away. Yes you can re-enable it, but will that bad thing happen again today or in two months? You’ll need them when something goes wrong (and it sure will). There is no point in having logs like “it started”, “it working”, “it failed”. When they ask you how it failed and how to fix that, you’ll be more likely able to answer quickly.


For 4, there are logging libraries that let you log at both info and debug level but print only the info logs by default; if an error later occurs in scope, they also emit the buffered debug logs.

This helps save data and performance in happy paths and still retains access to debug information.
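
Python's standard library has something close to this in `logging.handlers.MemoryHandler`. The sketch below is simpler than what is described above: it buffers every record, not just debug ones, and releases them once an ERROR arrives. The logger name and messages are made up.

```python
import io
import logging
from logging.handlers import MemoryHandler

stream = io.StringIO()  # stands in for the real log destination
target = logging.StreamHandler(stream)
target.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

# Hold up to 100 records in memory; pass them all to `target` as soon as
# a record at ERROR or above arrives.
buffered = MemoryHandler(
    capacity=100,
    flushLevel=logging.ERROR,
    target=target,
    flushOnClose=False,  # do not dump the buffer on a clean shutdown
)

log = logging.getLogger("buffered_example")
log.setLevel(logging.DEBUG)
log.addHandler(buffered)
log.propagate = False

log.debug("loaded 3 rows")
log.debug("row 2 has empty id")
quiet = stream.getvalue()  # nothing has been written so far

log.error("failed to persist row 2")
noisy = stream.getvalue()  # both debug lines plus the error
print(noisy, end="")
```

One caveat: `MemoryHandler` also flushes when the buffer reaches `capacity`, so a long happy path would still write its debug records. Discarding the oldest records instead would need a custom subclass.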


That's kinda cool, I didn't know that was a thing but doesn't seem too difficult to achieve with a buffer.

Problem though, most of the cpu time to log a message is in preparing the log statement. Once you hand it off to a logging library you're off the critical path, probably on a background process and then off to the OS and sequential writes to disk (which is faaast).

There's also some benefit from having the other debug statements from previous correct runs to compare. If there's any state being set (you kinda deserve problems, but) it may help with debugging to have the previous messages.

And of course, the most difficult problems to solve are the ones that don't crash but run through feeling fine with values reversed / inverted / off by 1 etc.


> Don't do stupid stuff that makes your log files harder to read.

Like use cloudwatch. I'm still amazed at how bad that interface is for looking through log files. I'd be about a million times happier with just a plain text log file.


My personal favorite: "An unexpected error has occurred." (Apple!). Or the more casual phrasing "This should never happen." (open source!).

I sympathize though. Ironically these messages are often written by people who try to do the right thing by detecting as many error cases as possible. Eventually their creativity fails, and, well...how would you describe an unexpected case within the framework of all the other cases which have simple explanations?

This could be tolerable in some cases/products (presumably the failure case is very rare) -- but for a special sort of torture, apply this kind of error handling 10x (or 100x) throughout the codebase!

(Absolute) Minimum mitigation: add a (unique!) error code to the inscrutable message, e.g. "An unexpected error has occurred, code T-2600.", and give customer support (and debugging engineers) something to work with. Maybe even a crumb for resolution via web search.


> My personal favorite: "An unexpected error has occurred." (Apple!). Or the more casual phrasing "This should never happen." (open source!).

This is a pet peeve of mine. I've been known to write plenty of "This should never happen." error messages, but I pretty much always include a description of what assumption was violated. EX: "User token is null, this should never happen!" (in a part of the program that should be impossible to reach without a token). Even if it's useless to the end user, it's so much more informative to the programmer than a generic NPE and a stack trace, and most likely it at least provides a hint to the end user roughly what's broken.


One of my all time favorites is when mkdir fails with "No such file or directory". Like, yeah… that's why I'm asking you to make it. I've basically hard-wired a section of my brain to recognize this & realize that there's some parent that doesn't exist.

Recently got a beautiful error from Azure AD which is grammatically broken. In the "I accidentally it" sense, whole verb missing. Sensible additions to try to fix the parse of the error message result in a message that says "you can't do it because X isn't true" … except that X is true. (Over a week later, and Azure Support missing their alleged SLA yet again … IDK, we're still waiting for an answer.)

Once had an x86 fault with #DIV in some assembly code. Normally #DIV means "division by zero" … and that's what I thought it meant exactly at the time, but it turns out it can also mean that the destination register was too small. Spent a good while wondering why I was getting "division by zero" when the register wasn't 0.
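
For the x86 case, a small Python model of the unsigned 16-bit `DIV` rule shows how one fault covers both situations. The function name and the exception type are assumptions of this sketch; Intel's manuals call the fault #DE (divide error).

```python
def div16(dx: int, ax: int, divisor: int):
    """Model unsigned 16-bit DIV: (DX:AX) / divisor -> (quotient, remainder)."""
    dividend = (dx << 16) | ax  # the 32-bit dividend lives in DX:AX
    if divisor == 0:
        raise ArithmeticError("#DE: divisor is zero")
    quotient, remainder = divmod(dividend, divisor)
    if quotient > 0xFFFF:
        # The quotient has to fit back into the 16-bit AX register.
        raise ArithmeticError("#DE: quotient does not fit in 16 bits")
    return quotient, remainder


print(div16(0, 100, 7))  # small dividend, fits easily

try:
    div16(1, 0, 1)  # 0x10000 / 1 needs 17 bits
except ArithmeticError as exc:
    print(exc)
```

The second call fails even though the divisor is 1, which is the "division by zero when the register wasn't 0" surprise.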


“So what makes a good error message then? To me, it boils down to three pieces of information which should be conveyed by an error message:

Context: What led to the error? What was the code trying to do when it failed?

The error itself: What exactly failed?

Mitigation: What needs to be done in order to overcome the error?”
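
As an illustration of those three pieces, here is a small sketch that builds a message from them. The class, its field names and the sample config problem are invented for this example, not taken from the article.

```python
from dataclasses import dataclass


@dataclass
class ErrorReport:
    context: str     # what the code was trying to do
    error: str       # what exactly failed
    mitigation: str  # what the reader can do about it

    def render(self) -> str:
        return f"{self.context}: {self.error}. {self.mitigation}"


report = ErrorReport(
    context="While loading config.yaml",
    error="key 'port' has value 'eighty', expected an integer",
    mitigation="Set 'port' to a number such as 8080 and restart.",
)
print(report.render())
```

Making all three fields required means a caller cannot construct the message while leaving out the mitigation.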


Printing out an exception stacktrace is generally awful in a non development context.

But hiding them and printing out "failed to run" is even worse.

I've seen people do that to have a "neat output".

Don't be that person.


I'm sad that we forgot the value of core dumping. Stacktraces are cool and all, especially when you're developing and just want to know which assertion didn't hold. In production we have to deal with issues that we can't reproduce, and a huge step up from stacktraces in that context is dumping core.

I wish there was an easy way to dump core from java inside a docker container, and some sort of software for keeping track of them. Something like coredumpctl and journald, but for k8s.


> I'm sad that we forgot the value of core dumping.

The value of core dumping greatly diminished once commercial vendors started stripping symbol tables from the binaries they ship (IP protection! trade secrets! know-how!), and it diminished even further when core dumps started approaching gigabytes in size – the reaction of system administrators was swift and cruel: «ulimit -c 0» at the system level.


Stack traces are in that awkward spot where they mean nothing to a user but are often nearly useless to a developer as well. You really need a good core dump reporting system with something like Breakpad, in addition to whatever you tell the user.


you've made me curious. what makes Breakpad so useful?

[edit: i looked - it looks like a way to get cores off a machine and get a stack trace. which is nice, but hardly changes the picture very much]


I've had good experiences with "error codes".

In our case, we came up with a way to assign unique codes to standard library exceptions for the language we use(.NET), custom codes for our own exceptions, and mask that with a number that roughly indicates where the exception occurred.

You end up getting codes like 0x800482c6 though which might end up frustrating users, since they have no real way to decode them.


Putting on my "user" hat, a stack trace is just a long-winded way of saying "ABEND" or "core dumped".


I have never seen better error messages than those coming from the Rust compiler, rustc.

I have never seen worse error messages than those generated by GNU GCC's C++ compiler, g++ (except if you count no error message, perhaps).


I didn't try rustc, but g++ error messages have improved a lot in the last decade or two. I think clang made a point of having better error messages, and g++ followed suit.

One reason C++ error messages are so terrible is because of metaprogramming. C++ has a very powerful template system that has been abused a lot, not only that, but it retained the C preprocessor. When you are just using an API (ex: boost), all the complexity is hidden away from you, you just have to use the convenient features, but to the compiler, it is a horrible mess of templates and when you do something wrong, the mess surfaces.

A lot of what makes modern C++ is just cleaner ways of doing what was done using clever hacks before. It had a positive impact on error messages, but C++ is still a really complex, tricky language, with error messages to match.


There are different kinds of error messages in C++, all of them terrible, but each for different reasons.

Template error messages are often best understood as stack traces of the “metaprogramming language”, which isn’t C++, but a complicated type substitution system that beginners don’t understand when they first encounter these errors. So that’s confusing.

But I found that the hardest errors for beginners are linker errors: They look like line noise and contain no easily readable information to point at the source of the problem. It shouldn’t be necessary to explain linker hacks from the 80s when teaching someone how to find out what they forgot to implement.


> But I found that the hardest errors for beginners are linker errors

I am with you on that one. I just didn't consider the linker as part of the compiler. I mean, wtf is a ctor and a vtable?

Now, I know, but when I started programming, the linker was the scariest part for me, and it took me a while to be somewhat comfortable with it. Obviously, error messages didn't help. The good thing is that linkers (static and dynamic) don't change as much as compilers, so it is useful knowledge in the long run. Understanding ELF files really helped me on that one.


I find Rust's error messages a little verbose for my taste. Sometimes it's hard for me to parse what's the actual error, and what's just "helpful hints" about where the real error is.

That brings up an interesting point missed in the OP. The term "Good error messages" is also context dependent. Rust has to have verbose error messages, because those messages are doing a lot of the heavy lifting of teaching people the "high concept" of rust (lifetime analysis).

So we see that good error messages (like all writing) also depend on the audience. Your log message that "config.yaml couldn't be parsed because key x has a wrong value" might be great for someone writing the config.yaml himself, but for someone using a configurator gui, that may be next to useless.

By the same logic I'd also expect Rustc's error messages to become less verbose in the future, as people get more familiar with rust and don't need as much guidance.


Pretty sure Rust took a lot of inspiration from Elm

https://elm-lang.org/news/compiler-errors-for-humans


I always try to follow a similar set of guidelines, but errors in literally any programming language just suck at it. The proper error object should have these properties:

  system info:
    which module/function
    what it tried to do
    what it expected
    what happened
    important context values
    [how to restart]
    [how to cleanup]
    underlying error object
      …
        …
  user info:
    human-oriented message
    ways to resolve
  formatting:
    format for logging (a stack of messages)
    format as json schema
    format for debugging (a detailed stack of messages interspersed with a call stack)
    format for innocent user
You may recognize the first few lines as standard questions to answer before posting an issue. Instead we usually have a message field and maybe some call stack. Error paths can be half of a decent long-running program, but all we have is a stupid string and no standard way to extend that.
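
A rough Python sketch of part of that wishlist, using exception chaining for the "underlying error object" entry. The class name, its fields and the three format methods are all hypothetical, and the restart and cleanup entries are left out.

```python
import json


class RichError(Exception):
    """Hypothetical error type that keeps system info and user info apart."""

    def __init__(self, module, attempted, expected, happened,
                 context=None, user_message="", resolutions=()):
        super().__init__(happened)
        self.module = module
        self.attempted = attempted
        self.expected = expected
        self.happened = happened
        self.context = dict(context or {})
        self.user_message = user_message
        self.resolutions = list(resolutions)

    def chain(self):
        """Yield this error, then each underlying error via __cause__."""
        err = self
        while err is not None:
            yield err
            err = err.__cause__

    def for_log(self):
        """One line per error in the chain, outermost first."""
        lines = []
        for err in self.chain():
            if isinstance(err, RichError):
                lines.append(f"{err.module}: {err.attempted}: "
                             f"expected {err.expected}, got {err.happened}")
            else:
                lines.append(f"{type(err).__name__}: {err}")
        return "\n".join(lines)

    def for_json(self):
        return json.dumps({
            "module": self.module,
            "attempted": self.attempted,
            "expected": self.expected,
            "happened": self.happened,
            "context": self.context,
        }, sort_keys=True)

    def for_user(self):
        return " ".join([self.user_message] + [f"Try: {r}" for r in self.resolutions])


try:
    try:
        int("eighty")
    except ValueError as cause:
        raise RichError(
            module="config.load",
            attempted="parse key 'port' in config.yaml",
            expected="an integer",
            happened="'eighty'",
            context={"file": "config.yaml", "key": "port"},
            user_message="The port setting is not a number.",
            resolutions=["Set port to a value such as 8080."],
        ) from cause
except RichError as err:
    report = err

print(report.for_log())
print(report.for_user())
```

The same object feeds the log, the JSON output and the user-facing text, so none of them has to be recovered by parsing a single string.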


In short - A good error message is a message with error details and context. I saved you some time, guys.


Providing a TL;DR version of an article can be valuable, but only if you hit high enough precision and recall rates.

Here, your comment only talks about the article's main points, giving it a high precision rate. However, out of three key ideas you managed to forget a whole third of the author's conclusion. This makes the recall rate woefully inadequate.

https://en.m.wikipedia.org/wiki/Precision_and_recall


And guidance on how to fix it.


We've come a long way since the TRS-80 had "HOW?", "WHAT?", and "SORRY.".... https://www.trs-80.org/level-1-basic/


It isn’t hard to generate much better error messages for the BASICs of that time. I think they would have had much better error messages if they weren’t as memory constrained as they were.

Also, the statement should be “We’ve come a long way since ed”. Those TRS-80 messages are bloat ;-)


My browser says "An error occurred during a connection to www.trs-80.org. Peer reports it experienced an internal error."

Here's an archived copy:

https://web.archive.org/web/20210416072520/https://www.trs-8...


?

Ed is the standard text editor.


.ass or amalgam scene specification is a source file type. A format file produced by the 3ds Max exporter.

In Halo 2, .ass files generate .scenario and .scenario_structure_bsp files, which can then be edited by other tools.

When Halo 2 Vista was released, an ".ass error" would occur, showing a picture of the lead engineer Charlie Gough's butt.

This has since been patched.


How did they identify the butt?


His face and clothes were in there.


huh, most guys that flexible have butts too small to hold all that.

on edit: I was hoping for a whimsical Wes Anderson detective story though.


Good error messages are underrated.

Python 3.10 is the first release to be exciting to me, since 3.6. Not because of pattern matching. Not because it adds 2 builtins. Because the error messages are so much better. And 3.11 will do even more.

Who said you needed a buzz word to get a nerd's attention?


Postgres, you learn so much from the error messages.

Mysql, on the other hand "please read the manual"


If you're dealing with a spaghetti of microservices, and each throws its own kind of error objects and messages, while also passing them to upstream services, you're essentially playing a game of broken telephone with your error messages.


You're right, I am. At that point half the work becomes shifting around who is responsible for the error message such that you can finally figure out that some firewall wasn't configured correctly.


A trick i've been using for years to help write better error messages is to simply fill in the blank...

"Well basically what happened is _____________"


Code line.

Almost always a trace when it's a modular codebase.

The content that triggers the error i.e. arguments and type structure if resolvable.

I.e. everything to remove the need to attach a debugger.

Maybe a human readable message if it's something being used by non developers but with all of that attached in say a .zip


In some ways, error messages are like crutches for the users. Some badly written error messages make sense to the author of the code but not to the user of the code. For instance, this helpful reply on SO actually made me chuckle when I first read it, since the implication was that the error message makes sense when you understand what is going on, whereas a good error message is supposed to work the other way around.

https://stackoverflow.com/a/43180701


There are different types of errors (just a few):

* Validation errors (e.g.: trying to persist something, users want to see as many as possible in one go)
* Compile time errors (seen by programmers)
* Runtime errors without resolution (user gets 500, logs should contain the details)
* Runtime errors with resolutions (users should see something nice, maybe logs contain more info)
* ...

This is very important for the message...


I find compiler error messages give an insight into how they work, the order of their processing things like that which can be especially useful when they are closed source.

In general, error messages give away a lot about the programmers' knowledge and abilities; they can give you an insight into whether defensive programming methods have been used and other things like that.


In my personal opinion based on years of writing apps that generate error messages, a good error message is "one that stops the *** user emailing me without bothering to actually read the message first".

sadly I have failed in that respect, but it has kept me employed for a looong time, so not all bad ;)


https://www.youtube.com/watch?v=jpVzSse7oJ4 was a good recent talk on error messages


Sad that this had to be written…

But it's sort of like button labels, e.g “Cancel” and “OK” when you want to cancel something – nobody wants to think, they just do :/


Salesforce is the worst when it comes to helpful error messages. I can sink hours trying to figure out what some vague message means.


I think Google API error messages are pretty awful. Good run for the money against SF.



Just copy Elm!


Error: this developer has lost interest in the project, good luck at resolving this error!



