Some anecdotes where error messages truly delighted me:
When I accidentally misresolved a conflict after a git pull, the C# compiler, upon encountering the >>>>> markers in the source code, noted that there’s an unresolved merge conflict in that line instead of complaining about bitshifts.
When I tried to bitwise invert a value in Rust with “~x”, which is wrong, instead of complaining about an unknown character, the compiler helpfully explained that the operator to use in Rust is “!x”.
When I was in the Python interpreter and entered “exit”, which doesn’t work because exit is not a keyword, I got a message that told me how to actually exit.
Apple’s MPW C compiler had some funny error messages, sometimes also useful. Examples:
- This label is the target of a goto from outside of the block containing this label AND this block has an automatic variable with an initializer AND your window wasn't wide enough to read this whole error message
- String literal too long (I let you have 512 characters, that's 3 more than ANSI said I should)
I have a love/hate relationship with C++, but some of those error messages are just awful. When it comes to template errors in C++, I almost never try to understand any of the gibberish and just look for the file and line where the error was originally reported. Which is a task in itself, since most of the time the file you actually care about is neither at the bottom nor at the top of the pile of files.
On the other hand, error messages other than template-related errors have improved in C++ lately.
“Unexpected friend” was one of my favorites in C++. It seemed fortuitous; it’s good to have friends, right? (Usually the actual error was a missing semicolon.)
The best advice I've gotten on template error messages is that it's either at the top of the output or the bottom...then you've just gotta hope that your terminal has enough of a buffer to see the top[0] :)
[0] yes i know terminals support quasi-infinite scrollback by writing to a file, or i could redirect. but that makes for a much less amusing anecdote.
Sometimes I gaze into the stars and wonder how many programming careers were ruined by nebulous "stack overflow", "null pointer exception" and "segmentation fault: core dumped" messages.
Visual Studio already does almost this: if you hover over an IntelliSense or build error, it will give you a link to search for it on Bing. Which is extremely helpful when the error is "Linker error".
Well, you have to define a buffer size for the lexer, both to avoid dynamic allocations and to avoid blowing your memory. A 512-byte buffer can hold 509 ASCII characters, two quotes and a zero terminator.
I don't know why they did it, but the size makes a lot of sense.
You probably wouldn't. But a standard should allow the widest possible range of uses, and there are some situations where it makes sense (like in transpilers).
> When I was in the Python interpreter and entered “exit”, which doesn’t work because exit is not a keyword, I got a message that told me how to actually exit.
This was something that one of those "Zen of Ruby" style articles brought up as a negative and although I'm not a Ruby person, I tend to agree. exit is not a keyword, but it is reserved so that you can get this message. Nobody can (or at least should) name a symbol exit, and alternative Python interpreters just turn "exit" into exit(). So why can't the normal interpreter do what I meant instead of being pedantic? It is generally user friendly in other cases (the new error messages in 3.10, for instance).
> exit is not a keyword, but it is reserved so that you can get this message.
`exit` is not reserved. It's a callable implemented in pure Python [1] injected into the local scope when in interactive mode. There's no special functionality associated with it; the error message is implemented using the standard `__repr__` behavior. You are free to override the value of `exit` if you so choose.
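To make that concrete, here's a rough sketch of the mechanism. The class below is a simplified stand-in I wrote for illustration, not the actual CPython source, and the EOF hint string is just an example value:

```python
class Quitter:
    """Simplified stand-in for the object the interactive interpreter binds to `exit`."""

    def __init__(self, name, eof):
        self.name = name
        self.eof = eof

    def __repr__(self):
        # The REPL echoes repr() of a bare expression, so typing `exit`
        # without parentheses just displays this hint.
        return f"Use {self.name}() or {self.eof} to exit"

    def __call__(self, code=None):
        # Calling the object is what actually leaves the interpreter.
        raise SystemExit(code)


demo_exit = Quitter("exit", "Ctrl-D (i.e. EOF)")
hint = repr(demo_exit)  # the text a REPL would show for a bare `exit`
```

Since it's an ordinary object bound to an ordinary name, nothing stops you from rebinding that name to something else.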
"Special cases aren't special enough to break the rules."
The context is good error messages; what you’re saying is that DWYM (do what you mean) is even better than a good error message. That’s sometimes true and can help beginners. But it also means more special cases and magic, which can make a language hard to learn in a different way. Python and Ruby are almost at the opposite ends of that trade-off, so it shouldn’t be surprising that a Ruby person doesn’t like Python’s solution.
Also, and this is really off-topic, I also completely disagree that the names of built-in functions are off limits in a program, or should effectively be keywords. I'm not saying it's particularly wise to name a variable "print" or something. But should I learn by heart the names of all built in functions before naming any variable? Should I refactor my program whenever a new one is added? Resolving those kinds of conflicts is what scoping is for, and it works.
> The context is good error messages; what you’re saying is that DWYM (do what you mean) is even better than a good error message.
I'm not sure that's true. "Just do what I mean" is an understandable response when you get an error message that hints that the program knew exactly what you meant. It feels like a human responding to "Can I have some water?" with "I don't know CAN you?", but it ignores the fact that the program doesn't really know what the user meant. Sure whoever authored the Python interpreter guessed that 99% of the time if you type `exit` you want to leave, but if they just assume that, then what happens if I'm actually trying to see what's in a variable `ext` (short for extension) and my fingers just auto completed `exit` because it's something I type so frequently (this kind of typo happens to me all the time). Suddenly the interpreter is dead and I have to start over on whatever I was doing.
I'd actually rather the program give me instructions on how to do what it thinks I want to do, and let me decide if it's really what I mean.
Because the built-in console doesn't support any special keywords or syntax. Other consoles already support special keywords and syntax, so it's no problem to interpret "exit" as one of those.
> When I was in the Python interpreter and entered “exit”, which doesn’t work because exit is not a keyword, I got a message that told me how to actually exit.
This happened to me too once, and I was also delighted. It simultaneously teaches you how to supply an EOF to anything that's reading your input off the command line.
But I've seen people scream about how obnoxious it is that the python shell knows what you want to do, and then lectures you about how to do it properly instead of just doing it for you. I don't understand those people.
I still harbor a certain fondness for "'long long long' is too long for GCC", which is superficially a bit silly while also subtly acknowledging that an ISO-conformant C compiler could support "long long long" as an implementation-defined type.
I work on a consumer-facing mobile app and have found great success in including a short unique code along with any error message. This has been helpful for diagnosing errors where the same message can appear in a variety of different scenarios, and has significantly improved users' ability to report errors accurately.
When looking at user complaints, I find that users often report error messages by paraphrasing them. An error message like "XYZ failed to initialize" often ends up being reported as "the app is not initializing" (which naturally could mean a number of things). By modifying this error message to include a code such as "XYZ failed to initialize (CODE-ABCD)", I've found that user behaviour shifts to including the code as opposed to the error message itself. Users instead say things like "I'm receiving CODE-ABCD" which is infinitely more helpful as a developer or consumer facing customer service personnel.
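A minimal sketch of how that could look, assuming hand-assigned codes kept in one place. The names `ErrorSite` and `user_message` and the scenario names are made up for illustration:

```python
from enum import Enum, unique


@unique  # raises ValueError at class creation if two scenarios share a code
class ErrorSite(Enum):
    # Hypothetical scenarios that all surface the same user-facing text.
    XYZ_INIT_NO_NETWORK = "ABCD"
    XYZ_INIT_BAD_CONFIG = "ABCE"


def user_message(text: str, site: ErrorSite) -> str:
    """Append the scenario's short code to the message shown to the user."""
    return f"{text} (CODE-{site.value})"


shown = user_message("XYZ failed to initialize", ErrorSite.XYZ_INIT_NO_NETWORK)
```

The same text with a different `ErrorSite` gives a different code, so a report of "I'm receiving CODE-ABCD" points at exactly one scenario.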
This reminds me of PHP's "T_PAAMAYIM_NEKUDOTAYIM" error which, despite being completely unintelligible, was very quick to resolve due to its uniqueness.
As someone who does neither, this piqued my curiosity!
> in PHP T_PAAMAYIM_NEKUDOTAYIM is the token name for ::, the static separator. It’s Hebrew for double colon.
The rest of the article is an interesting read, but I have no background or opinions… it seems the author and others do not agree it is a good [part of an] error message ;)
Some tips I would like to add, talking specifically about logging:
1, Make sure you have a git hash attached, either in the filename, at the start of the file, or when you throw an exception.
This helps massively when you can switch your local env to exactly the code being run when it crashed.
2, Log format should be standardised and have the following:
Timestamp (in utc), log level, guid of the data being processed[1], log message, filename and line of the message that created it.
[1] When dealing with distributed systems, multiprocessing systems or complex dataflows, starting a logging message with "... %guid%: some log message about this ..." can be a massive time saver. The guid could be a literal guid or some serial attached to the data with a type identifier.
3, Try and make your error message unique across the codebase. The filename and line log format helps to track down where the message comes from, but if you're just given the text "%id%: Failed to locate TPM in existing KYC lookup", being able to ctrl-f that exact message across the codebase and get started debugging saves time.
4, Debug logging isn't helpful when you need to debug something if it isn't turned on. Roll your logging files by the hour if you really have to keep the file size down. But you're going to need those debug messages when you're debugging. If you're generating gigs of log files per day, allow me to introduce you to the concept "cost of doing business".
5, Don't do stupid stuff that makes your log files harder to read. Binary encoded log files that you have to run through a tool to get the data out, archiving after some arbitrary time period into zip files that collect logs over a different time period, anything that is going to put friction between ops and getting the info they need when they're stressed and rushing to fix stuff.
Probably some more stuff but that will do for now.
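For what it's worth, tips 1 and 2 can be sketched with Python's standard `logging` module. The logger name, the `guid` field name and the placeholder `GIT_HASH` are my own assumptions here, and an in-memory stream stands in for the log file:

```python
import io
import logging
import time
import uuid

GIT_HASH = "0000000"  # placeholder; a real build would inject the commit hash

formatter = logging.Formatter(
    fmt="%(asctime)sZ %(levelname)s %(guid)s: %(message)s [%(filename)s:%(lineno)d]",
    datefmt="%Y-%m-%dT%H:%M:%S",
)
formatter.converter = time.gmtime  # render timestamps in UTC, not local time

stream = io.StringIO()  # stands in for a plain-text log file
handler = logging.StreamHandler(stream)
handler.setFormatter(formatter)

base = logging.getLogger("demo.pipeline")
base.setLevel(logging.DEBUG)
base.addHandler(handler)
base.propagate = False

# Tip 1: record the build's git hash once at the start of the log.
base.info("starting, build %s", GIT_HASH, extra={"guid": "-"})

# Tip 2: LoggerAdapter attaches the same guid to every record for one item.
record_guid = str(uuid.uuid4())
log = logging.LoggerAdapter(base, {"guid": record_guid})
log.debug("Failed to locate TPM in existing KYC lookup")

output = stream.getvalue()
```

Each line then carries the timestamp, level, guid, message, and the file and line that produced it.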
> But you're going to need those debug messages when you're debugging
I second this so much. Clean up your debug diagnostics mess into something readable and useful instead of turning it off. Otherwise you’re deliberately throwing valuable info away. Yes you can re-enable it, but will that bad thing happen again today or in two months? You’ll need them when something goes wrong (and it sure will). There is no point in having logs like “it started”, “it working”, “it failed”. When they ask you how it failed and how to fix that, you’ll be more likely able to answer quickly.
For 4 there are log libraries that let you log on info and debug but only standard print the info logs, then later if an error occurs in scope emit also the debug logs.
This helps save data and performance in happy paths and still retains access to debug information.
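In Python, the standard library's `logging.handlers.MemoryHandler` gets you most of the way there. This is only a sketch under my own assumptions (logger name, capacity, in-memory stream instead of a file), and unlike the libraries described above it holds back everything, info included, until an error arrives:

```python
import io
import logging
from logging.handlers import MemoryHandler

stream = io.StringIO()  # stands in for the real log destination
target = logging.StreamHandler(stream)
target.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

# Buffer records in memory and pass all of them to `target` as soon as a
# record at ERROR or above arrives. Note the buffer is also flushed when it
# reaches `capacity`. flushOnClose=False drops leftovers at shutdown.
buffered = MemoryHandler(
    capacity=1000,
    flushLevel=logging.ERROR,
    target=target,
    flushOnClose=False,
)

log = logging.getLogger("demo.buffered")
log.setLevel(logging.DEBUG)
log.addHandler(buffered)
log.propagate = False

log.debug("loaded 3 rows")
log.debug("row 2 has empty id")
before_error = stream.getvalue()  # nothing has been written yet

log.error("could not persist row 2")
after_error = stream.getvalue()  # both debug lines followed by the error
```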
That's kinda cool, I didn't know that was a thing but doesn't seem too difficult to achieve with a buffer.
Problem though, most of the cpu time to log a message is in preparing the log statement. Once you hand it off to a logging library you're off the critical path, probably on a background process and then off to the OS and sequential writes to disk (which is faaast).
There's also some benefit from having the other debug statements from previous correct runs to compare. If there's any state being set (you kinda deserve problems, but) it may help with debugging to have the previous messages.
And of course, the most difficult problems to solve are the ones that don't crash but run through feeling fine with values reversed / inverted / off by 1 etc.
> Don't do stupid stuff that makes your log files harder to read.
Like use cloudwatch. I'm still amazed at how bad that interface is for looking through log files. I'd be about a million times happier with just a plain text log file.
My personal favorite: "An unexpected error has occurred." (Apple!). Or the more casual phrasing "This should never happen." (open source!).
I sympathize though. Ironically these messages are often written by people who try to do the right thing by detecting as many error cases as possible. Eventually their creativity fails, and, well...how would you describe an unexpected case within the framework of all the other cases which have simple explanations?
This could be tolerable in some cases/products (presumably the failure case is very rare) -- but for a special sort of torture, apply this kind of error handling 10x (or 100x) throughout the codebase!
(Absolute) Minimum mitigation: add a (unique!) error code to the inscrutable message, e.g. "An unexpected error has occurred, code T-2600.", and give customer support (and debugging engineers) something to work with. Maybe even a crumb for resolution via web search.
> My personal favorite: "An unexpected error has occurred." (Apple!). Or the more casual phrasing "This should never happen." (open source!).
This is a pet peeve of mine. I've been known to write plenty of "This should never happen." error messages, but I pretty much always include a description of what assumption was violated. EX: "User token is null, this should never happen!" (in a part of the program that should be impossible to reach without a token). Even if it's useless to the end user, it's so much more informative to the programmer than a generic NPE and a stack trace, and most likely it at least provides a hint to the end user roughly what's broken.
One of my all time favorites is when mkdir fails with "No such file or directory". Like, yeah… that's why I'm asking you to make it. I've basically hard-wired a section of my brain to recognize this & realize that there's some parent that doesn't exist.
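The same thing shows up through Python's `os.mkdir`, which is a thin wrapper over the system call. A small sketch (the directory names are made up) that turns the bare error into something that names the actual culprit:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    nested = os.path.join(root, "missing_parent", "child")
    try:
        os.mkdir(nested)  # like plain `mkdir`: the parent must already exist
        message = "created"
    except FileNotFoundError:
        # The OS only reports ENOENT; say which path is actually missing.
        parent = os.path.dirname(nested)
        message = f"cannot create {nested!r}: parent {parent!r} does not exist"

    os.makedirs(nested)  # like `mkdir -p`: creates missing parents as well
    created = os.path.isdir(nested)
```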
Recently got a beautiful error from Azure AD which is grammatically broken. In the "I accidentally it" sense, whole verb missing. Sensible additions to try to fix the parse of the error message result in a message that says "you can't do it because X isn't true" … except that X is true. (Over a week later, and Azure Support missing their alleged SLA yet again … IDK, we're still waiting for an answer.)
Once had an x86 fault with #DIV in some assembly code. Normally #DIV means "division by zero" … and that's exactly what I thought it meant at the time, but it turns out it can also mean that the destination register was too small to hold the quotient. Spent a good while wondering why I was getting "division by zero" when the register wasn't 0.
I'm sad that we forgot the value of core dumping. Stacktraces are cool and all, especially when you're developing and just want to know which assertion didn't hold. In production we have to deal with issues that we can't reproduce, and a huge step up from stacktraces in that context is dumping core.
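For anyone who hasn't hit this: the 32-bit unsigned `div` divides the 64-bit value in EDX:EAX by the operand and has to fit the quotient back into EAX. Both failure cases raise the same divide-error fault (Intel's manuals call it #DE). Here's a toy model in Python; `div32` is a made-up helper, not a real API:

```python
def div32(edx: int, eax: int, divisor: int) -> tuple[int, int]:
    """Toy model of unsigned 32-bit `div`: EDX:EAX / divisor -> (quotient, remainder)."""
    dividend = (edx << 32) | eax
    if divisor == 0:
        raise ZeroDivisionError("#DE: divisor is zero")
    quotient, remainder = divmod(dividend, divisor)
    if quotient > 0xFFFFFFFF:
        # Same fault as above, even though nothing was divided by zero.
        raise OverflowError("#DE: quotient does not fit in 32-bit EAX")
    return quotient, remainder


ok = div32(0, 10, 3)  # small dividend: quotient fits
try:
    div32(1, 0, 1)  # dividend is 2**32, so the quotient needs 33 bits
    overflow_message = ""
except OverflowError as exc:
    overflow_message = str(exc)
```

A leftover nonzero value in EDX is one way to end up in the second case without a zero anywhere in sight.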
I wish there was an easy way to dump core from java inside a docker container, and some sort of software for keeping track of them. Something like coredumpctl and journald, but for k8s.
> I'm sad that we forgot the value of core dumping.
The value of core dumping greatly diminished once commercial vendors started stripping symbol tables from the binaries they shipped (IP protection! trade secrets! know-how!), and it diminished even further when core dumps started approaching gigabytes in size. The reaction of system administrators was swift and cruel: «ulimit -c 0» at the system level.
Stack traces are in that awkward spot where they mean nothing to a user but are often nearly useless to a developer as well. You really need a good core dump reporting system with something like Breakpad, in addition to whatever you tell the user.
In our case, we came up with a way to assign unique codes to standard library exceptions for the language we use(.NET), custom codes for our own exceptions, and mask that with a number that roughly indicates where the exception occurred.
You end up getting codes like 0x800482c6 though which might end up frustrating users, since they have no real way to decode them.
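To give a rough idea of the masking, here is a sketch with a bit layout I'm inventing purely for illustration (top bit as a failure flag, 15 bits of location, 16 bits of exception code). It is not our actual scheme; it just happens to reproduce the example value:

```python
FAILURE_BIT = 1 << 31  # hypothetical: top bit marks "this is an error"


def encode(location: int, exception_code: int) -> int:
    """Pack a 15-bit location id and a 16-bit exception code into one value."""
    if not 0 <= location < (1 << 15):
        raise ValueError("location must fit in 15 bits")
    if not 0 <= exception_code < (1 << 16):
        raise ValueError("exception_code must fit in 16 bits")
    return FAILURE_BIT | (location << 16) | exception_code


def decode(value: int) -> tuple[int, int]:
    """Recover (location, exception_code) from a packed value."""
    return (value >> 16) & 0x7FFF, value & 0xFFFF


packed = encode(location=4, exception_code=0x82C6)
shown = f"0x{packed:08x}"
```

A developer can decode it mechanically, but the user still only sees an opaque hex number.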
I didn't try rustc, but g++ error messages have improved a lot in the last decade or two. I think clang made a point of having better error messages, and g++ followed suit.
One reason C++ error messages are so terrible is metaprogramming. C++ has a very powerful template system that has been abused a lot, and on top of that it retained the C preprocessor. When you are just using an API (ex: boost), all the complexity is hidden away from you and you just use the convenient features, but to the compiler it is a horrible mess of templates, and when you do something wrong, the mess surfaces.
A lot of what makes modern C++ is just cleaner ways of doing what was done using clever hacks before. It had a positive impact on error messages, but C++ is still a really complex, tricky language, with error messages to match.
There are different kinds of error messages in C++, all of them terrible, but each for different reasons.
Template error messages are often best understood as stack traces of the “metaprogramming language”, which isn’t C++, but a complicated type substitution system that beginners don’t understand when they first encounter these errors. So that’s confusing.
But I found that the hardest errors for beginners are linker errors: They look like line noise and contain no easily readable information to point at the source of the problem. It shouldn’t be necessary to explain linker hacks from the 80s when teaching someone how to find out what they forgot to implement.
> But I found that the hardest errors for beginners are linker errors
I am with you on that one. I just didn't consider the linker as part of the compiler. I mean, wtf is a ctor and a vtable?
Now, I know, but when I started programming, the linker was the scariest part for me, and it took me a while to be somewhat comfortable with it. Obviously, error messages didn't help. The good thing is that linkers (static and dynamic) don't change as much as compilers, so it is useful knowledge in the long run. Understanding ELF files really helped me on that one.
I find Rust's error messages a little verbose for my taste. Sometimes it's hard for me to parse what's the actual error, and what's just "helpful hints" about where the real error is.
That brings up an interesting point missed in the OP. The term "good error messages" is also context dependent. Rust has to have verbose error messages, because those messages are doing a lot of the heavy lifting of teaching people the "high concept" of Rust (lifetime analysis).
So we see that good error messages (like all writing) also depend on the audience. Your log message that "config.yaml couldn't be parsed because key x has a wrong value" might be great for someone writing the config.yaml himself, but for someone using a configurator GUI, that may be next to useless.
By the same logic I'd also expect Rustc's error messages to become less verbose in the future, as people get more familiar with rust and don't need as much guidance.
I always try to follow a similar set of guidelines, but errors in literally any programming language just suck at it. The proper error object should have these properties:
system info:
  which module/function
  what it tried to do
  what it expected
  what happened
  important context values
  [how to restart]
  [how to cleanup]
  underlying error object
  …
  …
user info:
  human-oriented message
  ways to resolve
formatting:
  format for logging (a stack of messages)
  format as json schema
  format for debugging (a detailed stack of messages interspersed with a call stack)
  format for innocent user
You may recognize first few lines as standard questions to answer before posting an issue. Instead we usually have a message field and maybe some call stack. Error paths can be half of a decent long-running program, but all we have is a stupid string and no standard way to extend that.
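As a rough illustration, part of that list could be modelled like this in Python. `ErrorReport` and all its field and method names are hypothetical, and I've left out the restart/cleanup hooks and the debugging format:

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ErrorReport:
    # system info
    module: str
    attempted: str
    expected: str
    happened: str
    context: dict[str, Any] = field(default_factory=dict)
    cause: Optional[BaseException] = None  # underlying error object
    # user info
    user_message: str = "Something went wrong."
    resolutions: list[str] = field(default_factory=list)

    def for_log(self) -> str:
        """One log line for developers."""
        line = f"{self.module}: {self.attempted}: expected {self.expected}, got {self.happened}"
        if self.cause is not None:
            line += f" (caused by {self.cause!r})"
        return line

    def for_json(self) -> str:
        """Machine-readable form for log aggregation."""
        return json.dumps(
            {
                "module": self.module,
                "attempted": self.attempted,
                "expected": self.expected,
                "happened": self.happened,
                "context": self.context,
                "cause": repr(self.cause) if self.cause is not None else None,
            },
            sort_keys=True,
        )

    def for_user(self) -> str:
        """Human-oriented message plus ways to resolve."""
        return " ".join([self.user_message] + [f"Try: {r}." for r in self.resolutions])


report = ErrorReport(
    module="config.load",
    attempted="parse config.yaml",
    expected="integer for key 'x'",
    happened="string 'abc'",
    context={"key": "x"},
    cause=ValueError("invalid literal"),
    user_message="The configuration file has an invalid value.",
    resolutions=["set 'x' to a whole number"],
)
```

The point is that one object carries enough to produce the log line, the JSON and the user-facing text, instead of a single string that tries to serve everybody.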
Providing a TL;DR version of an article can be valuable, but only if you hit high enough precision and recall rates.
Here, your comment only talks about the article's main points, giving it a high precision rate. However, out of three key ideas you managed to forget a whole third of the author's conclusion. This makes the recall rate woefully inadequate.
It isn’t hard to generate much better error messages for the BASICs of that time. I think they would have had much better error messages if they weren’t as memory constrained as they were.
Also, the statement should be “We’ve come a long way since ed”. Those TRS-80 messages are bloat ;-)
Python 3.10 is the first release to be exciting to me, since 3.6. Not because of pattern matching. Not because it adds 2 builtins. Because the error messages are so much better. And 3.11 will do even more.
Who said you needed a buzzword to get a nerd's attention?
If you're dealing with a spaghetti of microservices, and each throws its own kind of error objects and messages, while also passing them to upstream services, you're essentially playing a game of broken telephone with your error messages.
You're right, I am. At that point half the work becomes shifting around who is responsible for the error message such that you can finally figure out that some firewall wasn't configured correctly.
In some ways, error messages are like crutches for the users. Some badly written error messages make sense to the author of the code but not to the user of the code. For instance, this helpful reply on SO actually made me chuckle when I first read it, since the implication was that the error message makes sense once you understand what is going on, whereas a good error message is supposed to work the other way around.
* Validation errors (e.g.: trying to persist something, users want to see as many as possible in one go)
* Compile time errors (seen by programmers)
* Runtime errors without resolution (user gets 500, logs should contain the details)
* Runtime errors with resolutions (users should see something nice, maybe logs contain more info)
* ...
I find compiler error messages give an insight into how they work, the order of their processing things like that which can be especially useful when they are closed source.
In general, error messages give away a lot about the programmers' knowledge and abilities; they can give you an insight into whether defensive programming methods have been used, and other things like that.
In my personal opinion based on years of writing apps that generate error messages, a good error message is "one that stops the *** user emailing me without bothering to actually read the message first".
sadly I have failed in that respect, but it has kept me employed for a looong time, so not all bad ;)