Modular Errors in Rust (sabrinajewson.org)
134 points by mplanchard on April 9, 2023 | 27 comments

I encourage people to check out my SNAFU crate [1]. I urge users to create many distinct error types (usually enums but also structs) and compose them — so much so that I advocate that you never create each specific error in more than one source location. That means that the collection of error types produces a unique trace through your program, uniquely identifying the source of the error with no runtime cost (compare this to a runtime-collected stacktrace / backtrace).

Here's what the final example in the blog post would look like with a quick transform to SNAFU: https://gist.github.com/shepmaster/fb7f4c9519a074ea7186ca7b7.... If I spent more time, there'd probably be a few more changes, but hopefully that gets the idea across.

[1]: https://docs.rs/snafu/

What for? What do you expect the user of your library to do with this detailed information?

That's an interesting question that I find difficult to answer. To me, it's as if you are asking "why provide any detail about what happened?". The nonsense end result of that line of logic is that you return a single boolean that corresponds to failure/success, so I assume that's not what you mean.

Contrast the two error messages:

> one end of range is not a valid hexadecimal integer

> the beginning of the range ("0xZZ") is not a valid hexadecimal integer

The latter has more information and points the user to the general area of the problem, and probably allows them to fix the problem themselves. It could be improved even further by providing line/column information.

By splitting out more and more error types (and cases), and artificially limiting yourself to using each error once, you push yourself away from "range is bad" style errors and toward more specific (and ideally descriptive) errors.
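As a rough illustration of the "each error is created in exactly one place" idea, here is a minimal sketch using only the standard library (not SNAFU). The `RangeError` type, its variants, and the `parse_range` function are hypothetical names invented for this example, and the `start..end` input syntax is an assumption.

```rust
use std::fmt;
use std::num::ParseIntError;

/// Hypothetical error type: every variant is constructed at exactly one
/// source location in `parse_range`, so the variant identifies the call site.
#[derive(Debug)]
pub enum RangeError {
    MissingSeparator { input: String },
    BadStart { text: String, source: ParseIntError },
    BadEnd { text: String, source: ParseIntError },
}

impl fmt::Display for RangeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::MissingSeparator { input } => {
                write!(f, "the range ({input:?}) is missing the \"..\" separator")
            }
            Self::BadStart { text, .. } => write!(
                f,
                "the beginning of the range ({text:?}) is not a valid hexadecimal integer"
            ),
            Self::BadEnd { text, .. } => write!(
                f,
                "the end of the range ({text:?}) is not a valid hexadecimal integer"
            ),
        }
    }
}

impl std::error::Error for RangeError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            Self::MissingSeparator { .. } => None,
            Self::BadStart { source, .. } | Self::BadEnd { source, .. } => Some(source),
        }
    }
}

/// Parses a hexadecimal integer with an optional `0x` prefix.
fn parse_hex(text: &str) -> Result<u32, ParseIntError> {
    u32::from_str_radix(text.strip_prefix("0x").unwrap_or(text), 16)
}

/// Parses input such as `0x0A..0x10` (an assumed syntax for this sketch).
pub fn parse_range(input: &str) -> Result<(u32, u32), RangeError> {
    let (start, end) = input
        .split_once("..")
        .ok_or_else(|| RangeError::MissingSeparator { input: input.to_owned() })?;
    let lo = parse_hex(start)
        .map_err(|source| RangeError::BadStart { text: start.to_owned(), source })?;
    let hi = parse_hex(end)
        .map_err(|source| RangeError::BadEnd { text: end.to_owned(), source })?;
    Ok((lo, hi))
}
```

Because `BadStart` and `BadEnd` are separate variants, the message can say which end of the range was wrong instead of falling back to "one end of range".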

I'm not saying you shouldn't expose it, but why expose it as distinct error types? Why expose it as a programmatically-accessible information at all?

I can see how the developer debugging the library might want it, but what is the advantage of this information being a type/enum/... over the same information in a string, if the only purpose is for a developer to read it?

> Why expose it

I read "expose" in two possible ways: (1) as a public- / user-facing API (2) existing at all.

When an error type is a public API that is bound by semver, I do think you should be very careful about what you expose. I suggest starting with an opaque [1] error type and only exposing exactly what you are comfortable with supporting. That may boil down to basically only a string (a.k.a. the `Error` trait).

> Why expose it as a programmatically-accessible information at all?

A few things come to mind...

I'm a huge fan of testing my error cases as much as possible. To that end, a bunch of my tests are semantically `assert_matches!(function_result_value, Err(MyError::Case { .. }))`.

Not all errors are equivalent. For example, if my library fails to read a configuration file, perhaps the caller of the library can recover from that by downloading a file. However, this requires the caller to be able to tell what caused the error.

Interacting with an external API, such as HTTP status or command line exit codes. In this case, you can categorize your errors into domains like "server error" / "client error" / "authorization error".

These could all be done by string matching, but that tends to be comparatively brittle.
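A small sketch of the first two points, assuming a hypothetical `ConfigError` type and `load_config` function (neither comes from SNAFU or the article): the caller matches on the variant to decide whether it can recover, and a test can assert on the variant rather than on message text.

```rust
use std::fmt;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical fallback used when no configuration file exists.
pub const DEFAULT_CONFIG: &str = "verbose = false\n";

/// Hypothetical library error with one variant per failure cause.
#[derive(Debug)]
pub enum ConfigError {
    Read { path: PathBuf, source: io::Error },
    Empty { path: PathBuf },
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Read { path, .. } => {
                write!(f, "could not read configuration file {}", path.display())
            }
            Self::Empty { path } => {
                write!(f, "configuration file {} is empty", path.display())
            }
        }
    }
}

impl std::error::Error for ConfigError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            Self::Read { source, .. } => Some(source),
            Self::Empty { .. } => None,
        }
    }
}

/// Library side: report what went wrong without deciding what to do about it.
pub fn load_config(path: &Path) -> Result<String, ConfigError> {
    let text = fs::read_to_string(path)
        .map_err(|source| ConfigError::Read { path: path.to_owned(), source })?;
    if text.trim().is_empty() {
        return Err(ConfigError::Empty { path: path.to_owned() });
    }
    Ok(text)
}

/// Caller side: recover only from "file not found"; pass everything else on.
pub fn load_or_default(path: &Path) -> Result<String, ConfigError> {
    match load_config(path) {
        Err(ConfigError::Read { source, .. }) if source.kind() == io::ErrorKind::NotFound => {
            Ok(DEFAULT_CONFIG.to_owned())
        }
        other => other,
    }
}
```

A test for the error path can then be written as `assert!(matches!(load_config(path), Err(ConfigError::Read { .. })))`, which keeps passing if the wording of the message changes.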

> the same information in a string

Aggregating strings like that requires dynamic allocation, which isn't universally available in Rust programs. For example, SNAFU works fine in a `no_std` environment and I know that it's been used in cases like embedded and Windows kernel drivers.

> if the only purpose is for a developer to read it

I don't think that's always true.

[1]: https://docs.rs/snafu/latest/snafu/guide/opaque/index.html

It fits better with the language. It doesn’t cost much (anything?) and in simple cases (errors with no dependent fields) returning it is akin to returning an error code.

> The nonsense end result of that line of logic is that you return a single boolean that corresponds to failure/success, so I assume that's not what you mean.

I frequently find that is all that is necessary, and is exactly the correct solution, not "nonsense" whatsoever. Perhaps even that boolean is overkill: an unrecoverable bug should perhaps instead log and then terminate, producing no error condition for the caller of your library to worry about attempting to recover from whatsoever. Detailed error reporting can be done without obscenely distinct error types, and in fact detailed error reporting frequently benefits from a focus on the task itself instead of trying to structure data to leave the task of actually reporting the error to someone else, and kicking the can down the road.

E.g. for parsing focused errors like this I'd be more interested in pretty printing the source line in question, underlining the error, etc. - I wrote https://github.com/MaulingMonkey/json-spanned-value for helping ease the display of bad JSON data in a manner that integrates with your IDE for ease of fixing said data by jumping directly to the cause.

And then, admittedly, there are times when different errors should be recovered from differently. When the caller might wish to retry an operation after some errors, try a different operation after others, report file+line+range information for yet other errors, etc. - these have very concrete answers to "What do you expect the user of your library to do with this detailed information?" that are not difficult to answer at all.

> Contrast the two error messages:

Both are UTF8 strings without types adding anything obviously useful. A silly demo screenshot of errors that will open the offending document if closed, and navigate to the line/column of said error:


Which operates by immediately dumping the "errors" to terminal without any error types involved whatsoever... nor even a boolean branch! Well, a real program might set a boolean so the CLI knows to exit(1) instead of using partially parsed data...

> Which operates by immediately dumping the "errors" to terminal without any error types involved whatsoever

I haven't looked at your code, but if I take this sentence at face value, I'd be very hesitant to use your library in many contexts.

One example would be anything similar to a web server, such as an API that accepts JSON via a HTTP POST. It would be very strange to have my JSON parser print to the console where no one is reading!

A lot of your comment indicates similar focus on a human interacting with a terminal, which is a very valid usecase, but not the only usecase.

> instead of trying to structure data to leave the task of actually reporting the error to someone else, and kicking the can down the road.

I agree with this in broad strokes. I think that it's very important to use your error types to ensure that they provide value. For the JSON example, I think that I might have an error that indicates the specific error as well as the byte offset at which the error occurred. This could be built on by higher level errors that convert to line/column numbers or attach an excerpt of the bad input, as appropriate. These types could have methods or implement traits that allow formatting for the console (considering optional coloring, etc.) or be formatted for a logging system.
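The layering described here could be sketched roughly as follows. All names (`SyntaxError`, `SyntaxErrorKind`, `LocatedError`, `locate`) are hypothetical, the parser that would produce the low-level error is left out, and columns are counted in bytes rather than characters to keep the example short.

```rust
use std::fmt;

/// What went wrong, independent of where.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SyntaxErrorKind {
    UnexpectedCharacter,
    UnterminatedString,
}

impl fmt::Display for SyntaxErrorKind {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            Self::UnexpectedCharacter => "unexpected character",
            Self::UnterminatedString => "unterminated string",
        })
    }
}

/// Low-level error: cheap to build, carries only a byte offset.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SyntaxError {
    pub kind: SyntaxErrorKind,
    pub offset: usize,
}

/// Higher-level error: adds a 1-based line/column and the offending line.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LocatedError {
    pub error: SyntaxError,
    pub line: usize,
    pub column: usize,
    pub excerpt: String,
}

impl SyntaxError {
    /// Converts the byte offset into a line/column pair for `input`.
    pub fn locate(self, input: &str) -> LocatedError {
        let offset = self.offset.min(input.len());
        let before = &input.as_bytes()[..offset];
        let line = 1 + before.iter().filter(|&&b| b == b'\n').count();
        // Byte index just past the last newline before the error (or 0).
        let line_start = before.iter().rposition(|&b| b == b'\n').map_or(0, |i| i + 1);
        let excerpt = input[line_start..].lines().next().unwrap_or("").to_owned();
        LocatedError { error: self, line, column: offset - line_start + 1, excerpt }
    }
}

impl fmt::Display for LocatedError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "line {}, column {}: {}", self.line, self.column, self.error.kind)
    }
}

impl std::error::Error for LocatedError {}
```

A web server might log only the `SyntaxError`, while a CLI might call `locate` and print the excerpt. Neither choice is forced on the caller by the parser.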

> an unrecoverable bug should perhaps instead log and then terminate

I technically agree, but I have a very high bar for when a library I write is allowed to unilaterally terminate the process — it's simply too drastic of a decision to make in most cases.

I also hope that the act of logging is appropriately abstracted. Thankfully, Rust only has a few common logging interfaces so it's easy to fall into the pit of success there.

> an unrecoverable bug should perhaps instead log and then terminate

For an application, sure that can be fine.

One needs to be careful in a library, because what seems like an unrecoverable error to the library author is not always unrecoverable for the application.

Consider a library designed to retrieve process information from "/proc". I could very easily see a library author concluding that if "/proc" does not exist that is an unrecoverable error worthy of termination. After all, in that case, the library is useless.

The application that wants to use the library may strongly disagree, and may be expecting that /proc might be unavailable in some locked down environment in which it sometimes runs, and has some fallback it can use in those environments.

If you return an error the application can easily fall back. Otherwise you are forcing the application to do something like check for /proc existing before even calling your library.

A library with global state may assume that if some invariant of that global state fails to hold (and this got detected) that this should be treated as unrecoverable. And well this one can also vary. Sometimes having the process die here can be the right approach, especially for certain development scenarios, since this means there is either a code bug, nasty undefined behavior, or possibly a compiler bug going on.

But depending on the nature of the library it might be possible that there is a sensible way to reset things without bad side effects, in which case, it could potentially make sense to return an error indication that the application could potentially handle by unwinding to a state where resetting your library is sensible, and then resetting it.

I would expect that the primary use is for those debugging the library. Though this could be users, it would also be developers of the library.

Though, even as a library user, knowing the source of an error can be useful for working around the issue, temporarily, or maybe even figuring out that I was using the library incorrectly.

Better understand what happened, and if/how you can fix it.

I'm not saying why have error reporting at all, I'm asking about this specific approach, for example as opposed to the article being discussed.

> as opposed to the article being discussed.

I'd actually say that the article and my suggestion are basically compatible. If you check out my gist [1], you'll see that I have basically the same number of user-facing types (or maybe a slightly smaller number as I merged a few structs and enums in some spots).

[1]: https://gist.github.com/shepmaster/fb7f4c9519a074ea7186ca7b7...

I'll definitely be thinking about errors in a new way, so thanks!

> a success story, look no further than the Rust compiler itself; I don’t think it would be an exaggeration to say that Rust enjoys the current popularity it does because of how good its error messages are, and how much effort was put into it.

The compiler diagnostics use a very different system than your error types. (eg https://github.com/rust-lang/rust/blob/master/compiler/rustc...).

I think your approach adds a lot of value to a very generic library, but often all or most of the reasonable users of a library will only propagate errors. If so, advanced error messages can be a waste of time. For example, a library for writing to the terminal will probably be used by applications that exit with an error message if a write fails, so its best API might be an opaque wrapper around a string with a very specific error message.

Absolutely! The right kind of error type really depends on use-case — I targeted this kind of use case because I see people get it wrong most often, but it’s definitely not a one-size-fits-all solution.

We've been trending toward more localized errors in our codebase at work for a while, largely because of issues like the ones mentioned in this article, although we haven't gotten quite as far as is described here. I'll be adding this to our internal list of articles for rust engineers to read as a part of establishing patterns and idioms!

I do think thiserror can help with a fair bit of the boilerplate, especially when making lots of error types. It'll be interesting to see how we can get the patterns described in this post playing nicely together with it.

Minor typo near the top: 0x0800 should be 0x0080. Otherwise a very nice article, if you're interested in Rust.

> It is thankfully common wisdom nowadays that documentation must be placed as near as possible to the code it documents, and should be fine-grained to a minimal unit of describability (the thing being documented).

I think that documentation which describes each part of a codebase in isolation, without explaining the project's architecture and high-level information and control flow, is a common failure mode of fine-grained "doc comment" based documentation (Doxygen, to an extent Rustdoc, possibly Javadoc but I don't write much Java). I'm looking forward to more projects adopting https://matklad.github.io/2021/02/06/ARCHITECTURE.md.html or similar, particularly to help code contributors who don't already have a maintainer's bird's-eye picture of the codebase.

> The codebase becomes more modular. Individual parts can be extracted into different crates or projects if necessary, and strong abstraction boundaries make the code easier to understand in small pieces.

I'd argue that a codebase with too many interacting parts, abstractions or indirections, and possibly excessive modularity, can sometimes result in a system more intimidating to learn than a more straight-line approach (http://number-none.com/blow/john_carmack_on_inlined_code.htm...).

Otherwise I'd largely agree with this post. Another reason to not define the errors returned by a function separately from the function itself, is because of DRY and "every piece of knowledge must have a single, unambiguous, authoritative representation within a system". In this case, allowing a function to return a new error should not require adding that error in a global enum, and separately adding that error return in the actual function located far away from that enum. And finding which function an error value comes from should not require searching the whole codebase for 1 or more occurrences of that value's type.

Finally people are rediscovering why Java (checked) exceptions were so good :)

Rust's error handling is roughly isomorphic to the Platonic ideal of checked exceptions. In both cases you're forced to deal with (recoverable) errors and you know which lines of code can produce (recoverable) errors and which cannot.

However, the problem is that Java didn't implement checked exceptions well, as Joe Duffy articulated.


> 1. Exceptions are used to communicate unrecoverable bugs, like null dereferences, divide-by-zero, etc.

> 2. You don’t actually know everything that might be thrown, thanks to our little friend RuntimeException. Because Java uses exceptions for all error conditions – even bugs, per above – the designers realized people would go mad with all those exception specifications. And so they introduced a kind of exception that is unchecked. That is, a method can throw it without declaring it, and so callers can invoke it seamlessly.

> 3. Although signatures declare exception types, there is no indication at callsites what calls might throw.

> [...]

The divide in Rust between recoverable Errors and unrecoverable panics is much cleaner (even if it's impossible to get right in every detail).

I think unrecoverable panics in Rust and Go are a bad design decision. There are very few situations (if any) where you want a completely unrecoverable error. At the very least you'd want to log it somewhere.

No wonder there's a `catch_unwind` in Rust since 1.9 and `set_hook` since 1.10 to try and awkwardly work around this design.
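For readers who haven't used it, this is roughly what `std::panic::catch_unwind` looks like in practice. The `run_isolated` helper is a hypothetical name for this sketch, and it only works when the program is built with the default unwinding panic strategy (not `panic = "abort"`).

```rust
use std::panic;

/// Runs `task`, turning a panic into an `Err` holding the panic message.
/// Payloads that are neither `&str` nor `String` get a placeholder message.
pub fn run_isolated<T>(task: impl FnOnce() -> T + panic::UnwindSafe) -> Result<T, String> {
    panic::catch_unwind(task).map_err(|payload| {
        if let Some(message) = payload.downcast_ref::<&str>() {
            (*message).to_owned()
        } else if let Some(message) = payload.downcast_ref::<String>() {
            message.clone()
        } else {
            String::from("unknown panic payload")
        }
    })
}
```

The default panic hook still prints to stderr before the unwind is caught; `std::panic::set_hook` is the place to redirect that output to a logger.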

I think people haven't really wrestled with why Java checked exceptions were so bad. IMO the big problem was a lack of expressiveness. As a simple example, you should be able to write "map" in a way that says "this might throw whatever its argument throws" rather than saying "this can't throw anything" or "this might throw anything" if you want something reasonably generic.
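For comparison, this is how the "might fail with whatever its argument fails with" signature can be written in Rust, where the error type is an ordinary generic parameter. `try_map` is a hypothetical helper written for this sketch; the standard library gets the same effect by collecting an iterator of `Result`s into a `Result<Vec<_>, _>`.

```rust
/// Applies `f` to every item, stopping at the first error.
/// The error type `E` is whatever `f` returns, so the signature says
/// "this fails exactly the way its argument fails".
pub fn try_map<T, U, E, F>(items: impl IntoIterator<Item = T>, mut f: F) -> Result<Vec<U>, E>
where
    F: FnMut(T) -> Result<U, E>,
{
    let mut out = Vec::new();
    for item in items {
        out.push(f(item)?);
    }
    Ok(out)
}
```

Passing a closure whose error type is `std::convert::Infallible` yields a `try_map` call that provably cannot fail, and passing a parsing closure yields one that can only fail with a parse error.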

The fatal flaw of Java errors: Java does not have algebraic types.

That is.

"Just add this weird trick (algebraic types) and now it works"

A user of library code such as this sometimes wants to catch all errors related to the library and doesn’t care whether they arose from a blocks.ParseError or a blocks.DownloadError or whatever.

I like the granularity of this approach, especially that a function can indicate that it only errors out in some very specific way, such as a blocks.DownloadError.

But how do we get the best of both worlds — what would be the best way to gather together blocks.ParseError, blocks.DownloadError, etc. into a single blocks.BlocksError type?

To a certain extent both approaches are viral. If your library uses the one big enum approach you'll want to add a DepFoo variant, and if you want to expose granular errors you'll need that info from your deps.

I increasingly think the best of both worlds is either extreme, but never the middle ground. Either you actually need very granular errors or you should just use anyhow/eyre and have opaque errors with good human readable descriptions. The middle ground still takes a lot of work, but doesn't seem to give correspondingly better results.

After thinking this over, my inclination is to go to “both extremes” (I wouldn’t call it a “middle ground”).

1. Provide blocks.ParseError, blocks.DownloadError, etc. Individual functions use these, making it feasible to perform fine-grained error handling by matching on the type — as advocated in the article.

2. Provide a library-wide blocks.BlocksError non-exhaustive enum which is one layer of abstraction above the others and implements From for each of them. Define blocks.Result using this blocks.BlocksError. This allows users to write a function invoking library code which returns a blocks.Result, supporting the propagation use case.

It doesn’t seem like adding blocks.BlocksError on top of the others is that much more work. The hard part is creating the localized modular error types.
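The two layers above might look something like this in plain Rust. The `download` and `parse` bodies are stand-in stubs invented for the sketch (no network access, a made-up "block N" line format), and the field names are assumptions.

```rust
pub mod blocks {
    use std::error::Error;
    use std::fmt;

    /// Fine-grained errors, one per operation.
    #[derive(Debug)]
    pub struct ParseError { pub line: usize }
    #[derive(Debug)]
    pub struct DownloadError { pub url: String }

    impl fmt::Display for ParseError {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            write!(f, "invalid block on line {}", self.line)
        }
    }
    impl fmt::Display for DownloadError {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            write!(f, "could not download {}", self.url)
        }
    }
    impl Error for ParseError {}
    impl Error for DownloadError {}

    /// Library-wide error, one layer above the fine-grained ones.
    #[derive(Debug)]
    #[non_exhaustive]
    pub enum BlocksError {
        Parse(ParseError),
        Download(DownloadError),
    }

    impl fmt::Display for BlocksError {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            match self {
                Self::Parse(_) => f.write_str("failed to parse blocks"),
                Self::Download(_) => f.write_str("failed to download blocks"),
            }
        }
    }
    impl Error for BlocksError {
        fn source(&self) -> Option<&(dyn Error + 'static)> {
            match self {
                Self::Parse(e) => Some(e),
                Self::Download(e) => Some(e),
            }
        }
    }
    impl From<ParseError> for BlocksError {
        fn from(e: ParseError) -> Self { Self::Parse(e) }
    }
    impl From<DownloadError> for BlocksError {
        fn from(e: DownloadError) -> Self { Self::Download(e) }
    }

    pub type Result<T, E = BlocksError> = std::result::Result<T, E>;

    /// Stub: pretends that only `https://` URLs can be fetched.
    pub fn download(url: &str) -> std::result::Result<String, DownloadError> {
        if url.starts_with("https://") {
            Ok(String::from("block 1\nblock 2"))
        } else {
            Err(DownloadError { url: url.to_owned() })
        }
    }

    /// Stub: counts lines, requiring each to start with "block ".
    pub fn parse(text: &str) -> std::result::Result<usize, ParseError> {
        for (index, line) in text.lines().enumerate() {
            if !line.starts_with("block ") {
                return Err(ParseError { line: index + 1 });
            }
        }
        Ok(text.lines().count())
    }
}

/// User code: `?` converts either fine-grained error via the `From` impls.
pub fn count_blocks(url: &str) -> blocks::Result<usize> {
    let text = blocks::download(url)?;
    Ok(blocks::parse(&text)?)
}
```

Callers that want fine-grained handling call `blocks::download` or `blocks::parse` directly and get the specific type. Callers that only propagate use `blocks::Result` and let `?` do the conversion.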

It's writing all the ParseError, DownloadError, etc. that I'm suggesting you might want to avoid in certain cases.

Unfortunately, due to the fundamental design of Rust error handling, eyre and anyhow capture backtrace information at the moment their type is created. So the lower down you use them, the better the error messages you'll get. And "ParseError on line 134 in file foo.rs" can be much more valuable than "FooError on line 5 in file foo.rs: Caused by ParseError".
