
That's a thought-provoking article, and I see no problem with the proposed error management technique as a whole, for that particular project.

This, however, is a crypto library designed to run on consumer OSs, as part of programs which will likely offer much more functionality than generating random numbers.

In general, a bug is a software fault, which is a passive flaw in the program, introduced by a programmer. Faults can manifest and cause the program to behave in an unintended way, which in turn can lead to failure - the program can no longer perform its function.

Any point on the fault -> error -> failure chain can be an intervention point. The relevant point for this discussion is detecting errors and preventing them from becoming system failures and, where that is not possible, handling the failures gracefully.

Let's agree not to discuss recovery attempts and assume that the system is instead moved directly to a safe state when such a bug/error is detected.

A crash-only safe state is simple and quite easy to implement, but whether it's the best approach for a particular piece of software depends on that software's dependability requirements. Abandoning the current operation and returning to the top-level execution context is an alternative that shouldn't be so easily dismissed.

In the BoringSSL case, there doesn't seem to be a reason to abort. The error condition is known, can be detected and failure can be returned just as easily. Panicking is also fine, if the parent program can react to it.
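To make "abandon the current operation and return to the top level" concrete, here is a rough Rust sketch using `std::panic::catch_unwind`. The names `fill_random` and `run_operation` are made up for illustration and are not Mundane or BoringSSL APIs. It only works if the binary is built with unwinding panics; with `panic = "abort"`, or with a direct `std::process::abort()`, the parent code never gets a chance to react.

```rust
use std::panic;

/// Hypothetical stand-in for a library routine that detects an
/// "impossible" internal error. Not a real Mundane/BoringSSL function.
fn fill_random(buf: &mut [u8], simulate_bug: bool) {
    if simulate_bug {
        // The library panics instead of returning an error code.
        panic!("RNG invariant violated");
    }
    // Placeholder fill so the sketch has no dependencies; NOT random.
    for (i, b) in buf.iter_mut().enumerate() {
        *b = i as u8;
    }
}

/// Top-level execution context: run one operation and, if it panics,
/// abandon it and report the failure instead of taking the process down.
fn run_operation(simulate_bug: bool) -> Result<[u8; 16], String> {
    let result = panic::catch_unwind(move || {
        let mut buf = [0u8; 16];
        fill_random(&mut buf, simulate_bug);
        buf
    });
    result.map_err(|payload| {
        // The panic payload is usually a &str or a String.
        if let Some(msg) = payload.downcast_ref::<&str>() {
            msg.to_string()
        } else if let Some(msg) = payload.downcast_ref::<String>() {
            msg.clone()
        } else {
            "unknown panic".to_string()
        }
    })
}
```

Calling `run_operation(true)` yields an `Err` carrying the panic message while the rest of the program keeps running. Whether continuing is actually safe after such a bug is the design question being argued here.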




> This, however, is a crypto library designed to run on consumer OSs, as part of programs which will likely offer much more functionality than generating random numbers.

Are you referring to getrandom(2)? Unless you are using `/dev/random` (i.e. GRND_RANDOM) instead of `/dev/urandom` (which, by the way, you don't need to use [1]), the only time getrandom(2) blocks or fails is very early in machine startup, before enough entropy has been collected. It is not something you would expect to occur more or less randomly.

[1] https://www.2uo.de/myths-about-urandom
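For reference, a minimal sketch of the fallible version, using only the Rust standard library. The standard library has no direct getrandom(2) wrapper, so this reads the `/dev/urandom` device file as a stand-in; `urandom_bytes` is a made-up helper name, and the code assumes a Unix-like system where that path exists.

```rust
use std::fs::File;
use std::io::{self, Read};

/// Hypothetical helper: read `len` bytes from /dev/urandom.
/// Every failure (missing device, short read) is surfaced to the
/// caller as an io::Error instead of aborting the process.
fn urandom_bytes(len: usize) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; len];
    File::open("/dev/urandom")?.read_exact(&mut buf)?;
    Ok(buf)
}
```

The caller then has to decide what to do with an `Err`, which is exactly the burden an infallible API removes.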

> In general, a bug is a software fault, which is a passive flaw in the program, introduced by a programmer. Faults can manifest and cause the program to behave in an unintended way, which in turn can lead to failure - the program can no longer perform its function.

The OP does explicitly say that it may be justified to make the function's API infallible. They strive to simplify the error cases callers have to handle (e.g. verification failure and other recoverable errors are combined to ease error handling), and they are expected to exercise this right only when no good, reasonable error handling strategy exists.

By the way, it seems that Mundane actually does not panic but aborts the entire process [2], with the rationale that panic handling in Rust is not trivial. This decision can be problematic on its own, but I found that aborts are only used to guard against generally improbable error cases, e.g. linking against or calling into a different library that happens to provide the same set of symbols as BoringSSL. If you say that this should be caught gracefully, uh, I'd say that you should also guard against an invocation failure due to dynamic linkage failure for the sake of user experience...

[2] https://github.com/google/mundane/blob/8aaa1c8/src/boringssl...


You can factor out whatever part of your system uses crypto into a separate process, and then the crash is simply "my crypto process died", which you can handle and recover from. I think it's ok to force people to cleanly separate functionality from their main process when its failure has no meaningful recovery path and a half-assed recovery can lead to catastrophic security problems.
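A rough sketch of what the parent side of that could look like in Rust, with only the standard library. `WorkerOutcome` and `run_worker` are hypothetical names, and the worker is just any external program you pass in; a real design would also need some protocol for sending requests and getting results back.

```rust
use std::io;
use std::process::Command;

/// Hypothetical result type for one run of the isolated crypto worker.
#[derive(Debug, PartialEq)]
enum WorkerOutcome {
    /// Worker exited with status 0; carries its trimmed stdout.
    Finished(String),
    /// Worker exited non-zero or was killed by a signal (e.g. abort()).
    Died,
}

/// Run the worker to completion and classify how it ended.
/// An Err here means the worker could not be started at all.
fn run_worker(program: &str, args: &[&str]) -> io::Result<WorkerOutcome> {
    let output = Command::new(program).args(args).output()?;
    if output.status.success() {
        let text = String::from_utf8_lossy(&output.stdout).trim().to_string();
        Ok(WorkerOutcome::Finished(text))
    } else {
        // "My crypto process died": the parent is still alive and can
        // restart the worker, report an error, or shut down cleanly.
        Ok(WorkerOutcome::Died)
    }
}
```

On a Unix-like system, `run_worker("sh", &["-c", "kill -s ABRT $$"])` simulates a worker that aborts; the parent sees `WorkerOutcome::Died` and carries on.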



