
The Scourge of Error Handling - mmastrac
http://www.drdobbs.com/architecture-and-design/the-scourge-of-error-handling/240143878?
======
pcwalton
Error handling is indeed very hard to get right. In Rust we've been
experimenting with different mechanisms:

* Much of our code uses the Result<T,E> type for local error handling, which is very similar to the way Haskell handles exceptions with the Error monad.

* For long-distance, fatal error handling, Graydon was very influenced by the "crash-only software" paper [1]. There is a `fail` statement which brings down the task permanently with no chance of recovery. (The only code that executes after a `fail` expression is evaluated is the set of destructors attached to the data the task owns.) Of course, other tasks can continue executing and might restart the crashed task.

* For long-distance, nonfatal error handling there is a new condition system like Common Lisp's -- you register a handler and that handler gets called whenever an error happens. The handler could tell the function that signaled an error to restart with a new value, to return a value of the handler's choice, or to fail the task (the default in most cases).
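
To illustrate the first bullet, here is a minimal sketch of local error handling with `Result<T,E>`. It is written in present-day Rust syntax, which may differ from the syntax in use when this comment was posted, and `ConfigError` and `parse_port` are hypothetical names invented for the example.

```rust
/// Hypothetical error type: each failure mode gets its own variant.
#[derive(Debug, PartialEq)]
enum ConfigError {
    Missing,
    NotANumber(String),
    OutOfRange(u32),
}

/// Parse a TCP port from optional text (hypothetical helper).
/// The caller sees every way this can fail in the return type.
fn parse_port(raw: Option<&str>) -> Result<u16, ConfigError> {
    // `?` returns early with the error, much like binding in an error monad.
    let text = raw.ok_or(ConfigError::Missing)?;
    let n: u32 = text
        .trim()
        .parse()
        .map_err(|_| ConfigError::NotANumber(text.to_string()))?;
    if n == 0 || n > 65535 {
        return Err(ConfigError::OutOfRange(n));
    }
    Ok(n as u16)
}
```

A caller would typically `match` on the result, or use `?` again to pass the error one level up.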

The hope is that this is a more robust and performant model than the
traditional exceptions model, while not being particularly verbose.
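
The crash-only bullet above can be sketched roughly as follows. This assumes present-day Rust, with `panic!` inside a thread standing in for `fail` inside a task, and the default unwinding panic strategy. `supervise` and `flaky` are hypothetical names made up for the sketch.

```rust
use std::thread;

/// Run `job` in its own thread and restart it after a crash, up to
/// `max_attempts` times. Returns (attempt number, result) on success,
/// or None if every attempt crashed.
fn supervise(max_attempts: u32, job: fn(u32) -> u32) -> Option<(u32, u32)> {
    for attempt in 1..=max_attempts {
        // join() yields Err if the worker panicked. The worker is gone for
        // good, but the supervising thread keeps running and can restart it.
        match thread::spawn(move || job(attempt)).join() {
            Ok(value) => return Some((attempt, value)),
            Err(_) => continue,
        }
    }
    None
}

/// Hypothetical worker: crashes on the first two attempts, then succeeds.
fn flaky(attempt: u32) -> u32 {
    if attempt < 3 {
        panic!("attempt {} failed", attempt);
    }
    attempt * 10
}
```

The crashed worker never gets a chance to recover in place; the only recovery path is a fresh start from the supervisor.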

[1]: https://www.usenix.org/conference/hotos-ix/crash-only-software

~~~
joe_the_user
Crash-only code sounds good.

It is worth remembering that the convention in early C development was to
check for null values whenever doing memory allocation. In most PC programs,
this convention added considerable overhead more or less for nothing, since
once a program runs out of memory, recovery isn't really possible (oh, and Linux
doesn't even return zero when out of memory, it just boots programs).

~~~
fleitz
Checking for null values from malloc does not create serious overhead; it
creates minor overhead for your branch predictor, unless for some reason
malloc frequently returns null.

A single context switch back to the kernel generates far more load than the
null checks could ever hope to accomplish.

~~~
joe_the_user
I mean all those checks create considerable mental overhead from code-bloat.
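
For a concrete look at what a checked allocation costs at the call site, here is a small sketch. It uses Rust's `Vec::try_reserve` as a stand-in for a NULL-checked `malloc`, which is a substitution made for this example rather than anything from the thread; `make_buffer` is a hypothetical helper.

```rust
use std::collections::TryReserveError;

/// Allocate a zeroed buffer, reporting allocation failure instead of aborting.
fn make_buffer(len: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut buf: Vec<u8> = Vec::new();
    // The whole "null check" is this one `?`: a single branch on the result.
    buf.try_reserve(len)?;
    // Capacity is already reserved, so this does not allocate again.
    buf.resize(len, 0);
    Ok(buf)
}
```

The runtime cost is one branch, while the source-level cost is one `?` per call site, which is roughly the trade-off being argued over here.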

------
michaelfeathers
I feel the same way about error handling, but I think it is more of a design
issue than a language issue. Ideally, an application has a barrier that deals
with anything from the outside world that can cause an error. Past that
barrier, code can concentrate on the main goal, not errors.

Bertrand Meyer once said that exceptions are for cases where you can't tell
whether an operation will succeed or not before trying it. Generally, that
happens in I/O, system calls and input validation. The problem with a lot of
error handling is that it moves beyond that realm and mixes with the logic of
the system.
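
One way to picture such a barrier, as a rough sketch in Rust: outside input is checked once and turned into a type that can only hold valid values, so the code past the barrier has no error path. `Percent`, `InputError`, and `apply_discount` are hypothetical names chosen for the example.

```rust
/// A percentage known to be in 0..=100. In a real program the field would be
/// private to a module so that `parse` is the only way to build one.
#[derive(Debug, PartialEq)]
struct Percent(u8);

#[derive(Debug, PartialEq)]
enum InputError {
    NotANumber,
    OutOfRange,
}

impl Percent {
    /// The barrier: the only place where untrusted text is inspected.
    fn parse(raw: &str) -> Result<Percent, InputError> {
        let n: u32 = raw.trim().parse().map_err(|_| InputError::NotANumber)?;
        if n > 100 {
            return Err(InputError::OutOfRange);
        }
        Ok(Percent(n as u8))
    }
}

/// Past the barrier: no validation and no error handling, just the main goal.
fn apply_discount(price_cents: u64, discount: &Percent) -> u64 {
    price_cents - price_cents * u64::from(discount.0) / 100
}
```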

~~~
stcredzero
_> The problem with a lot of error handling is that it moves beyond that realm
and mixes with the logic of the system._

What is the motivation for this? What are the perceived pains that programmers
are trying to cure which are not, "cases where you can't tell whether an
operation will succeed or not before trying it?"

~~~
yock
Most often I think it's due to poor separation of concerns. The programmer is
about to push an input down a stack of method calls several layers deep.
There's the success condition, the failure condition, then the exception which
captures unexpected behavior from the rat's nest of code he just called.

~~~
stcredzero
How often can this be addressed with orthogonal finite state machines? (For
example: one which embodies whatever business process, and another that
embodies errors and failures with IO and the network.)

------
VexXtreme
I personally avoid reporting errors with null return values, though that
approach might actually be useful in some situations. I've seen cases where
less-than-skilled developers had an irrational fear of exceptions and just
treated them like undesirable error states instead of an error reporting and
handling mechanism. It resulted in systems where every layer of the call stack
was polluted with blocks such as:

if (result != null) { } else ...

This usually results in the real cause of an error being completely and
utterly lost somewhere in the call stack, as using this kind of binary
approach to error reporting is inadequate to communicate what really happened.

Question:

How do you handle data concurrency issues in big multi-user systems? For
instance, if your lowest level service (usually data access layer) throws a
data concurrency exception due to some other user modifying the data that the
current user is trying to modify, how do you communicate to your user that the
data has changed and that they are supposed to refresh the web page, if you
are masking the real cause of the problem by just returning null values from
the lowest level of the call stack?

I personally handle cases like this by allowing the appropriate exception to
propagate to the highest level so that I can make a judgment call whether it's
the kind of error that needs to be reported to the user, silently logged or
something else. On lower levels I still might have try-catch blocks for
logging purposes and in certain situations I might rethrow exceptions if the
error needs to be reported to a higher level.
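
A minimal sketch of that propagation, written in Rust with `Result` in place of exceptions (a substitution made for this example). The lowest layer reports a specific cause, the middle layer passes it up untouched, and only the top layer decides what the user sees. All names here (`StoreError`, `Row`, `update_row`, `rename`, `handle_request`) are hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum StoreError {
    VersionConflict { expected: u32, actual: u32 },
    EmptyValue,
}

struct Row {
    version: u32,
    value: String,
}

/// Data access layer: optimistic concurrency check on a version number.
fn update_row(row: &mut Row, expected: u32, new_value: &str) -> Result<(), StoreError> {
    if new_value.is_empty() {
        return Err(StoreError::EmptyValue);
    }
    if row.version != expected {
        return Err(StoreError::VersionConflict { expected, actual: row.version });
    }
    row.value = new_value.to_string();
    row.version += 1;
    Ok(())
}

/// Service layer: does its own work and lets the real cause propagate with `?`.
fn rename(row: &mut Row, expected: u32, name: &str) -> Result<(), StoreError> {
    update_row(row, expected, name.trim())?;
    Ok(())
}

/// Top level: the judgment call about what to show and what to log.
fn handle_request(row: &mut Row, expected: u32, name: &str) -> String {
    match rename(row, expected, name) {
        Ok(()) => "saved".to_string(),
        Err(StoreError::VersionConflict { .. }) => "data changed, please refresh".to_string(),
        Err(other) => {
            eprintln!("rename failed: {:?}", other); // logged, not shown to the user
            "something went wrong".to_string()
        }
    }
}
```

Had `update_row` returned a bare null-like value, the top level could not tell a conflict from any other failure.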

------
sgt101
"the conventional use of a null return both as an indicator of an error
condition and as an actual data item"

Yet, this is a thing that I have never done, nor would ever do. Can someone
explain to me how it makes sense for this to happen in any other mode than as
a genuine error? If I want to indicate an error condition I throw an
exception, perhaps even one that I've written a class for. This allows the
compiler to check that I've put proper handling in the code to deal with this
condition. If I am stupid and I handle it with { e.printStackTrace(); } well,
this gets me nowhere... but if I listen to the compiler and write in handling
code that repairs the condition then all proceeds nicely, as if the problem
has been properly dealt with and all.

Or is it just me?

return null; just says "your problem f*wad" to me...

Not acceptable.

~~~
Danieru
In C you have _no_ exception handling. The closest you can get is segfaults.
With a segfault the debugger can give you a backtrace at almost the exact
issue point.

The easiest way to get segfaults is by dereferencing NULL. In C dereferencing
NULL is invalid. Thus returning NULL on failure will result in something close
to what raising an exception in other languages does.

In any language with real exception handling it makes much less sense,
ignoring optimization.

~~~
apaprocki
Dereferencing NULL is undefined -- on AIX the page at 0x0 is readable.

------
jacques_chester
Perhaps some taxonomy is required to help us deal with this.

Some errors are essential/problem domain errors. They represent an
impossibility with regards to the purpose of the code. I place validation code
under this heading.

Some are accidental/solution domain errors. They represent a failure of the
environment or computing platform. They are present to allow the code to
decide what to do when a design assumption (network availability, disk
availability, RAM availability etc) is broken.

There's also the question of who must handle the error. Exceptions allow
errors to propagate up a stack. Both C-style and multivalue return error
handling styles force inline handling. Common Lisp conditions allow
out-of-band error handling (some other agent freezes execution and steps in).

To me the core problem is this:

1. Error handling obfuscates the purpose of a piece of code. It hides the
happy case and alternative cases.

2. Error handling is an unavoidable requirement of all code. Things go wrong.

How best to reconcile these problems?

It would be nice to have an environment that hides and shows code in terms of
the path of execution. So you'd have a happy path view, views for each
alternative path, a view for the malloc fail path and so on.

No idea how that could be done. Sounds dangerously like generated code; or
alternatively a self-contained image environment à la Smalltalk/Lisp.

