

Why Attacking Application Exceptions is Important - themcgruff
http://37signals.com/svn/posts/3070-why-attacking-application-exceptions-is-important

======
jtbigwoo
Many development groups treat fixing minor exceptions as a once-in-a-blue-moon
event. Sometimes it's just a matter of making exceptions more visible to the
developers on a day-to-day basis.

Several years ago I inherited an application that averaged ten customer-
discovered problems (with only a few hundred customers) and a system outage
each quarter. The logging system was set to email all exceptions to the
developers (there were a bunch of low-level exceptions every day). The
outgoing lead developer had an Outlook rule that funneled all logging email
messages to the junk mail folder; she suggested I do the same. I never did, and
the annoyance of all those messages in my inbox did wonders for my motivation.
After two years, the system virtually never went down and we had almost no
customer complaints.

~~~
dennisgorelik
> outlook rule that funneled all logging email messages to the junk mail
> folder

That's a telling sign of an incapable software developer.

------
gordonguthrie
One of the greatest things about Erlang is that stability is orthogonal to
correctness.

Essentially you have a process that does something (a worker) and a second
process that will be informed if the first process dies (a supervisor).

With that core construct you build a supervisor tree which restarts subsystems
(using OTP).

You can have every worker process crash out (ie no correctness) and have macro
stability (the supervision tree restarts worker processes correctly).

The consequence of this is that you only code the happy path. Try/catches are
very rare (typically at hypernumbers we wrap end-user input with try/catch and
only have one try/catch per 25 kloc or so).

The VM logs all these crashes. You check 'em out and fix them.
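
As a rough illustration of the worker/supervisor idea, here is a minimal sketch in Python rather than Erlang. It is not OTP: real supervisors watch separate processes and have restart strategies, while this runs everything in one thread. The names Supervisor, crash_log, max_restarts and make_flaky_worker are made up for the example. The point it tries to show is that the worker contains no error handling of its own, and every crash ends up in a log someone can read.

```python
import traceback
from typing import Callable, List, Optional


class Supervisor:
    """Runs a worker, records every crash, and restarts it a limited number of times.

    Hypothetical stand-in for an Erlang/OTP supervisor, simplified to one thread.
    """

    def __init__(self, worker: Callable[[], str], max_restarts: int = 3) -> None:
        self.worker = worker
        self.max_restarts = max_restarts
        self.crash_log: List[str] = []  # one traceback per crash

    def run(self) -> Optional[str]:
        # One initial attempt plus up to max_restarts restarts.
        for _ in range(self.max_restarts + 1):
            try:
                return self.worker()
            except Exception:
                # The only try/except lives at the supervision boundary.
                self.crash_log.append(traceback.format_exc())
        # Gave up on the worker; the supervisor itself is still alive.
        return None


def make_flaky_worker(fail_times: int) -> Callable[[], str]:
    """Builds a happy-path-only worker that crashes on its first fail_times calls."""
    calls = {"count": 0}

    def worker() -> str:
        calls["count"] += 1
        if calls["count"] <= fail_times:
            raise ValueError("unexpected input")  # no local error handling
        return "ok"

    return worker
```

A worker that always crashes leaves run() returning None with a full crash_log: the program stays up, the work is not done, and the tracebacks say what to fix.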

Happy path programming is fantastic. If I try to use your library in ways
you didn't expect, it will crash (and tell me that our expectations are out of
whack).

Every try/catch that you write wraps up and hides low-level bugs which you
never fix. In languages where crashes bubble up the call stack there is a
disjoint between correctness and stability - you need to tolerate a certain
degree of errors to get stability.

Turns out actually fixing them is good.

~~~
luser001
Why are you so sure that merely restarting the crashing process and retrying
the operation will make it work correctly the second time?

Did I miss something in your explanation?

~~~
gordonguthrie
It doesn't work the second time. It does capture the bug and mark it though.
The system is stable and up - it just doesn't work.

