
Probabilistic Assertions: Crashing When Something Feels Wrong - slackerIII
http://www.spiteful.com/2008/04/14/crashing-when-something-feels-wrong/
======
blinks
Yay for automated monitoring software. Nagios (<http://www.nagios.org/>) does
this for networks (and is extendable for some other things). At my old job we
used Hobbit (<http://hobbitmon.sourceforge.net/>) to watch our Java server
instances (memory usage, etc.). There’s no reason why these monitoring
programs couldn’t be used to monitor internal program statistics, as long as
those stats were made available.

Generally you monitor from your internal network, and then provide some hook
for the monitor to get information that’s only accessible from there. (SSH or
a limited-access URL, etc.)
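A minimal sketch of that kind of hook, assuming a Python program and a made-up `/stats` URL (the counter names are illustrative, not from any real monitoring setup) — the server binds to localhost only, so the stats stay reachable just from the internal network:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical internal counters a monitor like Nagios could poll.
STATS = {"requests_served": 0, "heap_mb": 0}
STATS_LOCK = threading.Lock()

class StatsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/stats":
            self.send_error(404)
            return
        with STATS_LOCK:
            body = json.dumps(STATS).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Keep the sketch quiet; a real server would log normally.
        pass

def serve_stats(port=0):
    # Bind to the loopback interface only; port 0 picks a free port.
    server = HTTPServer(("127.0.0.1", port), StatsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The monitor then just fetches the URL on a schedule and alarms on whatever thresholds you configure.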

Monitoring programs are super-powerful and generally complex. Check them out —
it’s a good skill to have when working with production software.

(I also posted this on the article.)

------
angstrom
I've worked with threshold logic like that for collecting and analyzing
traffic on telephone switches where an alarm or notification would be
generated if the threshold was broken.

Personally, I would never want to debug something like that based on a
statistical probability that something _might_ have gone wrong. Better to fail
gracefully with something like multiple request chains, so that when a chain
goes down it gets logged, cleaned up, and recreated.

Worst case scenario they get a request timeout warning.
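The threshold logic described above can be sketched roughly like this (the class, threshold, and window values are made up for illustration — telephone-switch alarm systems are far more elaborate):

```python
import time
from collections import deque

class ThresholdAlarm:
    """Signal an alarm when more than `threshold` events land inside a
    sliding time window -- the same shape of logic as traffic alarms on
    telephone switches (a hypothetical sketch, not a real system)."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold
        self.window = window_seconds
        self.events = deque()

    def record(self, now=None):
        """Record one event; return True if the threshold was broken."""
        now = time.monotonic() if now is None else now
        self.events.append(now)
        # Drop events that have fallen out of the sliding window.
        while self.events and now - self.events[0] > self.window:
            self.events.popleft()
        return len(self.events) > self.threshold

# Usage: alarm on more than 3 failures within 10 seconds.
alarm = ThresholdAlarm(threshold=3, window_seconds=10)
```

The alarm only notifies; nothing crashes, which matches the fail-gracefully approach above.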

------
pmorici
This sounds like circuit breakers for software. Instead of an overcurrent
condition, you've got excessive busyness.
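A minimal sketch of that analogy, assuming a simple consecutive-failure trip rule (the parameter names and defaults are illustrative):

```python
import time

class CircuitBreaker:
    """Trip open after `max_failures` consecutive failures, then reject
    calls until `reset_after` seconds pass -- a sketch of the software
    'circuit breaker' idea, not any particular library's API."""

    def __init__(self, max_failures=5, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: call rejected")
            # Cool-down elapsed: let one trial call through (half-open).
            self.opened_at = None
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # trip the breaker
            raise
        self.failures = 0  # a success resets the count
        return result
```

Like the electrical version, it stops hammering a failing component and gives it time to recover.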

------
Tichy
Maybe frequent backups would be a better solution?

~~~
palish
The "delete all of a user's files" example was just one of many. It is
sometimes valid to force a crash when the code detects an impossible case (or
when you choose not to recover from that case for simplicity). If each
component either functions correctly or the program crashes, that can
significantly reduce debugging time. There aren't any silent failures from
components that simply ignore invalid input.

That said, with procedural languages I try to write code that recovers from
all reasonable cases and assert the rest.
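A minimal sketch of that kind of guard around the "delete all of a user's files" example — the function name and the 90% threshold are made up for illustration, not taken from the article:

```python
def delete_stale_files(user_files, stale):
    """Remove stale files from a user's file list, but crash loudly when
    the request 'feels wrong' (a hypothetical sketch of the idea)."""
    # An empty file list here means something upstream already broke.
    assert user_files, "no files given: impossible case upstream"
    # A sweep deleting nearly everything a user owns is far more likely
    # a bug than a legitimate request -- assert instead of failing silently.
    assert len(stale) <= 0.9 * len(user_files), (
        "refusing to delete %d of %d files: feels wrong"
        % (len(stale), len(user_files)))
    return [f for f in user_files if f not in stale]
```

The crash is deliberate: the impossible case surfaces immediately in testing instead of silently destroying data.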

~~~
Tichy
What if the validation code itself contains errors? It might be a fallacy to
assume problems can be solved in that way.

~~~
palish
"Solved" certainly shouldn't mean "avoided entirely". One way to write good
software is to quickly recover from mistakes. So instead of staring at your
code while you triple-check each corner case, you approach the problem from
the other direction by testing each code path until you're confident that the
code runs as it should for the inputs that matter. (Be sure to assert for all
other inputs.)

In that light, it is easy to see why it can be good to crash for invalid
cases. Since you don't handle a bunch of failure cases, you end up writing
less code. And since less time is spent in the debugger, more time is focused
on the correct task: achieving architectural goals rather than solving
structural problems.

After you acquire a deeper understanding of the architecture, it is best to
redesign (throw away) your previous attempt. After the second and especially
the third iteration you will have written a solid and elegant program in a
relatively small amount of time. Programming is about trusting yourself to
make decent decisions based on your current knowledge.

As with anything, there is only a finite amount of time to solve a problem. So
if you don't have a lot of time, don't fret if your code isn't perfect (or
reusable), as long as it works.

