
7 Good Rules to Log Exceptions - soundsop
http://www.codemonkeyism.com/archives/2008/12/16/7-good-rules-to-log-exceptions/
======
sokoloff
Decent article overall (fairly short and not particularly controversial
though).

One item I would argue with is the second half of this: > You need to decide
what critical means for you (most often it means losing money).

Just because you're losing money doesn't make it critical, at least not for
any site that operates at scale. If you log as critical every time you know
you're losing small amounts of money, you'll also have log files full of tiny
taxes you've paid. "Can't reach US credit card payment provider" probably
sounds critical, but isn't all that critical if you can place that order into
a stored state and retry it with the provider later. Sure, that loses money
because some of those payments will fail, and some users won't retry after an
email prompt, but that doesn't make it critical.
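The store-and-retry idea above can be sketched minimally; the names here (`charge`, `place_order`, the queue) are hypothetical stand-ins, not anyone's real payment code:

```python
import queue

# Hypothetical sketch of the "store and retry later" pattern: a failed
# charge is not treated as critical; the order is parked in a retry queue
# and drained by a background worker once the provider is reachable again.

class ProviderUnreachable(Exception):
    pass

def charge(order, provider_up):
    # Stand-in for a real payment call; raises when the provider is down.
    if not provider_up:
        raise ProviderUnreachable(order)
    return "charged:%s" % order

retry_queue = queue.Queue()

def place_order(order, provider_up):
    try:
        return charge(order, provider_up)
    except ProviderUnreachable:
        # A warning-level event, not critical: the order is safely stored.
        retry_queue.put(order)
        return "queued:%s" % order

def drain_retries(provider_up):
    # Background worker: retry everything that was parked.
    results = []
    while not retry_queue.empty():
        results.append(charge(retry_queue.get(), provider_up))
    return results
```

The point is that the severity of the log entry tracks the state of the order (safely stored vs. lost), not the mere fact that money was briefly at risk.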

In my experience (on the dev side for years and now on the operations side),
developers log errors at one level more severe than is actually warranted.
("Divide by zero" is almost never a fatal error from the perspective of the
site operations team. :) )

~~~
DenisM
One useful criterion is whether the problem affects a group of customers
rather than a single person.

------
Hexstream
I think a good approach might be:

Log _everything_, and then filter out the information you know won't be
interesting with rules. The remainder is what's interesting (if it isn't, add
to your rules to filter it out).

This way, if later on you're trying to track something which requires
information you usually don't find interesting, you just update your rules and
you have all you need.
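One minimal way to sketch "log everything, filter with rules" is a logging filter driven by a rule list; the substring rules here are a hypothetical, deliberately simple form:

```python
import io
import logging

# Everything is emitted at DEBUG; a rule list suppresses records already
# known to be uninteresting. To start seeing something again, just remove
# its rule -- the events were being generated all along.

UNINTERESTING = ["heartbeat ok", "cache hit"]

class RuleFilter(logging.Filter):
    def filter(self, record):
        msg = record.getMessage()
        return not any(rule in msg for rule in UNINTERESTING)

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(RuleFilter())
log = logging.getLogger("everything")
log.setLevel(logging.DEBUG)  # log everything...
log.addHandler(handler)

log.debug("heartbeat ok")            # ...but this is filtered by a rule
log.debug("cache hit for user 42")   # filtered by a rule
log.warning("payment retry queued")  # survives the rules
```

Note the filter sits on the handler, so the full stream still exists at the logger level; a second handler without the filter could archive everything verbatim.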

------
edw519
_If you log everything you will probably get too many log entries to have a
meaningful reaction to exceptions in your log._

In theory, this makes perfect sense. In reality, you will almost always wish
you had logged more. Having good detailed data will enable you to discover
patterns when you can't recreate problems.

Disk space is cheap. Log it and archive or dispose of it later. You can't see
what you don't save.

~~~
cconstantine
Generally you're right. To know what will be helpful for debugging you have to
know what will go wrong, and if you know what will go wrong you may as well
fix it now instead of after it affects a user. Also, in any sufficiently
stable system errors become _very_ hard to reproduce so you may not have the
luxury of reproducing the error before finding the root cause of the problem.

Large log files don't scare me for I have The Shield of Grep, and The Sword of
Awk.

The only issue I've seen with logging _everything_ is that some file systems
have trouble with files larger than 2 GB, and changing file systems may not
be an option. Depending on how quickly you rotate your logs you can hit that
limit and stop logging messages or cause errors elsewhere. With a log rotation
period of 1 hour I've hit this limit, and it's rather annoying.

~~~
DenisM
Why not rotate log files based on size?
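Size-based rotation is built into Python's standard library, for instance; `RotatingFileHandler` starts a new file once the current one reaches `maxBytes`, keeping `backupCount` old files. The 50-byte limit below is tiny purely for demonstration:

```python
import glob
import logging
import logging.handlers
import os
import tempfile

# Rotate by size rather than by time: a new file is started whenever the
# current one would exceed maxBytes, so no single file can grow past the
# 2 GB ceiling regardless of how bursty the traffic is.

logdir = tempfile.mkdtemp()
path = os.path.join(logdir, "app.log")
handler = logging.handlers.RotatingFileHandler(
    path, maxBytes=50, backupCount=3)
log = logging.getLogger("rotated")
log.setLevel(logging.INFO)
log.addHandler(handler)

for i in range(20):
    log.info("message number %d", i)
handler.close()

# app.log plus up to 3 rotated backups (app.log.1, .2, .3).
files = sorted(glob.glob(path + "*"))
```

In production the same idea is usually handled by `logrotate` with a `size` directive, but doing it in-process works when you can't touch the system config.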

~~~
cconstantine
A better idea may be to not log to a file. If you were to log to a database,
you could record fields like date/time, session_id, service_name,
module_name, and a generic message, then run clever SELECTs on your table of
events/messages. That makes it easier to answer questions like 'Is this
exception clustered in time?' or 'Is module X ever called in a session that
results in an error between noon and 1pm?'

Cleaning old messages is potentially easier as you can remove lower priority
messages first, or simply delete any message older than X.
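Both ideas (clever SELECTs and priority-first cleanup) can be sketched against an in-memory SQLite table; the schema, column names, and sample rows here are hypothetical:

```python
import sqlite3

# Sketch of database-backed logging with the fields suggested above.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE events (
    ts TEXT, session_id TEXT, service_name TEXT,
    module_name TEXT, priority INTEGER, message TEXT)""")

rows = [
    ("2008-12-16 12:10", "s1", "web", "payments", 40, "timeout"),
    ("2008-12-16 12:20", "s1", "web", "payments", 40, "timeout"),
    ("2008-12-16 18:00", "s2", "web", "search",   20, "slow query"),
]
db.executemany("INSERT INTO events VALUES (?,?,?,?,?,?)", rows)

# "Is this exception clustered in time?" -- occurrences per hour.
per_hour = db.execute("""
    SELECT substr(ts, 1, 13) AS hour, count(*)
    FROM events WHERE message = 'timeout'
    GROUP BY hour ORDER BY hour""").fetchall()

# Cleanup: drop low-priority messages first, then anything older than X.
db.execute("DELETE FROM events WHERE priority < 30")
```

A file of text lines can't answer the per-hour question without parsing; here it's one GROUP BY.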

Of course, if you're logging to another remote service (like a DB server), how
do you record low level events in communicating with that service, or how do
you record events when the db server is down?

~~~
joestrickler
Log critical/connectivity errors to a local disk.
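That fallback can be sketched as a handler that tries the remote sink first and diverts to a local one on failure; `FlakyRemoteHandler` is a hypothetical stand-in for a real DB or network appender:

```python
import io
import logging

# When the remote sink (DB server, log collector) is unreachable, the
# record falls through to a local handler instead of being lost.

local_buf = io.StringIO()          # stands in for a local disk file
local = logging.StreamHandler(local_buf)

class FlakyRemoteHandler(logging.Handler):
    def __init__(self, fallback, up=True):
        super().__init__()
        self.fallback, self.up, self.sent = fallback, up, []

    def emit(self, record):
        try:
            if not self.up:
                raise ConnectionError("db server down")
            self.sent.append(record.getMessage())
        except Exception:
            # Connectivity errors land on local disk.
            self.fallback.emit(record)

log = logging.getLogger("fallback")
log.setLevel(logging.INFO)
remote = FlakyRemoteHandler(local, up=False)
log.addHandler(remote)
log.error("cannot reach payment provider")
```

A companion process can then ship the local file to the database once connectivity returns, so the remote store eventually sees everything.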

