
Ask HN: Basic Error Alerting Practices - ramenmeal
Hi HN,

Our team currently has an issue with handling and alerting on errors that occur in our application.

Right now, an error that should never happen, i.e. a bug in the system or unexpected behavior, is logged as an error. Connection failures and other transient errors are also logged as errors. These logs feed into the same logging system, and alerts (PagerDuty) are generated when any error is recorded.

We would like to set a tolerable threshold for transient errors like connection failures before an alert is triggered, but also be alerted immediately for the unexpected-behavior type of logs.

I'm sure this is a common scenario for any company as it scales, and I'm curious what practices or strategies are used for this.
======
billoday
For us, it tends to be driven by the language/logging framework used. When
using syslog-type logging messages, we log at critical/fatal for the REALLY
bad ones. When using npm-type logging, we try to include the string CRITICAL
in the message and trigger the alert on a single occurrence of such a message.
General alerts use an "x occurrences over y time" trigger, tuned to
usage/traffic/noise. We also generally encourage using warn as the default
level for errors, only logging at error when things are actually broken.
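
To make the two triggers concrete, here is a minimal sketch in Python. It is an illustration only, not what any particular alerting product does: the class name AlertRouter, the should_alert method, and the default threshold and window values are all hypothetical, and a real setup would usually express the same rules in the log aggregator's or pager's own configuration.

```python
import time
from collections import deque


class AlertRouter:
    """Hypothetical router: alerts at once on critical messages, and on
    ordinary errors only once `threshold` of them fall within the window."""

    def __init__(self, threshold=5, window_seconds=60.0, clock=time.monotonic):
        self.threshold = threshold            # the "x" in x over y time
        self.window_seconds = window_seconds  # the "y" in x over y time
        self.clock = clock                    # injectable so it can be tested
        self.recent_errors = deque()          # timestamps of recent errors

    def should_alert(self, level, message):
        # A critical/fatal level, or the CRITICAL marker string in the
        # message, alerts on a single occurrence.
        if level in ("critical", "fatal") or "CRITICAL" in message:
            return True
        # warn and below are recorded elsewhere but never alert here.
        if level != "error":
            return False
        now = self.clock()
        self.recent_errors.append(now)
        # Forget errors that have aged out of the window.
        while self.recent_errors and now - self.recent_errors[0] > self.window_seconds:
            self.recent_errors.popleft()
        return len(self.recent_errors) >= self.threshold
```

With threshold=3 and window_seconds=10, two connection errors in quick succession stay quiet, the third one alerts, and a lone error twenty seconds later is quiet again because the earlier ones have aged out of the window.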

