

Silencing Many Hospital Alarms Leads To Better Health Care - dredmorbius
http://www.npr.org/blogs/health/2014/01/24/265702152/silencing-many-hospital-alarms-leads-to-better-health-care

======
dredmorbius
My immediate thought on hearing this story on NPR was "well, this is pretty
much exactly the situation that affects much of system, site, and application
monitoring".

For the engineer designing a system, providing an alert or alarm is the safe
option. If something goes wrong and you _don't_ alert on it, there's hell to
pay. If something _doesn't_ go wrong and you do ... well, CYA and all that.

The problem is that the person (or team) responsible for assessing alarms has
to constantly filter signal from noise, and this can be difficult. The same
goes for separating relevant from mundane information in system logs.

I've spent months re-engineering and tweaking Nagios and other monitoring
tools so that noncritical situations _don't_ trigger alerts, and properly
configuring relationships so that fundamental errors suppress derivative ones.
If you've configured Nagios to, say, send pages on alerts, you can rapidly DoS
your own sysadmin / DevOps team when something _does_ go wrong simply from the
cascade of alerts.
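
To illustrate what "fundamental errors suppress derivative ones" means, here
is a minimal Python sketch. It is not how Nagios implements it; the check
names and the `depends_on` map are hypothetical, and the dependency graph is
assumed to be acyclic. A failing check only produces an alert if nothing it
depends on, directly or indirectly, is also failing.

```python
def has_failing_dependency(check, failing, depends_on):
    """True if anything `check` depends on (transitively) is failing."""
    seen = set()
    stack = list(depends_on.get(check, ()))
    while stack:
        dep = stack.pop()
        if dep in seen:
            continue  # already visited; also guarantees termination
        seen.add(dep)
        if dep in failing:
            return True
        stack.extend(depends_on.get(dep, ()))
    return False


def alerts_to_send(failing, depends_on):
    """Keep only root-cause failures; derivative ones are suppressed."""
    failing = set(failing)
    return sorted(
        check
        for check in failing
        if not has_failing_dependency(check, failing, depends_on)
    )


# Hypothetical topology: the web check needs the database and the switch,
# and the database needs the switch.
DEPENDS_ON = {
    "web": ["db", "switch"],
    "db": ["switch"],
    "cron": [],
}
```

With this map, a switch outage that also takes down "db" and "web" yields a
single alert for "switch" rather than three.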

More specifically: following a few top-level health indicators, and _logging_
other metrics, has proven to be one of the most effective means of maintaining
system stability. Literally: a periodic load of one or more top-level (or
representative) pages, with response time indicated and color-coding to show
whether you're in the healthy (green), warning (yellow), or unhealthy (red) zone, plus
logging of results. It's the sort of thing that can be crafted in a small
shell script and displayed in a corner of your monitor.
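
A rough sketch of that kind of check, written in Python rather than shell.
The thresholds, the log file name, and the function names are all assumptions
for illustration, not what I actually ran; pick values that match your own
site.

```python
import time
import urllib.request

# Hypothetical thresholds in seconds; tune for your own site.
WARN_SECONDS = 1.0
CRIT_SECONDS = 3.0

# ANSI escape codes for terminal color.
COLORS = {"green": "\033[32m", "yellow": "\033[33m", "red": "\033[31m"}
RESET = "\033[0m"


def classify(elapsed, ok, warn=WARN_SECONDS, crit=CRIT_SECONDS):
    """Map one probe result onto a health zone."""
    if not ok or elapsed >= crit:
        return "red"
    if elapsed >= warn:
        return "yellow"
    return "green"


def probe(url, timeout=10.0):
    """Load the page once; return (elapsed_seconds, succeeded)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            ok = 200 <= resp.status < 400
    except Exception:
        # Any failure (DNS, timeout, HTTP error) counts as unhealthy.
        ok = False
    return time.monotonic() - start, ok


def format_line(url, elapsed, zone, now=None):
    """Build one log line: timestamp, zone, response time, URL."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return f"{stamp} {zone:<6} {elapsed:6.3f}s {url}"


def run(url, interval=60.0, logfile="health.log"):
    """Probe forever: append each result to a log, print it in color."""
    while True:
        elapsed, ok = probe(url)
        zone = classify(elapsed, ok)
        line = format_line(url, elapsed, zone)
        with open(logfile, "a") as fh:
            fh.write(line + "\n")
        print(COLORS[zone] + line + RESET)
        time.sleep(interval)
```

Calling `run()` with the URL of a representative page in a small terminal
window gives the colored status line; the log file keeps the history.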

------
qwerta
This was a problem on Russian space stations as well. There was a constant
stream of alarms, and astronauts practically ignored them after 6 months.

