Hacker News new | past | comments | ask | show | jobs | submit login

A far-out old employer of mine decided that their standard format for alerts, sent by applications to the central monitoring system, would include a field for a URL pointing to some relevant documentation.

I think this was mostly pushed through by sysadmins annoyed at getting alerts from new applications that didn't mean anything to them.

When you get an alert, you have to first understand the alert, and then you have to figure out what to do about it. The majority of alerts, when people don't craft them according to a standard/policy, look like this:

  Subject: Disk usage high
  Priority: High
    There is a problem in cluster ABC.
    Disk utilization above 90%.
It's a pain in the ass to go figure out what is actually affected, why it's happening, and track down some kind of runbook that describes how to fix this specific case (because it may vary from customer to customer, not to mention project to project). This is usually the state of alerts until a single person (who isn't a manager; managers hate cleaning up inefficiencies) gets so sick and tired of it that they take the weekend to overhaul one alert at a time to provide better insight as to what is going on and how to fix it. Any attempt to improve docs for those alerts are never updated by anyone but this lone individual.

Providing a link to a runbook makes resolving issues a lot faster. It's even better if the link is to a Wiki page, so you can edit it if the runbook isn't up to date.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact