

Tell HN: false alarms from Pingdom - paraschopra

We were changing some settings on our servers today when all of a sudden Pingdom sent a flood of messages alerting us that the servers were down. I was like OMG-we-are-dead! Thankfully, it was a false alarm. Just wanted to tell Pingdom users on HN not to worry about those false alarms. The irony of false alarms combined with Pingdom.com itself being unavailable hasn't been lost on me, though.

PS: users' tweets show the seriousness of this error. Sample a few:

> SysAdmins across the world are having heart attacks due to @Pingdom issues. #pingdom

> Thanks Pingdom. We haven't had a good fire drill in a while. I didn't need to finish lunch anyway. #pingdom @pingdom

> @pingdom - you're killing me with the false positive alerts @ 5am

> @pingdom just got an alert that my sites are down since Jan 1, 1970. My website is older than I am!

Because of this serious goof-up, we are thinking of switching monitoring services. Do you know any good alternative to Pingdom? Heard Watchmouse is good.
======
pedoh
Any time you put your eggs in one basket you're setting yourself up for
potential problems. Use Pingdom, but then add in either a direct competitor or
someone else like Keynote or Gomez (no endorsements of any of the above by me,
by the way). Then when one goes down, there's only momentary panic, and when
they're all going off, you can panic for real.
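
To make the "only panic when they're all going off" idea concrete, here is a minimal sketch of combining verdicts from several independent monitors. The monitor names, the `classify` function, and the quorum of 2 are all hypothetical choices for illustration; none of this is an actual Pingdom, Keynote, or Gomez API.

```python
from typing import Mapping


def classify(reports: Mapping[str, bool], quorum: int = 2) -> str:
    """Combine per-monitor verdicts into a single decision.

    `reports` maps a monitor name (hypothetical) to True when that
    monitor currently says the site is DOWN.
    """
    down_votes = sum(1 for is_down in reports.values() if is_down)
    if down_votes == 0:
        return "ok"
    if down_votes >= quorum:
        # Independent services agree: treat it as a real outage.
        return "page"
    # Only one service is complaining: the monitor itself may be broken.
    return "suspect-monitor"


if __name__ == "__main__":
    print(classify({"monitor_a": True, "monitor_b": False}))
    print(classify({"monitor_a": True, "monitor_b": True}))
```

With two monitors and a quorum of 2, a lone alert is treated as "momentary panic" about the monitor rather than the site, and a page only fires once both agree.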

I wouldn't jump off of Pingdom because you think someone else is good. Unless
that someone else can provide some pretty hard evidence, their architecture
may be just as susceptible to a hardware failure as Pingdom.

I would expect Pingdom to write up a report on what happened and write up what
they plan to do to prevent it from happening again. I don't see it anywhere
on the Pingdom site yet (I'm not a customer).

~~~
pedoh
Here's Pingdom's response:

[http://blog.pingdom.com/2011/06/22/about-yesterday%E2%80%99s...](http://blog.pingdom.com/2011/06/22/about-yesterday%E2%80%99s-pingdom-outage/)

If I were a customer, I would think this is a good response. It's essentially,
"Here's what went wrong, and here's how we're going to fix it." If I were
Pingdom, I'd go one step further once they've made the changes they're going
to make. I'd write another post that says, "Remember that outage? Here are
the steps we said we would take. Today, we're announcing that 100% of those
steps have been taken, and to prove it, we just did a fire drill that
replicated that outage, and everything is fine."

------
someone13
Just so you know, apparently Pingdom had a hardware failure at their
datacenter, so the site is unavailable (though no data will be lost)[1]. And,
if your site was down in the past hour, then you'll be getting delayed
alerts[2].

[1] <https://twitter.com/#!/pingdom/status/83191625874018306>

[2] <https://twitter.com/#!/pingdom/status/83253096096088064>

