
We use Pingdom to measure external uptime, so network issues count as downtime as well.

Do you have any other questions about methodology? I'd be happy to elaborate.



No, not really, although again the value of the stats depends on what is being monitored and how. For example, Pingdom can report that an SMTP interface looks just great while at the same time it is impossible to send mail on that port because of a bad disk or another failure somewhere else. We use Pingdom too. It is a great tool, and transparency is definitely great, but any number of user-impacting events can and do go unnoticed, depending on how you are monitoring and what you are monitoring.

A great example: as I write this, I just realized that our public Pingdom network status reports are offline... I count this as downtime against our global availability stats, but it isn't an event that would show up in my Pingdom reports :)


Stats are back... weird :-) http://about.hover.com/networkstatus (As an aside, making it easy for customers to see uptime/downtime and network events eases the burden on customer service and makes it much easier for potential customers to check your credibility. It's easy to implement and definitely a plus for the business overall.)


That's nice but feels more like a status page to me than a way to gauge long-term uptime (which is what we were going for here).

For our application outages, the correlation has been strong: if Pingdom can't get a 200 OK on the test pages we've set up, the app has been down. And I don't think we've had much, if anything, slip through where the page returned a 200 OK but the app was still down.

I'm sure we're still off by a couple of minutes here and there, but the big picture should be quite accurate.
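To put "a couple of minutes" in perspective, here is a rough arithmetic sketch. The 30-day window and the function name `uptime_percent` are assumptions for illustration, not anything taken from Pingdom or from the reports discussed here.

```python
MINUTES_PER_30_DAYS = 30 * 24 * 60  # assumed reporting window: 43,200 minutes


def uptime_percent(downtime_minutes: float, period_minutes: float) -> float:
    """Return uptime as a percentage of the reporting period."""
    if period_minutes <= 0:
        raise ValueError("period_minutes must be positive")
    return 100.0 * (1 - downtime_minutes / period_minutes)


# Two minutes of mismeasured downtime over an assumed 30-day window
error_free = uptime_percent(0, MINUTES_PER_30_DAYS)
with_error = uptime_percent(2, MINUTES_PER_30_DAYS)
print(f"{error_free:.4f}% vs {with_error:.4f}%")
```

Under these assumptions, two minutes moves the figure by less than 0.005 percentage points, which only matters once the claim is in "four nines" territory.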


It's both, but primarily status. If you drill down, you'll see the long-term stats.


Yeah, but what is Pingdom hitting? I.e., is it the front page of each "app" or is it running something per app that tests the entire stack?


(I work at 37signals)

Our test logs in, causes some data to be fetched from the database, and renders a page which we then check against what it should return.

We haven’t (to date) had either any false positives or false negatives (when it alerts, the site is really down, and if it doesn’t alert, the site is really up).

This obviously isn’t a replacement for functional or integration tests to ensure that a commit doesn’t cause a piece of the app to stop working, but it does test the full infrastructure stack to make sure that it’s performing the way we expect it to.
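The difference between a status-only check and a check like the one described above can be sketched roughly as follows. This is not 37signals' or Pingdom's actual implementation: `Response`, `shallow_check`, `deep_check`, and the "Projects:" marker are all hypothetical names, and the page fetch is passed in as a plain callable so the sketch runs without a network.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Response:
    """Hypothetical stand-in for an HTTP response."""
    status: int
    body: str


def shallow_check(fetch: Callable[[], Response]) -> bool:
    """Pass if the page answers 200 OK, whatever it contains."""
    return fetch().status == 200


def deep_check(fetch: Callable[[], Response], expected_marker: str) -> bool:
    """Pass only if the page answers 200 OK and contains content
    that could only have been rendered from database-backed data."""
    response = fetch()
    return response.status == 200 and expected_marker in response.body


# A page that returns 200 OK but failed to load its data
def broken_page() -> Response:
    return Response(200, "Sorry, something went wrong")


print(shallow_check(broken_page))            # status-only check is fooled
print(deep_check(broken_page, "Projects:"))  # content check catches it
```

The design choice is simply to assert on something the page can only show when login, database, and rendering all worked, rather than on the status code alone.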


Thanks Noah. If it's hitting the DB then that's great.

I'm just trying to point out that the definition of "uptime" is hazy at best. :)



