There are so many free and cheap services like this that it's hard to imagine what a company might differentiate on to become the "next big thing."
PagerDuty is an alerting system that plugs into any monitoring system (Pingdom, Nagios, Cloudkick, etc.) and alerts your team via phone, SMS and email when problems are detected. We add advanced alerting features, like 2-way voice and SMS alerts, automatic alert escalation, and on-call duty scheduling, to these existing tools.
You're right though, in that many people, at first glance, confuse us with a server monitoring or website pinging system. The "pitch" has gotten better over time, but it's still something we have to keep working on.
Pingdom already does that. I'm not clear what you're offering here...
Actually, Pingdom is one of the most common services used in conjunction with PagerDuty.
At a higher level, what we're trying to provide is an on-call management and alert dispatching tool. PagerDuty lets you control who is notified when problems occur, and how and when they are notified. In contrast, monitoring tools like Pingdom and Nagios focus more on detecting problems. While they have some native alerting functionality, we think they can function all the better with PagerDuty's advanced alerting.
Maybe you should highlight that? My first impression was "what, yet another Pingdom?" Definitely needs a bit of fine-tuning there.
Wondering who will be your main target customer? Any website that has more than one sysadmin?
The only thing that this does that is really new is two-way SMS.
I also think PagerDuty's ability to graphically define the on-call schedule and escalation rules is much nicer than mucking around with Nagios's configuration files, but I'm a bit biased :)
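To make the scheduling idea concrete, here is a minimal sketch of a round-robin on-call rotation. It is not how PagerDuty implements schedules; the function name, the engineer names, and the fixed 7-day shift length are all assumptions for illustration.

```python
from datetime import date

def on_call(engineers, rotation_start, day, shift_days=7):
    """Return who is on call on `day` in a simple round-robin rotation.

    Assumes fixed-length shifts starting at `rotation_start`
    (a simplification: no overrides, swaps, or time zones).
    """
    if day < rotation_start:
        raise ValueError("day is before the rotation starts")
    shifts_elapsed = (day - rotation_start).days // shift_days
    return engineers[shifts_elapsed % len(engineers)]

team = ["alice", "bob", "carol"]   # hypothetical engineers
start = date(2010, 1, 4)           # hypothetical rotation start

for day in (date(2010, 1, 4), date(2010, 1, 12), date(2010, 1, 25)):
    print(day.isoformat(), on_call(team, start, day))
```

The last date falls in the fourth shift, so the rotation wraps back around to the first engineer.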
Couple of other questions for the team:
Is a Zabbix plugin forthcoming? Do you have to respond to alerts in your interface, or can our monitoring software let PagerDuty know the alert has been handled?
Though we already have a lot of the functionality you provide through a few custom scripts, we don't have the scheduling of engineers, which I've been meaning to write for a while (doing it manually with a small team hasn't been enough of an issue). So it's certainly a service I would consider using, if not on this project, then on my next one.
In terms of setting a formal SLA, we haven't done so mainly because we're not sure how to go about implementing this. I've checked the SLAs of a few hosting and cloud providers including AWS, Rackspace, Linode and Slicehost, and I haven't found a compelling example to work from. Some of these guys don't have an SLA (they try their best) and the others give you only a portion of your money back.
The whole point of an SLA is to incentivize us to never go down. In our case, we know that if we ever go down, we will lose our customers; that's incentive enough :). Having said that, we may still add an SLA guarantee as part of a larger "enterprise" pricing plan.
We definitely plan on adding plugins for all the popular monitoring systems. We've also released an integration API to allow PagerDuty to integrate with any system that can make an HTTP API call (or call a command-line script that can do this).
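As a rough illustration of what "any system that can make an HTTP API call" might look like from the monitoring side, here is a sketch that builds a JSON POST request. The URL and every field name below are placeholders invented for this example, not PagerDuty's documented API; check the real integration docs before wiring anything up.

```python
import json
import urllib.request

# Placeholder endpoint: NOT a real PagerDuty URL.
EVENTS_URL = "https://alerts.example.com/events"

def build_event_request(integration_key, action, summary, url=EVENTS_URL):
    """Build (but do not send) an HTTP POST describing an alert event.

    `integration_key`, `action`, and `summary` are hypothetical field
    names chosen for this sketch.
    """
    payload = {
        "integration_key": integration_key,
        "action": action,      # e.g. "trigger" when a check fails
        "summary": summary,
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_event_request("demo-key", "trigger", "disk full on db1")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(req, timeout=10)
```

A monitoring tool that can only run a command-line script could call a small script like this from its alert hook.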
I'm pretty sure Zabbix will work with PagerDuty right now, via the integration API. We'd love to work with you to set this up. Please send me an email at email@example.com.
Suppose someone doesn't respond to a page. Is it because they were too far asleep to hear the paging device? Because the paging device didn't work? Because some other problem kept them from working on the page remotely? Because their carrier blocked the page? Because you broke down? Because the problems in their system kept them from sending you the information in the first place?
There are a lot of points of failure, and your service is not one of the more likely ones to break. Furthermore, if there is a dispute, whose records win? They didn't respond to a page, and your records say they never sent the page. They blame you. How do you resolve that?
Therefore I'd suggest offering an SLA, but make it something like, "If you missed a page and are convinced that it was our fault, we'll refund the last X months." From your point of view it is a no-questions-asked refund policy, with the consequence that the person who claims it is not allowed to sign up for your service again. (Unless, of course, you're convinced it was your fault they didn't receive their page.) But whatever you do, be careful not to accept potential liability for something that was likely their problem.
I would also suggest that you share best practices. An important one, for instance, is that companies need to provide a well-defined escalation path. Recognize that humans fail (not waking up, being in the middle of driving, etc.), so people are unreliable components that need a fallback mechanism. Educating your clients about things like this will help them avoid problems that, in an imperfect world (i.e. the one we live in), could make them unhappy with you.
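A well-defined escalation path can be written down as data. The sketch below is one way to model it, assuming each tier gets a fixed number of minutes to acknowledge before the next tier is paged; the tier names and timings are made up.

```python
from dataclasses import dataclass

@dataclass
class Tier:
    contact: str
    wait_minutes: int  # how long this tier has to acknowledge

def who_to_notify(tiers, minutes_unacknowledged):
    """Return the contact responsible after this long without an ack.

    Assumption: once the last tier is reached, it stays responsible.
    """
    deadline = 0
    for tier in tiers:
        deadline += tier.wait_minutes
        if minutes_unacknowledged < deadline:
            return tier.contact
    return tiers[-1].contact

# Hypothetical policy: primary, then secondary, then a manager.
policy = [Tier("primary", 15), Tier("secondary", 15), Tier("manager", 30)]

for minutes in (0, 20, 45, 90):
    print(minutes, who_to_notify(policy, minutes))
```

The point of writing it this way is that no single person is a single point of failure: every tier has a successor, except the last one.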
(Either way is fine, really... but arguing over "fault" is not a productive activity.)
I've had a server go down for a large group of users because of a malconfigured routing table between them and the server. If we'd had an expensive SLA, there would have been significant "what the heck is it we're paying for, then?" discontent.
Last month I paid out almost fourteen grand in SLA credits because I didn't stop a DDoS within my allowed 0.5% downtime. Was it my fault I got DDoS'd? No. However, I was the only one in a position to do something about it. (And really, if I wasn't tired and generally an idiot, we would have been down for an hour rather than 8.)
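For anyone wondering how tight a 0.5% downtime allowance is, the arithmetic can be sketched as below. The 30-day measurement period is an assumption (the period the SLA is measured over isn't stated here), and the function names are made up for illustration.

```python
PERIOD_HOURS = 30 * 24     # assumption: SLA measured over a 30-day month
ALLOWED_FRACTION = 0.005   # the 0.5% downtime allowance

def allowed_downtime_hours(allowed_fraction, period_hours):
    """Downtime budget in hours for the period."""
    return allowed_fraction * period_hours

def breaches_sla(outage_hours, allowed_fraction, period_hours):
    """True if a single outage of this length exceeds the budget."""
    return outage_hours > allowed_downtime_hours(allowed_fraction, period_hours)

budget = allowed_downtime_hours(ALLOWED_FRACTION, PERIOD_HOURS)
print(f"downtime budget: {budget:.1f} hours")
for outage in (1, 8):
    print(f"{outage}h outage breaches SLA:",
          breaches_sla(outage, ALLOWED_FRACTION, PERIOD_HOURS))
```

Under that assumption the budget is about 3.6 hours a month, so a one-hour outage stays inside it and an eight-hour one does not.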
You do need clear lines, though. If you need connectivity from point A to point B, that's easy, I can guarantee that. But defining connectivity to 'the internet' is harder. There are cases where I've got good connectivity to most places, but you can't get to some ISP in Dallas, because they've hoarked up the routing table.
Right now, I play that sort of thing by ear. If only one customer is having the problem, I try to figure out where it is and if I can't figure it out, it's not that big of a deal to give them a credit. If many customers are having the problem, well, then I have a problem, and really, it's my job to figure out where that problem is and to work around it... even if that problem is a misconfigured router at some other ISP. I mean, really, what is the customer going to do about that sort of thing?
This is the point of having an SLA; it aligns the interests of the service provider with the interests of the customer.
However, what ultimately made the deal was not an SLA, but a new vendor that showed substantially deeper pockets than we expected. A meeting was organized with their sales guy and FOUR suits showed up in a black sedan. Most gangsta display of power, and our glass cubicles were gassed down with the musk of cigar, Brut and Drakkar Noir.
Interesting. They don't include a free/freemium account, only paid ones with a free trial. I have been wondering about this.
I've always assumed the best business model is to offer everyone a free plan that is not time-limited but has fewer features or some other constraint, like number of users, amount of storage, etc.
I wonder how the two models compare, because I know a lot of people simply will not sign up for anything, even if there is a free trial. People just want something free they can start using without running into walls, a la Google Docs, Gmail, the Basecamp free account, etc.
The other reason is that we see PagerDuty as solving a real "hair on fire" problem, and we think if you're one of the businesses that needs this, it's reasonable to pay a certain amount for the service. I'd like to hear your thoughts on this.
Your target audience/market is obviously not the casual user/blogger type so it makes perfect sense.
Of course, the two-way SMS that lets you wake up the other guys if needed would break under this.
PagerDuty starts at $12/month. For a personal account that's a huge chasm for me to cross, but for business it feels like nothing. It's probably not even my money.
In business you're just not used to getting much for free, especially service: my bank charges me to write a check, my ISP charges me more to get business DSL to the office than home DSL, etc.
Maybe we're just resigned to that, but having a no-free-account policy just rides that wave and presumably increases profits (forces conversions to paid, ensures no loss-making free accounts).
My thinking is that if more people are using the service, that also doubles as an advertisement if the users tell some of their friends, recommend to coworkers etc.
Under the original free plans we saw a lot of people just signing up for free to play around with it, only to end up paying a couple weeks later. By making it an actual free trial system, we're able to put a lot more pressure and messaging throughout the upgrade process.
Though I think you may already have us beat by being YC funded, the jury is still out on that!
And I agree with the free trial model over freemium. Your service is worth paying for. Period. The trial is for determining whether your service actually works as expected. And you don't get network effects as more people use your system, so there's really no point in freemium.
Keep up the good work guys! Very exciting!
Congrats Andrew, Alex, Baskar and the rest of the team!
Btw, I've left you a "gift" in your Twilio account... whenever you happen to check your account balance :)
We've found the automated phone calls to be a much more reliable way of getting the alert out. We can tell right away that the message has been received by asking the listener to press a button on their phone, and repeat or escalate as needed if we don't hear the tone.
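The "repeat or escalate" loop described above could look roughly like this. It is a sketch, not our production code: `place_call` is a hypothetical callable that rings one contact and returns True only if the keypress tone was heard, and the contact names and retry count are made up.

```python
def deliver_alert(place_call, contacts, attempts_per_contact=2):
    """Call contacts in order until one acknowledges with a keypress.

    Returns (contact, attempt_number) for whoever acknowledged,
    or (None, 0) if nobody did.
    """
    for contact in contacts:
        for attempt in range(1, attempts_per_contact + 1):
            if place_call(contact):
                return contact, attempt
    return None, 0

# Fake phone system for the demo: scripted answers per contact.
scripted = {"primary": [False, False], "secondary": [False, True]}

def fake_call(contact):
    return scripted[contact].pop(0)

print(deliver_alert(fake_call, ["primary", "secondary"]))
```

In the demo the primary never presses the button, so the alert moves on and the secondary acknowledges on the second call.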
It's a similar level of cleverness as Lobby7, if anyone remembers that one...