

PagerDuty (YC S10) Makes Sure Your Team Knows When A Server Goes Down - alexsolo
http://techcrunch.com/2010/07/16/yc-funded-pagerduty-makes-sure-your-team-knows-when-a-server-goes-down/

======
andrewljohnson
I'm usually pretty optimistic about new YC start-ups, but I'm not sure the
world needs another uptime monitor.

There are so many free and cheap services like this that it's hard to imagine
what a company might differentiate on to become the "next big thing."

~~~
alexsolo
I agree. That's why we don't do uptime monitoring, or any kind of monitoring
really.

PagerDuty is an alerting system which plugs into any monitoring system
(Pingdom, Nagios, Cloudkick, etc) and alerts your team via phone, SMS and
email when problems are detected. We add advanced alerting features, like
2-way voice and SMS alerts, automatic alert escalation, and on-call duty
scheduling to these existing tools.

You're right, though, in that many people, at first glance, confuse us with a
server monitoring or website pinging system. The "pitch" has gotten better
over time, but it's still something we have to work on.
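
To make the on-call duty scheduling mentioned above concrete, here is a
minimal sketch of a round-robin rotation. It is not PagerDuty's
implementation; the function name, the roster names, the start date and the
seven-day shift length are all assumptions made for illustration.

```python
from datetime import date


def on_call(roster, rotation_start, day, shift_days=7):
    """Return who is on call on `day` in a simple round-robin rotation.

    Hypothetical helper: each person in `roster` takes one shift of
    `shift_days` days, starting from `rotation_start`, then the list repeats.
    """
    if day < rotation_start:
        raise ValueError("day is before the rotation started")
    shifts_elapsed = (day - rotation_start).days // shift_days
    return roster[shifts_elapsed % len(roster)]


# Illustrative data only; none of these names come from the thread.
roster = ["alice", "bob", "carol"]
start = date(2010, 7, 5)

print(on_call(roster, start, date(2010, 7, 16)))
```

A real schedule would also need overrides, time zones and hand-off times,
which this sketch leaves out.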

~~~
btilly
If you haven't already, I'd suggest that you also work on mobile apps that
make phones as loud and annoying as possible. Yeah, it sounds stupid. But a
random BlackBerry can be as loud as a Skytel pager, even if it isn't by
default. And it is worthwhile for someone on duty to make it so.

~~~
lsc
You know what I want? I want some sort of arm band to hold my cellphone (or,
better, a giant Bluetooth vibrator) so that when I sleep, I can be woken by my
pager without waking other people who may also be in the bed.

~~~
epochwolf
Here you go: <http://www.thinkgeek.com/gadgets/cellphone/9e31/>

~~~
lsc
Ooh, thanks! That looks like it might solve my problem.

------
hartror
_Main Question_ What uptime guarantees do you guys make? I saw your answer in
your FAQ, but if I were selling this to the powers that be, I don't know if
your answer would cut it.

A couple of other questions for the team:

Is a Zabbix plugin forthcoming? Do you have to respond to alerts in your
interface, or can our monitoring software let PagerDuty know the alert has
been handled?

Though we already have a lot of the functionality you provide through a few
custom scripts, we don't have the scheduling of engineers, which I've been
meaning to write for a while (but doing it manually with a small team wasn't
enough of an issue). So it's certainly a service I would consider using, if
not on this project, then on my next one.

~~~
alexsolo
We've taken steps to minimize outages as much as possible. The system is
distributed across 3 data centers, with fast automatic rollover in case of a
data center outage. We've architected the system to ensure we never drop
alerts. PagerDuty integrates with monitoring via email or API; if we receive
the message on our end, we guarantee you will be alerted. We've had a few
incidents where we have delayed sending out the phone call or SMS alert for a
few minutes, but we've never dropped an alert.

In terms of setting a formal SLA, we haven't done so mainly because we're not
sure how to go about implementing this. I've checked the SLAs of a few hosting
and cloud providers including AWS, Rackspace, Linode and Slicehost, and I
haven't found a compelling example to work from. Some of these guys don't have
an SLA (they try their best) and the others give you only a portion of your
money back.

The whole point of an SLA is to incentivize us to never go down. In our case,
we know that if we ever go down, we will lose our customers; that's incentive
enough :). Having said that, we may still add an SLA guarantee as part of a
larger "enterprise" pricing plan.

We definitely plan on adding plugins for all the popular monitoring systems.
We've also released an integration API to allow PagerDuty to integrate with
any system that can make an HTTP API call (or call a command-line script that
can do this).

I'm pretty sure Zabbix will work with PagerDuty right now, via the integration
API. We'd love to work with you to set this up. Please send me an email at
alex@pagerduty.com.

~~~
alexsolo
I'd love to hear what some of you think about SLAs. Is it worth implementing
one?

~~~
btilly
There are lies, damned lies, and SLAs. Personally I only find an SLA useful if
it is worthwhile. Most of the SLAs out there aren't. And for good reason. You
should probably offer one but, like any smart company, you shouldn't make the
burden on yourself too heavy.

Suppose someone doesn't respond to a page. Is it because they were too far
asleep to hear the paging device? Because the paging device didn't work?
Because some other problem kept them from working on the page remotely?
Because their carrier blocked the page? Because you broke down? Because the
problems in their system kept them from sending you the information in the
first place?

There are a lot of points of failure. And your service is not one of the more
likely ones to break. Furthermore, if there is a dispute, whose records win?
They didn't respond to a page; your records say they never sent the page. They
blame you. How do you resolve that?

Therefore I'd suggest offering an SLA, but make it something like, "If you
missed a page and are convinced that it was our fault, we'll refund the last X
months." From your point of view it is a no-questions-asked refund policy
that carries with it the consequence that that person is not allowed to sign
up for your service again. (Unless, of course, you're convinced it was your
fault they didn't receive their page.) But whatever you do, be careful not to
accept potential liability for something that likely was their problem.

I would also suggest that you share best practices. For instance, an important
one is that companies need to provide a well-defined escalation path.
Recognize that humans fail (whether because of not waking up, being in the
process of driving, etc.) and so people are unreliable components that need a
fall-back mechanism. Educating your clients about things like this will help
them avoid problems that, in an imperfect world (i.e. the one we live in),
could make them unhappy with you.

~~~
lsc
SLAs with exceptions based on "fault" are meaningless. Either you guarantee
you will keep your shit working, or you don't.

(Either way is fine, really... but arguing over "fault" is not a productive
activity.)

~~~
mseebach
It's not meaningless. If "working" depends on several pieces working, and
only some of them are under your control, you can be in a state of "not
working" without being at fault.

I've had a server go down for a large group of users because of a
misconfigured routing table _between_ them and the server. If we'd had an
expensive SLA, there would have been significant "what the heck is it we're
paying for, then?" discontent.

~~~
lsc
Right. My point is that if you are selling the customer a service, and you say
'I will get you network connectivity' and then, for reasons outside your
control, you don't get them network connectivity, it doesn't make much
difference to the customer if the network is broken because you did something
dumb or because you are getting DDoS'd from China. The point is that the
network is broken.

Last month I paid out almost fourteen grand in SLA credits because I didn't
stop a DDoS within my allowed 0.5% downtime. Was it my fault I got DDoS'd? No.
However, I was the only one in a position to do something about it. (And
really, if I weren't tired and generally an idiot, we would have been down for
an hour rather than 8.)

You do need clear lines, though. If you need connectivity from point A to
point B, that's easy, I can guarantee that. But defining connectivity to 'the
internet' is harder. There are cases where I've got good connectivity to most
places, but you can't get to some ISP in Dallas, because they've hoarked up
the routing table.

Right now, I play that sort of thing by ear. If only one customer is having
the problem, I try to figure out where it is and if I can't figure it out,
it's not that big of a deal to give them a credit. If many customers are
having the problem, well, then I have a problem, and really, it's my job to
figure out where that problem is and to work around it... even if that problem
is a misconfigured router at some other ISP. I mean, really, what is the
customer going to do about that sort of thing?

This is the point of having an SLA; it aligns the interests of the service
provider with the interests of the customer.
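
To put the 0.5% downtime allowance mentioned above in perspective, here is a
rough arithmetic sketch. It assumes the allowance is measured over a 30-day
month, which the comment does not state, and the function names are
hypothetical.

```python
def downtime_budget_hours(allowed_fraction, period_hours):
    """Hours of downtime permitted in one measurement period."""
    return allowed_fraction * period_hours


def overrun_hours(outage_hours, allowed_fraction, period_hours):
    """Hours by which a single outage exceeds the budget (0 if within it)."""
    budget = downtime_budget_hours(allowed_fraction, period_hours)
    return max(0.0, outage_hours - budget)


# Assumption: the SLA period is a 30-day month (720 hours).
PERIOD_HOURS = 30 * 24
ALLOWED = 0.005  # the 0.5% allowance quoted in the comment

print(f"budget: {downtime_budget_hours(ALLOWED, PERIOD_HOURS):.1f} h")
print(f"8 h outage overruns by: {overrun_hours(8, ALLOWED, PERIOD_HOURS):.1f} h")
print(f"1 h outage overruns by: {overrun_hours(1, ALLOWED, PERIOD_HOURS):.1f} h")
```

Under that assumption the budget is 3.6 hours, so an 8-hour outage exceeds it
while a 1-hour outage would not.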

------
arnorhs
Nice. Good job guys.

Interesting. They don't include a free/freemium account, only paid ones with a
free trial. I have been wondering about this.

I've always assumed the best business model is to offer everyone a free plan
that is not limited in time but has fewer features or some other
limit/constraint, like number of users, amount of storage, etc.

I wonder how the two models compare. Because I know a lot of people simply
will not sign up for anything, even if there is a free trial. People just want
something free they can start using without running into walls - a la
Google Docs, Gmail, Basecamp free account, etc...

Any thoughts?

~~~
alexsolo
The main reason we haven't offered a perpetually free account is because we're
a bit different than other SaaS companies: hosting isn't our only cost, we
also have to pay for each phone and SMS alert we send.

The other reason is that we see PagerDuty as solving a real "hair on fire"
problem, and we think if you're one of the businesses that needs this, it's
reasonable to pay a certain amount for the service. I'd like to hear your
thoughts on this.

~~~
lsc
Might I suggest a free account that is limited to email alerts? It probably
wouldn't cost you much, and it wouldn't cut into your 'business class'
business... but it'd be a nice way for small timers to get a taste of your
service monitoring their personal stuff (and then maybe recommend it to the
boss)

~~~
btilly
I'd guess that anyone who needs to receive text messages already knows about
the email to sms gateways that their phone carriers provide.

~~~
koenigdavidmj
But you don't really need this service for that. Nagios sends directly to
those (just like any other email address).

Of course, the two-way SMS that lets you wake up the other guys if needed
would break under this.

------
OmarIsmail
Congrats Alex and Andrew!! That makes two UW SE2006 startups covered on TC :)

Though I think you may already have us beat by being YC funded, the jury is
still out on that!

And I agree with the free trial model over freemium. Your service is worth
paying for. Period. The trial is used for determining if your service actually
works as expected. And you don't get network effects as more people use your
system. So there's really no point in freemium.

Keep up the good work guys! Very exciting!

~~~
agmiklas
Thanks Omar. With Baskar, I think we might also have the distinction of being
the first Comp Eng 2006 startup on TC, too :)

------
mgorsuch
I love PagerDuty, and it has already paid for itself many times over. The fact
that it will phone my house if I miss the text messages has been a big win
over AT&T's lousy coverage in my area.

~~~
agmiklas
Thanks very much! It's always great to hear that PagerDuty is working well for
people.

------
dmor
These guys were the Twilio developer contest winners back in November:
<http://contests.twilio.com/2009/11/outbound-notifications-alerts-1.html>

congrats Andrew, Alex, Baskar and the rest of the team!

~~~
agmiklas
Thanks Danielle! Twilio plays a big role here at PagerDuty. You guys have been
great. More importantly, on the few occasions where there have been service
hiccups, we've never had problems getting hold of someone at Twilio. Can't say
the same about other providers we've tried.

~~~
dmor
You're welcome, and I'm happy to hear you've been satisfied with Twilio's
service - we're really passionate about doing it right and we'll always be
here to help.

Btw, I've left you a "gift" in your Twilio account... whenever you happen to
check your account balance :)

~~~
bpuvanathasan
Thanks Danielle!

------
leelin
What a great name with a nice inside-joke component! The first question that
went through my mind was "did they used to work at Amazon?" Was PagerDuty.com
available or did you guys have to buy it?

It's a similar level of cleverness as Lobby7, if anyone remembers that one...

~~~
agmiklas
Amazingly enough, pagerduty.com was available. We were totally blown away by
that -- thought for sure it would've been taken.

------
epall
We just picked up a couple of real, physical pagers because we're unwilling to
trust SMS delivery.

~~~
agmiklas
This is actually one of the big reasons we built PagerDuty. SMS is not as
reliable as people think -- messages get dropped or delayed by hours all the
time.

We've found the automated phone calls to be a much more reliable way of
getting the alert out. We can tell right away that the message has been
received by asking the listener to press a button on their phone, and repeat
or escalate as needed if we don't hear the tone.
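
The confirm, repeat, then escalate flow described above could be sketched
roughly as follows. This is only an illustration, not PagerDuty's code:
`deliver_alert`, `place_call`, the three-attempt limit and the contact names
are all assumptions, and the phone call is simulated rather than placed.

```python
def deliver_alert(contacts, place_call, max_attempts=3):
    """Call contacts in escalation order until someone confirms.

    `place_call(contact)` is a hypothetical callable that returns True when
    the listener pressed the confirmation key. Each contact is tried up to
    `max_attempts` times before moving to the next one. Returns a
    (contact, attempt_number) tuple, or None if nobody confirmed.
    """
    for contact in contacts:
        for attempt in range(1, max_attempts + 1):
            if place_call(contact):
                return contact, attempt
    return None


# Simulated call outcomes: the primary never confirms, the secondary
# confirms on the second call.
outcomes = {"primary": [False, False, False], "secondary": [False, True]}


def fake_call(contact):
    return outcomes[contact].pop(0)


print(deliver_alert(["primary", "secondary"], fake_call))
```

A production system would also wait between attempts and record each
outcome; the sketch omits both to keep the control flow visible.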

------
paul9290
Nice! Do you have a link to the API details?

~~~
alexsolo
Yep, right here: <http://www.pagerduty.com/docs/api-integration-guide>

------
superk
But can you send to my pager?

------
rgrieselhuber
Congrats! PagerDuty rocks.

------
johns
Congrats guys!

------
fedster
congrats guys - awesome work!

