Rate my startup: PagerDuty.com
83 points by alexsolo on Aug 12, 2009 | 80 comments
If you've ever done pager/on-call duty, you're probably familiar with a tool like PagerDuty. PagerDuty collects email alerts from your monitoring tools and sends out automated phone calls and SMS messages to the person currently on-call. The app supports many of the usual amenities in an alerting system, such as retry of unanswered alerts, on-call rotations, and automatic escalation of unanswered alarms.
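The on-call rotation idea can be sketched in a few lines. This is a minimal Python illustration, assuming a plain round-robin schedule with fixed-length shifts; the function name and the engineer list are hypothetical, and PagerDuty's actual scheduling logic isn't described in this thread.

```python
from datetime import datetime, timedelta


def on_call(engineers, rotation_start, shift_length, at):
    """Return the engineer on call at time `at`.

    Assumes a simple round-robin rotation: equal-length shifts starting at
    `rotation_start`, handed to `engineers` in list order and wrapping around.
    """
    if at < rotation_start:
        raise ValueError("time is before the rotation starts")
    # timedelta // timedelta gives the number of whole shifts elapsed.
    shifts_elapsed = (at - rotation_start) // shift_length
    return engineers[shifts_elapsed % len(engineers)]
```

With a weekly rotation of `["alice", "bob", "carol"]` starting Monday 09:00, a page at 09:00 the following Monday goes to `"bob"`. A real system would layer overrides (vacations, swaps) on top of this base schedule.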

Many large tech companies like Google and Amazon have sophisticated in-house on-call management and alerting systems. We have tried to build something similar for small and medium-sized businesses running critical systems.

One of the big challenges in building PagerDuty was making it simple and intuitive to use. If you find the setup process (or any other part of the system) confusing please let us know.

http://www.pagerduty.com




Looks very nice -- as a feature suggestion, a floating number that automatically routes to the currently on-call staff would be a great addition.

As an aside, I find the best way to avoid regular failures and decrease the necessity for a large operations staff is to put the individuals responsible for building the system on-call for when it fails. Your operations staff is woken up when a server crashes or a hard drive fails, and your engineers get woken up when their code crashes in the middle of the night.

If you don't do this, the costs of writing poor production code have to be levied across departments by management, rather than avoiding externalities entirely and letting engineers and operations deal with the direct impact of their implementation choices.

Of course, this is ultimately a wash if you don't also institute development methodologies to help reduce the number of production-impacting bugs, rather than simply relying on engineers' reactive fixes to one-off issues.


Hmm... what do you mean by "floating number"? What we do right now is route all alarms to the engineer currently on-call. Each person can set up their own notification sequence so they get alerted using any combination of phone calls, SMSes, and emails.


A floating phone number that can be handed to, say, 24-hour support staff, that will automatically direct incoming calls to the on-duty engineer.

The value is that when support staff has real phone numbers available to them, they tend to dial historically responsive individuals directly in order to get a problem resolved, thus creating a negative feedback loop -- if you ignore notifications and phone calls, you get called less by support staff in the future.

Having a floating number would be a useful tool to solve this issue, especially if we could get statistics on who answers and who always ignores calls, and if the number could "call up the chain" automatically when nobody answers.

Of course, the preference is that human staff doesn't need to call anyone, but it still happens.


Hi, this is Andrew, one of the co-founders of PagerDuty.

That's an interesting idea. We've actually thought a bit about adding phone-based triggering to PagerDuty (via a 1-800 number + access code). The idea was to make PagerDuty useful to non-IT businesses like plumbers that also have the concept of out-of-hours on-call duty. From your comment, though, it sounds like this kind of feature would be pretty useful even in the IT world.


If my code crashes in the night, save the core files, logs, etc and send me a mail; unless of course the system is not restartable - then you have bigger issues. Calling an engineer in the night is going to prompt what? Sane code changes? QA'd code?


It forces engineering to internalize the cost of their errors, rather than externalizing those costs by pushing them to our customers and human support/operations staff.

This encourages the following:

- Closer correlation between business robustness requirements and software implementation.

- Adoption of more robust design or methodologies when required by the business. If all that is required is an automated restart, then automatically restart the software. If the same class of bugs causes regular failure, adopt a strategy for avoiding that class of bugs.

- A realistic platform for negotiating external support. If engineering staff is unable to produce software sufficiently robust to support the business requirements, then we must make a business decision as to whether maintaining additional operations staff is cheaper -- in the short and long term -- than correcting the engineering issues.

In my experience, software that fails regularly enough to cost significant engineering resources in responding to those failures is generally broken software. The goal is to not let software get to that point, and to correct it quickly if it does.

It's very easy for your Operations budget to unnecessarily balloon under the load of supporting failure-prone software; engineering has every incentive to externalize the costs of their implementation decisions, while operations has every incentive to increase their headcount and budget by supporting those failure-prone systems.


This is a great tool. I really like the idea of parsing the email address to determine which alarm to use. That makes this straightforward while at the same time making it incredibly easy to integrate with any existing alarming system.

The reply via SMS feature is excellent and something that I always want in any monitoring system.

I'm curious about the security around the email alerts, can anyone send to the specific trigger-alarm@mysite.pagerduty.com or can you add at least a 'FROM:' check?

You probably have this on your road-map, but if a ticketing/worklog system could be integrated with this, you could add an incredible amount of value for folks working on the problem in real time.


This is a good point... the first step is getting ahold of the right person, but after that there will probably need to be some sort of dialog or coordination as the on-call person tries to gather more data about the issue, reproduce the problem, test the fix, etc. Providing that sort of system would certainly be very useful for your customers.

Overall, great idea with PagerDuty, especially if one's business relies (survives) on its website's/system's uptime. Increasing MTBF is often very hard, especially past a certain point, so reducing MTTR is very important for improving availability.
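The MTBF/MTTR point rests on the standard steady-state availability formula, A = MTBF / (MTBF + MTTR). A quick Python sketch makes the trade-off concrete; the numbers in the usage note are illustrative, not taken from the thread.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the system is up.

    mtbf_hours: mean time between failures.
    mttr_hours: mean time to repair (detect, page, fix, recover).
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)
```

With one failure a month (MTBF of about 720 hours), `availability(720, 4)` is roughly 0.9945, while `availability(720, 0.5)` is roughly 0.9993. Getting the right person on the phone quickly attacks the MTTR term directly, without touching MTBF at all.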


Integration with a ticketing system is a great idea. We are thinking of adding support for Lighthouse, so you can coordinate, document and work with other people on resolving triggered alarms.


We don't have a FROM check at the moment. I'm not sure how helpful this would be, since it can be arbitrarily spoofed by an attacker.

The email addresses for alarms are editable; if you get spam to one of these addresses, you can change the address. We were also thinking of adding an option to obscure the email address by adding a uid to the end. An example would be trigger-my-alarm-5j3rt@acme.pagerduty.com.
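The obscured-address idea can be sketched with Python's standard `secrets` module. This is an illustration only: the `trigger-<alarm>-<suffix>@<domain>` shape follows the example in the comment, while the function name and the 5-character default are assumptions. Because the suffix is random rather than derived from the user's account, it reveals nothing about the account (a concern raised further down the thread). Five characters from a 36-symbol alphabet give about 60 million possibilities.

```python
import secrets
import string

# Lowercase letters and digits keep the address readable and case-safe.
ALPHABET = string.ascii_lowercase + string.digits


def obscured_address(alarm_slug: str, domain: str, suffix_len: int = 5) -> str:
    """Build a hard-to-guess trigger address with a random suffix.

    Uses `secrets.choice` (a cryptographically strong RNG) rather than
    anything tied to the user's account ID.
    """
    suffix = "".join(secrets.choice(ALPHABET) for _ in range(suffix_len))
    return f"trigger-{alarm_slug}-{suffix}@{domain}"
```

For example, `obscured_address("my-alarm", "acme.pagerduty.com")` might return something like `trigger-my-alarm-5j3rt@acme.pagerduty.com`. Regenerating the suffix is also a cheap way to "change the address" after it starts receiving spam.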

We also offer regex-based filters on both the subject and the body, so you can configure an alarm to only trigger if a certain keyword appears in the message.


I like the uid idea or even having a unique passphrase in the subject of the email.

Granted, I'm thinking of monitoring systems on a much larger scale where even a single instance of spamming can keep the entire dev team up at night. I do not see this being a problem for the smaller targeted audience that you are going after that will initially only set up a handful of alarms.


I would not use a uid tied to the user's account as a security precaution. Never give out more info than you need to. I would make it a user-defined string.


I think the idea is solid, but since I'm not in your target demographic, the only useful criticism I can offer is:

Your front page is very busy. There are so many different kinds of text and pictures in so many places that I don't know where to look. Does the "feedback" link need to be in such an unusual place? I understand that you want to convince potential customers that your product is a good one, but if you throw too much information at people, they'll ignore everything you have to say.


Thanks for the feedback scott_s. I thought the left middle side of the page was a pretty standard position for the feedback link; I've seen a lot of other sites that do this.

I do like the idea of making the design more minimal. I guess we are somewhat worried about the possibility of not explaining well enough what our product does.


I don't like floating elements like that; it's distracting. When I first load the page, it's just lined up with the 'H', 'w' and the sign-up graphic. That feels out of place. It also violates what a website is, to me: I want to navigate through information, and I bristle when information is forced on me like that.

Consider the amount of "special" things you have:

- Floating feedback link on the left

- "Sign Up" link/tab is highlighted red. (This also means I have to use some small cognitive effort to realize "Home" is black because I'm there, "Sign Up" is highlighted, and the rest are normal links.)

- "call you" in the blue banner is both bold and italic

- Lead-in text is bold

- "calls you" in lead paragraph is bold, italic and green

- green check marks for the features list

- Play button on your UI graphic

There's nothing inherently wrong with any one of these things, but they're all design elements that say "I'm important! Look at me!" But when I see seven competing things evenly spread out over the page saying "Look at me!" my first reaction is to give up and look at nothing.

I think it's possible to include the same amount of information, just presented in a clearer way. Personally, I think the stars of your show are your UI pictures.

Of course, keep in mind I am, like you, a hacker, not a designer. But I'm taking the time to explain what I see because I think you've come up with a neat idea for what sounds like an untapped market, and I'd hate to see you run into problems because of simple presentation issues.


I agree about the positioning of the feedback tab. It's not so much that it is "out of place" but rather that it's just in a bad place at the moment. It's right between "stop sleeping through outages" and the big green sign-up button. This is your headliner content; there should not be a flat, rogue, sideways feedback tab conspicuously thrown over it.

As for the other points: well, this is clearly a 37signals approach. If nothing else, it is good for SEO. I think most users will click on the video first anyway, so all the content acts to complement the video, which is a good thing.


I completely disagree, I think it all works really well together, and just the right things are popping out.


This is awesome! I wish I had something like this back in 2003-2005 when I was one of four people on shared pager duty. Nowadays I just write perfect code ;) (and we have a dedicated and awesome ops team).

A couple of suggestions... looking at your preliminary pricing page, the prices look good, but I can foresee a problem with a misconfigured notification system that accidentally sends way too many alerts. How will you accommodate that? It would be nice to have a grace period or some way of saying, OK, we realize it was a mistake, we'll let this one slide, but next time you will be bumped up automagically to the appropriate pricing plan.

Also, how easy is it to integrate with existing notification tools like Nagios or Cacti? Or is it just tied to email notifications from those systems? That would be a downside if it's the latter. I've worked at orgs with truly crappy email systems that are down more often than not. Sending an email blast for a production system outage is likely to get it canned by the sysadmins.

Finally how does the phone call alerting work? Is it text-to-voice? Or pre-recorded messages? Is it customizable?


We tried to design PagerDuty to minimize the possibility of message storms. We did this by decoupling the sending of alerts from the reception of triggering emails. Specifically, we don't generate new phone calls or SMSes if an already triggered alarm receives a new triggering email.
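The de-duplication rule described here (a new triggering email on an already-triggered alarm does not generate new calls or SMSes) can be sketched as a small state machine. This is a hypothetical Python illustration, assuming an alarm re-arms only once it is resolved; it is not PagerDuty's code.

```python
class Alarm:
    """Hypothetical alarm state: only the first trigger email notifies."""

    def __init__(self, name: str):
        self.name = name
        self.triggered = False
        self.emails_received = 0

    def on_trigger_email(self) -> bool:
        """Record an incoming trigger email.

        Returns True if a notification (phone call/SMS) should go out, and
        False if the email is folded into the incident already in progress.
        """
        self.emails_received += 1
        if self.triggered:
            return False  # already triggered: no new calls or SMSes
        self.triggered = True
        return True

    def resolve(self) -> None:
        """Re-arm the alarm so the next trigger email notifies again."""
        self.triggered = False
```

Five identical emails in a storm produce exactly one notification. The flip side, as sokoloff points out, is that an unrelated fault sent to the same alarm while it is triggered is also suppressed, which is why splitting distinct faults into separate alarms matters.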

As for the pricing plans, we don't plan to bump you to the next plan if you exceed the limits. We have overage prices for each type of message (phone/SMS). So, going slightly over your quota isn't going to cause a really big bump in your fee.
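Why overage pricing softens the bump can be shown with simple arithmetic. This is a minimal sketch with illustrative figures only: the $10 base fee echoes the $120/year mentioned later in the thread, $0.15 per extra SMS echoes the "Additional SMS @ $0.15" quoted later, and the phone rate and quotas are pure assumptions.

```python
def monthly_bill(base_fee, included, used, overage_rate):
    """Base fee plus a per-message charge for each message beyond the quota.

    `included`, `used` and `overage_rate` are dicts keyed by channel
    (e.g. "phone", "sms"). All figures are hypothetical.
    """
    total = base_fee
    for channel, count in used.items():
        extra = max(0, count - included.get(channel, 0))  # only overage is billed
        total += extra * overage_rate[channel]
    return round(total, 2)
```

With `base_fee=10`, quotas of 50 calls and 100 SMSes, overage rates of $0.25 per call and $0.15 per SMS, and usage of 52 calls and 110 SMSes, the bill is $10 + 2 × $0.25 + 10 × $0.15 = $12.00, rather than a jump to the next plan's full price.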

Currently, PagerDuty only integrates with monitoring tools using email. We do have several customers sending their Nagios alerts to us. We are also planning to build a Nagios plugin to better integrate Nagios with our system.

We haven't actually found the email integration to be a big problem so far. What some of our customers do is set up an external network monitoring tool (e.g. Pingdom) to make sure their mail servers are up and running. Those services are in turn hooked up to a PagerDuty alarm so that we can notify the right person if a site's mail server or external connectivity fails.

Phone call alerting is text-to-voice. It tells you which alarm went off, and also reads you the subject of the error. What kind of customization are you thinking of?


I understand the desire to stop message storms, but in my five-minute trial I hit that and perceived it as a limitation. It probably makes the service unusable for me, since I already have a good monitoring/notification/paging system and was trying yours out as an additional notification vector. (That might place me squarely outside your target market, in which case you should ignore this feedback. :) )

Let's say I already have an existing monitoring service that sends emails on network events. I might decide to hook in your service and send my network faults there.

I get "port XX on switch YY link state down" and that gets routed through the system. One minute later I get "office of the CEO video conference network down", but PagerDuty never sends that to my mobile device because I'm busy looking at the one port-down alert.

Yes, I realize there's a technical solution of creating multiple PagerDuty trigger email addresses, but at a minimum, I'd encourage you to be clearer about that feature/limitation.

Overall comments: several UI elements were "not pretty" looking on IE7, and some of the call-to-action graphics ("Create your first alert now") were bright yellow and rectangular yet not clickable. No deal-breakers, and I was certainly able to quickly get set up and get trial alerts coming.


Hi sokoloff, thanks for the feedback.

We actually support having the same email address for multiple alarms in the system. We also have regex trigger rules for each alarm, which are based on the subject and/or body of the trigger email. This means that you can set up a single email address, and trigger one alarm if "port XX on switch YY" is down, and trigger a different alarm if the "CEO video conference network" is down.
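Routing one trigger address to several alarms with subject/body regex rules can be sketched as follows. This is a hypothetical Python illustration using the standard `re` module; the `AlarmRule` shape, the field names, and the example address are assumptions, not PagerDuty's data model.

```python
import re
from dataclasses import dataclass


@dataclass
class AlarmRule:
    """Hypothetical rule: an alarm fires when its patterns match an email."""

    alarm: str
    address: str
    subject_pattern: str = ""  # an empty pattern matches any text
    body_pattern: str = ""

    def matches(self, to_addr: str, subject: str, body: str) -> bool:
        # Addresses are compared case-insensitively; patterns use re.search,
        # so they may match anywhere in the subject/body.
        return (
            self.address.lower() == to_addr.lower()
            and re.search(self.subject_pattern, subject) is not None
            and re.search(self.body_pattern, body) is not None
        )


def alarms_for(rules, to_addr, subject, body):
    """Return the names of every alarm whose rule matches this email."""
    return [r.alarm for r in rules if r.matches(to_addr, subject, body)]
```

With one rule on `net@acme.pagerduty.com` matching `port \w+ on switch \w+` and another matching `CEO video conference`, the switch-port email and the CEO video-conference email land on different alarms, so the second fault still pages someone even while the first alarm is triggered.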

It might be easier to discuss your requirements over email. My address is alex[at]pagerduty[dot]com.


I was disappointed when I found that you can only trigger an alarm via email from a separate monitoring system. I was expecting that there'd at least be a way to trigger an alarm if an email wasn't sent on schedule.

I halfway expected there to be a portion of Nagios' features built in, at least the ones that make sense for services open on the internet: like ping, socket, and HTTP availability.

I guess your intended focus is to be purely a distribution mechanism for existing alerts -- which isn't very useful for just one person -- especially since I'd have to set up separate offsite monitoring systems to generate the alerts I care most about.


I looked into this and really like the idea of the service, but found there was one shortcoming that I couldn't get around. Perhaps I missed something, but I needed a much shorter escalation window than 15 minutes. If my site is down, I cannot wait for 15 minutes for the next person to be notified if the original person doesn't respond. It needs to be within 3-5 minutes, maximum. Perhaps this was configurable and I just missed it.

I really like this service and should the escalation change I would sign up in a heartbeat, it's exactly what I was looking for.

We left a comment on UserVoice, but I thought it couldn't hurt to discuss it here.


Yes, we do support custom escalation times. You can set the amount of time that the Primary on-call has to respond to an alert (after which it is escalated to the secondary). Likewise, you can set the time that the Secondary on-call has to respond to an alert, before it is escalated to the tertiary.

This is configurable under the Settings tab.
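How per-level escalation timeouts translate into "who gets paged when" can be sketched as a timeline. This minimal Python illustration assumes nobody acknowledges (acknowledgment, which would stop the chain, is not modeled), and the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta


def escalation_schedule(triggered_at, levels, timeouts_min):
    """List (time, person) pairs at which each level is first notified.

    levels: people in escalation order, e.g. [primary, secondary, tertiary].
    timeouts_min: minutes each level gets to respond before the alert moves
    on; one entry per level except the last.
    """
    if len(timeouts_min) != len(levels) - 1:
        raise ValueError("need one timeout per level except the last")
    schedule = [(triggered_at, levels[0])]
    t = triggered_at
    for person, minutes in zip(levels[1:], timeouts_min):
        t += timedelta(minutes=minutes)  # the previous level's window expired
        schedule.append((t, person))
    return schedule
```

With 5-minute windows (within the 3-5 minutes the commenter asked for), an alert at 03:00 reaches the primary at 03:00, the secondary at 03:05, and the tertiary at 03:10.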


Clickable link: http://www.pagerduty.com


Meta HN discussion: this problem often crops up when people are doing "rate my startup" type posts. They want to post an intro blurb for HN people, so they can't link to the website. Does this smell of a feature waiting to be added to HN, or is it people using it "wrong"?


I agree that this should be discussed. One wrinkle, though, is that it isn't a problem in a normal web browser with a mouse, because it's really easy to open the link with a script or just a couple of mouse clicks. It's only a problem with mobile browsers like the iPhone's.


I especially liked the idea of taking action from the phone. And the site looks nice and clean. Do you know of any competitors in that field?


AlarmPoint Systems is a competitor, but they sell enterprise software, with the usual enterprise tagline of "call us for pricing and a demo" and a five-figure-plus price tag. Another one is ReliableResponse. This is a smaller company, but also enterprise software.

We have tried to offer a SaaS solution aimed at small and medium businesses.


I'm the owner of Reliable Response. As alexsolo said, we're an enterprise product, meaning we install on the customer's servers, behind the firewall. Pagerduty looks like a great product for smaller teams or companies which prefer a SaaS solution. Our price point is around $10k, plus yearly maintenance and support fees, so it's definitely geared towards bigger teams.

http://www.reliableresponse.net


But the logo sucks. Pretty much a lot. Spoils the impression.


If the worst comment you receive about the service is the logo sucks - you are on to something!


The logo? I didn't even notice it, nor does it matter.

In five to 10 seconds the copy & design conveyed the site's purpose. I got it and thought cool - there is a need for this utility!

good luck!


I second that; the logo looks ancient. It looks completely out of place amidst the web 2.0 look and feel of the rest of the page.


I don't think it looks ancient. I immediately understood they purposely made it look like text on a pager.


It looks a bit retro - but strikes me as being easily recognizable and suits the topic quite well.

Perhaps they should A/B test the logo - try http://www.markiter.com (shameless plug)


I disagree, I like the logo.


Fair enough. We were trying to replicate the text you would see on an old-school pager. We'll probably update the logo in the future.


This is a no-nonsense product that people actually need in the field. I think small companies would find this indispensable, and the preliminary pricing seems really attractive compared to competitors. And once small companies become big, I don't see any reason to switch to more expensive products either! This one has enough features even for the most critical applications.


A couple of suggestions:

1. Have the lightbox title text be more readable. Right now it is white on whatever is on the page.

2. Have the call to action at the bottom of the page favor the sign up process. I would reuse the green button from the top of the page and leave the "learn more" link as text.

3. The email-based integration feature is fairly meaningless on its own, and it is redundant since it is covered three items down by "Alerts via phone call, SMS, and email". I would personally cull the overlap and have just a few key features on the right. Really only put the stellar features that you offer.

4. To keep the logo more web 2.0 like the rest of your interface I would have the text framed in what looks like a pager but make the pager look 2.0ish.

Overall it looks as if this might be successful. I am not really in the market for it but I could see some smaller businesses jumping on board.


Great suggestions, thanks for the feedback.


Your service looks awesome, especially the "we'll keep nagging you until you f...ing respond" aspect. I've looked at a lot of similar services in the last few months (you have no idea how relevant your startup is to me at this time), and I definitely am interested.

One thing I'm wondering about is why you have no international calls. I would love to pay (a lot) more money if you could support international calls. Without that, the service is only half as useful, and I will sleep only half as well at night. :-)


Yep, thanks for the feedback. We will definitely add international calling soon.


This looks very useful in keeping your monitored issues from getting dropped or missed because someone forgot to change the on-call phone number for that week, etc.


I was thinking about this the other day -- something like a Google Voice would work well for a support phone type of operation. I haven't done support in a long while, but last time I did, we would physically hand off a phone, which made for interesting times.

With either the newly oncall or the previously oncall being able to update the number, that handoff now becomes virtual.


There are a few problems with the phone handoff idea, though. One of the problems is there's no safety net: if the guy who has the phone doesn't pick up for some reason, the alert can't auto-escalate. That might be fixable with something like Google Voice, but I don't think they give you a way to manually escalate an alert if you know you can't handle it in time. In both cases, you probably have to manually roll over the number when someone new comes on-call, instead of having the system do that automatically from an on-call schedule.

We've actually been anxiously waiting to try out Google Voice up here. Unfortunately, they haven't yet extended their coverage to include Canada.


Sorry -- I didn't mean to imply that Google Voice was a competitor, rather to say that a Google Voice-like service (with many tweaks) would work perfectly for this goal.


Design is way too similar to the 37signals homepage...that was my initial impression. Will dig deeper into the site now.


I feel like their homepage design gets copied pretty often. Perhaps they're on to something?


If you're curious, for our marketing site, we took inspiration from a couple of different sites, including 37signals, chartbeat.com, goodbarry.com, freshbooks.com and a few others I can't remember off the top of my head. I bet a lot of the aforementioned sites took some inspiration from 37signals as well :).

For the application site, we started with the open-source Web App Theme (http://github.com/pilu/web-app-theme/tree/master) and changed it to suit our needs.


Actually, I take that back; it did immediately remind me of a previous homepage design of theirs... not the current one, though.


It looks very similar to the Basecamp and Campfire designs (ex: http://campfirenow.com/). However, both of those designs are fantastic, and I don't see too much of a problem with using them as design inspiration for a non-competitive company.


It looks like peashootapp.com which looked like basecamphq.com...

http://peashootapp.com/


Wow, it doesn't just look like peashoot. It is a near-exact clone of the site! I'll leave my personal opinion of the matter at the door... just saying!


I thought the exact same thing actually.


This looks really neat but it's not a runner for us as it's a 3rd party hosted solution.

Do you have any plans for a packaged version which companies can host themselves?


Hello ook. Just shoot me an email at alex [at] pagerduty.com, and we can work something out.


Good idea, great start. I needed something like this in my own app and was considering building it myself. I'd prefer to use some other app to deliver these features tho. You should seriously look at a few of these points. I am breaking my comment into 2 logically separate comments (separation of concerns, too much of a coder :D)


Service features

- You are assuming direct consumers of your service. Think of indirect consumers too. Other startups who can piggyback on your offering to add value to their own product.

- Give an API where I can specify the email message, SMS message, various escalation options, and an option to repeat the email/SMS/phone call after some delay if nobody responds. Also consider an SMTP interface. It's not much effort for you if you already have the HTTP API.

- The API should also accept email addresses, mobile numbers and phone numbers. I might be signing up paying customers and using your app to send them alerts. I should be able to do this without having to register each of these email addresses/phone numbers with you. Think about your pricing in this scenario.

- People mentioned ticketing system integration. Great idea. Extend it further: also have an option where you send the response back to the originating application.
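To make the suggested HTTP API concrete, here is a sketch of what a request payload might carry. Everything here is hypothetical: PagerDuty's API is not described in this thread, and the field names are invented to mirror the list of suggestions. The 160-character check assumes a single-segment SMS.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class AlertRequest:
    """Hypothetical payload for the alert API suggested in the comment."""

    email_message: str
    sms_message: str
    contacts: List[str]  # raw emails/phone numbers, not pre-registered users
    escalation_delays_min: List[int] = field(default_factory=list)
    repeat_after_min: int = 0  # 0 means don't repeat if nobody responds
    callback_url: str = ""  # where to send the responder's reply, if anywhere

    def to_json(self) -> str:
        """Validate and serialize the request body."""
        if not self.contacts:
            raise ValueError("at least one contact is required")
        if len(self.sms_message) > 160:  # assumes single-segment SMS
            raise ValueError("SMS text exceeds 160 characters")
        return json.dumps(asdict(self))
```

The `callback_url` field corresponds to the last suggestion: sending the responder's reply back to the originating application rather than only to PagerDuty.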


Thanks for the feedback pradeepgatram. We are planning to add the ability to integrate PagerDuty with external products (most likely products in the monitoring space). Please send me an email to alex[at]pagerduty[dot]com if you'd like to know more.


Geographical expansion to India

- Lots of tech support happens from India. You are too expensive for this market. I would find it hard to sell your service as an addon to my app.

- Telecom is quite cheap out here. Figure out some alliances to bring down the cost (e.g. local SMS gateway, most of them use http api, but there are some SMTP based ones too).

- Additional SMS @ $0.15 translates to ~INR 7. An SMS on my mobile phone plan would cost me ~10 times less. Not a correct comparison, but that's the language buyers use (enterprise buyers use it a lot) :)


Just signed up our small startup for this. We had a server on Amazon go down last week really hard - so hard that monit never said anything and we only knew because an external server does pinging on it every half hour or so. Will give this a try for more prompt phone calls!


Don't show incremental user numbers in URLs for commercial services. It's nice to know you were one of the first to sign up for a community thing, but it's offputting when you're paying :/

The yellow boxes under "Follow these steps to get started..." look a little GoDaddy-ish.


Makes sense, we will fix the user number thing.

As for the getting started steps looking godaddy-ish, is that a compliment? :P


Set this up with some guys in my company and we're giving it a try today. So far, after going through the interface quite a bit, I've found it kind of confusing to reason about who is going to be called when. For example, you have timeouts for escalation and timeouts for consecutive notifications; at what point in a specific user's notification chain does the escalation happen? Is it only after all types of contact have been exhausted? Or will the secondary on-call be notified after the admin-settable X minutes (default 30) no matter what the primary does?


Your "feedback" button on the left of the main page overlaps the "Stop sleeping through outages" text when my browser window is not full-screen. It could be moved a little to look more professional.


Have you considered having a plan where you actually pay "per alarm"? It would be appealing and you can still make easy money. I have in mind $10 per alarm or so.


A thousand times yes! Allow free signup, and send me a $3 SMS when an alarm goes off, avoiding any need for credit card billing.

Another option would be to have a purely prepaid plan, where your account gets debited based on alarms/users per-period, and per-incident. I'm definitely going to use this service while it's free, but I don't think I'd pay $120/year for it unless I had a web startup.


I'm not sure how that would work. Would you be charged $10 each time an alarm is triggered and you get SMSed or phoned?


Awesome...I was just wishing for something like this yesterday...will definitely be giving it a shot.


Looks like you're getting great feedback here, congrats. Without giving away proprietary info, is it possible to give some technical details such as what backend technologies you use, and who you're using to send SMS & phone calls?


It seems easy to use but also flexible enough. What are your plans for pricing?


The pricing page is here: http://www.pagerduty.com/plans

It is linked from the FAQ, under the question "How much are you planning to charge for PagerDuty?". We didn't link to the pricing page from the main page because we want as many people as possible to try the product out while we are in beta.


So what software manages the pagerduty for pagerduty? :-)


I've been thinking of writing a blog post about this for a while... probably with the title "Who Watches the Watchers" :).

We use Wormly to monitor our site, email and DNS. We also have an exception reporting system which alerts us about any 500 errors in the site. We also alert if phone call or SMS messages sit in our event queues for more than 3 minutes after the scheduled delivery time.
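The queue-lag check ("alert if phone call or SMS messages sit in our event queues for more than 3 minutes after the scheduled delivery time") can be sketched as a small watchdog function. The 3-minute threshold comes from the comment; the `(message_id, scheduled_at)` queue shape and the function name are hypothetical.

```python
from datetime import datetime, timedelta

MAX_LAG = timedelta(minutes=3)  # threshold quoted in the comment


def stale_messages(queue, now):
    """Return IDs of queued messages still undelivered more than MAX_LAG
    past their scheduled delivery time.

    queue: iterable of (message_id, scheduled_at) pairs (hypothetical shape).
    Messages scheduled in the future are never considered stale.
    """
    return [msg_id for msg_id, scheduled_at in queue if now - scheduled_at > MAX_LAG]
```

A periodic job would call this and, if the result is non-empty, page the whole team through a channel independent of the queue being watched, since the queue itself is the thing that may be stuck.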

We also have redundant data centers and rapid rollover to the backup systems in the event of a data center outage.

We don't use PagerDuty to do the alerting and schedule the on-call though. We alert everybody in our team via SMS and phone if any of the aforementioned alerts go off.


Do your phone calls and SMSes work outside the US?


We support two-way (i.e. we send you a message on an alarm, and you send back a message telling us to suppress, resolve, or escalate the incident) SMSing to most of the world. Unfortunately, we can only do phone calls to the US and Canada right now. We're looking at adding international phone calls in the near future, though.
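Two-way SMS boils down to parsing short replies into actions. The three actions (suppress, resolve, escalate) come from the comment; the "command incident-number" reply format and the function name are hypothetical conventions for this Python sketch.

```python
COMMANDS = {"suppress", "resolve", "escalate"}


def parse_reply(text: str):
    """Parse an SMS reply such as 'resolve 42' into (command, incident_id).

    Matching is case-insensitive and ignores surrounding whitespace.
    Returns None if the reply is not understood, so the caller can text
    back a help message instead of guessing.
    """
    parts = text.strip().lower().split()
    if len(parts) != 2 or parts[0] not in COMMANDS or not parts[1].isdecimal():
        return None
    return parts[0], int(parts[1])
```

For example, `parse_reply("Resolve 42")` yields `("resolve", 42)`, while a reply like `"ok thanks"` yields `None`. Rejecting anything ambiguous is deliberate: a misread reply that resolves the wrong incident is worse than asking again.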


SMS works internationally. Phone calls only work for US/Canada currently. We plan to expand that internationally as well.


Nice idea! Good luck.

I would make "phone calls, SMSes, and emails" much more prominent though. It took me too long to find how I would actually be alerted. There is a lot of fuzzy text on the site that makes it hard to scan.


love the idea, great work.

