
Ask HN: What is it like to work on pager duty? - cesarbs
I might be switching teams at the company where I work. The new team seems quite interesting to be part of, but they have pager duty (they cycle and each developer is on pager duty for a week). I was hoping to get some input from folks here who have worked on that sort of team, to get an idea of what it's like to work in a team like this. Does it impact overall health too much? Would you say it's an interesting experience to go through?
======
akg_67
I did pager duty (on-call) for 8 years in my last job as part of a professional
services team. I actually enjoyed it, as I thrive in stressful situations.
Having a good team and boss also helped.

Some of the suggestions based on my experience:

1\. Make sure there are enough team members in the on-call rotation that
you get your 1 week on-call every 6 to 8 weeks or more. If on-call is too
frequent, it will be disruptive to your normal life and you and your family
will resent the job.

2\. If your on-call only requires remote phone/access support, make sure the
company picks up the tab for your phone and mobile internet. If, like mine,
on-call requires onsite visits, make sure the company properly compensates
you for mileage and auto expenses. Also get the company to pay for on-call,
either in cash or with time off. You can also work these out informally with
your team and boss. My company paid for my cell service, home internet, and
provided an auto allowance.

3\. You should have a place in your house where you can quickly go, talk, and
work in the middle of the night without disturbing the rest of the family.

4\. Make sure your team and boss are okay with you coming to work late or
skipping days at the office when you are on-call and receive calls in the
middle of the night. My worst on-call calls came between 2:00 and 4:00 AM,
when I was typically in deep sleep.

5\. Avoid scheduling anything important during the on-call week. And let
everyone know that you may have to drop everything else if you receive a call.

6\. During the on-call week, relax: don't stress too much, don't do too
much regular work, don't force yourself to keep a normal day-and-night
schedule; go with the flow.

7\. Avoid going to places like movie theaters where you can't take a phone
call or quickly get out.

8\. Don't get anxious during the on-call week. I had co-workers who used to
have panic attacks during the on-call week.

~~~
jmsduran
It seems, especially for major corporations, that on-call/pager duty is
quickly becoming the norm for software development teams. I do agree that
pager duty is a symptom of a fundamental flaw within the system/architecture.
I think it would be in a company's best interest to devote time to improving
the reliability and stability of their infrastructure, instead of relying on
the band-aid approach that pager duty seems to be.

Regarding #8 though, when you are pressured to resolve a complex issue within
a short time window, it can absolutely induce a sense of panic for those who
do not handle stress well. In my opinion, the remedy for this would be to
have two individuals designated as on-call at a time, assuming the team is
large enough.

~~~
devicenull
> It seems, especially for major corporations, that on-call/pager duty is
> quickly becoming the norm for software development teams. I do agree that
> pager duty is a symptom of a fundamental flaw within the
> system/architecture. I think it would be in a company's best interest to
> devote time to improving the reliability and stability of their
> infrastructure, instead of relying on the band-aid approach that pager duty
> seems to be.

I can't see there ever being a time when there is no on-call requirement. You
always need someone standing by in case of some terrible disaster that cannot
be handled automatically. Better to have this as a formal responsibility that
never gets used than to not have it and end up with an extended downtime
because you can't contact anyone.

That being said, if you're getting paged continuously during on-call, then
there's a bigger problem that needs to be resolved.

~~~
_delirium
> You always need someone standing by in case of some terrible disaster that
> cannot be handled automatically.

If it's a _really_ terrible disaster, a once-a-decade kind of thing where
everything goes haywire and you need as many staff as possible to get online
ASAP, then yes. But aren't we talking more about the kinds of "disasters" that
happen once a month or so, and can be handled by a few staff (not waking up
the whole team)? To me that sounds more like just staffing for normal
operations.

At large engineering companies this is typically handled via literally having
someone standing by, i.e. formally on duty, rather than having off-duty
employees be on pager duty. There'll be at least a bare-bones staff on the
after-hours shift (probably not in all offices, but in some kind of 24/7
operations center), enough of a staff that reasonably foreseeable things can
be handled. Of course there are some pros and cons to that from an employee
perspective. On the one hand the night shift isn't that pleasant, but on the
other hand your responsibilities are at least formally limited to 40 hours/wk;
if you're on night shift one week, you don't come in during the day, or carry
a pager during the day.

~~~
kgermino
> and can be handled by a few staff (not waking up the whole team).

That's what this is, though. With every setup I've seen there's a rotation of
primary and secondary pagers for each team. When something breaks the primary
is paged; if they don't answer within a few minutes the secondary is paged. If
they need outside help they can page an individual person by name or just a
team. For example, if I need help from a DBA, I page the DBA team and that
team's primary is paged.

If you have 4-5 incidents a month this gives you a team available to handle
any overnight issues without having to hire a bunch of people to twiddle their
thumbs 90% of the time.
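The primary/secondary escalation described above can be sketched as a loop over an ordered chain of responders. This is a minimal Python sketch, not how any particular paging product works: the `Responder` class, the `escalate` function, and the 5-minute acknowledgement timeout (standing in for "a few minutes") are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Responder:
    name: str
    # Minutes until this person acknowledges the page; None means they never do.
    ack_after_min: Optional[int]


def escalate(chain: List[Responder], timeout_min: int = 5) -> Tuple[Optional[str], int]:
    """Page responders in order (e.g. primary, then secondary).

    A page that isn't acknowledged within timeout_min minutes escalates to
    the next person. Returns (who acknowledged, minutes elapsed); the name is
    None if nobody in the chain acknowledged.
    """
    elapsed = 0
    for responder in chain:
        ack = responder.ack_after_min
        if ack is not None and ack <= timeout_min:
            return responder.name, elapsed + ack
        elapsed += timeout_min  # this page timed out; move on
    return None, elapsed


# Primary sleeps through the page, secondary acks 2 minutes after being paged.
print(escalate([Responder("primary", None), Responder("secondary", 2)]))
```

A real setup would also handle paging a whole team (like the DBA example) by looking up that team's own chain, but the core rule is the same: escalate on timeout.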

------
taco_emoji
It heavily depends on the quality of management. For a system that needs 24/7
uptime, off-hours support issues are inevitable and it's reasonable for a
company to have the people with the best ability to troubleshoot (developers)
handle that stuff when it comes up.

HOWEVER: Is management dedicated to making sure those issues are rare? Namely:

1) Do they give you the time and leeway to fix technical debt that causes
these things to pop up?

2) Are there reliable code review, continuous integration, and QA processes
that ensure that fewer bugs make it to production in the first place?

3) Is it easy to roll back a deployment at 2am on a Saturday?

4) Is there a well-maintained schedule of IT and development changes, with
impact assessments, so that people don't page you during a downtime they
should've known about? And so that, after a failure, you can view historical
data and determine the causes of a failure and effectively develop a plan for
mitigating it in the future?

5) Can YOU page the DBAs at 2am on a Saturday when you need their help? Are
they going to be rude when they call you back, or are they going to recognize
that the health of the systems is their job, too?

6) Do devs willingly, openly own up to the bugs in their code, in front of
their bosses, without fear of serious reprimand? Does the company recognize
that mistakes are inevitable and that process and communication are better
than blame-finding for preventing failures?

The answers to all of these questions (and more) will, directly or indirectly,
indicate the frequency and overall stress of carrying a pager for a given
company. (They're good questions regardless of pager duty, too.)

~~~
yanowitz
I agree with these points.

I'm a big fan of developers being on call for their application. It puts the
pain where it belongs--with those building the systems (modulo lower-level
errors--such as power failures or network outages--those should go to the
appropriate place).

However, that pain should only rest with the development team if they also
have the freedom and will to spend time dealing with it. They will have to spend
time (either a constant tax or, more likely, occasional sprints) to
reduce operational pain. They are in the best position to reason about the
tradeoffs and pragmatically reason about priorities.

In my experience, this produces the highest code quality and the highest team
morale. I also like the rule -- if you're paged during the night, sleep in the
next day.

~~~
logicalmind
I only agree with this if the developers also get to choose the deadlines for
the app/features they build. All too often, the higher-ups want a feature but
don't want to take the development time necessary to build it properly. And
shortcuts are taken all over the place to get the feature done in time. The
higher-ups don't care about doing things correctly and don't feel any pain
when things go wrong, which makes them more likely to keep pushing this sort
of behavior.

I think developers should build code that fails in predictable ways with
useful error messages that a support team can use to solve problems. If the
support team cannot fix the problem with the information provided, only then
should a developer need to get involved. This way, developers only feel pain
if the code they write fails in ways support cannot handle.

~~~
yanowitz
I think we should automate recovery for error conditions where possible and
change business processes to be automatable where not. If neither can be done
for some pressing reason, then the failure condition should be defined as an
expected condition that needs dedicated staff to recover. But that cost should
be surfaced and tracked and the first and second order approaches should be
automation above all.

Of course, teams need the authority to solve the pain if they also have the
responsibility for it.

------
breckinloggins
Interesting? Yes. It's probably a good experience to have at least once; just
have an exit strategy in place going into it, even if that exit strategy is
"quit".

In my experience it wasn't really the actual notifications and weird work
hours that were the problem. The problem was that I was officially the end of
the "it's someone else's problem" chain. It's a funny thing about moral
hazards and shit rolling downhill: there's always someone at the bottom. If
you're on pager duty, you're at the bottom.

So I liked feeling trusted with an important task and I liked ensuring that
other people could sleep. But the pager came to represent every wrong thing
with everything in the world. I stared at it in revulsion by the end of
things. (Yes, I had an actual pager to stare at.)

That's just my personality, though. Your mileage will vary.

~~~
taco_emoji
> If you're on pager duty, you're at the bottom.

This depends on the company. At mine, I (as a dev) might get paged when
there's an issue with my website, but I can forward the call on to, say, the
web servers team (meaning they get paged) if it's ultimately to do with their
config.

~~~
chc
What good does that do if the web servers team aren't on call? I thought the
point was to have someone on call to deal with things.

~~~
taco_emoji
They _are_ on call. It's a big company with multiple teams, each team has
someone on call.

------
roeme
Usually, there are practically no downsides to it, unless there is a
fundamental problem in your $ORG.

1\. First of all, it will get you connected to the users who depend on your
$APP/$SYS. Hard. You will get to know their struggles and woes; it's not just
some ticket you can work on at your leisure.

2\. If it's your stuff that causes problems, you will get your shit together
and make sure that it works, code defensively, and test thoroughly - whatever
necessary. After all, you don't want to deprive yourself unnecessarily of
sleep – or others, after the experience.

3\. If it's not your stuff that causes problems, you'll get the opportunity to
“yell” at the people responsible for it. And they _must_ act on it: nobody
cares about the why or what; if people have to get up in the middle of the
night, it costs the company¹, and everybody gets upset.

It only impacts your health if you get called up regularly, and no actions are
taken to remove the root causes of it. Or you can't take any.

It's less a technical problem than an organizational one, so, as has already
been said here, you should talk to the people on the team, not HN.

¹) If it doesn't cost them, be wary.

~~~
gaius
The downside is it's usually cheaper and easier to call you than actually fix
root causes. Then it's not on call, it's beck and call. Even if you are paid
double-time for it, the company figures that's a sunk cost so just call him
anytime, for anything.

~~~
arthurjj
I can second this. If you can't fix the underlying reason that you were paged
at 3AM, it gets old fast.

~~~
gaius
Sometimes there is no reason. Some manager gets up for a pee in the middle of
the night and phones the on-call guy to "check the site is up" or "can you re-
run the report for me" (I'm not even kidding). That company saw the engineers
refuse to do any on-call 'til we got new contracts stipulating on-call was
ONLY for site outages, that said outages had to be verified by a human before
calling (so no hair trigger automated alerting) and in some circumstances we
would be paid quad-time.

But even so things were pretty pathological there. There were those of us who
understood that if the site was down, we weren't making any money, and none of
us would get paid. Then there were others, who understood that there were
people in the first group, and they could just... not bother. And there was
_insufficient_ differentiation between the two come bonus time...

I am coming across as being bitter here, far more so than I actually am, but
the OP deserves to know, it can be bad.

------
protomyth
Soul crushing, but it depends.

I have had good and bad experiences, but it really depends on how bugs are
handled by the organization and whether you have to wait on other people
during the night.

I've worked at one place where any bug that triggered a page was unwelcome and
fixed first and quickly. It was considered unacceptable to wake anyone and a
possible problem to staff in the morning.

I've also worked at a place where management did not really seem to care when
people had to be up every night of a pager rotation because of errors in the
system. They wouldn't even prioritize bugs that would let people sleep through
the night. It was hell and affected my attitude about everything. Also, the
DBA team didn't exactly answer their pager in a timely manner, which led to
some very dumb things.

The only value I see in going through a pager rotation is learning how
important code correctness is.

Hardware failures are a different story. The only things I ever get paged
about at my current job are the power going out or the air conditioner in the
server room breaking.

------
ultrasaurus
Disclaimer: I work at [http://www.pagerduty.com](http://www.pagerduty.com) so
feel free to tar and feather accordingly.

Carrying the duty pager is a painful experience for some fraction of
companies, BUT the long term trends are promising. Here's what I'd keep an eye
out for (I've been on call for ~5 years):

* Does being on call affect your other commitments? At PD we scale back the number of predicted story points by ~50% for the devs that are on call.

* Are you empowered to permanently fix the root cause of whatever woke you up? (that's where that 50% of time goes) If you aren't, that's a big red flag. Not all developers take advantage of it, but the ones that do are much happier once they kill the root cause with fire.

* Are you compensated for on call? Among our customers, we have a few that pay $500/week for on call duty, that seems to be the rate at which you can easily find people to swap shifts with.

* Make sure you are off call sometimes. Seriously.

* Who owns the pain report? Someone needs to track how often (and when) people are disturbed and make sure that you are making progress as a team (Github's Ops team is amazingly good at this). If the house is always on fire, you're not a firefighter, you're a person who lives in a flaming house.

* Is it a NOC model, where you can write down common things to try to solve a type of problem (and then you're only paged as an exception), or are you paged for everything? (That's a severe oversimplification.)

* What is the expected response time? What is the required response time?

* How are you onboarded? The worst time ever to fix a problem is alone, with no context, while things are broken at 2am.
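A "pain report" can start as little more than a count of pages per person, with off-hours pages broken out. The sketch below assumes pages are available as `(person, timestamp)` pairs and assumes a 9:00 to 18:00 weekday working window; both the data shape and the helper names (`is_off_hours`, `pain_report`) are hypothetical, not part of any paging tool.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, List, Tuple


def is_off_hours(ts: datetime, start_hour: int = 9, end_hour: int = 18) -> bool:
    """Assumed definition: weekends, or weekdays outside [start_hour, end_hour)."""
    return ts.weekday() >= 5 or not (start_hour <= ts.hour < end_hour)


def pain_report(pages: List[Tuple[str, datetime]]) -> Dict[str, Dict[str, int]]:
    """Summarise pages per person: total count and how many were off-hours."""
    total = Counter(person for person, _ in pages)
    off = Counter(person for person, ts in pages if is_off_hours(ts))
    # Counter returns 0 for people with no off-hours pages.
    return {p: {"total": total[p], "off_hours": off[p]} for p in total}


pages = [
    ("alice", datetime(2014, 3, 4, 3, 0)),   # Tuesday, 3 AM
    ("alice", datetime(2014, 3, 4, 14, 0)),  # Tuesday, 2 PM
    ("bob", datetime(2014, 3, 8, 12, 0)),    # Saturday, noon
]
print(pain_report(pages))
```

Tracked week over week, the off-hours column is the number a team would want to see trending down.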

That's off the top of my head; there's good advice in this thread. If you're
still lost, though, feel free to reach out to me: dave@pagerduty.com

------
StylusEater
I've held several jobs where I was required to carry a pager: NEVER AGAIN!

I've yet to find a company that doesn't abuse it to save money. Unless I own
the company or have a significant share I no longer agree to help the bottom
line by messing with my health.

I might have had bad experiences compared to most but since you're thinking
about this option, wouldn't it make sense to think about why the company
hasn't just shifted an existing resource to 2nd/3rd shift to help versus
trying to save money by making you do another job on top of your day job?

Good luck with the switch!

------
iblaine
> Does it impact overall health too much?

Depends on how often you're paged. If you're waking up at 4 AM every other day
then you can expect life to... not be fun. If you're rarely paged then it's
fine.

> Would you say it's an interesting experience to go through?

Yes. You will appreciate good code, frameworks, and systems that seldom send
pager notifications.

My personal preference is to rotate weekends and weekdays within a team. That
way someone's entire 7-day week isn't impacted by being on call.
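One way to implement that split, assuming a weekly schedule and a team of at least two people: rotate weekday and weekend duty through the same list, but offset the weekend rotation by half the team so nobody covers both halves of the same week. `split_rotation` is a hypothetical helper for illustration, not part of any scheduling tool.

```python
from typing import List, Tuple


def split_rotation(team: List[str], weeks: int) -> List[Tuple[str, str]]:
    """For each week, return (weekday on-call, weekend on-call).

    Weekend duty is offset by half the team, so the same person never
    covers both the weekdays and the weekend of a single week.
    """
    n = len(team)
    if n < 2:
        raise ValueError("need at least two people to split weekdays and weekends")
    offset = n // 2  # always between 1 and n - 1, so the two slots differ
    return [(team[w % n], team[(w + offset) % n]) for w in range(weeks)]


for week, (weekday, weekend) in enumerate(split_rotation(["ann", "ben", "cat", "dan"], 4)):
    print(f"week {week}: weekdays={weekday}, weekend={weekend}")
```

Offsetting by half the team also spaces out each person's two duties, so nobody gets a weekday stint immediately followed by a weekend one.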

------
IgorPartola
It really depends on the team and the setup. Really, it comes down to how
often the system goes down and how catastrophic it is, as well as what the
response is after an outage. I have been in this type of situation before, but
I always had nearly full control over the system, so any failure resulted in
me creating some type of safeguard against future problems. This worked well:
I had very few nights where I had to do anything.

Really, you should ask the people on this new team, not HN.

------
grouma
I currently do pager duty (DRI) for a team within Microsoft. Like most teams
that have this duty, we cycle a developer each week to have the responsibility
to answer any escalations that might occur. The role of this developer is
simply to mitigate the issue. Investigations into the root cause and potential
preventative items are reserved for work hours.

The number of escalations obviously varies from week to week. Some weeks I
forget that I'm even on call (well that's actually not true as we have to
carry a Lumia 1520 - the thing is a fucking brick) while other weeks are
absolutely painful (waking up every couple hours in the middle of the night).
Thankfully we have enough developers on the team that I'm only on duty every 6
to 7 weeks. What also helps is that my manager has no problem with me sleeping
in and showing up late after a long night of escalations. Overall it isn't too
bad and in fact sometimes can be fun to solve head scratching issues.
Honestly, the worst part of being on call is not being able to make plans that
would involve you being far away from a computer. You can turn this into a
somewhat positive thing though by being productive at home whether it is
cleaning, working on side projects etc.

------
newobj
Pager duty is a natural consequence of devops done right; fix your shit or
feel the pain. So, it's a necessary evil in systems development, IMHO. But I
was on pager duty for the 10 most recent years of my career, so I may have a
case of Stockholm syndrome.

Everything stated below about disruptions to your personal life is true. When
you're on-call, you should just forget about personal commitments. When
personal commitments unavoidably collide with on-call, you're at the mercy of
kind teammates swapping with you.

A good team will cover you the next day if you had a bad night, but I think
during every bad night, a little part of you has to say "f### this job" and
given enough bad nights, well... I'm a single dad w/ a kiddo, and I can tell
you there is nothing worse in life than reading a kid his bedtime story,
having the pager go off in the middle of it, and having to say "sorry, son,"
as he begins to cry and say "not again, Daddy!" (True, and awful story.) Like
I said, "f### this job."

Anyway, a funny point about devops/fix-your-shit is that there's an effect
here which parallels the Peter principle (getting promoted to your level of
incompetence) in some ways:

If you fix everything that causes you to get paged, then eventually the only
things that page you are things you can't fix (the network, power event, etc).
And while those kinds of wake-ups at least lack the adrenaline/stress
component (just sit there and wait for recovery), they further reinforce the
"f### this job" thoughts - because now you're literally being woken up for no
reason other than to "observe and report."

------
jeletonskelly
Ugh, pager duty... To me it seems like it exists solely because there is a
more fundamental problem in the architecture of the system. Sure, sometimes
things go wrong, but if it happens so often that there needs to be an official
rotation to deal with it, then it means that something is fundamentally
broken. I recently passed up a good job offer because they had pager duty, and
this is for a well-known .com.

I think you should ask the developers in the other team how often they get
called during their rotation. You should also ask how much of a priority it is
within their work scope to eliminate the issues that are causing the processes
to fail.

I used to work for a small company that had nightly batch processing jobs on
stock data from that trading day. If any one of those processes failed, then
someone had to log in and fix it or the company wouldn't have a product for
the next day. During the day we had other things to work on, things the
business wanted and there was little importance given to fixing the brittle
(broken) data processing. Management saw it as working software. They weren't
the ones logging in at 3am for two hours to keep the business rolling the next
day. That had a big effect on me. I felt like they didn't care about building
good software, testing the software, and giving the developers peace of mind
that what was in production was well tested and signed off. This is what
ultimately led me to leaving that company and joining one which had solid
processes: development -> staging -> qa -> production. Because of that process
we haven't had a single outage in 3 years. I can go home at night and think
about the software I'm currently building, not worrying if I'm going to get an
email alert late at night because no one cares about fixing our broken
software/processes.

In conclusion, take heed.

~~~
justizin
> "Sure, sometimes things go wrong, but if it happens so often that there
> needs to be an official rotation to deal with it, then it means that
> something is fundamentally broken."

People who say things like this are usually the source of problems for the
people who carry pagers - this is why in DevOps world, developers now join the
club. ;)

The notion that you wouldn't rotate people if you don't alert very often is
not only silly, it's why I left my last two jobs. No "rotation" meant Justizin
is fucking always on-call. :D

~~~
jeletonskelly
There's a big difference between having someone who is responsible for the
operation of the infrastructure on pager duty and having the developers on
pager duty. In most development shops there would be no point in having one random
developer being on pager duty. What if something breaks that he's never
touched? Usually devops is first made aware of the issue, determines the
cause, and, if it's a software issue, should then reach out to the developers
who are responsible for that part of the code. A pager rotation among
developers means there's something fundamentally wrong. If devops is
constantly being paged because of shitty software, they should be the first to
recommend that the QA -> release process be evaluated because it's clearly
broken.

~~~
devicenull
I'm very confused by your reply.. are you trying to say you have three
separate groups? ops, devops, and developers? With devops existing to
communicate between ops and developers?

------
code_duck
I imagine that isn't too unlike a small startup where only one, two, or a
few people are responsible for making sure the service works.

In my case, I ran a startup for a few years which was quite profitable, but
was set up in such a way that I often had to drop anything else I was doing
and rush to respond to the service being down, at any time of day... In
addition to already having more than enough to do between programming,
sysadmin work and customer service.

Being up adjusting to a change in a data provider's JSON or figuring out why
MySQL is suddenly crippling slow at 3:30 am isn't a pleasant experience,
especially if you also have to be up again at 9 am.

In our case, there was little to do about it as the service provider we
depended upon for almost everything frequently surprised us with breaking
changes or temporary bugs. That led me to find the entire affair rather
stressful.

So, like everyone else is saying... Depends on how often you're paged, and
whether you have any influence over the root cause of the errors you're being
summoned to fix.

------
mrbonner
My last job had a 24/7 on-call rotation once every 5 weeks with a duration of 1
week. That was the most stressful and frustrating period of my career: I got
paged several times a week, got paged during the wee hours (2, 3AM) by
business idiots from overseas, and got paged when someone else's system was down.

I remember my first page was on the day before Thanksgiving around 5PM. And
then the 2nd and 3rd ones came after that, around 8PM and 11PM. And then on
Thanksgiving day I got paged around 9AM when I was driving to the airport to
pick up my brother. I didn't know what to do at that point.

The worst happened when my wife gave birth and I got paged while waiting in
the hospital. She gave birth 2 weeks early, so it screwed up my on-call
planning. I called my manager and said "you gotta get someone to replace me, I
am at the hospital."

About 6 months later I quit.

------
brudgers
To me it's a red flag. First because there are obviously one or more full time
positions that aren't filled; second because it is sacrificing a week's worth
of workday productivity on the primary task for the short staffing and this
makes problems more likely down the road; third because it creates chaos in
people's personal relationships and family lives, and finally because they
would put a new person lacking long familiarity with the system on pager duty
from day one.

My spouse worked jobs with on call for many years. Though they weren't in
tech, the disruption came from being on call, not from the nature of the work.

I'll add that the reason it gets rotated could be either it pays so well that
it's only fair or it sucks so much it's only fair.

Good luck.

------
quanticle
Like others are saying, the experience varies widely. One thing that I haven't
seen in the thread is a discussion of whether you actually own the code that
will be causing you to get paged. One of the worst work experiences I've ever
had is being on a platform team where we were on the hook not only for the
platform problems themselves, but also for errors in application code that
manifested themselves as platform issues.

Yes, this was a problem with insufficient logging. However, when you have a
platform used internally by dozens of other teams, it's nigh impossible to
ensure that all of those teams are logging and handling errors sufficiently
well to ensure that the platform team gets paged for only platform errors.

------
arca_vorago
Depends on what you are supporting. If the call volume is high due to a badly
designed product, and it's not being redesigned or the fixes aren't incoming
any time soon, it can drive you crazy. If it's just a stop gap for policy
reasons and you don't get many calls, it's not bad at all.

One thing I would say is that while my natural reaction when I get paged (by
SMS) is to jump right up and get it done, sometimes, depending on what you are
supporting and as long as you use discretion, you need to know when things can
wait 15, 30, or 45 minutes before you get back to them. This small leeway will
help keep you sane.

------
nadams
I did rotation-based IT for a while, and the questions that people ask are good,
but the #1 in my book is:

\- Will you get any form of compensation if you have to work after hours?

Where I worked - that was a no. You were paid industry minimum and when you
were on call - you were expected to be alert/on call 24/7 AND come in and do
your normal 8+ hour shift. Now - I don't mean a 1-1 level of compensation but
at least be flexible especially if you were on call.

The calls themselves weren't usually bad - but if you have to come in on a
weekend anything you planned on over the weekend is now shot and that can be
extremely stressful.

------
huherto
The main problem with rotating the pager like that is that people just try to
survive the problems for a week and no one cares enough about finding and
fixing the root causes of the problems.

------
HeyLaughingBoy
It strongly depends on what's expected.

When we did it, the response times and time on the clock were clearly
specified: return the call/page within one hour between 8AM and 11PM. Later we
scaled it back to 7PM and then finally to support only during normal working
hours.

Whoever got the phone that week also got a small bonus for doing it to reflect
the inconvenience of having to respond to calls on personal time. On average
there was rarely a support call outside working hours so it really wasn't a
big deal.

------
mastermojo
I'm on call for 2 weeks every couple months, one week as secondary and then
the next one as primary. It basically involves carrying my laptop (and a wifi
dongle) with me everywhere I go. Sometimes a server needs to be power cycled,
but there's never anything crazy. It also completely depends on how stable your
infrastructure is and how much fire-fighting everyone is doing. I thought it
was a good learning experience.

------
skorecky
Pager Duty aside, being on support was really stressful for me and I'd never
want to be in that position again. We were a small team so rotation was weekly
and you would be on support every 4 weeks or so. I couldn't go to the gym
after work without worrying about a call coming in, and it ruined weekends for me.

But I think it depends on your personality too. It just didn't sit well with
me but it might for you. Just my 2 cents.

------
starb3ard
Have an escalation plan in place. If you're caught short while on-call, it's
good to have a 2nd or 3rd who can take over in emergencies. It also helps to
have someone you can get to cover you for short periods so on-call doesn't
have to stop you doing stuff, e.g, having a colleague take over for an hour
while you go for a swim.

------
jrochkind1
Depends; I think it's different everywhere. If the software is built properly,
your pages should be pretty rare.

If the software and team are small enough for you to have an effect on them,
this becomes a motivation to make sure things seldom go wrong enough to result
in a page.

------
cesarbs
Hey all, not sure anyone is still active on the thread, but thanks for all the
replies! They certainly gave me a lot of insight and now I have a lot more
things to consider in deciding whether to take this job. Thank you!

------
biot
I've had some interesting experiences. Years ago in addition to our internal
IT infrastructure I had to support a third party platform (effectively an
appliance in our server rack) which would constantly need to be kicked just
due to end users using standard features. That was a nightmare as back then I
was too eager and diligent and strove to be available to deal with things
promptly whether it was responding to a crisis caused by another developer's
code or responding to a failure in the third party platform. I had all the
responsibility and none of the authority to implement any definitive fixes. As
you can imagine, the stress was not enjoyable and burnout was a factor.

Since then I've taken a fairly laissez-faire attitude to being on call. I'll
pick up notifications on an as-available, best-effort basis. That means if I'm
around and get an alert, I'll do my best to resolve the issue right away.
However, if I'm with friends, my phone will be in my jacket pocket hanging in
a closet somewhere while I might be drinking and I'll see any alerts when I'm
leaving for the night. That could be many, many hours later. I make no effort
to restrict my activities so that I'm always around. And if I leave my phone
on vibrate and don't pick up any alerts while I enjoy a sound night's sleep,
so be it.

If "as-available, best-effort" on my part isn't good enough, then the company
will need to compensate me appropriately for the interruption that comes from
a higher level of commitment. Some physicians get $100/day and cardiologists
get up to $1600/day to be on call[0] as they need to limit their plans and
avoid activities which make them unavailable.

In a nutshell, if getting paged at all hours of the day and night and having
quick responses to issues is important enough then the company needs to pay
for your time, lifestyle interruption, and mental energy at a rate you think
is fair. I suggest a minimum daily/weekly/monthly rate based on making
yourself available plus hourly compensation for the actual time you put in at
a 1.5x or 2x hourly rate. This all goes out the window if you're in some
scrappy underfunded startup, but if you're employed in a company which has
graduated from shoestring budgets and has paying customers and decent revenue
then you should be getting something for what is effectively overtime.

[0] [http://medicaleconomics.modernmedicine.com/medical-economics...](http://medicaleconomics.modernmedicine.com/medical-economics/news/modernmedicine/modern-medicine-news/daily-pay-call-coverage-increases?page=full)
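The compensation structure suggested above (a flat rate for being available plus a 1.5x or 2x premium for hours actually worked) is simple arithmetic. Here is a minimal Python sketch; `on_call_pay` is a hypothetical helper, and the $50/day stipend and $60/hour rate in the example are made-up numbers, not figures from the thread.

```python
def on_call_pay(days_on_call: int, daily_stipend: float,
                hours_worked: float, hourly_rate: float,
                multiplier: float = 1.5) -> float:
    """Hypothetical on-call compensation: a flat availability stipend per day
    plus premium pay (e.g. 1.5x or 2x) for hours actually spent on incidents."""
    if multiplier < 1:
        raise ValueError("premium multiplier should be at least 1x")
    availability = days_on_call * daily_stipend
    incident_pay = hours_worked * hourly_rate * multiplier
    return availability + incident_pay


# One week on call with 6 hours of actual incident work:
print(on_call_pay(7, 50.0, 6, 60.0))                  # 890.0
print(on_call_pay(7, 50.0, 6, 60.0, multiplier=2.0))  # 1070.0
```

Keeping the stipend separate from the incident pay means a quiet week is still compensated for the restricted plans, which is the point of the physician comparison.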

------
jtth
Only worth it if you get paid while on duty above and beyond salary.

------
bobmagoo
Good timing: I just left a company after being on their security incident
response oncall rotation for 2 years, partly due to the oncall. akg_67 has
some great points
([https://news.ycombinator.com/item?id=9011293](https://news.ycombinator.com/item?id=9011293)),
but I'll add a few of my own:

1) When you're oncall, your time and priorities are no longer your own.

At your kid's soccer game? A date night? Planning on doing any of those
things? Be prepared to get pulled out at any moment to deal with something
that could take hours to resolve. This was the part that really got to me. As
much as I'd like to do any one of those example things, I had made a prior
commitment to be available and had to honor that.

2) Know the response time and physical location requirements for responding to
a page

Is this something you can just fire up your laptop and an aircard and jam on,
or do you have to be able to drive to the office within an hour? Don't forget
about driving through places with less than great cell phone coverage.

3) It can be fun

There was a part of me that really liked the adrenaline rush of getting paged
in on a legitimate security issue and having to run the call and pull the
right people in to get the situation handled. It's a great test of how well
you know the environment and where all the pertinent information lives.

4) Know the team size and oncall frequency

akg_67's estimate was spot on. Any rotation shorter than a month is crazy and
you never quite feel like you normalize. Since it's based on team size, know
what the optimal size of the team is and whether there's funding for it. My
team imploded and at the end there were only a few of us on the oncall
rotation. Bear in mind that oncall duty doesn't go away because you no longer
have the staff to make it manageable.

5) Vacations and sick time are now more complicated

Who has to be oncall during Christmas/4th of July/etc? What used to be some
loose coordination with your manager is now a give/take discussion with your
team about who covered the last holiday and whose turn it is. It's all
completely fair and reasonable and if you have a good team dynamic you can
make it work, but it's definitely more complicated than telling Aunt Edna that
of course you'll be home for Christmas.

6) Get paid for it

Whether by flexing your hours for the time spent working a page off hours or
by getting paid directly for off-hours work. No reason to kill yourself for no
additional compensation (and there will be those hellish pages or that
automated alarm that goes off hourly starting at 3am).

7) Put the operational burden for supporting a thing in the hands of the
people who have the ability to fix it

There should be a cycle of: get paged -> root cause -> fix -> post mortem ->
deploy the fix so that thing never happens again.

If you don't have ownership over the thing that's paging you, you're at risk
of getting paged all night every night for something that you have to convince
other people to take time out of their schedules to fix, even though it's a
problem they don't feel. Not a great situation.
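The team-size point (4 above) and the secondary-then-primary setup mastermojo describes both reduce to simple arithmetic. Here is a minimal Python sketch, assuming a weekly rotation in which the previous week's secondary becomes the next week's primary; `rotation` and `off_weeks_per_cycle` are hypothetical helpers for illustration, not part of any scheduling tool.

```python
def rotation(team: list[str], weeks: int) -> list[tuple[str, str]]:
    """Return (primary, secondary) for each week.

    The person who is secondary in week i becomes primary in week i + 1,
    so everyone does one week as backup and then one week as primary.
    """
    n = len(team)
    if n < 2:
        raise ValueError("a primary/secondary rotation needs at least two people")
    return [(team[i % n], team[(i + 1) % n]) for i in range(weeks)]


def off_weeks_per_cycle(team_size: int) -> int:
    """Weeks per cycle in which a given person is neither primary nor secondary."""
    return team_size - 2


for week, (primary, secondary) in enumerate(rotation(["ana", "ben", "cy"], 4)):
    print(week, primary, secondary)
print(off_weeks_per_cycle(6))  # 4
```

With six people, for example, each person is on call for two consecutive weeks and then off for four; with only two or three people left, as happened to bobmagoo's team, there is little or no time off between stints.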

~~~
gaius
_At your kid's soccer game? A date night? Planning on doing any of those
things? Be prepared to get pulled out at any moment to deal with something
that could take hours to resolve_

Haha true story, I was on-call once and I called a guy in another team because
I needed him to do his thing. The call was a little weird.

The next day he told me he was driving at the time, phone on speaker balanced
on the dash, with the laptop open on the passenger seat, logged in with a 3G
dongle...

~~~
bobmagoo
Ha, that sounds about right. I've answered pages from the backseat of my car
at a rest stop in the middle of nowhere while my infant looked over my
shoulder. Makes for good stories at least.

I maintain that the pages answered from team happy hours are the most
dangerous.

~~~
gaius
He hadn't stopped, had the wheel in one hand, typing into a root shell with
the other.

------
hiou
If 24/7 uptime is important enough that it requires pager duty and the pager
goes off more than once a month, someone should be working during that time.
Otherwise it's a telltale sign of an employer that does not respect their
employees. When you think about it, if everyone is already working full time
and the pager goes off 5 or 6 times a month and each incident requires about
2-3 hours across 3 people they are essentially wage stealing 20-30 hours a
month. A quality employer puts those 30 hours into preventing it from
happening in the first place and/or hiring someone to monitor things
overnight.

Edit: I should also add one last thing. If you are a knowledge-industry
professional, is working a part-time graveyard shift something you spent all
that time developing your skills for?

~~~
deskamess
Could not agree more. And they are not just using the time that you spent on
the call. When you are on call for the week you are on call for all those
hours whether there is an issue or not - your time is spoken for. No trips to
remote areas or anywhere where connectivity is suspect (both phone and
computer).

In my case it was a slippery slope... there was never any on-call. Then one of
our key (financially) customers had to go through cuts, so we had to cut our
support personnel and the support onus shifted to the developers. Since then
this has become the norm across all customers. And the customer who had those
cuts recovered and went on to do a 5 million dollar project with another
vendor. So this year my company decided to offer us $500 for the week we are
on call. It translates to $5/hr. There is no option to decline the money and
not do call.

------
crpatino
I have had the privilege of doing pager duty with a great team. Some of the
things that made the experience great were:

1\. Six people rotation. You need to put your personal life on hold for the
duration of pager duty, make it as spaced out as possible.

2\. The person on duty had veto power over any deployment after 3pm.

2.b The person on duty had veto power over any deployment on Fridays (24x7
means pager duty lasts the whole weekend).

3\. Every person on the roster was a developer familiar with the systems. We
had first responders - spread across timezones doing a "follow the sun"
scheme - taking care of the simple stuff, but when push comes to shove, you
need qualified people at the wheel.

On the other hand, I have done horrible shift work. While each situation is
different, there are two common themes: Lack of proper training and
understaffed team. The very worst of all was when management tried to solve
the later inflicted the former upon two unrelated teams that found themselves
having to support systems they knew almost nothing about. I don't know what
was worse, the utter feeling of panic that came with every ticket of the other
system, or the quiet despair of coming to office on Monday morning and finding
out what sort of chaos had spawned out of your under-qualified peer's
meddling.
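crpatino's deployment rule (the on-duty person can veto any deploy after 3pm or on a Friday) is simple enough to encode as a pre-deploy check. Here is a minimal Python sketch; the function names are hypothetical, times are assumed to be local to the on-call person, and "after 3pm" is interpreted as 15:00 or later. As stated, the rule does not cover weekend mornings, and neither does this sketch.

```python
from datetime import datetime, time

CUTOFF = time(15, 0)  # 3pm, in the on-call person's local time (assumption)
FRIDAY = 4            # datetime.weekday(): Monday == 0 ... Sunday == 6


def needs_on_call_approval(when: datetime) -> bool:
    """True if a deploy at `when` falls where the on-duty person can veto it:
    any time on a Friday, or at/after 3pm on any other day."""
    return when.weekday() == FRIDAY or when.time() >= CUTOFF


def deploy_allowed(when: datetime, on_call_approves: bool) -> bool:
    """Outside the veto window anything goes; inside it, on-call must agree."""
    return on_call_approves or not needs_on_call_approval(when)


print(deploy_allowed(datetime(2015, 2, 5, 14, 0), on_call_approves=False))  # True (Thursday 2pm)
print(deploy_allowed(datetime(2015, 2, 5, 16, 0), on_call_approves=False))  # False (Thursday 4pm)
print(deploy_allowed(datetime(2015, 2, 6, 10, 0), on_call_approves=True))   # True (Friday, approved)
```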

