
So You've Been Paged: A Guide to Incident Response - kawera
http://blog.scalyr.com/2016/09/so-youve-been-paged/
======
itsmemattchung
> Pager duty is essentially wage theft.

I disagree; it's a part of the job. I dare say it's an important part of the
job ... equally important if you are a developer. You write software and you
get paged when it breaks ... how is that wage theft? I find no feedback more
effective.

What I _do_ consider frustrating, however, is being responsible for
alarms/incidents on which I can take no corrective action; this is how
friction grows between dev and ops teams.

~~~
TrevorJ
At the company level, the market has Service Level Agreements (with
requisite adjustments in price) for this sort of thing. It's clear that the
market assigns a value to different response times. Further, if there were no
value in having an employee on pager duty then the company would not do it.
_If_ that value is not reflected when compensating the employee then from an
economic perspective, yes, the employee is losing out.

~~~
CaptSpify
> If that value is not reflected when compensating the employee then from an
> economic perspective, yes, the employee is losing out.

I agree, but it is worth noting that the compensation might not be monetary:
to some people, getting more time off for being on-call might be a better
"value" than extra pay. There are many other avenues for compensation to
consider.

~~~
jlgaddis
Yep.

I am, for all intents and purposes, "on call" 24/7. I really don't mind,
though.

In return, I go to bed when I feel like it, I get up when I feel like it, and
I work when I feel like it. I work from home, too. I think it was around 7:00
a.m. this morning when I fell asleep and I woke up just before 3:00 p.m. I
can't remember the last time I set an alarm. There are times when I'll go for
two weeks without talking to my boss. On a nice, beautiful, summer day, I
often say "f--k work" and go for a nice ride on the Harley instead. In the
evening, after I get home, then I'll "go to work".

From the company's standpoint (we're an ISP), they're glad to have someone
available in the middle of the night that can fix any problems that may arise
(a.k.a. SHTF). Certainly, there are times when my boss might prefer that my
ass was in a chair in our office every day from 8-5. In that case, though,
something that broke at 5:05 p.m. wouldn't get fixed until I came in at 8 a.m.
the next day.

It's not perfect and it definitely isn't for everybody but it seems to work
out for us and, really, that's all that matters.

------
hiou
My quick guide to pager duty.

Step 1: Find a new job.

There are way too many opportunities out there to subject yourself to this
nonsense. Pager duty is essentially wage theft.

~~~
kyrra
I think this is because most companies do pager duty wrong. I highly recommend
the Google SRE book[0] (notes here[1]; chapter 11 covers oncall/pager). One
thing mentioned in this book is compensation for being oncall. At Google we
get fairly decent pay compensation for holding the pager, enough where it can
incentivize people to be on the rotation.

(I'm a software engineer at Google who is oncall at this moment)

[0] [http://shop.oreilly.com/product/0636920041528.do](http://shop.oreilly.com/product/0636920041528.do)

[1] [http://danluu.com/google-sre-book/](http://danluu.com/google-sre-book/)

~~~
user5994461
> At Google we get fairly decent pay compensation for holding the pager,
> enough where it can incentivize people to be on the rotation.

Google is a big company. I expect them to 1) have people in all time zones so
that there is no night shift, and 2) have many people on rotation so each
individual is rarely on shift.

That's not comparable to smaller companies.

~~~
kyrra
Just some anecdotal data about my team at Google. I definitely do night shifts
(we do one-week-long rotations), with a 30-minute SLA on responding to pages.
I've had bad nights where my first page happens at 11pm, then they keep going
until 4am. The better part is that we are oncall once a quarter or so (12
people on the rotation).
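To make the "once a quarter or so" arithmetic concrete, here is a minimal sketch of a round-robin rotation. It assumes a plain round-robin with no swaps or overrides; the engineer names, the start date, and both function names are hypothetical and only illustrate the numbers above (12 people, one-week shifts).

```python
from datetime import date, timedelta


def round_robin_schedule(engineers, first_shift_start, shift_days=7, num_shifts=None):
    """Return (shift_start, engineer) pairs, cycling through the team in order."""
    if num_shifts is None:
        num_shifts = len(engineers)
    schedule = []
    for i in range(num_shifts):
        start = first_shift_start + timedelta(days=i * shift_days)
        schedule.append((start, engineers[i % len(engineers)]))
    return schedule


def weeks_between_own_shifts(team_size, shift_days=7):
    """Weeks from the start of one of your shifts to the start of your next one."""
    return team_size * shift_days / 7


# Hypothetical 12-person team and start date, matching the numbers in the comment.
team = [f"engineer-{n}" for n in range(1, 13)]
schedule = round_robin_schedule(team, date(2016, 10, 3), num_shifts=24)
gap_weeks = weeks_between_own_shifts(len(team))
print(gap_weeks)  # 12.0
```

With 12 people and one-week shifts, each person's turn comes around every 12 weeks, which is just under a 13-week quarter.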

The thing is, there is basically a waiting list to join the rotation.
Compensation is nice for those that are motivated by it. But it also exposes
you to a lot of the infrastructure that you normally don't deal with (so it's
a great way to learn).

We have a higher-up-the-stack SRE team that does 12-hour shifts, so it's not as
bad for them. They handle the larger-scale issues that go beyond job-specific
ones (e.g., datacenter issues).

I can understand this sucking if you are on a small engineering team where you
can't do things like this (I've got friends at companies that employ < 10
developers, I've heard the stories). I guess I wasn't thinking about it for
smaller eng teams where the number of people available to support the product
isn't there.

------
secretRubyDev
I work remotely on retainer for a company I worked for when I lived back in
Oakland. I effectively work 24/7 pager duty. A lot of times I'll be summoned
after working 8-10 hours just to look things up for an important client or
confirm sales numbers (which are always correct). I work ~9AM-6PM EST hours
but folks at the company generally work 12PM-8PM PST, so there's really no good
way to plan around those sorts of support calls.

It wasn't always this way, though. The company used to have other developers
but never replaced them when they left. The business unit I work under
effectively switched to maintenance mode, where we're just upgrading existing
systems for CVEs, supporting the AWS setup, and dealing with important client
requests on the rare occasions they come in.

I'd push for more money but there just isn't the budget for it, and they've
made that clear to me. I even had to fight to stay full time, as they wanted me
to work fewer days but still be on call: "we need to figure out the best way to
utilize our resources" (paraphrased).

I will say it's detrimental to my health. I wake up most mornings and
immediately grab my phone out of fear I missed an alert in the night. That
whole bit about utilizing their resources hasn't been sitting well with me
though so I'll probably move on after I finish the extra documentation of our
systems they're now pushing for.

Just offering a counterpoint to the people saying, "You signed up for this,
you should know what you were doing." It's not always that simple. I have a
family now and can't just pick up and change everything on a whim.

I'll also say I get nervous going into movie theaters, etc. where I'll be
disconnected for a few hours. It's just not a healthy situation at all.

------
protomyth
> The good news is this: All issues eventually get resolved (unless you just
> give up and quit. Please don’t do that.)

Well, that isn't exactly true. The issue might continue, depending on how work
is scheduled in your project. It's amazing how often the business and managers
don't schedule the "stop the problem from happening" work once it becomes
apparent that the support / devops staff can fix production issues themselves.
I've been there, and watching the business and your managers reject the time
needed to fix the problem during the next iteration / release / sprint is
soul-killing. It really doesn't bring as much business value as these new
features, after all.

------
Ghostium
Please increase the contrast!

------
ransom1538
After a few years in management, when it comes to paging there are two groups:

1) People that want to fix their own code. They usually don't have to be told
"you have pager duty" -- they just care. They even become upset if they don't
know about the page and will even install their own paging systems without
your knowledge.

2) People that can't be bothered. They don't answer the phone, don't answer
Slack - instead they just watch a re-run of The Mindy Project and eat ice
cream while your production system is crashing.

Generally people in 1) fix things for 2). Wage theft is really from group 2) -
because they steal from 1). Each time I have fixed something for group 2) - I
either fire them personally on Monday or start an all-out campaign to get them
fired. Firing is hard. But letting the entire team down just makes it easy.
Ironically, if you want fewer pages just fire more of 2).

~~~
jayofdoom
There is a third group -- which I would place myself in. I feel a strong
responsibility to the environment I work in, but I also have a need to
disconnect and keep a work/life balance. I absolutely am not ok with people in
group #2, but I think group #1 is similarly unhealthy -- being constantly aware
of issues can lead rapidly to burnout.

This is why you have to have an on-call rotation with an SLA (e.g. all pages
are ack'd within ten minutes), with enforcement for people who regularly miss
pages, that keeps the life-disruption to one or two team members. Obviously,
anyone who's worked on a large software product knows you might get an
escalation even if not on-call, but that's a hugely different workload than
being attached to a pager and having to respond to every page.
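As a rough illustration of what tracking that kind of ack SLA might look like, here is a minimal sketch. The `Page` record, the `missed_sla` and `missed_counts` helpers, and the sample data are all hypothetical and not tied to any real paging product; the ten-minute threshold is the one from the example above, and a page that was never acknowledged is assumed to count as a miss.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

ACK_SLA = timedelta(minutes=10)  # threshold taken from the example above


@dataclass
class Page:
    engineer: str
    sent_at: datetime
    acked_at: Optional[datetime] = None  # None means the page was never ack'd


def missed_sla(page, sla=ACK_SLA):
    """True if the page was ack'd late or (by assumption) never ack'd at all."""
    if page.acked_at is None:
        return True
    return page.acked_at - page.sent_at > sla


def missed_counts(pages, sla=ACK_SLA):
    """Count SLA misses per engineer, to spot people who regularly miss pages."""
    return Counter(p.engineer for p in pages if missed_sla(p, sla))


# Hypothetical sample data.
t0 = datetime(2016, 9, 20, 23, 0)
pages = [
    Page("alice", t0, t0 + timedelta(minutes=4)),    # within SLA
    Page("bob", t0, t0 + timedelta(minutes=25)),     # late ack
    Page("bob", t0 + timedelta(hours=2)),            # never ack'd
]
misses = missed_counts(pages)
print(misses["bob"], misses["alice"])  # 2 0
```

What counts as "regularly" missing pages is a team decision; the sketch only produces the counts that such a policy would be based on.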

