
This answer always downplays what "on call" really means. Even if you only get paged once a year, if you're "on call" and the expectation is that within X minutes you are at a computer working, this affects the life you live. You can't go camping, out to a nice dinner, or on vacation without lugging a laptop and an internet connection with you.

I've often heard managers claim that "if your code is stable you'll never get paged". But that hides the fact that you're still on the hook to be available. So either you start shirking your responsibility, or you reshape your life around the company still owning some percentage of your non-working hours.

Honestly, on call is one of the things that nobody told me about. You're absolutely right that it affects one's life: you need to ensure availability for OC work. Managers who downplay this are liars.

That being said: the SRE model has encouraged orgs to improve observability and set clear expectations and it often coaxes teams into building more reliable systems under the threat of limitless OC toil.

Not sure what was being downplayed. You are def on call and have n minutes to respond, so you need to stay in range of cell towers or wifi. However, when your system is more reliable, you wake up in the middle of the night less and less. We rotate through on call on our team. One week out of six or so.

During that time I keep my laptop and a phone that can tether with me. I've been on call at amusement parks. I got paged once and had to go out to the car and work for an hour. Calculated risk on my part. What I should have done is ask a teammate to cover for me for the several hours I was at the park.

My strategy is to not work places where I need to be on call, and quit if it ever becomes a requirement. I've never been paid enough to give up my free time.

I agree in principle with this strategy.

However, I do want to work in places where I am responsible for operating what I build. I want to know how well the service I built is doing. If I promise it will be up 99.99% of the time and it is not, then I need to know when it's down so I can figure out why it went down.
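To put a number on that 99.99% figure, here is a quick back-of-the-envelope sketch in Python. The function name and the 30-day and 365-day windows are my own assumptions for illustration, not anything from a real SLA.

```python
def downtime_budget_minutes(availability_pct: float, period_days: float) -> float:
    """Return the minutes of downtime allowed by an availability target."""
    unavailable_fraction = 1 - availability_pct / 100
    return unavailable_fraction * period_days * 24 * 60


if __name__ == "__main__":
    for label, days in [("30-day month", 30), ("365-day year", 365)]:
        minutes = downtime_budget_minutes(99.99, days)
        print(f"99.99% over a {label}: {minutes:.2f} minutes of downtime")
```

Under those assumptions the budget is roughly 4.3 minutes a month, or about 52.6 minutes a year, which is why finding out quickly that the service is down matters so much.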

Being on call for systems you build also leads to better software. It makes me design software in ways that make failures easier to investigate. For example, my error messages provide a lot of context. There is investment in making sure the service is able to terminate gracefully. Metrics are instrumented so I can pinpoint when things start to go off.
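A rough sketch of what those three habits can look like in Python, using only the standard library. Everything here is hypothetical and for illustration: `JobError`, `process`, `run`, and the `metrics` counter (a stand-in for whatever metrics client you actually use) are made-up names, not a real service.

```python
import logging
import signal
import threading
from collections import Counter

log = logging.getLogger("worker")
metrics = Counter()            # stand-in for a real metrics client
shutdown = threading.Event()   # set when the process is asked to stop


class JobError(Exception):
    """Error that carries the context an on-call engineer needs."""

    def __init__(self, job_id, step, cause):
        super().__init__(f"job {job_id} failed at step '{step}': {cause!r}")
        self.job_id, self.step, self.cause = job_id, step, cause


def install_signal_handlers():
    # On SIGTERM, ask the loop to stop cleanly instead of dying mid-job.
    # Must be called from the main thread.
    signal.signal(signal.SIGTERM, lambda signum, frame: shutdown.set())


def process(job_id, payload):
    try:
        return int(payload) * 2          # hypothetical unit of work
    except ValueError as exc:
        metrics["jobs_failed"] += 1
        raise JobError(job_id, "parse", exc) from exc


def run(jobs):
    results = []
    for job_id, payload in jobs:
        if shutdown.is_set():
            log.info("shutdown requested, stopping before job %s", job_id)
            break
        try:
            results.append(process(job_id, payload))
            metrics["jobs_ok"] += 1
        except JobError as err:
            log.error("%s", err)         # message names the job and the step
    return results
```

A service would call `install_signal_handlers()` once at startup and then `run(...)` on its job source. The point is only that the error names the job and step, the loop checks for a shutdown request between jobs, and success and failure counts are recorded somewhere you can graph.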

What this means is that for things that I build, I don't get paged as often. It's in my interest to ensure that the service is honest, well designed and well built. When I do get paged, it's for something that's seriously wrong.

Anyway, I've been lucky to be in this position. What I've observed is that many teams inherit flaky systems which they have to make more reliable; in the meantime every OC shift is an absolute slog that demands a full 40 hours a week from whoever is on call.

My approach has been to request ownership before agreeing to be on call.

If I'm going to be responsible for what I build at all times, then I'm going to be compensated for what it's earning at all times.

That means options, stock, or profit sharing (with a large preference for stock).

Otherwise I feel like this relationship is strictly abusive.

Alternatively, my contracting rate is 150 an hour. I'll agree to work up to 70 hours a week, but everything after 50 is compensated hourly. On call counts, regardless of whether I actually get pinged.

This is the right answer. If a company thinks its system is critical 24x7 then it should have actual staffing 24x7.

On-call is the type of egregious abuse of employees that an IT union would be able to fight against. Right now companies can take advantage of people who are desperate (H-1B holders, young parents, etc.) and don't have the freedom to take a stand like the above poster, but if all IT workers banded together it would be possible to fight against the practice.

At my previous company, we could switch our on call day with other people on the team. This mitigated the issue you're talking about. I never had an issue scheduling a switch with someone for the year I was on call there. That's just one specific company though, not sure if other teams allow this.

Or you put your laptop in the trunk of your car when you're on call and have tethering on your phone.

Expecting another group to maintain your code seems an awful lot like throwing it over the wall.

And you keep your phone on during a film, and at the nice dinner, and during your daughter's wedding, and never go camping without phone service. That is exactly what is being described by "This answer always downplays what 'on call' really means."

During your rotation, you have to have your phone with you. I go to the movies and to nice dinners. I would definitely get my rotation covered for a wedding or camping.

You should have faith in the systems that you build. If you don't want to carry a pager for them, why does someone else want to do it?

This is why SRE organizations typically have stringent requirements for stability before accepting a rotation, and will give it back if reliability drops below a certain level.

SREs have global responsibility to the system and have a correspondingly global view. If your system broke down because of a bug you wrote, that’s on you. If a system three times removed from yours that you didn’t know depended on you broke because you changed a non-API behavior, that’s where SREs shine: they know how to quickly isolate the problem, roll back the necessary systems, and define how to avoid similar problems in the future.

If that's priced into the contract and made clear from the beginning, sure. But it doesn't come for free.
