I've often heard managers tout that "if your code is stable you'll never get paged". But that hides the fact that you're still on the hook to be available. So either you start shirking your responsibility or you reshape your life around some percentage of your non-working hours still being owned by the company.
That being said, the SRE model has encouraged orgs to improve observability and set clear expectations, and it often coaxes teams into building more reliable systems under the threat of limitless on-call (OC) toil.
During that time I keep my laptop and a phone that can tether with me. I've been at amusement parks while on call. I got paged once and had to go out to the car and work for an hour. Calculated risk on my part. What I should have done is ask a teammate to cover me for the several hours I was at the park.
However, I do want to work in places where I am responsible for operating what I build. I want to know how well the service I built is doing. If I promise it will be up 99.99% of the time but it isn't, then I need to know when it's down so I can figure out why it went down.
Being on call for systems you build also leads to better software. It makes me design software that is easy to investigate when it fails. For example, my error messages provide a lot of context. I invest in making sure the service can terminate gracefully. I instrument metrics to detect when things might be off.
As a result, I don't get paged very often for the things I build. It's in my interest to ensure that the service is honest, well designed and well built. When I do get paged, it's for something that's seriously wrong.
Anyway, I've been lucky to be in this position. What I've observed on many teams is that they inherit flaky systems which they have to make more reliable, but in the meantime every OC shift is an absolute slog that demands a full 40 hours a week of someone's attention.
If I'm going to be responsible for what I build at all times, then I'm going to be compensated for what it's earning at all times.
That means options, stock, or profit sharing (with a large preference for stock).
Otherwise I feel like this relationship is strictly abusive.
Alternatively, my contracting rate is 150 an hour. I'll agree to work up to 70 hours a week, but everything after 50 is compensated hourly. On call counts, regardless of whether I actually get pinged.
On-call is the type of egregious abuse of employees that an IT union would be able to fight against. Right now companies can take advantage of people who are desperate (H-1B holders, young parents, etc.) and don’t have the freedom to take a stand like the above poster. But if all IT workers banded together, it would be possible to fight against the practice.
Expecting another group to maintain your code seems an awful lot like throwing it over the wall.
You should have faith in the systems that you build. If you don't want to carry a pager for them, why would someone else want to?
SREs have global responsibility for the system and a correspondingly global view. If your system broke down because of a bug you wrote, that’s on you. If a system three steps removed from yours, one you didn’t know depended on you, broke because you changed a non-API behavior, that’s where SREs shine: they know how to quickly isolate the problem, roll back the necessary systems, and define how to avoid similar problems in the future.