I'm back as a developer again and things are so much easier. I do spend a few hours helping out with tools and helping out the SRE guys, and I get much more credit for that as a developer than I did as an actual SRE.
There are several principles in SRE which are specifically designed to avoid the issues you mentioned and to keep software engineering teams invested in reliability:
1. The SLO is defined by the business and describes the reliability requirements for the system.
2. If the system is less reliable than the SLO and runs out of error budget, feature development freezes, directing engineering resources to reliability improvements (this prevents the "last to be found" issue).
3. There is a 50% cap on the ops work that SREs do; everything over 50% goes to the feature development team. This prevents the "working your ass off to keep the system running" issue.
All of this is well described in the "Keys to SRE" talk by the guy who actually invented SRE: https://www.usenix.org/conference/srecon14/technical-session...
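To make points 2 and 3 concrete, here is a rough sketch of how the error budget and the ops cap could be computed. The names (`ErrorBudget`, `ops_overflow_hours`) and the request-count framing are my own assumptions for illustration, not something taken from the talk; real setups usually measure this over a rolling window and with whatever SLI the business picked.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Hypothetical request-based error budget for one SLO window."""
    slo_target: float      # e.g. 0.999 means 99.9% of requests must succeed
    total_requests: int    # requests observed in the window
    failed_requests: int   # requests that counted against the SLO

    @property
    def allowed_failures(self) -> float:
        # The budget is whatever the SLO leaves over: (1 - target) of all requests.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def remaining(self) -> float:
        return self.allowed_failures - self.failed_requests

    @property
    def feature_freeze(self) -> bool:
        # Point 2: once the budget is spent, feature work stops.
        return self.remaining <= 0


def ops_overflow_hours(ops_hours: float, total_hours: float, cap: float = 0.5) -> float:
    """Point 3: ops hours above the cap, to be handed to the feature team."""
    return max(0.0, ops_hours - cap * total_hours)


if __name__ == "__main__":
    healthy = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=500)
    burned = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=1_500)
    print(healthy.feature_freeze, burned.feature_freeze)
    print(ops_overflow_hours(ops_hours=30, total_hours=40))
```

With a 99.9% target over a million requests the budget is roughly 1,000 failures, so 500 failures leaves room to ship and 1,500 triggers the freeze. The 30-of-40-hours case shows 10 hours of ops work going back to the dev team.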
It's similar in security and QA in that the more ability you have to tell other groups to not do stupid things and enforce it the less soul crushing it is to do your job and the better of a job you can do.
I currently work as an SRE. If your stuff breaks and whichever SRE is on duty/call can't figure out why and fix it (which is almost a certainty, since we cannot be experts in everyone's software), guess who's getting called...
and if we can't reach who we're calling guess which direction on the org chart we go...
This is also related to how an engineer can truly have "10x" the output of another engineer. Making the right decision on what not to build gives you that 8x. And building it quickly and well gives you the 2x. But it's not often recognized as such.
In many orgs, this is regarded as nay-saying and ignored.
> There’s little upside in a siloed approach that throws a change over the wall with no concern for how it might affect the person sitting on the other side.
Often, any change that impacts another person _at all_ is considered to be too much of an impact, regardless of the benefits. Developer workflows are usually prioritized.
> You embrace every opportunity to automate
My experience has been that automation is not prioritized if it blocks anything else. Running the business takes priority. The signal missed here is that this usually means there isn't enough staff to handle the daily run-the-business tasks and the necessary automation.
> “You need to be able to dig in and say ‘stop’ and ‘no, we really need to do this thing now,’ which can be difficult to do in some engineering organizations,”
Asking your engineers to sell stability rather than backing them up with institutional support is an organizational smell.