
Habits of Highly Successful Site Reliability Engineers - kungfudoi
https://blog.newrelic.com/2017/11/13/site-reliability-engineer-sre-habits/
======
rb808
I worked in an SRE job for 5ish years. It sucks. Developers release a bunch of
crap, and when it breaks they're last to be found. You can work your ass off and
do some magic and keep the system running, but rewards for that are slim and
quickly forgotten. If you push back and ask for more reliable stuff, usually no
one listens because business goals are more important.

I'm back as a developer again and things are so much easier. I do spend a few
hours helping out with tools and helping out the SRE guys and I get much more
credit doing that as a developer than actually as the SRE.

~~~
sysrq-trigger
What you are describing sounds like a typical sysadmin role, not SRE.

There are several principles in SRE which are specifically designed to avoid
the issues you mentioned and to keep software engineering teams invested in
reliability:

1\. The SLO, which describes the reliability requirements for the system, is
defined by the business;

2\. If the system is less reliable than the SLO and runs out of error budget,
feature development freezes, directing engineering resources to reliability
improvements (this prevents the "last to be found" issue);

3\. There is a 50% cap on the ops work that SREs do; everything over 50% goes
to the feature development team. This prevents the "working your ass off to
keep the system running" issue.
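
To make points 2 and 3 concrete, here is a rough sketch of the arithmetic. It assumes a request-based SLO measured over a single window; the names `SloWindow`, `feature_freeze` and `ops_overflow_hours` are made up for illustration and are not taken from the talk or from any tool.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """One measurement window for a request-based SLO (hypothetical model)."""
    slo_target: float      # fraction of requests that must succeed, e.g. 0.999
    total_requests: int
    failed_requests: int

    def error_budget(self) -> float:
        # Number of failed requests the SLO tolerates in this window.
        return (1.0 - self.slo_target) * self.total_requests

    def budget_remaining(self) -> float:
        # Negative once the budget has been overspent.
        return self.error_budget() - self.failed_requests

    def feature_freeze(self) -> bool:
        # Point 2: an exhausted budget means feature work stops.
        return self.budget_remaining() <= 0


def ops_overflow_hours(ops_hours: float, total_hours: float, cap: float = 0.5) -> float:
    # Point 3: ops work above the cap is handed to the feature development team.
    return max(0.0, ops_hours - cap * total_hours)


if __name__ == "__main__":
    window = SloWindow(slo_target=0.999, total_requests=1_000_000, failed_requests=1_500)
    print("freeze features:", window.feature_freeze())
    print("ops hours to hand back:", ops_overflow_hours(ops_hours=30, total_hours=40))
```

With a 99.9% target and a million requests the budget is roughly 1,000 failures, so 1,500 failures trips the freeze. A real setup would also have to pick the window length and decide what counts as a failed request, which this sketch leaves out.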

All of this is well described in the "Keys to SRE" talk by the guy who
actually invented SRE:
[https://www.usenix.org/conference/srecon14/technical-session...](https://www.usenix.org/conference/srecon14/technical-sessions/presentation/keys-sre)

------
drewbug01
> At New Relic, we describe it internally as “someone who is constantly
> analyzing every change for its risk and what its impact could be down the
> road, not just today. And what does that mean for the larger
> infrastructure?”

In many orgs, this is regarded as nay-saying and ignored.

> There’s little upside in a siloed approach that throws a change over the
> wall with no concern for how it might affect the person sitting on the other
> side.

Often, any change that impacts another person _at all_ is considered to be too
much of an impact, regardless of the benefits. Developer workflows are usually
prioritized.

> You embrace every opportunity to automate

My experience has been that automation is not prioritized if it blocks
anything else. Running the business takes priority. The signal missed here is
that this usually means there isn't enough staff to handle the daily run-the-
business tasks and the necessary automation.

> “You need to be able to dig in and say ‘stop’ and ‘no, we really need to
> do this thing now,’ which can be difficult to do in some engineering
> organizations,”

Asking your engineers to sell stability rather than backing them up with
institutional support is an organizational smell.

