
SRE: The Biggest Lie Since Kanban - kiyanwang
https://theagileadmin.com/2018/10/02/sre-the-biggest-lie-since-kanban/
======
nostrademons
Google-style SRE has three critical elements:

1\. SRE is involved in the design of the system from a very early stage,
oftentimes before any code is written.

2\. SRE has veto power over deployment.

3\. SRE is an engineering position, focused on automating everything possible
to achieve greater reliability.

Facebook is the only organization besides Google that I know of that actually
follows this. Most people I know who hire SREs do indeed just hire a glorified
ops team, but they're kinda missing the point.

~~~
solatic
I wouldn't say so much that SRE has veto power. You call your Ops team an SRE
team and tell them they have veto power with few other changes, and either the
CEO is going to end the SRE experiment in a week when Dev and SRE are yelling
at each other about getting product out the door, or the company will implode
rather quickly from being dysfunctionally unable to ship.

Rather, there is an error budget and if Dev violates the error budget then Dev
has to go back to being responsible for Ops until they fix whatever was
causing the product to go over its error budget. Without the ops-
responsibility handoff between Dev and SRE, you just have an Ops team.
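
To make the handoff rule concrete, here is a minimal Python sketch of an error-budget check. The class name, its fields, the 99.9% target, and the rule that ownership flips the moment the budget is spent are all assumptions for illustration, not a description of how Google or any particular org actually implements it.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Hypothetical error budget for one service over one measurement window."""

    slo_target: float     # assumed SLO, e.g. 0.999 means 99.9% of requests succeed
    total_requests: int   # requests served in the window
    failed_requests: int  # requests that counted as errors in the window

    @property
    def allowed_failures(self) -> float:
        # The budget is the share of requests the SLO permits to fail.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def remaining(self) -> float:
        # Negative means the service has gone over its budget.
        return self.allowed_failures - self.failed_requests

    def ops_owner(self) -> str:
        # Assumed handoff rule: Dev takes Ops back once the budget is overspent.
        return "SRE" if self.remaining >= 0 else "Dev"


if __name__ == "__main__":
    healthy = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
    overspent = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=1_500)
    print(healthy.ops_owner(), overspent.ops_owner())
```

The arithmetic is the easy part. The hard part is agreeing on what counts as a failed request and on who holds production access after the owner changes.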

This requires the org to both a) have a shared language for defining errors
and error budgets, and b) have some way of transferring control from SRE to
Dev securely, without incurring additional downtime. That's more difficult
than it sounds: Dev goes from not having production credentials, to having
them, to losing them again when SRE takes the service back, all while both
teams often use similar infrastructure, e.g. managed databases that are backed
up without Dev intervention.

What most organizations get wrong is that SRE is a higher level of efficiency,
not a foundational practice. It requires building out infrastructure to
support the model. If you don't have it yet then you can't force it by
slapping a label on it.

------
webwanderings
Everything "agile", when forced from the top, becomes a lie. I'm talking about
big orgs and teams.

You throw methodologies around; you create and change names and labels all the
time. It makes no difference. It is only when people self-adopt the philosophy
behind 'being agile', and when the 'leaders lead', that it makes a difference
for everyone.

------
hosh
I like Spotify's organizational structure. In addition to product groups and
tribes, there are cross concern practices that anyone can join. For example:
UX, TDD, and SRE can be engineering disciplines that bring their voice across
product groups.

And I totally agree, reliability engineering starts in the beginning with dev.
I see this mindset happen more with Elixir/Erlang devs than Rails devs. (I do
both). The whole focus on "developer experience" tends to bias concerns for
reliability engineering as if that will magically go away.

------
joshuamorton
This paragraph:

> Second of all – I don’t want to make an enemy of all the lovely Google
> engineers out there, but is your experience with Google services that they
> evolve quickly and get better once they go to wide release? It’s not mine.
> They rot. Have you used Google Hangouts lately without it ending up with
> cursing and moving off to someone’s Zoom? That kind of specialization still
> has its downsides in terms of hindering your feedback loops that let you
> improve (the Second Way). Is SRE just Google-ese for “sustaining?”

Makes me think the author of this blog post doesn't have a clear understanding
of what SRE does. Product rot and feedback have nothing to do with reliability.

It is theoretically possible to have a product with no product engineering team
because there are no new features, and still require an SRE team since, well,
people use it.

I'm not sure how the author managed to conflate product managers and SREs, but
it appears that they did?

------
peterwwillis
> That’s why SRE is a Big Lie – because it enables people to say they’re doing
> a thing that could help their organization succeed, and their dev and ops
> engineers to have a better career and life while doing so – but not really
> do it.

People doing things wrong doesn't make the thing they're trying to do a lie.
If you ask an organization enough questions before you take a job there, you
can quickly tell if they're spinning their wheels or if they're serious. Most
medium to large size companies are not serious about this stuff unless they're
one of a handful of multi-billion-dollar tech startups.

If a company hasn't hired another company to teach them how to implement these
new kinds of teams, they're probably not serious about it.

------
dvfjsdhgfv
> There's the super well-known ones like Bitcoin and Ethereum of course, and
> also a myriad of other technologies and ecosystems: Mastodon, ActivityPub,
> Perkeep, Scuttlebutt, Dweb, Solid...

P2P technology is at least a decade older than blockchain - its peak was in
the early 2000s with Napster & co.

------
asplake
SRE being Site Reliability Engineering - it is there in the text but easy to
miss.

Shameless plug: I wrote ‘Kanban from the Inside’

------
retox
Article could start with a definition of SRE.

