Maslow's Hierarchy of Site Reliability Engineering Needs (2015) (plus.google.com)
107 points by pkaeding on Dec 27, 2016 | 39 comments



Cute title. For those who missed the reference - https://en.wikipedia.org/wiki/Maslow's_hierarchy_of_needs

A business version would be good, something above the pure profit-motive... actualization as a moral, sustainable, environmental, socially engaged organization that gives more than it takes from society. (Note: This is critically different to the SV-dominant self-image band-aid approach of large scale post-profit philanthropy.)


That was the purpose behind the creation of B-Corps, which allows the company to have some public beneficial goals as part of their legal obligations, besides increasing shareholder value.


I've come to the conclusion that drawing these distinctions is actually harmful in the long run. Software engineering requires operational understanding. If you hire programmers that do not know how to spin up a VM and codify their deployment processes then hiring SREs is not really going to fix anything.


I think it's true that you want everyone to be good at everything: you want engineers writing tests for their own code, and you can't just hire software engineers in test and make it their problem.

The distinction is practically useful because not everyone is an expert on everything, and if you focus your hiring around very narrow markers (e.g. the ability to work with algorithms to solve a coding interview), you may not build an organization that has balanced expertise.

Breaking out the roles and trying to hire for all of them helps ensure you don't get blindspots. The software engineer who spent lots of time thinking about operating systems, Linux internals, and how to build reliable systems out of unreliable parts brings value to the organization in the same way that engineers who focused on mastering algorithms bring value. Everyone needs to be strong in everything, but in practice we have different strengths, and getting people with different strengths together seems to be a good idea.


Algorithm = skill that everyone has, cheap and easy to get.

OS + linux internals + reliable systems = 3 non trivial skills, harder to find and more expensive. The combination is definitely rare.

> Breaking out the roles and trying to hire for all of them helps ensure you don't get blindspots.

If you break out the skills and hire for them independently, you'll end up with a lot of people with the common skills and barely anyone with the rarer skills and the ability to train others in them.

I agree with getting people with different strengths together. It's an NP-complete hiring problem though.


In my early sysadmin years I knew absolutely nothing about algorithms while untangling infrastructure messes spawned by the educated. Most of the good ops people were scratch-and-dent smart people who came to the field without the benefit of any kind of CS education.


> I think it's true that you want everyone to be good at everything

You can't fix unrealistic expectations.


It's not that unrealistic, as long as you have a wide enough definition of "good."

I don't think you want to hire someone for an SRE role who doesn't understand algorithms well enough to do detailed analysis of the codebase, including, say, rudimentary big-O analysis of the code. Good SREs read quite a lot of code, and if they're debugging a performance issue and the code contains portions that are obviously exponential or factorial in time, you want them to be able to see and recognize that quickly as a potential cause of the problem being debugged. The same is true of problems caused by bugs or anything else. Likewise, a good understanding of what's possible and not possible with software is necessary for an SRE to file (or potentially close) reasonable PRs for the codebase.

The person who thinks in terms of page faults and systems analysis may take a little more time to find the optimal algorithm for an abstract problem than someone who only thinks of abstract problems. But they should know enough about software and computers to recognize the limits of their knowledge, and that's the hard part.


While I share your sentiment, don't forget that there is a very large pool of prospective employees who may be beneficial to a company in spite of a more narrow skill set, even if they don't grow their skill set. Take college grads for example. They're relatively inexpensive and have a moderate skill set as they begin their careers. Are they useful to hire? Yes. Do they know anything about operations? No. Is it necessary for them to? Not from day 1. They have a lot to learn, and early on, they may be more effective to an organization in a more narrow role. Let's try not to stereotype all engineers who are looking for employment into "already has all skills" and "cannot hire because lacks all skills" buckets. There's a scale, and some set of skills can be very effective for some organizations, even if they aren't quite where you are.

I'm probably being pedantic, but I felt it necessary to highlight the implicit assumption.


You do have to have the willingness and ability to give on-the-job training to cultivate those skills, which is often wanting.


Here's a similar one we use a lot in all sorts of slide sets at Facebook: http://imgur.com/a/EN0G8

(Stolen from Pedro's "Notes from Production Engineering" talk: https://www.youtube.com/watch?v=ugkkza3vKbc ; the pyramid shows up around 47:30. PE = Production Engineer, a role similar to, although not exactly the same as, the usual SRE role.)

It's a bit of a different view on the same underlying work. Interesting to see how they slightly differ.


That's the first Google+ link I've seen in like three years.


Also see Dickerson's Hierarchy of Reliability.

https://docs.google.com/drawings/d/1kshrK2RLkW-XV8enmWZxeRFR...


not once does the writer explain what she means by "SRE" -- yet uses it repeatedly. Spend a few words early on to define your term and then go ahead and abbreviate away. That's just writing 101.


if you don't know what an SRE is, you're not the intended audience. probably half of the articles posted to HN are unintelligible to non-practitioners of that particular subfield, don't see why this should be different.


I don't know. As a person who is part of a number of intended audiences, I still get frustrated when people (directing things at me) don't define acronyms when they're first used, since

1. it's a hindrance for people who are studying to become the intended audience.

2. sometimes I'm reading two literatures that use the same acronym, so it's momentarily stupefying.


That just makes this a problem with HN rather than with the blog. HN's audience is not narrow, and it's annoying to see so many interesting titles leading to incomprehensible articles.

It really helps when HN posters either clarify things a bit in the title, or if that's not practical, make a comment when you post the link.


Finding a balance is hard. Should you define SQL? DB? HTTP? Where do you start and stop defining acronyms?


In my blog entries [1] I use the ACRONYM [2] tag with a TITLE attribute that most browsers will show when you hover over the acronym. I don't understand why it's not used more often though.

[1] http://boston.conman.org/

[2] I know, ABBR is the new hotness, but I'm still using HTML 4 for my blog and old habits die hard.
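For anyone generating pages programmatically, a minimal sketch of the same idea, assuming trusted input HTML and using the modern ABBR element mentioned in the footnote (the helper names `abbr` and `mark_acronyms` are hypothetical): it spells an acronym out on first use, as suggested earlier in the thread, and wraps every occurrence in an element whose TITLE attribute holds the expansion, which browsers typically show on hover.

```python
import html
import re


def abbr(short: str, expansion: str) -> str:
    """Wrap an acronym in <abbr>; the title attribute carries the expansion."""
    return f'<abbr title="{html.escape(expansion, quote=True)}">{html.escape(short)}</abbr>'


def mark_acronyms(text: str, acronyms: dict[str, str]) -> str:
    """Spell each acronym out on first use; wrap later uses in <abbr> only.

    Assumes `text` is already trusted HTML (it is not escaped here).
    """
    if not acronyms:
        return text
    seen: set[str] = set()

    def replace(match: re.Match) -> str:
        short = match.group(0)
        tag = abbr(short, acronyms[short])
        if short in seen:
            return tag
        seen.add(short)
        return f"{html.escape(acronyms[short])} ({tag})"

    # Whole-word matches only, so "SRE" does not fire inside "SREs2" etc.
    pattern = r"\b(" + "|".join(map(re.escape, acronyms)) + r")\b"
    return re.sub(pattern, replace, text)


if __name__ == "__main__":
    print(mark_acronyms("An SRE is not a sysadmin; SRE work is engineering.",
                        {"SRE": "Site Reliability Engineer"}))
```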


I've modified the title to expand the acronym. Apologies for the confusion; I forgot that not everyone's head is where mine is. (I'm not the author of the article; I just found it interesting and wanted to share.)


oh no apologies necessary. It's becoming increasingly common on HN lately and I can't fathom why people take the time to write a thoughtful piece to advocate a particular view and then immediately lose a massive amount of the audience with unexplained acronyms. Especially when it's so easy to just lay it out once.


AcronymFinder.com works pretty well when you encounter an acronym you don't understand. In this case, you'd have to hit the 'Information Technology' section, as will frequently be the case on Hacker News.


had to go all the way to her Google+ bio to find out that she's a Senior Site Reliability Engineer, so I'm guessing that SRE in this essay means that.


It's also in the title of the link on HN?


it wasn't originally


It stands for "site reliability engineer" but the term is as meaningless as "full stack engineer".


I honestly don't see a problem with it in particular, no more than with other titles in the industry. The title gives you a clue, but it's far from a specific way to find out in detail the skills and responsibilities of a given person, and because no one enforces what it means, companies will sometimes use it for similar but significantly different purposes (and yeah, sometimes just to be 'cool'). It doesn't come as a surprise to me at all that relatively new titles especially are vague.


devops / sysadmin / ops / system engineer pick a name.


SRE is a terrible role name both in its abbreviated and in its fully spelled out version! Just because it originates at Google, it doesn't mean it's cool. I am terrified that it gets more and more popular these days!


> I am terrified that it gets more and more popular these days!

Personally I believe this is a direct result of the absolutely massive devaluation of the term "DevOps".

The latter has become a synonym for a meaningless mishmash of "everything in operations, and nothing in particular". Using the term SRE is a clear signal that we're talking about engineers with a good understanding of systems and operations.

Disclaimer: this distinction is mine, and I have come to it after making the mistake of trying to hire "devops engineers". Put those two words together and your applications pipeline gets crammed with unskilled remote hands who cannot even realise they have delusions of grandeur. It's also - apparently - an invitation for every cold-emailing recruiter to spam you to death.


I like it. What do you find terrible about it, why, and what would be a better role name in your opinion?


The presence of a "reliability engineer" raises the question of what the hell all the other engineers are there for.


They are there to build features. Hopefully those features are something that solves business problems and ultimately drives revenue. Reliability is a requirement for a useful business system but is not a useful thing on its own. Your system needs to solve actual problems with software features to be useful. That is what the rest of the developers are rightly focused on.


"Operations Engineer", for example.


The term SRE espouses a philosophy that when SRE has done its job perfectly, there is no operations work because the system is reliable on its own.

If you're looking for members of that school of thought, you say SRE. If you're comfortable having a "boiler room" and just need to staff it up, the terms sysadmin and devops work just fine depending on the level of coding involved.


I dug a little before commenting (SRE vs DevOps), but SRE as an abbreviation is ugly, and reliability is assumed for every subject or role, so it's redundant.


SRE is about engineering systems to be reliable, which is certainly not a goal taken seriously by default. In a traditional environment the system itself is not reliable and not trying (very hard) to be; rather, the combination of system + pager-carrying operators is reliable.

Many more people are paging operators to do DB replica promotions by hand than are implementing distributed consensus algorithms and self-healing distributed databases.


I'd say that's quite a bit more vague and doesn't quite sound like what an SRE is. You didn't answer my other questions, by the way.


Sorry, I meant "Site Operations Engineer". "Reliability" is an adjective assumed to apply to every subject or role.
