People need operations staff, people don't like operations staff and keep trying to treat them like developers.
But, operations staff do and have always developed software, just internal software for glue or orchestration, and they work differently to regular software developers in that their customers are usually themselves to meet an internal objective of reliability, stability or ease-of-use for developers.
It's interesting to me, because I'm a bit longer lived it seems and have been tied to industry for a long time, and I see the treadmill grinding along continuously;
Sysadmins (different from system operators) were usually among the most senior developers who ended up knowing how operating systems and compute worked fundamentally. Over time this eroded and eventually you had helpdesk people being labelled as sysadmins.
Then "DevOps" emerged as a job title, which meant a dozen things to a dozen people, and the same issue happened, it was the same operational needs and the same operational solutions, just with better tools as the passage of time allowed better tools to exist.
Then SRE, which ironically was devised before DevOps was, which did exactly the same thing of trying to turn operations problems into the easier to reason about software development space.
But still, it's operations folks, people who are more responsible about an outcome and a continuance than they are about delivering a feature.
But Managers genuinely can't reason about anything other than features, so a cost center it becomes and eradication is desired, and the treadmill begins again.
So, SRE's, who embody essentially the same characteristics as early sysadmins (but with large budgets and better tools) will eventually become system operators, who will eventually become helpdesk and eventually replaced by some new title that insists that "SREs never wrote code, and this next generation will!".
The author is experiencing the exact same thing I did over a decade ago when "sysadmin" became unfashionable and everyone told me that "sysadmins can't code", despite working in teams of sysadmins who wrote precursors to kubernetes/nomad, on bare-metal, in perl on Solaris.
Will be interesting to see what the next iteration will be called, the author will just need to alter her vernacular.
Same here. Being the developer on the team who would embrace build and deployment scripts means that for the last 30 years I have always had to juggle: am I a developer, a systems administrator, a network engineer, or whatever term is fashionable with upper management?
The worst part of the fashionable job titles is that both management and HR love to put people in little boxes, and generalists are a big headache for them, on top of deteriorating the actual original meanings of these roles.
>Then "DevOps" emerged as a job title, which meant a dozen things to a dozen people, and the same issue happened, it was the same operational needs and the same operational solutions, just with better tools as the passage of time allowed better tools to exist.
DevOps was/is not just better tools, it's the same team that builds the software, operates the software (and underlying infrastructure). Maybe when it became a job title (later) then it just became a rebranding of the ops team.
Cliff notes version, things that happened around the same time:
* Dev+Ops days was started by Patrick Debois; the intent was for sysadmins to use Agile ("Agile Systems Administrator" was the desired job title), and he used "dev"+"ops" to signify a unification in working methodology, not a team
* 10+ deploys a day from Flickr; where the authors are in a single unified team of sysadmins and developers.
Like Agile, people always claim you're "doing it wrong", which I find ironic given that the portmanteau was created to bring Agile to systems administration itself. However, devops was confused right out of the gate.
You need to also look into why DevOps became a thing. Because devs were sick of the BS dealing with Ops teams (want a server, that will take 6 months to provision). So devs decided they could do it themselves (with better tools).
But if AWS originates from the ops teams of Amazon, then perhaps they were just doing Ops better than everyone else. They chose not to be just a cost centre, and find ways to become better.
Not scale in terms of your company scaling. Only scale in terms of number of companies using the same off the shelf tooling.
The difference with those tools is that they were off the rack and not bespoke. There were plenty of bespoke systems before, used in fewer places because bespoke systems are expensive.
Perhaps they would’ve spent more effort making software not suck if they still had that constraint. Instead, now we get the modern dumpster fire because more resources are an HPA call away.
> DevOps was/is not just better tools, it's the same team that builds the software, operates the software (and underlying infrastructure)
Not happening in any of the companies I worked for as a contractor or subcontractor. In those companies at best dev teams have nowadays some additional capability to deploy to dev/test instantly (so they do their own CI/CD pipelines and manage dev infra via Terraform or whatever) but production is always handled by a "DevOps" team which does not touch the application code at all.
This is especially visible in companies that outsource their development to software houses / individual contractors (B2B stuff, most of the folks I interact with).
These setups are frankly asinine. Often come with another contractor and an entirely separate org to handle service/support. And then management wonders why relatively simple applications take forever to ship, changes take even longer and operating costs are astronomic ("back in the day four guys could do this" - they still can, you just don't let them).
> It's interesting to me, because I'm a bit longer lived it seems and have been tied to industry for a long time
> The author is experiencing the exact same thing I did over a decade ago
> the author will just need to alter her vernacular
If I'm not mistaken, the author is older than you and has been in the industry longer. She's complaining about this topic specifically because she was a Google SRE back when SRE was first becoming a thing.
The older term "Systems Programmer" (which still exists primarily in academia, I had this title a few years ago at a university) has always felt more accurate to me as the "high level operations person who has programming as a primary skillset" job that the original SRE job description seemed to be aimed at.
That's because it's difficult to really define a job in IT with clear description along with set roles and responsibilities.
In some orgs, it's the title that dictates what you can/can't do.
In some orgs, eventually your title is decided based on your role/responsibilities.
In many orgs, your title is just an HR/accounting construct and has no relation to what you do and you do whatever your boss asks you to regardless of what your role on payslip says.
I've had a similar run of titles. The article reads like "I'm an operations person who does serious programming as a hobby."
To expand a little: The job of any "operations" person is to keep the money flowing and avoid wasting money/looking stupid. All that technical circle jerking is just a temporary detail.
Fads come and go, getting functional products in front of customers is forever.
you're going to be the "ops bitch" for the "real" programmers
Rachel is spot on about what is often wrong with IT culture; "typecasting" people for someone's convenience or to get a fancy title leads to learned helplessness and dismissing other people's expertise and interests. I'd rather we all try to keep things simple and encourage people to be well-rounded engineers.
To be fair, many of them do seem to operate on the "op sees, op does" level.
Just last week, we had some mails escalating, client had "issues" with their on-prem install of our software, which ran on a dedicated VM.
I read the mail thread and turned out the database service "used a lot of memory", and they'd tried rebooting the server several times but it just kept using a lot of memory. So now they had escalated because they couldn't figure out how to "fix" this issue.
Of course, this was not an issue. The database is designed to use all available memory by default, and this was a dedicated VM so it didn't affect other services.
I've seen many such instances, and others, over the years. While there are certainly awesome ops people out there, who are always a pleasure to interact with, a significant number are at a much more basic level.
As a developer, I've had to tell "ops" how syslog() works. Also, as a developer, I've had "ops" bitch because we had different config files for the service. Yes, because the service is geographically redundant, and each site had different IP address! And I, the developer, have maintained, and continue to maintain, the configuration file! Checked into your ops repo. You (ops) have never had to maintain it at all!
I've also been on the other side of the wall, as ops, having to tell the developers why they keep seeing zombie processes on the server. Because the developers had no experience with Unix signals and having to wait on child processes. And again, about how syslog() works.
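For anyone who hasn't hit the zombie case, here is a minimal Python sketch of it, assuming a POSIX system. The helper names (`run_child`, `wait_for`, `reap_exited_children`) are made up for illustration. A child that has exited stays in the process table as a zombie until its parent calls one of the wait functions on it.

```python
import os


def run_child(exit_code):
    """Fork a child that exits right away; return its pid to the parent."""
    pid = os.fork()
    if pid == 0:
        # Child: leave immediately without running the parent's cleanup handlers.
        os._exit(exit_code)
    return pid


def wait_for(pid):
    """Block until the given child exits and return its exit code.

    Until this call happens, the exited child sits in the process table
    as a zombie.
    """
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else None


def reap_exited_children(signum=None, frame=None):
    """Collect every child that has already exited, without blocking.

    The (signum, frame) signature matches what signal.signal() expects,
    so this could be installed as a SIGCHLD handler.
    """
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children at all
        if pid == 0:
            break  # children exist, but none have exited yet
        reaped.append(pid)
    return reaped


if __name__ == "__main__":
    child = run_child(7)
    print("child exit code:", wait_for(child))
```

A long-running server would more likely register something like `reap_exited_children` for SIGCHLD than block in `wait_for`, but don't mix the two for the same child: whichever wait call runs first collects it, and the other gets `ChildProcessError`.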
Ops people are cost centers. They can display their wizardry in blog posts until they are blue in the face, but except for those few companies with an incredibly large moat whose main profit is just raw traffic or high uptime (and how many companies like that can maintain that moat indefinitely anyway), engineers not actually building or directly improving product will always be cost centers.
It brings me no joy to say this, as ops people tend to be very smart and cool under pressure, but I never see myself becoming one. High quality software and airtight system integrity seems decreasingly important to people paying salaries and investing money. And the world moves on. I refuse to be seen as a cost center if I can avoid it, and I don't have much sympathy left for otherwise very smart tech people who haven't figured this out yet. If you love the job, do it. If not, transition and don't complain. The people in charge are not going to be persuaded by blog rhetoric. I said what I said.
I’m not scared. We’ve seen the rise of DBRE because all of the ZIRP startups (the ones who survived, anyway) realized they couldn’t just YOLO their way through indefinitely, and that RDBMS is hard, actually. Of course, they couldn’t admit that they needed DBAs, so they invented a title and mixed in SRE so it would seem cool.
I might be in a cost center, but companies will cut this cost center to their own demise.
Even as a Google SRE who came in with a Systems Engineering focus (i.e., sysadmin), I found it useful to qualify for, and become, a SRE-SWE so that I could also add Software Engineer to my résumé (I wrote enough code that I should get that title). Big and scaling companies still find the combination of sysadmin and software engineering skills extremely valuable, but you do need to emphasize that you are a skilled software engineer as well.
Sucks, but true. Which is extremely unfortunate. Bad software is expensive to maintain, bad ops can take down companies, and both can slow the sales cycle and threaten renewals.
Engineering as a whole should be treated as a profit center.
That's a heap of tough love. Ultimately indeed, those who pay, decide, and they're taught to systematically cut costs to improve short term profit above all else.
As long as that very description is spelled out clearly near the top of the job ad, it doesn't matter terribly much (within reason) what you call it - job searchers will try various different strings to find it. Personally, I'd call it an ops role, however.
In general, most of the issues I've seen with these sorts of roles aren't the naming of them but rather the third bullet in your list. Having an ops role that's responsible for all the problems with software they didn't build and weren't allowed to have meaningful input to the design/implementation of isn't healthy. It sucks up their time and energy fighting fires they didn't create and likely aren't really empowered to fix. This is the problem with this role (as often implemented) that Rachel talks about in her first paragraph.
If you really care about having a good and reliable product you need people involved with the design and implementation that are deeply invested in making it reliable and maintainable in the long term - which means either making your dev roles shoulder some of the oncall burden or having people that straddle the ops and dev teams. Or both. If you're having difficulty filling this role, perhaps this is the real problem?
I haven’t started trying to hire for this position yet, it’s been something I’ve been thinking of for a while though. Right now a small number of developers, myself included, are on call and I’d like to reduce that a bit or just share the burden.
My goal isn’t to throw crap over the fence and say “make it work” but rather empower someone to make maintaining and growing our platform their main goal. The developers (again, myself included) are not great at the ops side of things and can rarely focus on the infrastructure itself due to other priorities (yes, we can talk about how that itself is an issue). If I could clone myself and have one of us specialize in ops and the other on programming for the platform I would in a heartbeat.
Infra/Ops and programming are two different mindsets (much like managing people or qa differs from writing code). Switching between them is hard and you pay a penalty to do so. Not to mention there are skills (networking is high on that list) that I’m not good at. I can scrape by but that’s not where my skills lie. That’s why I’d like to hire someone who is good at it, who _does_ enjoy it, and who would push for changes from an ops perspective that I can’t due to time or skill.
> Infra/Ops and programming are two different mindsets [...]
HARD disagree on this. I've done both. Most of the really excellent programmers I've worked with have, at least a little. You can't write highly reliable networking software without a deep understanding of how networking actually works. You can't write highly performing software without a deep understanding of the infrastructure and hardware it's running on. And so on.
I'm not trying to besmirch your or your team's abilities here - if you're writing in a high level language and most of your challenges are implementing biz logic then not knowing very much about the underlying infrastructure and hardware is fine, you aren't trying to write a distributed RDBMS, you probably don't need to know this stuff.
But do remember that there are lots of people for whom the hardware, the infrastructure and the application they're writing are inextricably linked. It's not a different mindset, it's just people with additional skills you haven't needed to learn yet.
I have zero doubt that I could do an Ops job well, what I can’t do is switch between writing business logic and maintaining servers at the drop of a hat. Similar to how I can’t go from QA to engineer without a context switch/penalty.
As I said elsewhere in this thread if I could clone myself and do both roles I would in an instant. But if I have to pick one I’d pick writing code (as my primary thing), not saying that Ops doesn’t write code, just not the same type of code.
For me it's very similar, I was in dev, now I'm in ops and I can easily switch back. But to think that all good programmers can do infra is a falsehood.
I've met so many developers who don't even know how computers really work. They're good at a particular tech stack and do their job very well, but they can't do much else. Let alone infra.
Personally, I agree that it's two different mindsets, but sometimes they can overlap.
I've settled on "systems engineer" -- I just try to keep "systems" in the title to imply that my job is largely -- "making many things work together to run the business"
I think my current title has the word "cloud" in it or something and my boss rejects (with prejudice) "SRE" or "devops" explicitly to avoid being labeled as operations.
At our organization, a bunch of devs who were interested in improving/automating infrastructure split into a separate DevOps team. They first called themselves DevOps, then renamed themselves to SRE, then again to Production. Every time they changed their name they prepared presentations for the whole company explaining the name change and why the previous name wasn't correct after all. Everyone is still confused and always forgets what their current official name is.
Other than that, there's also:
- a separate IT department which manages networks, telephony and access
- the ones who push people to update software/OS is InfoSec, a separate team
- ordinary developers can make changes to k8s/grafana/Prometheus configs for their services, if it's reviewed by Production as the owners (they're understaffed compared to the multiple dev teams)
- there's also the "Core" team which deals with the microservice platform on top of bare infrastructure (how microservices should communicate, etc.)
Often the lines are blurred and it's not always clear who you call if you have a problem.
"Platform Engineer" has started to grow as a name for pretty much this. An engineer that builds the platform that all things run on. For example, the k8s clusters, the observability stack, has on-call, and builds the internal tools and automations (called the IDP, or internal developer platform).
Apparently not everyone gets it. I once went to an AWS meetup and introduced myself as a "platform engineer", but not everyone understood what I did =)
But personally, I like "platform engineer" more than "ops".
What I've experienced working well is to consider platform/infra as just another dev team. They should experience the same good practices (staging changes, tests, documentation, clean code) and duties (level 2 on calls, regular postmortems from ops, etc).
Ops/prod/support eng are generally more business related and less technical than infra/platform. They make sure the processes run as they should, or operate with agreed procedures when something is raised. Some issues may be lightly technical ("there's an alert on free disk space of server X", "provider Y changed their SFTP keys without notice", etc) and some business related ("provider Z is late to push updates", "it's a holiday in the country of provider A", "service of team B raises an exception when situation C occurs"). They are often level 1, in between the actual dev teams (escalating issues to them, or asking for better stability, logs, resiliency, etc) and the platform/infra team.
Being faced with production issues is a burden on focus, etc. I wouldn't want my platform team to be on level 1 handling this stuff or nothing would get done.
Still, I wouldn't want them to be completely free from it or they would lose touch with reality, just like any other dev team.
To summarize, I would say:
- A fleet of business dev teams
- A fleet of platform / infra teams
- A support/ops/prod team handling level 1 through procedures written by dev/platform teams, and raising to infra or business dev for level 2. Regular feedback sessions with each other team to have them stay in touch with real life. Some light coding / improvement tasks paired with infra or dev to get them a better understanding of the underlying layers.
From the description you provided, that's what I would call a "platform engineer". Though the titles are sensitive for some people, and there are a lot of discrepancies between companies, so I usually let people from platform choose the title they prefer, between SRE / DevOps / platform eng / cloud eng / infra eng.
I got the SRE book by google, and read the whole thing, cover to cover.
Companies want SRE people but aren't willing to give SRE empowerment and authority.
So companies do what companies do: take a regular team of operations people and slap the SRE term on it, and call it a day.
And it doesn't work, of course.
---
Regarding the empowerment & authority: according to "the book", SREs often play the role of "launch coordination engineers" as in vetting (read: roasting) a service before it goes live and have authority to say "this won't go live, fix this first" and to do so no matter what deadline is going to be missed.
SRE teams also have the extreme prerogative to "give back the pager", as in hand a service back to the development team and say: it's not stable enough, YOU will be on-call for it until you fix the shit you wrote.
These are two emblematic examples, but there are many more in the book.
Can you imagine any of the non-google (and non-faang) companies actually doing something like that?
And that’s why I don’t mention my military background during job interviews. In software world it would likely be lost on them anyways, but really certain words are triggers for unfounded assumptions in an industry dominated by unfounded assumptions.
Management hates classic operations because it's too powerful: it's the operational part of a company's nervous system, and it can't be managed or segmented by management, so many do their best to remove sysadmins from the scene.
The result is that most end up on someone else's system with some form of operations, meaning the cloud. Now that costs are skyrocketing and issues are piling up, and no one seems to have been able to build a full infra well, the push is deflating.
We have experienced many similar trends:
- full stack virtualization on x86, sold as the future, a super-duper simplification, actually a way to allow third party selling pre-made images to those who have no operation or do not own the bare metal (VPS etc);
- when people realized how big the overhead is, paravirtualization became an old-new trend, mostly with k8s; now this model is starting to creak and the push toward taking back ownership of the infra and the iron is getting noisy
I expect in a decade a mainstream NixOS/Guix System move as we had with Ansible/Salt before, the old CFEngine much before and so on, in 20 years probably companies will own back their machine room with a Plan 9 -alike model to just get redundancies and extra temporary resources.
An SRE role was my first job at Google in 2006. It was a product of its time and place. Google could hire the sort of full stack programmer/sysadmin types that the article talks about because they had enough clout and desirability at that time that they could get very skilled programmers to give up programming and do sucky on-call work instead. Even then, just two years after IPO and receiving glowing press everywhere, they struggled with this proposition. A typical strategy (which worked on me) was to promise that whilst they wouldn't offer a job as a developer right away you'd be able to transfer out of SRE after a few years. Another strategy was to build teams that consisted of mixes of sysadmins and developers. In other words, even back then in the most optimal hiring environment possible, there were very few people who were genuinely both developers and sysadmins in one neat package. There was always a clear bias towards one or the other.
The hiring process was nightmarish for both sides. SRE had a higher rejection rate than SWE with only about 1% of candidates getting an offer; the accept rate was lower. Hundreds of interviews to get one person through the door. I had to do eight interviews that covered everything from TCP to obscure bash puzzlers, simultaneous equations, C, performance debugging, Python and CS algorithms. All of us had to spend enormous amounts of time running interviews as a consequence, and there was a constant shortage of SREs too, meaning many services were being administered and on-call covered by SWE teams instead. Who then grumbled a lot because that wasn't what they'd signed up for, of course.
I think if you're not in a similar situation as Google in 2006 then the SRE role isn't going to make sense for you to try and hire. It's a unicorn role. There just aren't enough people out there who have skills that span such a wide gamut of topics. The fact that she put up an ad for SRE and got "a bunch" of "ops monkeys" is the least surprising thing in the world. Of course that happens. Unless you're willing to interview hundreds of candidates to get one hire, whilst simultaneously running a dragnet recruiting operation, then you'll have to compromise. Even Google had to compromise.
Aren't there things we can do to simplify the software stack, so we're more likely to find candidates with all the required skills in one package? There are certainly fashions we can push back against, e.g. premature use of microservices, Kubernetes, and distributed systems in general.
> Aren't there things we can do to simplify the software stack, so we're more likely to find candidates with all the required skills in one package?
"Can't we simplify the human bodies, so that we can crank out doctors after a two years bootcamp rather than 10+ years of med school and specialization?"
Cloud definitely helps, there. People pay money for clouds because they find ways to hire lots of SREs :)
For simplifying, yes probably, but it's hard. For instance relational databases require quite a lot of skill to run, even if you pay for a managed cloud DB. The cloud operator will handle some work like backups but some other problems like schema migrations, sudden query plan changes etc can still trip you up in production. To simplify that away you need a fundamentally different approach, maybe something like FoundationDB and Permazen. But then of course you lose a lot of the features that make people want the RDBMS engines.
I don't think so, because there are countless vendors for all different parts of the stack fighting for marketshare and doing their best to prevent their competitors from becoming the de-facto for anything.
Just think of how many different flavors of SQL there are being run. If we can't even all agree on that, I have doubts that we could unify on the rest.
Unless you’re an early hire, the odds are hideously stacked against you successfully arguing against complexity.
DHH isn’t always right, and his “you only need a server” mantra isn’t true for everyone, but IMO it is DEFINITELY true for nearly any startup that’s shown sustainable growth. By all means, use cheap cloud resources when you have minimal revenue (unless you already have someone who can administer Linux). But after that? Run a couple of boring servers and front them with a load balancer. You don’t need multi-regional HA at that stage, so don’t worry about it.
To me, a SRE is *both* a sysadmin AND a programmer, developer, whatever you want to call it. It's a logical-and, not an XOR.
By sysadmin, I mean "runs a mean Unix box, including fixing things and diving deeply when they break", and by the programmer/whatever part of it, I mean "makes stuff come into existence that wasn't there before".
The main issue I see with that is that companies usually aren't willing to advertise and pay SRE salaries for actual SRE skills.
The skillset described in the above quotes is essentially the skills of an SWE and of a Sysadmin. So essentially you're doing two jobs for one salary.
There are people capable of doing two jobs, but you won't find them until you start advertising and paying actual-SRE salaries.
It doesn’t mean anything for two reasons: companies have treated it as a catch-all, and there is a glut of people calling themselves SREs who have never operated a server that wasn’t in a cloud.
You can learn enough about Linux to be decent at your job on only VMs if you’re dedicated, but I’d argue that until you’ve also dealt with hypervisors, bare metal, and hardware issues, you’re missing some of the picture.
“That’s no longer applicable, so why should I care?” Because it pops up everywhere I’ve been. Random build server that everyone forgot about but is critical suddenly shits itself, it’s running some ancient version of Ubuntu, and it’s all hand-rolled. Someone decided to provision a bunch of EC2s with bash, but they made critical errors like not knowing to make a new initramfs after configuring mdadm, so now the RAID disappears on reboots.
Understanding the fundamentals has always and will always matter. Anyone telling you differently is selling you something.
Because if you’ve never touched hardware, you haven’t seen the full gamut of what can go wrong. I don’t think everyone needs a rack in their house, but you can buy old tower servers for dirt cheap, and learn stuff.
In my career so far, the best Ops-adjacent people all had homelabs. I view it as a huge green flag if you have one. Not necessarily a red flag if you don’t, but it warrants some more questions. That’s not to say “I have a server” means you’re great; you could just be running Plex with nothing in IaC, no config management, no monitoring, etc.
Listen man, you’re not wrong, but you’re also living in the past. It’s no different from saying a real programmer does low level C coding.
The field has changed, everything is in the cloud. Yeah most engineers can’t troubleshoot low level stuff, but they rarely need to. There’s no need to be grumpy about it and just accept their jobs are mainly configuring abstractions rather than being some bare metal guru who knows kernel intricacies.
I work with like 15 ops/system engineers. None of them know anything about cpu schedulers and raid arrays. We’re just old man
Counterpoint: I know both cloud and metal. It can be done, you just have to care enough to learn. That’s my bugbear: people by and large aren’t learning things for the sake of learning, they’re just learning what they need for the role.
If you can affect the quality of the on-call, then it's not so bad.
If you can't, then that's your ammunition to remove the feature from being "blessed".
This is the gift that the SRE book actually gave people. That, and error budgets.
It loosened people's idea of what ops was. Before this it was 100% uptime of all services at all times, with ops people being responsible for things that they could not reasonably affect nor push back on without extreme resistance from developers and management.
In my experience, being "SRE" means being pager bitch. I was almost completely burned out by being pager bitch for a company that had two warring tribes of PHP and Haskell developers that just sort of threw things over the wall for SRE to keep up. We had no meaningful way to push back on developers because that impacted product release timelines and we can't have that, can we?
IDK, part of the reason I got into things like DevRel is that I am good at SRE style work, but I'm just not cut out to be pager bitch like other people are. I now have a 500% premium for any job that involves oncall work. Nobody's taken me up on it, surprisingly.
Do you know of any company that actually _does_ error budgets? Like, they did the statistics on their previous downtime, calculated them and wrote SLOs with them baked in?
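For reference, the arithmetic behind an error budget is small. A rough Python sketch, where the 99.9% target, the 30-day window and the outage durations are all made-up numbers for illustration and the function names are hypothetical: a 99.9% availability SLO over 30 days (43,200 minutes) leaves 43.2 minutes of allowed downtime.

```python
def error_budget_minutes(slo, window_days=30):
    """Allowed downtime, in minutes, for an availability SLO over a window."""
    if not 0 < slo <= 1:
        raise ValueError("slo must be a fraction in (0, 1], e.g. 0.999")
    window_minutes = window_days * 24 * 60
    return window_minutes * (1 - slo)


def budget_remaining(slo, outage_minutes, window_days=30):
    """Budget left after subtracting recorded outages (negative means overspent)."""
    return error_budget_minutes(slo, window_days) - sum(outage_minutes)


if __name__ == "__main__":
    outages = [12, 25]  # hypothetical outage durations, in minutes
    print(round(error_budget_minutes(0.999), 1))       # budget for the window
    print(round(budget_remaining(0.999, outages), 1))  # what is left of it
```

The hard part is not this calculation but the policy around it: agreeing on what counts as downtime, and what actually happens to feature work once the remaining budget goes negative.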
Everyone lost respect for SREs when Elon took over Twitter and culled lots of staff. Every SRE in the land lined up to shout from the rooftops "There is no way Twitter can keep working it's going to flame out and fall over and keep crashing" and it didn't, not even once.
Welcome to "Olds-land," Rachel. Sorry about that. It's almost impossible to be a developer/engineer/opsmonkey, with any varied experience, without running into this. People will always find something in your résumé, that makes them uncomfortable.
> they didn't have the usual lists of godawful clown software that most places rattle off that you'd be expected to work with.
That could be a summary of all that is amiss with the tech industry, these days.
Well, yeah, in a world where corporates don't want to pay for sysadmins that can actually code or give their devs a pager, you get what you saw: sysadmins and the people that do Jenkins being renamed as the "devops" team in 2014, then the "SRE" team in 2016 or so, then "platform teams" after the age of Kubernetes in 2018-ish.
There are so many companies that have SRE teams despite those teams not maintaining a website!
It’s a predictable problem. Make up a new and fashionable term and it will bite you in the end. See: euphemism treadmill. It has everything to do with the specifics of ‘site’, ‘reliability’, ‘engineer’ and all the technical and social stuff Rachel talks about but on another level it is about the style of discourse, see ‘non’, ‘fungible’, ‘token’.
OP seems to think SRE and DevOps are seen as lowly by management, yet those jobs still exist here, while the "real programmers" were offshored long ago. Good luck getting a job Actually Building Stuff, almost no one does it any more.
They are seen as unfortunate cost centers. They don't add new features that are sold to clients. They don't even fix the bugs the clients care about. Making the others more efficient or preventing catastrophe is invisible work.