> To this day, I believe these new leaders had a fundamental misunderstanding of what problem Uber’s SRE organization was solving (enabling bottoms-up organization with highly fluid priorities that weren’t resolved into a single priority list anywhere to function successfully), viewing it instead from the lens of their previous experience. As a result, SRE quickly shifted from an embedded model to a very different model.
> The name was unchanged, but it was soon a very different team. That unchanged name concealed the beginning of a significant leadership and cultural shift
Unfortunately, a very common problem. A new person leans heavily on their own great previous experience and fails to understand the new org and its problems.
One of the best infrastructure managers I worked with shared their approach with me: for the first 6 months, they would go to all the team leaders for the teams they served and listen to the problems they faced. No solutions were proposed. Just understand the pain points first, and develop solutions after learning them.
I often wonder if we should have social scientists study these phenomena and create some guidance so people don’t keep repeating these mistakes.
> the first 6 months, they would go to all the team leaders for the teams they served and listen to the problems they faced.
I once got rejected from a manager interview with a billion-dollar unicorn. I said I would spend my first 3 months understanding and observing. The director retorted and asked rhetorically what if my team complained that I wasn’t doing any work?
It left a bad taste in my mouth. And I have pondered for months whether I mishandled the question, fundamentally misunderstood management, or whether this unicorn is doomed to fail-at-scale because of poor organizational intelligence.
It's the leader's job to influence team culture and structure. Culture often arrives and departs with leaders based on their values. While listening would have given you good intel, the director probably felt team members would have experienced a leadership void for 3 months. Depending on the level of morale at that time, that may have been unacceptable.
Instead, you could have said how you'd work to create a culture based on your values of x, y and z. Ideally your values would align with the corporate culture and would motivate and inspire team members. You could have even told the director you'd hold meetings telling your team that you wanted to find out what their problems were. Tell them that on your watch things will get better, etc. Reassure them and provide a vision.
You don't need to only observe to discover problems; if you simply ask, people are likely to tell you, especially when there's new blood that might be able to make changes.
Your answer is a correct description of what you actually should do, but unfortunately, interviewing is a signaling game.
There are a lot of people out there who are smart and can perform well in an interview, but aren’t do-ers. They don’t make things happen. You don’t want to give off any vibes that you are that kind of person if you are interviewing for a fast-moving company.
Listening a lot is definitely what you should do. Doing nothing else for all of 3 months is rarely the solution, though. When you really listen, you should identify some things you can immediately help with. Share similar stories from your past and let the talker reach a breakthrough and suggest a possible solution based on their knowledge. It won't be the exact same one, but you can add value and the perspective of fresh eyes. Other times the talker has intimate knowledge of the problem and possible solutions, and all they've been waiting for is someone empowered to just let them try it.
As a leader your job is to enable those people. Then as time goes by you'll get your own knowledge of the problem space and can start taking some initiatives yourself too. But there's no need to literally do nothing for so long.
People often confuse "empower people" with "I'm a dunce and can't help with anything for months on end", as if magically in month 3 you're now enlightened. Perhaps the interviewer was checking whether you understood the nuances of these situations.
> People often confuse "empower people" with "I'm a dunce and can't help with anything for months on end", as if magically in month 3 you're now enlightened. Perhaps the interviewer was checking whether you understood the nuances of these situations.
Not doing any work is a valuable service at a company like this. The technical problems driving all the work in the article were self-inflicted; they had infinite VC money, so they hired a zillion engineers, who produced a zillion engineers' worth of work, so they needed a zillion servers to run it on.
Whether or not customers asked for it, or whether the same app design fundamentally could run on a single machine under their desk, doesn't really come into it. And bootstrapped companies (Instagram or Stack Overflow for example) often do manage to run on the single machine under a desk.
> whether this unicorn is doomed to fail-at-scale because of poor organizational intelligence.
This is the heat death of any organization. Organizations can absorb a stunning amount of dysfunction and somehow keep ticking.
> The director retorted and asked rhetorically what if my team complained that I wasn’t doing any work?
I think you can interpret that question two different ways, and it's hard to tell which one it was without being in the room/being in their head. The more generous way to interpret this is that they were checking your understanding of the role a manager plays outside of pushing changes and decisions.
That said, if it was clearly the director communicating that a manager should be jumping in and imposing a vision immediately, then you would've been misaligned philosophically, and that's probably not a job you'd want even if you're qualified. Being a manager while fundamentally disagreeing with your own boss is having one hand tied behind your back.
> what if my team complained that I wasn’t doing any work?
Regardless of whether you're an incoming manager or senior IC, it's a good idea to spend your first few weeks or couple of months in a new team observing rather than coming in suggesting big changes[1]. Yes, things might not be ideal, but there might be context you're missing that explains why things are the way they are.
[1] There are a few times when big, shocking changes are necessary to reset a team in a bad situation. Most of the time this is not the case.
> I said I would spend my first 3 months understanding and observing. The director retorted and asked rhetorically what if my team complained that I wasn’t doing any work?
It's possible that the interviewer was of the "Jack Welch mindset" and worships action and making snap decisions, Captain Kirk style. A LOT of people see that as the ultimate model behavior for an executive. Or maybe he asked that to see how you would push back against the inevitable pressure from others in the org who wouldn't respect someone who isn't biased toward action?
Listening is an underrated skill. We all place our attention (and thus the direction of problem solving) onto differing aspects of a situation. It takes a while to actually listen, learn those attentional perspectives, and get groups to holistically search for answers. Likewise, it takes time for team members to feel heard even when the manager answers with "You and I are not in agreement over solution X".
OTOH Uber is now in a stable place regardless of the author’s further involvement. The author’s story reminds me a lot of my own current situation: a shit ton of ops load for years with everything being critical and no time to stabilize, new management comes in, gets new hires, most of the old team leaves, the mix of remaining old team and new team susses out the remaining issues over the next year or two, and the team eventually falls into the normal working patterns set up by the new managers. Does this happen at every big tech company?
And it's not just the first six months. It's something one ought to do all the time, to better serve the people one is there to serve.
Just going to the place where the thing happens and looking at the thing happening to better understand is a hugely undervalued job.
I don't want a detailed digital report of how my child is doing in preschool. If I want more details, I'll take a couple hours of vacation and join them in the preschool, just to observe. I don't need a product manager to tell me how a customer uses a product in detail. If I want to know in detail, I'll visit them and watch.
Instead in the corporate world we have layers of layers of management demanding progress reports and skill assessments and performance rankings and velocity audits and what have you. I'm sure this eats way more time than if they just spent a couple of hours every now and then working with and asking things of the people who do the actual work.
> I'm sure this eats way more time than if they just spent a couple of hours every now and then working with and asking things of the people who do the actual work.
> It's something one ought to do all the time, to better serve the people one is there to serve.
IME, it seems like most managers believe that you are supposed to serve them, which I think gets right to the heart of what's wrong in many organizations.
> Instead in the corporate world we have layers of layers of management demanding progress reports and skill assessments and performance rankings and velocity audits and what have you. I'm sure this eats way more time than if they just spent a couple of hours every now and then working with and asking things of the people who do the actual work.
This rings very true in my experience. Every new layer, every new TPM organization adds more overhead. They start off with “we're gonna boost productivity and get things done faster!” but when that doesn’t happen, the result is a demand for more process, more bureaucracy, more JIRA/story point blah blah, all of which ultimately falls on the engineers, who still have to do the actual work but now also have to learn how to bullshit the bureaucracy.
There's a saying I learned in shooting that applies to most physical activities/sports: "Slow is smooth. Smooth is fast." It also applies to working as part of an organization. Process and bureaucracy might make you slower as an individual contributor but still speed up the organization.
There's an important qualifier to that saying: it only applies if the motion is relevant to the task at hand. Smoothly doing the dishes will not make your shooting faster.
That might sound obvious, but it's not always. Having commonly used tools on a central toolboard instead of at each workbench makes it seem like part of the task to go to the toolboard, when in practise it's an irrelevant wasteful side-task. You can improve how smoothly (quickly) you go to the toolboard, but the improvement in speed comes from moving the tools to the workbenches again.
Likewise, smoothly doing paperwork of the sort we're discussing here will make your paperwork faster but it won't improve your first-order work.
This is what the professors in business schools study. MBAs often get a bad rap, but as far as the actual research goes, that’s where the current efforts at rigorously studying management and organizational dynamics are.
Is there any evidence that these academics affect the practice of business much? When I think of ideas or changes coming out of business schools and changing how businesses are run the only thing I can think of is Michael Porter’s five forces. Toyota has likely had a greater impact on how business is run than every business school professor combined, ever.
More bluntly most business professors never even try to become successful entrepreneurs. They’re unwilling to bear the costs of the market test because they don’t expect the benefits to be worth it.
Peter Drucker was influential mid-20th century and I still hear MBAs talk about his work. He wrote in the 1950s-1970s about the importance of the customer, the value of decentralisation rather than command and control, managers as influencers of company culture, etc. Standard management guru stuff now, but novel at the time.
Of course it’s difficult to know how much management change was due to him. E.g. I’ve seen claims that he influenced the re-building of Japanese industry after World War 2, but I don’t know if it’s true.
Deming became a business school professor in 1946 and was brought to Japan in 1947, where he taught statistical process control. That was extremely important but Deming functioned more as a conduit for work Walter Shewhart and others had done than anything else. A man with degrees in electrical engineering and mathematical physics with less than two years of experience in a business school before transforming Japanese manufacturing by teaching them about techniques developed for the American war effort by statisticians does not seem a great example.
> Deming edited a series of lectures delivered by Shewhart at USDA, Statistical Method from the Viewpoint of Quality Control, into a book published in 1939. One reason he learned so much from Shewhart, Deming remarked in a videotaped interview, was that, while brilliant, Shewhart had an "uncanny ability to make things difficult." Deming thus spent a great deal of time both copying Shewhart's ideas and devising ways to present them with his own twist.[16]
> Deming developed the sampling techniques that were used for the first time during the 1940 U.S. Census, formulating the Deming-Stephan algorithm for iterative proportional fitting in the process.[17] During World War II, Deming was a member of the five-man Emergency Technical Committee. He worked with H.F. Dodge, A.G. Ashcroft, Leslie E. Simon, R.E. Wareham, and John Gaillard in the compilation of the American War Standards (American Standards Association Z1.1–3 published in 1942)[18] and taught SPC techniques to workers engaged in wartime production. Statistical methods were widely applied during World War II, but faded into disuse a few years later in the face of huge overseas demand for American mass-produced products.
It says that he just transcribed Shewhart, but he also afterwards studied academically under Fisher and Neyman, two foundational figures in the world of statistics.
I don’t deny he was an important figure in statistical process control and its popularization. I’m just saying he’s a bad example of a business school professor having a big impact. He was a statistician/engineer/physicist during WW2 who was very important to Japanese manufacturing. He had barely any business school experience when he did his most impactful work, and that was on teaching SPC, an American invention not made by business school professors, to the Japanese.
Yes, but most of those books aren’t written by social scientists. Just as most of what is learned at business school is not the most important thing for running a business most of what scholars looking for the praise of other scholars publish is of limited use to businesspeople.
Businesspeople have more to learn from their own than from people in a completely different sector, just as artists like painters and sculptors learn more from their own than art historians and theorists. Much of academia is masturbatory, of no use or interest to people outside the specific sub field of academia it comes from, and the further divorced a field is from use by or contact with practitioners or consumer demand the greater the proportion.
I’m glad they mentioned the Susan Fowler article, but I’d love to hear from a guy who overlapped in Uber at that time in the same team, ostensibly “not as part of the problem” tell their side of how they saw things.
full range of feelings reading this... not what i was expecting to do this evening, but here's an attempt: i have a uniquely close perspective to all that stuff having interviewed and then later reported to the guy who first harassed Susan when she started and working with her on the team she transferred to, and ostensibly "not as part of the problem" (though you could take my inaction in this whole saga as a grain of salt.)
I started working at Uber the same month as Will, after the infrastructure team passed on my resume without even a phone call and handed it off to another infra team, which Will neglected to mention: a "shadow infrastructure" team created as an artifact of technical differences and personal grudges, both of which doubled every six months like our head count. We were responsible for the edge services' business logic, written in Node.js instead of the Python 2 predominant in the rest of the engineering org. I joined as a junior engineer out of two failed Ruby on Rails startups with ~0 positive Node.js experience, but with a good enough sense of Linux systems engineering and administration to be doing release engineering and on-call for those edge services three weeks later. Everyone had a similar trial by fire in that time, and many didn't "make it" past a month despite having a smoother hiring experience than the one Will related.
The infrastructure org was lucky to have Will. I didn't have a manager for months, reporting to a 20-something "director" who was too far in over his head and too up his own ass to care about that, and taking day-to-day "marching orders" from one of the most technically brilliant people I ever worked with, who had the EQ of a bat. This leadership debt was a recurring pattern throughout the whole engineering organization during the years of insane growth, even after the "new SRE" came to be. But that toxic division, the leadership debt, and the "shadow" infrastructure organization itself festered into some of the awful behavior that ended up exploding in the saga with Susan et al. We promoted two senior engineers on my team into management, who proceeded to compete to be the first manager to hire a woman into our 100-man, 0-woman engineering group. A year and some change later, Susan ended up reporting to one of them, when we had all finally made up under the new SRE leadership.
I have some very surreal memories of these times, some dear memories of the brilliant folks I worked with, some unspeakably regretful memories of both, and some of this I had completely blocked out. It's hard to believe in the two year arc of this blog post, we hired endlessly (occasionally poorly, often brilliantly), we brought up four data centers including a reverse-engineering of our tech stack to launch two from first-principles in China, rebuilt entire parts of the company's architecture from empty git repos to a thousand plus servers in a quarter, and nearly burst at the seams doing it.
It's an irreplicable and irreplaceable experience, but I can't say I'd go through with it again. I'm not even sure I'd wish it on someone: after three years on these infrastructure teams and two more on the privacy team after #deleteuber, starting work on GDPR compliance 6 months before enforcement started, I left. Two years later, I am still recovering from burnout, spending down my little dragon's hoard while healing from a complete lack of interest in technical work.
I'm glad that Will has been distilling and expressing the lessons we learned so that others can go through this sort of experience in a less extreme and painful fashion.
Thanks for sharing this, and I wish you well in regaining your interest in technical work. Having been through burnout myself, I ended up changing career path so I could regain enjoyment in my hobbies, and in the end was able to once again find excitement in technology. Take care of yourself :)
I worked at Uber and was around/in SRE for around the same time as rrix. It was truly a wild experience, good and bad, but I’m still burned out too, and it’s been almost 3 years since I quit.
It's an article on a personal blog targeting a specific readership familiar with the term.
More broadly, I think the level of granularity at which it makes sense to define terms (Mesos, Kubernetes, even Uber) has to roughly match the level of familiarity of the reader that will get something meaningful from the piece.
> It's an article on a personal blog targeting a specific readership familiar with the term.
He's the CTO of a "wellness" tech company now and writes broadly on the topic of engineering and engineering management. The folks who will read this likely are familiar enough with these technical terms to glean knowledge, or at least to gloss over them.
> The SRE organization remained. The SRE organization that we’d built was gone.
Sad :(.
> That’s not to say the experience was entirely good. Working at that era of Uber extracted a toll. Many others paid a much worse price; I paid one too. I was in way over my head as a leader, and I struggled with it. On a work trip ramping up the Uber Lithuania office, my partner of seven years called to let me know they’d moved out. Writing had been my favorite hobby, but I gave up on writing during my time there. Some chapters enrich our lives without being something we’d repeat, and Uber is certainly one of those chapters for me.
As someone very early in their career, I would still take that experience in a heartbeat. Just by observation, I strongly believe that kind of experience is both extremely rare and extremely valuable.
By my comment I meant that I would be willing to suffer some negative consequences for that kind of experience. What those would be, I have no clue. I also don't currently have a partner to lose, so I discounted that particular thing.
That’s understandable - I’ve just got married to my partner of 7 years and my priorities have slightly shifted :P I get the idea of trading being uncomfortable for quick progression though!
Will's a great communicator, and quite personable: I know this because I've worked with him for a few years. I don't think it's bravado, SRE is a common industry term.
So are all sorts of terms, but in the sciences at least the rule is still that you introduce the full term and its acronym the first time it is used, before switching to the acronym for the rest of the paper. Sure, everyone doing electrochemistry knows what EIS means, but you still write "electrochemical impedance spectroscopy (EIS)" before you go on.
You can adjust the strictness for audience and intent perhaps, but it's still sloppy not to.
this acronym is ubiquitous in the industry of this site and anyone who follows his blog. it's not unreasonable. I'm shocked someone commenting on Hacker News doesn't know it.
I’m in the industry. Have been for years. I still have no clue what an SRE is. Like what do they do? Debug broken services? Unless you worked with people in that SRE role it’s not clear how you’re supposed to know what they do.
This article could have helped me with that but it didn’t fully. It sounded more like a rant that will totally make sense maybe to 50% of infra or devops people.
Maybe it’s me though. I actually still don’t know what a true product manager is supposed to do lol.
in most places SRE is title-inflation for DevOps, but in a less cynical interpretation it's the idea of 1) having reliable enough base-layer systems that product engineers don't need to care about servers or debug networks or in the most extreme cases log in to production machines themselves 2) putting engineers who have strong systems engineering, software engineering, and debugging skills in the same seats as the most critical product teams to carry the pager and fine tune and optimize those services without reporting to the product teams' leadership whose goals might be in conflict with engineering reliable systems.
This really only happens well at Google AFAICT.
the "new SRE" team Will mentioned was a bunch of ex-Google Infra folks who brought a holy book [https://sre.google/sre-book/table-of-contents/] with them and asked us to institute these practices whole-cloth. some of them had already been phased out of google by then but how could we have known
Just because it's been phased out at Google doesn't mean it's not worthwhile. There's a lot to be learned and a lot of value to be gained from building GFS before Colossus. (That example is getting out of date, though: I would no longer recommend that anyone try to build a GFS-patterned distributed filesystem, but for several years after the party I attended to celebrate the final death of GFS, I would still recommend that new unicorns looking to build their own distributed filesystem layer go for the GFS architecture first.)
There's a lot of techniques and tools that are no longer in favor at Google not because they're not the best available SRE tools, but because that's how far Google has fallen from grace.
Site Reliability/Infrastructure/Production Engineers typically have three focuses:
Managing and scaling "software infrastructure" (Kubernetes, cloud services, databases/caches, job/message brokers, etc). SREs tend to have expertise in these systems that generalist SWEs do not.
Incident response. Many production incidents are not caused by a bug in application code, but due to some cascading failure of some piece of the software infrastructure, often due to unexpected load, performance regressions or network/hardware outages. Because SREs have a broader picture of how the various pieces of infra fit together, they're best suited to start root cause analysis and determine whether it's an infra/code issue or some combination of the two.
Devops/developer productivity. Many SREs work on build/release systems, internal tools and enforcing best practices.
sre means different things to different people, it’s an overloaded term like devops. but I feel like he spent a lot of this article defining what it meant for this team - all that about embedding a dedicated person is their definition of sre
Every org is different and I like to point out this article [1] to illustrate it. SRE is really what your company wants it to be. If you have full-cycle devs, then you're likely doing more platform work. If your devs lack the deploy and operate part of the cycle, then that's likely where the SREs will pick up the rest.
This can help understand the many approaches out there, many of which are terrible and the rest are generally suitable only to some cases and not others:
I assume it's like "Enterprise architect" only maybe less so. For EA I've heard that no two companies ever mean the same thing by the term (I barely even know how we define it, and that's the team I'm on), where SRE I think is supposedly copied from people's half-understanding of what Google says it does.
This is wrong. SRE is an engineering org, NOT an ops or support org.
While it's common practice for SRE teams to share some of the operational burden, it's for the purpose to find risks and vulnerabilities and then engineer solutions to those. I have never seen SRE do support.
SREs should know how to build and run reliable and maintainable systems. Architecture and design are the best tools for that.
SRE/Production Engineering are common terms in the tech industry... How do you come to the conclusion that the author is the one with a "very very small circle"?
I've been an engineer and manager in a software company in Europe and have only heard about what an SRE is. Now "software engineer" is a common term, but not this SRE thing.
Perhaps it's only common within silicon valley circles, that being said, the fact that the author didn't define the acronym didn't help either.
Honestly it'd be like explaining what sysadmin means nowadays. It's so common I am surprised there are sectors of the industry that don't use the term yet! But it makes sense that a person who's been blogging about SRE-work for years doesn't feel the need to define SRE. It's been around as a role since the early 2000s.
As a person whose job title is SRE, it’s definitely far less known than sysadmin is (where many non-tech people I talk to know the title) and is still a less known title than DevOps is. Even if the title was created 20 years ago, I don’t think SRE really started spreading as a term until the last few years.
That the title wouldn’t be well known in Europe, or fully known in the US isn’t surprising to me. I’d probably expect someone who works in the start up world to know it, and would expect anyone working at the FANG compensation level to know it, but I don’t think it’s super common among the vast middle of companies.
(And for reference, I had no idea what SDET meant until I looked it up right now, although I certainly have worked with Software Development Engineers in Test. Probably an even more obscure term?)
> That the title wouldn’t be well known in Europe, or fully known in the US isn’t surprising to me. I’d probably expect someone who works in the start up world to know it, and would expect anyone working at the FANG compensation level to know it, but I don’t think it’s super common among the vast middle of companies.
I guess that's fair.
> (And for reference, I had no idea what SDET meant until I looked it up right now, although I certainly have worked with Software Development Engineers in Test. Probably an even more obscure term?)
I'm not sure - I work as an SDET and have found work as an SDET in Oxford and London, England, remote in other European countries, and am now interviewing for several companies in the Bay Area. It seems pretty universal to me! But again, maybe that's just down to the bubble one is in. FWIW, it just means a software developer who focuses on testing as opposed to frontend or backend, so managing test infrastructure, reporting, writing frameworks for devs to use for testing, etc.
I'm curious, have you heard of any of these: Data Engineer, Machine Learning Engineer (MLE), Staff Software Engineer.
I'm surprised you've never come across the term SRE/Production Engineer before (Assuming you meant to say "have only __just__ heard about what an SRE is"). I live in New Zealand and I'm pretty sure I knew what an SRE was before I even joined a big tech company (Not a SV one btw).
I’ve worked in tech in NZ for 7 years or so, never actually met someone who calls themselves an SRE. Obviously I know the term, but IME we don’t use that title here
The title might be inflated, and maybe the egos too (but that's true across engineering disciplines), but I can tell you the pay certainly isn't inflated. While I was in Operations, I was on average paid half of what an equivalent-level SWE would get paid at most companies. This was typical across the industry, and I'm guessing it still is.
Because SRE came out of Ops, which came out of SysAdmin, there's a cultural expectation both by other engineers and by management that you are essentially a digital janitor, and if you can code it's only minimally so that you can write glue scripts, rather than being equally capable as an SWE. The reality is that most of my peers were /better/ developers than equal-level SWEs, because they had a stronger understanding of lower level systems and their interactions than SWEs who spent all their time working in abstractions.
As another commenter notes, I would have been perfectly happy keeping the "SysAdmin" title my entire career, and while I held the title for a while, it always grated on me when people were hired as DevOps Engineers (I would often tell managers "DevOps is a philosophy, not a job title."). At least SRE accurately describes the work in some abstract sense: you are an engineer and your primary duty is the reliability of the public-facing site and its dependent backend services.
Frankly, the amount of derision I see here and in other tech circles from SWEs against basically every other role in tech makes me think SWEs are the ones with the ego, for that matter.
> Frankly, the amount of derision I see here and in other tech circles from SWEs against basically every other role in tech makes me think SWEs are the ones with the ego, for that matter.
I can believe this. It’s especially funny when building a reliable system today requires only a really basic understanding of a few low-level system concepts (e.g. graceful termination, signal handling, healthchecks, DNS), and most SWEs I work with simply refuse to learn these things.
> most SWEs I work with simply refuse to learn these things
Are the SWEs overworked, with unrealistic deadlines? Also, how did the SRE present this information to them? I think both factors can drastically affect the outcome.
For example, a small team that's already on the brink of being overburdened, getting tossed a few links on Slack that say "please make sure you know about Kubernetes graceful shutdown timeouts and signals" with no other context, is going to meet that with resistance, because now it feels like you're giving them another job. If they have no prior experience with that sort of thing, it could take days just to partially understand it. That eats into their sprint time, etc.
Alternatively, you could privately write a nice bit of documentation explaining the problem: why graceful shutdowns are a good thing, how they tie into being able to deploy new versions of your apps, and how they create a better user experience for the apps the devs are building. Along with that, gather as much information about your apps as you can (longest page response times, etc.), then spend a bit of time with your tech lead or someone who knows your applications well to iron out good values for all of your services (this assumes a worst-case scenario where you don't have these metrics logged anywhere yet). Then wrap things up with a 5-10 minute show-and-tell to share this knowledge with the devs so they're aware of it.
In my opinion, that makes the best use of everyone's time. The SRE doesn't even need to know much about the app's business domain, since it all boils down to how long an app might take to exit. The relevant parts of the tech stack can be researched for the document too, such as how you can hook into the app server's shutdown process to run code during a graceful shutdown.
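A minimal sketch of that kind of shutdown hook, in Python with only the standard library. It assumes a Kubernetes-style lifecycle (SIGTERM first, SIGKILL after the grace period) and a load balancer that polls a readiness endpoint; the `/healthz` path, the 5-second `DRAIN_SECONDS` value, and the names `begin_graceful_shutdown` and `main` are hypothetical choices for illustration, not any particular app server's API. On SIGTERM it fails the readiness check, keeps serving for a drain period, then stops accepting connections and waits for in-flight requests.

```python
import signal
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical drain period: keep serving this long after SIGTERM so the load
# balancer sees the failing readiness check and stops routing new traffic here.
# It must fit inside the pod's grace period (Kubernetes defaults to 30 s).
DRAIN_SECONDS = 5.0

shutting_down = threading.Event()


class Server(ThreadingHTTPServer):
    # Non-daemon request threads, so server_close() waits for in-flight requests.
    daemon_threads = False


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz" and shutting_down.is_set():
            # Readiness check fails once shutdown has begun.
            status, body = 503, b"draining\n"
        else:
            # Normal requests (and /healthz before shutdown) still succeed.
            status, body = 200, b"ok\n"
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the example


def begin_graceful_shutdown(server, drain_seconds=DRAIN_SECONDS):
    """Fail readiness now, keep serving during the drain, then stop accepting."""
    shutting_down.set()

    def drain_then_stop():
        time.sleep(drain_seconds)
        # shutdown() blocks until serve_forever() returns, so it must not run
        # on the serving thread (where the signal handler executes).
        server.shutdown()

    threading.Thread(target=drain_then_stop, daemon=True).start()


def main(port=8080):
    server = Server(("0.0.0.0", port), Handler)
    # Kubernetes sends SIGTERM to the container's main process on pod
    # termination, then SIGKILL once the grace period expires.
    signal.signal(signal.SIGTERM, lambda signum, frame: begin_graceful_shutdown(server))
    server.serve_forever()
    server.server_close()  # close the socket and join in-flight request threads
```

Calling `main()` blocks until SIGTERM arrives and the drain finishes. The drain delay is there because endpoint removal propagates asynchronously, so a pod can keep receiving requests for a short while after SIGTERM; the right value for a real service is exactly the kind of number the comment suggests working out with the tech lead from response-time data.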
I've found that after all of that, it was embraced in a very positive way, with devs now looking for ways to make the app handle being shut down at unpredictable times without data loss and with minimal user disruption. Building robust, resilient applications is now a skill set the devs can take with them anywhere. Everyone wins (customers, devs, SREs, etc.).
I agree with this. Making these concepts easy to understand, with concrete examples of how to apply them in the company’s custom setup, goes a long way toward ensuring that developers both understand these things and, as you pointed out, become evangelists for implementing them.
That's not the fault of people doing the work, that's what the industry is doing to us.
I get increasingly irritated with the myth that 'DevOps' are any different from the sysadmins of 10 years ago.
"But DevOps can code." Yes, so could sysadmins. In fact, Terraform, Ansible, Vagrant, SaltStack, Chef, Puppet, etc. were all made by people who held the title of sysadmin when they were written.
In fact, even the term "DevOps" originally came from a conference where the idea was that "we can do systems administration in an agile way". It had NOTHING to do with coding and everything to do with getting developers and sysadmins working closely together in an iterative fashion.
I would personally be very happy being called a sysadmin, but doing so is career suicide, because we as an industry have decided that sysadmins are somehow braindead and that you really need "SREs" or "DevOps", despite the fact that these are the same people.
What gets my goat even more is that people hate on sysadmins because of corporate culture: echoes of centralised IT organisations that said no to everything.
But we're doing exactly the same thing with these new titles now. It's a joke.
I think the difference is that there were (and still are) many sysadmins who don’t code. So the shift was the expectation that all ops people could code.
I don’t think it’s solely a Unix/Windows thing, but I still work with Windows admins who set up machines manually. Like, they set up a VM in Azure and then spend hours or days pushing buttons until it’s “ready.” For one VM.
A person like this should not be in ops. They should be trained to automate this with code.
So the world is full of both “there’s always been DevOps,” because 30 years ago sysadmins were scripting out their stuff in shell scripts and whatnot, and “we’re moving to DevOps,” where people still do system tasks manually.
Nowadays some companies have dedicated devops teams who say they don't code.
I've been to meetups where I met staff from devops teams who talked about wanting to learn to code in a language like Python someday, to switch into a developer role, and whose response to me telling them I was a C developer was "wow that's really hardcore isn't it".
So perhaps there isn't an expectation that all ops people can code, and perhaps those particular DevOps teams are not much different from the sysadmins of old.
It’s weird. I always thought DevOps meant a philosophy of using traditional agile software development practices (source control, continuous delivery, testing, etc) to make ops processes better. So config as code, instead of maintaining unique snowflakes; cattle over pets; deploy pipelines instead of manual jobs etc.
But the term seems to have been diluted to near meaninglessness, as people mean very different things when they say it.
SRE is related, but my take is that it’s an evolution of, or reaction to, DevOps. Some took DevOps to mean “developers do all the ops”, and SRE puts on-call back onto a dedicated role (so back to the NOC and old-school Ops) while keeping (and perhaps even further emphasizing) the commitment to engineering work in service of automating operations at scale.
It's considered prestigious by other engineers. Outside engineering, not so much.
At one point in my career I moved from a technical presales role to SRE. The engineers congratulated me on the promotion, while the sales people all thought it was a demotion. (And in terms of career progression it probably was.)
I'd love to see some evidence of that. I have never been in an organization where Ops folks and SREs are treated as if those roles are prestigious by other people in the engineering org. Generally they're treated like digital janitors, and assumed to be bad at software development. I've literally had an SWE /at work/ say to my face that "You're in Ops because you weren't smart enough to pass a code interview" because they were upset I had pointed out a flaw in their design and were essentially making an appeal to the nebulous authority of their title.
SRE is definitely more prestigious than technical presales to other engineers, but that's because to other engineers any role in the sales org is not real engineering (even if it is), and to sales people anything in the engineering org is a demotion because it means you are joining "the basement people". Sales people aspire to become CEO, not to build something that makes a mark on the world. Engineers want to build the thing; they don't care /how/ they have to go about doing it (whether that's being a founder/CEO or working in the basement).
It is presented as prestigious in order to hire at least somebody into it. Depending on the place, the work may be interesting, but management is usually more rotten than, say, in dev.
I don’t know about this particular case, but it’s very common at a VC-fueled company for the VCs to pressure execs to hire “seasoned management”.
The hypothesis appears to be: spend money to bring in people who know how enterprises work. Instead of people “just” managing or managing managers, get “leaders” who are “proven” at managing business functions or lines of business through managers of managers. So, pay big bucks to people who have risen to that size of management before.
Of course, this means “layering” the Wills of the world with LOB or functional managers who, by virtue of having been at a big enterprise that predates the unicorn thing, have no clue what the function needs to be (no ‘north star’).
Next thing you know, thanks to this overweening managerialism, the unicorn-turned-Leviathan is hiring “transformation” executives. You know what’s next… the next unicorn still made of Wills, ready to eat Leviathan’s lunch.
Landing a job you have no qualifications for, with no deadlines, building internal tools at your own pace, and becoming a multimillionaire in the process? Sounds like a pretty good gig!
It seems... a little...something. The word "founding", "founder", "founded", at least in technology circles (the relevant circle here), has historically been used to refer to starting a company. All of a sudden people are saying things like they "founded product X at Google" or something. Everyone knows the historical usage of the term, so why use it when "started" or "created" or "stood up" or "established" would work?
I can never be interested in any story about Uber after hearing about the (early) culture there. This is literally the company that started the #MeToo movement and showed us how bad the VC/SV culture can be, and can be covered up with money.
This story actually does mention that. And while I understand the desire to dismiss the whole company, it should be remembered that companies, like any communities, are large and diverse.
> "In the interview with the digital news platform, Khosrowshahi said the 2018 murder of Washington Post journalist Jamal Khashoggi shows that 'the (Saudi) government said they made a mistake. It's a serious mistake, but we've made serious mistakes, too, right?'"
Tech isn't some neutral zone free of ethical quandaries. Google leadership pushing Project Dragonfly in China is another example. Israel selling its Pegasus spyware to the Saudis so they can round up and murder dissidents is another. Being profitable doesn't justify doing it.
Uber was an international company by 2012. The #MeToo movement, specifically in regard to Weinstein, didn't pick up until late 2017, as far as I remember. Yes, technically it originated earlier, but there was an entire decade between the first reference and the actual movement.