I can't help but wonder, with the increases in attrition across the industry, are we hitting some kind of tipping point where the institutional knowledge in these massive tech corporations is disappearing?
Mistakes happen all the time but when all the people who intimately know how these systems work leave for other opportunities, disasters are bound to happen more and more.
I've been an SRE on a tier 1 GCP product for over three years, and this is not the case. In my experience, our systems have only gotten more reliable and easier to understand since I joined.
It's not like there are only a few key knowledge holders that single-handedly prevent outages from happening. In reality, you don't need to know shit about how a system works internally to prevent outages if things are set up correctly.
In theory, my dog should be able to sit on my laptop and submit a prod breaking change without any fear that it will make it to prod and damage the system because automated testing/canary should catch it and, if it does make it to prod, we should be able to detect and mitigate it before the change affects more users using probes or whitebox monitoring.
This happens for 99.9% of potential issues and is completely invisible to users. However, it's what's not caught (the remaining 0.1%) that actually matters.
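To make the canary idea concrete, here is a minimal sketch of the kind of gate I mean. It is not any real internal tool: the `Metrics` type, the `canary_decision` function, and the thresholds (`max_ratio`, `min_requests`, the 0.1% floor) are all made-up names and values for illustration.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    """Request and error counts observed over one evaluation window (hypothetical shape)."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        # Avoid dividing by zero when no traffic has been seen yet.
        return self.errors / self.requests if self.requests else 0.0


def canary_decision(baseline: Metrics, canary: Metrics,
                    max_ratio: float = 2.0, min_requests: int = 100) -> str:
    """Return 'wait', 'promote', or 'rollback' for a canary release.

    Assumed policy: the canary may have at most `max_ratio` times the
    baseline error rate, with a small absolute floor so a near-zero
    baseline does not make every single error fatal.
    """
    if canary.requests < min_requests:
        return "wait"  # not enough traffic to judge either way
    allowed = max(baseline.error_rate * max_ratio, 0.001)
    return "promote" if canary.error_rate <= allowed else "rollback"


if __name__ == "__main__":
    prod = Metrics(requests=10_000, errors=10)
    print(canary_decision(prod, Metrics(requests=500, errors=0)))
    print(canary_decision(prod, Metrics(requests=500, errors=25)))
```

The point is that nobody reviewing the change needs to understand the system internals; the gate only compares observed behavior against the existing baseline.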
From within AWS, it just feels like we push people too hard. Service teams are too small relative to their goals, sales teams have unrealistic growth targets and also double as support for needy/incompetent customers, and professional services have billable hour requirements which are as high as any major consultancy and with additional pre-sales support expectations.
Strict adherence to the "hiring bar" means we fail to bring in good people who aren't desperate enough to act out the cultish LP dance during their interview. Hiring new grads seems to be the only area where growth is not stalling - but that can obviously only help so much.
My team is hiring for 2-3 people and we are being buried alive without that growth happening sooner - but I can't in good conscience recommend this place to anyone I respect or like.
>"Strict adherence to the "hiring bar" means we fail to bring in good people who aren't desperate enough to act out the cultish LP dance during their interview."
What is the "cultish LP dance" here that is weeding good people out?
>"My team is hiring for 2-3 people and we are being buried alive without that growth happening sooner - but I can't in good conscience recommend this place to anyone I respect or like."
I appreciate your candor. Are you in a dev role or are you on the SRE side? Is your description true across pretty much all teams/services then?
> What is the "cultish LP dance" here that is weeding good people out?
The "culture fit" interview process focuses on leadership principles, so lots of questions like "tell me about a time when you went above and beyond for a customer". Being yourself will get you nowhere; you need to research the questions and the script that is expected of you.
> What service does your team work on?
I'm a partner-focused SA, so not a developer and not aligned to a particular service.
I interviewed with AWS about a year ago. Knocked the programming/system design interviews out of the park, but it was clear the LP interviewer simply didn’t believe I was being truthful about the example I gave (included a period where my team had no direct manager, which is abnormal). He also had a programming question for me but we didn’t have time for him to explain it.
No offer, recruiter emphasized that they were halving the “cool off” period for me (so I could interview again soon), and maybe they do this for everyone, but it’s clear there was one interview making the difference here. Interesting that this is apparently a common problem.
When I interviewed, the instructions I was given by the recruiter were very explicit. For the Leadership Principles, I tried my hardest to come up with at least two items for each LP, and of course they had to be in STAR format -- Situation, Task, Action, Result. To that, I added a URL to back up my story for all of the cases where I could.
So, when I used the example of designing and starting to build the replacement e-mail system for ASML for literally zero additional cost [0], I pointed them at the URL for the Invited Talk that I gave based on that work. And when I used the example of when I broke all e-mail across the entire Internet, I pointed them at the URL for the article that was published in The Register. When I talked about what I had learned about Chef and DevOps, I pointed them at the invited talk I gave in Edinburgh entitled "From zero to cloud in 90 days" and the accompanying tutorial I taught called "Just enough Ruby for Chef". And so on.
I really feel that having the URLs to backstop my stories helped me sail through that part of the interview.
In my case, I wasn't being hired as an SDE, so there weren't many programming tests they wanted me to take -- one of their senior developers did connect me to a shared coding platform, which we really just used as a shared whiteboard. He asked me some questions on how I would solve some problems, and I used my 30+ years of experience with Bourne shell and bash to show him stupid first order solutions to those problems and then we talked about what some of the second and third order solutions might be.
The longer I work at AWS, the more convinced I am that everything depends on the people you're working with. There are good teams and bad teams. There are good managers and bad managers. And if you can find a good team with good managers, then you're golden.
In this respect, I don't think AWS is materially different from any other employer I've known.
[0] They had already bought all the hardware, including some stuff I scavenged from a closet where it had been sitting new-in-the-box for a couple of years; the OS was covered under their site license; all the application software was open source; and my time was free because I was already there under another contract.
Can I ask what role you were interviewing for where you weren't asked to solve Leetcode style problems on a whiteboard? I thought that was standard for AWS. You mentioned it wasn't SDE but I'm assuming it's still a DevOps/Technical role you have based on your STAR examples.
Who would interview with a company again after this nonsense though? I would hope candidates' view of said "cool off" period is that it is a permanent one. The arrogance of these recruiters is astonishing.
It was unpleasant but at the end of the day it was just one guy.
What gets me is that every recruiter who reaches out (hundreds by now I estimate) wants me to complete their timed coding test to qualify to interview again.
I found Google’s interview process comparatively much more respectful (although more demanding), and have been happy working there instead.
Do you mean AWS recruiters are asking you to take timed coding tests? Or in general? This is a thing for people with experience? What value or skills do recruiters actually bring? It's a wonder anyone gets hired at all with these jokers as gatekeepers. Good on you for getting a better gig. Congrats.
Yes it’s a standard part of the interview cycle at Amazon. Usually two questions in an hour or something, and you code solutions which the test platform grades. They’re on par with leetcode easy IME, but I just find it to be a pain so I won’t do it again.
I’m sure it’s helpful to weed out applicants who actually cannot code, but what’s the point in doing it twice?
As some dude from Baltimore who has just picked up a second gig there, it seemed to me that these are normal questions asked of you in most interviews. In my daytime position at the first gig I have interviewed technicians and have asked similar questions of them. It's not about culture fit. It's about finding their answer to basic customer service questions. It would make sense that a customer obsessed company would be focused on a set of principles they have found success in.
Speaking of the research, the recruiters email you the principles and specifically mention to you to review them and to consider them during the interview process. They even send you a document about the STAR method of interviewing to help you have a smoother interview. To me, as a guy from Baltimore who doesn't know half of what the people here do, I don't think the interview process could have been smoother.
They are normal-ish questions and I can see where the STAR guidance is better than getting no help at all - I just think the process is too rigid overall. I've interviewed a handful of people for our team in the last year - all had the right background and technical expertise, but since they didn't have an example from their career which matches up with the LP questions they were asked, they were knocked back. All of them would have been "bar raising" from a technical standpoint, but since they were not demonstrably already "Amazonian" in their mindset, we couldn't hire them.
Customers and partners, the latter basically taking our ideas and selling them as a product - the flow of IP towards Amazon isn't always as clear-cut as people believe :-)
Broadly though it is a pre-sales role to help people get started, followed by ongoing guidance as the customer iterates (this is the part which often turns into free support).
There is nothing cultish about the way Amazon interviews. If you are a good engineer with a relevant background who can speak English you will have no problem passing these interviews. I'm not sure why you are framing it as cultish.
I interviewed with AWS about 7 months ago and got an offer. I had multiple recruiters emphasize the LP stuff. I prepared, and there’s no way I would have passed without that preparation.
In my experience, they also give the toughest programming questions. It is a lot of prep overall.
Native English speaker here. I was denied for unspecified reasons a couple years ago. At the time it was perplexing, as I had pretty good answers for all their questions.
Now I work at one of the slightly more sane FANMAG (to include $ms) companies.
Pretty sure I dodged a bullet, maybe the engineering manager spared me because he liked me more than I realized.
I would be curious to know which one is considered more sane these days. I feel like I've heard enough negative things about the culture at all of them at this point.
Walk down the rows of cubicles chanting "Nonne avertis et conare iterum?" (best I could translate "have you tried turning it off and on again" to Latin)
Nothing makes me feel more like a wizard than learning some new command line skills. Just string together some short, seemingly indecipherable symbols, and... magic!
Not far off. The “golden age” of humanity was shattered long ago, with the mortal wounding of the god emperor, and knowledge of most of the greatest technology was lost. Millennia later, a cult has grown up that both worships and maintains technology as having machine spirits, which are somehow linked to the machine god itself. That god may or may not be the same or related to the god emperor of mankind, depending on the interpretation.
Honestly the lore of w40k is quite fun to read, if you’re into dystopian and fantasy sci-fi.
FWIW hardline tech priests view the machine god as separate from the emperor. Hardline imperium officials view the emperor as separate from the machine god. The official party line is that the emperor and the machine god are the same, with the emperor perhaps being an avatar.
It seems like both sides are fine to have them be reconciled, but it's an important narrative gadget that can be used to get humanity to fight itself in-universe.
Also interesting is that humanity's "lost" technological progress seems to eclipse some of the other races in the W40K narrative, with even the Tau (space dwarves with robots) and Eldar (space elves with crystals) freaking out when humanity brings giant robots, because the sheer physical impracticality of a gigantic human shaped robot is noted, with nobody aware of how they continue to work.
> gigantic human shaped robot is noted, with nobody aware of how they continue to work
BattleTech had something similar: it was considered cheaper to keep replacing humans than to replace the mechs, because hardly anyone knew how to repair or build mechs.
Outside of the mega-fang industry, I’m wondering the same thing.
The Great Resignation had to have taken a huge toll on regular enterprises. There are probably going to be some unlucky (or lucky, depending on how hardcore they are) people in the position of maintaining aging legacy systems and retrofitting them into the future.
COBOL, for example, is becoming a lucrative language for people in the financial and insurance industries. Legacy Java is all over the place, I’m sure. Legacy .NET is in the middle of a huge industry retrofit (.NET 5 was the official post-legacy rebrand and they’re on to .NET 6+ now).
>The Great Resignation had to have taken a huge toll on regular enterprises.
The Great Resignation was people leaving jobs they didn't want (front of house/service industry/gig) for jobs they did want (career-track jobs) after those jobs resumed hiring again after the pandemic settled down.
Labor force participation went up, not down, due to 'the great resignation.'
You're right, but that's been true since the beginning of the tech boom (but isn't exclusive to tech) when no one works for a place for several decades. Companies weather this in different ways but attrition has always been around.
What's causing people to believe that the latest round of attrition is any different?
I'd speculate that perhaps more senior people are moving, and/or a greater overall rate of attrition combined with much more complex technologies and organisations. In other words, it might be harder to become good at jobs now, and fewer people stick with them. Just a hunch but definitely seems to be where the incentives point with loyalty penalties and tech bloat.
In my experience, education certainly seems to not have kept up with computing, at least in terms of having massive academic-industry partnerships like a doctor's residency or a trade apprenticeship.
I’d definitely agree that it is probably harder to become good in older organizations - the technologies are probably generations behind the current state of the art and the learning curves are high for those older technologies.
Just thinking through keyboard, but it’s probably reaching the point where enterprises need to evaluate acqui-hire or outsourcing entire development departments due to attrition, due to the incentives to leave for regular employees.
Promoting high employee turnover could actually be a very effective strategy for a company's long term sustainability.
If your company is hostile to people sticking around for decades, then it makes it that much less likely that you end up stuck with a machine that relies heavily on poorly documented tribal knowledge that's likely to start falling apart as your core people start cashing out.
Similarly, it makes it much easier to make the business case for switching to outsourcing and insourcing business models. That makes it much less likely that you have to worry about losing money to people who “work from home”.
The COVID death counts are hopelessly over-counted. This is why there's a cottage industry of people pointing out things like "COVID deaths" which mysteriously also suffered from being murdered, or drug overdoses, or undiagnosed leukaemia.
Then you get into the problem of care homes being authorised to report COVID deaths without any testing or formally trained opinion at all.
It's shocking to see such a statement so far into the pandemic. This is solved and known already, and while complicated, we've figured it out for some time. We can easily see the massive amount of deaths when we look at excess death numbers. Covid deaths are, if anything, undercounted. To believe anything else at this point is to bury your head in the sand and avoid all scientific evidence and medical consensus.
I won't argue this topic since this thread is about AWS, but the article that you quote says "Experts calculate the excess death rate by comparing figures for a given period with the average for that same period over several previous years."
That's not how experts calculate excess deaths, since that algorithm produces totally bogus answers. Here is how real experts compute excess deaths, compensating for seasonality and population changes: https://euromomo.eu/how-it-works/methods/
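For anyone who wants to see why the quoted method is fragile, here is a rough sketch comparing the naive "subtract the average of previous years" calculation with a version that at least scales by population. All numbers are hypothetical and chosen only to show the effect, the function names are made up, and this is not the EuroMOMO model, which does considerably more (seasonality and trend, among other things).

```python
from statistics import mean


def naive_excess(observed: int, prior_deaths: list[int]) -> float:
    """Excess deaths as quoted in the article: observed minus the
    plain average of the same period in previous years."""
    return observed - mean(prior_deaths)


def population_adjusted_excess(observed: int, population: int,
                               prior_deaths: list[int],
                               prior_populations: list[int]) -> float:
    """Same idea, but the baseline is an average death *rate* applied
    to the current population, so population growth alone does not
    register as excess deaths."""
    rates = [d / p for d, p in zip(prior_deaths, prior_populations)]
    expected = mean(rates) * population
    return observed - expected


if __name__ == "__main__":
    # Hypothetical region: constant 1% death rate, growing population.
    prior_deaths = [1000, 1100]
    prior_populations = [100_000, 110_000]
    print(naive_excess(1200, prior_deaths))
    print(population_adjusted_excess(1200, 120_000,
                                     prior_deaths, prior_populations))
```

With these made-up inputs the naive method reports excess deaths even though the underlying death rate never changed, which is the kind of distortion a proper baseline is meant to remove.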
Excess deaths is a perfectly valid statistic to base policy off... if policy makers maintained a hands-off approach and didn't radically change society throughout the pandemic.
Can you in good conscience say that the typical rate of death remained steady while the following happened:
* a majority of given populations remained at home (lock downs - meaning no travelling to work in multi-tonne death traps)
* practised increased hygiene protocols (masks, more frequent hand washing)
* did not visit elderly homes (at least, less than usual)
* many people were reluctant to get timely health care (due to fear of catching COVID from medical facilities)
* ate worse and exercised less
* infected elderly patients were sent back to their nursing homes (to typically infect the entire facility)
There are so many confounding factors that on the face of it, should result in a radically different death profile... and almost every country faced the above to different extents.
Anyone claiming to be able to work out the actual number of people who died from COVID-19 from excess deaths is being disingenuous at best.
Yes, absolutely.
Within my own org of ~50 people, 15% have resigned/contracts ending during Q1 (after 15% in Q4).
Of the remaining 85%.. 20% have been around since before COVID / 65% joined during COVID.
Of our senior engineers & team leads, 70% have joined in last 6-9 months.
Only 3 full time senior engineers with 2 years or more tenure.
We've grown during COVID but we've also just burned through people.
Turnover has hit the point where we stopped doing going away zoom toasts.. people just sort of disappeared.
Newest member of my team has been in the company for 6 years and on my team for 4. I was in the local pub the other day and there was a retirement do for someone who had been here for 35 years, which certainly isn't exceptional (40 years is more noteworthy, and I've known a couple of people who made it to 50 years)
If we can have meeting after meeting for working groups, agile kayfabe, status reports, etc for hours recurring weekly.. we can spend 15min on the phone saying thanks, good luck, and see you again a handful of times per year when a teammate leaves.
I think this is a transient issue. When you're in growth mode you make a huge series of hacks to just keep things running and then when you leave.... well, it's a problem. But if the business is robust, and lives beyond you, what replaces your work is better documented, better tested, and maintainable.
That's the dream. Obviously there are companies that sink between v1 and v2, but that's life.
Fundamentally I think the cloud business is robust, it's a fundamentally reasonable way of organising things (for enough people), which is why it attracts customers despite being arguably more expensive.
I've been in this situation at much smaller scales, and yes, you'll see a massive drop in productivity but that's the cost of going from prototype to product.
No, what I'm saying is that AWS is transitioning out of early stage growth, and so they're seeing all the issues you see where the original people who hacked stuff together are moving on and you have to start really focusing on making things stable, but the beginning of that process is always everything getting unreliable.