I am sorry but as an application developer, I think this is all wrong. I'll thank my infra team today for not being assholes like this guy.
1. Application developers are your users. If we application developers took offense every time a user tells us that things are not working, we'd be pretty pissed off all the time. Educating and empathizing with your users is part of your job.
2. Talking about how it was better before: QA teams sure do buffer a lot of crap. They also cost a bunch and slow down time to release. Yes agile is causing problems. The bureaucracy and stiffness of organizations before agile was no nirvana either.
3. By your own affirmation you treat applications as black boxes that should be deployed using a runbook that should just work. This is ridiculous. Application's ownership is shared between everyone who works on it.
4. And yes, as developers, networking or physical drive space are things that we tend to abstract away. Maybe if the infrastructure people were involved in development discussion earlier, they'd be able to raise their hands and say: wait a minute, you're going to blow up our logs.
This all feels like someone who used to not do anything suddenly being asked to take part in what's happening...
EDIT: apologies for the strong language and sounding like an asshole myself, but I certainly feel irritated when someone takes the time to write a 5000-word article complaining about whiny developers who thought they could own ops but actually don't know anything and scream for help when they themselves are the cause of all evil.
I find this response surprising, as I fully agreed with TFA.
I've had an Ops team that had a similar attitude, and they did a lot to help me become a good developer. Part of that was requiring that I come to them with identified problems. "Hey I'm getting this error, can you take a look at a stack trace in a language you've never used and tell me what's wrong?" would have gotten me booed/laughed out of the office, and for good reason.
It's not at all unreasonable to expect the developer to come around instead with "hey my application can't write to this NFS mount like I expected. It's running as $user, the permissions look right but I'm still getting permission denied. Any thoughts?" (A real situation I ran into, turns out SELinux had further permissions I was unaware of, and my Ops lead Chip was happy to show me what was what.)
Yeah, we're all on the same team, and that cuts both ways -- Ops should ensure Dev has what it needs, and Dev should make some actual effort to understand the landscape their production applications run in. Which seemed to me to be the entire point of TFA.
I work in healthcare IT and many times have heard an “I’m just a nurse” dodge, and I respond in just this fashion — “Good! Then you have just the skills needed. In fact you don’t even have to diagnose the problem, just document what you did, what happened, and how that differs from what you expected.” It works pretty well.
The author of the article had a great point. Developers often DO treat me as an IT person. The number of times I've taught a developer what an SSH key is or how to install a python virtualenv on the dev box blows my mind.
On the other hand, everything else he says is wrong. Devops isn't "ops gets out of the way and lets developers do crap and nobody owns it when it breaks", it's... dev and ops WORKING TOGETHER. If you don't know what their app is doing, YOU ARE NOT DOING YOUR JOB. (On the same note, if they don't know anything about how the system is deployed or the like, they're also not doing their job).
The rest of the article is literally a list of complaints about exactly that.
The point that you can't monitor an app you don't understand, and developers don't want to push out crappy code, is the whole reasoning behind devops. The people no longer being in their own silo, but instead working together as a more holistic team to own the thing from end to end. If you stay in your silo but add automation, you are 100% going to have pain.
There is another pattern they could try: the platform model. In that case ops can stay in their little silo and present a platform with APIs that developers can build on. It's what you're talking about here, and that can ALSO work. But it's a different model. The old style of ops would do as they were told, while trying to restrict anyone from changing anything. As a platform team, now they're delivering a product. They should be talking to users, and judging throughput, and iterating quickly, and being customer focused... basically, acting exactly as the developers are supposed to be. I am a strong believer that the platform team should have product owners and customer metrics the same as the developers - heck, if they like QA, they can come up with a QA process.
But yeah, I've seen a lot of low effort finger pointing in orgs that pretend to do devops from inside their functional silos. That point, the one in the title, is a great one.
> Part of that was requiring that I come to them with identified problems. "Hey I'm getting this error, can you take a look at a stack trace in a language you've never used and tell me what's wrong?" would have gotten me booed/laughed out of the office, and for good reason.
This, a thousand times over... If you can train your users to do this, any customer relationship will be better off!
You, sir or madam, are doing a good job. I like working with people like you. I want to help, but some things just don't fall into my wheelhouse; when they do, though, we're on it. This is how teamwork should be defined.
I think if your take away is that the author is an asshole, you might want to reflect on specifically why you feel that way. In my experience as a sysadmin, in a large company that's been trying to become a user of "cool" IT in the last decade, the article is spot on.
I think for point 1, he's trying to say that application developers aren't doing their role as both dev and QA. I've witnessed the same issue where a DBA had trouble installing Maxscale on two identical servers. He was convinced that there must be something different between the two servers despite them being created from the same template, and only differing in IP/hostname. He had done no research, opened no tickets with the vendor, but instead wasted 30 minutes of my time arguing that it's not his fault. And this is common with many of the developers I've worked with in the last decade.
For #3, I don't own the application you develop. We provide you with a platform that YOUR application runs on, based on requirements you provide. If you don't do an adequate job of providing accurate requirements, that's on you, not my team.
And #4, developers don't abstract all those things away, they often fundamentally don't understand how they work at all, so they ignore them. This ignorance has damning consequences when they make blind assumptions about how things work.
Talking in terms of mine and yours means we are not on the same side. And this is the problem.
If there is a problem with the deploy let's meet, fix the issue and most importantly learn from the problem, and document the incident for future reference.
Just because I have my responsibilities and you have yours, it doesn't mean we're not on the same side.
I've come to dread cute management phrases like "everyone should pull on the rope". I agree with the sentiment, but software development is not as simple as pulling on a rope. There are lots of moving parts and lots of things to specialize in. And I say this as a generalist dev, not as an ops engineer.
I agree with TFA completely. I was interviewing for a job recently, and one of the questions I would ask when the interviewer signaled it was time for me to ask questions was "how do you handle QA?" On some occasions, this got me weird looks, because "QA" seems to be an antiquated concept.
In a similar vein, my stint at Amazon taught me that one of the questions to ask my interviewers is to tell me about their on-call rotation. Is there any? How often are you on call and for how long? Who gets paged first?
Yeah, we're all on the same side, but there needs to be some structure and order. Otherwise, you end up with something like this:
"Twenty-seven people were got out of bed in quick succession and they got another fifty-three out of bed, because if there is one thing a man wants to know when he's woken up in a panic at 4:00 A.M., it's that he's not alone."
-- from "Good Omens", by Sir Terry Pratchett and Neil Gaiman
> If there is a problem with the deploy let's meet, fix the issue and most importantly learn from the problem, and document the incident for future reference.
Not being on the same side doesn't have to be the problem; it can be the solution, if you divide the work so that each side treats things differently while still working together.
Operations takes care of restoring service, the faster the better.
Developers need, or normally want, to tackle the issue differently; most likely, after operations reports back, they can decide whether the workaround is acceptable or whether development needs to take it further.
This normally works very well, because operations does not have the time to maintain the application but the developers do.
I used "mine/yours" to denote where the responsibility lies. In a small org you can have the entire IT team troubleshoot an issue. In a large org, that's unfeasible.
I'm willing to help troubleshoot and provide guidance based on my experience, assuming the application developer has performed their due diligence. I have no insight into what their application is expected to do, or its failure modes. I have no input into the coding methods, the test harnesses, the deployment process. But when that shit breaks because the dev doesn't understand the difference between `rm -rf ./*` and `rm -rf /` that's his problem.
Now of course this is an org problem, not a team problem. As in parenting, setting boundaries and responsibilities is the key to success. Too many leaders in IT simply think that "DevOps" will be cheaper and faster and leave it at that.
I currently manage an infra ops team. I was a developer for about 10 years.
I agree with point 1, nearly completely. A lot of developers could take a lot more responsibility for understanding the environments their applications operate in, but I get it.
Point 3, at least in my shop, you're just wrong. I don't know anything about what you're writing. I probably don't even know what problem it is supposed to solve. You are mistaking the highway road crew for mechanics.
Point 4, in my shop, we provide a lot of documentation and guidelines for this sort of thing. Developers are responsible for knowing if their stuff is going to fall outside of those, and come to us to work something out. Again with the road metaphor, if you drive a semi into a single car garage, you're the idiot, not the person who built the garage.
On some of this, I'm taking a hard line. I do, in fact, end up doing a lot of troubleshooting with developers. But most of my team does not write code. If you want more senior ops folks who also have a coding background, come on over! There aren't that many of us who are any good, and I would love to hire more.
> Point 3, at least in my shop, you're just wrong. I don't know anything about what you're writing. I probably don't even know what problem it is supposed to solve. You are mistaking the highway road crew for mechanics.
I don’t follow this. Developers are responsible for learning what kind of environment their application runs in, but ops is not responsible for having some clue about what they’re running? That cuts both ways, and it’ll help everyone out.
> Developers are responsible for knowing if their stuff is going to fall outside of those, and come to us to work something out.
I find this attitude fairly common amongst ops people. They just build something that is totally inappropriate for actual usage, and then dump the responsibility for figuring that out on the developers.
> Developers are responsible for learning what kind of environment their application runs in, but ops is not responsible for having some clue about what they’re running?
I don't think it's as cut-and-dried as your question frames it, but I do think there are fundamental differences between the two positions that justify some of the tension there.
The problem is the difference between domain knowledge and general systems knowledge. The former varies wildly from org to org, team to team or even within individual teams. The latter is more consistent across wider applications and over longer timeframes.
Developers usually need a lot of domain knowledge to do their job, which can leave less space for systems stuff. But the systems stuff they do learn tends to be more widely applicable.
Ops folk often service many teams where the domain knowledge differs between them. The best of them might be able to internalise all of those differences but it's a big ask. And there's rarely any crossover.
This difference is also why developers tend to have a slower ramp-up time than ops engineers do on joining a new team. It's just the nature of the work.
I say all this as someone from the developer side of the fence. I'm fortunate to have some years in the bank now that the systems stuff comes more easily. The domain stuff remains really hard.
Developers have more responsibility than ops for knowing their apps, for the simple reason that each developer owns a small number of apps, but ops owns the infrastructure for all the apps.
Why has your organization built a one-size-fits-all ops organization if it doesn't have a one-size-fits-all dev organization? Sounds like a failure of ops organization to recognize that the needs of the email hosting guys are different from the website team or the billing team. Maybe you should build a set of smaller, more focused ops teams focused on meeting the needs of those different groups?
Smaller, more focused ops teams already exist, but they are bound not by application boundaries but by system boundaries (mostly storage, compute, and networking). The reason is that each of these is a completely different environment in its own right.
I completely agree. Far too many devs are clueless about how their apps perform or interact with the ecosystem. That tunnel vision has a LOT of negative consequences on infrastructure.
Because the ops org doesn't concentrate on just the one application. They have broad knowledge of the entire stack and therefore don't have as deep an understanding of any single piece.
The dev org also doesn't concentrate on just one application. I've not seen this situation where every Ops personnel is assigned to the entire stack. Each Ops employee or team in a larger organization is generally responsible for a subset of the environments.
> Point 4, in my shop, we provide a lot of documentation and guidelines for this sort of thing. Developers are responsible for knowing if their stuff is going to fall outside of those, and come to us to work something out. Again with the road metaphor, if you drive a semi into a single car garage, you're the idiot, not the person who built the garage.
With the road metaphor, one issue I've seen is ops will create a rope bridge and get mad when devs need to drive a car over it. "You shouldn't do that! You idiot! Just walk over the bridge like we expect!"
Example: We have about 500 different applications in our company and the ops team maintains a single rabbit cluster for all apps (and everyone is supposed to use that one cluster). If an app gets too chatty on that cluster "Oh you idiot, why are you so chatty! You just sunk the organization!" Which, in turn, discourages the usage of rabbit (maybe that's the intention?)
> But most of my team does not write code.
I actually prefer this ( :D ), our ops team was a bunch of converted devs that decided the best way to do things was making a giant ops framework for all devs to follow. That ended up costing WAY more money than if they'd just used tools that were available. They fetishized trying to make everything "just one line!" which ended up breaking anytime you had a slightly different need (trying to take control right up to managing how version bumps happen).
Overly trying to force a single method of implementation has a lot of negative consequences. I prefer instead to have guidebooks and examples with the freedom to be an idiot and walk off the beaten path when needed.
> With the road metaphor, one issue I've seen is ops will create a rope bridge and get mad when devs need to drive a car over it. "You shouldn't do that! You idiot! Just walk over the bridge like we expect!"
Well, the main problem with the "bridge mismatch" is usually that resources required for an environment are not free. It's quite the opposite: most infrastructure is rather expensive, and running multiple systems side by side because multiple developers require slightly different versions of the same thing tends to explode cost.
> You are mistaking the highway road crew for mechanics
The highway road crew know what a car is, though, right? They know that the road needs to be clear and flat and drained of water, and the markings need to be clear, so that cars can drive on it.
When the devs come to you complaining about flat tires, you can't turn round and say 'this is a mechanic issue, I don't know how tires are meant to work. They go on the bottom, right?' - you're meant to help check for rusty nails or bits of metal in the road that are causing all these flats.
'Oh, I didn't realize that was something that could cause trouble for cars'
Well then you're a pretty crappy highway maintenance guy.
Yeah, just don't expect any sort of understanding for jammed doors, a bad clutch, or an overheated motor, which is the kind of complaint the OP was referring to.
> Point 3, at least in my shop, you're just wrong. I don't know anything about what you're writing. I probably don't even know what problem it is supposed to solve. You are mistaking the highway road crew for mechanics.
How? Honest question. If you know nothing of the application, how are you able to offer any input into the infrastructure it runs on?
Because an ops team will have between dozens and hundreds of apps to support. You do a survey of needs and build out something that gets to the most common use cases.
You try to respond to what people need and add things when there is enough demand. But I can't know what your business goals are, what your uptime metrics are, or who your users are.
At some point, your app becomes a black box that takes in requests, accesses DB/storage, and emits logs/metrics. I just don't have the brain space to be intimately familiar with each service.
"I can't know what your business goals are, what your uptime metrics are, or who your users are"
Do you... work for the same company? Draw your paycheck from the same revenue stream?
If you're just a black box provider of undifferentiated compute/storage why the heck are we paying you? We can buy that from a dozen cloud PaaS providers.
We have 60+ projects out there using four major languages and god knows how many frameworks supported by a few dozen developers, and... two ops staff.
I do my best, but I'm never going to be intimately familiar with your product on a technical or business level in the way that you are when you spend 20-40 hours a week on it. If you want that level of service, you're gonna need another couple million a year in ops staffing budget.
You're paying ops because someone needs to know how to fit all the lego AWS gives you together, understand what's inside the lego pieces to debug issues when things go wrong, be accountable for ensuring best practices are implemented as far as security, backups, etc, optimize spend, and figure out how to architect all this stuff to make sense on AWS.
We could get some of that with some of the more managed services like Heroku, but at our scale the premium we'd pay is waaaaaay more than two ops salaries.
That only works for a small company. Once you start having multiple products then I can't reasonably be intimately familiar with your particular service. If you want a white glove level of support then you're going to need to grow the operations teams substantially.
If the author is an asshole, you certainly also are one by the same standard.
Developers are not just "users", they're fellow software professionals who can reasonably be expected to work harder on troubleshooting than reporting "it works on my machine but not in the test environment :(" without even reading the error message or including it in the report.
As a general rule, when you have most of the control or knowledge of a technical process and you want someone else to help you with it, you need to give that other person as much transparency and info as possible. Because they don't control the process and will have to slowly, laboriously ask you questions, or ask you to do things, rather than just probing the system themselves.
They're taking time out of their day to work in a relatively inefficient and frustrating mode just to help you out, so jeez, have some respect and try to make their jobs a little easier.
If you don't and prefer to wear this entitled attitude, fine, but you're just as much an asshole as he is.
my favorite response ever to "it works fine on my laptop/dev machine" is "let's connect the prod load balancer to your workstation and get you a pager, problem solved!"
I believe your assessment of the arrangement is flawed. Application developers are not our users. You're our tenants. We provide highly available housing for your projects. We keep the lights on, we keep the walls standing, and we make sure the roof doesn't leak. We also provide APIs for you to interact with. When those things fail, we are responsible. When your code doesn't run in the test environment where everyone else's does, that's not our job. I'll help you, but at my convenience, because I have other things to do. If your app fails in the middle of the night, that's your responsibility. If it's an infra problem, then it's on me. We don't ask you to tune the network or balance the cluster or ask you why the daemon sets are failing, right? If this was a shared responsibility, you'd be helping with the core too, but I can almost guarantee that's not happening. (Some of my eng peers do, but the vast majority think of it as a black box.)
No. Equating internal teams with paying customers is the very attitude that is causing these problems. Encouraging teams to think about their "internal customers" leads those customers to become entitled. We work together in the same company, our relationship is not the same as with actual external paying customers. I can't tell a paying customer that they're being unreasonable or lazy or unrealistic. We absolutely should be able to have that conversation with other internal teams when appropriate.
The post is describing the situation that has evolved as a result of QA being phased out. Telling Ops to suck up that extra work because "Dev are your users" is exactly why the post was written.
At most large companies things are organized in such a way that internal teams are your "paying users". Some internal teams at some companies even say "If you want X feature and Y support you need to request $$$ funding and N people for our team".
On point 3:
> I likely don't even know what "working" means in the context of your app.
I think both sides can do more to reach into the domain of the other. I get it: we don't want to deal with blinking lights and they don't want to deal with a missing semicolon breaking everything.
Honestly I think "that's not my problem" is one of the worst attitudes you can have as part of an organization with common goals.
Not being able to say "That's not my job." just leads to dysfunctional organizations, burnout, and resourcing issues.
Different job roles exist for a reason. In the case of developers and ops, developers develop, ops manages ... well, usually literally everything else, and there should be _some_ overlap in the middle.
If you can't say "no, I can't do your job for you", then that area of shared responsibility just gets wider and wider and, if one side _is_ doing that (as developers usually won't step into the ops end too deep), it shifts further and further to one side. In the case of a typical organization where you have maybe an ops person per dozen or two dozen developers, that person very quickly becomes a bottleneck. That person gets burned out. You need to hire a bunch of expensive ops people to do work cheaper developers could be doing.
Literally watched the lack of role definition and a bunch of ops people that, by virtue of almost everything being their job, won't say "not my job" do this at a company I'm leaving. A couple months back I was literally writing code in one of our apps because the team that owned the project "didn't know elasticsearch and didn't have time to figure it out".
It sounds like someone who is frustrated because the process or culture in their organization has led to point 3. I tend to involve ops before I write a single line of code and definitely before deploying to a stage environment. Over the course of a project, they help me write the runbook and create dashboards and alerts. After all, we are all on the hook when things go sideways at 3AM. I want them to know as much as possible about how things work.
My previous company had a HUGE problem with Devs cowboying off and doing whatever and dumping it on the Ops team at the last minute.
One of the biggest (but for damn sure not the last) issues was a dev who designed and built an entire new product around a MongoDB database, which wasn't something we had in production, and something he didn't mention during the months of development and demos to stakeholders. Week before the launch date he hits up our Ops folks to get production set up.
Ops was calm and collected about the whole thing. "We don't have MongoDB in production. Are you volunteering to learn how to correctly install it, write monitors for alerting, be paged with issues, figure out backups and how to ensure our data stays safe, secure, and available? You're not? Then get the [redacted] out and rewrite your app. Yes it will affect the ship date, and yes it's your fault."
I'd love to say we used that opportunity to shore up our processes involving kicking off new applications and including Ops folks in from day one, but that took years more.
Something similar happened at my company like 5 years ago.
A developer was tasked with adding a major new feature to one of our older monoliths. He added MongoDB as a dependency. The application already had a well managed Oracle database. Nothing about the feature required MongoDB.
When it came time to go to production, the DBA and ops teams responded similarly to how you did. I wish I could say sanity prevailed, but the business mumbled something about contractually obligated release dates and forced it through to production. Pretty sure it is still there rotting away.
I've worked mostly on the app side of things and this sort of thing just makes me shake my head.
Well, at the end of the day you managed to ship it? Did it cause any big problems down the line? It seems the biggest problem is that it is rotting away somewhere, which to me means that it is working without needing much care.
If they had listened to your DBA/ops guys, no value would be getting shipped ;)
I don't know of any big problems other than the unnecessary cost. I agree meeting the needs of the company is king, but it was just a lot of unnecessary complexity because a dev wanted to put MongoDB on their resume. Could have been avoided by talking to the rest of the team early on. Of course, they would not have liked the answer of just creating a new table in boring old Oracle.
> I agree meeting the needs of the company is king, but it was just a lot of unnecessary complexity because a dev wanted to put MongoDB on their resume.
Counterpoint: the dev is doing this to remain employable, so that they can ensure higher success in the future for themselves.
Their goals simply don't align with those of ops and are at best parallel with those of the company as a whole - of course it's to be expected that they'll attempt to prioritize their own when there's a lack of governance and oversight within the company.
It's something that I've noticed more and more, yet it's something that no one really talks about - people wanting to use bleeding edge technologies just because they're at the top of their hype curve: wanting to implement microservices when they're just maintaining monoliths and there's no need for them.
Personally, I'm an advocate of both microservices (or at the very least modular monoliths), containers and many of the new technologies, with the exception that I've initially tried all of those out in personal projects in the evenings and weekends. Yet what is the person who doesn't code outside of work supposed to do to remain employable? Would you expect a doctor to practice new types of surgery in their own time? Actually, why don't companies fund a week every few months for their developers to upskill themselves? Just a bit of time that's treated like a vacation, but during which they're expected to hack together prototypes etc.? Clearly most companies out there don't do greenfield or pilot projects, so something like this could help.
I don't think I have any good answers for this, but it definitely deserves more consideration!
> Ops was calm and collected about the whole thing. "We don't have MongoDB in production. Are you volunteering to learn how to correctly install it, write monitors for alerting, be paged with issues, figure out backups and how to ensure our data stays safe, secure, and available? You're not? Then get the [redacted] out and rewrite your app. Yes it will affect the ship date, and yes it's your fault."
> So, you could have delayed the app by the same amount
> but now have a mongo environment for production as well?
No, we couldn't have. Not just because we didn't want MongoDB, which at the time was notorious for data loss, but because our ops team didn't have the capacity at that point in their schedule or team size to handle it. Maybe had we discussed at the beginning of the project plans could have been made or altered, but we didn't and so they couldn't.
> Seems a bit of a waste to rewrite the app instead.
The responsible dev took the time necessary to rewrite the data layer to better reflect the needs of the application.
Is what I wish had happened. Instead the developer jammed the huge JSON blobs into a column on an MSSQL table and changed a few lines. lolsob.
> Instead the developer jammed the huge JSON blobs into a column on an MSSQL table and changed a few lines.
Sounds like the quickest way to deliver value to the customer. As described, it was far too late in the process to worry about deploying with a clean, extensible architecture.
A reasonable amount of technical debt in order to ship in the timeframe available.
Depends on your definition of "reasonable". You now can't leverage your DBA's skills to optimize queries, because you're using the RDBMS as a key/value store.
You're misusing a tool because you didn't do the correct application design in the first place.
NoSQL has its place, mostly in the trash. Lazy key/value stores (which is all that NoSQL is) throw away all the benefits of relational logic for a glorified combination of a file system and grep.
That's not "delivering value to a customer", that's delivering crap.
Standard "Agile" response. It was only "far too late in the process" due to a complete lack of process, oversight, product quality ownership and capabilities.
If nothing else, that developer should be "counselled" as should the PO, the Scrum Master and anyone else involved that allowed the situation to occur.
And the ongoing capex and opex for the additional unbudgeted support should be pushed back on the PO as a requirement to fix.
Except that shipping something with semi-broken infrastructure leads to losses down the line.
What if your MongoDB database drops its data and now you have production impact? Are those losses calculated while making these decisions during development?
Pet hate: they're not operations' logs, they're developer logs. Developers write the code to create log messages on the principle "more is better". Logs are another example of the systemic hoarding problem with people and computers.
They're a ratchet pattern: adding more is easy, but once they exist it's very difficult to find someone with the authority to authorise removing them, the willingness to stick their neck out and declare that they aren't required, and the willingness to spend time on low-importance maintenance. As a consequence, logs build up until something gives and they become a high-importance, urgent failure. The middle bit, where they "aren't important" but still waste storage space, networking bandwidth, processing power (and money), and, when there is something to debug, waste people's time because the important details are needle-in-haystack among tons of low-value filler, all gets ignored.
At the limit, it isn't sustainable to print the complete internal state of a system at every clock cycle. It "should" be possible to do a lot better on the troubleshooting-power-to-log-weight ratio than "print every state change which feels important at the time in whatever semi-English message format is convenient", shouldn't it?
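As a rough sketch of what "better" could look like (Python's stdlib logging here; the logger name, fields, and LOG_LEVEL convention are just illustrative), levels plus a little structure mean the debug firehose can be switched off in production without a code change, and the lines that survive carry the identifiers someone actually needs at 3 A.M.:

    import logging
    import os

    # Level and format come from the environment, so ops can turn the
    # firehose on or off per deployment without touching the code.
    logging.basicConfig(
        level=os.environ.get("LOG_LEVEL", "WARNING"),
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
    )
    log = logging.getLogger("payments.worker")

    def charge(order_id, amount_cents):
        # DEBUG: internal state, dropped entirely unless someone opts in
        log.debug("charge attempt order_id=%s amount_cents=%d", order_id, amount_cents)
        try:
            ...  # call out to the payment provider here
        except TimeoutError:
            # ERROR: the needle, not the haystack - one line, with the
            # identifiers needed to act on it
            log.error("charge failed order_id=%s reason=timeout", order_id)
            raise

It doesn't fix hoarding by itself, but at least keeping a message around becomes a deliberate choice rather than the default.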
> And yes, as developers, networking or physical drive space are things that we tend to abstract away.
Why is it Ops job to guess at your application requirements? You have the best understanding of what setting LOG_LEVEL=DEBUG is going to do to disk requirements.
In theory, but pragmatically it's irresponsible to assume without running in staging or on a subset of production resources to monitor and see what actually happens.
QA teams sure do buffer a lot of crap. They also cost a bunch and slow down time to release.
If your QA team is slowing down releases, then that is the developers' fault, not the QA team's. Frankly, this "move fast, don't do proper QA" approach is irresponsible and a danger to users.
They add latency, there's no way around it. Even if there are no software problems and their verification is instantaneous, QA by itself adds an extra hand-off to a team with an independent task queue.
If your QA team is a "thing" that gets features at the end of a sprint and churns out bugs or releases you're doing it wrong. They should be involved on a feature by feature basis working alongside the developer with QA time incorporated into every task. All unit/integration/system tests should be automated during the cycle so there is no "hand-off to QA". There should be less latency because you have a test expert speeding up implementing tests or being a force multiplier to developers by acting as an internal consultant who can advise on bits where needed.
I guess it depends how you count a release. I think these fast moving teams spend more time in production debugging than the QA team adds. Shipping it should not be the final determination of release time.
I wish more companies valued QA teams, then maybe I wouldn't get so many notices of security breaches and need to keep checks on my credit.
Some of the bonehead stuff will be caught by QA, but there are folks on some QA teams that get security. Sadly, developers talk down about QA so much that the people we need on QA teams are not going to go there.
We have a fantastic QA team, and they test everything that goes to prod. Definitely slows things down (by about 1/3) but our user experience is significantly improved because of it. IMO a good QA/test team is critical to delivering an excellent user experience.
I totally agree with TFA; except it was ever thus. (And agile has helped reduce the problem if anything)
As an ops person I've had to explain the devs' own architecture to them; they didn't know how it sent mail -- nothing to do with SMTP; they just hadn't shared the knowledge among themselves of the db/java app interaction.
I once had a developer tell me ridiculous things like "my java app can't write to java.tmpdir". They couldn't even tell me what file they were trying to write. I had to dive into the Apache docs and send them the relevant part. It turned out to be a bug in Apache project code, nothing to do with tmpdir writeability.
The lack of basic responsibility and ownership was appalling.
Ad hominem is where you aim to refute the argument someone is making by impugning the character of the person making it. Obviously, it doesn't follow that because someone is a bad person, that what they say is wrong.
"Don't believe this guy, he's an asshole" would be an ad hominem argument.
But this is not an ad hominem argument being made here. They are, instead, making the logical claim that, based on the attitudes described, the person comes across as an asshole.
They aren't then saying that that invalidates their arguments - they are taking the author's arguments at their face, and inferring the person's character from them.
Which seems to be a mode of argument you're comfortable with, since you've just done the same to the person you replied to.
Agreed. Good application code often contains edge case handling, build time checks, unit tests and defensive flows that handle the unexpected so that users don't wake you up at night. Why can Ops not do the same? Why can Dockerfiles / Orchestrators / CI / playbooks not also implement sanity checks on deployments?
"Ooops... deployment failed. While deploying your artifact we found the following:
- Nothing is listening on the nominated port
- Your deployment is utilizing 100% CPU while idling
- We detected an abnormal volume of write operations to the mount
Please fix these issues and re-trigger the pipeline at your earliest convenience."
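The first of those checks is cheap to script, too. A minimal sketch in Python (the host and port are placeholders for whatever the service's manifest declares) that a pipeline could run right after the deploy step, failing the job if nothing answers:

    import socket
    import sys

    HOST = "app.staging.internal"   # hypothetical deploy target
    PORT = 8080                     # the port the service claims to listen on

    def port_is_listening(host, port, timeout=5.0):
        # True if a TCP connection to host:port succeeds within the timeout
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    if not port_is_listening(HOST, PORT):
        print(f"deploy check failed: nothing is listening on {HOST}:{PORT}")
        sys.exit(1)   # non-zero exit fails the pipeline stage
    print(f"deploy check passed: {HOST}:{PORT} is accepting connections")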
Now that just shouldn't happen... i.e., we (ops) aren't going to deploy something that doesn't come with healthcheck(s). The healthcheck never passing (port isn't listening) is going to stop the deployment from ever completing. Ops' job is to push back on developers if they try to hand us something like this to build a pipeline for. In my company, to hand Ops the name of a repo and say "build a pipeline"... there are a lot of requirements, and the biggest one is a list of SLAs. That list of SLAs is how we build monitoring for your application, and one of those should always be a list of port(s) and protocol(s) that are exposed; we build monitors against those.
It varies with the company you are at... but building pipelines is just one of the ops tasks, typically. I.e., in your gitlab-ci.yaml example, ops would have given you the template (assuming you were going to be the one committing it) that your pipeline had to follow - so it was uniform with the thousands of other pipelines in the company. It's unusual at most of the shops I've been at for devs to ever build their own deployment pipelines. That might be OK at a smaller shop, but once you have 100+ deployments of any type, the devs lack the ability to keep it all uniform at scale.
A better way to put it... most of my colleagues in Ops roles already did development for 10-15 years, and moved on to developing the tools to deploy other people's products. Additionally, kubernetes isn't everywhere - they also build pipelines that produce AMIs and GCP images, and write the terraform/cloudformation/HEAT/etc. to deploy those things. If you wonder "who automated blue/green deployments?", that's your ops team.
Also, in your example of "human kubernetes" - ops builds those clusters, and monitors those node pools. If you have fewer than a dozen clusters, or fewer than 100 nodes among the pools - you might not even have an ops team.
What’s the value of uniform pipelines across thousands of projects? You’ve now got thousands of teams who don’t really understand how their stuff is deployed because someone handed that to them on a platter, and everyone is subjected to the uniform constraints and complexities of the shared solution whether they need to be or not…
What seem like efficiencies can rapidly become barriers.
It’s like any case where two systems share a requirement - you can factor it out into a shared library or you can duplicate the code; in the case where the common requirement is only ‘coincidental’ not ‘instrumental’, you are better off duplicating the code so that the two systems can evolve independently and not take a coupling to a shared dependency.
The same applies to infrastructure. Sure, you’ve got a dozen clusters, and it seems efficient to have one team set up and operate all of them - but are you sure the efficiency of one big team is better than twelve much smaller teams, closer to their dev orgs, who each run one cluster more tightly suited to that org’s needs?
How far down can you push that decentralization?
With smaller and smaller units of cloud compute and storage being available as services, the answer is increasingly ‘all the way to each individual application’.
I am sorry but as someone who has been on both sides of this, I think this is all wrong. And I thank god both my app developers and operations people aren't assholes like you.
Hey, that's a pretty shitty way to start off a comment, don't you think? With a personal attack?
1. Yes. But it's not operations' problem if you are whining that your PS5 game isn't running on the Xbox. There is personal responsibility in this, too, and it's not operations' job to hold your hand and explain how to do your job. If you aren't reaching out to operations to make requests, they aren't going to know what to do. Your entire comment shows that you think they are subservient to you, rather than you actually being an honest user. Tell them what you want, and work with them to get it.
2. QA teams do not slow down time to better quality releases. They do slow down time to half-baked or buggy releases. Regardless, the ratio of app developers to operations people is generally very imbalanced. I promise you, the good ones are working with the people that reach out to them.
3. Maybe if you invited the operations people earlier, they'd have some ownership in the product. But usually they release it without operations even knowing, and suddenly there is something in production that is half-working. They had no hand in it. They literally did not work on the project, so they can't know.
4. You can't abstract away things if you don't know how they work or account for them. Again, inviting operations people to earlier discussions is incredibly easy. You know what projects you are working on; they tend not to, because there are far fewer of them than there are application developers. So, it's on you to reach out to them to get input. Yes, they have to make themselves available, but you have to invite them. And guess what? When you do that, you get a wealth of information and it makes the product better.
Your comment feels like someone who is used to expecting perfection from others while accepting their own mediocrity.
Wow... ending a comment with an insult is rather shitty, too. Why did you decide to go the route of writing a comment that starts off shitty and ends up that way?
"Maybe if the infrastructure people were involved in development discussion earlier, they'd be able to raise their hands and say: wait a minute, you're going to blow up our logs."
"Maybe if you invited the operations people earlier, they'd have some ownership in the product."
Awww... You like each other but none of you dare to make the first move :D
The whole thread reads like a bunch of guys shouting at each other "but but ... I know better!".
IMO this is main topic of the thread and of the article.
There are groups of people who instead of spending time to figure out how to work together and understand what other side has to say, they just throw shit over the fence.
Maybe some could start by reading the points at least a couple of times and trying to understand them, instead of rushing to write up their personal experiences as fast as they can in reply to the comment that hurt their ego.
The gap is a leadership problem; someone is accountable for all the "shipping". It's created by divide-and-conquer strategies (more "abstraction" at a higher level).
Maybe it works if you assume all employees want to do the bare minimum and avoid blame for what was not delivered.
What I see most of the time is that people want to deliver, people want to be valued by their work.
Of course I am as cynical as the next guy in terms of "getting on a high horse", but there are a lot of people who want to do their job and want to do it well.
Playing divide and conquer, or playing up non-existent scare deadlines, is going to work once or twice, and any smart employee will leave after that kind of crap. The other option is that you end up with smart employees who cannot afford to leave, but because of that crap they will just stop giving a fuck.
I really do believe in the "self-fulfilling prophecy employee": when you treat your employees or other people with the expectation that they are thieves, in the end they will steal from you.
If you treat your employees as if they suck - they will suck.
Of course there are bad apples, but if one goes down the road of assuming everyone wants to rob him, he will get robbed.
So instead of devs inviting ops people to the meeting they want ops people at, you are saying ops people should go up to devs while they are working and offer to help, even when they aren't needed?
> Awww... You like each other but none of you dare to make the first move :D
Only if you ignore reality. If you hold a meeting and don't tell me about it, how can I attend? Both sides want the ops people involved. Maybe the one having the meeting should invite them.
Cause the whole article reads like "developers are whiny assholes who don't know shit about computers". And yes, it starts with an attack and ends with an attack too.
1. It's not operations' problem for sure, but I certainly don't bash people for not knowing things I am the expert on.
2. Fine
3. The OP's saying he doesn't want to know!
4. Well, writing applications means sitting atop a stack of increasingly abstract technologies. A developer not knowing what happens in an IP packet is the same as an infrastructure guy not knowing what happens in a P-N junction.
> A developer not knowing what happens in an IP packet
I don't care if devs understand IP packets, TCP congestion control algorithms, or anything similarly low-level. If they do, that's awesome, but it's not expected. I do expect them to have a basic understanding of expected latencies for intra-DC vs. internet, why running Flask in production isn't a good idea, and if they're really sharp, an inkling of how Kubernetes networking works.
I do. Why should devs not understand the environment in which they are developing? If they don't understand the concepts then they're not developers.
Why do we call these people "software engineers" if they're not engineers? There is endless documentation; there are all the resources needed. Allowing devs not to know what they're doing is a complete failure of anything approaching professionalism.
It is also the opinion of the people who wrote the built-in webserver. If you try to run it in production mode, it emits this warning on startup:
> WARNING: This is a development server. Do not use it in a production deployment.
> Use a production WSGI server instead.
I don't expect junior devs to have a sense for what is production-grade and what is not, but if they try to ship software that explicitly warns against being used in production, you've got a real liability on your hands.
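And the fix is small, which makes shipping the dev server even harder to excuse. A minimal sketch (the module, route, and port here are made up for illustration): the same app object runs under the built-in server locally and under a real WSGI server such as gunicorn in production.

    # app.py
    from flask import Flask

    app = Flask(__name__)

    @app.route("/healthz")
    def healthz():
        # a trivial endpoint ops can point their monitoring at
        return "ok", 200

    if __name__ == "__main__":
        # development only: this is the built-in server that prints the warning above
        app.run(debug=True)

    # In production, hand the same `app` object to a WSGI server instead, e.g.:
    #   gunicorn --workers 4 --bind 0.0.0.0:8000 app:app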
> Cause the whole article reads like "developers are whiny assholes who don't know shit about computers". And yes, it starts with an attack and ends with an attack too.
There is no attack in the text. There is a complaint that issues presented to operations often lack the basic level of detail and due diligence that they should have. You are free to disagree with the author's expected level of due diligence on issues; I think you'd be wrong to, but you can. However, it isn't an attack.
You perceive a non-attack as an attack, and respond with an explicit attack and name-calling. That actually makes you the aggressor.
It read more like "these developers are asking poorly formed, difficult to answer questions", and frankly reminded me of a LOT of r/CodingHelp problems I've seen lately. Aside from that, the author seems to repeatedly have empathy and admiration for developers but thinks that there is a systemic dysfunction. There is definitely a little "old man shouts at clouds" too, but at least to me this article read as a legitimate discussion of some pain points, certainly not a hit piece.
> 3. By your own affirmation you treat applications as black boxes that should be deployed using a runbook that should just work. This is ridiculous. Application's ownership is shared between everyone who works on it.
Maybe you're imagining the wrong scale of organization, here. An ops department isn't usually "embedded" into a development team (especially when there's more than one development team!); it's essentially a company-internal Platform-as-a-Service provider that the development team deploys their app to. From that PaaS's perspective, the app is an opaque workload. It's not ops' job to fix your app, any more than it's Heroku's job to fix your app.
> Maybe if the infrastructure people were involved in development discussion earlier, they'd be able to raise their hands and say: wait a minute, you're going to blow up our logs.
"Infrastructure engineer" and "operations" are entirely distinct roles. (Maybe not at a startup with five people, but once you get to even 30-or-so, there's a clear delineation.)
Infrastructure engineers are fundamentally software engineers, who happen to know a lot about infrastructure, distributed systems, networking, etc. They know about the operational constraints of software. And as such, they usually get put in charge of release management for the software—i.e. get put in the critical path for changes—because they have an eye for what changes to the software might break the deployment.
Ops people, meanwhile, aren't anything like "in the loop" of your software engineering process. Their day-to-day is spent managing servers and various well-known software systems running thereupon (e.g. Kubernetes, Nginx, RabbitMQ, etc.) They get handed opaque components (those well-known systems, and also your app), glue them together, automate "around" those components using runbooks, document how to get things back into working order when they crash, etc.
In small companies doing "DevOps", there are no real "ops people." There are only infrastructure engineers doing ops.
When a company becomes large enough, there is a transition point where managing the servers and all the standardized stuff running on them gets too distracting to your infra engineers, and they find it hard to help with the app, because they're too busy fighting fires and doing maintenance to the operational infrastructure. At that point, you hire actual "pure" ops people, and build an actual "pure" ops department, to take that load off the infra engineers' plate, so they can get back to their true comparative advantage, of guiding the app in an infrastructure-conformant direction.
But that separation necessarily means that you now have people managing your servers who aren't engineers. They're technicians.
-----
A labored metaphor, for your enjoyment:
Your ops department is like the service center for a motorpool. The people working there are automotive technicians. They are not automotive engineers. They can't make you a car, or change the components of your badly-designed car so that they're better-designed, or tell you what your weird prototype car means with its weird nonstandard error messages.
They can do standardized probes, get industry-standard error messages out, and do things about them. They can swap out broken components for newer releases of the same components. They can replace consumables. And they can notice if something is weird in a statistical sense (i.e. if some of the weird proprietary metrics the car keeps are not within historical reference range), and point that fact out to your automotive engineer.
But you've got to have those automotive engineers, on staff, in the development team, to deal with that information.
From the article: ”Often they have not even bothered to do basic troubleshooting, things like read the documentation on what the error message is attempting to tell you.”
This has been the bane of my work happiness for a while now. I keep having to tell junior devs to actually _read_ the fine error message, just in case it actually _contains information about the error_, you know. Not that it seems to help much, it’s like they can’t get the concept into their heads.
This is 100% a problem with younger, bootcamp-”educated” devs, in my experience. I know the common wisdom on social media is ”no one reads text anymore”, but if that includes aspiring developers, it might be tough to replace the current workforce when that day comes…
It was about 27 years ago that the senior sysadmin at my first real job told me: "Most young people don't have any hacker spirit at all, they just give up the first time they see an error message. The world is not going to survive that."
I was doubtful that this was a universal truth then, and I think it's the same now: there are a lot of people who do mediocre work, and they are and were supported by a smaller group of people who do really good work. And the world keeps turning.
One of the joys of the Internet and of open source is the increased ease of sharing ideas and solutions.
> they are and were supported by a smaller group of people who do really good work
I think the only solution for this is when "support" includes education. In the simplest form, we can give support by helping that colleague to find the issue himself, rather than giving the solution directly.
In a more advanced form, you're making structural changes to your company. Like in how you share knowledge with the team.
Reality check: they run off and nag someone else...very discreetly...until someone caves and fixes it for them. If they get enough pushback they will just quit.
I think that education ought to be available to everyone, but spending effort trying to teach someone who doesn't want to learn is a waste of everyone's time.
We've all heard the stories where someone bright joins a new workplace, is assigned drudgework and after a bit automates it until they only work two hours a week? That's someone who is willing to learn, and everyone else in the office was willing to experience boredom in order to avoid learning.
For all I know, some of those people would have perked right up if the subject matter were milling, millinery or masonry moving millier-weights of stone. Not everyone is interested in the same things, and even when their job is all about it, sometimes people aren't invested in it.
If it works, everyone involved wins. Sometimes it doesn't work.
> This is 100% a problem with younger, bootcamp-”educated” devs, in my experience.
In my experience it’s also common with older and college-educated ones; contractors trying to avoid extra hours; senior architects; and especially anyone who thinks ops is someone else’s job. It’s definitely not specific to age or training mode.
There are a few contributing factors I see: tunnel-vision focused on the particular detail they think they’re working on, causing them to ignore anything they “know” isn’t related; shoddy tools like much of the Java ecosystem where poor culture around logging trains every user that it’s normal to have huge amounts of log spew; etc. but the biggest problem I have seen is ego — either unwillingness to believe that the product of their staggering intellect could be less than perfect or that the mundane task of getting their grand vision to actually work is for the little people.
I’m thinking of a “senior architect” who was quite surprised to learn that networks are neither perfect nor instantaneous, and that his app might have some issues due to needing thousands of XHR calls to load the UI. It was so much easier to ignore the error messages and say the problem was Chrome. He had a CS degree – the problem was the wrong mindset and having been enabled to avoid good troubleshooting skills.
It could be argued that there is a technological Maya that has to be overcome before you mature as a developer. Most people grow up spoon fed the "it just works" ideal, even old school computer people never really had to doubt that their floppy drive would read a disk or that their PC BIOS would accurately boot their computer.
It's only after the technological illusion of Maya breaks that you realize floppies have read heads, hard drives have moving parts, CPUs have conductive traces and all of these are vulnerable to breakdown, entropy exists in the system and cannot be expelled, that the previous "ideal" state of your system was temporary, an illusion, that nothing always works the way it is supposed to and that your options boil down to "burn it to the ground and start over" or "leap into Hades both feet first to rescue the soul of what you love".
Most people go the first route. Buy a new one. Replace what is broken with something else. That way the illusions are never broken. The technology didn't fail, only its current & easily replaceable avatar.
As the Son of God once did, after its death it will rise again, immortally replaceable.
However, it is only after you have faced that 2nd trial by fire and returned with your elixir that you as a changed being can peer through the veil. The meme about "CPUs being rocks we filled with lightning & tricked into thinking" rings differently to you now.
You've touched the bones of the God and found that they crumble. There is no God here, only a beautiful shambling nightmare that has eaten the minds and souls of millions, built by mad scientists and engineers in a vain attempt to create the God whose physical absence they find themselves longing for the same way a neglected child longs for the embrace of their mother.
I think people don't read error messages if fixing the issue is beyond their immediate capabilities.
Computers are scary things that fail in counterintuitive ways. When the handle of your tea cup breaks, the issue is intuitive and most people will be able to understand why it is happening and how to work around it (handle it carefully from the top end and enjoy your tea?).
But when it comes to computers, you often need a deep understanding of their inner workings to make sense of your observations of problems. Why would Xcode say that it failed to compile my project because usefulExtensions.swift already exists? What is that supposed to mean? I see only one file with that name. That information gives an intuitive idea about the issue only if you know how the compilation process works.
Why would I know why the package couldn't be found? Unless of course I know how that package manager works. Then I can check if the package manager is configured to look at the correct places.
Most error messages are like that. They make intuitive sense instantly if you know how everything is glued together, and they make no sense and need study if they're outside of your domain of expertise. No one reads error messages unless they can recognise the pattern instantly and there's data (like the name of the variable) guiding you to the fix.
> Computers are scary things that fail in counterintuitive ways.
Not being scared of the tool and believing one's inherent supremacy over it must be the most basic criterion for practicing this craft, but these days this fear is nursed, at times encouraged, at times even exalted (corollary of the failure fetish) especially by those who publicly place themselves as ambassadors.
Any introduction to computers must start with the statement that they are all heaps of plastic and sand and the only things they are able to do are because some mortal sat down and spent time figuring it out.
People starting out now are at a disadvantage because their first encounters happen mostly through extremely polished looking apps and it is hard to see at the outset how one could go from weird incantations in a text editor to that.
I think it's harder now than it used to be. When I started programming, any error message you saw would almost certainly originate from the beige box on the desk in front of you, which had only a single CPU core and often didn't really do multi-tasking. Over time, our computers, operating systems, applications and networks have become vastly more complex, with the effect that you can't easily build an intuition about which component is responsible for a failure and why.
Personally, I love debugging things. I have a very good "theory of mind" for dealing with computer failures, and figuring out why the computer isn't doing what I might naively expect it to do is a lot of fun. However, it's only fun because I've been able to stay on top of the curve as the systems I work with have become more complex. Starting from zero today sounds a lot more daunting.
True, the stacks are more complex today, but the resources are greater. Back in the days when the error was in the beige box, if the books/CDs/DVDs you had on your shelf didn't address the exact problem, you had to roll up your sleeves or you were SOL.
Nowadays, research skills are more important, but I see a lot of devs who just don't have them. Can't find the answer on the first page of your first (poorly formed) search? Run get the senior dev. To me it reads like incuriosity and laziness, or lack of training.
I don't mind doing some coaching, but if you're a dev, and you can't even be bothered to read the error message, what does that say about your effectiveness?
> Any introduction to computers must start with the statement that they are all heaps of plastic
This scales to everything IMHO, everything is simple once you understand it. Levels of abstractions is what makes it scary and complex. I.e. electricity or fire is also not scary once you know how to handle it.
One should not have an unreasonable fear of fire, but "unreasonable" is a quite key word there. One should have a healthy respect for it and feel some fear if those around you don't.
First of all I agree with you. But I would like to note one aspect of this that has always been true. Many times the error message itself is terrible. They say that an error happened and then they give you absolutely no context for the error. My favorite example is when an application happily bubbles up the error from the operating system when a file read/write operation fails. The OS will tell you what kind of operation failed. It won't tell you which file you were trying to read or write. It won't tell you what location you were trying to read or write. Basically none of the context you need to really understand what went wrong. You'll have to go read the code in order to mentally reconstruct what the context is. It's silly. If you are writing a file and don't at a minimum catch and then wrap the error with an error of your own that adds the necessary context, then you are contributing to the problem here. And that's just one common example. There are many more.
I'm an SRE, so my programs are generally Python scripts < 1K LOC; maybe this isn't scalable, but I write verbose log statements (if it's launched with --verbose, of course). It's not that much effort to change `except OSError as e: log.error(e)` to `except OSError as e: log.error(f"Error accessing {file} - {e}")`
If I know typical causes of errors (forgot to connect to the VPN, etc.), I'll include them in the log message as well as things to check.
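A minimal sketch of that pattern (the function name, path, and hints here are illustrative, not from the parent comment):

```
import logging

log = logging.getLogger(__name__)

def read_config(path):
    """Read a config file, logging enough context to troubleshoot without a debugger."""
    try:
        with open(path) as f:
            return f.read()
    except OSError as e:
        # Name the operation, the path, and the underlying errno message,
        # plus the usual suspects to check first.
        log.error(
            "Failed to read config file %s - %s. Check that the file exists, "
            "the mount is up, and the service account has read permission.",
            path, e,
        )
        raise
```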
This is a large and ongoing part of becoming a developer. It also happens again every time you try to learn a sufficiently new or alien technology. You know you start to make progress when the error messages begin to make sense.
Often you need to know a lot of context before you're even able to determine what the error message is! One error message can lead to a cascade of other error messages, or it's something breaking down as a result of multiple layers of indirection, requiring the developer to carefully track the trail of what went wrong and led to another thing failing, which broke down the next thing and ultimately decided to stop the program and mention only the very last thing falling apart to the user. There might be a directly sensible connection with the original error, but often it's quite unrelated. An experienced developer often immediately recognizes: this is not the actual error message, that other thing is! But for a junior it's all equally incomprehensible.
It is detective work with many false leads, and when you are very new at something it can be so overwhelming that you don't know where to begin, immediately assume you will not succeed in finding out 'whodunnit', and ask your senior co-worker for help.
> I think people don't read error messages if fixing the issue is beyond their immediate capabilities.
How would they obtain those abilities, though, if not by spending time on the issues that come up and learning how to learn?
I think sometimes people are just bored and can't be bothered to find the cause and solution to their issues, and over a long period of time that mentality sticks and becomes second nature resulting in phrases like "this software sucks, I need to read the docs to use it".
The problem is, learning is taxing and many times you encounter these errors when you have more important things to do.
When you want to develop your game and the IDE is complaining about something to do with locating some files, do you think it is a good idea to learn how that IDE organises dependencies?
Sometimes you suck it up and learn it and you know next time. However, your first instinct would be to look for ways to make the error go away so that you can immediately start working on the task that you are supposed to work on. That's why we have abstractions and when things work fine we don't know how things work.
You shouldn't be expected to have complete knowledge of all computer systems, tools and frameworks before you can make a ball image bounce on the screen.
Taxing as hell. I have personal projects well underway that go unfinished because of tooling complexity or some other issue causing me to completely derail and spend days figuring out some type of issue that has absolutely nothing to do with what I'm trying to accomplish. Granted, since I have gotten away from Visual Studio it is much better, but it still happens. If it isn't the IDE or package upgrades it's AWS or Azure issues.
> But when it comes to computers, often you need deep understanding of its inner workings to make sense of your observations of problems.
They're supposed to have that knowledge, or at least not be afraid to dive in and get that knowledge.
There's only one way to build an intuition of what kind of problem probably causes some error (most famously, if the error is completely incomprehensible, you missed a closing thingy on the previous line), and that's by doing the work a lot.
Very very true. When something causes a 1 point story to take three days, bring on the hacks and compromises and ignore anything that doesn't need to be dealt with to get it out the door.
I don't think so. We can do so many amazing things with the computers precisely because we don't have to know how things work. Computers are so many levels of abstractions over printed metal on melted sand.
People who know what they are doing will understand the errors of their own creations, will learn the workings of the tools they use to some degree, and will be able to understand the failure modes of these tools with experience over time. No one starts with complete knowledge before they start building things.
> or at least not be afraid to dive in and get that knowledge.
Of course they should have the drive but people's first instinct would be to make the error go away so that they can do their actual work. People have limited time and energy, you can't expect a JS developer, for example, to study inner workings of a Linux box to understand all errors. It's cool when they do and gives them superpowers but it also makes them less productive as JS developers. Sometimes you simply need to implement that button to render on the server without studying the server.
My own perspective on this was that it felt like my brain would turn to goop when I got an error message, my eyes would cross and I would start frantically googling literally anything, skimming stack overflow, and getting nothing done. In order to progress I had to learn to slow down and start reading error messages and learning what they meant. Sometimes this meant I had to look up a bunch of words, one after another, to understand something incredibly dense.
So yeah, I think I largely agree with your assessment, and would only go on to state that the path forward is slowing down to learn vocabulary and think critically. You really speed up after that.
I have experienced this same issue many times. But from my experience, I cannot simply pinpoint it to young developers or their education. I have seen this behavior with several older people, university educated, with many years of professional experience. So not just an age/generational thing, in my opinion.
Maybe they have seen so many red herrings that they don't even trust that the error message could contain something useful and relevant? Or maybe they just learned to skim through everything, and don't actually read stuff.
I also don't understand why, when they ask for help, they can never be bothered to say what they're trying to do, what error message they got, etc. It feels like they're doing me a favor when I try to help them fix something.
Then there’s the StackOverflow effect: where you ask a reasonable question X and get a bunch of upvoted condemnations along with directions to do Y instead.
Right. When I was learning to code a more seasoned dev told me "The code doesn't do what you want it to do, only what you _tell_ it to do."
Really helped me gain the mindset that not only was it my mistake that resulted in code not running, but that it was fixable. Like a game of ping pong. You hit the ball, sometimes the compiler hits it back.
I think it comes down to passion and curiosity. Those are two things that can happen at any age and education level. It is also something that ebbs and flows based on energy levels.
I think the same. It's so typical everyone here tries to pin characteristics to exact professional groups. The amount of anecdotal evidence here is too damn high.
Keep at it: some crusty guy telling me to just read the sodding error 12 years ago opened my eyes.
I think there's something about certain kinds of tools that are cryptic, unpredictable, and frustrating that can teach you to be helpless - to just Google and hope. It's fixable though.
I don’t know, most of the time error messages on computers are garbage. People build up some kind of fear of the error message format.
I have an example:
I built a logistics and invoicing tool a few years back with error messages that were human readable, with clear, proper messages that told the user exactly what they did wrong and even proposed how they might fix the problem.
I don’t know how many times I had to go to the users’ workstations and read the text out loud for them like they were 5 year olds and ask them what they thought it meant.
They always knew what it meant, but I had to read it for them; it was embarrassing.
And these were university educated accountants who were using the software.
After a lifetime of garbage error messages like “error code 4513” people just zone out.
For real. How difficult would it be to have the computer tell us what the error code means, instead of just a single sentence?
I get codes back in the day when storage for a whole book was costly, but that isn't the case anymore. Just tell us the error, show us the pointers, and then tell us what typical fixes are instead of expecting us to go to the internet for a solution.
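A toy sketch of what that could look like, using the "4513" example from above; the catalog entries and function are invented for illustration:

```
# Hypothetical sketch: keep the numeric code for logs and support, but always pair it
# with a plain-language explanation and typical fixes for the person at the keyboard.
ERROR_CATALOG = {
    4513: {
        "summary": "The import file could not be read.",
        "fixes": [
            "Check that the CSV file is not open in another program.",
            "Verify the file matches the column layout of the import template.",
        ],
    },
}

def explain_error(code):
    entry = ERROR_CATALOG.get(code)
    if entry is None:
        return f"Error {code} occurred. Please contact support and mention this code."
    fixes = "\n".join(f"  - {fix}" for fix in entry["fixes"])
    return f"Error {code}: {entry['summary']}\nThings to try:\n{fixes}"

print(explain_error(4513))
```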
My take is that cryptic error messages have always been a cross between 'protecting company secrets' and 'never really admit to a mistake'. Sure, MS or Apple could just throw up a dialog that said "Sorry, we trashed the file you were working on, here are the last 1024 chars" but people would actually be more angry then, instead of just "Error 1234 occurred" (or at least they'd have to go look it up to be mad).
As developers, we also have to be used to a lot of completely unhelpful errors. Yeah, couldn't connect to the DB, sure... oh, but actually because my code ate all of the memory; why didn't you say that in the first place?
I suspect it grew out of electrical engineering. Machines/parts could only return errors as integers, so you would expect people to open up the documents and read that error code 1021 meant that the unit had caught fire.
Software just kept the tradition of error codes, since that meant you could also sell that juicy documentation (localized into whatever language you wanted) to the user as well. I suspect it's also a localization issue: OracleDB would never return the table/column in the error message, so as to be easier to translate.
In the example I mentioned, one reason was that the user loaded a CSV file with incompatible data.
Or because the user had filled in part two of a task but not part one and then tried to continue with parts of the task that were dependent on filling in part one.
Or the user tried to synchronize orders from the ERP system but the ERP system would not return any orders.
I always preach to people that they have to make it easy for others to help them. Don’t just say “it doesn’t work” and expect them to analyze the issue and take care of things. Instead provide information what you did, send log files, screenshots and whatever other information you may have.
I think the people most likely to fall into the “it doesn’t work” category are people who don’t have much experience troubleshooting difficult problems.
In the end it’s about compassion and understanding of each other. Unfortunately in a lot of companies the only direction people are getting is “get it done on time”. It’s rare that management asks people to have empathy for each other.
> I keep having to tell junior devs to actually _read_ the fine error message
I wonder if it's also to do with the environment in which they learn. When I was learning to program, like probably others here, I didn't have anyone around me who knew anything about computers, so I was generally on my own until my first job and had to dig through stack traces and read error messages and had to try and figure out what was wrong. Kind of a blessing and a curse, as I imagine my progress would have been quicker and I wouldn't have hit so many brick walls, but I learned to debug independently.
> This is 100% a problem with younger, bootcamp-"educated" devs, in my experience.
You'll get a lot of pushback here but it's definitely true.
That doesn't mean it doesn't happen with CS grads as well, but it's quite rampant among bootcamp devs. I think the reason for that is that, since the bootcamps are so short, they "stay on rails" and mostly work on simple projects (that will give out something they can push to a github repo and use as a portfolio).
It's the same with git. Every bootcamp will use git and claim to teach it to their grads, but then watch them try to do anything on a repo with multiple users. A lot of them just rote memorized commands to pull and push to main and that's it. Branching? Rebase? Using the commit history? Never heard of them.
New hires with a serious engineering or CS degree should have had at least a few classes dedicated to projects where they built something non-trivial, on top of theoretical classes teaching the fundamentals.
I do have one anecdote/counterpoint. At my company we do DevOps, but we have a central group that creates the templates we are supposed to use for various AWS resources, build plans, and deploy plans. It makes it extremely frustrating to troubleshoot issues specific to the templates or how you configured them because they usually aren't something you can google and the documentation for them is extremely basic. Sometimes you have to ask the people who created them since they have the deep understanding and experience.
I've been contracting at a company for a few years and things progressed from manually setting up servers (before I was involved) to using Ansible to provision multiple servers and now it's transitioning to a Kubernetes cluster and infrastructure as code from day 1 along with tomes of documentation to go with it.
It really is worth it to go the extra mile and write comprehensive docs, even going as far as writing them in a conversational tone as if it's a blog post or a book. I'm really happy I found a company who treats documentation and workflows as first class resources.
For a small team where only 1 person is working on this it helps eliminate the bus factor and it also makes it easier to have non-hardcore ops folks do code reviews on your IaC. Having them be able to get the gist of it with a little bit of background knowledge is so much better than nothing. All of this results in higher reliability of the services your company offers.
I agree that it has to be done to have consistency. I'm just pointing out that when using IaC designed by another team where the internal workings are largely hidden, us primarily dev guys are going to need some help with issues that appear to be ops related. (And yeah, occasionally root cause will be that I'm just an idiot)
"lack of newness" is a characteristic many will expend untold hours to extinguish. to my perspective, the "rewrite it in Rust crowd" is the peak; all non-Rust code is soiled, and worthy of replacement.
(it is very possible that the "rewrite in Rust" movement is just a guerrilla marketing project)
I don't really care for Rust, but all non-memory safe code could benefit from being replaced. This does, for some people, mean rewriting it in Rust because C and C++ make it harder to achieve the goal of memory safety.
this feels like one of those things that needs to be tackled from both ends
teach people to read error messages and simultaneously improve the readability (and utility) of error messages
i don't know why we put up with such bad error messages anymore. i imagine it's a function of stockholm syndrome and the difficulty in getting messages changed
I sort of blame exceptions here. They make it really easy to just let the error bubble up to the top. But often times the place where the error gets thrown doesn't have all the necessary context to have a really good error message. If you want a good readable error message you have to trap the exception at the appropriate place and then wrap it with the appropriate amount of context in the message. But the easy path is to pretend there is no error and let the very top layer surface anything that went wrong to the user.
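As a hedged sketch of trapping and wrapping at the layer that has the context (the exception type and function below are hypothetical, not from the comment above):

```
class ReportGenerationError(Exception):
    """Carries the business-level context for a failed report."""

def generate_report(template_path, output_path):
    try:
        with open(template_path) as f:
            template = f.read()
    except OSError as e:
        # Wrap the low-level error at the layer that knows what was being attempted;
        # `raise ... from e` keeps the original traceback attached for debugging.
        raise ReportGenerationError(
            f"Could not load report template {template_path!r}: {e}"
        ) from e
    # ... render `template` and write it to output_path ...
```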
Force them to write heavily templated C++ code; they either learn how to read pages of incomprehensible error messages to diagnose one little typo, or they can’t do their job and are forced to look for a new one.
It’s usually not hard to figure out what’s wrong from the messages, but man do they look scary and hard to understand when they appear. Yes I’ve been writing C++17 lately using some very template heavy libraries.
“Force them to write heavily templated C++ code; they either learn how to read pages of incomprehensible error messages to diagnose one little typo, or they can’t do their job and are forced to look for a new one.”
When I did C++ we sometimes made little competitions for the smallest change that can produce the craziest error messages. On the other hand I always found it extremely satisfying to make one little change that removed thousands of errors and warnings.
I've avoided C++ for most of my career, but one idiotic mistake I do remember making was accidentally leaving an open curly brace at the end of one of my source files. The C++ compiler ran and reported 1000s of compilation errors all through every single other file -- in my code, throughout all the library code I included.
Easily diagnosed if you're working incrementally, one small change at a time, and making checkpoints with version control: `git diff`, carefully review the diff of what you changed since the last checkpoint where things were more or less working. I must have not been disciplined enough to work like that at the time.
Troubleshooting systems integration failures is also character building for getting better at diagnosis from errors. Sure, it's failing, but let's try to figure out the immediate layer of failure from the logs, error messages, symptoms: name resolution? tcp? tls? http proxy? authentication? authorisation? api spec misalignment? error in our application code or the system we're directly talking to? unexpected data? error in some other system that we depend upon transitively? each time you hit a new novel failure mode, or fail at one level deeper, you're making progress!
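A rough sketch of that layer-by-layer approach, assuming a plain TCP/TLS service (the host, port and messages are illustrative):

```
import socket
import ssl

def diagnose(host, port=443):
    """Check one layer at a time, so the failure message names a specific layer."""
    # Layer 1: name resolution
    try:
        addr = socket.getaddrinfo(host, port)[0][4][0]
    except socket.gaierror as e:
        print(f"DNS: cannot resolve {host}: {e}")
        return
    print(f"DNS: {host} -> {addr}")

    # Layer 2: TCP connectivity
    try:
        sock = socket.create_connection((host, port), timeout=5)
    except OSError as e:
        print(f"TCP: cannot connect to {host}:{port}: {e}")
        return
    print("TCP: connected")

    # Layer 3: TLS handshake (HTTP, auth, API contract, etc. would follow the same pattern)
    try:
        with ssl.create_default_context().wrap_socket(sock, server_hostname=host) as tls:
            print(f"TLS: handshake ok ({tls.version()})")
    except (ssl.SSLError, OSError) as e:
        print(f"TLS: handshake with {host} failed: {e}")

diagnose("example.com")
```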
I had a nice one today where it complained about some thing not being invokable deep in some std code somewhere. Lots of crazy template instantiation errors. It turned out I forgot to pass the variant parameter to std::visit.
I don't think it's just because of bootcamp education, I think people are growing up in a world where error messages are either never displayed, or displayed in the form of "The program has did a sad :( try again later."
They're not used to reading error messages because they've been brought up seeing nothing but completely useless error messages.
We “process” maybe 100 times more text than we used to - but to do that we scan for patterns and ignore most of it without reading. (Partly due to the prevalence of advertising as well, of which we prefer to read 0%.)
> This is 100% a problem with younger, bootcamp-”educated” devs, in my experience.
I resent this. Not because I'm a bootcamp-"educated" dev. I'm not. But it suggests somehow that devs with CS degrees are somehow better in this aspect. If anything, they're arguably worse (obligatory, not everyone disclaimer).
I have never met these kinds of people and it boggles my mind that they exist. Actually reading error messages is the most basic thing I'd expect from a developer considering that you're going to encounter them regularly during the development process.
Some seniors remind me of street preachers shouting "Hell and damnation awaits you!", and they get much the same response from people. Not saying this is you, but I'd be interested to know what ways you've tried and what's worked or not.
I have the opposite experience, boomer devs that see red or yellow in their terminal and send me an email or jira ticket before trying to parse it. The younger generation at least can make judgement calls on warnings.
Big fat "it depends" on that. It might be superficially correct (dependong on the scope/skills of "support" staff). Even if there is a meaningful financial difference between the day rates, it doesn't necessarily follow when the process is viewed end to end:
1. If ops staff have limited expertise/authority, it's less likely they can resolve problems. They might acknowledge (so maintaining some aspect of client SLA), or have a limited set of pre-defined remedial actions (reset button). Anything beyond that, though, and it needs the dev team. So it's arguable whether the ops staff provide much value in the equation.
2. As a dev, there's nothing quite like the prospect of being paged at 2am on a Sunday to incentivize more robust code.
End to end dev accountability isn't a panacea either - but the problem is more nuanced than just pay rates.
One of the best arguments for shared responsibility is that it avoids “not my job” thinking. I’ve seen large organizations burn resources and downtime because developers and ops are in an adversarial relationship where nobody has a stronger sense of responsibility for an application working than they do for shifting the blame to their counterparts. If everyone is getting things escalated to them it tends to cut through that cycle.
also, devs seem to really underestimate the pay support staff makes, especially those capable of troubleshooting deep, low-level and complex issues.
this might be less visible on the Dev side, but troubleshooting infrastructure is not an easy to find skill, especially if dozens of moving parts are involved.
More power to them. The rest of us 'Ops' guys are slowly transitioning to DevOps Engineers and SREs and are gladly taking handfuls of cash, because we both know how to read hex dumps and are smart enough to know your IP address isn't going to change because we replaced your ethernet cable.
L1 support are users of the software, just like their customers. If the application doesn't provide them with enough information to do their job, then that's a failure of the application.
And by "application" I mean the combined software, release, documentation, runbooks, etc. Not just the latest git tag pushed.
Often the root cause is in dev code without proper logging and documentation = dev problem. The only way for Ops to troubleshoot might be to sniff the network and make educated guesses, if it's recurring. All that extra work costs much more and halts more urgent production work.
I think some of the article is framed wrong. It's written as competent ops guy/team vs. incompetent developers. But I don't think that's actually what's going on. I'm sure there are competent developers saying the exact same thing about their relatively incompetent ops people.
We built a couple relatively simple applications for an enterprise client. It took their ops teams months to get both applications running in K8s, even though our deliverable was a fully functioning container. They were largely incompetent as far as we could tell.
But, I don't think it's worth being unkind or judging them. Every time they asked us a question we made an effort to point them in the right direction. There were other times it was a problem we couldn't help with, we kindly let them know that.
I think the reality is that the demand for competent IT and developers outpaced supply a long time ago and it's not getting better. Those of us who know and care about the difference should make competent co-workers and executives part of the job evaluation. Or, accept incompetence around you as a reality, help and avoid as wisdom dictates.
But, complaining that it exists and framing it as competent ops vs incompetent developers is both untrue and unhelpful IMO.
The latter part of the article that talks about the pace of features, complexity, and the lack of time is spot on though IMO. I think the article would have been better focusing here and avoiding the IT vs devs angle.
I started to write something, but you encapsulated it much more concisely. I've been on all sides of this over the past... 25 years, mostly dev, sometimes having to handle server/network/etc (before it was 'devops'), and worked on large and small teams.
Competent and incompetent people exist in all areas. Some of those incompetent ones can get better with time and support, and some can't/don't.
I think the other side of your story is missing, and there's no account of capabilities, priorities or even vendor stories. Why would you not know the reasons for the delay? That gap is the real problem.
If it truly was a competence issue, how did leadership solve it?
Oh boy, it feels like someone is ringing bells in my head because of aligned thoughts.
Let me share an experience. In 2010, I worked on a project for a large business in the US (Fortune 100). The process was set so rigidly that it worked well, but I was among the group of people who were mad at it, saying “why is this so rigid? Trust us and let us do things faster!!”. Context: there were change management rules in place. The software was to be released only on a regular cadence of about 6 months, only after thorough integration tests and approval from the change mgmt board. Should anything go wrong in “move to prod”, there would be representation from dev, QA, Ops, change mgmt, and Mgmt orgs to immediately decide on actions until the release to prod was successful. There was thorough documentation (run books) of what changes occurred, what their impact could be and how to roll back if something unexpected happened. It was always a party after a successful release :-)
Trust me there were a lot of bugs, but they were mostly found and fixed during the laborious QA and integration tests by people whose job it was.
Fast forward to now, I am a “Cloud Engineer” in a small team that does everything from app development to building CI pipelines to running services on AWS to being on-call to keep them running.
I must say, I wish for the old days back. Sure, it was slow and laborious, but it resulted in better outcomes and manageability. IMHO, it also resulted in better reliability of software due to the diligence done by several layers.
It is easy to say do the same just faster in your small team. But, in practicality it just doesn’t happen. I work on setting up Observability one week, then onto designing infra for a new service, then onto some development and so on. I feel like my scope would have been limited, and I would have had an easier time becoming an expert at something than becoming so broad skilled like I am today.
Sometimes, old, slow, and mature is not so bad. Not everyone needs to follow the FAANG SV companies to be successful.
Hear, hear. If the bulk of your “products” are for internal consumers, you likely aren’t paying enough to attract talent who know how to operate in the FAANG model.
I like to distinguish between “product developers” (i.e. building products for consumers with guaranteed scale, so do it right the first time) and “project developers” (get it done ASAP and cut the corners you need to do so).
In the “project developer” world, 50-75% of your requirements gathering happens before a line of code is written. There is usually a “right way” to implement a process of which technology is only one component and figuring that out as you go will actually slow down the project due to the maker / manager schedule conflict. True “agile” in this environment just leads to scope creep as there usually aren’t dedicated product owners to say no to every little request.
I’ve stopped pushing agile as hard because the corporates simply can’t afford the kind of engineers to make it work correctly, and they don’t have the roles required to gather and feed requirements to a dev team in an agile format. Sprints are a good way to time-box feature development, but most business projects work better with a more waterfall approach. Your customers and project plan operate under waterfall so there’s less downside to begin with.
Great comment about “Project developers” and “Product developers”. It is almost an entirely different art to get the requirements right by iterating on a project, and bringing out a solution to life versus engineering a scalable, maintainable product that evolves after a good start.
I never had to think of such distinction.
The waterfall model has its downsides in extracting the requirements properly, whereas the Agile approach (the little I have seen of it) seems to lose the layered stability of a waterfall based approach.
excellent comments.
instead of straight waterfall, i would suggest a time boxed requirements phase, followed by incremental development with a reasonable cadence (dictated by the product; web might be 2 weeks, more serious domains might be 90 days).
you need iteration, but having a solid grounding on what you are going to build eliminates churn.
> I must say, I wish for the old days back. Sure, it was slow and laborious, but it resulted in better outcomes and manageability. IMHO, it also resulted in better reliability of software due to the diligence done by several layers.
Those were also the days where it took many years to go from Java 6 to Java 8. Or perhaps to try out Kotlin.
They were the days where legacy code was the norm, and we kept supporting it because nobody dared to change anything for the better. In practice, that's just not something you can maintain in a competitive market, because your competitors _will_ use new technologies and faster/better development processes.
"it just works" might be good enough for maintaining your application, but will it be good enough to find people willing to work in that code base or that environment?
I work for a large business where both the old and new practices are in place (mostly the new ones, though). Focusing on "going fast" is definitely not a good idea, but I believe there's a sweet spot in between.
Sure, mature processes encourage tech stagnation, and discourage even beneficial changes as collateral damage. But, as you say there is line somewhere at which project should move on from "Go fast, ship often, change much and get feature-rich" to "focus on correctness, stable releases, actually maintain our existing features". Perhaps it is really a cycle of both and missing one for the other leads to problems.
Perhaps all this description is showing is that in many organizations there simply is a genuine need for a "Developer IT" support function with appropriate skills and resources, and because there isn't one, it's being done haphazardly by teams who aren't a good fit for it, as the author describes. If there's a few niche issues then that's solvable by e.g. dev training, but if the issues are systematic as the article asserts, then that's an organizational problem that needs an organizational solution. If your company can't ensure that devs are capable and/or motivated to troubleshoot issues that work on their laptop but don't in a real deployment, then your company needs some "internal consultations" mechanism to connect them to someone who does have this capability and can explain and/or fix the issues for them.
Responding to "Someone will always have to own that gap and nobody wants to, because there is no incentive to. Who wants to own the outage, the fuck up or the slow down?" with "Not me." is not sufficient, it's a very valid question for which any organization definitely needs an answer pointing at some specific people - if it's not going to be pure ops people, it's IMHO not going to be the feature-developing devs as well, that would likely need separate 'site reliability engineer' teams as some major companies do.
I agree that something needs to change at the organisation level in your case, but i think it's hiring and promotion. This "developer IT" stuff is part of a developer's job. Juniors won't join you knowing it all, but they can learn it from seniors on their team who do. If you are recruiting seniors who don't know this stuff, stop, and if you are promoting juniors to senior before they've learned this stuff, stop.
The problem often lies with the entire Organization and not the Development team. I've had roles where Development was empowered to code the product and deploy the code, which necessitates certain access. At that point, troubleshooting is trivial and we can solve our own problems. It's amazing.
Throw in some red tape where I can't have access to logs myself? Then I don't care to fix it at all - chasing another team that has diverging priorities is a complete waste of my time.
If your Developers are tossing shit over a wall, I'd bet top dollar you work in organization B. In which case they are behaving accordingly. Don't empower me to identify and fix issues? Then I won't (and I won't lose sleep over it either).
I've been in both kinds of organizations. Without fail, every place that was like "organization B" (separate development and cloud teams) had some under-the-table workaround. The worst I think I ever saw was _every single customer's hosted environment_ had SSH access enabled with the same key, and it was shared between developers without much regard for security. A business worth hundreds of millions of dollars...
But of course they required two-factor authentication in O365, so it was SOX2 compliant.
I'm writing this as a DevOps/SysEng/whatever you prefer: I think there are two big type of developers groups in the industry, and two ways to think about these issues.
The first group advocates for an ultimate "NoOps" world where every team, composed only of SWEs, is responsible for the whole code lifecycle, from inception to deployment to maintenance to decommission. This is aspirational in many, many companies and probably true in a bunch of really good companies. Anyway, I still wonder if there aren't tensions there between product/business asking for developer power to bring new features/changes in, and developers doing the operations work besides writing business code.
The second group is formed by the developers described in the article, who just focus on their code and need a "developer helpdesk" for everything else related to actually deploying and operating the software they've written. IME this is how the vast majority of companies work, especially the "normal" ones. Some developers step "up" and try to understand/do this extra work, and they usually rise to the top because they are the "good" engineers.
To put some color on the "NoOps" world: it is not that product engineers are directly touching cloud or metal. We have an infrastructure group. It ships a PaaS. It doesn't get involved with a specific tenant of that PaaS unless a product engineer has evidence that there's something wrong with it & escalates a page. Product engineers click the deploy and rollback buttons for themselves.
> every team, composed only by SWE, is responsible for the whole code lifecycle, from inception to deployment to maintenance to decommission
This is how it is at my startup. All of the engineers are involved in managing the infrastructure for everything we build. I find it gives me much better insight into my app, and the feedback loop is much tighter since I am in control of everything.
This isn't true at all, though. Many organizations, especially smaller ones, seem to have taken it this way. Everyone is responsible for everything. But that wasn't the DevOps thesis. The idea was for eliminating silos in the sense that dev and ops would talk to each other continuously throughout the software lifecycle, with developers understanding operational challenges and creating software that didn't just meet functional requirements, but was easier and faster to deploy, change, rollback, and troubleshoot. Ops would understand the pressures of development and create automation systems catered to the specifics of their software stack.
It wasn't supposed to mean no more division of labor. Division of labor is a key innovation in human society that enables civilizations to exist. It was supposed to mean the teams in different categories of labor interact throughout and consider each other's needs, and not throw shit at each other over a wall and only ever interact through a ticketing system.
I believe that it was supposed to mean no more division of labour, and i believe that based on conversations with people who were early adopters of it ten years ago.
This waffly "breaking down the silos" stuff is a later redefinition, i believe reacting to the fact that the original meaning was extremely unpalatable to existing organisations, with existing employees and hierarchies who would be severely disrupted by it.
From Patrick Debois in the interview "Later, I saw a talk by Jean-Paul Sergent about developer (Dev) teams and operations (Ops) teams working together."
So no, breaking down the silos was baked in from the beginning.
But I still challenge the fact that it can really work well for normal performers and not only top notch developers. Taking care of all the phases of a software lifecycle is not an easy task, it has a big enough cognitive load. I understand that this is supposed to be done in (small) teams, but within those teams you still need some degree of separation of labour, you cannot have everybody having wide and deep knowledge of everything software. Well, you can do it, if you are Google and the like and set the bar for hiring really, really high. But that doesn't apply to most organizations and engineers. Not all of us are wicked smart.
The original thesis was better cooperation between developers and sysadmins (ops). It didn't focus on trying to make operations redundant or transfer every sysadmin into a SWE.
The team that writes the code should also deploy the code and get paged in if there are problems in production.
That creates a tight feedback loop that requires developers to learn and manage the whole stack, code defensively, and test enough to be confident to deploy to production.
Didn't test your code enough? You will be paged in the middle of the night to fix it. It creates a strong incentive to make good decisions because you will be living in the mess you create.
And this only works if your Ops team provides good tooling for deploying, logging, monitoring and alerting.
I have heard of companies deciding to do 'devops' and it turns into a free for all of dev teams having to handle/build things end to end. Everyone loses in that scenario.
In those cases, though, it's usually a team with some Dev people who know or can learn some Ops and some Ops people who know or can learn some Dev. It's not just laying off all the sysadmins, network admins, stack architects, and all then letting the developers freefall until they find a way to right themselves.
Yes, absolutely, but it's not actually widely implemented. Orgs do "DevOps" but they are just automating/writing as code some things that previously were done manually by Ops/SysAdmins. Now we have DevOps roles doing that same work but with other tools (Terraform, CloudFormation etc), much more automation, less gruntwork and toil, but STILL used as "developer IT" nonetheless.
I am in Ops and I can understand this point of view. The author is probably overwhelmed with issues that are 'not his problem' and this is his rant on it.
To be fair, Developers are getting slammed with their responsibilities too. At one time it used to be that they could just know one programming language really well, like java, compile their code and hand it off to QA.
Now they have to know a dozen languages and frameworks, do their own testing, deploy the service, monitor it and troubleshoot everything in production in some 'cloud'.
Or they are just being lazy and this guy is sick of it. That is when you do your best to train people up and get them to put in the leg work. Ask pointed questions about if they Googled the error and help them work through the problem. Then add some things to the docs to help others out in the future.
> To be fair, Developers are getting slammed with their responsibilities too. At one time it used to be that they could just know one programming language really well, like java, compile their code and hand it off to QA.
Oh no, responsibilities!
Meh, the days of throwing a tarball to QA and log off at 5pm are gone, thankfully.
Some organizations empower their ops team to close support tickets by just saying "not enough information to diagnose a problem".
I've seen with my own eyes team metrics improvement (SLAs etc) by just counting the time the ticket was "in progress" to the ops team instead of waiting for the developer (or customer, whether internal or external) to reply.
This blog perfectly articulates the strife that inspired and drove the DevOps trend.
I am always saddened when I hear "our organisation has a DevOps team" - immediately this demonstrates a fundamental lack of understanding of the very premise of what DevOps set out to solve: bringing Development and Operations together.
Even the very name "DevOps" was constructed such to symbolise the combining of the two domains into one. But no. Now we just have a new cool title to throw on people who will be ringfenced just as they were before.
I'm more and more convinced that "DevOps" (and "Agile" before it) are just buzzwords that can be leveraged to make whatever change the person implementing it wanted to do all along, with zero regard for what the buzzword actually means. If real devops was going to happen, it wouldn't take a brand to sell cross functional collaboration. It would have just been yet another one of the constant stream of incremental improvements we make by folding lessons learned in industry into our own orgs.
Yep, because developers might not know what ops needs in terms of traceability, logs and so on, to be able to run their code in production without having to wake them up at 2AM. Similarly, Ops knows a lot about what can be done with existing infrastructure, or off-the-shelf components, which can save a huge amount of work while providing a more stable system.
I do mostly operations now, and I'm lucky enough to work with really talented developers, who care to listen to input before writing 5000 lines of code. I also work with customers, who have their own developers, with their own weird ideas about the world.
The biggest problem I see right now, apart from the occasional cowboy pretending to be a professional developer, is developers picking technologies without understanding them. We work with customers who picked technologies because they're interesting, not because they're what they need. When performance is terrible it becomes an operations issue, and being told "Kafka is not actually a database and shouldn't be used as one" often isn't the answer they want. Or try telling a developer that the code he worked on for three months can be done by the existing load balancer in a few hours, or that the ORM is actually writing terrible queries.
A DevOps team, as in "we use the shared knowledge of both parties", is fantastic, but operations is frequently an afterthought and not involved in the design phase.
If we're to take "DevOps" as developers doing operation, I'd prefer that we do the opposite and let operations do development. I think we'd get better results.
I’ve found ESR’s “how to ask questions the smart way” [0] to be really helpful in these situations on both the asking and answering.
If I’m asking a question I explain what I’m trying to figure out, what I’ve tried, what I expect, what I’ve researched. Basically helping the answerer not waste as much time covering the same ground.
If I’m answering questions and don’t get this info, I ask it. And establish the expectation that this info helps me answer their question.
About 70% of the time, the asker adds in more info. 25% of the time I don’t hear back. 5% of the time I get a complaint that they are too busy or can’t answer the questions.
To the author of this article: Really great job on it, had great fun reading and also lots of truths in there. But this part:
”Often they have not even bothered to do basic troubleshooting, things like read the documentation on what the error message is attempting to tell you.”
This happens, but this just means that your Development Team needs some coaching or to improve their quality.
This says more about the quality of the development team you have been working with. You have to pass along this feedback and ensure that the development team also works with the same professionalism as everybody else.
DevOps would tell me that Dev & Ops should look into issues together (yes, the developer will be blocked as well, WORKING with you). If you find that it was the developer's fault, tell them: "Hey, this is on your side. You saw how we troubleshot together. Now each of us has new tricks to use in the future."
If you don't do that, you are the shortest path to get THEIR problem solved. And it is too easy to go that path.
This is a failure of management, not the "team". They don't need "coaching", this bullshit molly-coddling of people is pathetic.
Good management will ensure that if a problem like this occurs, they don't "coach" but they "counsel" the appropriate dev to do their job and not waste everyone else's time.
This reads like a guy trying to take complete ownership while renouncing any accountability.
I have been in this industry for 15+ years, and as a developer, I have a surprising amount of experience dealing with customers. Of course, when a customer complains about some feature not working, I would not just take their word for it. Customers mess up too.
What I would not do is brush their complaints off. "This is a systemic issue". "They are causing problems". "They don't know better". "They don't have the correct incentives". Try telling that to a customer, or to your boss.
The obvious disconnect from his own team is the problem.
>developers are not incentivized or even encouraged to gain broader knowledge of how their systems work
This is the crux of the problem. Coding in isolation. Replies of 'It's java, it should work anywhere' etc.
The other gear-grinding common theme is not even doing basic troubleshooting, to the point of not even googling the error message or the symptoms, and being 'blocked' because they are waiting on a ticket they opened with the 'other' team.
We’re a small company so we sometimes do many things, but it’s taught me a lot of networking fault finding.
There’s some very clever ppl that know all about how networks/vm stuff work, and I’ve learnt enough from them that I can fix most of my own infra related things - or at least give them a run down of what I’ve done first to save them some time.
It got me back into hardware and networky stuff, so now I’ve got a MikroTik at home, some proxmox machines, Tailscale network etc - more fun than just spin up a box on DO and be done with it.
A lot of ppl just aren’t interested though, they just want to code (and maybe learn a new language) but because a lot of stuff is now PaaS and it’s super easy, there is no need to learn it (in their eyes)
I think incentive for developer is to be relevant. If you don't do it, someone else will. And that becomes the new norm. Like how DevOps has become the new norm.
I had to escalate to basically the CIO of a Fortune 500 company for someone to take a look at the network performance of our system; all teams were blaming the applications despite the evidence. It ended up being a bug in a VMware driver that was impacting their whole infrastructure.
I would add reasonable retry logic also. I've seen quite a lot of outages that would not have been noticed if there were decent retry logic with backoff, etc.
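One common shape for that, sketched under the assumption that only plausibly transient errors (timeouts, connection resets) are worth retrying; the client call in the usage note is hypothetical:

```
import random
import time

def retry_with_backoff(operation, max_attempts=5, base_delay=0.5, max_delay=30.0):
    """Retry a flaky operation with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts:
                raise  # out of retries: surface the real error instead of hiding it
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            delay *= random.uniform(0.5, 1.5)  # jitter avoids synchronized retry storms
            time.sleep(delay)

# usage (hypothetical): retry_with_backoff(lambda: client.fetch_orders())
```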
At one point running wireshark and reviewing network traces with developers was a full time job. Guess what percentage of time it was actually a network problem?
So I remember the times when programmers were power users and were poking fun at “lamers” and “lusers” who always went like “I clicked something and a message appeared, what do I do?”
And now people who call themselves programmers are like this themselves.
Just today I saw another “works on my machine” issue. The dev didn’t complain for 3 weeks that his latest code wasn’t deployed. QA found out about it today (on a Friday) and the dev has the day off. The issues were not hard to fix, but it’s not the DevOps job.
Especially when the dev wanted to migrate from Java 8 to Java 11 and didn’t even attempt to lookup our documentation on how to change JVM parameters.
My biggest bugbears for anyone in tech are an incurious nature, and an inability to search documentation/Google.
Your code threw an error? Did you read it? Did you search Google?
A monitored metric dramatically changed with the last deploy? Have you investigated why that might be?
I am more than happy to help troubleshoot a tricky problem if you tell me what you've already tried. If you truly don't know where to begin, I'm also happy to teach you. What I am not going to do is fix your problem for you, with you retaining nothing.
A lot of the animosity between teams come from the fact that IT departments are being pushed harder and harder.
Programmers have to push out an endless stream of features; DBAs have to deal with ever greater amounts of data; network people have to deal with an enormous amount of endpoints (and now the network extends itself beyond the firm, so security concerns have grown exponentially).
The real challenge is to make your IT departments realise that they are not each other's obstacles.
The problem is often political. I have been at many orgs where the management of the dev team over promises something to the executive, and then when ops finds out about it and realizes the project is going to be a giant dumpster fire whose failure they will likely be blamed for, it becomes really hard for people to foster a “we’re on the same team” mindset.
As with most organizational dysfunction, middle management fiefdoms are to blame.
It always helps when the executive can see through this bullshit and ask the right questions, but often by the time this happens millions of dollars have been wasted.
That's oddly familiar... I frequently get: The server is slow. Well, no, it's not really doing anything, but your application is responding remarkably slowly.
Or: Can I get a bigger server... Yes, but you have 32 cores and 256GB of RAM, and your application isn't that complex.
Having worked in Operations in some form or another for the past ~20 years this articulates so well the feelings I have been increasingly having over the past few of those 20 years.
Now I manage a small operations team and we experience pretty much all of the issues highlighted in the article.
There needs to be a rethink of how infrastructure, development and deployment are handled... maybe the solution is to slow things down and insert a little carefully thought-out bureaucracy between the layers (can't believe I'm advocating for more bureaucracy!)
If you've got 2 developers, they're both doing everything and on call 24/7 and all have read/write access to everything on demand.
If you've got 200 developers, you're going to start wanting a team of shift workers keeping an eye on the systems, and maybe you won't want every developer to have read/write access to production data.
If you've got 20,000 developers your working practices and infrastructure are almost completely cemented in place, and anyone who doesn't like them has already left because it's easier to change jobs than to get 20k people to change their behaviour.
Well, they have to. That doesn't mean they do all of it; they just skip the hard parts and focus on what's important in order to survive.
Large organisations usually have much bigger responsibilities and are held to higher standards, frequently audited and controlled to stay within rules and legal compliance.
100%. Over the last two years, a majority of my "sysadmin" work has been devoted to audit and compliance tasks. Mostly validating and working with auditors, but also making significant changes to work processes.
I mostly agree with the overall tone of the article but I do have to point something out:
> It is baffling on many levels to me. First, I am not an application developer and never have been. I enjoy writing code, mostly scripting in Python, as a way to reliably solve problems in my own field. I have very little context on what your application may even do, as I deal with many application demands every week. I'm not in your retros or part of your sprint planning. I likely don't even know what "working" means in the context of your app.
The point about not being in retros or part of sprint planning... I take up arms against that. I've worked for companies that have gone from waterfall to hybrid agile because we cannot get buy in from Ops to actually... you know... come to our retros, sprint planning and scrums.
Some things in this article are just pointing out the obvious... mediocre developers who push their problems and/or shortcomings onto other teams. Regarding that quote, however, the author needs to look in the mirror. Ops exists only because the products offered by the company need resources, and they have a responsibility to be business partners in that. If they aren't, the company needs to re-align some priorities, and it could start with Ops. Ops doesn't get a pass in an agile organization. The whole point of agile is to tear down those ivory towers. And if they were in those planning sessions, the developer might have already gone over the kind of destructive testing that would have emerged from that collaboration, and their DevOps relationship would be even richer.
Another root cause in our environment is alluded to in the article. With the rise of test frameworks, devs seem to test to prove the API is correct, not to find problems.
Another symptom of this is that when the QA/Staging function went away, load testing became perfunctory. Many of the performance problems we see should have been caught in QA. Devs are anxious to ship and get on to the next sprint, leaving app support and operations on the hook.
>Devs are anxious to ship and get on to the next sprint
I think it goes a step farther back to product. PMs and analysts put constant pressure on developer teams to complete work quickly and that time pressure shows up on the next guy's plate, etc
This isn't just specific to operations, I experience this amongst other developer teams as well.
I've had previous coworkers approach me about API "bugs" because they didn't bother to troubleshoot their app code and just immediately assumed it was a server-side issue.
Then I spend 10 minutes debugging the issue only to point them to the error in their own code. I don't know if it's laziness, an inability to troubleshoot, or both.
I've dealt with most of the issues in TFA and in the comments here. Without engaging too much in the technicalities, I would offer that most of these issues actually stem from leadership, or lack thereof, most often at the middle-management level, though sometimes middle-management issues are just covering up upper-management issues. Generally, I see these kinds of issues more often with non-technical management presiding over technical teams, because they have learned all the correct propitiations to upper management and all the buzzword bingo for their teams, but lack a real understanding and, more importantly, lack the ability to form a coherent and actionable vision to correct these kinds of issues. (A pet peeve of mine with middle management is when they push others out of the interview process, and suddenly you have hires that don't belong at all.)
As for me, I'm currently watching a good devops team go down the drain because of a bad manager, so I'm seriously considering trying to move to management so I can help my employer do better.
This has been my experience as well. I used to work between infra and development and I saw first hand a constant stream of clueless devs that don't read documentation starting shit with infra and networking because they don't know what's wrong and just assumed it was the monkey-brained infra people who had something screwed up. The infra people were equally disdainful of the "stupid devs." Honestly, no one worked together effectively but the devs would just pull in a new framework or language, roll it out because some blog posts said it was cool, and then gripe at infra about perceived problems.
Now I'm in a devops team (as a dev) and we spend a very large swath of our time troubleshooting infra issues. It's all AWS, and it's our problem now.
There is a reason SRE/DevOps Eng jobs are taking off in number and comp, and entities like GitHub are (slowly) figuring out how to automate dev work.
Running code at scale has turned into a very challenging comp sci problem, and clients are prioritizing uptime over code slickness.
The career support and innovation in that corner of the world (ops eng jobs) reflect it. It sort of gets after what software architects do, but the requirement to know that comes way earlier in the career for Ops. Ops Engs with cloud knowledge, Python, and IaC tend to go far.
There is this one perfect illustration that I ended up using as a troubleshooting interview question in multiple places (sorry, it should now be considered burned).
Devs build a reporting framework. They test it. All is well.
Devs deploy reporting framework to production. All reports suddenly time out.
Devs complain to the network team that the network is broken.
Network team spends weeks checking every single interface and link in all involved network paths (with multiple daily rounds of "we are not seeing any network issues, is the application still having problems?", so definitely not done in complete silence).
Network team eventually pulls out the packet sniffers. Network team then asks if they can see the relevant code snippet.
Root cause: in the test environment, the RTT between the app server and the (test) DB was on the order of 0.1 ms, but in production the RTT between the app server and the DB server was ~20 ms. And the code generating the report did NOT pipeline any of the data gathering, effectively making report generation take ~200 times longer.
Unfortunately, the network team was unable to deploy a higher light speed limit for that network data path and instead suggested that the application fetch multiple data items in parallel.
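To make the arithmetic concrete (numbers and helper names are illustrative, not from the actual incident): a report that issues its queries one at a time pays the RTT once per item, so the same code goes from sub-second in test to tens of seconds in production. A rough Python sketch of the before and after:

    from concurrent.futures import ThreadPoolExecutor

    RTT_TEST = 0.0001   # ~0.1 ms round trip in the test environment
    RTT_PROD = 0.020    # ~20 ms round trip in production
    N_ITEMS = 1000      # one query per data item in the report

    # Sequential fetching: every query waits out a full round trip.
    print("test:", N_ITEMS * RTT_TEST, "s")   # ~0.1 s, invisible in testing
    print("prod:", N_ITEMS * RTT_PROD, "s")   # ~20 s, past the report timeout

    # The suggested fix: fetch items in parallel (or batch them) so the total
    # cost approaches a handful of round trips instead of N of them.
    def fetch_item(item_id):
        ...  # hypothetical single-item query against the report DB

    def fetch_report(item_ids):
        with ThreadPoolExecutor(max_workers=50) as pool:
            return list(pool.map(fetch_item, item_ids))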
I cannot even count the number of times my Ops teams had to write wrappers and scripts to ameliorate issues suffered by apps rushed to production by shoddy dev teams. Glad someone has written about it. Anyone in the business knows this has been going on forever.
The problem is the Ops teams get ZERO credit for enabling the shoddy work done by devs, the devs meanwhile get patted on the back, and frankly continue to be romanticized.
"move fast and break shit (and let Ops fix it silently)"
When developers see ops as IT for them, it's because ops (and overall management) is doing a poor job laying out the actual responsibilities of each role in the org.
Years ago, everyone working in tech was an IT generalist. They did everything (DB design, systems, applications, algorithms, code, networking, etc.). Today, the field has matured and people are able to specialize.
Sometimes, when old IT generalists work with new IT specialists, these sort of misunderstandings occur.
Years before everyone was apparently a generalist, you would have a separate DBA, a separate architect, sometimes even separate teams for implementing algorithms, etc. The Mythical Man-Month has a very nice section on splitting up the work across the various teams, and that's a book written in the early 70s.
I think the actual boundary lies more in big vs small companies: small companies do not have the resources to hire specialists for every little subproblem, while big companies typically have enough employees that specialisation becomes a possibility.
I'm not sure. I take full-stack to mean the front-end, middleware and back-end of a webapp. I would still call it being a "generalist" when applying the concept to computer technology in general. For example, the CTO of an org should be a technology generalist, not a full-stack web dev.
Of course, this is just my personal opinion based on what I have experienced over the last 40 years.
What I am seeing is a need for more vertical integration. Teams need to be made to own the entire product stack. If you do this, they will be incentivized to make it simple and stable.
No one should ever get to play "not my job" while simultaneously throwing complexity grenades over to another team.
One thing that I realized as a Business Analyst and then Data Engineer is that communication is the key but pretty much no one can do it properly. I also realized that communication should be part of the job and leads and managers should hire people who can communicate things effectively.
If your QA team is a "thing" that gets features at the end of a sprint and churns out bugs or releases you're doing it wrong. They should be involved on a feature by feature basis working alongside the developer with QA time incorporated into every task. All unit/integration/system tests should be automated during the cycle so there is no "hand-off to QA". There should be less latency because you have a test expert speeding up implementing tests or being a force multiplier to developers by acting as an internal consultant who can advise on bits where needed.
QA as a discipline has evolved but from the sounds of it, it's not been widespread enough.
This is why teams should be cross-functional and product-organized, divided, if further is necessary, by product component not function, instead of function-organized.
Function-organized teams encourage knowledge silos and it's-some-other-team's-problem-ism.
That sounds great in theory, but what happens if your dev/ops ratio is something like 15/1? How do you put an ops person in every team? I think it's the right answer but it seems impossible to put in practice.
I think the article starts with complaints about incompetent devs, then heads toward complaining about automated QA, increased complexity and a faster pace.
In the end, overall, it sounds like an incompetent ops person, unable to keep up with the pace of this industry, complaining about incompetent devs and about an industry that doesn't wait for people to catch up.
These are all wrong.
Complexity comes with valid reasons. High demands thin out the pool of competent devs/ops. We, the lucky bunch who earn 2x-3x the salaries of other professions, should be the humblest bunch, especially if there is frustration within ourselves about this fast-growing reality.
To me, the line between dev and ops is where we put it. And I like to make it very explicit.
In shared-hosting times, Ops maintained the Puppet definitions, created new "deploy environments" (using Puppet), gave/revoked server access for employees, and monitored the servers. Devs maintained the source repos and deployed to the environments provided by Ops.
Now we live in virtual-machine times (Docker). Ops does the cloud infra (Terraform), monitors the services, and gives/revokes access to cloud services. Devs maintain the source repos and deploy to the cloud clusters provided by Ops.
I agree with the author's stresses. What they miss in their recollection of the halcyon days of large teams of experts is that it's extraordinarily expensive and (in many cases!) potentially wasteful of business resources to have experts on staff to maintain stable equipment.
Don't get me wrong, I appreciate smoothly operating systems and the people who make that happen. But there is always going to be a beancounter wondering whether the very expensive, trained engineer's work can be done faster/cheaper.
I disagree with pretty much everything in this article, apart from the fact that people often come up to me with a question of the “I tried nothing, and I’m all out of ideas!” kind.
The author seems to want it both ways. They want the devs to fix their own problems, but at the same time give them zero control of the stack (we have to provide them with guide rails to prevent them from hurting themselves indeed).
This is a great showcase of the silo mentality and split.
One should not build silos where experts sit.
One should participate in a team of many different experts.
If you still have to call "whatever department" to fix your slow SQL query, your disk space, your repo access, your production deployments, etc., then in most cases I feel you should get out and find a place where silos are not being exercised.
Is it? In highly regulated environments, silos are practically a requirement. Security access controls are intentionally put in place to limit access to systems and their respective pieces. If you need repo access, or access to the CI pipeline, or access to the database, you have to go to the appropriate channel.
The scale of security necessities in organizations runs from:
- no need for anything but minimal security, perhaps because the business is trying to surf the margin between their AWS bill and their Google ad revenue.
- there is only a need for security in the part of the business that deals with money
- some stuff that the users do, they would prefer to maintain integrity but they don't care a lot about confidentiality
- the users want reasonable confidentiality, too
- everything about the business is money or secrets
Where your business is on that scale determines how much you are regulated and how many internal gatekeepers are necessary.
Everyone needs to be a bit of everything to mitigate the cases where one team doesn't understand another team's domain: Applications begins blaming IT, or Operations admits to not understanding the applications they facilitate.
Haha, nice one. I love being a pure "cloud" engineer just because I can send an idiot dev to idiot Azure support and enjoy watching them spend months solving a trivial "I don't read docs" issue.
This entire article is written with such profound misunderstanding of DevOps - perhaps one induced by vendor marketing - that it's effectively meaningless.
Yes, developers should understand the operational environment a system runs in, and should be capable of advanced troubleshooting. But the rest of the post is simply tired screed about how "the old days" were better, despite the fact that they manifestly were not.
I laughed when he mentioned the Node developers. Come on guys, 90% of Node developers couldn't even code their way out of a paper bag. The quality of developers is shocking.
Because it treats Dev and Ops as having identical, equivalent skill sets. Attempting to make this true invariably leads to disaster. In over 30 years of professional software development, I've never seen the problems it purports to solve.
They are similar skill sets but the domains are different. I am not expected to understand every javascript framework that is thrown at developers and they are not expected to know the inner workings of our networking, kubernetes or service meshes. I will concede that every company wanting an Ops person to be a full fledged software developer is a little ridiculous.
I have been at this for 25+ years and I remember the days of silos. I remember developers passing off code to QA and it coming back days later with bugs. I also remember a lot of 'not my problem' coming back from developers. Sometimes it wasn't their problem; often it was.
Either way, the person with intimate knowledge of how the code works should be the first person that looks at the problem. In SaaS, that is production and should be the developers (within reason).
As Ops, my goal is to make that as easy as possible for developers. That means automating everything I can so that the right tools for deploying, monitoring and alerting are in place. It means I have to make spinning up and destroying infrastructure as quick and easy as possible so that we can meet your needs and also keep costs down.
I have also seen companies fail at becoming 'devops' in the most terrible way. They took developers and made them own everything from code to deployment to VMs. The developers had so many pieces to understand that the only guarantee was failure. That was a terrible startup to work at.
> They are similar skill sets but the domains are different.
Exactly. They're specializations, like heart surgeon vs. orthopedic surgeon.
> I have been at this for 25+ years and I remember the days of silos.
32 years for me. Silos evolved out of the wild west of the 90's and early 2000's. Which evolved out of the strict controls of early computers run by a cult of Operators where the devs couldn't even access the machine directly. It's a cycle, where management tries to remove people, only to have to put them back later. I've seen it over and over.
> I remember developers passing off code to QA and it coming back days later with bugs.
It is literally QA's job to find bugs that developers missed.
> In SaaS, that is production and should be the developers
SaaS or not doesn't have anything to do with it.
> As Ops, my goal is to make that as easy for possible for developers.
As a developer, my goal is to deliver high quality code that meets the requirements for performance, stability, monitoring, security, and functionality.
> They took developers and made them own everything from code to deployment to VMs. The developers had so many pieces to understand that the only guarantee was failure.
I've never seen DevOps done any other way. Hence my original comment.
On mobile this webpage has a thin hovering black bar at the top that fills to the right as you scroll further into the article. Very nice feature that I have not seen before.
We used to have this browser-supplied thing which would tell you how far you were in the document, what percentage of the document you were currently looking at, and afforded you the ability to quickly change your position.
You and your dinosaur technology, next thing you will want clickable interactive text to be in a different font or colour to differentiate from regular text.
I have done both application development and ops. In my current job, I am doing both.
There is a big difference between the mindset of what makes a good application developer and what makes a good ops person.
Application developers, by and large, have a sort of "sandbox" within which features are developed. This sandbox results from working with abstractions, each with some kind of guarantee. For example, most application developers assume that what you write into memory will be what you get out. That is, hardware is abstracted. The idea that the memory chips themselves can have defects or can sometimes fail, even if one gets error-correcting memory chips, is a violation of guarantees. Memory that just works is taken for granted. Another example is assuming that the system clock is monotonic.
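As a small illustration of that last point (a sketch, not anyone's actual code): measuring elapsed time with the wall clock works fine until NTP steps the clock, which is exactly the kind of guarantee violation ops people see and app developers rarely plan for:

    import time

    def do_work():
        pass  # stand-in for whatever is being timed

    # Fragile: time.time() is the wall clock; it can jump backwards (NTP step,
    # manual adjustment, VM migration) and yield a negative or nonsense duration.
    start = time.time()
    do_work()
    wall_elapsed = time.time() - start

    # Safer: time.monotonic() never goes backwards, so it's the right clock for
    # durations, timeouts and retry backoff.
    start = time.monotonic()
    do_work()
    mono_elapsed = time.monotonic() - start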
This extends to things like networks, storage, operating characteristics, and so forth. Very few application developers get into that nitty gritty, let alone all the plumbing and interactions among different systems.
I've seen application developers get incredibly frustrated and angry when those underlying guarantees are violated in some ways. I've been like that when I put on my application developer "hat". The main reason is that the developer is holding as much of the state and logic as they can, and they do this by excluding things through abstractions. They want their tooling and platform to just work so they can focus on writing good software.
The thing is that, for a good ops person, all that nitty gritty and plumbing is the focus of the job. It takes a very different mindset to troubleshoot: you start looking at those "guarantees" and find out what they are actually doing.
I once interviewed at a place which had an amazing way of figuring out whether someone has the mindset and tenacity to be a good ops person. It was not writing algorithms on a whiteboard. It was a deceptively simple task: install a piece of software. And even though there is documentation for installing that software, there is no documentation covering every single environment, every requirement, and its interaction with other systems. Add a time crunch and the scrutiny of an observer, and that simulates a pretty typical day rather well. You have to have enough emotional intelligence to keep working through it until it works. Documentation is always sparse and can't be guaranteed to be correct. Runbook? Good idea, but there is no way even a meticulously crafted runbook for one component is going to describe how systems interact with each other. Someone, somewhere has to figure that out. (Well, they don't have to. We can just let the system fail.)
And sometimes, as an ops person, you have to open that "black box" and read code. Just like sometimes, an application developer needs to pop open the abstraction layer and pull out netcat or sysdig.
In the end, I'm not lamenting that DevOps blurs who owns what. Maybe this is because I've mostly worked on small, early-stage startup teams. Complexity has to live somewhere. I like working on the teams where people talk to each other to figure things out.