If your on-prem team can't spin up a VM same day, then firing them is probably higher ROI than "going to cloud". Further, a lot of the shops "going to cloud" because their infra team is slow then turn around and hide the cloud behind that same infra team.

A prior shop of mine (200+ devs) went from automated on-prem VM builds landing within hours of raising a ticket, to cloud, where there was a Slack channel to nag and beg for an EC2 instance, which could take anywhere from a day to a week. This was not a temporary state of affairs either; it was allowed to run like that for 2+ years.

Oh and, worth mentioning, the CTO there LOVED him some Gartner.




Despite years of friendly-sounding devops philosophy, there are times when devs and ops are fundamentally going to be in conflict. It's sort of a proxy war between devs, who understandably dislike red tape, and management, who loves it, with devops caught in the middle and on the hook for both rapid delivery of infrastructure and some semblance of governance.

An org with actual governance in place really can't deliver infra rapidly, regardless of whether the underlying stuff is cloud or on-prem, because whatever form governance takes in practice, it tends to be distributed, i.e. everyone wants to be consulted on everything, but they also want their own responsibility/accountability to be diluted. Bureaucracy 101.

Devs only see ops taking too long to deliver, but ops is generally frozen waiting on infosec, management approving new costs, data stewards approving new copies of data across environments, architects who haven't yet considered/approved whatever outlandish new toys the junior devs have requested, etc., etc.

Depends on exactly what you're building, but with a competent ops team, cloud vs on-prem shouldn't change that much. Setting aside the org-level externalities mentioned above, developer preference for stuff like certain AWS APIs or complex services is the next major issue for declouding. From the ops perspective, cloud vs on-prem is largely gonna be the same toolkit anyway (Helm, Terraform, Ansible, whatever).


Whilst often true in practice, this doesn't have to be true.

The reality is, a lot of these orgs have likely already discovered devops, pipelines, deployment strategies, observability, and compliance as code.

There's basically little in compliance that can't be automated with patterns and platforms, but in most of these organizations a delivery team's interface with the org is their non-technical delivery manager, who folds like a beach chair when they're told no by the random infosec bod who's afraid of automation.

I've cracked this nut a few times though. It requires you to be stubborn, talk back, and have the gravitas and understanding to be taken seriously. I.e. yelling "that's dumb" doesn't work, but asking them for a list of what they'd check, then presenting an automated solution to their group, where they can't just yell no, might.
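
To give a flavour of what "presenting an automated solution" can look like: here's a minimal sketch of one hypothetical checklist item (required tags on EC2 instances) turned into a script. boto3 and configured credentials are assumed, and the tag names are made up, but this is the shape of the conversation.

    # Sketch of one infosec checklist item automated: "every instance
    # must carry an owner and cost-center tag". Tag names are invented
    # for illustration; assumes boto3 credentials are already configured.
    import boto3

    REQUIRED_TAGS = {"owner", "cost-center"}

    def untagged_instances():
        ec2 = boto3.client("ec2")
        findings = []
        for page in ec2.get_paginator("describe_instances").paginate():
            for reservation in page["Reservations"]:
                for inst in reservation["Instances"]:
                    tags = {t["Key"].lower() for t in inst.get("Tags", [])}
                    missing = REQUIRED_TAGS - tags
                    if missing:
                        findings.append((inst["InstanceId"], sorted(missing)))
        return findings

    if __name__ == "__main__":
        for instance_id, missing in untagged_instances():
            print(f"{instance_id} is missing tags: {', '.join(missing)}")

Run something like that on a schedule in a pipeline and "no" turns into a report people can actually argue about.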


Yes, of course management is often the problem.

I think it helps when people actually take a step back and understand where the money that pays their salary comes from. Oftentimes people are so ensconced in their tech bureaucracy that they think they are the tail that wags the dog. Sometimes the people who are the most hops away from the money are the least aware of this dynamic. Bureaucracies create an internal logic of their own.

If I am writing some internal software for a firm that makes money selling widgets, and I decide that what we really need is a 3-year rewrite of my app for reasons, I am probably not helping in the sale or the production of widgets. If another team is provisioning hardware for me to write the software on, and it now takes 2 weeks to provision virtual hardware that could take seconds, then they are also not helping in the sale or the production of widgets.

These are the kinds of orgs that someone may one day walk into, blast 30% of the staff, and find no impact on widget production and an obvious 30% savings on widget costs...


> If another team is provisioning hardware for me to write the software on, and it now takes 2 weeks to provision virtual hardware that could take seconds, then they are also not helping in the sale or the production of widgets.

Well, in this example, the ops team slowing down pointless dev work, by not quickly delivering the platform that work is going to happen on, is effectively engaged in cost savings for the org. The org is not paying for the platform, which helps them because the project might be canceled anyway, plus the slow movement of the org may give them time to organize and declare their real priorities. Also, due to the slowdown, the dev and ops teams are potentially more available to fix bugs or whatnot in actual widget production. It's easy to think that "big ships take a while to turn" is some kind of major bug, or at least an inefficiency, but there are also reasons orgs evolve in that direction and times when it's adaptive.

> Oftentimes people are so ensconced in their tech bureaucracy that they think they are the tail that wags the dog.

Part of my point is that, in general, departments develop internal momentum and resist all interface/integration with other departments until or unless that situation is forced. Structurally, at a lot of orgs of a certain size, that integration point is the ops/devops/cloud/platform team (whatever you call it). Most people probably can't imagine being held responsible for lateness on work that they are also powerless to approve, but for these kinds of teams the situation is almost routine. In that sense, simply because they are an integration point, it's almost their job to absorb blame for/from all other departments. If you're lucky, management that has a clue can see this happening, introduce better processes, and clarify responsibilities.

Summarizing all that complexity and trying to reduce it to some specific technical decision like cloud vs on-prem is usually missing the point. Slow infra delivery could be technical incompetence or technology choices, but in my experience it's much more likely a problem with governance and general org maturity, so the right fix needs to come from leadership with some strong, stable vision of how interdepartmental cooperation and collaboration is supposed to happen.


I've never seen an IT team that couldn't spin up a VM in minutes. I have seen a bunch of teams that weren't allowed to because of ludicrous "change control" practices. Fire the managers that create this state of affairs, not the devops folks, regardless of whether you "go cloud" or not.


I've met multiple customers where the time to get a VM was measured in weeks to months. (To be fair, I'm at a vendor that proposed IaC tooling and general workflows and practices to move away from old-school ClickOps ticket-based provisioning, so of course we'd get those types of orgs.)

And more often than not, it had nothing to do with managers, but with individual contributors resisting change because they were set in their ways and were potentially afraid for their jobs. The same applies to firewall changes, btw.
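
To make the ClickOps-vs-IaC gap concrete, here's a generic sketch of the kind of self-service provisioning those workflows aim for. This isn't any particular product, and the AMI ID and instance type are placeholders (boto3 assumed).

    # A throwaway "give me a VM" function of the kind that replaces the
    # ticket queue. In practice it would sit behind CI and policy checks;
    # the AMI ID and instance type here are placeholders.
    import boto3

    def provision_vm(name, ami_id, instance_type="t3.micro"):
        ec2 = boto3.resource("ec2")
        instances = ec2.create_instances(
            ImageId=ami_id,
            InstanceType=instance_type,
            MinCount=1,
            MaxCount=1,
            TagSpecifications=[{
                "ResourceType": "instance",
                "Tags": [{"Key": "Name", "Value": name}],
            }],
        )
        return instances[0].id

    # e.g. provision_vm("dev-sandbox", "ami-0123456789abcdef0")

The hard part is rarely this code; it's getting the people who own the ticket queue to let something like it run.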


I think a lot of the HN crowd hangs out at FAANG/FAANG-adjacent or at least young/lean shops, and has no idea how insane it is out there.

I was at a shop that provisions AWS resources via written email requests and ClickOps, treated much like datacenter procurement. Teams don't have access to the AWS console and can't spin resources up or down, stop them, delete them, etc.

A year later I found out that all the stuff they provisioned wasn't set up as reserved instances. We weren't even asked. So we paid on-demand hourly rates for stuff running 24/7/365.

This was apparently the norm in the org: you have to know reserved instances exist and ask for them, and you may eventually be granted the discount later. I only realized what they had done when they quoted me rates and I cross-checked them against ec2instances.info. I can guarantee you less than 20% of my org (it's not a tech shop) is aware this difference exists, let alone that ec2instances.info exists as a cross-reference.

No big deal, just paying 2x for no reason on already overpriced resources!
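
For a rough sense of the scale involved, here's a back-of-the-envelope with placeholder rates (the real per-instance numbers are on ec2instances.info):

    # Back-of-the-envelope: on-demand vs reserved for a box that never
    # turns off. Both rates are assumed for illustration only.
    HOURS_PER_YEAR = 24 * 365

    on_demand_rate = 0.20   # $/hr, assumed
    reserved_rate = 0.10    # $/hr effective, assumed ~50% discount

    print(f"on-demand, 24/7/365: ${on_demand_rate * HOURS_PER_YEAR:,.0f}/yr")
    print(f"reserved,  24/7/365: ${reserved_rate * HOURS_PER_YEAR:,.0f}/yr")

Multiply that gap across a fleet that never turns off and the discount you have to know to ask for stops being a rounding error.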


I went from that type of world (cell carrier) to a FAANG-type company and it was shocking. The baseline trust that engineers were given by default was refreshing and actually a bit scary.

I’m not sure my former coworkers would have done well in an environment with so few constraints. Many of them had grown accustomed to (and been rewarded and praised for) only taking actions that would survive bureaucratic process, or fly underneath it.


The problem is that the strong players are less likely to stick around, so you often do end up with folks who can't do the work in minutes. Though, to be fair, the work is usually slightly more than clicking the "give me a VM" button.


Teams are what they DO, not what they CAN DO.


Ok, but I’m not sure what that has to do with what I posted.


> If your on-prem team can't spin up a VM same day, then firing them is probably higher ROI than "going to cloud".

I haven't seen this come down to one set of incompetents since the turn of the century. What I have seen is it being caused by politics, change-management politics, and shortsighted budgetary practices (better to spend thousands of dollars per day on developers going idle or building bizarre things than to spend tens on infrastructure!).

In such cases, the only time firing someone would help would be if they were the C-level people who created and sustained that inefficient system.


They probably should be fired, but it's actually complicated, because these orgs tend to be staffed with departments that believe this is the way things should be done. Best case, the replacement needs to compromise with them; worst case, they are like-minded and you just get more of the same.



