

IT Operations has a Cultural Problem - gabrtv
http://blog.opdemand.com/

======
gergles
Let's see. Linkbait title, latest buzzword, consistent use of "bureaucracy" (I
do not think that word means what you think it means - you cannot wave a magic
DevOps wand and make bureaucracy go away) and insinuations that operations
departments are "outdated"... Yeah, seems like a worthwhile article with a
good point.

I sure want Joe Random Engineer committing code that goes live on our real,
grown-up site, where we make real money, and a failure leads to us losing real
money. Ops departments exist so that that can't happen. If that means you have
to wait a day before testing your latest code in production, I don't see this
as a bad thing.

The "cultural problem" is in people who think that operations departments
don't need to exist because "like, how hard is it to run servers? We'll just
put it 'on the cloud' and magically all of our security, reliability, and
availability problems will be solved"

The biggest piece of nonsense is this concept of a "private cloud". What the
fuck is a private cloud? Oh, you mean a remote datacenter, like we've had
since the 70s. OK.

------
gabrtv
Jamming ephemeral cloud infrastructure into an ITIL-style bureaucracy is like
jamming a square peg into a round hole. You can push as hard as you want -- it
ain't gonna fit.

------
jvehent
looks like somebody didn't like it when boss said "no, you can't put the new
accounting system on heroku"

------
ddw
A few of the responses here are snarky and understandably so considering the
author's reasons for this post, yet the problem remains. How does cloud
computing fall within the traditional IT ops model? Anecdotally, I've worked
for a large city, and they're still a little hesitant about cloud computing
because they see it as a threat to their employees. I'm not sure how it'll
shake out but they'll move towards cloud computing eventually and developers
instead of operations could/should manage it.

~~~
gabrtv
I would argue that operations engineers need to become more like
developers, not that developers should be operating critical systems.

The cultural problem can also be framed as a transition from a server-centric
operations model to an application-centric one -- something James Urquhart
wrote a great post about for GigaOM: [http://gigaom.com/cloud/what-cloud-
boils-down-to-for-the-ent...](http://gigaom.com/cloud/what-cloud-boils-down-
to-for-the-enterprise-2/)

------
ogghead
"agile management" is pretty much always going to mean "less management," so
it's understandable that management is kind of schizophrenic about the DevOps
movement

~~~
Schmidt
It's not about less management; it's about trusting your employees and their
judgement, accepting that failures happen, and learning from the mistakes.

------
KevinEldon
In my experience this is completely true of large organizations: "Most
operations departments are inflexible and inefficient because they rely on
specialized engineers glued together with manual processes and a large IT
bureaucracy – all fundamentally at odds with the fast-moving, application-
centric world of cloud computing."

This is a cycle. If the management of the Operations organization is measured
on reducing downtime, they control what they can: Release & Change
Management. This kills frequent small releases, so development teams have to
build big releases. If management in development organizations is measured
mostly on delivering on schedule, they cut scope. You end up w/ development
organizations delivering the minimum to ensure they meet the project's mostly-
artificial timelines for huge releases. Suggesting small frequent releases
sounds good to development (assuming they can reduce the operational paperwork
associated w/ releasing), but it jeopardizes Operations' control of stability,
so Operations resists it. Suggesting that more gets delivered in each huge
release jeopardizes Development's ability to meet project deadlines, because
there is so much unknown and the commitment is expected up front, a quarter or
more (I've seen 18 months) in advance.

There are reasons for all of this; it's not bad people, just a consequence of
large organizations. Reducing downtime reduces costs because you can cut
support staff. Delivering on time increases productivity because code that
isn't being used is useless code.

------
drivingmenuts
If my local server providing a vital service goes down, I catch hell. If my
cloud server providing a vital service goes down, I catch hell and can't do
anything about it except bitch at customer service who has their own set of
priorities and a TOS protecting them from any meaningful action on my part.

So, what's the right option there?

~~~
gabrtv
Clearly the right option depends on the specifics of the service and the team
managing it. However, unless you have a spare server sitting around, you're in
the same boat either way, right?

Outages at serious cloud providers like AWS are usually restricted to single
availability zones, though there have been a few high-profile exceptions where
entire regions were affected. In general though, with AWS you can redeploy
your server rapidly if you keep your infrastructure blueprints as code and
your data backed up to EBS snapshots or S3.
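The "blueprints as code" idea can be sketched roughly like this: the server spec is plain data checked into version control, so after an outage a replacement launch spec can be rebuilt from the latest EBS snapshot in one step. This is only an illustrative sketch, not OpDemand's actual tooling; all identifiers are placeholders, and the final hand-off to the EC2 API (e.g. via boto) is shown as a comment.

```python
# Hypothetical "infrastructure blueprint as code": a plain dict that
# lives in version control alongside the application. Every identifier
# below is a placeholder, not a real AWS resource.
BLUEPRINT = {
    "image_id": "ami-00000000",       # placeholder AMI for the app server
    "instance_type": "m1.large",
    "security_groups": ["web"],
}

def restore_spec(blueprint, snapshot_id, device="/dev/sdf"):
    """Build a launch spec that attaches a data volume restored from
    the lost server's most recent EBS snapshot."""
    spec = dict(blueprint)  # don't mutate the checked-in blueprint
    spec["block_device_map"] = {device: {"snapshot_id": snapshot_id}}
    return spec

# The actual redeploy would hand this spec to the cloud API, e.g.:
#   conn.run_instances(**restore_spec(BLUEPRINT, "snap-00000000"))
spec = restore_spec(BLUEPRINT, "snap-00000000")
print(spec["block_device_map"]["/dev/sdf"]["snapshot_id"])
```

Because the spec is data rather than a hand-configured box, rebuilding after a zone outage is a single scripted step instead of a manual runbook.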

Just this week I had a high-traffic WordPress blog shit the bed on EC2/RDS.
Using the tools we built at OpDemand, I was able to clone the platform and get
it back up and running in < 60 minutes without any HA. I think < 60 min
recovery time is probably a stretch for most on-premises environments.

------
zenpocalypse
troll much?

