There aren't enough humans for cloud-native infra (siliconangle.com)
29 points by lxm on Dec 10, 2019 | 28 comments


It’s a beautiful cringe-worthy big enterprise marketing piece preying on the technical ineptitude of certain companies’ leadership.

1. Point out how the world is changing so fast now, and that computers are super extra hard (databases are 500 components now, that sounds pretty hard)

2. Mention how you can’t just get competent employees and trust them to build anything!

3. Don’t worry, big consulting/solutions company (Cisco, IBM, Oracle, etc) has a solution to your deepest fears! Just contract out inferior products and consulting services at 10x the price of doing it yourself.


A shining example of how far off the mark Cisco is.

Pretty sure this is just a marketing piece, but regardless, the industry needs to stop thinking of software development and operations as inherently separate. Developers should operate the software they write.


That's fine for the software they wrote. But there's a LOT we sysads do outside of that area.

Have you seen a webapp go off the rails? There are millions of weird failure conditions. Can they learn what to monitor, and at what frequency?

Security... 'Principle of least privilege' is at odds with "make it work". I've seen devs assume the database server has a commodity connection to the internet. They never asked whether it did or didn't. And I've fought with them on the dev AD stack in an attempt to keep the GPOs in line with production. But it's just 'too hard'.

And I've seen devs go 'just spin up some more machines'. But unlike yesteryear, when we provisioned hardware at a known cost for X years, aws/azure/gce "allows" devs a direct, unlimited budget tied to the company. We sysads learned responsibility, how to budget, and how to requisition. Someone pressing a button on a dashboard, or writing a script, hasn't. You'd be surprised just how angsty the C levels are about cloud computing and developers with near-free rein.


> You'd be surprised just how angsty the C levels are about cloud computing and developers with near-free rein.

The flip side of this is the C levels who have absolutely no experience monitoring cloud spend but have heard the marketing pitches or smelled the "best practices" in the air and are sure that moving to cloud services needs to happen yesterday whether or not demand patterns justify it.


> Can they learn what to monitor, at what frequency?

Yes. That is part of the design of any system; observability is on everyone's mind when they are writing software. Software that fails mysteriously means lost weekends and less time for drinking beer. Software that tells you exactly what it's doing means that when something goes wrong, you fix it, and it never happens again. Or it's so transparent you figure out the bug before you even check it in.
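To make that concrete, here's a minimal sketch (Python; the "checkout" service and fields like order_id are invented for illustration, not from any real system) of software that tells you exactly what it's doing: one structured event per interesting step, so a failure arrives with its own context instead of a mystery.

    import json, logging, time

    logging.basicConfig(level=logging.INFO, format="%(message)s")
    log = logging.getLogger("checkout")

    def event(name, **fields):
        # one machine-parseable line per interesting step
        log.info(json.dumps({"event": name, "ts": time.time(), **fields}))

    def charge(order_id, amount_cents):
        event("charge.start", order_id=order_id, amount_cents=amount_cents)
        try:
            # ... call the payment provider here ...
            event("charge.ok", order_id=order_id)
        except Exception as exc:
            # the failure carries enough context to debug without guessing
            event("charge.fail", order_id=order_id, error=repr(exc))
            raise

    charge("ord-123", 1999)

Count the "charge.fail" events and you already know what to monitor and at what frequency, without a sysadmin reverse-engineering it later.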

> Security... 'Principle of least privilege' is at odds with "make it work".

There is certainly value in having strong security principles, both in terms of policies and infrastructure. Infrastructure is key here; an ACL-checking proxy in front of all services, mTLS out of the box, the ability to write, deploy, and manage small services that are easy to audit... all help make security the rule instead of the exception. This is what all those "service meshes" and "policy frameworks" and "orchestration frameworks" aim to provide.
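As a toy illustration of "ACL checks in the infrastructure instead of in every app" (this is not any real mesh's API; the service names and the identity header are made up, and in practice caller identity would come from mTLS termination, not a header):

    from http.server import BaseHTTPRequestHandler, HTTPServer

    # caller identity -> path prefixes it may reach
    POLICY = {
        "billing-service": ("/invoices", "/charges"),
        "frontend": ("/invoices",),
    }

    class AclProxy(BaseHTTPRequestHandler):
        def do_GET(self):
            caller = self.headers.get("X-Caller-Identity", "")
            allowed = POLICY.get(caller, ())
            if not self.path.startswith(allowed):
                self.send_response(403)
                self.end_headers()
                return
            # a real proxy would forward to the upstream service here
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"forwarded upstream\n")

    if __name__ == "__main__":
        HTTPServer(("127.0.0.1", 8080), AclProxy).serve_forever()

Deny by default: an unknown caller gets a 403 before the application ever sees the request, which is what makes security the rule rather than something each team remembers to bolt on.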

> I've seen devs assume the database server has commodity connection to the internet. They never asked if it did or didn't.

That's why the dev team should be operating that service.

> And I've seen devs go 'just spin up some more machines'. [...] You'd be surprised just how angsty the C levels are with this cloud computing and developers with near-free reign.

In my experience this has always been a utilization problem. One shared cluster for everyone is too complicated / expensive / dangerous / conflicts with empire-building, so every team has a bunch of t3.2xlarge instances that average 0% CPU load just because. Meanwhile, companies that care about their computers have seemingly-complicated orchestration frameworks so that they can extract value from every CPU core.
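A back-of-the-envelope sketch of why that matters (numbers invented; real schedulers like Kubernetes do this far more elaborately): first-fit packing of per-service CPU requests onto shared nodes.

    def first_fit(requests_millicores, node_capacity=4000):
        nodes = []  # remaining millicores on each node
        for req in sorted(requests_millicores, reverse=True):
            for i, free in enumerate(nodes):
                if req <= free:
                    nodes[i] -= req
                    break
            else:
                nodes.append(node_capacity - req)
        return len(nodes)

    # 20 small services asking for ~300m each pack onto 2 shared 4-core
    # nodes, instead of 20 mostly-idle dedicated instances.
    print(first_fit([300] * 20))  # -> 2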

The good news is, a lot of what used to be manual (wrangling machines, consolidating resources, mechanical aspects of security) has some serious open-source investment. People don't need to schedule programs onto physical computers anymore. People don't need to write authentication and mTLS code into their application anymore. You don't need to fill out a form and create a purchase order to get a TLS certificate anymore. Times are changing.

Your experience seems to have been shaped by supporting bad developers. We aren't all like that, though, and you'll find that there is less and less room for the incompetent these days. It's true of any field; someone builds you a house and the floor isn't level. "Don't build houses anymore," you say! No! Just don't hire the guy that did a crap job. Everything gets a lot easier when you set out to do a good job.


Sounds like a recipe for disaster.

I’m a good programmer in my domain. I have enough ego to believe that I could probably figure out how to install things and get them working... for a while.

But I know that in the long run I would fumble it so badly. I don’t follow security bulletins, I don’t know the first thing about how to set up database backups, I don’t know how to configure a web server, and so on...


I don't think "developers should operate the software they write" means "developers should be sysadmins", more that developers need to be tightly coupled to the actual uses of the software they're developing. You can't write good software in a vacuum, no matter how carefully you analyze requirements and specify functionality. You just can't. You need to be close to the users. Even if you can't eat the dog food yourself, you need to watch the users eating it.


Exactly. The org (and all constituent team members) needs to think about problems end-to-end.

When you notice a problem, set up monitoring to collect metrics about it and alert on it before it becomes downtime. Have ops write down their experience with that problem in a playbook, then have dev figure out which parts of the playbook can be automated, or whether the system design can be iterated to remove that class of problem altogether.
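For example (a sketch only, with an invented threshold and print() standing in for a real pager): one playbook entry, "app host disk filling up", turned into a check that fires before it becomes an outage.

    import shutil

    DISK_ALERT_THRESHOLD = 0.85  # page well before 100%

    def check_disk(alert, path="/"):
        usage = shutil.disk_usage(path)
        frac = usage.used / usage.total
        if frac >= DISK_ALERT_THRESHOLD:
            alert(f"disk at {frac:.0%} on {path}; playbook: rotate logs, prune old builds")

    if __name__ == "__main__":
        check_disk(alert=print)  # wire this to the real alerting pipeline

Once the check exists, the next iteration is removing the problem class entirely (log rotation, retention policies) so the page never fires.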

If you're a dev and you don't listen to ops, you're just as doomed as an ops guy who doesn't read the dev's manuals.


I think you should be able to set up database backups (it can affect your overall architecture, so someone in development ought to know). I think you should be able to configure a web server for the same reason.

Security dependencies/updates? Those are largely transparent: a matter of noticing and updating. Operational changes, like updating a systemd config or something, affect your platform but not your application.
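Setting up a backup really is within a developer's reach. A minimal sketch (Postgres assumed; host, database name, and directory are placeholders) that a cron job or systemd timer could run nightly:

    import datetime, gzip, pathlib, subprocess

    BACKUP_DIR = pathlib.Path("/var/backups/appdb")

    def nightly_dump():
        BACKUP_DIR.mkdir(parents=True, exist_ok=True)
        stamp = datetime.date.today().isoformat()
        out = BACKUP_DIR / f"appdb-{stamp}.sql.gz"
        dump = subprocess.run(
            ["pg_dump", "--host", "db.internal", "appdb"],
            stdout=subprocess.PIPE, check=True,  # fail loudly, keep yesterday's copy
        )
        with gzip.open(out, "wb") as fh:
            fh.write(dump.stdout)

    if __name__ == "__main__":
        nightly_dump()

The unglamorous part, and where ops experience shows, is actually testing the restore; no script excuses you from that.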


Who knew it was all so simple? Everyone should just be a jack of all trades and master of none!

Underestimating the complexity, variety, and difficulty of all the tasks of delivering professional-grade software is the biggest challenge in the industry.

It's like telling a soccer player to be his own cook, trainer, and psychologist. Specialization means letting the law of comparative advantage do its thing.

It has less to do with "what I'm able to do" and more to do with being optimally efficient.


I didn't say it was simple. I didn't say you needed to abandon mastery of a field of your choice.

It is hard. But a soccer player should know how to make pasta, cook meat so that he doesn't get sick, even if he isn't a gourmet chef. He should be able to do physical activity without harming himself, etc.

We should all be generalists to some degree. Where on that spectrum of specialization we should be satisfied is a valid debate.

I also don't disagree that there are costs/inefficiencies generated by being more generalists vs more specialized. But it also depends on the size of the organization you can rely on. If I work at a startup, I may need to be more of a generalist. If I work at FAANG, I can specialize because I can depend more on the specializations of others.


How did you get good?


> Developers should operate the software they write.

This was how things used to be in most places in the '80s and '90s, and it was a disaster. Development and Production Support have conflicting goals: developers want to fix the bug or add the enhancement and release; production support wants to keep things going with no downtime. Production releases and gatekeepers became the solution; mixing them up again is bad.

Maybe if there were an infinite pool of developers skilled in all the applications being run and all the environments they run in, this would work. But even then, developers aren't good at keeping things the same; it's not in their nature, imho.

Edit: Not only that - the temptation to just pop into production and fix the latest bug is too high. It's only a little fix - what could go wrong :-) If the developers have the keys to production, this happens. Gatekeepers are good and make you test your code before it's ever released.


There are these little things called containerisation, immutable infrastructure, and infrastructure as code that have happened since then.

This stuff is table stakes for modern applications.

Who still manually patches applications in production? Who deploys artifacts that didn’t come from a build/test pipeline? Who still SSH’s into a server and hacks a config file instead of applying the configuration change with a traceable commit / PR process?

Sysadmins who want to return to the glory days of the '90s, regaling us with tales about how everything we're doing has been done before, missing what's different about it this time, and asking why we won't just build .deb packages and hand them over to run on a big bare-metal server they will administer for us - that's who.

I know, because for the first 5 years of my career I was doing just that, working with crusty BOFH greybeards, before I went over to the dev side.

Ok, boomer!


I think you've made my point. On top of looking after code and producing a solution that works and meets the business case, developers have to look after all these things as well. Seems like a lot of work that is best left to admin people.


Sorry but that is BS. Devs and Ops should work together to achieve a common goal, not work in separate silos. That's the heart of DevOps, not devs doing ops.


Cisco should really focus on providing an actual solution to the market.


Disclosure: I develop automation for a company that provides managed cloud services for huge companies.

Am I reading this right? Are they really saying that the main issue is that hiring DevOps/SRE engineers at scale is hard - but that hiring and keeping an additional team of AI engineers who understand cloud infra is somehow more achievable for the average company?

I just don't see that playing out well for places where cloud and/or AI technical expertise is not a core competency. There are managed cloud providers who specialize in these areas. If your problem is literally that you can't scale DevOps, it's at least worth pricing out those services before committing to hiring an even more specialized role for infra purposes.


I think it's more hinting at a product suite, or other companies that layer on top; not that you need to build AI solutions yourself.

That being said, I think the major issue isn't devops hiring. It's having poor SRE practices, no breathing room for engineers, and scaling for the sake of it, most of the time.


> “You can’t just have a database admin,” Pandey explained. “A database is now 500 components. So you need your [site reliability engineer] organizations and your DevOps organizations to be aligned to that.”

LOL, what? Maybe for the cloud provider... but what’s presented to the end-customer is most certainly not 500 components.


I mean, they claim you can't just have a database admin. Let's be clear, you can just have a database admin. (For nearly all values of "you")


Yes, and even two or three, not to put all your eggs in the same basket...


One just has to ask oneself how much of this complexity is actually needed. Reminder: https://nickcraver.com/blog/2016/02/17/stack-overflow-the-ar...


If I ever need an ego check, I go look at the number of requests per day Wikipedia gets. Our traffic is sad by comparison.


How can a database become 500 components? What are they doing???


Probably referring to large scale deployments of Spark clusters where the number of servers you're running for the cluster is in the hundreds or thousands.


I still count 3 to 5 components.


Eh, I can't get to 500, but I think it's more like "the database" has been decentralized, so now you have:

* a dedicated key vault instead of installing certs directly on the DB server
* separate IAM / SSO
* separate monitoring
* separate logging and audit
* separate vnet, subnet tooling
* separate orchestration
* separate object store / data lake
* separate queuing
* separate replication / backup
* separate tools for streaming data, batch processing, job running

The only real additional vectors for a "Modern DBA" are CAP theorem, elastic scale, cloud security, multi-cloud, and the sheer bewildering variety of platforms you may be required to support.



