The cloud skills shortage and the unemployed army of the certified (itnext.io)
195 points by rajeshmr 14 days ago | 163 comments

I did some freelance cloud work recently and was shocked at what I saw out there. From small one-person operations to startups to large multinational Fortune 500 corporations, I saw the same pattern repeated over and over: people with no experience building out infrastructure using the cloud to spin it up, and making really bad and dangerous mistakes.

Amazon, Google, Microsoft and others make it simple to get going but the devil is in the details when it comes to building it right, safe and ready for production. Just a few examples:

• Putting RDS SQL Servers on a public IP with no protection

• No templating of servers so if it disappears you have no record of how to get a new one running again

• Servers with SSH password authentication turned on with passwords of “Password”, because SSH key auth was “too hard for our devs”

• No backups because “it’s in the cloud, isn’t the cloud backing it up for me, like iCloud?”
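Several of these mistakes are cheap to catch programmatically. Here is a minimal sketch of an audit for the first one, written against DescribeDBInstances-shaped records; in a real audit you would feed it the output of boto3's rds.describe_db_instances(), but the check itself is just a filter.

```python
# Flag RDS instances reachable from the internet. The sample records
# below are illustrative stand-ins for what DescribeDBInstances returns.

def publicly_exposed(db_instances):
    """Return identifiers of DB instances marked publicly accessible."""
    return [
        db["DBInstanceIdentifier"]
        for db in db_instances
        if db.get("PubliclyAccessible", False)
    ]

sample = [
    {"DBInstanceIdentifier": "billing-db", "PubliclyAccessible": True},
    {"DBInstanceIdentifier": "internal-db", "PubliclyAccessible": False},
]
print(publicly_exposed(sample))  # ['billing-db']
```

Running a handful of checks like this on a schedule catches the worst of the list above before an attacker does.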

I have an AWS Solutions Architect Professional certification, and I agree you could get a cert and still not be qualified to run or design much on AWS, but it does put you ahead of much of what I’ve seen out there.

I work for a large IT outsourcing company and I have seen the opposite with our clients. Most of our clients want to be on the cloud but without any change to their habits, so we end up building complete traditional data centers in Azure or AWS.

Most applications are simply cloned from existing on-premises infrastructure, because every project has a two-week deadline and every single stakeholder just wants to stick it in. Patterns like auto-scaling are just a far-fetched dream.

Supporting such infrastructure, where every server is a pet trying to survive in a slaughterhouse, is deeply painful.

In big companies, 95% of apps are still old school: firewall -- load balancer -- 5 front ends -- 3 back ends -- two database servers. And the people who developed most of these apps left these companies long ago; what these companies have is a bunch of managers and some developers who maintain that code.

People of this kind are extremely conservative, because they are not in a position to fix major issues if something goes wrong, especially with a new architecture. So they want to lift it as-is to AWS.

A decade ago, I was involved in physically moving three Solaris servers that were running unsupported BEA WebLogic instances. When I powered these three servers on in a different rack, the WebLogic cluster failed. No IP change, we just moved machines to free up racks in one row. The people who developed the app, and the dev architect behind it, were so mad at us. In the end, they gave up on that app.

That's how it goes.

It's not old school, it's what every web company is. A few load balancers, a few apps and a few databases.

In principle, yes, they look like any modern web application, but reality is different.

For example, a stateless web tier is a primary requirement to allow servers to fail as well as auto-scale. Most enterprise applications I have seen fall flat on their face here.

I have seen enough cases in which a customer endlessly debates why his snowflake of a web server went down and expects an RCA with god knows what details.

They call this "lift and shift".

It's a terrible idea, but cloud providers love it because they end up getting paid a lot of money for idle capacity.

Exactly and any sane conversation is met with verbal arrows loaded with words like risk mitigation, minimizing business impact, being agile etc.

IMO this is why Google Cloud was behind AWS (talking 2008-2013 or so). Google bet on PaaS while Amazon bet on IaaS. The dumb corporate money is still behind IaaS, as inefficient as it can be.

Well, a lot of these people are smart, BUT who wants to risk his job for an adventure which, if successful, will help the company and, if unsuccessful, will probably get him fired?

That’s why the real money is in consulting. They can outsource blame.

There is nothing wrong with doing a lift and shift as phase 1. But too many companies stop there and too many “consultants” are just old school net ops guys and that’s all they know.

I said in another post that I was a dev lead for a company that decided to move to AWS. After dealing with “consultants”, realizing how shallow their knowledge was after I started learning about AWS and seeing how much we were paying them, I changed my whole career focus.

I got another job at a small company running completely on AWS that needed some “adult supervision”, self-demoted to individual contributor, and I’m getting real-world AWS experience so I can become an overpaid consultant who can actually consult with developers, devops, and infrastructure departments.

Same experience here.

I am "lucky", I think, as my employer has already been through some very painful experiences (and learned nothing, apparently) - and they seem content to let me do whatever the fuck I want, as long as I sufficiently song-and-dance them. And yes, I've made some mistakes along the way. But I've also delivered results I didn't even think were going to be possible when I started. Advice I'd give to people walking this path:

1) You begin in an emergency. Everything is always an emergency. Because production is down. And you will always be wondering why your customers just don't drop you. But do take a moment to establish baseline metrics. Some way to measure how much you're getting, performance-wise, in terms of uptime, records processed, users, customer calls per day, cloud costs. Whatever you can pull out of your ass for minimal effort. Get SOMETHING as a baseline. As you go along, if you miss a deadline, nobody will question you if you can show progress in your metrics against that baseline.

2) Theoretically ideal web apps may perform radically differently when you start monkeying with the architecture. If the original developers are gone, or the code is more than 5 years old, it may not be feasible to debug the squirrelly behavior you see when you change from a traditional proxy to, say, an AWS load balancer. You can burn a shit ton of time trying to troubleshoot things like this. Slapping a band-aid on top of 10-year-old band-aids is not always the right thing to do. But sometimes it's the best approach to delivering results that keep people convinced to fund your effort.

3) Even if nobody else in your company has a fucking clue what you're doing, or understands it: document everything.

I’m getting too old for BS and politics. I just keep my running shoes around my neck. When things get rough, it's time to go.

That's ok though.

Get them on the cloud, then they can have the 'option' to start utilizing pure cloud type services, and migrate over.

Just reproducing their current setup on the cloud in many ways makes perfect sense.

Remember that for most companies, IT is just IT - it's an enabler, but not a huge differentiator. What works, works, and tech change is expensive and risky.

Going to the cloud and shifting paradigms are two big steps; they can be taken one at a time.

But yes - any HN reader would be kind of 'shocked' to walk into almost any company in the world and see how 'high tech' really works. It's a valuable lesson in perspective, though.

Well it does not work that way in reality.

Legacy apps assume a certain degree of reliability from the underlying infrastructure. This assumption falls flat on its face in the cloud. Netflix wouldn't have needed to develop Chaos Monkey if this weren't the case.

Secondly, the leap to cloud-native never happens, because no one wants to take the pain once it works even barely. Any issues can be pushed onto the infrastructure team. It's pure and simple risk avoidance.

I think that the core of the issue lies in their certification style: simple multiple-choice questions, many of them certifying that you understood that you should use the branded product (example: Amazon RDS instead of managing your own Postgres cluster in the cloud -- which might better suit your use case).

So you're tested on your understanding of the platform but not on your actual competence with it.

I've seen this in other areas however: people getting LPIC certifications and failing to run a proper grep.

In my experience, Red Hat is one of the few tech companies that gets the certification program right by using a hands-on approach: either you are actually capable of performing tasks at a certain level, or you get no certification at all.

There are many professionals certified on AWS that aren't really able to perform properly on that platform, for example. Add the fact that plain old system administration is dying (less and less needed with the cloud and things like Ansible)... and here we are.

Hm, I have a mixed opinion here.

The AWS Associate certs are similar to what you describe, particularly the "Solutions Architect", which is IMO common sense for a systems admin. The "Developer" one, though, requires memorizing some trivia around APIs and limits (e.g., SQS queue sizes and stuff like that).

However, the AWS Professional certs are said to be MUCH harder, and while multiple choice, require reading a lot of scenario information quickly, and then selecting more than one answer (sometimes not being told how many answers). I haven't taken it but I expect to take the DevOps Professional exam at some point. I could be wrong, but everything I've seen so far, including sample questions, suggests it requires a fair bit of hands-on experience.

However, where everything I said becomes false is around the culture of answer dumps. I'm pretty sure there are dumps sold out there for cheaters to purchase. It's a pretty silly thing to do, especially because you'll be fired if you demonstrate incompetence in the field, but it does undermine the multiple choice test mechanism for certifications.

I can definitely tell you the Professional exams are harder. It’s hard to fake your way through that. I breezed through the Associate ones but the professional ones took time just to read and comprehend the questions.

I would never try to get a pure AWS job with just theoretical knowledge.

The Kubernetes certifications are also hands on, and actually require a lot of operational knowledge that someone who uses it on a daily basis probably wouldn't even be exposed to (like bootstrapping it from scratch as opposed to using more commonly accepted tools).

From what I can tell, AWS recently offloaded their certification program onto a third party, so I doubt it will improve anytime soon.

Yes, as part of the Certified Kubernetes Administrator exam, you spin up, configure, and debug 7 clusters.


(Disclosure: I helped develop the exam.)

How stable is the exam at this point? I was looking into it a few months ago, but it seemed that Kubernetes' own APIs and primitives were expanding at the time so I figured I should hold off.

(Also heard your interview on Software Engineering Daily the other day, very informative)

The exam had some rough edges in the first 6 months but has been quite stable for a year now. We update it every 3 months to match any changes with newer K8s versions, but the changes are quite minor.

And, thank you. I've been a listener of Jeff's for a while and enjoyed our chance to chat. https://softwareengineeringdaily.com/2019/01/14/kubernetes-i...

The worst part of reading your list is that as I read through it I noted my own encounters with each of them, and then some:

* what do you mean who installs patches? I think our cloud provider does.

* (same, but for "firewall" or "policy")

* "I think they turn them off automatically if they're idle"

and so on. The "devops" hype has put people who think nothing of operations at all in charge of something they don't understand.

On due diligence projects one of my temperature taking questions for AWS is "are you using services besides EC2 and S3?". Of course we'll dig into things, but it does give me a good idea of what that part of the DD engagement is going to look like.

It tells me whether they have bothered to learn how to do things in the cloud or are just applying their old patterns to it. At that point, they should just have colo/managed servers; they would save money. But usually the driver for the cloud came from the C-suite in those situations. Either way, some training and a cost-cutting project usually pays for itself.

> because SSH key auth was “too hard for our devs”

This always kills me; the worst part is that keys are easier to use after you take 5 minutes to set them up once!

I consulted with a place where they brought in a contractor to do AWS stuff. He spent weeks troubleshooting Tomcat on EC2; his resume said he knew CloudFormation, yet when he was asked to write a Lambda function using the tool, I had to do it for him. He knew nothing about troubleshooting Postgres on RDS, etc.

Fast forward a couple of years and he had a couple of AWS certs and was being employed by some firms to better implement AWS usage. Myself, I have no AWS certs but have done migration and large project work for several firms. So I think they can definitely be useful for baseline knowledge.

This same person also said the company I was at at the time (a startup) should hire him as an architect, yet he doesn't do any app-level coding. So while the cloud bootstrap work is around for now, I definitely see this type of stuff getting consolidated.

There is a difference between knowing AWS and knowing applications that run on AWS. Not knowing Tomcat and Postgres is excusable.

I’ve been using C# and MySQL and SQL Server for years. I’ve only done APIs in Node and Python with Lambda. The only web server I know well is IIS.

Oh wow. Those are some bad ones!

I'm normally pretty certification-averse (I don't even really value my undergraduate degree much), but I'm thinking of getting the AWS Solutions Architect cert, because it seems like the material out there for learning that stuff is genuinely useful (and once you've learnt the material, you may as well get the certificate).

Cert + personal project, that's my recommendation.

The Associate cert is mostly memorization and high-level; it doesn't qualify you to be trusted to do the job on its own, but it's good for making you aware of the right way to build things in AWS.

For $2/month you can make a static website hosted on S3, with a CloudFront front end and a webhook to Lambda to trigger content refreshes on specific Git actions.
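The upload side of that setup fits in a few lines. This is a sketch of just the "publish to S3" step, assuming the real upload goes through boto3's s3.upload_file; the key detail people miss is setting the right Content-Type so CloudFront serves each asset correctly.

```python
# Plan the S3 uploads for a static site: one (key, content_type) pair
# per file. The bucket wiring is omitted; only the planning is shown.
import mimetypes
from pathlib import Path

def content_type(path):
    """Guess the Content-Type header for a static asset."""
    ctype, _ = mimetypes.guess_type(str(path))
    return ctype or "application/octet-stream"

def plan_uploads(site_dir):
    """Yield (key, content_type) for every file under site_dir."""
    root = Path(site_dir)
    for f in sorted(root.rglob("*")):
        if f.is_file():
            yield str(f.relative_to(root)), content_type(f)

print(content_type("index.html"))  # text/html
```

Each pair would then become `s3.upload_file(path, bucket, key, ExtraArgs={"ContentType": ctype})` in the real deploy script.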

I'd hire someone who did this with no professional experience, with or without certification, for an entry-level position. That personal interest and drive is worth 1-2 years of experience in my book... and I don't look for more than 6 months of experience at entry level (I offer internships/hourly for that group).

I guess so. But that's the easy part of AWS, right? I'm pretty sure (having used AWS a little, but not CloudFront or Lambda) I could start building that now and have it done by this time tomorrow.

The tricky part seems like it would be dealing with tens-to-hundreds-to-thousands of machines which should automatically deploy and scale. And things like databases, which might also need to scale up/out at some point, and ideally should do without downtime.

That's much harder to try out on a personal project (due to the cost, but also due to the lack of genuine need shaping the implementation)...

You would be surprised. Most companies I work with are operating on only 1-3 machines, and using the CloudFront infrastructure to distribute content to edge locations around the world. They use a handful of Lambda functions to trigger various random tasks like backups, notifications, git pulls, deploys, etc. Oh, and S3... tons of S3 storage.

I have a client that does $50M a year on a SaaS product using only two EC2 instances, and one of those is a developer/beta/test server. Only one is used to manage sales across the entire world. They rely heavily on Lambda and SNS & SQS as well.

Another client operates eCommerce at around $80M a year revenue using 3 EC2 instances. One of those is a test/dev server and the other 2 are used to distribute load and for redundancy with a load balancer sitting on top.

I have 2 other companies that are smaller, generating about $2M and $5M a year respectively; both run a static site hosted on S3, using Lambda for deploys and CloudFront for distribution like mentioned above.

Point being, you would be surprised how low key many company's AWS deployments are.


If I could pin down one skill that will put you head and shoulders above the rest among cloud architects, it is serverless functions. In the AWS world this is "Lambda"; Microsoft calls it "Azure Functions" and Google Cloud Platform calls it "Cloud Functions". If you can write serverless functions AND are AWS certified, that seems to be the hottest combination I have seen.

That doesn't really surprise me. It just strikes me that the level of knowledge/skill necessary to manage that isn't all that great.

It especially confuses me that people want people to "learn" serverless functions. Serverless functions are just like normal functions, except with a library to use for communicating with the outside world!

If I was hiring, then I'd be asking people about VPCs, IAM roles and Security Groups. Those seem to be the complicated bits that you need to understand to do pretty much anything. The service-specific APIs seem relatively simple...

Let’s be real - how many business projects honestly require tens to thousands of “machines”? And what fraction of IT people work on those projects?

There is a huge amount of IT work that never gets to that scale, and of the IT projects that get there, lots don’t actually need to be.

I guess it's all part of the "silver bullet" and "hype" culture. People want something that will automatically scale their new giga-scale web product that is going to be "the new facebook!", and managers want something that will make them rich without too much work, so they sign up for all the latest cool tools instead of thinking deeply about their real requirements and good old fashioned hard work. Some of the latest cool tools really _are_ amazing and worth every penny and more, but I'm sceptical about some of them...

> The tricky part seems like it would be dealing with tens-to-hundreds-to-thousands of machines which should automatically deploy and scale

You must be dealing with exceptional companies who have applications which need autoscaling. And need alone is not a sufficient condition; the application has to support it too.

Most applications are simply lifted from legacy with no code changes, serve at most a few hundred users, and will crash and burn if autoscaled.

Autoscaling is not just about scaling. It’s also about high availability.

I have two or three ASGs up with a minimum and maximum of two and an HTTP health check that will kill an instance if the health check doesn’t pass and bring up another one.

I also have applications that autoscale based on queue sizes.
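The queue-based case boils down to one small rule: target a fixed backlog per worker, clamped to the group's min/max. A sketch, with illustrative numbers; in practice the queue depth would come from the SQS ApproximateNumberOfMessages metric.

```python
# Queue-depth scaling rule: how many instances should the ASG run?

def desired_capacity(queue_depth, per_worker=100, minimum=2, maximum=10):
    """Desired instance count for a given queue backlog."""
    # ceil(queue_depth / per_worker) without floats
    wanted = -(-queue_depth // per_worker)
    return max(minimum, min(maximum, wanted))

print(desired_capacity(0))     # 2  (never below the HA minimum)
print(desired_capacity(450))   # 5
print(desired_capacity(5000))  # 10 (capped at the maximum)
```

Note the minimum of 2 even at zero backlog: that is the high-availability point from above, not a scaling concern.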

Autoscaling means you will be provisioning servers on the fly when needed, which means names and IP addresses are not important.

The scenario I am referring to involves application teams guarding their servers like babies, which means they are married to server names and IP addresses. These are cloned from on-premises, and load balancing is just sticky sessions going to one node, with the other one sitting idle until the first fails.

And that’s a bad design. The number of times I will get up in the middle of the night because of a non-redundant, non-HA process is one, before I put it at the top of the list to redesign the architecture (not necessarily the program).

I know, but that's how risk-averse people at large companies behave: they a) lack understanding and b) are risk averse because the risk/reward ratio is high.

First group is rare.

I think auto-scaling is only really relevant to 12-factor-style stateless applications. Otherwise you have to do sticky sessions, and that complicates things considerably, and isn't even a silver bullet.

Even with on prem solutions, I would hope that most websites are behind a load balancer with at least two instances. Storing session state on the web server hasn’t been something that I’ve ever done and I have been developing websites since at least 2002.

Legacy applications may not be websites, and might be stateful (or so I am told).

But how many store data locally as opposed to in a database? Even for stateful applications, if you have the source code, you could put an http listener on it for your target group to ping as a health check and put your data on FSx. If the app became unhealthy, kill the server and bring a new instance up. Have a minimum and maximum of one.

I wouldn’t go that far though. I would probably use something like Consul to do a local health check first and restart the app if it dies and then do the target group/health check as a second level check.
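The health-listener idea above can be sketched as a tiny HTTP sidecar the target group (or Consul) pings. Here is_healthy() is a stand-in for whatever real liveness probe the legacy app supports, and the endpoint path and port handling are assumptions for illustration.

```python
# Minimal /health sidecar for a legacy process.
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def is_healthy():
    # Stand-in probe: check a PID file, a port, a heartbeat file, etc.
    return True

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_response(404)
            self.end_headers()
            return
        ok = is_healthy()
        self.send_response(200 if ok else 503)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps({"healthy": ok}).encode())

    def log_message(self, *args):  # keep the sidecar quiet
        pass

# Demo: serve on an ephemeral port in a background thread and ping it,
# the way a load balancer health check would. In production you would
# bind a fixed port and just serve_forever().
server = HTTPServer(("127.0.0.1", 0), HealthHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/health" % server.server_address[1]
with urllib.request.urlopen(url) as resp:
    status, payload = resp.status, resp.read().decode()
print(status, payload)  # 200 {"healthy": true}
server.shutdown()
```

A non-200 (or a timeout) is what lets the target group kill the instance and bring up a fresh one.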

I was thinking more like some off-the-shelf solution purchased years ago that is something of a black box to those that operate it... subtle failure modes, lack of instrumentation, and opaque inner state.

Agreed, but IT bosses just pick up these keywords and start throwing them around as benefits of the cloud. Heck, I had a CIO getting excited about moving to microservices when his entire landscape had only COTS applications.

This is also typical of companies that don't use the cloud. Some companies just have poor security practices!

Or they fire the people that know this stuff now that they are in the cloud :-)

Some of this is lack of IT experience in general vs with AWS, and just plain laziness.

Still, it's interesting because some of it requires that you actively work against AWS to achieve (e.g. publicly accessible RDS). Could be that they gave up on, say, security groups and just found that to be the easiest way to "get it to work" (a frequent overriding directive).

In any case, I'd put some of it on Amazon too. Their documentation always struck me as piecemeal and task-oriented; lacking a sense of overall architecture or design.

But, I suppose that's what certifications are for.

Never seen any of that in any startup. I did see teams not using Terraform, with infrastructure being managed from the UI and not reproducible from scratch.

Sounds to me like this is an opportunity to sell security/disaster prevention services to the companies you describe.

Walk in and show a list of externally visible problems (in a way that clearly does not suggest "hacking"), and propose a project to get them correct.

> Sounds to me like this is an opportunity to sell security/disaster prevention services to the companies you describe.

I have an acquaintance that is aiming at that market:


The question: "Why it’s so hard to find roles in cloud technology, while jobs go unfilled."

The answer: "The key thing here is that businesses are not hiring fresh-out-of-certification AWS experts to take over the infrastructure of their established operations teams. They want people with solid experience, a risk-free bet that the new hire can execute the tech flawlessly. This often means insane job requirements [...] and an existing track-record of success."

So job openings stay empty because companies have unrealistic expectations. Is that something new and surprising? This has been the pattern in tech recruiting as far back as I can remember.

Wait. These aren't really unrealistic expectations. If you want transformation against a risk appetite where hardly any risk is acceptable, you jack up requirements or it's a no-go. That sounds business-wise to me. In my firm we've had many cloud-related problems. The result was higher and higher demands in the direction of our suppliers. Obviously this limits the number of possible suppliers, and makes some suppliers unhappy, but it is proper business in line with the way we want to reach our goals.

It becomes unrealistic if those people are in short supply but the companies also try to keep compensation low.

If you need a special expert fast, but you don’t expect to pay a lot, then you are being unreasonable.

Judging by the claimed urgency and how widespread the need is becoming, not every company will be able to get an experienced person for this role.

> If you need a special expert fast...

A job posting doesn't necessarily mean that they need someone fast, just that they'd like someone, someday, for a certain price. Companies have many important but not urgent goals that can wait for the right people.

Sure, but that seems not to be the case with cloud migration and engineering at the moment.

They really are, when every employer insists on 3 years' experience with a 3-year-old product and only 10% of candidates started on day 1. That's how you get 90% of experienced candidates unemployed while 90% of roles remain unfilled.

Especially when they don't want to pay for it. Most experienced people already have all the jobs they need, and no one else is going to train your employees for you. You don't want to pay the premium? Hire on potential and reward those that do well. I don't know how many of my friends in these kinds of roles had their pay stagnate while the company was hiring, e.g., consultants. Eventually they leave, and things get even worse for the company when the new project fails, because now they don't have either the old or the new infrastructure working properly.

Wait you mean companies should pay for training? That's unpossible. I thought training on high level systems just happened by unemployed people taking free Youtube courses or paying for their own training. Why are we asking so much from companies. It's bad enough we only give them massive tax breaks but now we want them to train people? What's next paying them well for experience? /s

Yes, AND one more factor no one is mentioning: resume inflation. If people read that they need several years of experience in a product or tech that's only several years old, like Kubernetes for example, then they will start to lie. And since they start to lie, others feel they should lie, too. The moral of the story is: tech management frequently doesn't know what it's doing.


But what they should do is have hiring managers who understand that people with experience, wisdom, and the ability to learn can be applied to new technologies more effectively and securely (with past experience of seeing what goes wrong in infrastructure) than people with 5 years of the latest buzzword technology. Have a look at DevOps job descriptions lately and notice:

. 5 years experience in AWS/Cloud Deployments

. CI/CD pipeline

. Containers a must (Kubernetes / Docker)

. Basic bash scripting

. One of Python/Ruby/Node/etc.

. BS/Engineering Degree or equivalent experience

That's it.

Now you understand where horror stories in infrastructure come from, and why companies hire expensive consultants or "DevOps Shops" to repair shoddy setups.

This, however, is one step above the following:

"We're seeking a full-stack developer to develop our front and back-end systems, including designing and enhancing AWS infrastructure utilizing Docker / ECS and CI/CD pipeline via Jenkins or equivalent CI system." That's the Startup 10X Engineer Special that pops up a lot.

Yup, good luck!

I have a certification, and I hire people after asking for AWS certifications on the job description. My best hires have been people with IT experience, with or without certs.

Honestly, the AWS Solutions Architect cert is too easy to memorize and pass, apparently. I had someone with 2 certs not know which AWS service would work as a good "serverless replacement for running scripts" (Lambda), and not know how you could schedule Lambdas to run every few minutes or on a cron expression.
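For the record, the scheduling answer is a CloudWatch Events / EventBridge rule: you call boto3's events.put_rule with a ScheduleExpression and target the Lambda. A small sketch of the two expression forms AWS accepts, with a toy validator (the regex is a simplification, not AWS's full grammar):

```python
# AWS schedule expressions come in two shapes:
#   rate(5 minutes)          - fixed interval
#   cron(0/5 * * * ? *)      - six-field cron
import re

SCHEDULE = re.compile(
    r"^(rate\(\d+ (minute|minutes|hour|hours|day|days)\)"
    r"|cron\([^)]+\))$"
)

def valid_schedule(expr):
    """Rough check that expr looks like an AWS schedule expression."""
    return bool(SCHEDULE.match(expr))

print(valid_schedule("rate(5 minutes)"))      # True
print(valid_schedule("cron(0/5 * * * ? *)"))  # True
print(valid_schedule("every 5 minutes"))      # False
```

That's the kind of thing a cert holder claiming Lambda experience should be able to produce on the spot.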

Basically, people think this cert means a lot, but it doesn't on its own. It's like getting A+ certified without any evidence you ever touched a computer component: putting the motherboard in the right place, but not knowing how to orient it.

Don't get me wrong, the cert can have value... but I look for people who have the cert and use AWS in some way... even a GitHub webhook to Lambda, or an S3 static site with CloudFront... some evidence they actually touched and used an AWS service.

So yes, I don't like freshly certified individuals, because the AWS _associate_ certs are NOT an expert certification. Sure, they are slightly hard, but they are certainly high risk if you expect a Solutions Architect Associate to migrate your data center or databases. Someone with 3 years of IT experience has a better shot.

I’ve been going down the certification road for a little over a year and should have 7 (now 4) by the end of the year. The company is paying for them and paying me above the local market average (I’m not talking about SV money here) - especially considering that I negotiated not to be made lead.

I started down this road as a dev lead for another company that wanted to “move to the cloud”, but at the time I didn’t know anything about AWS. Studying for the associate cert was mostly to get an overview of AWS so at least I could talk intelligently about from the infrastructure side and know what I didn’t know from the development side. I knew I would have to dig into the SDKs to actually do anything.

They hired a bunch of consultants who were old school net ops guys that didn’t know anything about AWS besides how to mirror on prem infrastructure. They were “lift and shift” guys. Of course it ended up costing more. I didn’t know then that AWS always costs more unless you are willing to change your processes, let a few of your infrastructure people go and let AWS do the “undifferentiated heavy lifting”.

I saw the situation was hopeless and all the company wanted to do was host a bunch of EC2 instances and the netops guys and consultants were so entrenched that there was no hope of me actually getting any real world experience with AWS.

I changed jobs and found a small software company that was already on AWS but the newly minted CTO wanted to actually take advantage of AWS. The MSP they had chosen before he came aboard were also a bunch of lift and shifters.

I was hired with no real AWS experience to lead the effort. After doing a lot of green-field small proof-of-concept projects and fumbling through, I feel bad for anyone who tries to implement anything on AWS with just the certifications, without the industry background, and without working for a company that has a Business support plan with AWS.

Good info.

You are right, AWS can cost more, but I disagree with "always". I run my personal website for $15/year, and most of that is the domain.

I would say: if you're trying to lift and shift to AWS, you're doing it wrong, and it will always cost more. You also can't really fire many people aside from pure network guys; server admins will still be needed for the virtual servers.

If you take the opportunity to redesign, you can realize cost savings. I still advocate avoiding layoffs, because you can retrain competent ops people to support the new infrastructure. The grizzled old guys who have trouble adapting are still useful, because chances are high you are not fully serverless within 3-5 years after a lift and shift.

As you become more familiar, you realize the tweaks that can be made to save money. We used to spend $600/month on 3-4 concurrent CI/CD workers. We swapped to docker-machine Spot-based CI/CD and spend $100-140/month for 12+ concurrent workers (we started at 6 and keep raising this number because the costs are simply so small).

Our front end is all static files (JS driven API calls). The $100/month server is now under $5 month for S3+CloudFront.

So I agree, AWS always costs more if you are trying to replicate a physical data center and ignore all the possibilities for improvement... But with slight tweaks (even starting with auto scaling.. or even more simply, timed lambdas to turn on/off servers overnight), you can reduce costs. Migrating to containers is also a huge cost savings, since you can pack more work into fewer servers without worrying about as much (I lived the nightmare of virtualenv's and managing 12 apps on 1 server.. I prefer containers now as an ops guy)
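The "timed lambdas to turn on/off servers overnight" idea can be sketched roughly like this (Python; the `Schedule` tag name and `office-hours` value are invented for illustration, and boto3 is assumed available as it is in the Lambda runtime):

```python
def ids_to_stop(reservations):
    """Pick running instance IDs whose (hypothetical) Schedule tag opts them in."""
    ids = []
    for res in reservations:
        for inst in res.get("Instances", []):
            tags = {t["Key"]: t["Value"] for t in inst.get("Tags", [])}
            if (inst["State"]["Name"] == "running"
                    and tags.get("Schedule") == "office-hours"):
                ids.append(inst["InstanceId"])
    return ids

def handler(event, context):
    import boto3  # provided by the Lambda runtime
    ec2 = boto3.client("ec2")
    ids = ids_to_stop(ec2.describe_instances()["Reservations"])
    if ids:
        ec2.stop_instances(InstanceIds=ids)
    return {"stopped": ids}
```

Wire the handler to a CloudWatch Events cron rule in the evening, and a mirror-image `start_instances` version in the morning, and dev boxes stop billing overnight.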

My goal and marching orders are to use AWS managed services for anything possible. We (the CxOs and myself) agreed that out of all of the business risks we had, our dependence on AWS is the least of them.

For builds we are completely serverless. We use CodeBuild with mostly the prebuilt Docker images and one custom image to build some .NET Framework code. We use CodePipeline to deploy lambdas with CloudFormation, and CodeDeploy agents for EC2 instances.

For any new work, we have to justify a need for servers instead of using lambda for both APIs and back end ETL workloads - with step functions. There are a few back end workloads that we will need to transition to Docker eventually, but even then we are looking to use Fargate.

I have liked Corey Quinn's advice about saving money in the cloud. Here's a tweetstorm about the topic:


I could not tell you the names of many of the AWS services, but I have lambda functions running in AWS (written in Go). They monitor my DB loads in RDS. I get email and text alerts when things are not as expected via SNS, but I can't recall the name of the AWS service that I used to schedule that alert... CloudWatch maybe. For sure it was CloudSomething ;)
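For anyone curious, the setup described (scheduled lambda checks an RDS metric, SNS sends the alert) reads roughly like this as a sketch. The DB identifier, threshold, and topic ARN are placeholders, not the commenter's actual values, and the schedule would indeed come from a CloudWatch Events rule:

```python
from datetime import datetime, timedelta

THRESHOLD = 80.0  # assumed CPU % ceiling for illustration

def worst(datapoints):
    """Highest Average value in a CloudWatch get_metric_statistics response."""
    return max((d["Average"] for d in datapoints), default=0.0)

def handler(event, context):
    import boto3  # provided by the Lambda runtime
    cw = boto3.client("cloudwatch")
    stats = cw.get_metric_statistics(
        Namespace="AWS/RDS", MetricName="CPUUtilization",
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "mydb"}],
        StartTime=datetime.utcnow() - timedelta(minutes=15),
        EndTime=datetime.utcnow(), Period=300, Statistics=["Average"])
    peak = worst(stats["Datapoints"])
    if peak > THRESHOLD:
        boto3.client("sns").publish(
            TopicArn="arn:aws:sns:...:db-alerts",  # placeholder topic
            Message="RDS CPU at %.0f%%" % peak)
```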

I'm not AWS certified, but I've been a technologist for many years and I majored in CS, so it all just sort of makes sense. Some names such as ELB, RDS and Lambda are intuitive. But some are overloaded, for example Alexa. They have Alexa for Business and Alexa Top Sites (domain name research). BTW, has anyone else used Alexa Top Sites in AWS? It seems way overpriced to me.

I'd be leery of someone who knew all these names, but didn't have much actual experience as a technologist, but in an interview, they'd probably sound better than I would.

Your response is exactly the right kind for an interview.

"I don't use those AWS services, but let me tell you what I have used." I agree, most people won't know Sumerian or SageMaker, but chances are you can easily know most features that matter for your job interests. Most new AWS services over the last few years are niche markets. A good company to work for would not expect you to know this stuff without experience. I personally don't expect someone without certifications to answer AWS questions, so it's always a plus when they can. But if they have certifications and can't answer any questions? That's a big red X. I can't speak for everyone, but I usually have 2-3 different questions on different areas, and if you're certified and have used AWS, you should be able to get 2 of them.

Ex: what is the difference between the EC2 instance families M, R, and C; my previous lambda question; and usually a question on ECS or CloudFront.
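For reference, the answer that first question is fishing for, as a cheat sheet (family descriptions follow AWS's instance-type categories):

```python
# M/R/C interview-question answer key.
EC2_FAMILIES = {
    "M": "general purpose (balanced CPU and memory)",
    "R": "memory optimized (high RAM-to-vCPU ratio)",
    "C": "compute optimized (high vCPU-to-RAM ratio)",
}
```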

I have used Alexa Top Sites but have never paid for it. There is already a free csv you can download daily that contains the top million sites. For more granular lists, Builtwith often satisfies my needs.

Lots of questions about this. Some say it is old, stale data. I don't believe it is current. They charge 25 cents per 1,000 queries in AWS. Why give that away for free?

See thread here: https://gist.github.com/chilts/7229605

It is good enough for my needs even if it's stale. I just want the top 100k sites and don't care about trends or precise rank. I can see how someone else would want to pay for it though.

I maintain they bought that service just for the name (and toolbar installs).

Let me also state: if you have a certification and are having trouble landing a job, do a personal project to highlight your capability. It doesn't have to be hard.. or expensive! It barely needs to even look good -- if you're a back end guy, getting into an interview room and talking about the great CICD or script setup you did matters more than whether it's a white page with black text... And if you're certified, you should know there are plenty of free or under-$1/month options for AWS services. I hired one guy without a cert because he showed me the game server he set up in AWS to play with friends.

I'll never understand why so many businesses in this industry seem to be so short-sighted when it comes to talent development. What these companies need is an experienced leader and a small team of more entry level people, some of whom will stick with the career and become tomorrow's experienced leaders. This is how it has worked for time immemorial. Why do so many companies seem to be looking for only one or the other level of experience? You need both!

With the average stay right now at around two years, the person will leave before or around the time they become experienced.

Those people may stick with the career, but not with the company. Moreover, the average startup does not last two years either, so the company won't exist at that point.

All the company has to do is pay the person their new market value. I came to a company with lots of development experience with the sole purpose of getting AWS experience and leaving in two years.

After about one year, I had a discussion with my manager, I said I did everything I said I would do to help him meet his technical goals, I have both the theoretical knowledge and real world experience and my market value is $x. They gave me $x and now I have no plans on leaving soon.

You're right, it's a collective action problem. It would be better for everyone if everyone developed entry level people, but looking at just your own company, rather than the whole trade, it may make more sense to avoid investing in entry level employees. So I guess I lied when I said I'll never understand it. I do understand it. I just think it's misguided.

That, and also that in emerging fields like this, everything moves so fast that most people 'at the coalface' have learned substantial parts of their skillset on the job. So when one of these people leaves, you're not looking for "a PHP dev" or "a DBA", you're looking for "someone who's good at PHP 5.6, has in-depth knowledge of PostScript interpreters, and is also adept at kernel hacking and assembling servers" or whatever odd combination of roles the outgoing person had taken on and learned during their tenure.

The idea of “you can find a job immediately after getting a degree or certification” ended a few decades ago. Today, you need some experience, especially in cloud computing.

Thankfully, in this field there are few barriers to entry, save for having some natural curiosity. All clouds have free tiers, excellent documentation, sample projects, etc. Kubernetes is free and open source. Technologies are interesting. Play and learn, and you’ll get a job.

Completely agree.. too many applicants I see think having a cert alone makes them gold, but they can't answer a thing about the topic that's not on the certification exam.

APPLY the certification in some way that YOU own and you're better than 50% of applicants I see. Many have certs, some claim some knowledge.. but few seem to actually have done anything themselves... They claim cloud CICD knowledge, but it was set up by someone else, and they don't know what a security group (firewall) is.. or they deploy to CloudFront, but don't know where the files are behind that... It's frustrating as a hiring manager spending an hour on these interviews with people who expect that paper to mean something without even 1 week of their own experience to back it up

As a developer, it's quite frustrating trying to work out how to communicate "No, I actually know how to do X (well)", because presumably the people who don't will say that too!

It's fine once I get to interview, because most interviewers can tell when someone's bullshitting (and that I'm not), but CVs are a bit of a nightmare because I have fewer years of commercial experience than a lot of people.

I recommend fine-tuning resumes for the job requirements, and a cover letter that emphasizes a passion for the language they are looking for (or a desire to learn something requested that you can't claim experience in). Publicly available samples of work also help.

Mega corps that machine parse applicants are a pain to get into, but you probably don't want to work there anyways :-)

Don’t blindly submit resumes. I always go through local recruiters who I’ve used over the years. They vouch for me with the hiring manager. I can say in 20 years I’ve never had a recruiter come back to me and say the company wasn’t interested unless they had already filled the role or they had budget issues and closed the req.

There's also free training available for Kubernetes on EdX: https://www.cncf.io/certification/training/

Personal tinkering alone doesn't count as "experience" in those companies' books...

I tend to hire a lot more people that personally tinker without certifications, than those with certifications that don't have a pet project.

Certification proves memorization. Tinkering proves interest. I don't want a candidate that is not interested personally.

It does count on the interviews. A lot.

Create an LLC. Now everything you do is "selling cloud enterprise-solutions" (as all businesses are enterprises and you aren't going to sell to your grandma).

I think personal projects count a lot at CV filtering stage.

Knowing Kubernetes will not help you at all with most cloud infrastructure. But they all allow you to set up a free account.

>They want people with solid experience, a risk-free bet that the new hire can execute the tech flawlessly. This often means insane job requirements [...] and an existing track-record of success.

So much this. I've seen job listings that say "DevOps" and they really mean "SRE" or "Software Developer to own the whole SDLC w/On-Call Rotations".

Of course, the industry is changing, but I don't think it has allowed the "lead-in" that existed in times before to carry over into the new momentum. For example, requesting 5+ years' experience in enterprise-level deployments in Azure and/or AWS isn't precisely synonymous with 5+ years in OOP.

I think the reason for the insane job requirements is that they're trying to condense the whole SDLC down to at most three people (e.g., a PM and two devs) and still produce Fortune 500 level code products for the world to consume, thus making the actual initial overhead of producing the product much lower.

This is, of course, pure conjecture and you're welcome to tell me that I'm talking out of my ass but, as far as my interviewing experience goes - so far, they want someone who's owned a small product (from an SRE perspective) for a few years' time, which doesn't really "translate" across the industry, yet.

SREs might come from Google, FB, Twitter, MSFT, Avanade, etc., but that doesn't seem to - as far as I can tell - translate into an industry standard across the board.

So, you'll have a developer with 7+ years' experience with OOP and maybe does the full-stack but, because he's never been exposed to things like being responsible for the deployment of the product (because it's responsibility of another team at his company, to make sure it goes through a security review cycle, has been tested in tested pre-prod environments, etc.), he'll automatically be "disqualified" from being considered for the given position - because the company doesn't want someone who can learn it, they want someone who can do it - literally, starting the next day.

(Sorry for the long tirade but the insane requirements are only insane because we don't understand the rationale behind them; understanding the reasons "why" helps us determine how the whole sector is shifting to new "paradigms". <insert Office Space reference here>.)

Cloud tech also changed the economics of operations and support. 10 years ago you might need 3 trained ops folks to run a production service on a given tech stack. Often the only real requirements for these jobs was a competent claim that the person could keep the lights on when the server went down at 4 am.

Modern cloud shops don't need dedicated staff to manually reboot servers, manually install software, or manually handle unavoidable hardware problems. Often what they're looking for is core software knowledge to add automation, fault tolerance, and scalability to their stacks under the banner of platform/devtools/backend software roles.

I am sick of this nonsense. Nobody knows what they want, and those who should be the ones knowing it all also don't know what they know and what they need to learn.

We built immensely complex systems that allow us to scale. The thing is, we have increased the complexity so much that we can no longer justify hiring the incompetent.

Simplicity is beautiful and we will have a renaissance of simpler systems. Something along the lines of what Go brought to the game.

>Something along the lines of what Go brought to the game.

Where the biggest adopter, Kubernetes, is ironically a way too complex system of automation routines that still regularly fail in production environments. And oh dear, if they fail, it's gonna be hard to figure out what exactly went wrong. We uncovered a bug in the job controller just yesterday that was probably not biting anyone else due to how a race condition usually plays out, but this kind of bug is way too common.

>Where the biggest adopter, Kubernetes, is ironically a way too complex system of automation routines that still regularly fail in production environments.

regularly fail? Do you have any data to back that up? No, a google search for one kubernetes failure doesn't count as proof of "regular failure"

I'm in an SRE-like position for an otherwise unmanaged Kubernetes cluster. We have code merged in the repo as well as a few PRs still open.

I should note that by regularly I mean edge cases and quirkiness, not that your run of the mill cluster mostly using deployments is going to randomly explode. But our teams still find ways to regularly break it in unexpected ways.

So I guess you could call that anecdotal if you want.

Yeah that's the irony of it. But Go is just a tool. People were building monstrosities with bash before and will be long after we build the mythical AI Cloud.

> But Go is just a tool.

Yes, but there are better tools suited for the job. Ironically, or perhaps not, Java or .NET would have fared better due to the maturity of their debugging and introspection tools.

> Simplicity is beautiful and we will have a renaissance of simpler systems. Something along the lines of what Go brought to the game.

Go is a (IMO too) simplistic language.

It does in no way follow that it will be used to build simple systems.

I don't think simplistic means what you think it does.

Go isn't simplistic. It might be seen as simple but it isn't so simple that it loses utility. There's plenty of nuance and elegance in there to solve real problems.

A simplistic version of go would have a bunch of wizards and a giant code generator that runs after you click Next enough times.

> A simplistic version of go would have a bunch of wizards and a giant code generator that runs after you click Next enough times.

You mean like Go and how people compensate for lack of generics? Generating code?

That kind of simplistic?

> You mean like Go and how people compensate for lack of generics? Generating code?

Not to mention how mocks for unit tests are generated. Golang is a step back from Java and .NET

"The jobs opening up include a requirement to know all these skills in addition to everything else."

I've been casually looking at job postings and sometimes wonder what drugs the job poster is on when writing job requirements.

If you really want to have fun, just look at the job posting for your own role in the organization. Then you can really compare stuff.

Funny story: Once upon a time, I got promoted, and we needed to backfill my (old) position. They proceeded to post a job description that 1. I might not have qualified for, and 2. I never would have applied for.

That's partially because job postings are an SEO play, so that long list of requirements is more about trying to get as many eyeballs as possible on the page. Job postings used to be as short as possible because you were paying per line in the newspaper. Search engines changed that.

That doesn't make sense. They can just move all those requirements from 'required' to 'desired' section and get the same SEO out of it.

Small companies looking for a qualified person will bother. Large companies with hundreds of applicants won’t bother.

Every year cloud providers announce tens of new services, poorly tested and far from production ready. Non-ops eng teams and management are always happy to use the latest buzzword out of AWS, then ask their ops team to make it somehow work.

Certifications are moot. I don't care if you are certified, or if you have used a cloud (or kubernetes) following a tutorial. I do care if you have seen enough issues throughout your career that you can debug when (not if) mud hits the fan. I don't care if you can type a port in a yaml file or in your browser. I do care if you know how many ports there are, what the ephemeral port range is, and so on. Systems are 75% experience, 25% basic knowledge; you can't get away with just certifications, which matter only after you get experience and knowledge.
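Since the parent floats it as an interview question: there are 65,536 ports, and IANA splits them into three ranges (well-known 0-1023, registered 1024-49151, dynamic/ephemeral 49152-65535; note Linux's default ephemeral range differs, typically 32768-60999). A tiny classifier, for the record:

```python
def port_class(port):
    """Classify a TCP/UDP port number per the IANA ranges."""
    if not 0 <= port <= 65535:
        raise ValueError("ports are 16-bit: 0-65535")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "dynamic/ephemeral"
```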

If you are experienced in ops, I know you can learn and work out any cloud. If you claim you are experienced with a specific cloud though, then better have to share a ton of war stories or it will just be a huge red flag on your CV.

Let's call it what it is: clouds, like data science, are hot fields right now. It's only natural that many people will go after a good-paying job claiming expertise in the field.

If you have experience in ops but don't know the specifics of how to do operations in the cloud, and you take that same mentality to a cloud provider, you are going to end up spending more and not getting any benefits from being with AWS/GCP/Azure.

Indeed, but ops people have to learn new technologies all the time, so it's not that big of a leap. When you talk about a cloud and see a sysadmin rolling their eyes, it's not that they have something specifically against the cloud, it's that they've seen so many diverse bugs and issues in their life, that they know you just exchange one set of problems for another.

Also, a challenge to using clouds more effectively is that you need to bring your developers on board. Ops people often have to act as coaches, and basic systems knowledge is very important in this role.

It’s not just about learning a new technology; it’s about learning a new mindset. Unfortunately, part of that mindset is giving up control over the underlying infrastructure, automating, and possibly making yourself or some of your team redundant.

Not too many ops people that I’ve dealt with who move to AWS reach for Python, CloudFormation, or creating custom resources with lambda to plug into CloudFormation to automate. Most won’t even use Systems Manager to automate EC2 maintenance.

I've seen a good number that do once the org makes it a priority and allowed money to be spent on learning and experimenting.

From there, some people still want to stick to their old knowledge, but chances are you still will need that. Only start ups are "easily" going 100% Serverless. And even then many still want a few 24/7 servers. So there's still a job for older ops.

Ops people tend to be slower because brand spanking new tends to be overhyped, and the brand new thing was designed with ease of use and performance as the goals. Security will be added on, perhaps next year. Systems Manager is cool, but if you already need to use something else to orchestrate, it's often easier to use that and keep a smaller toolset.

It’s not about going serverless, it’s about how to automate the management of servers. When my manager first came, the one ops guy we had was manually going through a list of EC2 instances and patching them. That’s an old mindset.

We have a third party web application that was on one server and when it went down, they would get an alert and reboot it. That was an easy win - put it behind a load balancer and an autoscale group with a minimum and maximum of 2 and http health checks so it would just kill the instance and bring a new one up.
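That "easy win" boils down to a handful of Auto Scaling parameters. A hedged boto3 sketch (the group name and grace period below are made up; the group would also need a launch configuration/template and target group attached):

```python
def self_healing_params(asg_name):
    """Arguments for update_auto_scaling_group: pin the group at 2 instances
    and use the load balancer's HTTP health check, so an instance failing
    HTTP checks is terminated and replaced automatically."""
    return dict(
        AutoScalingGroupName=asg_name,
        MinSize=2, MaxSize=2,
        HealthCheckType="ELB",       # fail on HTTP checks, not just EC2 status
        HealthCheckGracePeriod=300,  # assumed boot time before checks count
    )

def apply_self_healing(asg_name):
    import boto3
    boto3.client("autoscaling").update_auto_scaling_group(
        **self_healing_params(asg_name))
```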

This same guy saw a lot of EC2 instances that he didn’t recognize come up in the middle of the night and wanted them to follow our naming conventions APP0x and wanted to know the IP addresses. It took me forever to get him to understand that they were ephemeral, were part of an autoscaling group based on a queue, and that he couldn’t put whatever old school alerting system on them and they were all tagged with the corresponding autoscaling group and environment. The best I could tell him was the subnet that they would be launched in and the corresponding CIDR block.

Just curious - if the ephemeral instances are caused not by regular usage, but a mining botnet, will you have the tools to detect what is wrong (before the bills skyrocket) and mitigate?

That’s interesting. I guess we could set up something that used a combination of CloudTrail/CloudWatch Event/lambda that monitored any EC2 instances that are launched without the required tags and alert someone.
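A rough sketch of that idea in Python. The required tag set is an invented example policy, the topic ARN is a placeholder, and the event shape follows the CloudTrail `RunInstances` record as delivered through a CloudWatch Events rule (verify the field names against a real event before relying on them):

```python
REQUIRED_TAGS = {"Environment", "Team"}  # invented example tag policy

def untagged_instances(detail):
    """Return instance IDs from a RunInstances CloudTrail event detail
    that are missing any of the required tags."""
    bad = []
    items = (detail.get("responseElements", {})
                   .get("instancesSet", {})
                   .get("items", []))
    for item in items:
        tags = {t["key"] for t in item.get("tagSet", {}).get("items", [])}
        if not REQUIRED_TAGS <= tags:
            bad.append(item["instanceId"])
    return bad

def handler(event, context):
    import boto3  # provided by the Lambda runtime
    bad = untagged_instances(event.get("detail", {}))
    if bad:
        boto3.client("sns").publish(
            TopicArn="arn:aws:sns:...:ops-alerts",  # placeholder topic
            Message="Untagged EC2 instances launched: " + ", ".join(bad))
```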

We do have billing alerts set up already that would warn us something went wrong, though we would have to investigate to find out what.

Better alerting and monitoring is one of our goals this year.

Heck I’m still finding places with hard coded keys in programs instead of letting the SDKs get the keys from the default configuration file (development) or from the instance role when they are run on EC2/lambda. Unfortunately all of the examples do it and the developers didn’t know any better.
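The fix is to let the SDK resolve credentials itself. A simplified model of boto3's default-chain precedence (the real chain has more steps, e.g. assume-role profiles and the config file; this helper only illustrates the ordering):

```python
def credential_source(env, has_shared_file, has_instance_role):
    """Simplified precedence: environment variables beat the shared
    credentials file, which beats the EC2/Lambda instance role."""
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment"
    if has_shared_file:
        return "~/.aws/credentials"
    if has_instance_role:
        return "instance role"
    return "none (SDK raises NoCredentialsError)"

# The point of the parent comment, in two lines:
#   bad:  boto3.client("s3", aws_access_key_id="AKIA...",
#                      aws_secret_access_key="...")  # keys baked into code
#   good: boto3.client("s3")  # let the default chain find credentials
```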

"The Creeping IT Apocalypse" is on the same theme:


"Hi! I'm your new cloud outsourcing specialist. I'm here to replace your old IT structure and those servers-in-the-closet devils-you-know with something so intangible and 'elsewhere' that when it stops working some day you cannot possibly send an army to the closet to fix the problem. You can all just wallow in your helplessness and go home and wait for a text message. When do I start??"

Yeah, except that team in the closet is now one of the most experienced teams in the world responsible for thousands of companies and managing a global backbone infrastructure that a smaller company couldn't possibly pay to maintain.

Unless it’s Azure as they make small company mistakes like forgetting to renew certs...

YMMV for anything you can’t see or touch regardless of the size or experience or reputation so far.

Just partially related to the article, but mentioned in it nonetheless: I just wanted to ask if “serverless” is now the new hype word instead of containers/Docker/k8s? I’m curious because I’ve started reading about it in different contexts more and more lately, but afaik it doesn’t come with any clear advantages over the “classical” way of doing stuff.

Serverless is usually used to describe things like AWS Lambda, or Azure functions. You write your code, and push it to the cloud, and don't really think about the infrastructure. It is "serverless" in that you don't need to think about individual servers. You just deploy your code, and only pay for the time it actually spends executing.

I've also seen serverless used to describe services like Google App Engine and the Azure App Service. They're a bit less granular than cloud functions, since you deploy entire applications instead of individual functions. Again, you don't really have to think about servers or infrastructure. You deploy your code, and it just runs and scales up as necessary.

Docker and k8s and containers are great in general, but require a lot more hands-on active management than the services I usually see described as serverless. Though even here, services like AWS Fargate and Azure Kubernetes Service and (I think) Google Container Engine take a lot of the manual work out of container orchestration.
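To make the "you just deploy your code" point concrete: a complete deployable unit in the Lambda style can be this small (the handler signature follows Lambda's Python convention; the event shape here is an invented minimal example):

```python
def handler(event, context):
    """The entire deployable unit: no server, AMI, or process manager to
    manage. The platform invokes this per request and bills per invocation."""
    name = (event or {}).get("name", "world")
    return {"statusCode": 200, "body": "hello, " + name}
```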

Everything is a semi-rational reaction to something else.

Companies used to own and run their own machines on-premises. But doing it properly (HVAC, power, raised floors, standardized racks, management, redundancy, preparedness...) is expensive and not actually most companies' core competency. So they moved to colocated datacenters, where a competent company would take care of the infrastructure for a fee which would hopefully reflect a discount based on savings from scale.

But managing generations of servers at colo datacenters takes manpower for hardware replacement, upgrades, cabling, and generally doing things right; the customer companies mostly don't have that as a business feature but see it as a cost. So managed infrastructure services came up, where the datacenter company leases you the hardware (standardized so they can have spares) and racks it for you and puts in network switches and remote services and gets it to the point where all you have to do is log in to a console server and install your OS and start deployment.

But sysadmins who can keep track of security updates and package dependencies and keep an OS properly organized are relatively expensive, and what used to be

"well it works on my laptop"

is now

"well it works in my container"

and the container itself gets shipped out. It's cheaper not to do security work, you know.

So services that accept VMs and containers come up, where the datacenter company now runs a hypervisor and managed storage systems for you, in exchange for the salary of those sysadmins. The start-up costs can be much lower, to the point where it really doesn't make fiscal sense for any tiny company to do anything themselves except ship their containers/VMs out and puzzle over IP allocation schemes and load-balancing services.

It's a trap, but a delicious one.

So now your company is hooked on the ease and speed and cheapness of just spinning up another container or VM any time someone expresses a desire, or even automatically. Then you get the bill at the end of the month, and you managed to spend how much? That's ridiculous. Why are we getting billed for containers that we don't even need to run all the time?

Along comes "serverless", which is a logical successor of inetd. Yes, inetd, the very old "internet super-server" which would read a table of ports and programs, open all the ports, and when a connection came in would run your program and connect stdin/out to the socket, isn't that great?

Serverless is just a management system that spins up your very restricted single-function program on demand -- now it answers an HTTPS API instead of a raw socket -- and does the accounting work to make it profitable.
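The inetd contract really was that simple: your program read stdin and wrote stdout, and the super-server did all the socket work, launching one process per connection. A toy echo service in that style (Python stand-in for what would historically have been C or a shell script):

```python
import sys

def serve(line):
    """Per-connection 'business logic'; inetd wires the accepted socket
    to this process's stdin/stdout, so we never touch sockets ourselves."""
    return "echo: " + line.strip()

def main():
    # inetd (via an /etc/inetd.conf entry) launches one of these per
    # incoming connection on the configured port.
    for line in sys.stdin:
        sys.stdout.write(serve(line) + "\n")
```

Swap "process per connection on a raw socket" for "function invocation per HTTPS request" and you have the serverless model the parent describes.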

Complexity is simultaneously the enemy of correctness and the source of profit.

This is so well written and beautiful. Thanks for putting it in this perspective.

Loved your inetd/xinetd analogy.

Not having to do any infrastructure management, and being able to scale from zero load (paying for zero load) to peak traffic (paying only for what you use), is pretty incredible.

There are definitely some workloads that lend themselves to serverless and some that don't -- it's not the silver bullet a lot of evangelists say it is -- but when it works it just works.

Typically we have shortening 5-year cycles in tech that cause clamoring for new skills. First it was the internet (“HTML! Perl! PHP!”) then mobile apps (“Android! Objective C! React Native! Flutter!”) and now cloud (“AWS Certification! DevOps! SecOps!”). IoT and machine learning are probably next. The technology itself doesn’t really matter — companies just want an army of tactical workers to build this stuff tirelessly.

It's Kubernetes and microservices! Even though Kubernetes is what, four years old, where I am, numerous employers are chomping at the bit and demanding that you be an expert at these. Now, I'm a crusty old Linux sysadmin/programmer who puts "SRE" and "DevOps" on his resume, and I'm noticeably getting spurned because I don't have "5 years experience" with "containers".

At first I was worried that my skills were obsolete, having worked in large enterprise environments doing infrastructure engineering for over 15 years. I worried that I was too "systemsy" and not enough "cloud-y". Oh, and even though I spent years using TeamCity and Jenkins, I was also spurned because I didn't use the right lingo about "CI/CD pipelines" and demonstrate a Docker deployment system exactly like the one they used.

Then I realized something.

The job descriptions are asking for five years experience total, number one, and the second red flag: the salaries and contract rates are lower than the rates for large enterprise-y type jobs (minus management).

Conclusion: hipster tech coincides with youth and coincides with companies following trends and buzzwords over wisdom and experience.

Back to the point. Someone with a strong basis in the core disciplines of systems design, infrastructure automation, perimeter security and so forth is going to be able to adapt to AWS knowledge, as I have (for the record, I have two AWS Associate certs and didn't find them hard). And as the author points out, there are these 5 year cycles where all we look for is experience in certain tech. Is it any wonder that there are horror stories about fragile and insecure cloud environments?

It goes back to the incompetence of the hiring process which we see again and again discussed on HN.

> "10 years serverless" as a hard requirement...

So what we have here is a failure in the hiring process. The people writing the jobs don't actually know the space, but are looking for candidates with literally impossible amounts of experience.

Oh, that's easy;) See, "serverless" is just the new name for what we used to call CGI. There are people who've been doing it since the '90s! (Sure, we made it nicer since then, but if you squint just a little...)

We refuse to hire any Kubernetes developers without at least 6 years experience.


Sometimes I think that it is a way to filter out those the company doesn't think are "go-getters", or to have something to hit back with when salaries are discussed.

That S-Curve on cloud seems off.

Most Federal agencies are on or moving to Cloud services right now. They are squarely toward the top right of the S.

I don’t think you can say only “large trailblazers” are there. We can argue about how many services have moved and how it’s being used but the whole “let someone else manage it” argument is long done and normalized.

A lot of statements in the article seem off. The position on the S-curve, the claim that TDD replaced QA teams, even the claim that "serverless is coding for hardware engineers," or however it was formulated (the author probably meant "infrastructure as code"). With so many approximations in the argument, it's hard to accept the conclusions of the article at face value.

right, but you basically can't get a job there, unless you live in Virginia.

29 year IT veteran here. I don't agree with a lot of this article.

I remember panic-talk when outsourcing began. When offshoring arose. Etc. None of these extinguished the need for IT workers.

I view the cloud as the next generation of the mainframe. By that, I mean your experience and understanding of the environment (nobody will master all of it) will be a career differentiator. Knowledge of Kubernetes, Istio, etc. will help you to navigate the next career paths. System admin skills (and associated extras, like networking) will likewise move you up the ladder.

Bill Gates once said there is an infinite number of problems yet to be solved, an infinite amount of work to be done. I believe him.

Disclaimer: ex preferred partner early AWS client-facing Fortune 200 consultant.

The senior sysadmin at the DMDC on the central California coast: only one certification, VMware. A French offensive-exploits broker, solid researcher and engineer: no certs. I only have a few certs because work paid tens of thousands for offsite training. A few certs are okay because corporate HR wants staff to demonstrate continual learning and achievement; lots of low-quality ones (A+) are a negative signal. It's more important to keep up on news, tools and techniques on the front lines.

My opinion is that managing *aaS needs solid rack-and-stack ops experience at scale. For IT folks: either start huge if you can, or start small and move toward huge. The biggest challenges are customer communication, managing expectations, anticipating and planning where possible, and showing empathy... it's 5% technical skill and 95% getting along productively. Communicate the business impact; don't overwhelm customers with unnecessary detail.

I really appreciate this part:

  Some of these firms realize they need to renovate their IT
  but they attempt to do this through hiring “Digital
  Transformation” roles which do little but frustrate the
  revolving door of hires because they have no executive
  buy-in and budget.
This kind of thing makes it easier for me to hire. My org swallowed the risk and moved our production infrastructure entirely to k8s. I have no expectation of hiring engineers with years of experience in it, but my team has already cultivated those skills. Aside from all the benefits and pains of being on k8s, it makes it easy to attract great engineers with lots of related experience who are looking to leave companies that have heard they need to adopt something but fail to put the resources behind it.

This is an age-old problem in IT. Some new technology and/or fad comes along, and bunches of organizations suddenly want an expert in it.

It's impossible to find enough people experienced in the new field because it's so new and/or growing too fast. But the HR department or technically inexperienced managers don't understand this and complain to politicians et al. of a "shortage". (Those in the know have seen the pattern multiple times.)

The answer is simple: find a similar field and accept a learning curve for the employee to transition. If you need something quick, hire a consultant to both move the project along and to mentor the new employee.

The real reason is that coding is hard and coders are hard to find and certifications are just an acknowledgment that you have domain knowledge.

I don’t care if you know what an auto scaling group is. I care that you can write code that automates builds and deployments. The AWS part is easy and you can learn what you need to know on the job. In my experience, coding is something you either can do or you can’t, and if you come to me with a cert and no programming experience it’s very hard for me to know what you’re capable of.

I spend maybe 5% of an interview asking about AWS stuff and 50% talking about code and automation.
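The kind of build-and-deploy automation I mean can be as simple as a script that runs steps in order and stops on the first failure. A minimal sketch; the step names and stand-in commands are hypothetical, not any particular stack:

```python
import subprocess

def run_pipeline(steps):
    """steps: list of (name, argv) pairs. Runs each command in order and
    returns a list of (name, returncode), stopping after the first failure."""
    results = []
    for name, argv in steps:
        rc = subprocess.run(argv).returncode
        results.append((name, rc))
        if rc != 0:
            break  # don't deploy on top of a failed build or test run
    return results

if __name__ == "__main__":
    # Stand-in commands; a real pipeline would invoke docker, terraform, etc.
    steps = [
        ("build",  ["true"]),   # pretend the build succeeds
        ("test",   ["false"]),  # pretend the tests fail
        ("deploy", ["true"]),   # never reached
    ]
    for name, rc in run_pipeline(steps):
        print(name, "ok" if rc == 0 else "FAILED")
```

A candidate who can write and reason about something like this will pick up the AWS-specific parts on the job.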

>you can learn what you need to know on the job

Maybe I've had bad luck, but I've encountered very few prospective employers who were OK with this. The ones I encountered expected to hire someone who would fit in like a jigsaw puzzle piece and they wouldn't settle for anything less. My confidence in my ability to pick up almost any tech quickly did not impress them, and they latched on to my lack of that particular bullet point and kept hammering on things like "oh, you don't know what a [platform-specific term for auto-scaling groups] is?..."

Same. And I spent 10 years at a company where my job, literally, was new small projects every 6-12 months, often on radically different platforms and technologies, where I had to come up to speed each time. (It has its pluses and minuses: you learn a lot of breadth, not a lot of depth.) But interviewing, for me, has been a very frustrating experience. When I did land a spot, it turned out they were desperate. In interviewing and making hiring decisions for my own team, I've had to make these same kinds of judgment calls. I look for candidates with enough experience that I know they're not bullshitting, and for a work history that shows a progression of continually taking on new skills and roles, because that's what this job really is. Anyone who's been doing this in the real world for at least 10 years knows this. But it's really puzzling to me how many hiring managers at so many places just don't seem to understand it.

The same thing happened to me. Recently, one company put me through three Python challenges. One on the phone, and then later two in person. One was on the whiteboard, with no "hello" or questions about who I was, starting immediately as I entered the room. I completed his assignment (and in an object-oriented way). Later he said, "We're looking for someone who can come up to speed quickly." And later HR said, "Feedback was very mixed, but unfortunately you don't come up to speed quickly."

I answered all the questions, have a proven track record, am self-taught for 15 years with increasingly high salary / rates, and yet for this one guy I was incapable of learning quickly. This was determined in a 30 minute whiteboard interview I passed.

Some of the interviewers have extremely low sociability on the psychological scale, even if they're tech geniuses (which isn't even evident). Everyone talks about the skillsets, S-curves and so on, but no one talks about the mediocre evaluation process by those who aren't good at it.

“Server admin is very hard, and mostly gone with virtualization.” I would say this is simply not true. For many companies, finding good admins and infrastructure experts is extremely hard, at least in big corporates.

Completely agree. Server admins are still super important. Not everything fits in a nice serverless box. Developers also tend to have issues with containers, firewalls, and monitoring/troubleshooting (depending on their skills and interest). People who enjoy the server admin side can still find rewarding work in a more serverless environment.

Most developers I know don't understand containers well... or they understand containers but not the underlying implications. "What does yum/apt upgrade have to do with node? Why should I run that?" Hell, developers typically need a hard-* server guy to remind them to run updates/upgrades through their language's package manager. I'm not much of a developer, but I end up doing a lot of the npm and pip updates.

I'm guessing that another reason is that some of the existing IT personnel are resisting the change, and one of their methods is to give HR unrealistic requirements so no one will ever qualify.

I read this as "develop and sell a niche tool that's useful in cloud ops or transformations, and market it like hell"

Really interesting article. This line was funny though: "can remember the differences between a stack and a heap".

> QA has been gutted by TDD

Wow. Just wow.

It is probably fair to say that CI/CD have gutted QA though.

It is probably fair to say that managers have gutted QA.

Not so sure Docker is the only way forward when it comes to cloud scaling or deployment. VM templates are a very good alternative. They are more stable, more flexible/customizable and integrate more smoothly with CI.

Docker, cloud-hosted VMs from templates, Vagrant, Azure, AWS: it doesn't really matter much to me. The important thing is that I no longer have any need for anyone else doing company-wide infrastructure. We have several department heads engaged in a battle over who's going to “own” our “cloud infrastructure”; they seem oblivious to the fact that the only thing we need from them is to negotiate an Azure or AWS subscription, after which they will lose any utility.

A lot of talented people are going to find themselves in a problematic situation, because the area in which their talents lie is only going to be handled by low-wage jobs at Google, Azure and Amazon. Even big companies won't bother setting up their own hardware, because the people are costly even if there are slight savings on private hardware.

> they seem oblivious to the fact that the only thing we need from them is to negotiate an Azure or AWS subscription and after that they will lose any utility.

I've seen similar and they tried the following:-

- We'll provision VMs for you. Raise a ticket.

- We're doing "Hub & Spoke". You're not allowed to route any internet traffic except through our inspection proxies.

- We've disabled the API. You can only use the Console.

Basically, a couple of old school guys will do anything they can to disable automation, as otherwise they'll be accepting they can't really contribute anymore.

The old-school guys also think (rightly in some cases) that they have an added value. 10 years ago I was building a cloud platform and explaining to the security team that they would no longer receive tickets to manually configure routes on firewalls, the customers would do it from a console. I thought they’d be happy to be relieved of a menial, boring task but their reaction was “when we receive a ticket requesting to open all ports from any IP address, we can explain to the customer that it’s a dangerous idea. If they can configure it themselves, who will tell them?”
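That guardrail role doesn't have to disappear with self-service: the security team's review can be encoded as a policy check that runs before a rule is applied. A toy sketch; the rule format loosely mimics an EC2 security-group ingress permission, but the field names here are illustrative, not any real API:

```python
def risky_rules(ingress_rules):
    """Flag ingress rules open to the whole internet on sensitive ports,
    or open to the internet on every port."""
    sensitive = {22, 3389, 3306, 5432, 1433}  # SSH, RDP, common databases
    flagged = []
    for rule in ingress_rules:
        open_world = "0.0.0.0/0" in rule.get("cidrs", [])
        ports = set(range(rule["from_port"], rule["to_port"] + 1))
        if open_world and (ports & sensitive or rule["from_port"] == 0):
            flagged.append(rule)
    return flagged

if __name__ == "__main__":
    rules = [
        {"from_port": 443,  "to_port": 443,  "cidrs": ["0.0.0.0/0"]},   # fine: public HTTPS
        {"from_port": 22,   "to_port": 22,   "cidrs": ["0.0.0.0/0"]},   # flagged: world-open SSH
        {"from_port": 5432, "to_port": 5432, "cidrs": ["10.0.0.0/8"]},  # fine: internal only
    ]
    for r in risky_rules(rules):
        print("would reject:", r)
```

The "open all ports from any IP" ticket from the story gets rejected automatically, with the explanation moved into the error message instead of a human gatekeeper.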

I lived through an "empower the developers with DevOps", "free you from menial tasks" project a while back, and it ended up with:

- A mail server which was an open relay, promptly shutdown for abuse

- Every single internal server on an external IP address with an allow any/any ACL

- Brand new environments built with PHP 5.0 in 2018 to run new development projects (EOL over ten years ago)

- Managers patting themselves on the back about the power of Devops

I'll second this. My company does everything via AWS, and the personnel overhead for X00,000 users and regular major updates to everything is... maybe half of a full-time position, and it will be less than that when we finish overhauling the hard-to-scale legacy parts of our system.

I have the feeling that you have literally no idea what you are talking about. Like, at all. This usually comes from self-entitled, semi-decent full-stack developers building the same old crappy systems that break apart as soon as they get any decent usage.

Guys, not all of you are mythical 10x engineers working on groundbreaking stuff. Deal with it.

We have had most of these technologies long before they became commoditized. What we have done is make them cheaper and more accessible to your average joe.

Containers? Give me a break. We had Solaris Zones provisioning mechanisms at large telcos before any of you even knew what a container was. People have been provisioning jails/zones with a click of a button for ages.

It's funny to me because just 10y ago people like you were yelling and screaming about losing jobs to offshore developers in India and Eastern Europe. There's no apocalypse anytime soon.

Things are getting revamped; they are better, faster and, more importantly, accessible. Just because you know how to use Docker does not mean you are able to manage production-ready infrastructure. AWS and the other big providers are not a silver bullet and never will be. At the end of the day they are very costly services, not suitable for every business.

Cloud doesn't have to be actually better, cheaper, faster, etc, to consolidate and reduce ops, system admin and network jobs. It just has to be highly popular.

So the jobs change. They move into the "Developer" category, where we will hire developers with domain knowledge of "networks", "systems" or "operations".

I have seen it many times with big data, with chatops (we won't ever need to log in to the system! we will do everything via HipChat/Slack/whatever), with OpenStack (who even needs AWS?!).

Yadda yadda.

The work to manage infrastructure is smaller if most of it is centralized at Amazon or similar. Yes, there's still client side work, but there's an overall shift, consolidate, reduce, pattern.

> low wage jobs at google, Azure and Amazon

I have no idea what makes you say that. As far as I'm familiar with these roles, none of them are low paying. These companies tend to pay very well because of the amazing scalability involved in these roles. Any engineering work that scales linearly with the number of users is automated almost immediately and the focus is generally on very high-level work.

When you go from 100 corporations, each with an infrastructure team of 50+ people, to one provider running the same thing with 20 people drawn from that pool, wages go down. While there are certainly well-paid individuals behind Azure or AWS, they are rare compared to “the people on the floor”, and those people used to be able to rise to senior specialist within their companies; now they just order services from the big players.

Now don’t get me wrong, there will always be avenues for the most talented players, but the crisis will come when 45 of the 50-strong infrastructure team at every larger corporation are no longer needed, and the last 5 end up doing work that’s completely unrelated to their prior expertise.

