The many lies about reducing complexity part 2: Cloud (rna.nl)
224 points by rapnie on Jan 10, 2021 | 125 comments



This shared responsibility principle that underlies cloud marketing speak sounds a lot like the self-driving mess we find ourselves in today, i.e. the responsibility boundary between parties exists in a fog of war and results in more exceptions than if one or the other were totally responsible.

We have been a customer of Amazon AWS for ~6 years now, and we still really only use ~3 of their products: EC2, Route53 and S3. I.e. the actual compute/memory/storage/network capacity, and the mapping of the outside world to it. Because we are a software company, we write most of our own software. There is no value to our customers in us stringing together a pile of someone else's products, especially in a way that we cannot guarantee will be sustainable for >5 years. We cannot afford to constantly rework completed product installations.

We strongly feel that any deeper buy-in with 3rd party technology vendors would compromise our agility and put us at their total mercy. Where we are currently positioned in the lock-in game, we could pull the ripcord and be sitting in a private datacenter within a week. All we need to do is move VMs, domain registrations and DNS nameservers if we want to kill the AWS bill.

I feel for those who are up to their eyeballs in cloud infrastructure. Perhaps you made your own bed, but you shouldn't have to suffer in it. These are very complex decisions. We didn't get it right at first either. Maybe consider pleading with your executive management for mercy now. Perhaps you get a shot at a complete redo before it all comes crashing down. We certainly did. It's amazing what can happen if you have the guts to own up to bad choices and start an honest conversation.

I would also be interested to hear the other side of the coin. Who out there is using 20+ AWS/Azure/GCP products to back a single business app and is having a fantastic time of it?


I've worked with a number of teams over the last few years who use AWS and I'd say from top to bottom they all build their strategy more or less the same way:

0. Whatever is the minimum needed to get a VPC stood up.

1. EC2 as 90%+ of whatever they're doing

2. S3 for storing lots of stuff and/or crossing VPC boundaries for data ingress/egress (like seriously, S3 seems to be used more as an alternative to SFTP than for anything else). This makes up usually the rest of the thinking.

3. Maybe one other technology that's usually from the set of {Lambda, Batch, Redshift, SQS} but rarely any combination of two or more of those.

And that's it. I know there are teams that go all in. But for the dozen or so teams I've personally interacted with, this is it. The rest of the stack is usually something stuffed into an EC2 instance instead of using an AWS version, and it comes down to one thing: the difficulties in estimating pricing for those pieces. EC2 instances are drop-dead simple to price estimate forward 6 months, 12 months or longer.

Amazon is probably leaving billions on the table every year because nobody can figure out how to price things so their department can make their yearly budget requests. The one time somebody tries some managed service, it goes over budget by 3000%, the after-action review figures out that it would have been within budget by using <open source technology> in EC2, and they just do that instead -- even though it increases the staff cost and maintenance complexity.

In fact, just this past week a team was looking at using SageMaker in an effort to go all "cloud native", took one look at the pricing sheet and noped right back to Jupyter and scikit-learn in a few EC2 instances.

An entirely different group I'm working with is evaluating cloud management tools, and most of them just simplify provisioning EC2 instances and tracking instance costs. They really don't do much for tracking costs from almost any of the other services.


+1

I bet cloud providers are incentivized not to provide detailed billing/usage stats. I remember having to use a 3rd party service to analyze our S3 usage.

Infinite scalability is also a curse - we had a case where pruning history from an S3 bucket was failing for months and we didn't know until the storage bill became significant enough to notice. I guess in some ways it is better than being woken up in the middle of the night, but we wasted millions storing useless data.

Azure also has similar issues - deleting a VM sometimes doesn't clean up dependent resources, and it is a mess to find and delete them later - only because the dependent resources are deliberately not named with a matching tag.
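Not speaking for anyone else's setup, but the kind of guardrail that tends to catch this earlier is a billing alarm (or an S3 lifecycle expiration rule on the bucket). A minimal boto3 sketch, assuming billing alerts are enabled for the account and an SNS topic exists - the ARN and threshold below are placeholders:

    import boto3

    # Billing metrics are only published to CloudWatch in us-east-1, and only
    # after billing alerts are enabled for the account.
    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

    cloudwatch.put_metric_alarm(
        AlarmName="monthly-spend-over-budget",
        Namespace="AWS/Billing",
        MetricName="EstimatedCharges",
        Dimensions=[{"Name": "Currency", "Value": "USD"}],
        Statistic="Maximum",
        Period=6 * 3600,              # the billing metric only updates a few times a day
        EvaluationPeriods=1,
        Threshold=10_000.0,           # alert well before the bill becomes "significant"
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=["arn:aws:sns:us-east-1:123456789012:billing-alerts"],  # placeholder ARN
    )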


> Infinite scalability is also a curse

People don't like to admit it, but in many circumstances, having a service that is escalating to 10x or 100x its normal demand go offline is probably the desirable thing.


There seem to be plenty of successful online businesses that share some information about their back end infrastructure, and when you look it turns out they just have a handful of web servers running their app behind a load balancer, enough DB servers for the throughput and redundancy they need and some sort of failover arrangements in case something dies, and some sort of cache and/or CDN arrangements just for efficiency. Even if they're running in the cloud, it's probably just a load of manually allocated EC2 instances, maybe RDS to save the hassle of managing the database manually, and maybe S3.

I wonder how many businesses exist in the entire world that truly need to scale their server counts up or down by a factor of several so quickly that it has to be done automatically and not as a result of, say, daily human reviews of what's happening. I feel like even large businesses that need to run thousands of servers at all are probably relatively rare, as indeed large businesses themselves are. The number that might realistically need to scale from, say, 1,000 web servers to 1,500 within a matter of hours must be smaller still. But I have nothing even resembling a useful data source to tell whether this intuition is correct.


This seems like you didn't have proper monitoring and alerting set up for your job, not sure how that is a downside of AWS.


> Infinite scalability is also a curse

This was the key sentence, I think. This type of problem actually shows up in other domains as well, queueing theory comes immediately to mind. Even the halting problem is only a problem with infinite tape, and becomes easier with (known?) limited resources.

When you have some parameter that is unbounded you need to add extra checks to bound it yourself to some sane value. You are right that the parent failed to monitor some infrastructure, but if they were in their own datacenter, once they filled their NAS, I'm positive someone would have noticed, if only because other checks, like disk space, are less likely to be forgotten.

Also, getting a huge surprise bill is a downside of any option, and the risk needs to be factored into the cost. I'm constantly paranoid when working in a cloud environment; even doing something as trivial as a directory listing from the command line on S3 costs money. I had a back-and-forth with AWS support just to be clear what the order of magnitude of the bill would be for a simple cleanup action, since there were 2 documented ways to do what I needed, and one appeared to be easier, yet significantly more expensive.
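For a sense of scale, the listing cost can at least be bounded before running it; a rough sketch with an assumed per-request price (prices vary by region and storage class, so treat the numbers as illustrative):

    # Rough bound on the cost of listing a large bucket.
    objects_in_bucket = 500_000_000       # example bucket size
    keys_per_list_page = 1_000            # S3 returns at most 1,000 keys per LIST call
    price_per_1k_list_requests = 0.005    # assumed USD price, check current pricing

    list_calls = objects_in_bucket / keys_per_list_page
    cost = list_calls / 1_000 * price_per_1k_list_requests
    print(f"~{list_calls:,.0f} LIST calls, roughly ${cost:,.2f}")
    # ~500,000 LIST calls, roughly $2.50 here -- but per-object operations over the
    # same bucket (GETs, copies, transitions) are priced per object and add up very
    # differently, which is exactly why two "documented ways" can diverge so much.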


AWS monitoring (and billing) is garbage because they make an extraordinary amount of money on unintentional spend.

"But look at how many monitoring solutions they have in the dashboard! Why, just last re:invent they announced 20 new monitoring features!"

They make a big fuss and show about improving monitoring but it's always crippled in some way that makes it easy to get wrong and time-consuming or expensive to get right.


I'm genuinely curious. Which parts of monitoring are crippled or difficult to use?

Disclaimer: I work at AWS.


I'm not very familiar with AWS or The Cloud, but I'm having trouble understanding what you said about Amazon leaving money on the table by not directing customers toward specific-purpose services as opposed to EC2?

Wouldn't (for AWS to make a profit anyway) whatever managed service have to be cheaper than some equivalent service running on an EC2 VM?

I get the concerns re: pricing and predictability, but it still seems like more $$$ for AWS.


No, usually the managed services are a premium over the bare hardware. When you use RDS for example, you’re paying for the compute resources but also paying for the extra functionality they provide and their management and maintenance they’re doing for you. You can run your own Postgres database, or you can pay the premium for Aurora on RDS and get a multi-region setup with point in time restore and one-click scaling and automatically managed storage size and automatic patching and monitoring integrated into AWS monitoring tools and...

They're leaving money on the table because instead of using "Amazon Managed $X" - potentially at a premium, or at a similar price but in a way where AWS can provide the service with fewer compute resources than you or I would need because of their scale, and thus more profitably - people look and see they'll be paying $0.10/1000 requests and $0.05/1gb of data processed in a query and $0.10/gb for bandwidth for any transfer that leaves the region and... people just give up and go "I have no idea what that will cost or whether I can afford it, but this EC2 instance is $150/mo, I can afford that."
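A sketch of the two forecasts side by side makes the point: the EC2 number is one known rate times time, while the managed-service number is the rates above multiplied by traffic figures that have to be invented before the software exists (the guesses below are exactly that, guesses):

    # EC2: one known rate, easy to put in a budget request.
    ec2_monthly = 150.0                      # the "$150/mo" instance from above

    # Managed service: the quoted rates applied to guessed usage.
    requests_per_month = 40_000_000          # invented guess
    gb_processed_per_month = 3_000           # invented guess
    gb_egress_per_month = 500                # invented guess

    managed_monthly = (requests_per_month / 1_000 * 0.10    # $0.10 per 1,000 requests
                       + gb_processed_per_month * 0.05      # $0.05 per GB processed
                       + gb_egress_per_month * 0.10)        # $0.10 per GB leaving the region
    print(f"EC2: ${ec2_monthly:,.0f}/mo, managed: ${managed_monthly:,.0f}/mo")
    # Double the traffic guesses and the managed figure doubles; the EC2 figure
    # doesn't move until you actually need a bigger instance.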


It's not just about a straight cost comparison. It's about how organizational decision-making works.

The people shopping for products are not spending their own money, but they are spending their own time and energy. The people approving budgets are not considering all possible alternatives, they are only considering the ones that have been presented to them by the people doing the shopping.

If the shoppers decide that an option will cost them too much in time and irritation, then it may be that the people holding the purse-strings are never even made aware that it exists. Even if it is the cheapest option.


This is a really good summary of the situation, and I'd add a bit about risk:

It's relatively easy to estimate EC2 costs for running some random service, because it's literally just a per-hour fee times the number of instances. If you're wrong, a bigger instance size or more instances isn't that much more expensive.

For almost every other service, you have to estimate some other, much more detailed metric: number of HTTP requests, bytes per message, etc. When you haven't yet written the software, those details can be very fuzzy, making the whole estimation process extremely risky - it could be cheaper than EC2, it could be 10x more, and we won't really know until we've spent at least a couple of months writing code. And let's hope we don't pivot or have our customers do anything in a way we're not expecting.


Yeah good question. Sibling comments to this one explain it well, but basically AWS managed services come at a premium price over some equivalent running in just EC2. (Some services in fact do charge you for EC2 time + the service + storage etc.)

"Managed" usually means "pay us more in exchange for less work on your part". This is usually pitched as a way to reduce admin/infrastructure/devops type staff and the overhead that goes along with having those people on the payroll.


For example:

Managed Airflow Scheduler on AWS at the "large" size costs $0.99/hour, or $8,672/year per instance. That's ~$17,500 once you run at least non-prod and prod instances.

Building it on your own on a same-size EC2 instance would cost $3,363/year for the EC2. Times two for two environments, let's say $6,700 - or about $4,000 if you prepay the instances.

That looks way cheaper, but then you have to do the engineering and the operational support yourself.

If you consider just the engineering and assume an engineer costs $50/hour, and estimate this at an initial three weeks of work and then 2.5 days/month for support (upgrades, tuning, ...), that's an extra $4,000 upfront and $1,000/month.

So on AWS you're at $17,500/year, and on-prem you're at best at $20,000 the first year and $16,000 in subsequent years.

So AWS only comes out a bit more expensive - but the math is tricky in several ways:

- maybe you need 4 environments deployed instead of 2, which is more for AWS but not much more for engineering?

- maybe there's less sustaining cost because you're ok with upgrading Airflow only once a quarter?

- you probably already pay the engineers, so it's not an extra money cost, it's extra cost of them not working on other stuff - different boxes and budgets

- maybe you're in a part of the world where a good devops engineer doesn't cost $50/hour but $15/hour

- I'm ignoring cost of operational support, which can be a lot for on-prem if you need 24/7

- maybe you need 12+ Airflow instances thanks to your fragmented / federated IT and can share the engineering cost

- etc, etc.

So I think what OP was saying is that if AWS priced Managed Airflow at $0.50 per hour, it would be a no-brainer to use it instead of building your own. The way it is, some customers will surely go for their own Airflow instead, because the math favors it.
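To make the knobs explicit, here is the same comparison as a function of its assumptions - a rough sketch using the figures above, where the 80 setup hours correspond to the $4,000 of upfront engineering at $50/hour:

    def annual_cost(environments=2, managed_hourly=0.99, ec2_annual_prepaid=2_000,
                    eng_hourly=50, setup_hours=80, support_hours_per_month=20):
        """Return (managed, self-hosted first year, self-hosted steady state) in USD/year."""
        managed = environments * managed_hourly * 24 * 365
        self_first = (environments * ec2_annual_prepaid
                      + eng_hourly * setup_hours
                      + eng_hourly * support_hours_per_month * 12)
        self_steady = (environments * ec2_annual_prepaid
                       + eng_hourly * support_hours_per_month * 12)
        return managed, self_first, self_steady

    print(annual_cost())                  # roughly (17,344, 20,000, 16,000) as above
    print(annual_cost(environments=4))    # managed scales linearly, engineering mostly doesn't
    print(annual_cost(eng_hourly=15))     # cheap engineering hours flip the answer entirely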

Does that make sense?


> Managed Airflow Scheduler

Management decided to get rid of all the on-prem, capex-heavy HVAC systems (that are depreciating as-we-speak!) for budget-friendly cloud thermostats? Maybe they can send you the climate-controlled blown air on your office's Amazon Day. ;)


> That looks way cheaper, but then you have to do the engineering and the operational support yourself.

In my experience, this is the piece that engineers rarely realize and that is actually one of the biggest factors in evaluating cloud providers vs. home-rolled. Especially if you're a small company, engineering time (really any employee time) is _insanely valuable_. So valuable that even if managed Airflow is cash-expensive, if using it allows your engineers to focus on building whatever makes _your business successful_, it is usually a much better idea to just use it and keep moving. Clients usually will not care about whether you implemented your own version of an AWS product (unless that's your company's specific business). Clients will care about the features you ship. If you spent a ton of time re-inventing Airflow to save some cost, but then go bankrupt before you ever ship, rolling your own Airflow implementation clearly didn't save you anything.


We used to have on-prem redis and a devops engineer to manage it, then we moved to redis in the cloud and had a devops engineer to manage it.

Saying that in the cloud you don't need engineers to manage "operational support" is the biggest lie the cloud managed to sell.


If you just run redis on a virtual server in the cloud, you're not replacing the devops engineer who manages redis. You're replacing:

- the network engineers who manage switches, firewalls, routers, nats and connectivity

- the people who manage on-prem hardware, install new servers, replace failing servers, swap broken disks for working ones, and install the base OS using some ILOM

- the whole process for ordering that hardware and getting it delivered on time - including knowing when it's going to be necessary

- if your on-prem had virtual servers, the people who manage and operate vmware

- if your on-prem had SAN, the people who manage and operate that SAN (and buy new disks and take care of capacity planning)

Some of those things you still have to do - for example configure the firewall, or say "I want to take a snapshot of this virtual disk drive" - but instead of doing a difficult technical thing, you can do it in a web UI. You still need to know what to do, though.

And of course, if you never had a SAN and virtual servers and two data centers with your team managing the interconnect between private networks, there's a lot of stuff that the Cloud could give you that you probably don't need.

Now if you move to managed redis, you're also replacing the person who installs and patches the linux the redis runs on, and the one who installs, backs up and configures the redis. And you get the redis at the click of a button, so if you suddenly need three more, you're also replacing the person who automates building redises.

You are right that the operational support is not just gone. Some of it is gone. Some of it is replaced by doing cloud stuff (like thinking about buying reserved instances instead of actual physical servers). Some of it is just more efficient.

Now if any of this doesn't fit your use case because you operate at too small or too large a scale, then the Cloud is of course a bad idea.


You've just wrapped a very specific definition of "Operational support" into a very vague term.

From another post in this thread: If you want to deploy distributed stream processing like Apache Kafka, but do not want to roll it yourself, you can use a tool like AWS Kinesis or Azure Event Hubs.

You still need "Operational Support" to manage EH/Kinesis, but generally it's closer to the type of "Operational Support" that can be provided by a general backend software engineer, as opposed to a DevOps/Infrastructure-specific dev. By using a Cloud (Managed) service, you're removing the need to manage:

* Uptime management

* Scaling

* Custom multi-AZ

And probably a lot more. Sure, you still have to actually handle events appropriately, but you have to do that either way.


I agree.

The only caveat is that this goes for founders or engineers who are financially tied to the company's success. If the engineer just collects a paycheck, they might prioritize fun - and I feel that might be behind a lot of the "reinventing the wheel" efforts you see in the industry.

Or maybe I'm just cynical.


> Especially if you're a small company, engineering time (really any employee time) is _insanely valuable_.

This is true, but it is balanced by the fact that uncertainty can be insanely expensive. And diving into complicated cloud infrastructure with a small business, if you're not already an expert on it, is a very uncertain endeavour in terms of whether you'll get everything set up right (and not find out otherwise at 3am when it turns out your redundancy wasn't, for example) and what everything will cost. By the time you have either become an expert yourself or hired someone who already is, your costs have already increased significantly too, for exactly the reason you've just stated yourself.


> And diving into complicated cloud infrastructure with a small business, if you're not already an expert on it, is a very uncertain endeavour in terms of whether you'll get everything set up right

I do not agree. The entire point of using cloud offerings as opposed to rolling your own is that cloud offerings are usually several orders of magnitude easier to configure. Using Event Hub, as an example, means that you're getting a similar experience to Apache Kafka, but without having to scale/configure everything yourself.

Sure, you have to become proficient with Event Hub, but becoming proficient with EH is probably 1/100th the difficulty of becoming proficient enough in Kafka to support the same scalability/workload.


All I can say is that this hasn't been my experience. Setting up a single server is much the same whether it's on-prem or at a colo facility or some VM in the cloud, but the amount of added complexity to set up non-trivial networking and security and redundancy/failover and backups and all that stuff in the cloud is far more complicated -- if you don't already know how to do it -- than just setting up a few servers directly. The only exceptions IME tend to be the most basic and essential services, like effectively infinitely scalable storage and managed databases. Not coincidentally, I suspect, these are the kinds of cloud services that almost everyone uses, and often (as this very HN discussion demonstrates) the only ones.

There is still considerable value in the on-demand nature of cloud hardware, instead of waiting for stuff like physically assembling new servers and then shipping them and then installing them in the rack all before you can even start setting up the software, but IME the simplicity emperor has no clothes. Just look at the number of anecdotes about even quite large and well-established businesses that have had systems go down because they didn't fully understand AZs and failover arrangements, or have been landed with some crazy high bill because they didn't fully understand all the different things they were going to be charged for, or have been breached because they left some S3 bucket open.


Watching your discussion I think the truth might be somewhere in the middle - the Cloud can do things for you but you still need to learn how to use it.

So once you know your way around, it can be a force multiplier but learning cloud can be as much work as learning how to do it on your own.

Or, said from a different angle, it helps you outsource (part of) operations, but it does not actually help you engineer and architect things correctly, and you can shoot yourself in the foot pretty easily.

Personally, I think the cloud is most transformative for large companies with dinosaur internal IT, where it brings a lot of engineer self-service into the picture - where I work I can have RDS in minutes, or internally provisioned on-prem Oracle in a week, and the Oracle ends up being more expensive because 24/7 support has to be ordered from a specific list of vendors... but that's not going to be the case in an agile company with a strong engineering culture.


I suspect much or all of this is true, including the observation about large companies with dinosaur IT. The comment I originally replied to was specifically talking about the environment in small companies, which is the environment I'm most familiar with, and my comments should be read in that context.

In my own businesses or the clients we work with, we too could add RDS quickly for something we were hosting on AWS. On the other hand, we could also spin up a pair of Postgres instances and configure streaming replication quickly if we were running on-prem or colo. Each approach has its pros and cons, and we'd look at each situation on its merits and choose accordingly. But IME, it's more like choosing from a restaurant menu depending on what looks most appealing at the time. The way some people talk about cloud, it's like they see it as choosing between a gourmet restaurant with a Michelin-starred team and the local burger van.


> but the amount of added complexity to set up non-trivial networking and security and redundancy/failover and backups and all that stuff in the cloud is far more complicated

This complexity exists either way, is my point. Whether you're managing your own servers, or using barebones cloud VMs, or using a bunch of cloud fanciness, the complexity you just defined still exists. And if that complexity is a constant, why is it only being used as a negative against cloud services?

> Just look at the number of anecdotes about even quite large and well-established businesses that have had systems go down because they didn't fully understand AZs and failover arrangements, or have been landed with some crazy high bill because they didn't fully understand all the different things they were going to be charged for, or have been breached because they left some S3 bucket open.

If your argument is "It's not better when done badly", definitely, I agree, because what is?

I guess, my overall point is that cloud-based infrastructure shifts your focus. Yes, you have to know how to configure cloud resources, but in 2021, do you think it's easier to find people with AWS experience, or people with custom in-house or colo server management experience?


The thing is, I don't think the complexity is even close to the same in the two cases.

AWS and similar services are an abstraction over hardware, software and networking all at once. There are well over 100 different services available on AWS alone. Just to get a basic configuration up and running, someone new to the system has to figure out which of those services they actually need, which is a barrier in itself given the obfuscated names they have.

Then you have much the same network and security details to set up as you would have for on-prem or colo infrastructure, but now with lots of non-standard terminology and proprietary UIs, which are so cumbersome at times that an entire generation of overcomplicated "orchestration" tools has been developed, each of which typically adds yet another layer of leaky abstraction.

Hopefully some time before this all happened you tried to work out what it was going to cost, and maybe you were close or maybe you are in for a nasty surprise because those handy managed services cost several times what the equivalent server + software would have cost either running on real servers or just on cloud VMs.

And if you fall back to that latter case as your safe default, you still get all the same issues to deal with as you would have had on your own servers and network, except that now you need to figure out what is really going on behind all those virtualised systems and quasi-geographical organisation layers before you can tell whether one unfortunate event could take down all the instances of any vital services you need.

In comparison, literally every small business I have ever worked in as a tech worker has had several people at the office who were perfectly capable of buying a switch or firewall or router and spending the few minutes required to configure it, or buying a server and installing Linux/Windows and then whatever server software it needed, again very quickly. Cloud systems can make it faster to deploy new hardware and connectivity, because you save the time required for all the physical steps, but after that the time and knowledge required to get a small network's worth of equipment up and running really isn't that great. After all, we used to do that all the time until the cloud hype took hold, and it's not as if that has suddenly stopped working or all the people with that knowledge suddenly left the industry in the past 5 years.


> The thing is, I don't think the complexity is even close to the same in the two cases.

Agreed (but probably on the opposite end as you)

It seems a lot like you've been scorned in the past and that's driving a lot of your statements now (which is totally fine and fair). I'm trying to bring up that, for every problem you've just defined, the literal exact same problem exists for colo/managed servers, except it is now also your problem to keep the lights on and the machine running.

> literally every small business I have ever worked in as a tech worker has had several people at the office who were perfectly capable of buying a switch or firewall or router and spending the few minutes required to configure it or buying a server and installing Linux/Windows and then whatever server software it needed again very quickly.

I'm sorry, if you believe that building and deploying production-ready server infrastructure is as easy as "Just going out and buying a switch and spending a few MINUTES installing linux" (emphasis mine) - I feel like we aren't talking about the same thing at all. Not even close.


Not scorned, just a little bored of being told by advocates how the cloud will do wonders for my businesses or my clients and then seeing the end results not live up to the hype.

I'm not saying there are no benefits to cloud deployment. It does have some clear advantages, and I've repeatedly cited the rapid deployment of the hardware and connectivity in this very discussion, for example. It hasn't come up much so far in the parts of the discussion I've been in, but I would never claim there is no-one with a good use for the more niche services among the hundreds available from the likes of AWS, either.

However, I mostly operate in the world of smaller businesses, and in this world simplicity is king when it comes to infrastructure. We are interested in deploying our software so people can run it, whether that's for a client's internal use or something that's running online and publicly accessible. Setting up a new server is mostly a case of installing the required OS and hosting tools, and then our software will take over (and that work would be essentially the same wherever it is hosted), once you have the hardware itself installed and connected. Configuring a new office network is something you'd probably do in a day, again once the physical aspects have been completed. You slightly mangled the timescales I actually suggested in your quote, BTW.

These systems are often maintained by a small number of team members who have the relevant knowledge as a sideline to their day jobs. And this approach has been working for decades, and continues to work fine today. Perhaps I have just never met the bogeyman where the operational requirements to maintain the IT infrastructure for a small business (say up to 50 people) are somehow unmanageable by normal people with readily available skills in a reasonable amount of time, so the arguments about somehow radically improving efficiency by outsourcing those aspects to cloud services have never resonated much with me. It's a big world, and of course YMMV.


+1, but with a container tool (Fargate/ECS, Azure Container Instances) instead of EC2.


I recently inherited a product that was developed from the ground up on AWS. It's been a real eye opener.

Yes, it absolutely is locked in, and will never run on anything but AWS. That doesn't surprise me. What surprises me is all of the unnecessary complexity. It's one big Rube Goldberg contraption, stringing together different AWS products, with a great deal of "tool in search of problem" syndrome for good measure. I am pretty sure that, in at least a few spots, the glue code used to plug into Amazon XYZ amounted to a greater development and maintenance burden than a homegrown module for solving the same problem would have been.

NIH syndrome is certainly not any fun. But IH syndrome seems to be no better.


>I would also be interested to hear the other side of the coin. Who out there is using 20+ AWS/Azure/GCP products to back a single business app and is having a fantastic time of it?

Netflix uses a lot of AWS higher-level services beyond the basics of EC2 + S3. Netflix definitely doesn't restrict its use of AWS to only be a "dumb data center". Across various tech presentations by Netflix engineers, I count at least 17 AWS services they use.

+ EC2, S3, RDS, DynamoDB, EMR, ELB, Redshift, Lambda, Kinesis, VPC, Route 53, CloudTrail, CloudWatch, SQS, SES, ECS, SimpleDB, <probably many more>.

I think we can assume they use 20+ AWS services.


Certain services IMHO have to be discounted from this list:

- VPC - basic building block for any AWS-based infra that isn't ancient

- CloudTrail - only way to get audit logs out of AWS, no matter what you feed them into

- CloudWatch - a similar situation to CloudTrail: many things (but not all) will log to CloudWatch, and if you use your own log infra you'll have to pull from it. Also necessary for metrics.

- ELB/ELBv2/NLB/ALB - for many reasons they are often the only ways to pull traffic to your services deployed on AWS. Yes, you can sometimes do it another way around, but you have high chances of feeling the pain.

My personal typical set for AWS is EC2, RDS, all the VPC/ELB/NLB/ALB stack, Route53, CloudTrail + CloudWatch. S3 and RDS as needed, as both are easily moved elsewhere.


I don't think you can discount them like that. Maybe they aren't as front of mind as services like S3, EC2, etc, but if you were to try to rebuild your setup in a personal data center, replacing the capabilities of VPC, IAM, CloudTrail, NAT gateways, ELBs, KMS etc would be a huge effort on your part. The fact that they are "basic building blocks" makes them more important, not less. In a discussion about the complexity of cloud providers versus other setups, that seems especially relevant.


Oh, I meant it more in terms of "can you count on them as optional services".

Because they aren't optional, and yes, it takes a non-trivial amount of effort to replicate them... but funnily enough, several of them have to be replicated elsewhere too.

NAT gateways usually aren't an issue, KMS for many places can be done relatively quickly with Hashicorp Vault.

IAM is a weird case, because unless you're building a cloud for others to use it's not necessarily that important, meanwhile your own authorization framework is necessary even on AWS because you can't just piggy back on IAM (I wish I could).


I mostly agree, although ECS with Fargate is often nicer to use than EC2


So your company cautiously chooses which services in AWS to use, and sticks to infrastructure offerings for now. Netflix called it the "paved path", and it worked really well for Netflix too. Over the years, though, the "paved path" expanded and extended to more services. It's worth noting that EC2 alone is a huge productivity booster, bar none. Nothing beats setting up a cluster of machines, with a few clicks, that auto-scales per dynamic scaling policies. In contrast, Uber couldn't do this for at least 5 years, and their docker-based cluster system is crippled by not supporting the damn persistent volumes. God knows how much productivity was lost because of the bogus reasons Uber had for not going to the cloud.


We carefully select and use PaaS and managed cloud services to construct our infrastructure with. This allows us to maximize our focus on what our customers are paying for: creating software for them which will typically be in use for 5+ years. We spend close to zero time on infrastructure maintenance and management, we pay others to do this for us, cheaper and more reliable. Having to swap out one service for another hasn't given us any trouble or unreasonable costs yet in the past 5 years. Unlike the article is trying to convince us of, it has massively reduced complexity for us.


> There is no value to our customers in us stringing together a pile of someone else's products

Maybe not your business, but there are many businesses in which this is exactly what happens. Any managed-service is just combining other people's work into a "product" that gets sold to customers. And that's great! AWS has a staggering amount of products, and lots of business don't even want to have to care about AWS.

> Who out there is using 20+ AWS/Azure/GCP products to back a single business app and is having a fantastic time of it?

Several times. I think cloud products are just tools to get you further along in your business. Most of the tools I use are distributed systems tools, because I don't want to have to own them, and container runtimes/datastores. Every single thing I've ever deployed across AWS/Azure is used as a generic interface that could be replaced relatively easily if necessary, and I've used Terraform to manage my infrastructure creation/deployment process, so that I can swap resources in and out without having to change tech.

If, for some reason, Azure Event Hub stopped providing what we needed it for, we could certainly deploy a customized Kafka implementation and have the rest of our code not really know or care, but from when we set out to build our products, that has always been a "If we need to" problem, and we've never needed to.
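To illustrate the "generic interface" part, a rough sketch of the shape it takes - the names here are hypothetical and the real SDK calls are elided behind comments; the point is only that application code never imports a vendor SDK directly:

    from typing import Protocol


    class EventSink(Protocol):
        def publish(self, topic: str, payload: bytes) -> None: ...


    class EventHubSink:
        """Adapter that would wrap the Azure Event Hubs producer client."""
        def publish(self, topic: str, payload: bytes) -> None:
            ...  # the azure-eventhub producer send would go here


    class KafkaSink:
        """Drop-in replacement wrapping a Kafka producer, if Event Hubs ever stops fitting."""
        def publish(self, topic: str, payload: bytes) -> None:
            ...  # the Kafka producer send would go here


    def record_order(sink: EventSink, order_id: str) -> None:
        # Application code only sees the small interface above.
        sink.publish("orders", order_id.encode())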


It depends on your company. For new startups, I'd stick with delivering your product over building your own cloud services on top of EC2/S3.

At my last startup, the engineering head shared the Not-Invented-Here view. We couldn't use any handy services. We had to run our own Cassandra and every other service ourselves. It was a huge time sink for a small team and didn't deliver differentiating value.

At my current startup, we're almost 100% serverless on SaaS providers. We operate a thousand or more nodes with no ops team (a couple of engineers know ops, when needed). We leave the complexity of scaling and maintenance to our cloud vendor's autoscaling services. Sure, we could reduce our opex in places by trying to roll our own, but having the vast majority of engineers delivering customer product, rather than reinventing the wheel, is more valuable to us than fear of vendor lock-in. If our cloud vendor wanted to jack up rates, we'd take the necessary action - which we've done before, when we banged out a service in a week that a vendor was going to hold us over a barrel for, then dropped the vendor.


> The one time somebody tries to use some managed service that goes overbudget by 3000%, and the after action figures out that it would have been within the budget by using <open source technology> in EC2, they just do that instead

This impacts casual dabblers too. More than once I've seen HN comments on how someone is wary to experiment with cloud computing because a single screw-up can lead to an essentially unlimited bill. Judging by HackerNews anecdotes of when this does happen (unexpected overruns of thousands of dollars), there's a reasonable chance Amazon will refund you out of good will, but that's not enough to lay the fears to rest.

Linode lets you pre-pay, for instance, but (to my knowledge) this isn't an option offered by Amazon, Microsoft, or Google.


You seem to have made up your mind, but for the benefit of others: yes, we use many AWS products and are having a fantastic time of it. AWS services are more reliable, less buggy and have more stable APIs than any of the alternatives. Specific services that made a difference to us are Fargate, Spot, Batch, SQS/SNS, Lambda, ECR, EFS, ELB/ACM, CloudWatch Logs, IAM, Aurora RDS, SSM/Secrets Manager, and Step Functions - in addition to EC2/VPC/EBS, Route53 and S3 that you already listed. Each of these services does a lot to free us up to do more value added domain-specific work.

Terraform has emerged as a key tool to manage AWS resources - so much so that it really adds a lot to the value prop of AWS itself. I can do stuff with Terraform that was only aspirational until it existed.

Personally, I wouldn't plan on using on-prem except for niche applications that involve heavy data streams from local hardware. In the time that I've spent with companies working on AWS, I've seen a number of other companies waste lots of time and resources on heterogeneous strategies while complaining about their AWS bill - which was high because someone got so fed up with IT dysfunction, they went and used AWS but left behind inefficiently configured resources that were on all the time. Cloud often ends up being an escape hatch for teams that are not adequately served by dogmatic IT departments.


The key piece of glue missing from your list, needed to make a lot of this work for outside consumers: API Gateway, arguably their worst product.


What is so terrible about API gateway? Aside from the price, but they kind of solved that with HTTP gateway.


I agree, they really need to replace that product. It's not built to the standard that users expect from AWS. They have burned a lot of goodwill on that one.

That said, it's only truly necessary for Lambda and while it's frustrating and painful there, it's usually not a complete showstopper.


I'm in two minds about this (deeper integration with a particular vendor - i.e. "serverless")

Reduced time to market is incredibly valuable. Our current client base is well into the millions. The ability to test with a few and roll out to many instantly is invaluable. You no longer have to hire competent software developers who understand all the patterns and practices needed to make scalable code and infrastructure; you just need them to work on a particular unit or function.

The thing which scares me is that some of these companies are decades old, some hundreds of years. How long have AWS/GCP/Azure abstractions been around? How quick are we to graveyard some of these platforms? Quite. A lot quicker than you can lift, shift and rewrite your solution to run elsewhere.


It's always a trade-off though. You say you write most of your own software, but that's probably not true for, say your OS or programming language, or editors, or a million other things. Cloud software is the same; you might not be producing the most value if you spend your engineering hours (re)creating something you could buy.

In my own experience:

- AWS SNS and SQS are rock solid and provide excellent foundations for distributed systems. I know I would struggle to create the same level of reliability if I wrote my own publish-subscribe primitives and I've played enough with some of the open source alternatives to know they require operational costs that I don't want to pay.

- I use EC2 some of the time (e.g. when I need GPUs), but I prefer to use containers because they offer a superior solution for reproducible installation. I tend to use ECS because I don't want to take on the complexity of K8S and it offers me enough to have reliable, load-balanced services. ECS with Fargate is a great building block for many, run-of-the-mill services (e.g. no GPU, not crazy resource usages).

- Lambda is incredibly useful as glue between systems. I use Lambda to connect S3, SES, CloudWatch, and SQS to application code (a minimal handler sketch is at the end of this comment). I've also gone without Lambda on the SQS side and written my framework layers to dispatch messages to application code. This has advantages (e.g. finer-grained backoff control) but isn't worth it for smaller projects.

- Secrets manager is a nice foundational component. There are alternatives out there, but it integrates so well with ECS that I rarely consider them.

- RDS is terrific. In a past life, I spent time writing database failover logic and it was way too hard to get right consistently. I love not having to think about it. Plus encryption, backup, and monitoring are all batteries included.

- VPC networking is essential. I've seen too many setups that just use the default VPC and run an EC2 instance on a public IP. The horror.

- I've recently started to appreciate the value of Step Functions. When I write distributed systems, I tend to end up with a number of discrete components that each handle one part of a problem domain. This works, but creates understandability problems. I don't love writing Step Functions using a JSON grammar that isn't easy to test locally, but I find that the visibility they offer in terms of tracing a workflow is very nice.

- CloudFront isn't the best CDN, but it is often good enough. I tend to use it for frontend application hosting (along with S3, Route53, and ACM).

- CloudWatch is hard to avoid, though I rather dislike it. CloudWatch rules are useful for implementing cron-like triggers and detecting events in AWS systems, for example knowing whether EC2 failed to provision spot capacity.

- I have mixed feelings about DynamoDB as well. It offers a nice set of primitives and is often easier to start using for small projects than something like RDS, but I rarely operate at the scales where it's a better solution than something like RDS PostgreSQL with all the terrific libraries and frameworks that work with it.

- At some scale, you want to segregate AWS resources across different accounts, usually with SSO and some level of automated provisioning. You can't escape IAM here and Control Tower is a pretty nice solution element as well.

I'm not sure if I'm up to 20 services yet, but it's probably close enough to answer your question. There are better and worse services out there, but you can get a lot of business value by making the right trade-offs, both because you get something that would be hard to build with the same level of reliability and security and because you can spend your time writing software that speaks more directly to product needs.

As for "having a fantastic time", YMMV. I am a huge fan of Terraform and tend to enjoy developing at that level. The solutions I've built provide building blocks for development teams who mostly don't have to think about the services.


Given the situation with Amazon Essentials, anyone who sells a product made substantially of stringing together AWS technology may find themselves cut out of the picture soon enough.

Many other companies have ended up competing with their own vendors. The only thing unique about Amazon is that they are boiling the frog much more slowly than previous generations of companies have.


I second that. It's not only that you make yourself completely intertwined with a Cloud by using more than fundamental services.

The costs of lambda or even DDB are IMMENSE. These only pay off for services that have a high return per request. I.e. if you get a lot of value out of lambda calls, sure, use them. But for anything high-frequency that earns you little to nothing on its own, forget about it.

Generally all your critical infrastructure should be Cloud independent. That narrows your choices largely to EC2, SQS, perhaps Kinesis, Route53, and the like. And even there you should implement all your features on two clouds, i.e. Azure and AWS, just to be sure.

The good news is also the bad news. There are effectively only two options: Azure or AWS. Google Cloud is a joke. They arbitrarily change their prices, terminate existing products, and offer zero support. It's just like we have come to love Google. They just don't give a shit about customers. Google only cares about "architecture", i.e. how cool do I feel as an engineer having built that service. Customer service is something that Google doesn't seem to understand. So think carefully about whether you want to buy into their "product". Google, literally, only develops products for their own benefit.


Do you have examples of Google Cloud arbitrarily changing prices and terminating products?

Sure they terminate consumer products, and there was a Maps price hike, but I'm not aware of anything that's part of Cloud.


IIRC they introduced a cluster management fee in GKE.



A very long time ago App engine went out of beta and there was a price hike leaving many scrambling. App engine was in beta so long that many people didn’t think that label meant anything.


Google Cloud has quite good support and professional services.

I’ve worked with them for 3 years and can’t think of any services that have been killed.

They are very customer focused. From my perspective as a partner cloud services are more built for customer use cases than Google internal use cases. GKE and Anthos for example.


I can't agree, at least not in general.

The optionality of being cloud agnostic comes with a huge cost, both because of all the pieces you have to build+operate and because of the functionality you have to exclude from your systems.

I am sure there are scales where you either have such a large engineering budget that you can ignore these costs or where decreasing your cloud spend is the only way to scale your business. But for the average company, I can't see how spending so much on infrastructure (and future optionality) pays off, especially when you could spend on product or marketing or anything else that has a more direct impact on your success.


> But for the average company, I can't see how spending so much on infrastructure (and future optionality) pays off, especially when you could spend on product or marketing or anything else that has a more direct impact on your success.

If you change "average company" to "average startup" then your point make sense. But for a normal company not everything needs to make a direct impact on your success. For example guaranteeing long term business continuity is an important factor too.


Unless you’re planning for the possibility of AWS dropping offline permanently with little to no notice, it really feels like you’re just paying a huge insurance premium. Like any insurance, it’s down to whether you need insurance or could cover the loss. Whether you’d rather incur a smaller ongoing cost to avoid the possibility of a large one time loss.

If AWS suddenly raised their prices 10x overnight, it would hurt but not be an existential threat for most companies. At that point they could invest six months or a year into migrating off of AWS.

In rough numbers, that would end up costing us something like $4m in cloud spend and staff if we retasked the entire org to accomplishing it for a year.

There’s certainly an opportunity cost as well, but I’d argue it’s not dissimilar to the opportunity cost we’d have been paying all along to maintain compatibility with multiple clouds.

Obviously it’s just conjecture, but my gut says the increased velocity of working on a single cloud and using existing Amazon services and tools where appropriate has made us significantly more than the costs of something that may never happen.


Strong agree.

Plus I've seen more than a few efforts at multi-cloud that resulted in a strong dependency on all clouds vs the ability to switch between them. So not only do you not get to use cloud-specific services, you don't really get any benefit in terms of decoupling.


I take your point, but I still don't quite agree.

There are obviously plenty of companies that are willing to couple themselves to a single cloud vendor (e.g. Netflix with AWS) and plenty of business continuity risks that companies don't find cost effective to pursue. Has anyone been as vocal about decoupling from CRM or ERP systems as they are with cloud?

My own view is that these kinds of infrastructure projects create as many risks as they solve, and happen at least as much because engineers like to solve these kinds of problems as for any other reason.


> The optionality of being cloud agnostic comes with a huge cost, both because of all the pieces you have to build+operate

This sounds like cloud vendor kool aid to me. Nearly every cloud vendor product above the infrastructure layer is a version of something that exists already in the world. When you outsource management of that to your cloud vendor you might lose 50% of the need to operationally manage that product but about 50% of it is irreducible. You still need internal competence in understanding that infrastructure and one way or another you're going to develop it over time. But if its your cloud vendor's proprietary stack then you are investing all your internal learning into non-transferrable skills instead of ones that can be generalised.


I think it's economy of scale, not kool-aid.

I can run PostgreSQL myself, but there's a ton that goes into running it with redundancy, backup, encryption, etc. that takes deep expertise to do well. I know from experience that it's easy to get database failover wrong. I could probably cobble something together, but it would be mediocre and wouldn't handle network partitions reliably. On the other hand, RDS is used at a scale well beyond what I could afford to build out and has benefited from much more usage. Problems that are low probability for me are regular events for RDS, and that experience is built in.

Ultimately, you are paying a cloud vendor for service value and operational experience. Some services aren't as good as others in both regards, but for the ones that are good, the overhead of doing it yourself is an exponent, not a fraction.


To be honest, I would probably fit RDS more into the infrastructure layer than the application layer. I think there is value in having things at that level be managed.


Did you look into multi-cloud solutions like Pulumi or Terraform to abstract your cloud vendor?


Don't fall for that trap.

I have never used Pulumi, I've used Terraform a bit. I like Terraform so this isn't a dig at the tools.

Abstracting your cloud provider is similar to abstracting your database. At a high level they appear to be the same and they do similar things; however, they are very different when you get to the fine detail.

Pulumi / Terraform are useful in that they provide a common language / API across cloud providers; however, you will never just switch cloud provider by changing a few lines of code.


I see a lot of mentions in the comments about just using the basic storage/networking/compute from AWS/AZ/GCP--if that's all you're using, you should really consider other providers. Linode, Digital Ocean, and Vultr will be far more competitive and offer faster machines, cheaper, and with better bandwidth pricing.

The point of using AWS/AZ/GCP is to leverage their managed service portfolio and be locked in. If you aren't doing that, there are better companies that want your business and will treat you much better.


There's also Packet (now Equinix Metal), which gives control over L2 and has nice things such as iBGP. I think Vultr may too, but their docs are poor and support was uncooperative.


IME, the network of AWS is much better than that of DO, Linode.


How so?


Fewer hiccups and less downtime. It's faster, with better latency to other third-party services. Superior internal controls. For example, in Linode a private IP address gives EVERYONE in the same data center access to your Linode server. Also, last time I used them they didn't have a firewall.


Internally maybe, but AWS external connectivity is so shaky I wonder if it's intentional.

From a Cisco ASA firewall to an AWS VPG, there are weekly and even daily issues with things either plain timing out or latency spiking on the AWS side. Cisco is not a small vendor or use case for creating a VPN bridge. We moved from this to cloudconnect - essentially a network connection directly to AWS. Same instability issues, only they moved to weekly/monthly instead of daily/weekly. This was from a provider that had fiber to AWS.

I will also say that when putting particular load on AWS's internal services from a single ec2 instance you can often see requests that fail (s3 being the one I see the most).

I can't speak to the merits or lack thereof of Linode, but I will say Vultr and DO have - by comparison to AWS - leagues better public WAN stability. I've had SSH sessions open for months on both, and that includes to some POPs in Europe. I'm comparing AWS, which is <20ms away, to something overseas... for reference that's about 20 additional hops and trans-Atlantic fiber in between.


Right, Linode is basically old-school dedicated servers AFAIK, but DO should be in a different class.


AWS networking isn't _great_ but it's decidedly better than DO (which is actually the worst of those listed based on my own TCP connection tests).

Linode is pretty stable if not very exciting, Vultr is "better than DO", but their networks are almost always in maintenance.

For a little context; I maintain IRC servers and those are currently hosted in vultr (with linked nodes in 5 regions), I notice ping spikes between those nodes often and sometimes congestion which drops users. (IRC is highly stateful TCP sessions).

I've only known two truly good networking suppliers: GCP (and their magic BGP<->PoP<->dark fibre networks) and Tilaa (which is only hosted in the Netherlands, which is why I can't use them for my global network).


Awesome, thanks for the info. For GCP I notice occasional unavailability on the order of tens of minutes every quarter or so. That's VM networking. Their load balancers are a different story, as they are complete crap.


This is very much not my experience, do you have any more information?

Any particular regions? Are you certain it's not a local ISP?

(I used to run an always online video game and we had a LOOOOOT of connection issues from "Spectrum internet" on all of our servers including GCP ones.)


Answering here because the bottom post is locked for some reason - east1 occasionally disconnects from other regions. That is definitely within the Google backbone. Central-1 seems worse, though. If it's less than an hour they don't bother with the status page.

For the load balancer it's very much by design, as they randomly send you an RST when Google rolls them for an upgrade, and in some other cases (I'm working on a blog post on this). Google support's recommendation is to retry (for real).


This. And also AWS showed they can delete your business infrastructure unilaterally, without notice, on a whim so it’s very risky to use them.


This. I switched to Hetzner Cloud and reduced costs from ~$200/month on GCP to about ~20€/month. For comparison: 1 CPU / 2 GB RAM costs about $15 on GCP and 2.50€ on Hetzner, and 1 GB of outgoing traffic is ~0.12€ on GCP, whereas Hetzner includes 20 TB per machine and charges 1€ per additional TB.

I'm still using S3 and GCS to store files because it's convenient and relatively cheap.


AWS has like 10 times more regions than any of the alternatives. If you want something close to home AWS is best.


very very few need so many regions to choose from


Unless they're, you know, not close to the main regions. That's why cloud providers have so many regions: people actually live in many parts of the world.

Despite popular belief, the vast majority of the world's population does not live in the Bay Area (or even the US).


I have to say, at my current company we are using Serverless, and it really does feel like it reduces complexity. No runtime/framework to set up, no uptime monitoring or management required on the application layer, and scaling is essentially solved for us. I mean you do pay for what you get, but it does feel like one of those technologies which really lowers the barrier to entry in terms of being able to release a production web application. In terms of building an MVP, a single developer really can deploy an application without any dev-ops support, and it will serve 10 million users if you manage to get that much traffic.

I'm sure it's not optimal for every case, but for an awful lot of cases it seems pretty darned good, and you can save on dev ops hiring.


I used to be very excited about serverless, and I still have high hopes for it.

But for me it ended up replacing the complexity of runtimes and frameworks with the complexity of configuring auxiliary services like API Gateway, Amazon VPC, etc. We needed to move the complexity into some tool that configured the services around Lambda, like Terraform or CloudFormation, or at best into a framework like Claudia or Serverless.com. Configuring it by hand looks fine in tutorials, but is madness: it's still complex, and it makes everything way too fragile.

There are however some products that make the experience better by simplifying configuration, like Vercel and Netlify.


Yeah I certainly agree that the complexity doesn't really go away completely, and sometimes it's much more frustrating to have to configure poorly documented services rather than just having access to the host OS.

I guess my overall point would be that two of the hardest things to do in terms of making a production-ready application are scaling and security, and Serverless pretty much obviates them. So it's not a magic wand, but it does take away some of the significant barriers to entry.


Yes, I agree with that point. I think my point was more that Serverless is a good idea, but the current implementations are still not good at removing complexity. But I can see this easily changing, with open standards and the such.


Well, we just need to admit that running applications is complex if we acknowledge all the known risks. If we blissfully ignore risks, as with a LAMP or LEMP stack, it's much easier. The main question is whether we need to take most of those risks into account when running at small scale.


I was expecting writing serverless to be a mess of configuration, but I've really enjoyed writing CDK for CloudFormation. It's super unclear how you're supposed to write good CDK code, but I feel like I'm a lot clearer on what infrastructure I'm actually using than before, when I was relying on stuff set up by someone else ages ago with minimal to no documentation.
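For anyone who hasn't seen it, a minimal sketch of what that CDK code looks like (CDK v2, TypeScript; the construct names and asset path are placeholders, not from my actual stack):

  import { Stack, StackProps } from "aws-cdk-lib";
  import * as lambda from "aws-cdk-lib/aws-lambda";
  import * as apigateway from "aws-cdk-lib/aws-apigateway";
  import { Construct } from "constructs";

  export class ApiStack extends Stack {
    constructor(scope: Construct, id: string, props?: StackProps) {
      super(scope, id, props);

      // One function plus a REST API in front of it; CDK synthesizes the
      // CloudFormation (roles, permissions, stages) so I don't write it by hand.
      const handler = new lambda.Function(this, "Handler", {
        runtime: lambda.Runtime.NODEJS_18_X,
        handler: "index.handler",
        code: lambda.Code.fromAsset("lambda"),
      });

      new apigateway.LambdaRestApi(this, "Api", { handler });
    }
  }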


This person has never worked in a data center. He thinks he's managing a network because he sets a few VPC IPs; that's an itsy tiny fragment of networking, and the cloud has indeed removed a great deal you previously had to manage on-prem.


I'm guessing you mean things like STP, firewalls, rate-limiting, routing, DHCP, and link aggregation?


Or, going back even further, I started my career worrying about all that, plus HVAC, hardware acquisition, capacity planning, power management and distribution, and managing gensets (fuel and maintenance), and all the fun crap associated with it, like a burst water pipe in your data center on Christmas Eve.


to start, yes :)


I wonder who tells the story that cloud computing has something to do with reducing complexity. In my world, cloud computing is about scalability and making things as complex as they need to be in order to be scalable. This rarely means that complexity is being reduced.


But doesn't cloud computing reduce complexity? I used to write my own SystemV boot scripts, service installation and discovery logic, round robin updating tooling, and much more to handle live app deploy/cut over.

Now I have two dozen lines of YAML and GKE JustWorks™, deploying my services and taking care of everything I used to do with my bespoke scripts. My world is much simpler now.


The simplicity of having one cloud based product rather than several native products built for different systems is an argument I’ve heard a lot.


This is an advantage of the web platform, not exactly related to cloud. You can get this advantage with an on-premises web product, or with old school hosting.


Very true, just trying to explain the source of the “cloud reduces complexity” argument. There are a number of small operations that don’t want to manage all their own hardware, so cloud and web are conflated, and you get the web platform simplicity argument being used to justify a cloud platform.


Subjectively, it increasingly feels that while the complexity has been increasing, the notion of longevity of the underlying products and services has been degrading.

While updates to software were expected, the general outlook was that they would not break the core features. The emphasis on backwards compatibility was in a way an assurance to businesses that building their operations on a vendor's products was not risky. Even then, some mission-critical elements would be defensively abstracted to avoid the dependency risks (at least theoretically...).

Now we all witness the "eternal beta" paradigm across most of the major software products: frequent builds with automatic updates, where new features can be suddenly pushed and old features removed.

Sure, it's still possible to spec out a "rock-solid", steady platform, postpone updates, abstract dependencies and just focus on business. But... such an approach won't be approved, as it's widely acknowledged that the presence of critical bugs is a "feature" of all software. Postponing updates is not prudent; it's a liability.

So the rock-solid expectations are just an illusion, or perhaps a fantasy promoted widely just to get a foot in the door.

Ironically, the most stable elements are the so much dreaded "legacy", too often in charge of the business-critical logic.


> the notion of longevity of the underlying products and services has been degrading.

Is eternal longevity even a relevant concept in computing ops any more?

I think businesses must grok that all services are living things that will require eventual maintenance. If you roll your own service, eventually the OS, even an LTS, will need upgrading when the prior release goes EOL. If you're higher up on a FaaS, the language/runtime will still age out, like Python 2, Node.js 6, or Java 5. These will inevitably come with changes. Your application will need to adapt.

In extreme cases you can pay for ancient language support and custom IBM mainframe hardware, but the reality is that the software's hosting will always need to be updated as time marches on.


Kind of depends on "eternal". I propose 10 years is possible, and realistic. 5 years should not be rare. 20 years is probably a fantasy.


I don't understand what people are building in order to need half of this decoupled and managed elsewhere anyway. It wasn't all that challenging to self manage it five years ago, what's changed?

My guess is that the average small-to-medium project has drunk the enterprise Kool-Aid, and they are suffering the configuration and complexity nightmares that surround managing cloud infrastructure before they really needed to.

As the article points out, you don't forgo managing these things by doing them in the cloud; you just manage them inside a constantly changing web UI instead of something likely familiar to your developers.


I guess it's Kool-Aid? I don't know; I don't remember being lied to when I started using cloud services. I think of cloud resources as being amazing and basically magical, but I know there's a limit to the magic, and the rest is work. People using (for example) AWS S3 should not be surprised that they still have to work to manage the naming, organization, access control, encryption, retention, etc. of their data, and they might encounter problems if they try to load a 100GB S3 object into a byte array in a container provisioned with 1GB of RAM. But they are. I don't know if that's human nature or if they're being lied to by consultants and marketers.
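The 100GB-into-a-byte-array case is a good example: the fix isn't magic, it's just streaming, and that part is still your work. A rough sketch, assuming the AWS SDK for JavaScript v3 in Node.js (bucket, key and destination path are placeholders):

  import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
  import { createWriteStream } from "node:fs";
  import { pipeline } from "node:stream/promises";
  import { Readable } from "node:stream";

  const s3 = new S3Client({});

  // Stream the object to disk instead of buffering it, so a 100GB object
  // never needs 100GB of RAM.
  async function download(bucket: string, key: string, dest: string) {
    const { Body } = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
    await pipeline(Body as Readable, createWriteStream(dest));
  }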


There are products (Terraform, CloudFormation) that help you manage without a UI, but they also add complexity, so our point definitely still stands.


I've used Terraform for ~4 years.

The providers are ever-changing (and so are the Terraform language and features). So the "ever-changing" part is certainly still true for Terraform.


Reducing complexity should never be about platform (on-prem vs cloud).

It should be about constructing software in partnership with the business and reducing complexity with modeled boundaries.

You can leverage the cloud to do some interesting things, but the true benefit is in _what_ you construct, not _how_.


I honestly believe that hiding complexity behind a closed door does not eliminate it. However, a lot of software and service vendors have a vested interest in convincing people otherwise. And, historically, they've had all sorts of great platforms for doing so. Who doesn't enjoy a free day out of the office, with lunch provided?

It's also much easier to hide complexity than it is to remove it. One can be accomplished with (relatively) turnkey solutions, generally without ever having to leave the comfort of your computer. Whereas the other generally requires long hours standing in front of a chalkboard and scratching your head.


On the other hand, hiding complexity behind closed doors can be a very valuable thing, if it lets you keep track of who knows about the complexity behind each door. I can't count the number of issues I've encountered that would have taken minutes instead of hours if only I'd known which specific experts I needed to talk to.


Agreed. Though, that too comes at a cost, so I don't want to do it except when it's worth it.

http://yosefk.com/blog/redundancy-vs-dependencies-which-is-w...


> It's also much easier to hide complexity than it is to remove it.

There are multiple ways to hide complexity. Some of them make it easier to remove (eg, refactoring), others make it nearly impossible to remove. In a service market there’s a perverse incentive to move toward the latter.


There is an element of _how_ as well. You could create simple monoliths or overengineered microservices. Or, complex monoliths with heavy coupling vs cleanly designed microservices with clear separations of concern.


Are microservices meant to separate data too? As in, each service has its own database.

Wouldn't that lead to non-normalisation of the data or a lot of expensive network lookups to get what I want/need?

What is the point of micro services anyway :-)?


> Are microservices meant to separate data too? As in, each service has its own database.

Yes.

> Wouldn't that lead to non-normalisation of the data

Yes. But it's not as bad as it sounds. That is how data on paper used to work, after all.

Business rules (at least ones that have been around for more than 5--10 years) are written with intensely non-normalised data in mind.

Business people tend to be fine with eventual consistency on the scale of hours or even days.

Non-normalised data also makes total data corruption harder, and forensics in the case of bugs easier, in some ways: you find an unexpected value somewhere? Check the other versions that ought to exist and you can probably retrace at what point it got weird.

The whole idea of consistent and fully normalised data is a, historically speaking, very recent innovation, and I'm not convinced it will last long in the real world. I think this is a brief moment in history when our software is primitive enough, yet optimistic enough, to even consider that type of data storage.

And come on, it's not like the complete consistency of the data is worth that many dollars in most cases, if we actually bother to compute the cost.


There's a progression every developer that grew up on SQL Server/relational data needs to go through...

1. The models (plural intended) of a business are not necessarily relational. There could be an upstream/downstream relationship. There could be an event-based relationship. There may be no relations at all (documents are handy in these scenarios).

Stop assuming you start with an entity-relationship diagram. That's an immediate limiting factor when listening to the business describe their processes.

2. There is no such thing as immediate updates to any database. There is _always_ latency. Build software understanding this.

3. Operational data and Analytical data are TWO DIFFERENT THINGS. (sorry for the shouting)

Operationally, I only need concern myself with the immediate needs of the user or process. If I'm doing "something" to the customer domain, I don't need to know or do anything else. If I'm doing something to the order domain, I may need to notify some other domains of what I'm doing or have done, but that's secondary and not _immediately important_. Inventory systems should have built-in mechanisms for levels and never need to know the exact up-to-date figures.

My operational domains can notify whole other systems on changes in data. So your analytical system can subscribe to these changes and normalize that data all it wants. I can even build user interfaces that display both operational and analytical data.

Micro-services are brilliant at operational domain boundary adherence. Events are brilliant at notifying external boundaries of change.

The caveat I point out to my clients is that thinking in this way is very different from what we're used to and are often comfortable with. It takes time to identify the best boundaries and events for the models of a business. But if you put in that time, the result will be software that your business personnel can actually understand.
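As a rough illustration of that notification step (not prescriptive; this assumes SNS as the transport and a made-up OrderPlaced event shape):

  import { SNSClient, PublishCommand } from "@aws-sdk/client-sns";

  // The order domain publishes a fact about what happened; analytical or
  // other domains subscribe to the topic and normalise the data as they wish.
  interface OrderPlaced {
    type: "OrderPlaced";
    orderId: string;
    customerId: string;
    placedAt: string; // ISO-8601 timestamp
  }

  const sns = new SNSClient({});

  async function publishOrderPlaced(event: OrderPlaced, topicArn: string) {
    await sns.send(new PublishCommand({
      TopicArn: topicArn,
      Message: JSON.stringify(event),
      MessageAttributes: {
        type: { DataType: "String", StringValue: event.type },
      },
    }));
  }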


> Are microservices meant to separate data too? As in, each service has its own database.

Ideally yes, to scale.

Sometimes you have a service with obvious and easy-to-split boundaries, and microservices are a breeze.

Some things that are easy to turn into microservices: "API Wrapper" to a complex and messy third-party API. Logging and data collection. Sending emails/messages. User authentication. Search. Anything in your app that could become another app.

However, when your data model is tightly coupled, you need to choose between tradeoffs: data duplication, having bigger services, or even keeping it as a monolith.

Btw, if you don't care about scalability, sharing a database is still not the best idea. But you can have a microservice that wraps the database in a service, for example. Tools like Hasura can be used for that.
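A rough sketch of what "wrap the database in a service" can look like, assuming Express and node-postgres rather than Hasura specifically (the table and route names are made up):

  import express from "express";
  import { Pool } from "pg";

  const app = express();
  const pool = new Pool(); // connection settings come from PG* env vars

  // Only this service talks to the database; every other service goes
  // through this HTTP boundary instead of sharing the schema.
  app.get("/customers/:id", async (req, res) => {
    const { rows } = await pool.query(
      "SELECT id, name, email FROM customers WHERE id = $1",
      [req.params.id]
    );
    if (rows.length === 0) return res.status(404).end();
    res.json(rows[0]);
  });

  app.listen(3000);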


Microservices are a solution for an organisational problem (multiple employees on one project), not a technical one.

If you're flying solo, just use DDD, for example. It will give you the same patterns without the DevOps complexity.


Microsoft summarized it nicely [1]:

Advantages of public clouds:

Lower costs

No maintenance

Near-unlimited scalability

High reliability

Advantages of a private cloud:

More flexibility

More control

More scalability (compared to pure on-prem solution)

[1] https://azure.microsoft.com/en-us/overview/what-are-private-...


Notice that neither list has "reduce complexity" as a benefit.

The cloud can abstract away complexity and shift the management of that complexity behind the cloud provider. But it's more likely that complexity increases rather than decreases.


Hmm... So Azure for unlimited scalability... But private clouds have more scalability?


They probably meant that you can just request hundreds of servers in another part of the world in a single setup, compared to manually building your servers there.


"More scalability — private clouds often offer more scalability compared to on-premises infrastructure."

I think they meant private cloud (renting 3rd party servers and using/maintaining your private cloud) vs on-prem (buying servers and building your own data centers).


Presumably both options are written relative to non-cloud setups.


For simple document or file-type storage for an application, I think solutions like S3 or Azure Blob Storage are really great. In my real-world experience, we replaced a highly finicky, way-too-heavyweight COTS CMS and all the associated database and backend SAN complexity with S3, and it ended up being way cheaper, easier and more reliable. To me, object storage is one place where the cloud hype does deliver. I have had mixed results with the other services; even with things like RDS you have to do a lot of tuning and backup/DR work yourself.


Isn't this just Jevons paradox applied to software?

  when technological progress ... increases the efficiency with which a resource is used .., but the rate of consumption of that resource rises due to increasing demand [1]
[1] https://en.wikipedia.org/wiki/Jevons_paradox


Part 1 is linked in article

https://ea.rna.nl/2016/01/10/a-tale-of-application-rationali...

"This was actually part 1 of this story: A tale of application rationalisation (not)."


Anyone manage to find part 1? It's not on their site, can't seem to find it.


It's linked in the article https://ea.rna.nl/2016/01/10/a-tale-of-application-rationali...

"This was actually part 1 of this story: A tale of application rationalisation (not)."



