If only one part of your “application” has more load than the others, what do you suggest?
That instead of being able to take just the part of the application with the larger load and run it on a Firecracker micro VM (the underlying VM for Lambda and Fargate) with 256MB RAM and 1 core, we add enough VMs to scale the whole monolithic app, even the parts that don’t need it, on full-scale VMs with 8GB RAM and four cores?
We did actually have to do something similar for a legacy Windows app. We scaled the entire process up based on the number of messages in the queue. It was extremely wasteful. Each instance required at least a 4GB/2 CPU VM, compared with the 256MB/1 core micro VM.
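For illustration, here is a rough sketch of queue-depth scaling of that kind. The function name, the per-instance throughput, and the instance limits are all made-up assumptions, not the numbers from the actual app. The point is only that every instance added is a full 4GB VM, however small the hot code path is.

```python
import math

# Hypothetical figure: memory footprint of one full copy of the app.
INSTANCE_RAM_GB = 4


def desired_instances(queue_depth: int, msgs_per_instance: int,
                      min_instances: int = 1, max_instances: int = 10) -> int:
    """Return how many whole-app instances to run for a given queue backlog."""
    # One instance per `msgs_per_instance` queued messages, rounded up.
    needed = math.ceil(queue_depth / msgs_per_instance)
    # Clamp to the allowed range so we never scale to zero or without bound.
    return max(min_instances, min(max_instances, needed))


def provisioned_ram_gb(instances: int) -> int:
    """Total RAM provisioned, since each instance carries the entire app."""
    return instances * INSTANCE_RAM_GB
```

With these assumed numbers, a backlog of 250 messages at 100 messages per instance asks for 3 instances, which means 12GB provisioned to run one busy message handler.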
Scale the whole thing. The cost of doing that is usually orders of magnitude less than the true CapEx and OpEx of microservices.
edit: If breaking out one service from your monolith worked for your use case, that's great. I'm not trying to deny your experience. It is atypical, however.
We are talking about a 16x difference in resources. Would you also suggest scaling a database that was more read heavy than write heavy instead of splitting reads and writes when you can deal with eventual consistency and just autoscale the read replicas?
The database is the same, but you then have to separate your code into reader and writer services with different connection strings, make sure that anything that can’t be eventually consistent uses the writer connection string, and so on.
It’s not just a matter of spinning up a database.
Also, since many enterprise apps love stored procedures and putting business logic in the database, that’s another ball of wax you have to untangle.
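To make the reader/writer split concrete, here is a minimal sketch of the routing decision the application code ends up owning. `ConnectionRouter`, its field names, and the `needs_fresh_read` flag are hypothetical names for illustration, not any particular library's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionRouter:
    """Picks a connection string for each operation (hypothetical helper)."""
    writer_dsn: str  # primary: accepts writes, always up to date
    reader_dsn: str  # read replica endpoint: may lag behind the primary

    def dsn_for(self, *, is_write: bool, needs_fresh_read: bool = False) -> str:
        # Writes must go to the primary. So must any read that cannot
        # tolerate replica lag (not eventually consistent).
        if is_write or needs_fresh_read:
            return self.writer_dsn
        # Everything else can be served by the autoscaled replicas.
        return self.reader_dsn
```

Every call site has to answer the `needs_fresh_read` question correctly, which is where the real work is. The class itself is the easy part.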