
My take was that the mistake was using a myriad of low-resource nodes to run deployments that needed an increasing amount of compute to run without problems. This led to launching more nodes to accommodate peaks, which then just sat idle.

The Kubernetes cluster was configured with horizontal pod autoscaling and cluster autoscaling, and to avoid problems the CPU limits were set to 0.5 vCPU (500m). The end result was Kubernetes creating a myriad of nodes running 70% idle to satisfy the cluster's autoscaling policy, because a 1 vCPU node does not have much headroom to accommodate peaks. For example, if you have 3 or 4 pods with a 500m CPU limit running on a single 1 vCPU node and it so happens that two peak at the same time, resource limits will be hit and cluster autoscaling will kick in to create yet another node just to meet that demand. In practice this means that for each and every 1 vCPU node to accommodate the peak demand of a single pod without triggering cluster autoscaling, it needs to run at least 50% idle.
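
To make that arithmetic concrete, here is a rough sketch in Python using Kubernetes-style milliCPU units; the pod count and the 100m baseline usage for non-peaking pods are assumptions for illustration, not numbers from the article:

    # Headroom arithmetic for a 1 vCPU node with 500m pod limits.
    NODE_MILLICPU = 1000   # 1 vCPU node
    POD_LIMIT_M = 500      # per-pod CPU limit of 500m
    pods_on_node = 4       # pods packed onto the node by their (lower) requests
    baseline_m = 100       # assumed usage of a non-peaking pod

    # Two pods peaking to their 500m limit at the same time saturate the node:
    peak_demand_m = 2 * POD_LIMIT_M + (pods_on_node - 2) * baseline_m
    print(peak_demand_m, peak_demand_m > NODE_MILLICPU)  # 1200 True -> another node gets created

    # To absorb even one pod's 500m peak without a new node, the node must
    # keep 500m free, i.e. run at least 50% idle:
    print(POD_LIMIT_M / NODE_MILLICPU)                   # 0.5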

This problem is mitigated by replacing the 1 vCPU nodes with nodes that have a higher vCPU count (the author switched to nodes with 16 vCPUs), because those have enough headroom to scale up deployments without having to launch new nodes.
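
For comparison, the same back-of-the-envelope arithmetic on a 16 vCPU node; the pod count and baseline usage are again assumed, not taken from the article:

    # Same sketch on a 16 vCPU node.
    NODE_MILLICPU = 16_000  # 16 vCPU node
    POD_LIMIT_M = 500       # per-pod CPU limit of 500m
    pods_on_node = 24       # assumed packing by requests
    baseline_m = 100        # assumed usage of a non-peaking pod

    # Even if a third of the pods peak to their 500m limit at once,
    # the node still has plenty of headroom, so no extra node is launched:
    peaking = 8
    peak_demand_m = peaking * POD_LIMIT_M + (pods_on_node - peaking) * baseline_m
    print(peak_demand_m, peak_demand_m < NODE_MILLICPU)  # 5600 True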



