Auto-scaling depends on startup time. If a new instance/container takes 5 seconds to start, you need to predict what your traffic will be 5 seconds from now. If it takes 10 minutes, you need to predict your traffic 10 minutes from now.
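
To make the point concrete, here is a rough sketch where the forecast horizon is simply the startup time. The linear extrapolation, the function names, and the 50 requests/second per instance figure are all made up for illustration; a real forecast would be built from your own traffic patterns.

```python
import math

def forecast_rps(samples, horizon_s):
    """Linearly extrapolate request rate horizon_s seconds past the last sample.

    samples: list of (timestamp_seconds, requests_per_second), oldest first.
    This is a deliberately naive forecast, used only to show the horizon effect.
    """
    (t0, r0), (t1, r1) = samples[0], samples[-1]
    slope = (r1 - r0) / (t1 - t0)
    return max(0.0, r1 + slope * horizon_s)

def instances_needed(rps, rps_per_instance):
    """Instances required to serve rps, never fewer than one."""
    return max(1, math.ceil(rps / rps_per_instance))

# Hypothetical numbers: traffic rising by 1 request/second every second.
samples = [(0, 100.0), (60, 160.0)]
RPS_PER_INSTANCE = 50  # assumed capacity of a single instance

fast_start = instances_needed(forecast_rps(samples, 5), RPS_PER_INSTANCE)    # 5 s startup
slow_start = instances_needed(forecast_rps(samples, 600), RPS_PER_INSTANCE)  # 10 min startup
print(fast_start, slow_start)
```

Same traffic, same moment in time, but the slow-starting fleet has to commit to far more capacity because it is betting on a point much further in the future.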

The choice of metric matters: if you want to autoscale user-facing services, it has to be a metric that predicts future traffic. CPU load is not that metric, because it only reflects load that has already arrived.

The best way to do autoscaling is to build a system, specific to your business, that predicts your traffic, and then use AWS's autoscaling as the backup for when your prediction is wrong.
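
One way to picture the "prediction first, reactive autoscaling as backup" setup is to take the larger of the two answers. This is only a sketch: the function names, the 20% headroom, and the 60% utilization target are assumptions, and the reactive half is a generic target-tracking formula standing in for whatever the cloud autoscaler actually does, not AWS's algorithm.

```python
import math

def predicted_capacity(forecast_rps, rps_per_instance, headroom=1.2):
    """Instances needed for the forecast traffic, with some assumed headroom."""
    return math.ceil(forecast_rps * headroom / rps_per_instance)

def reactive_capacity(current_instances, observed_utilization, target_utilization=0.6):
    """Generic target tracking: scale so utilization moves back toward the target."""
    return math.ceil(current_instances * observed_utilization / target_utilization)

def desired_capacity(forecast_rps, rps_per_instance, current_instances, observed_utilization):
    """Use the prediction, but never go below what the reactive backup asks for."""
    return max(
        1,
        predicted_capacity(forecast_rps, rps_per_instance),
        reactive_capacity(current_instances, observed_utilization),
    )

# Prediction is right: it drives the decision.
print(desired_capacity(480, 50, current_instances=10, observed_utilization=0.5))

# Prediction is too low: the reactive backup catches the miss.
print(desired_capacity(100, 50, current_instances=10, observed_utilization=0.85))
```

Taking the maximum means a bad forecast can only cost you money (too many instances), while the reactive path still covers you when the forecast undershoots.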

