For example, we have daemons reading messages from SQS. If you try to auto scale based on SQS metrics, you realize pretty quickly that CloudWatch is only updated every 5 minutes. For most messages, that is simply too late.
In a lot of cases, you are better off updating CloudWatch yourself at your own interval (with a Lambda function, for example) and letting the rest follow the usual AWS managed auto scaling path.
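A minimal sketch of that pattern, assuming a Lambda invoked every minute by an EventBridge schedule (the queue URL, namespace, and metric name here are made up for illustration):

```python
from datetime import datetime, timezone


def build_metric_data(metric_name, value, queue_name):
    """Shape a payload for CloudWatch's PutMetricData call."""
    return [{
        "MetricName": metric_name,
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Timestamp": datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": "Count",
    }]


def handler(event, context):
    import boto3  # lazy import keeps the helper above testable without AWS
    sqs = boto3.client("sqs")
    cloudwatch = boto3.client("cloudwatch")
    queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/work-queue"  # hypothetical
    attrs = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessages"],
    )
    depth = int(attrs["Attributes"]["ApproximateNumberOfMessages"])
    cloudwatch.put_metric_data(
        Namespace="Custom/SQS",  # your scaling policy targets this namespace
        MetricData=build_metric_data("QueueDepth", depth, "work-queue"),
    )
```

A target-tracking or step scaling policy can then watch the custom metric with one-minute granularity instead of waiting on the built-in five-minute SQS metrics.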
There is also a cascade of auto scaling that you need to follow. Take ECS: you need auto scaling for the running containers (Tasks), AND on top of that you need auto scaling for the underlying EC2 resources. These scale at very different speeds: containers scale almost instantly, while instances scale much more slowly. Even if you bake your own image, there is still a significant delay.
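To see why both layers need their own policy, consider the arithmetic the slower layer has to satisfy (a sketch with made-up task and instance sizes; in practice ECS capacity providers with managed scaling can do this bookkeeping for you):

```python
import math


def instances_needed(desired_tasks, task_cpu_units, instance_cpu_units):
    # The slow half of the cascade: how many EC2 instances must already be
    # in service before ECS can actually place this many tasks.
    return math.ceil(desired_tasks * task_cpu_units / instance_cpu_units)


# e.g. tasks reserving 512 CPU units on instances with 2048 units:
# scaling from 10 to 16 tasks first needs the fleet to grow from 3 to 4,
# and that instance launch is the part that takes minutes, not seconds.
```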
For example, imagine that each machine has 20 available threads for processing messages received from SQS. Then I'd track a metric which is the percent of threads that are in use. If I'm trying to meet a message processing SLA, then my goal is to begin auto-scaling before that in-use percentage reaches 100%, e.g., we might scale up when the average thread utilization breaches 80%. (Or if you process messages with unlimited concurrent threads, then you could use CPU utilization instead.)
The benefit of this approach is that you can begin auto-scaling your system before it saturates and messages start to be delayed. Messages will only be delayed once the in-use percent reaches 100% -- as long as there are threads available (i.e., in-use < 100%), messages will be processed immediately.
If you were to auto-scale on SQS metrics like queue length, the length stays approximately zero until the system starts falling behind, and by then it's too late; scaling on queue size gives you no way to scale preemptively while load is still increasing. By monitoring and scaling on thread capacity, you can watch your effective utilization climb from 50% to 80% to 100%, and begin scaling before it reaches 100%, before messages start to back up.
The other benefit of this approach is that it works equally well at many different scales: a threshold like 80% thread utilization works just as well with a single host as with a fleet of 100 hosts. By comparison, thresholds on metrics like queue length need to be adjusted as the scale and throughput of the system change.
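A sketch of how that metric could be tracked inside the daemon itself (the class name and threshold are mine, not from any particular library):

```python
import threading


class InstrumentedPool:
    """Tracks what fraction of a fixed worker-thread budget is busy,
    so the percentage can be published as a custom scaling metric."""

    def __init__(self, max_threads=20):
        self.max_threads = max_threads
        self._in_use = 0
        self._lock = threading.Lock()

    def acquire(self):
        with self._lock:
            self._in_use += 1

    def release(self):
        with self._lock:
            self._in_use -= 1

    def utilization(self):
        with self._lock:
            return 100.0 * self._in_use / self.max_threads


def should_scale_up(utilization_pct, threshold=80.0):
    # Scale before saturation: below 100% messages are still processed
    # immediately, so 80% leaves headroom while new capacity comes up.
    return utilization_pct >= threshold
```

In a real daemon you'd publish `pool.utilization()` to CloudWatch on a timer and point a target-tracking policy at it.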
For example (and I know nothing about OP's use case, so I can only guess), you might be able to buffer requests into a queue and let that part of the system scale up more slowly.
You might have auto scaling that needs to be close to real time, and auto scaling that can happen over a span of minutes.
Every auto scaling plan also needs to keep storage scaling in mind; often you are limited by DB write capacity or some other downstream resource.
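As a back-of-the-envelope check, you can estimate the fleet size at which the DB, not compute, becomes the bottleneck (all the numbers here are illustrative):

```python
def max_useful_hosts(provisioned_writes_per_sec, msgs_per_host_per_sec, writes_per_msg):
    # Past this fleet size, extra hosts just queue up behind the database.
    writes_per_host = msgs_per_host_per_sec * writes_per_msg
    return provisioned_writes_per_sec // writes_per_host


# e.g. 1000 provisioned writes/s, 50 msg/s per host, 4 writes per message
# caps the useful fleet at 5 hosts; a MaxCapacity beyond that is wasted.
```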
Additionally, a major use case for ECS is machine learning workloads powered by GPUs, and Fargate does not yet support this. With ECS you can run p2 or p3 instances and orchestrate machine learning containers across them, with GPU reservation and even GPU pinning.
ECS GPU scheduling is production ready, and the initial getting-started workflow is streamlined quite a bit because we provide a maintained GPU-optimized AMI for ECS that already includes the NVIDIA kernel drivers and the Docker GPU runtime. ECS supports GPU pinning for maximum performance, as well as mixed CPU and GPU workloads in the same cluster: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/...
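The `resourceRequirements` field in a container definition is what requests whole GPUs from ECS; a sketch of building one for a RegisterTaskDefinition call (the container name, image, and memory value are placeholders):

```python
def gpu_container_definition(name, image, gpu_count):
    # resourceRequirements with type "GPU" is the ECS field that reserves
    # whole GPUs for this container on a GPU-capable container instance.
    return {
        "name": name,
        "image": image,
        "memory": 8192,  # placeholder value
        "resourceRequirements": [{"type": "GPU", "value": str(gpu_count)}],
    }
```

Containers without a GPU requirement can run in the same cluster, which is how mixed CPU and GPU workloads share one fleet.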
One is your options for doing forensics on Fargate. AWS manages the underlying host, so you give up the option of doing host-level investigations. It's not necessarily worse, as you can fill this gap in other ways.
Logging is currently only via CloudWatch Logs, so if you want to get logs into something like Splunk you'll have to run something that can pick those logs up. You'll have the same issue to solve if you want logs from other AWS services like Lambda to go to the same place. The bigger issue for us is that you can't add additional metadata to log events without building that into your application or getting tricky with log group names. On EC2 we've been using fluentd to add additional context to each log event, like the instance it came from, the AZ, etc. Support for additional log drivers on Fargate is on the public roadmap, so hopefully there will be more options soon.
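One workaround, since Fargate only writes to CloudWatch Logs, is a subscription-filter Lambda that decodes the payload and re-adds the context fluentd would normally provide before forwarding to Splunk. A sketch (the forwarding step and the metadata fields are left as assumptions):

```python
import base64
import gzip
import json


def enrich(event, extra_metadata):
    """Decode a CloudWatch Logs subscription payload and attach the
    metadata that fluentd would normally add on EC2, before forwarding
    to Splunk (the forwarding itself is omitted here)."""
    payload = json.loads(gzip.decompress(base64.b64decode(event["awslogs"]["data"])))
    enriched = []
    for log_event in payload["logEvents"]:
        enriched.append({
            "message": log_event["message"],
            "timestamp": log_event["timestamp"],
            "log_group": payload["logGroup"],    # the only context CW gives you
            "log_stream": payload["logStream"],
            **extra_metadata,                    # e.g. service, environment, AZ
        })
    return enriched
```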
Fargate log driver support v1: https://github.com/aws/containers-roadmap/issues/9
Fargate log driver support v2: https://github.com/aws/containers-roadmap/issues/10
Edit: another one is that you can run ECS on a spot fleet and save some money.