
There are many limitations with AWS auto scaling that you only discover by "reading between the lines".

For example, we have daemons reading messages from SQS. If you try to use auto scaling based on SQS metrics, you realize pretty quickly that CloudWatch is only updated every 5 minutes. For most messages, this is simply too late.

In a lot of cases, you are better off updating CloudWatch yourself at your own interval (using Lambda functions, for example) and letting the rest follow the usual AWS-managed auto scaling path.
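A minimal sketch of that pattern: a Lambda, triggered every minute by a scheduled rule, reads the queue's approximate depth and republishes it as a custom CloudWatch metric. The queue URL, metric name, and namespace here are illustrative assumptions, not anything from the original comment.

```python
# Hypothetical sketch: republish SQS depth to CloudWatch on a tighter
# interval than the built-in ~5-minute SQS metrics. All names are
# placeholders for illustration.

def build_metric_data(queue_name, attributes):
    """Turn SQS GetQueueAttributes output into a PutMetricData payload."""
    visible = int(attributes["ApproximateNumberOfMessages"])
    in_flight = int(attributes["ApproximateNumberOfMessagesNotVisible"])
    return [
        {
            "MetricName": "QueueDepth",
            "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
            "Value": visible + in_flight,
            "Unit": "Count",
        }
    ]

def lambda_handler(event, context):
    # In a real deployment this runs on a 1-minute scheduled rule.
    import boto3  # available by default in the Lambda runtime
    sqs = boto3.client("sqs")
    cloudwatch = boto3.client("cloudwatch")
    queue_url = sqs.get_queue_url(QueueName="work-queue")["QueueUrl"]  # placeholder name
    attrs = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=[
            "ApproximateNumberOfMessages",
            "ApproximateNumberOfMessagesNotVisible",
        ],
    )["Attributes"]
    cloudwatch.put_metric_data(
        Namespace="Custom/SQS",
        MetricData=build_metric_data("work-queue", attrs),
    )
```

Your auto scaling policy then targets the `Custom/SQS` metric instead of the built-in one, so scaling decisions see queue depth with about a one-minute lag instead of five.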

There is also cascading auto scaling that you need to manage. Take ECS, for example: you need auto scaling for the running containers (tasks) AND for the underlying EC2 resources. The two have different scaling speeds -- containers start almost instantly, while instances scale much more slowly. Even if you bake your own image, there is still a significant delay.

The effectiveness of dynamic scaling depends heavily on which metrics you scale on. My recommendation for that sort of system is to auto-scale based on the percent of capacity in use.

For example, imagine that each machine has 20 available threads for processing messages received from SQS. Then I'd track a metric which is the percent of threads that are in use. If I'm trying to meet a message processing SLA, then my goal is to begin auto-scaling before that in-use percentage reaches 100%, e.g., we might scale up when the average thread utilization breaches 80%. (Or if you process messages with unlimited concurrent threads, then you could use CPU utilization instead.)
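The signal described above can be sketched in a few lines. The 20-thread pool and 80% threshold are the example numbers from the comment; the function names are my own.

```python
# Sketch of the utilization-based scaling signal: scale up when the
# fleet-wide average thread utilization breaches a threshold, well
# before any host is fully saturated.

def thread_utilization(in_use, total=20):
    """Percent of worker threads currently processing messages on one host."""
    return 100.0 * in_use / total

def should_scale_up(samples, threshold=80.0):
    """True when the average utilization across hosts reaches the threshold."""
    return sum(samples) / len(samples) >= threshold

# Three hosts reporting in-use thread counts of 17, 18, and 15:
samples = [thread_utilization(n) for n in (17, 18, 15)]  # 85%, 90%, 75%
print(should_scale_up(samples))  # average ~83.3% -> True
```

Each host publishes its own utilization sample (e.g., as a custom CloudWatch metric), and the scaling policy acts on the fleet average.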

The benefit of this approach is that you can begin auto-scaling your system before it saturates and messages start to be delayed. Messages will only be delayed once the in-use percent reaches 100% -- as long as there are threads available (i.e., in-use < 100%), messages will be processed immediately.

If you were to auto-scale on SQS metrics like queue length, the length stays approximately zero until the system starts falling behind, and by then it's too late -- you can't scale preemptively as load increases. By monitoring and scaling on thread capacity instead, you can watch your effective utilization climb from 50% to 80% to 100%, and begin scaling before it reaches 100%, before messages start to back up.
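A toy simulation makes the contrast concrete. Under the simplifying assumption of a fixed per-tick processing capacity, queue length stays at zero right up until arrivals exceed capacity, while utilization gives you warning the whole way up:

```python
# Toy model: each tick, the system can process `capacity` messages.
# Utilization climbs gradually, but the queue only grows once
# utilization has already hit 100% -- too late to react.

def simulate(arrival_rates, capacity=100):
    backlog = 0
    rows = []
    for arrivals in arrival_rates:
        processed = min(arrivals + backlog, capacity)
        backlog = backlog + arrivals - processed
        utilization = 100.0 * min(arrivals, capacity) / capacity
        rows.append((utilization, backlog))
    return rows

for util, queue in simulate([50, 80, 100, 120, 120]):
    print(f"utilization {util:5.1f}%  queue length {queue}")
```

The run shows utilization at 50%, 80%, 100% with the queue still at zero; only after that does the backlog grow (20, then 40). A queue-length threshold fires on the last two ticks; a 80% utilization threshold fires on the second.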

The other benefit of this approach is that it works equally well at many different scales; a threshold like 80% thread utilization works just as well with a single host as with a fleet of 100 hosts. By comparison, thresholds on metrics like queue length need to be adjusted as the scale and throughput of the system change.

From a bird's eye view, you also need to figure out what costs you more.

For example (and I know nothing about the use-case of OP, can only estimate), you might be able to buffer requests into a queue and have it scale up slower.

You might have auto scaling that needs to be close to real time and auto scaling that can happen on a span of minutes.

Every auto scaling setup also needs to keep storage scaling in mind; often you are limited by DB write capacity or similar backend limits.

Out of curiosity, what’s the use case for running ECS on EC2 (instead of using Fargate) these days?

AWS employee here. If you can achieve consistently greater than 50% utilization of your EC2 instances, or have a high percentage of spot or reserved instances, then ECS on EC2 is still cheaper than Fargate. If your workload is very large, requiring many instances, that can make the economics of ECS on EC2 more attractive than Fargate. (This is almost never the case for small workloads, though.)

Additionally, a major use case for ECS is machine learning workloads powered by GPUs, and Fargate does not yet support this. With ECS you can run p2 or p3 instances and orchestrate machine learning containers across them, with GPU reservation and even GPU pinning.

I'm not totally up to speed on ECS vs. EKS economics, but it seems like EKS with p2/p3 instances would be a sweet solution for this. Even better if you have a mixed workload and want to easily target GPU-enabled instances by adding a taint to the podspec.

Kubernetes GPU scheduling is currently still marked as experimental: https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus...

ECS GPU scheduling is production ready, and the initial getting-started workflow is streamlined quite a bit because we provide a maintained GPU-optimized AMI for ECS that already includes the NVIDIA kernel drivers and the Docker GPU runtime. ECS supports GPU pinning for maximum performance, as well as mixed CPU and GPU workloads in the same cluster: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/...

Apart from pricing and the potential to overcommit resources on EC2-ECS, there are a couple of other differences.

One is your options for doing forensics on Fargate. AWS manages the underlying host, so you give up the option of doing host-level investigations. It's not necessarily worse, as you can fill this gap in other ways.

Logging is currently only via CloudWatch Logs, so if you want to get logs into something like Splunk you'll have to run something that can pick these logs up. You'll have that issue to solve anyway if you want logs from some other AWS services, like Lambda, to go to the same place. The bigger issue for us is that you can't add additional metadata to log events without building that into your application or getting tricky with log group names. On EC2 we've been using fluentd to add additional context to each log event, like the instance it came from, the AZ, etc. Support for additional log drivers on Fargate is on the public roadmap[1][2], so there will hopefully be some more options soon.
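The fluentd trick mentioned above can be as small as a record_transformer filter. A sketch -- the tag pattern, field names, and the hard-coded AZ value here are illustrative; in practice you'd fetch the instance ID and AZ from the EC2 instance metadata service at startup:

```
<filter app.**>
  @type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
    availability_zone "us-east-1a"
  </record>
</filter>
```

Every event matching `app.**` then carries the host context downstream, so the receiving system (Splunk, Elasticsearch, etc.) can filter by instance or AZ without the application knowing anything about it.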

[1] Fargate log driver support v1: https://github.com/aws/containers-roadmap/issues/9

[2] Fargate log driver support v2: https://github.com/aws/containers-roadmap/issues/10

At least one reason is that you get to use the leftover CPU and memory for your other containers when you use an EC2 instance. With some workloads this lets you overcommit those resources, if you know all your containers won't max out simultaneously.
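Concretely, that overcommit is expressed in the ECS task definition by giving a container a soft memory limit (`memoryReservation`, used for placement) well below its hard limit (`memory`, the cap it can burst to). A minimal fragment with illustrative values:

```json
{
  "containerDefinitions": [
    {
      "name": "worker",
      "memoryReservation": 256,
      "memory": 1024
    }
  ]
}
```

The scheduler only reserves 256 MiB per task when placing it on an instance, but each task may burst up to 1024 MiB -- which works out as long as they don't all peak at once. Fargate has no equivalent, since you pay for the full task-size allocation.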

Edit: another one is that you can run ECS on spot fleet and save some money.

Fargate is orthogonal to ECS, and the two can be used together. The difference is that instead of spinning up VMs as hosts, configuring them for ECS, and worrying about running just the right number of them, you select Fargate, which does all of that behind the scenes (kind of like Lambda) -- but the VM capacity provided by Fargate is a bit more expensive.

A large part of our ECS capacity is running on spot instances which are much cheaper.

Perhaps cost?

Cost and legacy.

