
When AWS Autoscale Doesn’t - fullung
https://segment.com/blog/when-aws-autoscale-doesn-t/
======
DVassallo
The way I've been happiest using EC2 Auto Scaling was to have a single cron
job continuously calculating how many instances I should be running, then
setting the desired capacity manually with the Auto Scaling API[1]. This may seem
to defeat the purpose of Auto Scaling, but it's actually much more convenient
than spinning up/down EC2 instances with the EC2 API. You get to precisely
control how to scale, and won't be at the mercy of the Auto Scaling
heuristics.

[1] [https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-man...](https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-manual-scaling.html)
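A minimal sketch of this pattern (the boto3 usage, thresholds, and the requests-per-second driver are my own assumptions for illustration, not anything from the comment above):

```python
import math

def desired_capacity(current_rps: float, rps_per_instance: float,
                     min_size: int = 2, max_size: int = 100) -> int:
    """How many instances we *should* be running, clamped to the group bounds."""
    needed = math.ceil(current_rps / rps_per_instance)
    return max(min_size, min(max_size, needed))

def apply_capacity(group_name: str, capacity: int) -> None:
    # SetDesiredCapacity is absolute and idempotent: if the AWS API is down,
    # the call simply fails and the fleet keeps its current size.
    import boto3  # assumed available (pip install boto3)
    boto3.client("autoscaling").set_desired_capacity(
        AutoScalingGroupName=group_name,
        DesiredCapacity=capacity,
        HonorCooldown=False,
    )
```

Run something like this from cron every minute against whatever scaling driver you trust.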

~~~
samstave
So we had this cause a spectacular outage a few years ago.

We were doing exactly this - but we had a flaw: we didn't handle the case when
the AWS API was actually down.

So we were constantly monitoring for how many running instances we had - but
when the API went down, just as we were ramping up for our peak traffic - the
system thought that none were running because the API was down - so it just
kept continually launching instances.

The increased scale pummeled the control plane with thousands of instances all
trying to come online and pull down the data they needed to get operational --
which then killed our DBs, pipeline, etc...

We had to reboot our entire production environment at peak service time...

~~~
DVassallo
That's not the right way to do it. You shouldn't monitor how many instances
you're running. You just need to determine how many instances you should be
running based on your scaling driver (cpu, # of users, database connections,
etc). Then you call the Auto Scaling SetDesiredCapacity API with the number,
and it is idempotent[1]. If the AWS API is down, your fleet size just won't
change.

[1]
[https://docs.aws.amazon.com/autoscaling/ec2/APIReference/API...](https://docs.aws.amazon.com/autoscaling/ec2/APIReference/API_SetDesiredCapacity.html)

~~~
icelancer
> That's not the right way to do it.

The poster is aware of this, which is why they talked specifically about what
they did wrong.

~~~
zepolen
While the poster was aware of it, he did not provide a solution, whereas
DVassallo provided a valuable step-by-step on how to do it properly, which may
help others in the future.

Think long and hard why you felt it necessary to make your comment and what
value it actually provided.

~~~
icelancer
> Think long and hard why you felt it necessary to make your comment and what
> value it actually provided.

Yeah, that's what I was telling the other poster in a nicer way. Same goes to
you. You can provide advice without repeating criticism for no reason when the
person specifically said they did something wrong.

And save me the lecturing on "long and hard." My comment has 15 upvotes, so
it's pretty unlikely I'm the one off the mark in this conversation.

~~~
zepolen
> You can provide advice without repeating criticism for no reason when the
> person specifically said they did something wrong.

That statement is a joke, here it is reworded:

> A person should be invulnerable to criticism as long as they make a humbling
> remark.

Doesn't sound so great now does it.

Also, I'm not surprised you got 15 upvotes. This place stopped being a hacker
forum years ago. Too many eternal, politically correct Septembers.

~~~
icelancer
A strawman argument + when your view is not popular, the environment must be
the problem. Classic undefeatable argument. I'm surprised you have problems
getting along here.

~~~
zepolen
> when your view is not popular, the environment must be the problem

You were the one using upvotes to validate your argument when it's a fact that
posting a political opinion in either a left and right forum will net highly
different responses. Of course the environment plays a part.

Won't even bother dissecting the first shot. Your arguments have been weak at
best till now; this final one was the final straw, man.

------
avitzurel
There are many limitations with AWS auto scaling that you need to "read
between the lines" to discover.

For example, we have daemons reading messages from SQS. If you try to use auto
scaling based on SQS metrics, you come to realize pretty quickly that
CloudWatch is updated every 5 minutes. For most messages, this is simply too
late.

In a lot of cases, you are better off updating CloudWatch yourself on your own
interval using Lambda functions (for example) and letting the rest follow the
path of AWS-managed auto scaling.
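A rough sketch of that workaround, e.g. as the body of a Lambda on a one-minute schedule (the namespace, metric name, and the consumer-sizing helper are invented for illustration):

```python
import math

def tasks_for_backlog(depth: int, msgs_per_task_per_min: int,
                      min_tasks: int = 1) -> int:
    """Invented helper: how many consumers the current backlog calls for."""
    return max(min_tasks, math.ceil(depth / msgs_per_task_per_min))

def publish_queue_depth(queue_url: str) -> int:
    import boto3  # assumed available in the Lambda runtime
    sqs = boto3.client("sqs")
    depth = int(sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessages"],
    )["Attributes"]["ApproximateNumberOfMessages"])
    # Push the depth on our own schedule instead of waiting for the built-in
    # 5-minute SQS metrics; an autoscaling policy can then target this metric.
    boto3.client("cloudwatch").put_metric_data(
        Namespace="Custom/SQS",
        MetricData=[{"MetricName": "QueueDepth", "Value": depth,
                     "Unit": "Count"}],
    )
    return depth
```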

There is also cascading auto scaling that you need to follow. If we take ECS,
for example, you need auto scaling for the running containers (Tasks) AND,
after that, auto scaling for the EC2 resources. These have different scaling
speeds: containers scale almost instantly, while instances scale much slower.
Even if you bake your own image, there is still a significant delay.

~~~
eunoia
Out of curiosity, what’s the use case for running ECS on EC2 (instead of using
Fargate) these days?

~~~
NathanKP
AWS employee here. If you are able to achieve consistently greater than 50%
utilization of your EC2 instances, or have a high percentage of spot or
reserved instances, then ECS on EC2 is still cheaper than Fargate. If your
workload is very large, requiring many instances, this may make the economics
of ECS on EC2 more attractive than using Fargate. (Almost never the case for
small workloads, though.)

Additionally, a major use case for ECS is machine learning workloads powered
by GPUs, and Fargate does not yet have this support. With ECS you can run p2
or p3 instances and orchestrate machine learning containers across them, with
GPU reservation and even GPU pinning.

~~~
chrissnell
I'm not totally up to speed on ECS vs EKS economics, but it seems like EKS with
p2/p3 would be a sweet solution for this. Even better if you have a mixed
workload and you want to easily target GPU-enabled instances by adding a
toleration to the podspec.

~~~
NathanKP
Kubernetes GPU scheduling is currently still marked as experimental:
[https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus...](https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/)

ECS GPU scheduling is production ready, and the initial getting-started
workflow is streamlined quite a bit because we provide a maintained
GPU-optimized AMI for ECS that already has the NVIDIA kernel drivers and
Docker GPU runtime. ECS supports GPU pinning for maximum performance, as well
as mixed CPU and GPU workloads in the same cluster:
[https://docs.aws.amazon.com/AmazonECS/latest/developerguide/...](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-gpu.html)

------
jedberg
Auto-scaling depends on startup time. If the startup time for a new
instance/container is 5 seconds, then you need to predict what your traffic
will be in 5 seconds. If your startup time is 10 minutes, then you need to
predict your traffic in 10 minutes.

The choice of metric is important, but it needs to be a metric that predicts
future traffic if you want to autoscale user facing services. CPU load is not
that metric.

The best way to do autoscaling is to build a system that is unique to your
business to predict your traffic, and then use AWS's autoscaling as your
backup for when you get your prediction wrong.
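A toy illustration of the lead-time point (linear extrapolation here is just a stand-in for whatever business-specific predictor you actually build):

```python
def predicted_rps(samples, startup_s: float) -> float:
    """Extrapolate traffic `startup_s` seconds ahead from the last two
    (t_seconds, rps) samples, since new capacity only helps by then."""
    (t0, r0), (t1, r1) = samples[-2], samples[-1]
    slope = (r1 - r0) / (t1 - t0)
    return max(0.0, r1 + slope * startup_s)

# With a 10-minute boot time, you scale for the traffic 600s out, not the
# traffic you see now: samples [(0, 100), (60, 160)] trend at +1 req/s per s.
```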

------
Legogris
Having used ECS quite a bit, I do not recommend anyone building a new stack
based on it. Kubernetes solves everything ECS solves, but usually better and
without several of the issues mentioned here. Last time I checked, AWS was
still lagging behind Azure and GCP on Kubernetes, but I have a strong feeling
they're prioritizing improving EKS over ECS.

If you're already invested in ECS it's a different story, of course.

~~~
pram
I think that Fargate is the “improvement” for ECS. I never understood the
appeal of ECS in the first place; it seemed (and still does) really half baked.

~~~
NathanKP
AWS employee here. Sorry to hear that you feel ECS is half baked. Feel free to
reach out directly using the details in my profile info if you have any
feedback you'd like me pass on to the team.

To clear up the confusion about the relationship between Fargate and ECS, think
of Fargate as the hosting layer: it runs your container for you on demand and
bills you for the CPU and memory your container reserved, per second. ECS, on
the other hand, is the management layer. It provides the API that you use
to orchestrate launching X containers, spreading them across availability
zones, and hooking them up to other resources automatically (like load
balancers, service discovery, etc).

Currently you can use ECS without Fargate, by providing your own pool of EC2
instances to host the containers on. However, you cannot use Fargate without
ECS, as the hosting layer doesn't know how to run your full application stack
without being instructed to by the ECS management layer.

~~~
pram
From my perspective Fargate offers the functionality I would have expected
from ECS in the first place. What ECS provides OOB requires too much
janitoring and ultimately isn't terribly different in effort compared to
running your own k8s or mesos infra on EC2 instances you provisioned yourself.
You still basically needed an orchestration layer over ECS.

Which is why, I assume, Fargate is now listed as an integral feature of ECS on
the product page.

~~~
NathanKP
Yeah, to be clear, Fargate container hosting was always the vision for ECS,
from the very first internal proposal to build this system. But it's necessary
to first build something that keeps track of container state at scale, and
that is ECS. We built ECS so that it can track container state both in
Fargate and in containers running on your own self-managed EC2 hosts. This
gives you the most flexibility if you have really specific needs for your
container hosts that Fargate can't cover for you.

------
xfitm3
I'm surprised I didn't see application performance monitoring mentioned here.
A lot of applications are complex and in those cases adding containers is only
effective until you reach the next constraint.

Having two resources (such as DB and app) scale in concert can be exceedingly
difficult.

~~~
eikenberry
> Having two resources (such as DB and app) scale in concert can be
> exceedingly difficult.

This means your resources are too tightly coupled. If they are so tightly
coupled that they need to scale together, then they are not really two
resources; you should look into either restructuring them into two actual
resources or binding them more closely into a single resource.

~~~
xfitm3
In my example of DB and backend: How could they be decoupled?

~~~
eikenberry
As far as runtime, applications and DBs are already decoupled. You have N
application instances mapped to M database instances. Applications can usually
scale pretty much with load. Databases vary wildly in how they scale and it
depends on the DB type.

------
tyingq
Vertical autoscale is on my wish list. Some way to automatically scale
instance size for those things that don't scale well horizontally.

~~~
etaioinshrdlu
Jelastic claims to do this, and the marketing made it sound so cool.
[https://jelastic.com/](https://jelastic.com/)

But when I tried it, it turns out the docker support requires very specific
base images.

Not really docker then is it?!?

~~~
RuslanSJ
Hi, Jelastic founder here. Thank you for mentioning our product. Vertical
scaling is not just marketing :), it's reality.

Jelastic public cloud providers offer automatic vertical scaling with a
pay-as-you-use billing model (please do not confuse it with pay-as-you-go).
Beyond this, our team helps related technologies become more elastic, for
example Java:
[https://jelastic.com/blog/elastic-jvm-vertical-scaling/](https://jelastic.com/blog/elastic-jvm-vertical-scaling/)

Regarding the Docker support, there are two flavors inside Jelastic: 1) Native
Docker Engine - you can create a dedicated container engine for your project
in the same way as you do on any IaaS today, for example “How to run Docker
Swarm”:
[https://jelastic.com/blog/docker-swarm-auto-clustering-and-s...](https://jelastic.com/blog/docker-swarm-auto-clustering-and-scaling-with-paas/).
An advantage here is the vertical scaling feature: in Jelastic, unused
resources are not billed, while at any other cloud provider you have to pay
for the VM's resource limits.

2) Enhanced System Containers based on a Dockerfile - there is no need to
provision a dedicated Docker engine or swarm. This solution provides even
better density, elasticity, multi-tenancy and security, and more advanced
integration with the UI and PaaS feature set compared to #1. It supports
multiple processes inside a single container; you can get SSH access and use
all the standard tools for app deployment, write to the local filesystem, use
multicast and so on. It supports traditional or legacy apps, while images can
be prepared in the same familiar Dockerfile format. Unfortunately it's not
fully compatible with the Native Docker Engine due to specific
limitations/requirements of the Docker technology itself.

Thank you for pointing out this issue. In the upcoming release we will clarify
the difference between the two and provide more tips on which one is better to
use in various cases.

~~~
etaioinshrdlu
Regarding 1: Any easy set up guide for native mode?

Regarding 2: Is there any fundamental reason why full compatibility will never
work?

~~~
RuslanSJ
1 - An easy way is to go to the marketplace, type Docker in the search field,
choose Engine or Swarm, and press install. There is one more article that will
be helpful:
[https://jelastic.com/blog/docker-engine-auto-install-connect...](https://jelastic.com/blog/docker-engine-auto-install-connect-ssh-portainer/)

2 - As Docker, Open Containers and other related technologies evolve, the
difference between system and application containers gets slimmer over time.
As an example, the well-known issue with memory limits
([https://jelastic.com/blog/java-and-memory-limits-in-containe...](https://jelastic.com/blog/java-and-memory-limits-in-containers-lxc-docker-and-openvz/))
can now be solved with the help of lxcfs:
[https://medium.com/@Alibaba_Cloud/kubernetes-demystified-usi...](https://medium.com/@Alibaba_Cloud/kubernetes-demystified-using-lxcfs-to-improve-container-resource-visibility-86f48ce20c6).
I hope at some point we will be able to use the benefits of both in one
container engine.

------
b5u
I've also been deploying services on ECS for close to a year now and would
like to address some inaccuracies the author seems to have made:

1) In 'Surprise 1' the author offers examples where the CPU utilization target
is between 80% and 95% _without_ mentioning the reserved CPU/memory (aka size)
of those tasks (under the assumption that he's using the Fargate launch type).
The 'size' of a task also influences the average CPU target utilization. For
instance, if a task reserves 4 vCPUs, a spike from 80% to 95% is handled
differently than when a task reserves 1 or 2 vCPUs. The same goes for memory.
In an example setup I'd use tasks sized at 1-2 vCPUs with a _service-wide_
target avg. CPU utilization of 70%, along with a StepScaling policy which adds
10% more tasks if the service avg. CPU utilization falls between 70-80%, 20%
if between 80-90%, and 25% if above 90%. My strategy has been smaller-sized
tasks, a lower service avg. CPU utilization (compared to 80%-90%), and shorter
evaluation periods/fewer datapoints for the scale-out CW alarms (the minimum
being 60 seconds IIRC). The short evaluation periods/low number of datapoints
for the CW alarm allowed me to handle spikes reasonably fast.
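The step-scaling scheme described in point 1) translates to roughly this shape in the Application Auto Scaling API (bounds are offsets from a 70% alarm threshold; this is my reading of the setup, not the author's actual config):

```python
# PercentChangeInCapacity: each step adds a percentage of the current task count.
step_policy = {
    "AdjustmentType": "PercentChangeInCapacity",
    "MetricAggregationType": "Average",
    "StepAdjustments": [
        # avg CPU 70-80% -> +10% tasks
        {"MetricIntervalLowerBound": 0.0,
         "MetricIntervalUpperBound": 10.0,
         "ScalingAdjustment": 10},
        # avg CPU 80-90% -> +20% tasks
        {"MetricIntervalLowerBound": 10.0,
         "MetricIntervalUpperBound": 20.0,
         "ScalingAdjustment": 20},
        # avg CPU above 90% -> +25% tasks
        {"MetricIntervalLowerBound": 20.0,
         "ScalingAdjustment": 25},
    ],
}
```

This dict would go into the `StepScalingPolicyConfiguration` of a `put_scaling_policy` call (or the equivalent Terraform resource).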

2) In 'Surprise 3' the author claims that Terraform's
aws_appautoscaling_policy 'is rather light on documentation'. Having used
Terraform for several years, I find this inaccurate, mostly because of the
several examples available in the documentation
([https://www.terraform.io/docs/providers/aws/r/appautoscaling...](https://www.terraform.io/docs/providers/aws/r/appautoscaling_policy.html)),
as well as the many more that a GitHub exact search for
"aws_appautoscaling_policy" language:HCL reveals from open-source repos (some
with permissive licenses too). I created a custom ecs-service TF module which
(optionally) creates for each service an ALB along with listeners and the
attached ACM-issued TLS certs and TGs, the scale-in/out CW alerts with
configurable thresholds/policies, SGs, Route53, etc., allowing one to
configure and launch an ECS service quickly and reliably.

Regarding scale-in, I typically also have that at intervals between 5-15
minutes to avoid an erratic scale-in/scale-out 'zig-zag', even at the cost of
briefly over-provisioning.

------
argd678
The biggest scaling issue I always run into is the database is the bottleneck
and there’s not a lot of options for most databases to auto scale them.

~~~
brianwawok
Yup. Your DB usually has to be over-provisioned for peak WRITE capacity.

Read capacity is easy to scale to infinity with caches. But if a DB can only
write 1000 updates per second, nothing will change that.

In many cases - it's ok to not process EVERYTHING right away. Process the
important stuff RIGHT AWAY. Slowly process the unimportant stuff in your spare
time.
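One generic way to realize "important now, unimportant later" (a sketch, nothing DB-specific): buffer writes and drain them in strict priority order, so deferred work only proceeds when urgent work is done.

```python
import heapq
import itertools

class PriorityWriteBuffer:
    """Urgent items always come out before deferred ones; FIFO within a tier."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker preserves insertion order

    def put(self, item, urgent: bool) -> None:
        tier = 0 if urgent else 1
        heapq.heappush(self._heap, (tier, next(self._seq), item))

    def get(self):
        return heapq.heappop(self._heap)[2]

buf = PriorityWriteBuffer()
buf.put("analytics-event", urgent=False)  # can wait for spare write capacity
buf.put("payment-update", urgent=True)    # must hit the DB right away
```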

------
mnutt
The biggest challenges I’ve had with auto scaling have been slow scaling time
and default metrics not being a good proxy for scaling needs. One thing I was
mildly curious about: if you’re going to build your own metrics and scaler,
what would be some of the downsides of having it scale down by just putting
instances in the Stopped state, then scale up by starting them? In my
experience starting takes seconds while launching new instances takes minutes.

Having to deploy updates to stopped instances would be complicated, and you'd
have to pay EBS costs for stopped instances, but I'm curious if there are
other issues. When launching an instance from an AMI, even after the instance
comes up, the disk tends to be very slow for some time, as if it's lazily
loading the filesystem over the network.

------
acd
AWS needs to glue EC2 and ECS scheduling together. Today the schedulers are
separate, so basically the feet don't know what the arms are doing. That
leaves fixing this scaling up to the client, meaning a duplicated code effort
solving the same thing for each AWS customer.

~~~
NathanKP
AWS employee here.

This is a feature that is currently on our public roadmap for container
services, in the "Researching" category:
[https://github.com/aws/containers-roadmap/issues/76](https://github.com/aws/containers-roadmap/issues/76)

Feel free to drop a thumbs up on the roadmap item to show your support and
boost its priority on the roadmap, or leave a comment to let us know more
about your needs.

------
anbotero
Quick question for the AWS employee solving inquiries: I used ECS in 2017, and
back then there was this weird issue where sometimes tasks would switch to new
versions in like, a minute (if that), but sometimes, like 2/10, it would take
like 10-12 minutes just for it to start killing old Task containers. Back then
there wasn't any timeout option or anything to force the killing. Do you know
if now there is? The project was killed for different reasons, but I really
liked everything else on ECS. Thanks!

EDIT: I meant killing containers, not the Tasks themselves. Sorry.

------
doctorpangloss
An incredible amount of software and infrastructure is written precisely for
analytics data gathering workloads.

I'm pretty confident AWS's product for this use case would be Lambda and the
new on-demand DynamoDB.

Is there actually a use case in analytics that requires a server that accepts
connections from multiple clients, and then has to have <60ms latency
including state over the wire and executing sophisticated business logic,
between those clients, for time periods longer than 5 seconds? I.e. something
that resembles a video game?

Because if there isn't, if your goal is to scale, why have containers at all?

~~~
thecopy
Batching incoming requests, for one. Kinesis only allows 5 write requests per
second per shard, for example. Also, Lambda has limits on concurrent
executions and is very slow (10s) if it needs VPC connectivity (in which case
the default concurrent Lambda limit is 350 due to ENIs)

~~~
meekins
Hmm... I don't see anything in the docs implying that - the Kinesis API docs
say it's possible to ingest 1000 records or 1MB per shard per second. There is
a 5/s limit on reads, however, but those deal with batches of records anyway.

We have one service running that consumes data to a Kinesis stream published
as an API GW endpoint. Preprocessing is done in Lambda in batches of 100
records and the processed records get pushed to Firehose streams for batched
loads to a Redshift cluster for analytics. So far we've been very happy with
the solution - very little custom code, almost no ops required and it performs
and scales well.

~~~
thecopy
Yes, sorry, I confused it with the GetRecords limit.

------
eridius
> _For example, the maximum value for CPU utilization that you can have
> regardless of load is 100%._

I'm surprised it can't use load average to estimate the true resource demand.

~~~
wahnfrieden
You can, but it’s not visible to the hypervisor (it’s an OS concept) so you
have to publish that metric from an agent on the machine. Then you can use it
for autoscaling.
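A minimal version of such an agent might look like this (the namespace and metric name are invented; assumes boto3 and instance-profile credentials on the machine):

```python
import os

def read_load_1m() -> float:
    # os.getloadavg() returns the 1-, 5-, and 15-minute load averages
    return os.getloadavg()[0]

def publish_load(instance_id: str) -> None:
    import boto3  # assumed installed on the instance
    boto3.client("cloudwatch").put_metric_data(
        Namespace="Custom/System",
        MetricData=[{
            "MetricName": "LoadAverage1m",
            "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
            "Value": read_load_1m(),
        }],
    )
```

Run it on a short interval (e.g. from cron) and point an autoscaling policy at the resulting custom metric.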

~~~
viraptor
But even then, "load" doesn't work well for all workloads:
[http://www.brendangregg.com/blog/2017-08-08/linux-load-avera...](http://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html)

The number of switching tasks may be very high for a number of reasons,
including a very large number of threads which each do a very small chunk of
work and yield.

------
ravedave5
I got bit by this! Even worse is that one of the servers crumpled because we
didn't scale up fast enough - so AWS killed it because of the health metric.
Which then took out the remaining two because they were then far, far over
capacity. I got the pager duty alert and found a total cluster and just
manually set it to scale up way bigger. Now for all big events we manually
bump minimum server counts for that period :\

------
Buge
> _So, if you’re targeting 95% CPU utilization in a web service, the maximum
> amount that the service scales out after each cooldown period is 11%: 100 /
> 90 = 1.1_

How many errors are in that sentence? The mysterious 95% -> 90 conversion.
100/90 is actually 1.111 (repeating, of course), not 1.1. And if it did equal
1.1, it would be 10%, not 11%.

------
auslander
> .. the ECS dashboard does not yet support .. Terraform ..

They haven't matured enough yet, it seems. CloudFormation is the right way to
code your infra, not the web Console or anything else.

Good sign, though, is they use ECS, not Kubernetes :)

------
svsucculents
That's why you don't use AWS 'auto' scaling. Every application has its sweet
spot, and it's simpler to roll your own when you know best.

------
jugg1es
I think most people want to write this kind of blog after slogging through the
AWS learning curve. Then they figure out how to use it and the urge goes away.

------
k__
_°whispers°_ cloud native...!

