
Using Amazon Auto Scaling with Stateful Applications - amagnus
http://adrienmagnus.com/using-amazon-auto-scaling-with-stateful-applications
======
jcrites
Glad to see writeups and lessons learned about AutoScaling. As a system
builder, I think it's a capability that's under-appreciated. My rule of thumb
is that essentially every machine fleet benefits from being managed as an
AutoScaling Group (whether stateless or not - there's value even without
dynamic scaling).

> If you’re working with a queue based model then scaling will be done based
> on the SQS queue size, otherwise we’ll use the custom metric “number of
> running jobs”.

There is another strategy to consider when auto-scaling queue-based
applications: if you have a fixed number of threads per machine instance that
process work from a queue, then you can scale based on the percentage of
threads occupied.

You can treat this as a utilization percentage similar to CPU Utilization, the
observation being that you can begin scaling up in advance of ever actually
developing a (long) queue. For example, consider an application where the
average queue size is zero, and the average thread utilization is zero.
Consider another application where the average queue size is zero, because
messages are processed almost immediately, but 19 out of 20 threads are
occupied on average. You can conclude that the first application is nearly
idle, while the second is nearly maxed out, even though both of them have
empty queues. By considering the second application to be (19/20 =) 95%
utilized, you can establish a scaling policy that scales up before a backlog
ever develops -- assuming you want to avoid a long queue in the first place,
which is desirable in some cases. Whether that matters depends on how quickly
you need messages processed, i.e. your SLA.
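As a concrete sketch of the thread-utilization idea (the function name and the numbers below are illustrative, not from the article):

```python
# Sketch of the thread-utilization metric described above.
# Names and numbers are illustrative; a real worker would sample its own
# thread pool and publish this value as a custom CloudWatch metric.

def utilization_percent(busy_threads: int, total_threads: int) -> float:
    """Percentage of worker threads currently processing a message."""
    if total_threads == 0:
        return 0.0
    return 100.0 * busy_threads / total_threads

# The example from this comment: 19 of 20 threads busy means the fleet
# is 95% utilized, even while the queue itself sits empty.
print(utilization_percent(19, 20))  # 95.0
print(utilization_percent(0, 20))   # 0.0 -- the nearly idle application
```

A worker would report this periodically (e.g. via CloudWatch's PutMetricData API), and the scaling policy would target a threshold such as 70-80%, well below the point where a backlog forms.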

(The article touches on this as well, talking about number of running jobs.)

Queue size can be useful as well, but I think it is more difficult to tune.
Percent of thread capacity works well regardless of how large your fleet is or
how expensive each message is to process. By comparison, a large-scale
system that processes thousands of messages per second could develop a large
queue -- in terms of number of items -- from a brief blip, which it will burn
through momentarily. A 20,000 item queue might be nothing to such a system,
whereas for another system, 100 items could be significant, if each one is a
10GB video to download and transcode.
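One way to see why raw item counts mislead is to convert the backlog into an estimated drain time. This sketch uses made-up figures chosen to mirror the two systems above (a fast high-throughput fleet versus a small fleet transcoding large videos):

```python
# Estimated time to drain a queue, given backlog size and processing capacity.
# All numbers are hypothetical, chosen to mirror the two systems above.

def drain_seconds(queue_items: int, seconds_per_item: float,
                  hosts: int, threads_per_host: int) -> float:
    """Backlog expressed as wallclock time rather than item count."""
    total_threads = hosts * threads_per_host
    return queue_items * seconds_per_item / total_threads

# 20,000 cheap messages on a large fleet: a blip, gone in under a second.
print(drain_seconds(20_000, 0.01, 50, 20))  # roughly 0.2 seconds

# 100 ten-gigabyte transcode jobs on a small fleet: hours of work.
print(drain_seconds(100, 600.0, 4, 2))      # 7500.0 seconds
```

Measured in wallclock time, the "small" 100-item queue is vastly deeper than the 20,000-item one.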

The ideal auto-scaling solution typically involves a mix of multiple
measurement techniques, since it's rare that any single performance
characteristic captures an application's load perfectly. I would definitely
agree with the author's point that instrumentation is highly valuable.

~~~
honkhonkpants
Queues should be measured in wallclock time, not in terms of items. That
removes the problem you mention.

~~~
brightball
The only other consideration is the stress jobs put on peripheral resources
like the database. If some jobs are intense enough that they must run
sequentially, they can inflate a wallclock number even when adding instances
wouldn't help.

You essentially need a wallclock metric restricted to jobs that actually
benefit from scaling.
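That restricted metric could be sketched like this (the job structure and the `parallelizable` flag are hypothetical):

```python
# Hypothetical sketch: wallclock backlog counting only jobs that can
# actually be parallelized, so serial jobs don't trigger useless scale-ups.

from dataclasses import dataclass

@dataclass
class Job:
    est_seconds: float     # estimated wallclock cost of the job
    parallelizable: bool   # False for jobs forced into sequential execution

def scalable_backlog_seconds(jobs: list[Job]) -> float:
    """Wallclock backlog from jobs that would benefit from more instances."""
    return sum(j.est_seconds for j in jobs if j.parallelizable)

jobs = [Job(120.0, True), Job(3600.0, False), Job(30.0, True)]
print(scalable_backlog_seconds(jobs))  # 150.0 -- the serial job is excluded
```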

------
stretchwithme
You'll want to make sure you don't autoscale just because a shared resource
like a database is not working correctly. There's no point in having more
instances up if the new ones don't work either.
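A minimal sketch of that guard, assuming a hypothetical health check on the shared resource:

```python
# Hypothetical guard: suppress scale-up when the shared dependency is the
# problem, since new instances would hit the same broken database.

def should_scale_up(thread_utilization: float, db_healthy: bool,
                    threshold: float = 0.8) -> bool:
    """Scale up only if the fleet is busy AND the dependency is healthy."""
    return db_healthy and thread_utilization > threshold

print(should_scale_up(0.95, db_healthy=True))   # True  -- genuinely busy
print(should_scale_up(0.95, db_healthy=False))  # False -- don't pile on
```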

Elastic Beanstalk can be very useful, especially for deploying new versions of
your app. It has a CLI that makes it easy to SSH into your instances,
application environments can be configured to ship their logs to an S3 bucket,
and it supports time-based scaling events, so you can run more hosts during
the day if that's what you need.
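Time-based scaling of that kind can also be expressed as scheduled actions on the underlying Auto Scaling group. A hedged sketch of the request parameters (group name, sizes, and schedule are all made up):

```python
# Sketch: scheduled scale-up during business hours, scale-down at night.
# Group name, sizes, and cron schedules are hypothetical. With boto3 these
# dicts would be passed to autoscaling.put_scheduled_update_group_action().

daytime = {
    "AutoScalingGroupName": "my-app-asg",   # hypothetical group name
    "ScheduledActionName": "business-hours-up",
    "Recurrence": "0 8 * * 1-5",            # cron (UTC): 08:00 Mon-Fri
    "MinSize": 4,
    "DesiredCapacity": 6,
}
nighttime = {
    "AutoScalingGroupName": "my-app-asg",
    "ScheduledActionName": "business-hours-down",
    "Recurrence": "0 20 * * 1-5",           # scale back in the evening
    "MinSize": 1,
    "DesiredCapacity": 2,
}
print(daytime["Recurrence"])  # 0 8 * * 1-5
```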

~~~
amagnus
Good point, and that's when good monitoring comes into play.

------
jjeaff
I would be curious what variance of load would be required for this type of
auto scaling to be worth the time vs just putting your own bare metal in a
colocation facility or leasing dedicated boxes.

The price-performance is so much better on dedicated. Or, for that matter,
using reserved instances instead of scaling with spot pricing.

~~~
amagnus
It comes down to the business decision of going all-in on cloud or bare
metal. Day-to-day management of each is pretty different.

------
stephengillie
Stateful applications can use vertical autoscale to hot-add CPUs. While this
has a lower ceiling than horizontal autoscale, the two can be combined with
sticky persistence to maintain state at scale.

