
Autoscaling Pub/Sub Consumers - imaravic
https://labs.spotify.com/2017/11/20/autoscaling-pub-sub-consumers/
======
sargun
At Netflix, we make considerable use of the autoscaler with our custom
scheduler. Is there any reason you can't autoscale based on queue depth, or on
how long items have been sitting in the queue?
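As a rough illustration of queue-depth scaling, here is a minimal sketch that sizes the consumer pool so the current backlog drains within a target window. It assumes you can read the backlog (for example, the subscription's undelivered-message count) and estimate per-worker throughput; the function and parameter names are hypothetical.

```python
import math


def desired_workers(backlog: int, per_worker_rate: float, target_drain_s: float,
                    min_workers: int = 1, max_workers: int = 100) -> int:
    """Hypothetical helper: workers needed to drain `backlog` in `target_drain_s`."""
    if per_worker_rate <= 0 or target_drain_s <= 0:
        raise ValueError("rate and drain target must be positive")
    # Messages a single worker can clear within the drain window.
    capacity_per_worker = per_worker_rate * target_drain_s
    needed = math.ceil(backlog / capacity_per_worker)
    # Clamp to the configured fleet bounds.
    return max(min_workers, min(max_workers, needed))
```

For example, 1000 queued messages at 10 msg/s per worker with a 20 s drain target asks for 5 workers. Scaling on the oldest message's age instead would work the same way, with a latency target in place of the drain window.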

As for your Docker problems, have you tried upgrading to kernel 4.9.34+? We
ran into the same issue, upgraded our kernels, and haven't looked back.

~~~
imaravic
We did think about using queue depth, but the CPU metric was easier to start
with. One problem with queue depth, though, is that we couldn't figure out how
to throttle the autoscaler so that it doesn't give us more machines when a
downstream service is down.
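One possible way around that concern, sketched here as an assumption rather than anything described in the thread, is to gate scale-up decisions on a downstream health signal such as its error rate. The function name and the 5% threshold are hypothetical.

```python
def gated_target(current: int, proposed: int, downstream_error_rate: float,
                 max_error_rate: float = 0.05) -> int:
    """Hypothetical gate: only allow scale-ups while downstream looks healthy."""
    healthy = downstream_error_rate <= max_error_rate
    if not healthy and proposed > current:
        # The backlog is likely growing because the sink is failing;
        # adding consumers would only add load, so hold the current size.
        return current
    # Scale-downs, and scale-ups while healthy, pass through unchanged.
    return proposed
```

Scale-downs are still allowed while the downstream is unhealthy, so the fleet can shrink but won't grow into an outage.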

We did upgrade Docker, but not the kernel. Switching kernels is a bigger task
that we'll probably take on at some later point.

~~~
sargun
I'd argue that switching kernels is pretty easy. At least for us, there's a
lot less QA to switch kernels than Docker versions. The more often you
upgrade, the easier it is to upgrade, as with almost all things.

We also have a team that does work processing, and they've built a PID
controller to autoscale ASGs based on queue depth and time in queue. Their code
isn't open source, but the concept is a self-tuning autoscaler driven by a
number of variables, letting you prioritize cost, timeliness, or even
throughput.
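To make the PID idea concrete, here is a generic positional PID controller that drives a worker count from the age of the oldest queued message. This is not the Netflix team's code (which isn't public) and it controls only time in queue, not cost or throughput; the class name, gains, setpoint, and baseline are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PIDScaler:
    """Hypothetical positional PID mapping time-in-queue to a worker count."""
    kp: float
    ki: float
    kd: float
    setpoint_s: float            # target age of the oldest message, seconds
    baseline: int = 1            # worker count when the controller output is zero
    min_workers: int = 1
    max_workers: int = 100
    _integral: float = field(default=0.0, init=False, repr=False)
    _prev_error: Optional[float] = field(default=None, init=False, repr=False)

    def update(self, observed_age_s: float, dt_s: float) -> int:
        # Positive error: messages wait longer than the target, so add workers.
        error = observed_age_s - self.setpoint_s
        # Integral term accumulates persistent lag over time.
        self._integral += error * dt_s
        # Derivative term reacts to how fast the lag is changing.
        if self._prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self._prev_error) / dt_s
        self._prev_error = error
        output = (self.kp * error
                  + self.ki * self._integral
                  + self.kd * derivative)
        target = round(self.baseline + output)
        # Clamp to the allowed fleet size.
        return max(self.min_workers, min(self.max_workers, target))
```

Calling `update()` once per metrics interval yields the desired fleet size. A production controller would also want anti-windup on the integral term and a limit on how far the fleet can move per step.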

We've been doing this for a while. If you'd like to collaborate at all, let me
know.

