
Autoscaling, Welcome to Google Compute Engine - AndrewDucker
http://googlecloudplatform.blogspot.com/2014/11/autoscaling-welcome-to-google-compute.html
======
mmastrac
This has been a long time coming. The autoscaler has been an interesting, and
possibly the most interesting, part of AppEngine, and IMHO it has recently (in
the last year-and-a-half) reached the point where I'd want to rely on it for
production.

It wasn't always this way -- at one point we were having so many issues with
AppEngine's autoscaling that we ended up in a meeting across from Ben Treynor
and heard some of their engineers discussing post-mortems of scaling failures
in gruesome detail. They've come a long way since then and AppEngine has
really found its way from something of an internal science experiment to what
I think is the best competitor for AWS' offerings.

FWIW, the "backend" or "basic" scaling module that was available for both
AppEngine and GCE was so poor and unreliable that it made it nearly impossible
to scale anything other than manually. I honestly hope they deprecate
this and roll the few features into the autoscaler.

~~~
erjiang
I'm confused - are you saying that scaling was horribly broken for App Engine,
not just Compute Engine? Because it seems that it's much simpler to scale App
Engine apps and App Engine has been around a long time.

~~~
mmastrac
Autoscaling was initially available for AppEngine only and the same (or
similar) infrastructure is being made available to Compute Engine. Compute
Engine had access to "basic scaling" which was so horribly broken and
unreliable that you may as well just set up a static number of instances.

The entire scaling infrastructure of AppEngine was horribly broken and
unreliable up until a year-and-a-half or so ago, but they've put a lot of work
into fixing that recently and it's in awesome shape now.

------
pm90
So, I'm actually intensely curious about how this thing works, even though it's
probably a trade secret or something. How do you prevent oscillations in
resource allocations when the traffic is fluctuating? How do you know if e.g.
an adversary is just DDoS'ing your application to make it more expensive to
run your application? I guess there must be some kind of threshold that you
can set? But then this would cut off the website for the real customers...

~~~
ajessup
PM for GCE Autoscaling here. On protection from malicious traffic spikes - you
can set a max-instances threshold to prevent over-spend which you can change
at any time.

Ultimately though, from an instance's perspective, there's not a lot of
difference between a malicious DDoS and a genuine traffic spike. You can
mitigate this (a) by setting low max-instances thresholds when you aren't
expecting high traffic, and (b) by using upstream filtering or QoS management
to filter out DDoS traffic from legitimate traffic.
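
A minimal sketch of how a max-instances cap bounds spend, whatever the cause of
the spike. This is not the GCE implementation; the function name, its
parameters, and the per-instance load target are all hypothetical and used for
illustration only.

```python
import math


def desired_instances(total_load, target_per_instance, min_instances, max_instances):
    """Return the instance count for the observed load, clamped to [min, max].

    All names here are hypothetical; this only illustrates the capping idea.
    """
    if target_per_instance <= 0:
        raise ValueError("target_per_instance must be positive")
    # Instances needed to keep each one at or below its load target.
    needed = math.ceil(total_load / target_per_instance)
    # The cap applies whether the load is malicious or genuine.
    return max(min_instances, min(needed, max_instances))


if __name__ == "__main__":
    print(desired_instances(450, 100, 1, 10))   # ordinary load
    print(desired_instances(5000, 100, 1, 10))  # spike, held at the cap
```

With a cap of 10, a spike that would otherwise call for 50 instances is held at
10, which limits the bill but also means legitimate users share the degraded
capacity.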

~~~
_dark_matter_
Do you have models for traffic pattern differences between spikes and DDoS
attacks? It seems that a DDoS attack would have different patterns (quicker
spike, no buildup, etc.). Just speculation here.

~~~
ajessup
Autoscaler will respond differently to different traffic patterns, but won't
go as far as identifying a particular traffic pattern as malicious and
refusing to grow resources to accommodate it (most architectures filter
malicious requests upstream of the instance itself, e.g. at the load balancer
or via a proxy).

Even if you could identify malicious usage, it's still a matter of policy on
how you want to respond to it - given that if it can reach your VM it will
likely be affecting legitimate traffic too.

------
zeeshanm
Would love to see an autoscaling feature in Heroku. I wonder if having
something like this would somehow break their revenue model. Maybe it's more of
a complex problem, as increasing dynos may not help for apps that are CPU bound.

~~~
willcodeforfoo
I've wondered for a really long time myself
[https://news.ycombinator.com/item?id=7104653](https://news.ycombinator.com/item?id=7104653)

------
jedberg
This is huge. When people ask me how someone could be a true competitor to
AWS, my first answer is usually "autoscaling". It's the feature that AWS
has that no one else does (or, as of now, did).

~~~
mynameisvlad
Azure has it:
[http://azure.microsoft.com/en-us/documentation/articles/clou...](http://azure.microsoft.com/en-us/documentation/articles/cloud-services-how-to-scale/)

And based on the comments, has had it for 1-2 years now.

~~~
pm90
So does Rackspace, apparently:
[http://www.rackspace.com/blog/easily-scale-your-cloud-with-r...](http://www.rackspace.com/blog/easily-scale-your-cloud-with-rackspace-auto-scale/)

~~~
jedberg
It does, but they don't have nearly the capacity footprint that Amazon has.

------
pdkl95
So the "Google Grid"[1] is finally complete?

[1]
[http://en.wikipedia.org/wiki/EPIC_2014](http://en.wikipedia.org/wiki/EPIC_2014)

------
TheMagicHorsey
Is this the same sort of thing as AWS's Elastic Load Balancer service?

~~~
g9yuayon
No, it's not the same as ELB. It's similar to Amazon's auto scaling feature
that comes with AWS Auto Scaling Groups. There's a key difference, though. AWS
ASG is threshold based: users define actions to take when specified metrics
exceed predefined thresholds. In contrast, Google's autoscaling is goal
oriented: users do not define actions. They simply specify desired values of
specified metrics. Google takes care of scaling up or scaling down behind the
scenes. IMHO, Google's approach gives GCE more room for future optimization.
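
A rough sketch of the two styles, to make the distinction concrete. Neither
function is AWS's or Google's actual algorithm; the names, the default
thresholds, and the proportional sizing rule are assumptions chosen for
illustration.

```python
def threshold_step(current, cpu_pct, high_pct=80, low_pct=30, step=1):
    """Threshold based: the user supplies thresholds and the action to take.

    Hypothetical policy: add `step` instances above `high_pct`, remove
    `step` below `low_pct`, otherwise do nothing.
    """
    if cpu_pct > high_pct:
        return current + step
    if cpu_pct < low_pct:
        return max(1, current - step)
    return current


def target_tracking(current, cpu_pct, target_pct=60):
    """Goal oriented: the user supplies only the desired metric value.

    Hypothetical sizing rule: choose the smallest instance count that would
    bring average utilization down to `target_pct`, assuming the total load
    stays constant and spreads evenly.
    """
    total = current * cpu_pct
    # Integer ceiling division avoids floating-point rounding surprises.
    return max(1, (total + target_pct - 1) // target_pct)


if __name__ == "__main__":
    print(threshold_step(4, 90))   # one fixed step up
    print(target_tracking(4, 90))  # sized directly from the target
```

In the threshold version the step size is part of the user's policy, so a large
spike takes several evaluation rounds to absorb. In the goal-oriented version
the system is free to pick the size of the change, which is the extra room for
optimization mentioned above.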

------
jakozaur
Amazon Web Services plays the "catch me if you can" strategy (e.g. new services
launched last week). Google is trying to catch up; if they want to beat AWS
they would need quite a few differentiators (e.g. buying Firebase).

~~~
billjive
The problem with this statement is that Amazon is largely matching Firebase
functionality. They just announced streaming JSON out of DynamoDB, and the new
Lambda functionality means you can trigger events on changes and push out
updates. It wasn't explicitly stated, but I viewed this as a Firebase
competitor.

