
Deploying geographically distributed services on Kubernetes Engine - eicnix
https://cloudplatform.googleblog.com/2018/06/How-to-deploy-geographically-distributed-services-on-Kubernetes-Engine-with-kubemci.html?m=1
======
alpb
My takeaway/distillation from this article comes down to this:

* A GKE cluster can currently live in only one region (say, us-central1)

* But you can create multiple clusters around the world, and deploy the same app on them

* Google Cloud provides global anycast IPs for load balancers

* Traffic to an anycast IP is routed to the PoP/datacenter closest to the visitor of your application

* Then, the traffic is routed to the closest GKE cluster (without leaving Google's private worldwide network)

* This way, you serve to the user from the nearest cluster

* If a region goes offline (disaster scenario) or is under high load, the load balancer will start sending traffic to other nearby clusters
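
For context, the flow described above follows roughly this shape, based on the zone-printer example linked below (resource names here are the example's, and the annotation/flags are as of the time of writing): you reserve a global static (anycast) IP, write one Ingress manifest that references it, and let kubemci create that Ingress in every cluster.

```yaml
# ingress.yaml -- kubemci pushes this same manifest to every cluster.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: zone-printer
  annotations:
    # Global static (anycast) IP, reserved beforehand with:
    #   gcloud compute addresses create zone-printer-ip --global
    kubernetes.io/ingress.global-static-ip-name: zone-printer-ip
spec:
  backend:
    serviceName: zone-printer
    servicePort: 80
```

With a kubeconfig that has one context per cluster, `kubemci create zone-printer --ingress=ingress.yaml --gcp-project=<your-project> --kubeconfig=clusters.yaml` then stitches the per-cluster instance groups behind a single global load balancer on that IP.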

I deployed the sample demo app from the article at
[http://35.190.71.87/](http://35.190.71.87/) (probably won't work in a month
or so). Depending on where you're visiting from, you'll be routed to the
closest datacenter the app is deployed in.

The demo app source code is here:
[https://github.com/GoogleCloudPlatform/k8s-multicluster-ingr...](https://github.com/GoogleCloudPlatform/k8s-multicluster-ingress/tree/master/examples/zone-printer)
You can try it out yourself if you
have a GCP account.

Disclaimer: I work on GKE at Google.

~~~
puzzle
How do you set up capacity for each cluster? You'll reach a point where one
cluster will be overloaded; then, it makes more sense to send some requests to
a cluster in another region, rather than melting down the local one. GSLB
supports that.

~~~
alpb
(I am not 100% sure, but) I think this is done automatically. When you add an
instance group behind a load balancer
([https://cloud.google.com/compute/docs/instance-groups/](https://cloud.google.com/compute/docs/instance-groups/))
(which is what GKE Ingress does), the load balancer is aware of the health/CPU
utilization of the backend VM instances.

With this information, the load balancer knows the requests per second (RPS),
CPU utilization, and number of connections established to each backend. In the
Kubernetes/GKE case, each "node pool" of a cluster is an "instance group", and
the GKE Ingress controller configures the LB with those instance groups and
health checks.

I believe the load balancing strategy (RPS, CPU, connections, etc.) is
currently not configurable through Kubernetes, since the Ingress API does not
yet have a good way of exposing cloud-specific configuration details.

~~~
puzzle
I see. Perhaps latency could be an alternate metric to divine capacity/max
QPS, because you could have a backend that is not using a lot of CPU but is
still very slow, because it's just waiting on the local DB (which is the one
melting down). If region X is Y ms away, you only start sending traffic there
once the local backend is Y+Z ms slower than usual. The problem with such a
metric is that it's hard to tell overload-induced latency from other kinds of
high latency.
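
The rule of thumb above can be written down directly; a minimal sketch, where the hysteresis margin (Z) and the example numbers are arbitrary illustrations:

```python
def should_spill(local_latency_ms, local_baseline_ms, remote_rtt_ms,
                 hysteresis_ms=20):
    """Spill to a region remote_rtt_ms away only once the local backend
    is running remote_rtt_ms + hysteresis_ms slower than its usual
    baseline -- i.e. only when the detour actually wins."""
    excess = local_latency_ms - local_baseline_ms
    return excess > remote_rtt_ms + hysteresis_ms

# Local backend normally answers in 30 ms; the remote region is 110 ms away.
print(should_spill(90, 30, 110))   # False: 60 ms excess, staying local is cheaper
print(should_spill(180, 30, 110))  # True: 150 ms excess, the remote region wins
```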

Health checks are good, but they tell you there's a problem seconds after you
have already triggered it. :-) Better to have some concept of nominal
capacity. Yeah, the Ingress API is not advanced at all and most people end up
using controller-specific, non-portable annotations. I thought perhaps that
was the case here. Hopefully there will be an improved API soon.

------
yannski
What about databases?

~~~
alxvio
_What_ about databases? What functionality would you expect this to provide
that you shouldn't be providing yourself?

------
linsomniac
I think this is the first of these stories where I realized it has no
relevance to me _BEFORE_ clicking the link rather than after. (I'm currently
working on Kubernetes stuff, but the GKE posts here haven't been that useful,
though I have played with GKE and like it.)

