Deploying geographically distributed services on Kubernetes Engine (googleblog.com)
99 points by eicnix on June 5, 2018 | 20 comments



My takeaway/distillation from this article basically comes down to this:

* GKE clusters can currently be in only one region (say, us-central1)

* But you can create multiple clusters around the world, and deploy the same app on them

* Google Cloud provides global anycast IPs for load balancers

* Anycast IPs are routed to the closest PoP/datacenter available to the visitor of your application

* Then, the traffic is routed to the closest GKE cluster (without leaving Google's private worldwide network)

* This way, you serve to the user from the nearest cluster

* If a region goes offline (disaster scenario) or is under high load, the load balancer will start sending traffic to other nearby clusters

I deployed the sample demo app from the article at http://35.190.71.87/ (probably won't work in a month or so). Depending on where you're visiting from, you'll be routed to the closest datacenter the app is deployed in.

The demo app source code is here: https://github.com/GoogleCloudPlatform/k8s-multicluster-ingr... You can try it out yourself if you have a GCP account.
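
For reference, the deploy flow is roughly the following. This is a rough sketch with illustrative names ("my-mci", "my-project", "clusters.yaml", etc.); the repo's README has the exact steps.

    # Assumes GKE clusters already exist in several regions and
    # "clusters.yaml" is a kubeconfig with a context for each of them.

    # 1. Deploy the same app + NodePort Service to every cluster.
    for ctx in $(kubectl --kubeconfig=clusters.yaml config get-contexts -o name); do
      kubectl --kubeconfig=clusters.yaml --context="$ctx" apply -f app/
    done

    # 2. Reserve a global (anycast) static IP, referenced from ingress.yaml.
    gcloud compute addresses create my-global-ip --global

    # 3. Create the multi-cluster ingress; kubemci wires every cluster's
    #    instance groups behind one Google Cloud HTTP(S) load balancer.
    kubemci create my-mci \
      --ingress=ingress.yaml \
      --gcp-project=my-project \
      --kubeconfig=clusters.yaml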

Disclaimer: I work on GKE at Google.


> (without leaving Google's private worldwide network)

:mindblown:

Sometimes I forget just how big Google really is


The Google network is really good, but it still has congestion between regions (probably just for GCP customers; I'm sure Google reserves capacity for itself).

We have to run proxies over the public Internet to work around poor latency between Singapore and Iowa, for instance.


This is unexpected. I'm assuming both instances are on the same VPC and you're using the internal IP addresses, right?

(I work for GCP, would love to know more. My contact details are in my profile if you want to go into details)


Yeah, please file a support ticket. You shouldn't need to do that.


Yes, all on the same VPC with internal IPs.

As far as we know this is a known issue (?) that someone is doing something about.

There’s a thread on the insiders list about some other customers with similar issues.


What kind of performance difference do you see between your public internet proxy vs staying on the internal google network?



Any idea whether all the traffic on the fibre between regions is encrypted?



How do you set up capacity for each cluster? You'll reach a point where one cluster will be overloaded; then, it makes more sense to send some requests to a cluster in another region, rather than melting down the local one. GSLB supports that.


Right. As others have pointed out, the Google Cloud load balancer supports setting max request limits, but this is not exposed in the Kubernetes Ingress API. If required, users can set these limits directly on GCLB after setting it up with kubemci, but they will be overwritten by a kubemci update.
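
For example, a sketch of setting a per-instance request limit on one of the generated backends directly with gcloud (resource names here are hypothetical, and as noted a later kubemci update would overwrite this):

    # Hypothetical names; find the real ones with
    # "gcloud compute backend-services list --global".
    gcloud compute backend-services update-backend my-mci-backend \
      --global \
      --instance-group=gke-cluster-eu-default-pool-ig \
      --instance-group-zone=europe-west1-b \
      --balancing-mode=RATE \
      --max-rate-per-instance=100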

A common scenario is to set up the Pod autoscaler and cluster autoscaler to scale clusters, or to add more clusters in the same region, rather than route requests to a different region.
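
A minimal sketch of that per-cluster scaling, with hypothetical names (run the kubectl part against each cluster's context):

    # Scale pods on CPU utilization inside each cluster.
    kubectl autoscale deployment my-app --cpu-percent=70 --min=2 --max=10

    # Let the cluster's node pool grow when pods no longer fit.
    gcloud container clusters update my-cluster \
      --zone=us-central1-b \
      --node-pool=default-pool \
      --enable-autoscaling --min-nodes=1 --max-nodes=5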

Disclaimer: I work on kubemci.


(I am not 100% sure, but) I think this is done automatically. When you add an instance group (https://cloud.google.com/compute/docs/instance-groups/) behind a load balancer (which is what GKE Ingress does), the load balancer is aware of the health and CPU utilization of the backend VM instances.

With this information, the load balancer knows about the RPS (requests per second), CPU utilization, and number of connections established to each backend. In the Kubernetes/GKE case, each "node pool" of a cluster is an "instance group", and the GKE Ingress controller configures the LB with instance groups and health checks.

I believe the load balancing strategy (RPS, CPU, connections, etc.) is currently not configurable through Kubernetes, since the Ingress API does not have a good way of exposing cloud-specific configuration details yet.
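
If you're curious what the controller actually configured, you can inspect the generated backend service directly (names here are hypothetical):

    # List the backend services created by GKE Ingress / kubemci, then look
    # at each backend's balancing mode and limits.
    gcloud compute backend-services list --global
    gcloud compute backend-services describe my-mci-backend --global \
      --format="yaml(backends)"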


I see. Perhaps latency could be an alternate metric to infer capacity/maxqps, because you could have a backend that is not using a lot of CPU but is still very slow, because it's just waiting on the local DB (which is the one melting down). If region X is Y ms away, you only start sending traffic there once the local backend is Y+Z ms slower than usual. The problem with such a metric is that it's hard to tell overload-induced latency from other kinds of high latency.

Health checks are good, but they tell you there's a problem seconds after you have already triggered it. :-) Better to have some concept of nominal capacity. Yeah, the Ingress API is not advanced at all and most people end up using controller-specific, non-portable annotations. I thought perhaps that was the case here. Hopefully there will be an improved API soon.


There are balancing modes in GSLB.

One cluster is deployed on one instance group.

So you can set a max CPU utilization or RPS for a region.

Requests flow to a different region when you breach the defined threshold for the chosen metric.


You can set a CPU usage % limit and/or a requests-per-second limit.


What about databases?


What about databases? What functionality would you expect this to provide that you shouldn't have to provide yourself?


I was thinking the same thing reading this. Stateless service distribution isn't the hard part.


I think this is the first of these stories where I realized it has no relevance to me before clicking the link, rather than after. (I'm currently working on Kubernetes stuff, but the GKE posts here haven't been that useful, though I have played with GKE and like it.)



