Container-Native Multi-Cluster Global Load Balancing with Cloud Armor on GCP (jetstack.io)
76 points by talonx on Feb 16, 2020 | 54 comments



> This means the Service must be created first, then when the corresponding NEG is created the name can be queried and added to the Terraform project.

This kind of reverse dependency, where app-level changes have to be reflected in infrastructure, is a source of endless bugs and headaches.

To do it right, you'd want to encode app and infra changes in the same ubiquitous tool, but "infrastructure as code" tools such as Terraform suck big time at app-level deploys.

Pulumi takes steps in the right direction; with it, it is actually not that painful to manage everything. That is what I ended up using for a very similar problem: configuring GCP's load balancers for my k8s apps.
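This is the kind of two-step dance I mean; a minimal sketch (the service name, port, and Terraform variable are hypothetical, and it assumes jq is installed):

    # GKE writes the auto-generated NEG name into the Service's
    # cloud.google.com/neg-status annotation once the NEG exists.
    NEG_NAME=$(kubectl get svc my-app \
      -o jsonpath='{.metadata.annotations.cloud\.google\.com/neg-status}' \
      | jq -r '.network_endpoint_groups["80"]')

    # Only now can the queried name be fed back into the Terraform project.
    terraform apply -var "neg_name=${NEG_NAME}"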


Someone asked me to wrap get-kapp.io (one of my projects) with Terraform so that they can pass down infra settings to their k8s config. I ended up writing https://github.com/k14s/terraform-provider-k14s (example: https://github.com/k14s/terraform-provider-k14s/tree/master/...). I haven't been asked about a case where app info is needed by Terraform, but maybe that's something to add...


> To do it right, you'd want to encode app and infra changes in the same ubiquitous tool

Well, that's exactly what we do with Dhall, which lets us put static types on infrastructure tooling that prints unpredictable output in a predictable format, and transform that output into the format that other dependent tools expect. Everything is then glued together with simple shell scripts and run in a CI environment.

Once you understand the pattern - use an uber-config plus typed tool outputs to generate downstream config and apply all config in an idempotent way - you wonder why you ever did things any other way.


I looked at Dhall before. It is a pure language for static configs, right? It can't help me express "deploy a k8s resource, wait for an annotation to appear, take the annotation value and use it to deploy an infra resource". Wrapping multiple tools in bash is exactly the thing I wish I didn't have to do :(

If Dhall enables safer bash glue, I'd be happy to read how
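To make it concrete, the glue for the annotation case above looks like this (a sketch; the service name is hypothetical):

    # deploy the k8s resource
    kubectl apply -f service.yaml

    # wait for the controller to write the annotation
    until kubectl get svc my-app \
        -o jsonpath='{.metadata.annotations.cloud\.google\.com/neg-status}' \
        | grep -q network_endpoint_groups; do
      sleep 5
    done

    # only then can the dependent infra resource be applied
    terraform apply -auto-approve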


We use Terraform for our Kubernetes app configuration and it works well. Being able to pass values from one resource to another is extremely powerful: for example, passing the database instance credentials down into the Cloud SQL proxy.

The only, and most frustrating, missing piece of the Terraform Kubernetes provider is CRDs. I hacked around it by using a YAML provider, but I would like a cleaner solution.


The TF Kubernetes provider works for the simplest cases only. How would you wait for a PV to be created when your TF resources specify a PVC? That sort of thing would help users diagnose problems with deployments.


What about the Pulumi solution gets in your way?


No IaC tools I know of, including Pulumi, allow you to describe evolutions of infra. A simple use case: deploy a canary instance, check its health, scale down the canary instance, update the main pool of instances.

In other words, there is no way to express a sequence of changes to the same object in a single run of an IaC tool. We are forced to wrap IaC tools in layers of bash to simulate it.
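Concretely, the wrapper ends up looking like this; each step is a full run of the tool with different inputs, and the tool itself has no notion of the sequence (a sketch; the variables and health endpoint are hypothetical):

    # 1. deploy the canary instance
    terraform apply -auto-approve -var canary_count=1 -var canary_version=v2

    # 2. check its health
    until curl -fsS https://canary.example.com/healthz >/dev/null; do
      sleep 10
    done

    # 3. scale down the canary and update the main pool
    terraform apply -auto-approve -var canary_count=0 -var main_version=v2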


Ah, good to know, thanks.

What you're describing sounds a bit like migrations for databases. Do you agree?


This is a decent article on how load balancing between GCP load balancers and Kubernetes works.

We use Terraform end to end and tried the HTTP/HTTPS L7 load balancer first in our setup, but I had a heck of a time with it:

- There is no API for the annotation mentioned in the article, so if you miss it while setting up a backend, nothing works.

- gRPC was hard to get going. Most gRPC examples out there use some port such as 50051, so you assume that is needed. gRPC does work over 443.
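For reference, the annotation is set on the Service itself, so it's easy to miss; something like this (a sketch; service name and port hypothetical):

    # creates standalone NEGs for the Service's port 80 -
    # forget it and, as above, nothing works
    kubectl annotate service my-app \
      'cloud.google.com/neg={"exposed_ports": {"80": {}}}'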

We currently use Istio's ingressgateway as a load balancer that you set in your Kubernetes setup. It works, but I don't know if it is better or not. We had to run an ingressgateway pod as a DaemonSet on every node so that we could get the real IP addresses from requests for security logging. That was a pain.


Is the title peak buzzword-era, or is there more to come?


This might be an elaborate demo for the upcoming thisblogpostdoesnotexist ML text synthesis project.

Just for fun, I used Grover to generate an article based on the headline:

> What is the cheapest way to optimize the cloud? Going globally? Using open-source cloud computing platforms? Choosing AWS in the U.S. and for the remaining countries? Choosing SaaS? Or, using AWS and establishing a global presence on the EC2 Container Manager platform?

> I know because I did all four. And I won’t do them again.

> I am not arguing that globalizing your compute resources through a community cloud is any less important than globalizing your compute resources through your enterprise cloud (though I am somewhat of a skeptic about the value of community clouds). The important point is that migrating to a global cloud requires a state-of-the-art approach to container-native load balancing, container-native load balancing on AWS, and container-native load balancing on other clouds. And you need to be prepared to pay for it.


Okay, but unless Grover can accurately describe how to deploy services and load balancers on GCP, generate valid Kubernetes YAML that matches the contents of the article, and generate an infrastructure diagram that matches the infrastructure described in the article, this was certainly not generated by Grover. It in no way resembles something generated by ML text synthesis. I'm not sure how anyone who read it could come to the conclusion that it was.


Can you tell me a little more about Grover? Is it a CLI you just pointed at a URL, or was there a bit more to it to generate that article?


I have a local version of Grover and the model checkpoints, so I used a python script. But I think there's an online demo somewhere.


I remember when load balancing was a case of setting up apache or nginx on a server and letting it handle the upstream connections.


It still is. Proper "global load balancing" is essentially a CDN, and nginx with, say, bind9 running on a bunch of edge nodes from different hosting providers, without all those container-native buzzwords, is the easiest way to do it.


It still is... if you're just running a few apps on a single server, or one load balancer VM in front of a few other servers.

This article is talking about routing traffic from a single IP globally to multiple backend Kubernetes clusters that can be running in any region. Traffic will automatically go to the closest region (with available capacity) or otherwise fall back to other endpoints, while also bypassing much of the Kubernetes network stack to go straight to the running pods via GCP's software-defined network.

The complexity here is warranted if you need the flexibility and features. If you don't then you can just stick with nginx.
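On the GCP side it roughly boils down to a single global backend service with per-cluster NEGs attached as backends (a sketch; the names, health check, and zones are hypothetical):

    # one global backend service behind a single anycast IP
    gcloud compute backend-services create web-backend \
      --global --load-balancing-scheme=EXTERNAL --health-checks=web-hc

    # NEGs from clusters in different regions attach as backends;
    # routing then favors the closest region with available capacity
    gcloud compute backend-services add-backend web-backend --global \
      --network-endpoint-group=neg-us \
      --network-endpoint-group-zone=us-central1-a \
      --balancing-mode=RATE --max-rate-per-endpoint=100
    gcloud compute backend-services add-backend web-backend --global \
      --network-endpoint-group=neg-eu \
      --network-endpoint-group-zone=europe-west1-b \
      --balancing-mode=RATE --max-rate-per-endpoint=100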


I think that is unfair. Each of those buzzwords is a concept in the article or an actual product that GCP offers.

However, it is embarrassing for GCP that this isn't easier. These were very basic requirements from this customer, yet so much rigmarole was needed to get it set up.


Multi-cluster global load balancing isn't "very basic" and no other cloud is any easier.

AWS requires stacking Global Accelerator on top of their ALBs/ELBs but doesn't have smart routing across clusters, and Azure only has Front Door, which requires completely manual setup and has no backend integration.


Deploying an app to more than one geographic region on earth should be very basic. For mostly read-only sites, it is mean to make your users suffer the extra latency, and your reliability goes down as well.

The authors had to go read source code at one point to figure out obscure annotations to get container-aware networking to work, which then wasn't compatible with Google's tools for setting up global ingress.


That isn't very basic and never has been. If it's just a read-only site then all you need is a CDN, not load balancing with multiple regions running Kubernetes clusters.

What other cloud vendor or PaaS offers this? The only one that has seamless global deployment is Cloudflare Workers, but that's limited to JavaScript code in a lightweight functions environment.


They are making one part of this easier at least: https://twitter.com/vicnastea/status/1232751949117702145?s=2...


I said mostly read-only. Like if you can run off of a MySQL master and replica.

It’s common enough they are trying to support it: https://cloud.google.com/blog/products/gcp/how-to-deploy-geo...

Their tool is at least three years old and still has this disclaimer: “Caution: The kubemci tool is a temporary solution intended to help users begin using multi-cluster Ingress. This tool will be replaced by an implementation using kubectl that delivers a more Kubernetes-native experience. Once the kubectl implementation is available, you will need to manually migrate any apps that use kubemci”


The page greets you with:

> Kubernetes Accelerated Your Path to Enterprise Cloud Native

I have no idea what they want from me.


They want money!


Nah, it doesn't contain blockchain or serverless, so there is still some runway left.


They are product and feature names used by the GCP platform (and this article). It's perfectly accurate and descriptive.


This tool makes what the OP is trying to do much easier: https://github.com/GoogleCloudPlatform/gke-autoneg-controlle... You can have GKE configure the NEG on the backend service directly, and not have to interrogate k8s for the name of the NEG to add to the backend service via Terraform.
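If I read the README right, you annotate the Service with the target backend service and the controller handles registration; something like this (a sketch from memory; names hypothetical, check the project README for the exact format):

    # the controller watches for this annotation and adds the Service's
    # NEGs to the named backend service automatically
    kubectl annotate service my-app \
      'anthos.cft.dev/autoneg={"name": "web-backend", "max_rate_per_endpoint": 100}'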


I've yet to see a compelling argument for the complication of Kubernetes, or for the 'it's not AWS' nature of GCP, over a bog-standard AWS ALB with ECS cluster tasks attached to it.


Because k8s mostly prevents lock-in. A much better idea is to run EKS on Fargate. You still won't have to manage a cluster, but at least you can now use standard k8s manifests that work in any other cluster instead of the ECS homebrew stuff. You can keep ALB, because ALB can run as an Ingress Controller.
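Getting a cluster like that up is about as hands-off as it gets (assuming eksctl; the cluster name is hypothetical):

    # creates an EKS cluster with a default Fargate profile,
    # so pods run without any nodes for you to manage
    eksctl create cluster --name demo --fargate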


EKS billing is hilariously bad (Managed SFTP is the same way). It's obvious they built it as a feature parity thing and they don't want you to actually use it.

I can run all of my container infrastructure for a property for the cost of the control plane for EKS (before you attach hosts, and before you spend who knows how much time figuring out why hosts won't attach to it). It's just bad.


Lock-in on the cloud platform with >60% market share? I'm indifferent.

AWS is the VMWare of our age, you'll never get fired for suggesting it. There are workloads that are inappropriate for it (see: Dropbox) but those are few and far between.


This seems to be a common refrain from people who have not invested the time in k8s. It is not as complicated as people think.


I've used k8s at various times since its inception. I've yet to see a compelling argument for running your control plane on the same layer as your workloads. I have always found this to be a recipe for disaster when 'something' in your service dependency chain breaks.

Also, not having load balancer support built in kills it for me. Yeah, you can do MetalLB or nginx ingress, but it means they punted on one of the major components needed to make it 'cloud agnostic.'

Meanwhile in AWS we have ECS, which works out of the box on Fargate, their EC2 hosts, or your own EC2 hosts; and on hardware I can easily get by (and dev) on Docker Swarm.


Kubernetes splits the control plane from the workloads. You have to run the kubelet on the individual nodes, but it's in charge of node-local orchestration; sure, you could maybe invent something thinner that was controlled with just SSH or something, but you can't really get away without some kind of controller on the node itself. Something has to start containers and monitor their status, for one.

There are lots of production-quality load balancer implementations for Kubernetes, such as Traefik, as well as operators that configure cloud-native load balancers (e.g. the one that configures GLBs on GCP). I don't see this as missing functionality. The ingress story isn't great, but the available options (Nginx being a very solid example of something tried and tested) are good enough that it's more a problem of standardization, not implementation.

Neither of those two points seem to sufficiently argue against using Kubernetes. Having used it for a few years now for all sorts of workloads, I would never go back to plain VMs, nor to something like ECS or Mesos. The only alternative I might entertain would be "serverless", but the available offerings don't seem to cover all the bases (e.g. batch jobs).


Yeah I've used traefik (originally on Mesos though on Kubernetes as well). I mean it's okay, but I don't get security groups and I have to build stuff. I'm old. If you make me build commodity pieces and your competitor doesn't I'm going with the competitor.

ECS with Fargate behind ALBs talking to RDS/SQS/Elasticache with Scheduled Tasks as my cron layer is 99% of what we need without standing up a single host we have to maintain.

I put that in my calculator and it makes a happy face.


Fargate is actually one of the most expensive ways to run a container from a pure cloud cost standpoint.


Not if you take into account the fact that I don't have to maintain the hosts the jobs are running on (which is a not inconsiderable amount of time when you get HIPAA and PCI into the mix). Liberal application of AWS Savings Plans with 1 year no up front helps a lot too.

At scale I agree, however if you're a small company (or, in our case, 6 small companies) and don't have dedicated DevOps (or, in our case, have 1.5 DevOps people split 6 ways), it's fantastic.


When I experimented with GCP, I found it weird that Cloud Run would not work together with the GLB. That was really a surprise.


Why would GCP restrict the ports like this?


GCP has a lot of port restrictions.

A place I used to work at migrated from GCP to AWS because they don't allow port 25 incoming; if you're doing e-mail, you 100% need this.


They block it outbound, not inbound - email receiving should work fine in GCP. I believe that's the only mandatory firewall restriction, and even that one can have exceptions made for use cases such as your former employer, as the other reply to you said.

The point is to make sure that anything sending email from GCP has someone properly attending to it, so that GCP does not become a source of spam, rather than to prevent email service providers or the like from using GCP.

For many companies that aren't an email service provider, it's often a better use of IT funds to send via one anyway, given how much work is involved in maintaining security and deliverability of an outbound email server in 2020. Most of those support sending through ports that aren't 25, and Google has some nice free tier deals for GCP customers.


GCP will allow port 25 and smtp, just not by default. Talk to your TAM to make it happen.


Is it web scale?


CLOUD NATIVE DEVOPS RAFT-BASED BLOCKCHAIN-ENABLED MULTI CLOUD RESILIENT CONTAINERIZED MULTI-REGION MULTI-AZ static blog generator.


I'm sold. Link please.


Oh I’m so sorry, we were expecting “Paxos” for the correct answer. Unfortunately I cannot accept your answer.


I downvoted this because even though I like it, it belongs at the bottom of the conversation below any real discussion of the actual article.


Sarcasm is in the realm of discussion though, see 'Will Any Crap We Put into Graphene Increase Its Electrocatalytic Effect?'

https://pubs.acs.org/doi/10.1021/acsnano.9b00184


Dude that's unfair. I cannot control the sorting of comments.


Personally I think it belongs at the top. When the article looks like it is generated by the buzzword generator, that should be the real discussion.


It's quite a good article with a solution for a real problem. That it looks like a buzzword generator to HN says more about HN than it does about the article, IMO.


WEBSCALE NODEJS MONGODB QUANTUM BULLSHIT USB-TO-PHP ADAPTER CHARGER BASED DETTOL HANDWASH KILLS 99% GERMS.



