
TLS Termination for Network Load Balancers - el_duderino
https://aws.amazon.com/blogs/aws/new-tls-termination-for-network-load-balancers/
======
luhn
They don't mention it in the article, but TLS comes at an additional cost.
Same price per hour and per LCU, but an LCU gets you less with TLS. Standard
LCU:

\- 800 new non-TLS connections or flows per second.

\- 100,000 active non-TLS connections or flows (sampled per minute).

\- 1 GB per hour for EC2 instances, containers and IP addresses as targets.

TLS LCU:

\- 50 new TLS connections or flows per second.

\- 3,000 active TLS connections or flows (sampled per minute).

\- 1 GB per hour for EC2 instances, containers and IP addresses as targets.

[https://aws.amazon.com/elasticloadbalancing/pricing/](https://aws.amazon.com/elasticloadbalancing/pricing/)

~~~
NotAnEconomist
Just because I found it interesting:

\- 16x cost for new connections or flows per second.

\- ~33x cost for active connections or flows.

\- 1x cost for traffic carried in GB per hour, which you'd expect.
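The ratios fall straight out of the two LCU definitions quoted above; a quick sanity check (dict names are my own, just for illustration):

```python
# Standard vs. TLS LCU dimensions, per the AWS pricing page quoted above.
std = {"new_per_sec": 800, "active": 100_000, "gb_per_hour": 1}
tls = {"new_per_sec": 50, "active": 3_000, "gb_per_hour": 1}

# How many times fewer units a TLS LCU buys you on each dimension.
ratios = {k: std[k] / tls[k] for k in std}
print(ratios)  # new connections: 16x, active connections: ~33.3x, traffic: 1x
```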

------
illumin8
There is a great mini-thread by a principal engineer at AWS describing why
this is so amazing:
[https://twitter.com/colmmacc/status/1088510453767000064](https://twitter.com/colmmacc/status/1088510453767000064)

------
clon
I was pretty excited until I saw the screenshot where you would have to
choose the certificate.

If you run a multi-tenant system that services N "vanity" domains (where N is
around the same number as your users on a $5/month plan), there is still no
service in AWS to do transparent TLS termination for a reasonable cost. Which
is a pity, since it really costs almost nothing to generate these
certificates.

~~~
helper
We use wildcard certs (*.example.com) so each customer can have vanity-
customer-name.example.com domains. I think our model is fairly common for
multi-tenant domain name segregated systems.

~~~
clon
We do that as well for the cheaper packages. But serious users insist on a
our-service.client.com "vanity" scheme, which is easy enough with CNAME
records, apart from TLS that adds significant cost.

For example, Application Load Balancers (ALB) come with a limit of max 25
certificates, which is a non-starter for us. So you cannot avoid terminating
your TLS in nginx/caddy etc (it also needs to be HA of course) and then hit
another LB that stands in front of the actual service. You end up with a
2-layer LB architecture that adds cost and complexity.

~~~
web007
Can you bundle your certificates the way Cloudflare does? IIRC if you connect
to a CF endpoint you will get a certificate with N arbitrary hostnames that
are being served by that endpoint. I don't know what the SNI hostname limit
is, but it's probably stupidly high. Multiply _that_ by 25 and it may be
tenable?

~~~
prdonahue
You really want to avoid putting too many SANs on a certificate as i) renewal
breaks much more frequently; ii) certificate size increases, negatively
affecting performance due to fragmentation; and iii) browsers may lock up (if
you like to live dangerously, try loading
[https://10000-sans.badssl.com/](https://10000-sans.badssl.com/) for example).

The biggest operational headache by far is on renewals. If one customer on a
commingled certificate adds a CAA record that doesn't include your CA of
choice (or more commonly, the customer churns and no longer points to you),
your renewal fails. If you're running a SaaS business you really want one
certificate per hostname that's lazy loaded based on the incoming SNI. This
keeps renewal failures and support costs down, and keeps customers from seeing
a list of their competitors on their certificate.

As @elithar points out, that's why we built SSL for SaaS. You make a single
API call with the name of the hostname you want a certificate issued for,
indicate the validation method (HTTP /.well-known is by far the "happiest
path"), and then in about a minute you've got a certificate deployed
worldwide. You tell us where to route traffic back to by providing a default
origin (which can be a load balancer) and optionally overriding it on a per-
hostname basis.

So long as your customer is CNAME'ing to your domain and that domain resolves
to Cloudflare's edge, we can automatically complete—and keep completing—domain
control validation (DCV). We then issue two certificates per hostname: one
P-256 keyed, SHA-2/ECDSA signed certificate that gets presented to modern
browsers[1] and one RSA 2048-bit, SHA-2/RSA signed certificate that gets
presented to browsers that don't support ECC.

1 - [https://blog.cloudflare.com/tls-certificate-optimization-
tec...](https://blog.cloudflare.com/tls-certificate-optimization-technical-
details/)

~~~
clon
Issuing SANs with multiple tenants on the same certificate, even if the
substantial technical problems were overcome, would make our clients reach for
pitchforks and torches.

> You make a single API call with the name of the hostname you want a
> certificate issued for ... [magic TLS things happen]

This, precisely, is how the ALB _should_ work, without a silly 25-certificate
limitation. Sounds like an excellent service.

I wonder what the cost of issuing a TLS certificate is. There is minimal
storage/network load, some computation (likely done in hardware). Perhaps the
main cost is the collection of sufficient entropy?

------
jamescun
> This will free your backend servers from the compute-intensive work of
> encrypting and decrypting all of your traffic

[https://istlsfastyet.com/](https://istlsfastyet.com/)

------
weitzj
That’s great news.

Does anybody know if they support ALPN to announce h2 as the HTTP/2 protocol?

They don’t do this for ELBs; I would hope they do for NLBs.

If they do, this would finally be the sane way to use NLBs + ACM to expose
gRPC services at the edge.

Another solution (popping up elsewhere in this thread) might be:

Use NLB + ACM, let it do the MITM, and then forward the traffic to a service
with a self-signed certificate. The hope here would be that the NLB would pick
up the ALPN header from your backend service and communicate it transparently
to the outside world. Don’t know if this architecture makes sense.

Anyways, I just would love gRPC capable secure workloads on the NLB with ACM.

Self-signed certificates or Let's Encrypt already work - the game-changer
would be ACM + ALPN, and with it upstream HTTP/2 traffic behind the NLB
(compared to the ALB, which only supports upstream HTTP/1).
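One way to check what a given endpoint actually negotiates is to offer both protocols via ALPN and see what comes back; a stdlib-only sketch (the hostname is a placeholder, and a live call obviously needs network access):

```python
import socket
import ssl

def negotiated_alpn(host, port=443):
    """Offer h2 and http/1.1 via ALPN and report what the server selected.

    Returns "h2" if the endpoint speaks HTTP/2, "http/1.1" otherwise, or
    None if the server ignored ALPN entirely (the case to worry about with
    a load balancer that terminates TLS but drops the ALPN extension)."""
    ctx = ssl.create_default_context()
    ctx.set_alpn_protocols(["h2", "http/1.1"])
    with socket.create_connection((host, port), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.selected_alpn_protocol()

# e.g. negotiated_alpn("grpc.example.com") to verify h2 end to end
```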

------
scarface74
Unfortunately, this still wouldn’t have solved my biggest pain point at the
last company I worked for. You can’t use TLS termination at the load balancer
when you require HIPAA Compliance.

If they had some kind of way to use ACM generated SSL certs on the VM _and_
had a method for autorenewal that would be ideal.

~~~
nkrumm
Encryption is an "addressable implementation" in the final Security Rule [1].
Practically this means you are _not_ required to encrypt your data (e.g.,
between LB and VM) "if the entity decides that the addressable implementation
specification is not reasonable and appropriate [...]".

[https://www.hhs.gov/hipaa/for-professionals/faq/2001/is-
the-...](https://www.hhs.gov/hipaa/for-professionals/faq/2001/is-the-use-of-
encryption-mandatory-in-the-security-rule/index.html)

~~~
jontro
And also it looks like it supports encrypted traffic between the LB and the
instance as shown in the guide.

~~~
e1g
AWS LB does not validate backend certificates, so you can put a self-signed
cert on the instance. Heck, even if the cert expires it will still work, and
make LB<->EC2 connection technically encrypted. Yay compliance.

~~~
xyzzy123
You can actually turn this on (for classic ELBs), but it locks to a specific
cert (rather than a CA). So yeah no one cares about expiry but the backend
does have to present that cert. The thing to look for is "Enable backend
authentication".

I would question whether this is a problem though; basically if someone is in
a position to MITM traffic in your AWS VPC this would indicate a compromise of
AWS at a fundamental level (or loss of your AWS control plane).

~~~
e1g
AWS does not encrypt internal traffic, including traffic between Availability
Zones. AZs are spread over several datacenters, so your VPC traffic
(RDS/microservices etc) travels unencrypted across multiple physical
locations. I consider AWS network assurances sufficient, so it's not a
problem for our standard threat model, but the auditors got their checkboxes
to tick...

~~~
ec109685
Do you have a citation that data travels among data centers in AWS
unencrypted? At least between regions, it is encrypted:
[https://aws.amazon.com/blogs/aws/new-almost-inter-region-
vpc...](https://aws.amazon.com/blogs/aws/new-almost-inter-region-vpc-peering/)

------
azinman2
SSL added and removed here :)

------
philsnow
> After choosing the certificate and the policy, I click Next:Configure
> Routing. I can choose the communication protocol (TCP or TLS) that will be
> used between my NLB and my targets. If I choose TLS, communication is
> encrypted; this allows you to make use of _complete end-to-end encryption in
> transit_

Wait. That is not what "end to end" means. This is more like "piecewise end-
to-end", and it is _not_ the same thing as "end to end".

Now, people are going to say that "it doesn't matter since the only party
you're disclosing the comms to is AWS, who you already implicitly depend on
because you're running all your stuff under their hypervision". That's true.
(ofc now there are 2 places in AWS's infra where the comms are in plaintext,
so now an adversary has two teams of engineers/opsen to social engineer /
compromise, and will win if they break either one).

I'm reacting to the dilution of the term "end to end" here, because it's a
vital concept.

------
DGAP
This further reinforces the pattern of terminating TLS at the LB. While this
is generally justifiable, it does decrease defense-in-depth.

~~~
scarface74
What’s the threat model for someone intercepting traffic between a load
balancer and an EC2 instance?

~~~
zokier
That is generally AWS stance on the matter too; VPC is considered secure
enough. I remember reading something about it in their own blog, but now I
could only find this where it is explained:

[https://kev.inburke.com/kevin/aws-alb-validation-tls-
reply/](https://kev.inburke.com/kevin/aws-alb-validation-tls-reply/)

~~~
scarface74
I’ve watched the reinvent videos where they describe the custom hardware NICs
they use with their own custom ARM chips to ensure security and that traffic
isn’t spoofed.

------
atombender
So AWS didn't have TLS termination until now?

I'm still waiting for GCP to support more than ten (!) certs per load
balancer, which seems like a ridiculous limit when you consider the need to
serve dozens or hundreds of low-traffic customer domains on a single load
balancer. If you're on Kubernetes, they're basically asking you to split your
ingresses up just to avoid the limit.

We ended up going straight to running our own in-cluster load balancer
(Traefik, which is _meh_, but works okay) with Let's Encrypt so we don't need
to provision anything at all. It's so much nicer than fiddling with manual
cert registration. I really wish cloud providers such as Google would get on
board with Let's Encrypt already.

~~~
briffle
We use cert-manager to get Let's Encrypt certs in GKE, and automatically push
them to the Google global load balancers.

[https://github.com/jetstack/cert-manager](https://github.com/jetstack/cert-
manager)

~~~
atombender
Doesn't that still require that you create a maximum of 10 host rules per
ingress? That makes it harder to automate things.

With the current system, we can use one consistent IP or CNAME for all new
domains. With the 10-certs-per-GLB limitation, we'd have to manage the DNS
accordingly so that all the domains are correctly spread out among the N
different ingress GLBs.

At the time we started with Kubernetes, cert-manager was listed as alpha
quality and not recommended for production. Even today, it seems Gandi (which
we use for DNS) still isn't supported; at least it's not in the documented
list [1]. LEGO supports Gandi, so I'm not sure if maybe the documentation is
wrong here.

[1] [https://docs.cert-
manager.io/en/latest/reference/issuers/acm...](https://docs.cert-
manager.io/en/latest/reference/issuers/acme/dns01/)

~~~
JMTQp8lwXL
I don't think this limitation applies when using Kubernetes with cert-manager.
When you create Kubernetes Service API objects of type LoadBalancer, you get
(by default) a TCP load balancer on GCP.

SSL termination becomes the responsibility of the cluster. The certificate
(and private key) is stored inside the cluster, too, via secrets. To have
cert-manager automatically create and renew certificates, all you have to do
is update your ingress host/tls YAML configuration.

Using cert-manager, you can continue exposing your app under many domain names
with a single IP (via an A record) or abstractly through a CNAME record.

I still haven't added more than 10 domains, so I can't verify your concern
directly, but Kubernetes isn't making any changes to my load balancer as I've
added domains. I would revisit whether cert-manager is right for your use
case: I noticed no mention of it being alpha quality on the README. However,
they do point out it is 0.x and there could be breaking changes to the API
later.

~~~
atombender
So the method you describe ends up using a TCP GLB, but the point of using an
HTTP GLB is to enjoy all the benefits that come with it.

With a HTTP GLB, you get a very cheap distributed, effectively global CDN that
doesn't require special configuration. You also get features like health
checking, logging (which you can pipe to BigQuery), Google's "just works out
of the box" edge caching (negating the need for an external caching CDN like
Fastly), and so on.

My understanding is that the typical use case is to run cert-manager together
with ingresses, where cert-manager will allocate certs and create Kubernetes
secrets for each. If you wire up ingresses with those secrets, and you use the
"gce" or default ingress class, then you end up getting TLS termination at the
GLB level, very easily.

However, because of the 10-certs-per-LB, you run into the problem I described
before. There's no way around it except to create ingresses that contain a
maximum of 10 hosts. (It's a hard limit, not a quota; you can't petition to
have the limit increased.) If you have 60 domains, that necessarily means 6
GLBs, and 6 different IPs to point DNS to.
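The fan-out is just a ceiling division over the hard limit; the numbers below mirror the 60-domain example:

```python
import math

MAX_CERTS_PER_HTTP_GLB = 10  # the hard limit discussed above (not a quota)
domains = 60

glbs_needed = math.ceil(domains / MAX_CERTS_PER_HTTP_GLB)
print(glbs_needed)  # 6 GLBs, hence 6 distinct IPs to spread DNS across
```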

We're using Traefik today as a custom ingress controller with a TCP GLB. So we
get one external IP, and Traefik handles TLS termination (via Let's Encrypt).
So this is effectively the same as running cert-manager plus something like
the Nginx ingress controller.

~~~
JMTQp8lwXL
Yeah, I should have clarified in my case I am using the nginx ingress
controller, but Traefik would also suffice.

------
lenova
Dumb question, but has the industry decided on what they want to call
TLS/HTTPS? Majority of people I talk to still refer to them as SSL certs...

~~~
atombender
Old habits die hard.

The current standard is TLS, which superseded SSL back in 1999. Uptake and
awareness were initially slow, and SSL itself wasn't deprecated until _2015_.
It doesn't help that a bunch of projects (OpenSSL being a prominent one) still
have "SSL" in their name.

Technically, they're not "SSL certs". They're X.509 certs. Certificates show
up in many places unrelated to TLS/SSL.

~~~
tialaramex
Well, the things people want aren't (generally) just X.509 certificates.

If you expect your certificates to work in common client software like web
browsers or some random Python client code a third party is writing then
you're going to want:

* PKIX, currently RFC 5280 plus revisions, the Internet's chosen standard for how X.509 should be implemented. PKIX says a bunch of things about what you should or should not write in the X.509 certificate, you will probably have better luck conforming as much as possible even if you don't care about:

* The Web PKI. A Public Key Infrastructure has Certificate Authorities (plenty of those across all of X.509 or you can roll your own) but they're trusted by Relying Parties (parties who are _relying_ on the certificate's attestation to be true) and you probably want certificates that will be trusted by most RPs. Grandma doesn't know what the X.500 directory system is, or who a Certificate Authority is, but she does use Safari, and Safari trusts certain CAs on her behalf as does macOS. So you're going to want certs from one of those CAs. The Web PKI is a loose name for (despite that word "Web") the PKI covering SSL/TLS services on the Internet.

* You may want to get more specific. Although the Baseline Requirements which say roughly how a Certificate Authority should work are agreed across industry, each of the Major Trust Stores has their own additional rules. You probably care about all of them. They roughly correspond to operating system vendors. Microsoft and Apple (for their browsers and operating systems), then Mozilla (for Firefox on all platforms, and for Free Unix systems), Google (mostly just for Android, not Chrome), and then runners up Oracle (for Java), then a long tail of people including Nintendo and all those crappy in-car entertainment systems people...

For example, probably fifty people in the whole world have ever bothered
trying to use the Web from a Nintendo WiiU (not to be confused with the
popular Wii) console. If they try now though lots of things don't work.
Because Let's Encrypt isn't trusted by the WiiU, and since it's an end-of-life
product, probably never will be.

But "SSL Certs" gets across what you mean pretty well, only pedants are going
to insist it's wrong, and unless you're currently playing "Um, Actually" they
should cut it out.

------
barbadosmercury
How do they ensure the target machines see the original IP address and port?

~~~
ec109685
They encapsulate the packet, including data about the original IP address, as
it is sent to the hypervisor on the EC2 instance. Then the hypervisor reverses
the process and creates a packet with the original IP address as the source
when it is forwarded into the VM.
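The encapsulate/decapsulate idea can be illustrated in miniature; this is not AWS's actual wire format, just the shape of the trick:

```python
import socket
import struct

def encapsulate(src_ip, src_port, payload):
    """Prefix the payload with the original source IP and port, the way an
    encapsulation layer carries flow metadata alongside the inner packet."""
    return socket.inet_aton(src_ip) + struct.pack("!H", src_port) + payload

def decapsulate(datagram):
    """Reverse the process: recover the original source and inner payload."""
    src_ip = socket.inet_ntoa(datagram[:4])
    (src_port,) = struct.unpack("!H", datagram[4:6])
    return src_ip, src_port, datagram[6:]

wrapped = encapsulate("203.0.113.7", 43210, b"client hello")
assert decapsulate(wrapped) == ("203.0.113.7", 43210, b"client hello")
```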

