
AWS Network Load Balancer - jeffbarr
https://aws.amazon.com/blogs/aws/new-network-load-balancer-effortless-scaling-to-millions-of-requests-per-second/
======
colmmacc
If you're curious to see NLB in action, here's a live demo:
[http://nlb-34dc3b430638dc3e.elb.us-west-2.amazonaws.com/](http://nlb-34dc3b430638dc3e.elb.us-west-2.amazonaws.com/). It took about 5 minutes in the console to set up and required no changes on the targets/backends.
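
For anyone who'd rather script it than click through the console, here's roughly the same setup in boto3 (a sketch; the subnet/EIP/VPC/instance IDs are placeholders):

    
    
        import boto3
    
        elbv2 = boto3.client("elbv2", region_name="us-west-2")
    
        # The NLB itself; SubnetMappings pins an Elastic IP per AZ.
        lb = elbv2.create_load_balancer(
            Name="my-nlb",
            Type="network",
            Scheme="internet-facing",
            SubnetMappings=[{"SubnetId": "subnet-12345678",
                             "AllocationId": "eipalloc-12345678"}],
        )["LoadBalancers"][0]
    
        # Plain TCP target group pointing at the existing, unchanged backends.
        tg = elbv2.create_target_group(
            Name="my-targets", Protocol="TCP", Port=80, VpcId="vpc-12345678",
        )["TargetGroups"][0]
        elbv2.register_targets(TargetGroupArn=tg["TargetGroupArn"],
                               Targets=[{"Id": "i-0123456789abcdef0"}])
    
        # Listener forwards TCP :80 straight through to the target group.
        elbv2.create_listener(
            LoadBalancerArn=lb["LoadBalancerArn"], Protocol="TCP", Port=80,
            DefaultActions=[{"Type": "forward",
                             "TargetGroupArn": tg["TargetGroupArn"]}],
        )
        print(lb["DNSName"])
    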

Massive disclaimer: I work on NLB.

~~~
posnet
Are there any plans to add UDP support?

~~~
gtaylor
This would be great to have.

------
mooreds
I love the concept, because not being able to handle TCP traffic was one
shortcoming of the new ALB.

But that pricing model:

    
    
        Bandwidth – 1 GB per LCU.
        New Connections – 800 per LCU.
        Active Connections – 100,000 per LCU.
    

It would be nice to have it added to the Simple Monthly Calculator:
[https://calculator.s3.amazonaws.com/index.html](https://calculator.s3.amazonaws.com/index.html)
but I had to read the FAQ to find out what those dimensions were:
[https://aws.amazon.com/elasticloadbalancing/faqs/](https://aws.amazon.com/elasticloadbalancing/faqs/)

~~~
runako
It looks like the deep linking to the LCU page doesn't work (you have to click
the tab for Network Load Balancer), so here's what an LCU is from that page:

---

An LCU is a new metric for determining how you pay for a Network Load
Balancer. An LCU defines the maximum resource consumed in any one of the
dimensions (new connections/flows, active connections/flows, and bandwidth)
in which the Network Load Balancer processes your traffic.
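
To make the "maximum of the dimensions" billing concrete, a quick back-of-the-envelope in Python (the traffic numbers are made up; $0.006 per LCU-hour is the rate from the pricing page):

    
    
        # One hypothetical hour of traffic against the three NLB LCU dimensions.
        new_conns_per_sec = 1_000    # dimension: 800 new connections/sec per LCU
        active_conns      = 250_000  # dimension: 100,000 active connections per LCU
        gb_processed      = 2.0      # dimension: 1 GB per LCU
    
        # You are billed on whichever dimension consumes the most LCUs.
        lcus = max(new_conns_per_sec / 800, active_conns / 100_000, gb_processed / 1.0)
        print(lcus, lcus * 0.006)    # 2.5 LCUs -> $0.015/hour, plus the hourly NLB charge
    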

------
syncerr
Seems to remarkably decrease latency (380ms -> 109ms). Running some tests:

    
    
        # ab -n 400 http://nlb-34dc3b430638dc3e.elb.us-west-2.amazonaws.com/
        Time per request: 108.779 [ms] (mean, across all concurrent requests)
    
        # ab -n 400 <public server via ELB>
        381.933
    
        # ab -n 400 <public server via ALB>
        380.632
    
        # (for reference) ab -n 400 https://www.google.com/
        190.536
    
        # (for reference) ab -n 400 https://sandbox-api.uber.com/health/
        107.680
    

If you're willing to terminate SSL, this looks like it could be a solid
improvement.

~~~
geocar
I've seen a similar improvement using the EnableProxyProtocol policy, which
required a bit of code:

    
    
        Time per request:       88.400 [ms] (mean, across all concurrent requests)
    

versus public server via the regular HTTP proxy:

    
    
        Time per request:       415.859 [ms] (mean, across all concurrent requests)
    

For reference:

    
    
        $ ab -n 400 https://www.google.com/
        Time per request:       168.438 [ms] (mean, across all concurrent requests)
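
(The "bit of code" is on the backend: with EnableProxyProtocol the ELB prepends a PROXY protocol v1 line to each connection, and the server has to strip and parse it before reading the real request. A rough sketch of that parsing:)

    
    
        import socket
    
        def read_proxy_header(conn: socket.socket):
            """Consume a PROXY protocol v1 line such as
            'PROXY TCP4 192.0.2.1 198.51.100.1 56324 80' + CRLF
            and return the real client (ip, port)."""
            line = b""
            while not line.endswith(b"\r\n"):
                byte = conn.recv(1)      # one byte at a time so we never
                if not byte:             # eat the request bytes that follow
                    raise ConnectionError("connection closed mid-header")
                line += byte
                if len(line) > 107:      # the v1 spec caps the line at 107 bytes
                    raise ValueError("oversized PROXY header")
            parts = line.decode("ascii").split()
            if parts[0] != "PROXY" or parts[1] not in ("TCP4", "TCP6"):
                raise ValueError("not a PROXY protocol v1 header")
            src_ip, src_port = parts[2], parts[4]
            return src_ip, int(src_port)
    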

------
paulddraper
Static IP, source IP, and zonality are game changing.

Unfortunately, it lacks a very significant existing feature of ELB: SSL/TLS
termination. It's very convenient to manage the certs in AWS without having to
deploy them to dedicated EC2 instances.

~~~
9point6
It won't ever be possible to do this, as the NLB runs a few network layers
below where TLS runs.

~~~
paulddraper
Does it? It has HTTPS health checks.

~~~
9point6
I expect the health-checking happens independently of the routing. The routing
will just act upon a list of routes which is modified independently by the
health check.

------
g09980
This was discussed a couple of times recently, but answers seemed
contradictory. ELB requires pre-warming if you expect sudden high load. But do
ALB and NLB?

[1] [https://news.ycombinator.com/item?id=15085863](https://news.ycombinator.com/item?id=15085863)

[2] [https://news.ycombinator.com/item?id=14052079](https://news.ycombinator.com/item?id=14052079)

~~~
colmmacc
Each NLB starts out with several gigabits of capacity per availability zone,
and it scales horizontally from there (theoretically to terabits). That's more
capacity than many of the busiest websites and web services in the world
need.

If you expect an instantaneous load of more than about 5 Gbit/sec, in those
situations we work directly with customers via AWS Support. We really try to
understand the load and make sure that the right mechanisms are in place. At
that scale, our internal DDoS mitigation systems also come into play. (It's
not a constraint of NLB.)

The load test in the blog post was done with an NLB, with no pre-provisioning
or pre-warming, and it got us to 3M RPS and 30 Gbit/sec, which is when we
exhausted the capacity of our test backends.

ALBs start out with less capacity, and are constrained more by requests than
bandwidth. I don't have a precise number because it depends on how many rules
you have configured and which TLS ciphers are negotiated by your clients, but
the numbers are high enough that customers routinely use ALBs to handle real-
world spiky workloads, including supporting Super Bowl ads and flash sales.

Each ALB can scale into the tens of gigabits/sec before needing to shard. ALB
also has a neat trick up its sleeve: if you add backends, we scale up, even if
there's no traffic. We assume the backends are there to handle expected load.
So in that case it has "more" capacity than the backends behind it. That goes
a long way to avoiding some of the scaling issues that impacted ELB early in
its history.

If you have a workload that you're worried about, feel free to reach out to me
and we'll be happy to work with you. colm _AT_ amazon.com.

~~~
vacri
> _[upgrading ELB], in those situations we work directly with customers via
> AWS Support._

This is painful, though. I don't know about Sep 2017, but in 2016, upscaling
your ELB involved answering a block of 21 or so questions in a list only given
to you after you engaged with support, and it had some pretty esoteric items
on it. It was decidedly un-AWS-ey.

~~~
colmmacc
It was definitely un-AWS-ey, and there's an almost visceral pain response on
our faces when we don't have a self-service API for something. This is
improving all of the time and I think is already much better, with some big
specific improvements ... I'll do my best to share what I can here.

First things first, with NLB our experience is that pre-warming is never
necessary. Each NLB starts out with a huge volume of capacity, beyond the
needs of even the largest systems we support, and each NLB can theoretically
scale to terabits of traffic.

Our first big improvement for ALB was getting its basic scalability and
performance to a point where, in all but a very small number of cases (think
some of the busiest services in the world), customers don't need to do
anything. This is the pay-off from a lot of hard work focused on low-level
performance.

Our second big improvement, for both ALB and Classic ELBs, was a mix of more
generous capacity buffers and pro-active, responsive scaling systems.
Together, these mean that we can race ahead of our customers' load
requirements.

Another item that's helped is that for the truly big scaling cases, which are
mostly DDoS preparedness, we now have the AWS Shield service to manage that
process in consultation with the customer. That's useful if your needs are
more nuanced and custom than the DDoS protection that is included with ELB by
default. This gets into things such as how your application is configured to
handle the load.

With all of these improvements, ALB does not require pre-warming for the vast
majority of real-world workloads. However, after years of pre-warming as a
thing, we have customers who have incorporated it into their operational
workflows, or who rely on it for extra peace of mind. We do want to continue
to support that for our customers.

~~~
vacri
That's all great news to hear. Thanks for the informative response :)

------
deafcalculus
How does failover across zones work?

The blog post says there's one static IP per zone. I suppose www.mydomain
should have multiple A records, each pointing to an Elastic IP in a zone. What
happens when one zone entirely fails? Does it need a DNS change at that point?
Or does the NLB have a different IP with which it can do BGP failover?

~~~
krallin
AWS provides you with a number of DNS records for each NLB:

- One record per zone (which maps to the EIP for that zone)

- A top-level record that includes all active zones (these are all zones you
have registered targets in, IIRC)

The latter record is health checked, so if an AZ goes down, it'll stop
advertising it automatically (there will be latency of course, so you'll have
some clients connecting to a dead IP, but if we're talking unplanned AZ
failure, that's sort of expected).

That said, this does mean you probably shouldn't advertise the IPs directly if
you can avoid it, yes.

(disclaimer: we evaluated NLB during their beta, so some of this information
might be slightly outdated / inaccurate)

~~~
deafcalculus
Won't DNS failover be painfully slow? Some clients ignore small TTL values.
I've seen DNS updates taking several hours to propagate.

I thought one of the advantages of multiple zones is that zonal failover can
happen with "zero" downtime (this seems to be the case with Amazon RDS).

~~~
colmmacc
The default answer includes multiple A records, so if clients can't reach one
of the IPs, they try another. There's no need for anything to propagate for
that to kick in, it's just ordinary client retry behavior.

We do also withdraw an IP from DNS if it fails; when we measure it, we see
that over 99% of clients and resolvers do honor TTLs and the change is
effected very quickly. We've been using this same process for www.amazon.com
for a long time.

Contrast to an alternative like BGP anycast, where it can take minutes for an
update to propagate as BGP peers share it with each other in sequence.
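
(What "ordinary client retry behavior" looks like, as a generic sketch rather than anything AWS-specific:)

    
    
        import socket
    
        def connect_any(host, port, timeout=2.0):
            """Try each A record in turn; a dead AZ's IP costs one timeout."""
            last_err = None
            for *_, sockaddr in socket.getaddrinfo(host, port,
                                                   proto=socket.IPPROTO_TCP):
                try:
                    return socket.create_connection(sockaddr[:2], timeout=timeout)
                except OSError as err:
                    last_err = err       # this address failed, try the next one
            raise last_err or OSError("no usable addresses for " + host)
    
        conn = connect_any("nlb-34dc3b430638dc3e.elb.us-west-2.amazonaws.com", 80)
    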

------
gregmac
I have just finished setting up a new front-end for a few services (we are
just about to start migrating production systems to it).

I was aiming to use static IPs (for client firewall rules) and simplify
networking configuration, so what I ended up with is an auto-scaling group of
HAProxy systems that run a script every couple of minutes to assign themselves
an Elastic IP from a provided list. Route 53 is configured with health checks
to only return the IP(s) that are working.
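
The assignment script is roughly this shape (a boto3 sketch under my assumptions: a pre-allocated EIP list and the IMDSv1 metadata endpoint; run it from cron):

    
    
        import boto3, urllib.request
    
        POOL = ["eipalloc-aaaa1111", "eipalloc-bbbb2222"]  # the provided list
    
        def claim_eip():
            # Which instance am I? (EC2 instance metadata service)
            me = urllib.request.urlopen(
                "http://169.254.169.254/latest/meta-data/instance-id"
            ).read().decode()
            ec2 = boto3.client("ec2")
            addrs = ec2.describe_addresses(AllocationIds=POOL)["Addresses"]
            if any(a.get("InstanceId") == me for a in addrs):
                return                   # already holding one of the pool EIPs
            for addr in addrs:
                if "AssociationId" not in addr:            # unclaimed EIP
                    ec2.associate_address(InstanceId=me,
                                          AllocationId=addr["AllocationId"])
                    return
    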

The HAProxy instances also continuously read their target auto-scaling groups
to update backend config and do SSL termination, running the Let's Encrypt
client as well. Most services are routed by host name, but a couple of older
ones are path-based, and there are some 301 redirects.

I think NLB could replace the elastic IP and route53 part of this setup, but
I'd still need to do SSL, routing, and backends. It's too bad, because my
setup is one that could be used nearly anywhere that has more than one public-
facing service, but there's not much built-in to help - I had to write quite a
few scripts to get everything I needed.

~~~
gnur
Have you tried/evaluated traefik? It sounds like it could do nearly everything
you just mentioned.

------
tjholowaychuk
Awesome! I have the perfect project for this haha. Does it still work with ECS
integration?

~~~
Thaxll
Yes, it does (target group integration).

------
GeneticGenesis
No chance that I'll jump into another new load balancer product from Amazon
any time soon. ALB has significant deficiencies that AWS don't warn you about,
and you only find them at tens of thousands of RPS.

Still waiting on that fix, AWS.

~~~
newhere420
If they won't warn us, could you please warn us? - Fellow ALB user.

~~~
GeneticGenesis
Sure. Whenever a "config change" happens on an ALB (note: this includes adding
or removing targets in a target group, e.g. autoscaling), the ALB drops all
active connections and re-establishes them at once. At high load, this
obviously causes significant load spikes on any underlying service.

You can see this happening by looking at the "Active Connection Count" graphs
from your ALB while adding or removing an instance from an ASG.

At 30+ Gbps and over 20k RPS, removing one instance can cause absolute chaos.

~~~
colmmacc
Wow that sounds awful - but thankfully this isn't typical. I'm going to go
digging for a case-id/issue and see what's going on myself (please e-mail the
case if you have one). Re-configurations are routine and graceful.

From your description it may be that you have long lived connections that
build up over time, at a rate that targets can easily handle, but that the re-
connect spikes associated with a target failure/withdrawal are too intense.
This is a challenge I've seen with web sockets: imagine building up 100,000
mostly-idle web sockets slowly over time, even a modest pair of backends can
handle this. But then a backend fails, and 50,000 connections come storming in
at once!

Another scenario is adding an "idle" target to a busy workload that can't
handle the increased rate of new connections it will get. Software that
relies on caching (including things like internal object caches) often can
handle a slow ramp-up, but not a sudden rush.

We're currently experimenting with algorithms that allow customers to more
slowly ramp up the incoming rate of connections in these kinds of scenarios.
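
For the curious, here's the general shape of a slow-start ramp (a generic sketch, not what we actually run): new targets get a linearly increasing share of new connections over a warm-up window.

    
    
        import random, time
    
        WARMUP_SECONDS = 120.0
    
        def pick_target(targets, now=None):
            """targets: list of (name, registered_at) pairs."""
            now = now or time.time()
            weights = [min(1.0, (now - registered_at) / WARMUP_SECONDS)
                       for _, registered_at in targets]
            # A small floor so brand-new targets start warming their caches.
            weights = [max(w, 0.05) for w in weights]
            return random.choices(targets, weights=weights, k=1)[0]
    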

Anyway, those are guesses, so I may be wrong about your case, but hopefully
the information is still useful to others reading.

------
dfischer
This is perfect. Need something like this to load balance A records for
dynamic domains off apex. We can most likely use the static IP address
perfectly for this.

Also, did I read it wrong, or is this actually cheaper than ALB?

Awesome! Can't wait to dig in more!

~~~
davidbrownct
Yes, NLB is priced the same as ALB hourly but 25% cheaper on bandwidth (LCUs).

------
mark242
I don't understand the pricing model. 800 new connections per hour, for
$0.006? Isn't that extremely expensive? 80,000 connections for $0.60 in an
hour is $432 per month for not a whole lot of traffic.

edit: Okay, it's 800 new connections per second, per the ELB pricing page,
under "LCU details". The cost for 80k connections in an hour is effectively
constrained by the bandwidth; e.g., if there's very low bandwidth, it's
$0.006/hour, or $4.32/month.

------
kadiyala
I was just wondering: is this something purely developed inside Amazon, or is
it backed by an ADC like NetScaler or F5? Does anyone know any details? I'm
assuming that Classic Load Balancer is some third-party or old framework and
this is something Amazon developed internally.

~~~
Corrado
I would assume that it's something developed internally at Amazon. Networking
inside of AWS isn't standard fare, and I doubt NetScaler or F5 products could
be used. Generally speaking, they aren't using TCP/IP behind the curtain to
move packets between nodes. AWS has even created their own routing
hardware/software because no other company could do what they need at the
scale that they need. See this video for more information:
[https://www.youtube.com/watch?v=St3SE4LWhKo](https://www.youtube.com/watch?v=St3SE4LWhKo)

~~~
discodave
AWS & Amazon use a LOT of routers from one of these vendors, although they
would probably try to avoid baking them into a public-facing product like this.

~~~
ckozlowski
Not necessarily: [https://www.geekwire.com/2017/amazon-web-services-secret-weapon-custom-made-hardware-network/](https://www.geekwire.com/2017/amazon-web-services-secret-weapon-custom-made-hardware-network/)

------
irl_zebra
It's not showing up in GovCloud. This stuff always takes longer for that.

------
Thaxll
When is http2 coming to ELB ... :/ It should be their #1 priority.

~~~
colmmacc
ELB's Application Load Balancer (ALB) supports HTTP/2 termination. More in the
original launch announcement: [https://aws.amazon.com/blogs/aws/new-aws-application-load-balancer/](https://aws.amazon.com/blogs/aws/new-aws-application-load-balancer/) and
[https://aws.amazon.com/elasticloadbalancing/details/#details](https://aws.amazon.com/elasticloadbalancing/details/#details)

~~~
Thaxll
HTTP/2 termination is not very useful; most people want HTTP/2 on the ELB ->
backend instances leg.

~~~
seanp2k2
In case anyone is confused about this sub-thread: """ Application Load
Balancers provide native support for HTTP/2 with HTTPS listeners. You can send
up to 128 requests in parallel using one HTTP/2 connection. The load balancer
converts these to individual HTTP/1.1 requests and distributes them across the
healthy targets in the target group using the round robin routing algorithm.
Because HTTP/2 uses front-end connections more efficiently, you might notice
fewer connections between clients and the load balancer. Note that you can't
use the server-push feature of HTTP/2. """
([http://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-listeners.html#listener-configuration](http://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-listeners.html#listener-configuration))

------
inertial
Feature request: please allow weighted load balancing, i.e. the ability to
distribute traffic in a user-specified ratio (weights) to different-sized
instances.

~~~
bashtoni
Better: distribute to the instance with the lowest % CPU by default, à la the
Google Cloud NLB.

~~~
colmmacc
This is what ELB Classic did for a long time - but we're experimenting with
new algorithms. The problem with this approach is that it's not cache
friendly.

When you bring in a new server to a busy workload, it gets all of the new
connections. That can put a lot of pressure on that host - and if the
application relies on caching, which most do in some form, it can really make
performance terrible and even trigger cascading failures.

Another problem is that if a single host is broken or misconfigured and
throwing 500 errors, it is often also the fastest, lowest-CPU box, because
throwing errors isn't very expensive. It can suck in and blackhole all of the
traffic.

Based on how these issues work out at scale, we've moved beyond simple
load-based load balancing (I know that sounds counter-intuitive) and into
algorithms that try to achieve a better balance for a wider range of
scenarios.
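
The blackhole effect is easy to see in a toy simulation (made-up numbers: two healthy backends at 100ms per request, one broken box throwing fast 500s at 5ms, and pure least-outstanding-work routing standing in for load-based balancing):

    
    
        # Route each request to the backend with the least outstanding work.
        SERVICE_MS = {"a": 100, "b": 100, "broken": 5}   # broken = fast 500s
        outstanding = {name: 0.0 for name in SERVICE_MS}
        served = {name: 0 for name in SERVICE_MS}
    
        for _ in range(10_000):
            target = min(outstanding, key=outstanding.get)
            outstanding[target] += SERVICE_MS[target]
            served[target] += 1
            for name in outstanding:                     # time passes, work drains
                outstanding[name] = max(0.0, outstanding[name] - 5)
    
        print(served)   # the broken box absorbs (and blackholes) the vast majority
    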

------
shampster
I was so excited before I realized this wouldn't terminate HTTP(S) traffic. An
IP-anycast-based ALB would be nice.

------
Cieplak
I wonder what they wrote it in. I'd guess C++, Java or Erlang, or a
combination of those.

~~~
takeda
Chances are it's IPVS[1], maybe with some patching.

Why would one reinvent the wheel when Linux already lets you do that?

In that case it would be C since it's implemented in the Linux kernel.

[1]
[https://en.wikipedia.org/wiki/IP_Virtual_Server](https://en.wikipedia.org/wiki/IP_Virtual_Server)

~~~
takeda
I should have read the blog instead of skimming before responding.

This appears to be Layer 4 load balancing; IPVS is more of Layer 3.

So the person who was voted down for mentioning HAProxy might not be too far
off. It could be implemented through HAProxy with TPROXY enabled in the
kernel[1]. Then just make sure the default gateway configured on the targets
routes back through the load balancer, or is the load balancer.

[1] [https://www.loadbalancer.org/blog/configure-haproxy-with-tproxy-kernel-for-full-transparent-proxy/](https://www.loadbalancer.org/blog/configure-haproxy-with-tproxy-kernel-for-full-transparent-proxy/)

~~~
bogomipz
>"This appears as Layer 4 load balancing, IPVS is more of Layer 3."

No, IPVS is L4 load balancing. L3 load balancing would be a routing protocol
plus ECMP.

~~~
takeda
Ah, my bad; even the wiki link I posted says it is Layer 4. I guess both
solutions could be utilized; perhaps it was IPVS, since it would be more
performant than HAProxy.

------
sgs1370
This will motivate me to get everything into VPCs, which I should have done a
while ago.

------
hexsprite
So would a WebSocket-based application be better off using NLB?

~~~
zob_cloud
That depends on your workload goals. If you want path- or host-name-based
routing, Application Load Balancer may be a better fit, as it natively
supports WebSockets. If your goal is long-lived sessions (weeks and months,
not minutes and hours), Network Load Balancer is probably a better fit.

------
baccredited
Does it support IPv6?

------
dlhavema
Any idea if you can use this behind API Gateway if it is not itself public?

~~~
muddley
Yes, of course. The NLB is public, and API GW sits at the edge locations.

------
sigi45
Nice. Especially the IP thing.

