AWS Network Load Balancer (amazon.com)
287 points by jeffbarr 5 months ago | 118 comments

If you're curious to see NLB in action, here's a live demo: http://nlb-34dc3b430638dc3e.elb.us-west-2.amazonaws.com/ . It took about 5 minutes in the console to set up and required no changes on the targets/backends.

Massive disclaimer: I work on NLB.

Are there any plans to add UDP support?

This would be great to have.

Ditto. UDP support would be great.

It sounds like NLB passes through source IP - does that mean outbound flows are through the IGW?

There must be some magic happening somewhere, because otherwise outgoing packets would have the wrong source address.

You can use direct server return to manipulate the Ethernet frames so that packets don't travel back through the load balancer on the way to the parent switch.

That generally requires config on the serving hosts, which wasn't mentioned in the setup. I think I saw a reference to adding hosts with a different port number than the service port as well. For people in EC2-VPC (not classic), all their traffic is going through an Amazon NAT anyway; perhaps this new service is setting up translations there. (Note all the references to VPC, and never a mention of EC2-classic.)

Direct Server Return works at layer 3, not layer 2; it's routing and encapsulation: IP-in-IP, GRE, etc.

You can do it at layer 2 as well, but it requires that the load balancer have an interface on the same broadcast domains as the hosts.

I am confused at how this would work. Can you elaborate? Also broadcast domain is a layer 3 construct.

Any idea how long for GovCloud?

Do you have a support POC? I'd reach out to them as they should be able to provide you with a roadmap update. If you're not sure who that is, you can reach out to me at kozlowck at amazon.com.

Hello fellow GovCloud user, would love to compare notes if you're game. je@h4x.club :)

I got to say, asking about GovCloud and with that email address your post sounds like a terrible spear phishing attempt :)

Pretty sure that's the entire point :)

Was it? neom has posted that email in the past.

Ain't no dark arts over here. Just a buddy trying to fix cities. <3

Is there any intent to add TLS termination? That’s a dealbreaker for us switching from the classic load balancer. Otherwise this looks really awesome, thanks!

I don't think they can add TLS termination because of the way it's implemented. NLB runs at Layer 4, the transport layer, where TCP and UDP run. TLS technically runs on top of the transport layer.

That’s kind of the answer I was expecting, just hoping it wasn’t the case. From the marketing material they really want you to move, but not having a solution to offload TLS makes it impossible for us. And it worries me to see the CLB getting effectively deprecated without an alternative.

ALB can term TLS for h2 and wss: https://aws.amazon.com/elasticloadbalancing/ sounds like that's what you might want?

Unfortunately we are currently on a custom TCP-based protocol (we're in the game space). But yes, this is more incentive for us to consider h2 or wss.

I'm hopeful AWS will follow this up with ACM supporting SSL certs on instances, so you can run a LetsEncrypt equivalent on each instance, providing TLS end to end encryption

Huh? It's no different than ELB.

There is no Security Group for NLB, how is that reasoned?

That threw me for a bit of a loop as well. This means that responsibility for doing ACL whitelisting at the edge is now moved from the actual edge to the security groups on the actual servers responding to requests, right?

That's do-able and all, but I kind of didn't hate the old paradigm of having an extra layer there.

One way to think of NLB is that it's an Elastic IP address that happens to go to multiple instances or containers, instead of just one. Everything else stays the same.

Yeh, it's easy to use it like that for now. I hope they update it later on though. Seems like a missing feature in their otherwise nice firewall rules setup.

Demo page states "Your browser may keep a connection open for a few seconds and re-use it for a reloaded request. If it does, you'll get the same target", but when I attempted to abuse the power of F5, I was alternated between ice cream and bumblebee.

If you are going to look at it, attempt time - ~04:50 UTC, remote address from network

Same, got a different one each time.

Browsers typically open several connections to load a page faster. Each of those connections has a different source port, and thus may route to a different target. In Colm's demo, it depends on which connection your browser uses when requesting the part including the CSS which decorates the object. In my Chrome on Mac I see 6 TCP connections to the demo NLB.

Each of those connections stays routed to the same target for its lifetime, so it's up to your browser to decide which to use for what.
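
To illustrate why several browser connections can land on different targets, here is a minimal sketch. It assumes the load balancer picks a target by hashing the connection's 5-tuple; the hash function, the IP addresses, and the `pick_target` helper are hypothetical, and nothing in this thread describes NLB's actual selection algorithm.

```python
import hashlib

TARGETS = ["ice-cream", "bumblebee"]  # target names borrowed from the demo page

def pick_target(src_ip, src_port, dst_ip, dst_port, proto="tcp", targets=TARGETS):
    """Map a connection's 5-tuple to one target via a stable hash (illustrative only)."""
    key = f"{proto}|{src_ip}|{src_port}|{dst_ip}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return targets[int.from_bytes(digest[:8], "big") % len(targets)]

# Six browser connections that differ only in their ephemeral source port.
for port in range(50000, 50006):
    print(port, pick_target("198.51.100.7", port, "203.0.113.10", 80))
```

The same 5-tuple always maps to the same target, while a reload that happens to use a different connection (and so a different source port) may map elsewhere.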

is there a cloudformation template for this demo that i can look at?

I doubt it. Unfortunately, CloudFormation lags severely behind...

CloudFormation, CodeDeploy, and ECS all support NLB today :) I used the console to create the simple demo though, so I don't have a template to recreate it. Sorry!

Good, as I'm tired of spending days writing Custom Resources!

CloudFormation provides support for Network Load Balancer.

Check the 'Type' property mentioned in the CloudFormation templates reference in public documentation http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuid...

I love the concept, because not being able to handle TCP traffic was one shortcoming of the new ALB.

But that pricing model:

    Bandwidth – 1 GB per LCU.
    New Connections – 800 per LCU.
    Active Connections – 100,000 per LCU.

Would be nice to have it added to the simple monthly calculator: https://calculator.s3.amazonaws.com/index.html but I had to read the FAQ to find out what those were: https://aws.amazon.com/elasticloadbalancing/faqs/

It looks like the deep linking to the LCU page doesn't work (you have to click the tab for Network Load Balancer), so here's what an LCU is from that page:


An LCU is a new metric for determining how you pay for a Network Load Balancer. An LCU defines the maximum resource consumed in any one of the dimensions (new connections/flows, active connections/flows, and bandwidth) the Network Load Balancer processes your traffic.
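
The "maximum resource consumed in any one of the dimensions" rule can be sketched as below. This uses the three per-LCU figures quoted earlier in this comment; the function name is hypothetical, and the units (new connections per second, bandwidth per hour) are assumptions that the quoted list does not state.

```python
def lcus_consumed(new_conns_per_sec, active_conns, gb_per_hour):
    """Return the LCU count as the largest of the three per-dimension ratios."""
    return max(
        new_conns_per_sec / 800,   # New Connections: 800 per LCU
        active_conns / 100_000,    # Active Connections: 100,000 per LCU
        gb_per_hour / 1.0,         # Bandwidth: 1 GB per LCU (assumed per hour)
    )

# 400 new conn/s is 0.5 LCU, 50,000 active is 0.5 LCU, 2.5 GB is 2.5 LCU.
print(lcus_consumed(new_conns_per_sec=400, active_conns=50_000, gb_per_hour=2.5))  # 2.5
```

Only the dominant dimension is charged; here bandwidth decides the bill.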

I often feel the simple monthly calculator is largely neglected by AWS.

A large number of services are not even featured on the calculator as an option (e.g. Lambda)

Seems to remarkably decrease latency (380ms -> 109ms). Running some tests:

    # ab -n 400 http://nlb-34dc3b430638dc3e.elb.us-west-2.amazonaws.com/
    Time per request: 108.779 [ms] (mean, across all concurrent requests)

    # ab -n 400 <public server via ELB>

    # ab -n 400 <public server via ALB>

    # (for reference) ab -n 400 https://www.google.com/

    # (for reference) ab -n 400 https://sandbox-api.uber.com/health/

If you're willing to terminate SSL, this looks like it could be a solid improvement.

I've seen a similar improvement using the EnableProxyProtocol policy, which required a bit of code:

    Time per request:       88.400 [ms] (mean, across all concurrent requests)

versus public server via the regular HTTP proxy:

    Time per request:       415.859 [ms] (mean, across all concurrent requests)

For reference:

    $ ab -n 400 https://www.google.com/
    Time per request:       168.438 [ms] (mean, across all concurrent requests)
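
The "bit of code" needed on the backend for the proxy protocol approach is not shown in the comment above. A minimal sketch, assuming the load balancer prepends the human-readable PROXY protocol version 1 line to each connection; `parse_proxy_v1` is a hypothetical helper name and the addresses are made up.

```python
def parse_proxy_v1(header: bytes):
    """Parse a PROXY protocol v1 line; return (src_ip, src_port, dst_ip, dst_port)."""
    if not header.startswith(b"PROXY ") or not header.endswith(b"\r\n"):
        raise ValueError("not a PROXY protocol v1 header")
    parts = header[:-2].decode("ascii").split(" ")
    # Expected fields: PROXY, family, source IP, destination IP, source port, destination port
    if len(parts) != 6 or parts[1] not in ("TCP4", "TCP6"):
        raise ValueError("unsupported or malformed PROXY header")
    _, _, src_ip, dst_ip, src_port, dst_port = parts
    return src_ip, int(src_port), dst_ip, int(dst_port)

print(parse_proxy_v1(b"PROXY TCP4 198.51.100.7 203.0.113.10 56324 80\r\n"))
```

The backend reads this one line before the application payload and uses the source address from it instead of the socket's peer address.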

Static IP, source IP, and zonality are game changing.

Unfortunately, it lacks a very significant existing feature of ELB: SSL/TLS termination. It's very convenient to manage the certs in AWS without having to deploy them to dedicated EC2 instances.

This is what their ALB service is for.

It won't ever be possible to do this as the NLB runs a few network layers below where TLS runs

Does it? It has HTTPS health checks.

I expect the health-checking happens independently of the routing. The routing will just act upon a list of routes which is modified independently by the health check.

This was discussed a couple of times recently, but answers seemed contradictory. ELB requires pre-warming if you expect sudden high load. But do ALB and NLB?

[1] https://news.ycombinator.com/item?id=15085863

[2] https://news.ycombinator.com/item?id=14052079

Each NLB starts out with several gigabits of capacity per availability zone, and it scales horizontally from there (theoretically to Terabits). That's more capacity than many of the busiest web-sites and web-services in the world need.

If you expect an instantaneous load of more than about 5Gbit/sec, in those situations we work directly with customers via AWS Support. We really try to understand the load, make sure that the right mechanisms are in place. At that scale, our internal DDOS mitigation systems also come into play. (It's not a constraint of NLB).

The load test in the blog post was done with an NLB, and was done with no pre-provisioning or pre-warming and allowed us to get to 3M RPS and 30Gbit/sec, which is when we exhausted the capacity of our test backends.

ALBs start out with less capacity, and are constrained more by requests than bandwidth. I don't have a precise number because it depends on how many rules you have configured and which TLS ciphers are negotiated by your clients, but the numbers are high enough that customers routinely use ALBs to handle real-world spiky workloads, including supporting Super Bowl ads and flash sales.

Each ALB can scale into the tens of gigabits/sec before needing to shard. ALB also has a neat trick up its sleeve: if you add backends, we scale up, even if there's no traffic. We assume the backends are there to handle expected load. So in that case it has "more" capacity than the backends behind it. That goes a long way to avoiding some of the scaling issues that impacted ELB early in its history.

If you have a workload that you're worried about, feel free to reach out to me and we'll be happy to work with you. colm _AT_ amazon.com.

> [upgrading ELB], in those situations we work directly with customers via AWS Support.

This is painful, though. I don't know about Sep 2017, but in 2016 upscaling your ELB involved answering a block of 21 or so questions in a list only given to you after you engage with support, which had some pretty esoteric items on it. It was decidedly un-AWS-ey.

It was definitely un-AWS-ey, and there's an almost visceral pain response on our faces when we don't have a self-service API for something. This is improving all of the time and I think is already much better, with some big specific improvements ... I'll do my best to share what I can here.

First things first, with NLB our experience is that pre-warming is never necessary. Each NLB starts out with a huge volume of capacity, beyond the needs of even the largest systems we support, and each NLB can theoretically scale to terabits of traffic.

Our first big improvement for ALB was that the basic scalability and performance of ALB is at a point that in all but a very small number of cases (think of some of the busiest services in the world), customers don't need to do anything. This is the pay-off from a lot of hard work focused on low-level performance.

Our second big improvement, for both ALB and Classic ELBs, was a mix of more generous capacity buffers, and pro-active and responsive scaling systems. Together, these mean that we can race ahead of our customers load requirements.

Another item that's helped is that for the truly big scaling cases, which is DDOS preparedness, we now have the AWS Shield service to manage that process in consultation with the customer. That's useful if your needs are more nuanced and custom than the DDOS protection that is included with ELB by default. This gets into things such as how your application is configured to handle the load.

With all of these improvements, ALB does not require pre-warming for the vast majority of real-world workloads. However, after years of pre-warming as a thing, we have customers who have incorporated it into their operational workflows, or who rely on it for extra peace of mind. We do want to continue to support that for our customers.

That's all great news to hear. Thanks for the informative response :)

Does NLB have the option for session affinity? i.e. keep the same tcp connection open to the same backend host?

Yes, all packets from a TCP connection will keep going to the backend that was chosen when the connection first came in. This is preserved even if other backends are added and removed to the set of eligible targets. It's also preserved even if the backend itself becomes unhealthy, but the connection seems not to be impacted. For example, if the backend application stops listening for new connections - existing ones will continue to work.

We try really really hard to preserve connections and the system is designed to keep connections healthy even for months and years.
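
The affinity behavior described above can be sketched as a flow table. This is an illustration only: the `FlowTable` class and the random choice of target are assumptions, not a description of how NLB is implemented.

```python
import random

class FlowTable:
    """Pin each connection to the target chosen for its first packet (illustrative)."""

    def __init__(self, targets):
        self.eligible = set(targets)  # targets that may receive NEW connections
        self.flows = {}               # flow key -> pinned target

    def route(self, flow_key):
        if flow_key not in self.flows:
            # New connection: choose among the currently eligible targets.
            self.flows[flow_key] = random.choice(sorted(self.eligible))
        return self.flows[flow_key]

    def remove_target(self, target):
        # Only affects future connections; existing flows keep their target.
        self.eligible.discard(target)

table = FlowTable(["backend-a", "backend-b"])
flow = ("198.51.100.7", 50000, "203.0.113.10", 80)
first = table.route(flow)
table.remove_target(first)          # e.g. the backend fails its health check
print(table.route(flow) == first)   # the existing connection stays where it was
```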

Forum post answer, so not 100% official, but it is from an AWS employee.

"Yes, while you should see performance increases on ALB in general, for major traffic surges we still recommend a pre-warm"


I would trust what @colmmacc (https://news.ycombinator.com/user?id=colmmacc) is writing above, as he is a Senior Principal on NLB: https://www.linkedin.com/in/colmm1/

Not sure about ALB, but from the linked blog post for NLB:

> Beginning at 1.5 million requests per second, they quickly turned the dial all the way up, reaching over 3 million requests per second and 30 Gbps of aggregate bandwidth before maxing out their test resources.

Seems like this limitation has been around for a long time (since 2009). Curious to know how everyone has been using ELB. To me, ELB seems to be an unfinished product. It must be painful to first predict heavy load on your application and then notify AWS well in advance.

It's almost like AWS released what they thought was the best product at the time, regretted some of their decisions and then launched other product(s) to replace it....

In this case, it seems like the one-size-fits-all ELB has been replaced by ALB for those using containers, who want L7 LB, and don't need insanity-scale. NLB for those who want massive scale, a dumb pipe and/or need consistent IPs. They could have tried to build these features into ELB but they didn't, they deliberately created new nomenclature to get rid of baggage.

Also see: SimpleDB -> DynamoDB. EC2 Classic -> VPCs.

good > perfect is somewhat standard thinking in AWS.

How does failover across zones work?

The blog post says there's one static ip per zone. I suppose www.mydomain should have multiple A records each pointing to an elastic ip in a zone. What happens when one zone entirely fails? Does it need a DNS change at this point? Or does the NLB have a different IP with which it can do BGP failover?

AWS provides you with a number of DNS records for each NLB:

- One record per zone (which maps to the EIP for that zone)
- A top-level record that includes all active zones (these are all zones you have registered targets in, IIRC)

The latter record is health checked, so if an AZ goes down, it'll stop advertising it automatically (there will be latency of course, so you'll have some clients connecting to a dead IP, but if we're talking unplanned AZ failure, that's sort of expected).

That said, this does mean you probably shouldn't advertise the IPs directly if you can avoid it, yes.

(disclaimer: we evaluated NLB during their beta, so some of this information might be slightly outdated / inaccurate)

Won't DNS failover be painfully slow? Some clients ignore small TTL values. I've seen DNS updates taking several hours to propagate.

I thought one of the advantages of multiple zones is that zonal failover can happen with "zero" downtime (this seems to be the case with Amazon RDS).

The default answer includes multiple A records, so if clients can't reach one of the IPs, they try another. There's no need for anything to propagate for that to kick in, it's just ordinary client retry behavior.

We do also withdraw an IP from DNS if it fails; when we measure it, we see that over 99% of clients and resolvers do honor TTLs and the change is effected very quickly. We've been using this same process for www.amazon.com for a long time.

Contrast to an alternative like BGP anycast, where it can take minutes for an update to propagate as BGP peers share it with each other in sequence.
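
The "ordinary client retry behavior" over multiple A records can be sketched as follows. The helper name and the injectable `connect` callable are assumptions made so the example is self-contained; a real client would typically get the address list from `socket.getaddrinfo` and connect with `socket.create_connection`.

```python
def connect_with_fallback(addresses, connect):
    """Try each address in order; return (address, connection) for the first success."""
    last_error = None
    for addr in addresses:
        try:
            return addr, connect(addr)
        except OSError as exc:
            last_error = exc  # this IP is unreachable; fall through to the next one
    raise ConnectionError(f"all addresses failed: {last_error}")

def fake_connect(addr):
    # Stand-in for a real socket connect: pretend the first zone's IP is down.
    if addr == "192.0.2.1":
        raise OSError("connection timed out")
    return f"connected to {addr}"

print(connect_with_fallback(["192.0.2.1", "192.0.2.2"], fake_connect))
```

No DNS change has to propagate for this to work; the client already holds every zone's IP from the original answer.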

RDS failover still uses DNS and you still need to be aware of client TTLs:

"Because the underlying IP address of a DB instance can change after a failover, caching the DNS data for an extended time can lead to connection failures if your application tries to connect to an IP address that no longer is in service."


I assume they intend for you to use Route53 on top of this. You could use a combination of geolocation routing and failovers to set it up so that by default people are routed to their nearest region, but if that region is currently offline send them somewhere else instead.

This is a job for a CDN

I have just finished setting up a new front-end for a few services (we are just about to start migrating production systems to it).

I was aiming to use static IPs (for client firewall rules), and simplify networking configuration, so what I ended up with is an auto-scaling group of HAProxy systems that run a script every couple of minutes to assign themselves an elastic IP from a provided list. Route 53 is configured with health checks to only return the IP(s) that are working.

The HAProxy instances also continuously read their target auto-scaling groups to update backend config, and do ssl terminating, also running the Let's Encrypt client. Most services are routed by host name, but a couple older ones are path-based and there are some 301 redirects.

I think NLB could replace the elastic IP and route53 part of this setup, but I'd still need to do SSL, routing, and backends. It's too bad, because my setup is one that could be used nearly anywhere that has more than one public-facing service, but there's not much built-in to help - I had to write quite a few scripts to get everything I needed.

Have you tried/evaluated traefik? It sounds like it could do nearly everything you just mentioned.

Awesome! I have the perfect project for this haha. Does it still work with ECS integration?

Yes it does ( target group integration )

No chance that I'll jump into another new load balancer product from Amazon any time soon. ALB has significant deficiencies that AWS don't warn you about, and you only find them at tens of thousands of RPS.

Still waiting on that fix, AWS.

If they won't warn us, could you please warn us? - Fellow ALB user.

Sure. Whenever a "config change" happens on an ALB (note: this includes adding or removing targets in a target group, e.g. via Auto Scaling), the ALB drops all active connections and re-establishes them at once. At high load, this obviously causes significant load spikes on any underlying service.

You can see this happening by looking at the "Active Connection Count" graphs from your ALB, and adding or removing an instance from an ASG.

At 30+GBPS and over 20kRPS, removing one instance can cause absolute chaos.

Wow that sounds awful - but thankfully this isn't typical. I'm going to go digging for a case-id/issue and see what's going on myself (please e-mail the case if you have one). Re-configurations are routine and graceful.

From your description it may be that you have long lived connections that build up over time, at a rate that targets can easily handle, but that the re-connect spikes associated with a target failure/withdrawal are too intense. This is a challenge I've seen with web sockets: imagine building up 100,000 mostly-idle web sockets slowly over time, even a modest pair of backends can handle this. But then a backend fails, and 50,000 connections come storming in at once!

Another scenario is adding an "idle" target to a busy workload, but it not being able to handle the increased rate of new connections it will get. Software that relies on caching (including things like internal object caches) often can handle a slow ramp-up, but not a sudden rush.

We're currently experimenting with algorithms that allow customers to more slowly ramp-up the incoming rate of connections in these kinds of scenarios.

Anyway, those are guesses, so I may be wrong about your case, but hopefully the information is still useful to others reading.
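
One way to picture the "more slowly ramp-up" idea for a newly added target is a weight that grows over a warm-up window. This is a hypothetical sketch: the linear shape, the 300 second window, and the 10% floor are all assumptions, and the comment above does not say which algorithms are being tested.

```python
def ramp_weight(seconds_since_added, warmup_seconds=300, floor=0.1):
    """Share of its full weight that a newly added target receives (linear ramp)."""
    if seconds_since_added >= warmup_seconds:
        return 1.0
    progress = max(seconds_since_added, 0) / warmup_seconds
    return floor + (1.0 - floor) * progress

for t in (0, 75, 150, 300):
    print(t, round(ramp_weight(t), 3))
```

A target with a cold cache would see roughly a tenth of its eventual share of new connections at first, reaching the full share after the warm-up window.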

i'm starting to realize real world testing is difficult on aws bc "on-demand" means we can push maximums

This is perfect. Need something like this to load balance A records for dynamic domains off apex. We can most likely use the static IP address perfectly for this.

Also did I read it wrong or this is actually cheaper than ALB?

Awesome! Can't wait to dig in more!

Yes, NLB is priced the same as ALB hourly but 25% cheaper on bandwidth (LCUs).

I don't understand the pricing model. 800 new connections per hour, for $0.006? Isn't that extremely expensive? 80,000 connections for $0.60 in an hour is $432 per month for not a whole lot of traffic.

edit: Okay, it's 800 new connections per second, per the ELB pricing page, under "LCU details". The cost for 80k connections in an hour is effectively constrained by the bandwidth, eg if there's very low bandwidth it's $0.006/hour or $4.32/month.
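
The arithmetic in the two comments above, worked through as a sketch. The $0.006 per LCU-hour price and the 720-hour month are taken from the figures quoted in the comments; `monthly_cost` is a hypothetical helper.

```python
PRICE_PER_LCU_HOUR = 0.006  # USD, figure quoted in the comment above
HOURS_PER_MONTH = 24 * 30   # 720, the month length implied by $0.006 -> $4.32

def monthly_cost(lcus):
    """Cost of sustaining `lcus` LCUs for every hour of a 30-day month."""
    return lcus * PRICE_PER_LCU_HOUR * HOURS_PER_MONTH

# Misreading: 800 new connections per HOUR, so 80,000 per hour needs 100 LCUs.
print(round(monthly_cost(80_000 / 800), 2))  # 432.0

# Corrected: 800 per SECOND, so 80,000 per hour is about 22 per second.
print(round((80_000 / 3600) / 800, 3))       # 0.028 LCU on the connections dimension

# The $4.32 figure corresponds to one full LCU for the whole month.
print(round(monthly_cost(1), 2))             # 4.32
```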

I was just wondering whether this is something developed purely inside Amazon or whether it is backed by an ADC like NetScaler or F5. Does anyone know any details? I'm assuming that the classic load balancer is some third-party or old framework and that this is something Amazon developed internally.

I would assume that it's something developed internally at Amazon. Networking inside of AWS isn't standard fare and I doubt something like NetScaler or F5 products would be able to be used. Generally speaking, they aren't using TCP/IP behind the curtain, to move packets between nodes. AWS has even created their own routing hardware/software because no other company could do what they need at the scale that they need. See this video for more information: https://www.youtube.com/watch?v=St3SE4LWhKo

AWS & Amazon use a LOT of routers from one of these vendors. Although, they would probably try to avoid baking it into a public-facing product like this.

It's not showing up in GovCloud. This stuff always takes longer for that.

When is http2 coming to ELB ... :/ It should be their #1 priority.

ELB's Application Load Balancer (ALB) supports HTTP/2 termination. More in the original launch announcement: https://aws.amazon.com/blogs/aws/new-aws-application-load-ba... and https://aws.amazon.com/elasticloadbalancing/details/#details

http2 termination is not very useful, most people want ELB -> backend instances.

In case anyone is confused about this sub-thread:

> Application Load Balancers provide native support for HTTP/2 with HTTPS listeners. You can send up to 128 requests in parallel using one HTTP/2 connection. The load balancer converts these to individual HTTP/1.1 requests and distributes them across the healthy targets in the target group using the round robin routing algorithm. Because HTTP/2 uses front-end connections more efficiently, you might notice fewer connections between clients and the load balancer. Note that you can't use the server-push feature of HTTP/2.

( http://docs.aws.amazon.com/elasticloadbalancing/latest/appli... )

Yes it is. For the client, multiplexing multiple requests across a single connection and only needing to negotiate TLS once can have huge benefits, depending on the page/site. You are after all building your app/site for your end users, right?

Feature request : Please allow weighted load balancing i.e. ability to distribute traffic in a user specified ratio (weights) to different sized instances.

For now here's a work-around that I use: create multiple listeners/ports on the larger instances and add each of them as a target. Containers are a great approach here too; load up the bigger instances with more containers and register each container as a target.
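
A sketch of why registering a larger instance several times approximates weighting. The instance names, the base port, and the `expand_targets` helper are hypothetical, and plain round robin is a simplifying assumption; with hash-based target selection the split would only be approximately 3:1.

```python
from collections import Counter
from itertools import cycle, islice

def expand_targets(weights, base_port=8000):
    """Register each instance on `weight` ports so it appears that many times."""
    return [(instance, base_port + i)
            for instance, weight in weights.items()
            for i in range(weight)]

targets = expand_targets({"large-instance": 3, "small-instance": 1})
# Round robin over the expanded list: 3 of every 4 connections hit the large instance.
served = Counter(instance for instance, _ in islice(cycle(targets), 400))
print(served)
```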

Better: distribute to the instance with the lowest % CPU by default, ala the Google Cloud NLB.

This is what ELB Classic did for a long time - but we're experimenting with new algorithms. The problem with this approach is that it's not cache friendly.

When you bring in a new server to a busy workload, it gets all of the new connections. That can put a lot of pressure on that host - and if the application relies on caching, which most do in some form, it can really make performance terrible and even trigger cascading failures.

Another problem is that if a single host is broken or misconfigured, throwing 500 errors, it is often also the fastest and lowest CPU box, because that isn't very expensive. It can suck in and blackhole all of the traffic.

Based on how these issues work out at scale, we've moved beyond simple load based load balancing (I know that sounds counter-intuitive) and into algorithms that try to achieve a better balance for a wider range of scenarios.
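
The blackhole failure mode described above can be sketched with a toy simulation. The host names, CPU numbers, and per-request costs are made up for illustration; the only point is that a host returning cheap errors never stops looking like the least-loaded one.

```python
def pick_least_cpu(cpu_by_host):
    """Least-load selection: send the next request to the host with the lowest CPU."""
    return min(cpu_by_host, key=cpu_by_host.get)

cpu = {"healthy-1": 40.0, "healthy-2": 45.0, "broken": 1.0}
COST = {"healthy-1": 0.5, "healthy-2": 0.5, "broken": 0.0}  # CPU added per request

hits = {host: 0 for host in cpu}
for _ in range(100):
    host = pick_least_cpu(cpu)
    hits[host] += 1
    cpu[host] += COST[host]  # a fast 500 response costs the broken host almost nothing

print(hits)  # every request went to the broken host
```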

That's very hard to get right. If you don't have exactly the right cool downs and back offs, you get oscillating load.

I was so excited before I realized this wouldn't terminate http(s) traffic. An IP anycast based load balancer ALB would be nice.

I wonder what they wrote it in. I'd guess C++, Java or Erlang, or a combination of those.

Chances are it's IPVS[1], maybe with some patching.

Why would one reinvent the wheel when Linux already lets you do that.

In that case it would be C since it's implemented in the Linux kernel.

[1] https://en.wikipedia.org/wiki/IP_Virtual_Server

I should have read the blog instead of skimming before responding.

This appears as Layer 4 load balancing, IPVS is more of Layer 3.

So the person who was voted down by mentioning HAProxy might not be too far off. It could be implemented through HAProxy + TPROXY enabled in the kernel[1]. Then just make sure that default gateway configured on targets routes back to the load balancer or it is the load balancer.

[1] https://www.loadbalancer.org/blog/configure-haproxy-with-tpr...

>"This appears as Layer 4 load balancing, IPVS is more of Layer 3."

No, IPVS is L4 load balancing. L3 load balancing would be a routing protocol plus ECMP.

Ah my bad, even the wiki link I posted said it is layer 4. I guess both solutions could be utilized then, perhaps it was IPVS then since it would be more performant than HAProxy.

Most likely C/C++; Erlang and Java are not fast enough for this kind of workload.

It preserves source ip, so that suggests something like a kernel module, or like Intel's DPDK. Either (practically) rules out anything but C, doesn't it?

Or they could just encapsulate the packet to preserve the IP...

Happy to be educated. There's some easy way to alter the destination IP that way?

As mentioned above, IPVS: http://www.linuxvirtualserver.org/VS-IPTunneling.html

This is heavily used by Facebook for the load balancers on their racks.

There was a talk about this in SREcon Europe 2015: https://www.usenix.org/conference/srecon15europe/program/pre...

Ah, ok. A kernel module, as I mentioned :)

I guess HAProxy

Maybe VHDL if it's running on f1 instances?

This will motivate me to get everything into VPCs, which I should have done a while ago.

so would a websocket-based application be better off using NLB?

Better off depends on what your workload goals are. If you want path or host name based routing, Application load balancer may be a better fit as it natively supports WebSockets. If your goal is long lived sessions (weeks and months, not minutes and hours), Network load balancer is probably a better fit.

ALBs are great for WS and can term SSL for WSS. Just use "HTTP" and "HTTPS" protos on the target groups and it'll work. It's a bit confusing but it works.

any idea if you can use this behind API gateway if it is not itself public?

yes of course. The NLB is public and API GW sits on the edge locations

Does it support ipv6?

nice. Especially the ip thing.
