Nginx with dynamic upstreams (tenzer.dk)
102 points by Tenzer on July 26, 2015 | 54 comments

I spent a long time fighting this battle with nginx. I eventually came to the conclusion that while it was technically possible, it just wasn't worth the trouble (and yes, I tried various alternatives such as Tengine and OpenResty).

What I settled on was the following:

Nginx for SSL termination in front of an haproxy load balancer. I wrote a simple Python script (92 lines of code) that spits out a new haproxy configuration and gracefully reloads haproxy whenever the DNS changes. That has (after fixing a bug or two in my code) been FAR more reliable than any other solution to this problem we've tried.
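The commenter's actual script isn't shown; a minimal sketch of the approach (all names, the config template, and the pid-file path are hypothetical) would be: re-resolve the DNS, render a fresh haproxy backend section, and only rewrite the config and gracefully reload when something changed.

```python
import socket
import subprocess

TEMPLATE = """\
backend app
    mode http
{servers}"""


def resolve(hostname):
    # Fresh DNS lookup; returns the sorted set of A records.
    return sorted(set(socket.gethostbyname_ex(hostname)[2]))


def render_config(addresses, port=80):
    # Emit one "server" line per resolved address.
    servers = "\n".join(
        "    server srv{0} {1}:{2} check".format(i, addr, port)
        for i, addr in enumerate(addresses)
    )
    return TEMPLATE.format(servers=servers)


def graceful_reload(config_path, pid_file="/run/haproxy.pid"):
    cmd = ["haproxy", "-f", config_path, "-p", pid_file]
    try:
        with open(pid_file) as f:
            # "-sf" tells the new process to take over from the old PIDs
            # once existing connections have drained.
            cmd += ["-sf"] + f.read().split()
    except FileNotFoundError:
        pass
    subprocess.check_call(cmd)


def reload_if_changed(hostname, config_path="/etc/haproxy/haproxy.cfg"):
    config = render_config(resolve(hostname))
    try:
        with open(config_path) as f:
            if f.read() == config:
                return False  # DNS unchanged; nothing to do
    except FileNotFoundError:
        pass
    with open(config_path, "w") as f:
        f.write(config)
    graceful_reload(config_path)
    return True
```

Run from cron (or a loop) every few seconds; the no-change path is a cheap DNS query plus a file read.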

Bonus: haproxy is a superior load balancer with better runtime metrics and health checking options than nginx.

What part of the battle couldn't you solve? Doesn't the author's setting of the resolver directive work for you, or is there something I'm missing?

Also, the author doesn't mention it, but you can also set a custom cache timeout with the valid=Xs option.
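For reference, the combination looks something like this (the resolver address and ELB hostname are placeholders; assigning the hostname to a variable is what forces nginx to re-resolve it at request time instead of once at startup):

```nginx
location / {
    # Cache answers for 30s regardless of the record's advertised TTL.
    resolver 172.16.0.23 valid=30s;

    # Using a variable in proxy_pass defers resolution to request time.
    set $backend "http://example-elb.eu-west-1.elb.amazonaws.com";
    proxy_pass $backend;
}
```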


It was not that reliable for us. We still had unexplained occurrences of IP caching, and it's easy to fat-finger the configuration and lose the resolving functionality. Anyway, as I mentioned, haproxy has vastly superior load balancing capabilities. Nginx still has a lot of work to do to match haproxy in this area.

Any reason you don't use HAProxy for SSL termination too?

There are a lot of reasons. Simpler configuration, mutual SSL, URL rewriting and the like. That stuff would really suck if we only used haproxy.

HAProxy 1.5 does a pretty great job with that stuff now. I agree that the configuration syntax can be difficult but it is very powerful.

There are some limitations of course, and I can see why you might want to use nginx for some routing. One of the more bizarre tricks I've had to use for more complex redirects (like non-www to www while properly injecting HSTS headers) involved sending the request to a backend that sent to a single frontend via a local port [1]. Hopefully that kludge will be fixed in 1.6.

1. http://blog.haproxy.com/2015/06/09/haproxy-and-http-strict-t...

I know it's capable, but you have to understand we are working in a team environment where the expertise is primarily with nginx. It's also quite a lot easier to introduce nginx to a dev who has never used it before. Not every decision can or should be made solely based on technical capabilities.

Bizarre tricks are not something I am a fan of deploying to production and using as the foundation for our service. That is hardly a compelling argument. ;)

did you look at Kong? https://github.com/mashape/kong

I've been working with nginx quite a bit recently for use as a reverse proxy in a containerized environment and have become quite disillusioned with it.

For some reason it's the most popular reverse proxy for this sort of stuff; however, it's not particularly well suited for it. We have this issue, and we also lack active upstream health checks and any sort of upstream status reporting. Both are hidden behind the nginx Plus paywall, which is absurdly expensive in a micro-architecture world. There are various patches (patches!?) you can apply, but the documentation isn't fantastic, and it's an extra bit of hassle we shouldn't have to deal with.

I think the open internet deserves a better reverse proxy TBH.

Have you looked into HAProxy? Health checks, SSL termination and a reasonable configuration DSL: https://www.digitalocean.com/community/tutorials/an-introduc...

I am acutely aware of HAProxy; however, the rest of the team is, no surprise I'm sure, familiar with nginx. Likely I'll end up doing the usual nginx->haproxy->backend chain again. HAProxy appears to have an enterprise offering now (it may have had one forever, I just hadn't noticed), but it appears to be much more focused on value-add support and stack validation than on hiding simple-to-implement features behind a paywall and going open-core. This is more a frustration with nginx, which could be so much more for us if:

* They were not purposely holding back features so they can be pay-walled

* Re-compiling were not necessary to add functionality

* The module ecosystem were more cohesive

I know I said "reverse proxy", at which HAProxy is clearly the superior option, but I didn't mean to limit my statement to reverse proxies. nginx does have a lot of other functionality people rely on.

Dynamic modules are on the road map, according to this article: http://www.infoworld.com/article/2951849/web-services/nginx-....

Here's an implementation that uses nginx's X-accel-* headers to perform dynamic routing:


These are what the routing rules look like:


Am I wrong for thinking that failure to honor the TTL in the first place is a bug?

It's worse than a bug, it's a Really Bad Design (tm). Bugs are usually unintentional, but this is a deliberate reinvention of a wheel (IP address caching) that doesn't need to be reinvented.

But this is not just nginx's fault. It's also Bad Design on the part of AWS because switching IP addresses this way means you can't keep a TCP socket open to an ELB machine for more than 60 seconds at a time because you never know when the routing rug is going to be yanked out from under you. This makes ELB useless for anything involving a persistent connection. No websockets for you!

It's not quite as bad as that. AWS's short TTL means that they can change the IP whenever they like, not that they will. I would imagine that most of the time, long-lived connections to an ELB will be fine.

If the ELB does get a new address, then yes, the connection will fail, the client will have to reconnect, and when it does, it will need to do a fresh address lookup. But since the connection is over a network, failure is a possibility regardless of what AWS does, and so clients need to be able to detect failure and reconnect anyway.

The problem is not the frequency with which it happens. The problem is that the way you get notified that it has happened is that IP packets suddenly start to get delivered to the wrong machine. If you're lucky, the net result will simply be a dropped TCP connection. If you're not lucky, pretty much arbitrarily bad things can happen.

Now, it is true that IP is not reliable, and so arbitrarily bad things can happen at any time and you do have to be prepared for those. The problem here is that there is no notification. Potentially bad things are happening here not because something has gone wrong, but by design. That, IMHO, is the very definition of Bad Design.

Normally, when you make a DNS change, you control the DNS and the affected end points. That way you know when the change is happening, and you can give yourself as much time as you need to make the transition in an orderly way. With ELB, the only guarantee you have is the TTL. With a 60 second TTL, that means you have at most 60 seconds to do the transition, and even that is only if you notice it when it happens (and the only way to guarantee that is to poll the name server constantly).

I like to think of the problem as an inherent issue with cache invalidation in distributed systems that has no straightforward solution in an Internet architecture. DNS problems greatly resemble what it's like to have dangling pointers, for another analogy.

I think the expected solution approach is a little like validating 2FA logins. To account for system variability, you accept that transactions can take up to a certain time and you allow multiple values (the similarities probably end there, admittedly). With a more advanced solution you would even advise clients to invalidate their own cache too (similar to load shedding by advising / redirecting clients - you can see this in 2FA logins sometimes where you're asked for the sequence after the once-valid key just entered). So I think you'd need to accept multiple prior CNAME resolutions to account for longer lived transactions and make sure each entry change will be valid only for so long. Being able to be notified of these changes programmatically would be really nice though specifically for AWS. Perhaps AWS Lambda or SNS could be leveraged for pushing AWS DNS state change notifications to your system?
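That accept-multiple-prior-resolutions idea can be sketched as a small cache that keeps superseded DNS answers valid for a grace window, so in-flight transactions against the old address still pass validation (all names and the 300-second window below are made up for illustration):

```python
import time


class GracefulDnsCache:
    """Keep superseded DNS answers around for a grace period, so
    longer-lived transactions that started against the old address
    are still treated as valid after the CNAME moves."""

    def __init__(self, grace=300, clock=time.monotonic):
        self.grace = grace
        self.clock = clock
        self.entries = {}  # address -> time it was last seen in DNS

    def update(self, addresses):
        now = self.clock()
        for addr in addresses:
            self.entries[addr] = now
        # Drop addresses not seen in DNS within the grace period.
        self.entries = {a: t for a, t in self.entries.items()
                        if now - t <= self.grace}

    def is_valid(self, address):
        t = self.entries.get(address)
        return t is not None and self.clock() - t <= self.grace
```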

One approach I've seen some folks take is to simply reload their nginx configurations across their backend nodes to refresh the cached entry. It probably becomes intractable, without sacrificing a lot of theoretical availability, in the degenerate case of reloading upon each request when your CNAME changes several times a second. Some of my colleagues have experienced problems even with the nginx proxy_pass directive, which most people say is the recommended free solution and which is also recommended in the article.

Regarding how the ELB works, I think the AWS engineers have a different idea of how to implement things. AWS is very much a dynamic platform, and the engineers seem to have embraced this when they came up with this solution.

It doesn't necessarily mean you can't use websockets through an ELB, it just means that you would need to be able to handle reconnects, but that shouldn't be a new challenge for any system relying on connections being open for long. Also, the load balancer servers don't switch every 60 seconds; you can have connections running for a lot longer than that. I would also assume the load balancers keep handling connections for a while after they are taken out of the DNS rotation, in order to make sure DNS caches are updated before the IP addresses stop working.

AWS support has said that ELBs will continue to accept connections (and use the correct backend) for at least an hour after the CNAME stops resolving to a particular IP.

Ah, I didn't know that. That makes a big difference.

But there's still a significant hole here: is there any way to get notified that this has happened other than polling the DNS?

> you would need to be able to handle reconnects, but that shouldn't be a new challenge for any system relying on connections being open for long

That's true, but it kind of misses the point. Normally, if a connection is dropped it means something has gone wrong. In this case, connections are dropped by design, and there is no way (AFAICT) to work around this. Designing so that behavior that is otherwise the result of things going wrong is now the normal designed-for behavior is, IMHO, the very definition of Bad Design.

I'd argue that this design promotes building reliable applications. A system that cannot reconnect is fragile, and the best way to know if the system can handle that failure is to occasionally induce the event. Assuming that you are running across a lossless, ideal network is, IMHO, the very definition of Bad Design.

"If it hurts, do it more often." -- Martin Fowler

Certainly systems should be designed to be robust against failures. But encouraging this by deliberately producing failures in production seems like a bad idea to me. It's kind of like saying, "Let's see if the new hull design works by deliberately steering the boat into an iceberg!"

A TCP socket teardown followed by a reconnect is hardly the equivalent of ramming a floating chunk of ice. There are a bunch of reasons you will see that teardown in practice, like NAT timeouts in a home router, or carrier-grade 6to4 NAT, or mobile devices rehoming to a new tower, or anywhere else that state is tied to the path.

Sure this is a deliberately produced failure, but only in the sense that this is a "normal" failure. This is a condition that is to be expected on the internet, and this is simply an additional place it occurs.

Bad analogy. It's like saying "let's see if the new hull design works by deliberately running it into things in a test laboratory setting". Because, y'know, if you deploy an application to production using a particular network configuration (that is, using an ELB) without testing it in a development/staging environment first, you're doing a poor job.

This disconnect behavior is just a property of the system. Either you design your application to handle it, or you use a different system. (Not that you can get away with not handling disconnects even without ELBs.)

My analysis shows an AWS ELB changes IP addresses on us roughly every two weeks. Often enough to cause problems if you aren't prepared, but infrequently enough to give you a false confidence that things are working as designed.

You can hold a TCP connection open through an ELB basically as long as you want. The default idle timeout is 60s but can be increased to 1 hour; this is a non-factor if you are sending any sort of data, though.

When the "routing rug" is pulled out from under you all you need to do is re-resolve and re-establish the TCP connection which will likely live on for days (in most cases weeks) without disconnecting again.

This is fine for most use cases I am aware of.
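The re-resolve-and-reconnect step is simple to sketch (the helper below is hypothetical; the injectable connect parameter exists only for testability). The key point is that each retry passes the hostname, not a cached address, so every attempt picks up whatever DNS currently says:

```python
import socket
import time


def connect_with_retry(hostname, port, attempts=5, delay=1.0,
                       connect=socket.create_connection):
    # create_connection takes the hostname and resolves it itself, so
    # each retry re-resolves rather than reusing a stale address.
    last_error = None
    for attempt in range(attempts):
        try:
            return connect((hostname, port))
        except OSError as exc:
            last_error = exc
            time.sleep(delay * (2 ** attempt))  # simple exponential backoff
    raise last_error
```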

As for Websockets, you will need to run the ELB in TCP mode to do that, and probably run a real HTTP proxy behind it that supports Websockets/Upgrade, uses consistent source-IP hashing, and supports the TCP PROXY protocol, i.e. HAProxy. You can run HAProxy or any other proxy that matches the above behind an ELB to get a good, highly available Websockets proxy layer.

You can also use Nginx for the PROXY protocol, that was added in version 1.5.12. You should just add "proxy_protocol" to your "listen" directive: http://nginx.org/en/docs/http/ngx_http_core_module.html#list....
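For example (the trusted address range is a placeholder, and the real_ip lines are an addition on top of the linked directive, assuming the stock realip module is compiled in):

```nginx
server {
    # Expect the PROXY protocol header that an ELB in TCP mode
    # (with proxy-protocol enabled) prepends to each connection.
    listen 80 proxy_protocol;

    # Recover the original client address for logs and upstreams.
    real_ip_header proxy_protocol;
    set_real_ip_from 10.0.0.0/8;   # trust the ELB's internal addresses

    location / {
        proxy_set_header X-Real-IP $proxy_protocol_addr;
        proxy_pass http://backend;
    }
}
```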

Cool, does it allow you to control the balancing to enforce source-ip -> backend mapping?

This is required due to the nature of Websockets UPGRADE and most semi-stateful Websockets servers.

Sure, that's not really related to the protocol used. It's instead handled by the upstream module and the "ip_hash" flag: http://nginx.org/en/docs/http/ngx_http_upstream_module.html#....
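A minimal sketch of such an upstream block (backend addresses hypothetical), including the headers needed to proxy the Websocket Upgrade handshake:

```nginx
upstream websocket_backends {
    ip_hash;                 # pin each client IP to the same backend
    server 10.0.1.10:8080;
    server 10.0.1.11:8080;
}

server {
    listen 80;

    location / {
        proxy_pass http://websocket_backends;
        # Required to pass the Websocket UPGRADE handshake through.
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```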

I find it particularly grating that the official way to deal with Nginx ignoring the advertised TTL is to pay $1500 per year, per server, for an Nginx Plus license.

Yes. Some of the nginx plus features are very small, and there's even 3rd party modules that add such features. This particular case seems like an outright bug, with a hack workaround. Would they take a patch to fix it?

It makes me think that they must be intentionally omitting features in order to make plus valuable. That seems rather cheesy. Obviously it's their code and right, but I think it shows how hard it is to profit off an open source program.

I think the reasoning for it is for performance. If you can make all the DNS queries you need before you start serving any requests, then you don't have to wait for DNS servers while clients hammer your server.

Ideally I would have liked it to be an option though, instead of it basically having become a feature in Nginx Plus - if it wasn't for the workaround I described in the post.

I think there's room for interpretation but that in cases like this, it is indeed a bug.

Many applications are programmed this way, but I firmly believe that an application needs to honor the TTL of a DNS request for any subsequent connections, for exactly this reason. DNS records change. Sometimes frequently. IMO you shouldn't need to kick your apps to get them to pick up the new address for subsequent connections.

It should just schedule a DNS update before the previous one has expired and, if the addresses have changed, shift over, so it should not cause any delays.

I would expect that to be the way the Nginx Plus implementation works, but I have never had a chance to try it out.
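A guess at how such a refresh-ahead resolver might behave (this is a sketch, not the actual Nginx Plus implementation; all names are made up). The cache refreshes itself just before its answer expires, so a lookup never blocks on a DNS query:

```python
import socket
import time


class RefreshingResolver:
    """Cache a hostname's addresses and refresh them just before the
    cached answer expires, so lookups never block a request."""

    def __init__(self, hostname, ttl=60, resolve=None, clock=time.monotonic):
        self.hostname = hostname
        self.ttl = ttl
        self.clock = clock
        self.resolve = resolve or (
            lambda h: sorted(set(socket.gethostbyname_ex(h)[2])))
        self.addresses = self.resolve(hostname)
        self.expires = self.clock() + ttl

    def get(self):
        # Refresh slightly ahead of expiry; if the addresses did not
        # change, callers simply keep getting the same answer.
        if self.clock() >= self.expires - 1:
            self.addresses = self.resolve(self.hostname)
            self.expires = self.clock() + self.ttl
        return self.addresses
```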

That was my first thought as well. It's a bit preposterous that a) there's an "unbreak this software" option at all, and b) you have to pay for the privilege to enable it.

Thank you for this, it's a decent start for me personally. I was previously using the DNS director in Varnish 3.x to do this, which was removed in 4.x. The longer time goes on, the riskier it is to be running older software, so this has been a great help.

Oddly enough I'd never really considered nginx for the job despite using it to reverse proxy elsewhere. Sometimes you just need a poke in the right direction :)

Now I just need to figure out a way to specify *.internaldomain so that nginx resolves www.example.com.internaldomain - where www.example.com is grabbed from the requested Host: header

You can use the $host variable

See: http://nginx.org/en/docs/http/ngx_http_core_module.html#var_... And: http://stackoverflow.com/a/15414811

Always be careful with user input though.
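A sketch of what that might look like (the internal suffix and resolver address are placeholders; as noted above, $host is attacker-controlled, so restrict accepted hostnames before trusting it):

```nginx
server {
    listen 80;
    # Only accept Host headers you expect, since $host feeds proxy_pass.
    server_name ~^(?<site>.+)\.example\.com$;

    location / {
        resolver 10.0.0.2;
        # $host comes straight from the client's Host header.
        proxy_pass http://$host.internaldomain;
    }
}
```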

Well damn, that's such a simple solution if that variable works in proxy_pass. One of those "How did I miss that?" solutions.

Of course rigorous testing before going live, but it gives options. I love options :)

It's good to hear it was of use to somebody :)

I know you can use a map in Nginx to do what you ask for, as long as you have a list of domains already: http://nginx.org/en/docs/http/ngx_http_map_module.html#map. I can only imagine it also being possible to make fully dynamic, I just don't have a clear way of doing it in mind right now.
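For reference, a map-based sketch along those lines (domains and resolver address hypothetical):

```nginx
# Maps the incoming Host header to an internal backend hostname;
# must live at the http level, outside any server block.
map $host $backend {
    default            unknown.internaldomain;
    www.example.com    www.example.com.internaldomain;
    api.example.com    api.example.com.internaldomain;
}

server {
    listen 80;

    location / {
        resolver 10.0.0.2;
        proxy_pass http://$backend;
    }
}
```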

Lua + Redis is a nice way to do this, I have a small project https://github.com/spro/simon that does dynamic routing and load balancing based on Hostname -> IP:Port sets in Redis. Adding a new route is as simple as:

  sadd backends:hnapi.dev

Neat! I want to get started on some Lua scripting as well, probably first to just get a bunch of metrics out of Nginx via StatsD. There's access to a whole bunch of numbers via Lua that you otherwise can't get out from the stub_status page.

The sibling comment to this pointed out the $host variable. I imagine something along the lines of proxy_pass http://$host.internaldomain; would do it.

I'll have to look into it further :)

Another way around the rewrite is to simply proxy_pass with the $uri: https://gist.github.com/sansmischevia/cf425d5ffe09f824cb27

Sure, that's also what I cover in the post. The latter part about rewriting is only relevant when you want the service behind an ELB to be located somewhere other than "/".

Can you not assign static IPs to ELB instances? This might be a stupid question...ELB is one of the older AWS resources that I've never really touched since Nginx is so powerful & easy to set up.

No, you can't. You can instead create your own EC2 server, give it a static IP (well, "Elastic IP") and then hope your own server doesn't go down. This is the reason why I prefer to use an ELB, since I have never seen stability issues with it.

The top 3 Google results for "elb static ip" say you can't; the truth may vary.

Kong [1] (nginx + lua) is planning to have this feature for free. This is the issue open for support for dynamic upstream [2].

[1] https://github.com/mashape/kong

[2] https://github.com/Mashape/kong/issues/157

Unfortunately kong still has a hard dependency on Cassandra.


Before the end of the year Kong will support Postgres.

That is fantastic news!
