Poor Man's Global Traffic Manager (gist.github.com)
23 points by Snawoot on Aug 20, 2022 | 12 comments



Please don’t do this.

Authoritative nameserver selection by recursive resolvers is extremely unreliable and implementation-specific. This method may actually be worse than health-checked records with low TTLs, and could leave your site completely unreachable for uncomfortably long periods of time.

See this paper for more details about nameserver selection by common recursive resolvers and so-called SRTT: https://irl.cs.ucla.edu/data/files/papers/res_ns_selection.p...

Also see this presentation: https://youtu.be/z7Jl1sjr9jM

One of the best and most robust solutions for GSLB/GTM is pointing your main entrypoint (e.g. the apex record/www/api) at an anycasted proxy (Cloudflare/Fastly/Google/AWS, etc.), and using that service's active health check features against the static, unicast (and firewalled!) load balancer IPs of your origin.

These services are not expensive, but if you can't afford them, the second-best method would be to round-robin A/AAAA records with low (60s) TTLs over a large pool of ingress load balancer IPs, which is exactly how AWS ALB/ELB/NLB operate!

When load balancing via DNS you'll still contend with misbehaving resolvers and clients (JVM clients, for example), but you'll strand less traffic for less time than you would by withdrawing authoritative nameservers, given how unpredictable and chaotic resolver implementations are.


I did something almost identical and it's working wonders: haproxy in "mode tcp" passes the SSL connection through to the backend without terminating it, routing just by reading the SNI.
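
Roughly, the core of that kind of setup looks like this (a minimal sketch, not my exact config; hostnames, backend names and addresses are placeholders):

  frontend ft_https
      bind :443
      mode tcp
      # buffer the TLS ClientHello long enough to read the SNI
      tcp-request inspect-delay 5s
      tcp-request content accept if { req.ssl_hello_type 1 }
      use_backend bk_site_a if { req.ssl_sni -i a.example.org }
      default_backend bk_default

  backend bk_site_a
      mode tcp
      server origin_a 192.0.2.10:443 check

  backend bk_default
      mode tcp
      server origin_b 192.0.2.20:443 check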

I've used a delegated zone to achieve that. I like your catch-all approach better since it's cleaner; in a future rewrite I might change it.

I am now testing the configuration with the proxy protocol to pass on the client IP, which would otherwise be lost. It's working, but certbot does not like it, so it needs a dedicated backend with a rule.

I've also created an Ansible playbook that does the installation for you and retrieves the haproxy config from a git repo so it's always in sync on the balancer machines: https://blog.gandalfk7.it/posts/20220201_01_diy-balancer-wit...


Nice to know, thanks!

> certbot does not like it, so it needs a dedicated backend with a rule.

What exactly fails in certbot when you use the proxy protocol, by the way? Are you using certbot in standalone mode, making it listen on its own socket? I usually pass ACME paths with Nginx to the certbot workdir and use webroot mode.

Speaking of synchronization, at one job I had configs in an SVN repo which were checked out with a Fabric job - it worked nicely. Afterwards I moved the configs to git and made a small daemon that listens for repo-update webhooks, then checks out the new revision and reloads the load balancers. Sort of GitOps.


It looks like unknowingly we followed a very similar route :D

For the git repo I am not using webhooks; instead I wrote a script that checks if the repo has been updated, downloads the new config file, tests the syntax, and if it's OK copies it into place and reloads haproxy.

I use certbot in standalone mode; I want to review that approach and integrate it with nginx in the near future.

But yes, in standalone mode the proxy protocol does not work with certbot. I think it's due to the HTTP-01 challenge [0], but I have to look into it with more time at hand.

I am now using these lines in the haproxy config to route ACME requests to the correct backend (which just won't have "send-proxy" in the server options):

  acl letsenc_path path_beg /.well-known/acme-challenge/
  acl letsenc_example hdr(host) -m end example.org
  use_backend bk_http_letsenc if letsenc_path letsenc_example
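
For completeness, the matching backends could look roughly like this (again just a sketch; backend names, addresses and ports are placeholders, the point is only that the real servers carry "send-proxy" while the certbot one doesn't):

  backend bk_http_default
      mode http
      server web1 192.0.2.10:8080 check send-proxy

  backend bk_http_letsenc
      mode http
      # no send-proxy here: certbot's standalone listener doesn't speak the PROXY protocol
      server certbot1 192.0.2.10:8402 check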

Thanks for the ideas for improvement!

[0]: https://github.com/cert-manager/cert-manager/issues/466


Doesn't this ignore the fact that a lot of DNS resolvers don't respect the TTL and cache for way longer than they should?


I've run a major CDN and authoritative nameservice. It's small enough to not matter in most cases. A quick SWAG: I'd expect 95% of traffic to respect the TTL. Another 4% will appear to have some sort of minimum TTL on the order of single-digit minutes. The last 1% transitions over the next few hours. There will be a super long tail with measurable, but insignificant, traffic using the previous RDATA effectively forever. I'm sure someone from DNS-OARC has more recent and well-quantified numbers.


That definitely happens, just because on the internet you can find any anomaly you can think of, but it's not widespread. Also, AWS Route 53 relies on fairly low TTLs, so it's a non-issue.


Isn't it usually the opposite, except for client applications that ignore the TTL they got from their recursive resolver because they don't want to bother keeping track of TTLs?


When I ran a small web hosting provider, we had some clients (mostly bots, I think) hitting old IPs for months. We had kept the TTL below 5 minutes for a very long time before migration.

Also, older Java apps would cache DNS lookups forever, until you restarted the JVM. I remember arguing back and forth in email with a "vendor." They finally believed me and restarted their app server.


> Also, older Java apps would cache DNS lookups forever, until you restarted the JVM. I remember arguing back and forth in email with a "vendor." They finally believed me and restarted their app server.

New JVM-based apps do the same thing. I've had the same conversations with a "partner".


I figured, after all these years, the JVM would switch to a sane default for DNS caching. Sadly that is not the case...


That's what I meant.

The resolvers (which those apps talk to) are (AFAIK) generally RFC-conforming and respect TTLs, though they may have lower maximum TTL values, allowing faster refreshes in some cases.

Certain bad applications, however, just don't bother dealing with TTLs at all and assume DNS is static and never changes.



