We generally use 60-second TTLs, and TTLs as low as 10 seconds are very common. There's a lot of myth out there about upstream DNS resolvers not honoring low TTLs, but in our experience they are honored very reliably. We actually see faster convergence times with DNS failover than with BGP/IP Anycast. That's probably because DNS TTLs count down concurrently on every resolver that has the record cached, whereas BGP advertisements have to propagate serially, network by network. The way DNS failover works is that the health checks are integrated directly with the Route 53 name servers. In fact, every name server checks the latest health status every single time it gets a query. Those statuses are basically a bitset, being updated /all/ of the time. The system doesn't "care" or "know" how many health statuses change each time; it's not delta-based. That's made it very, very reliable over the years. We use it ourselves for everything.
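A minimal sketch of the "read the full health bitset on every query" idea, in Python. This is not Route 53's implementation: the class and function names are hypothetical, and the lock merely stands in for however a real name server publishes new snapshots. What it shows is that the checker overwrites the whole bitset each round and the answer path just reads the current bits, so there is no delta to lose or apply twice.

```python
import threading


class HealthBitset:
    """Hypothetical health store: bit i set means endpoint i is healthy."""

    def __init__(self, n: int) -> None:
        self._n = n
        self._bits = (1 << n) - 1  # start with every endpoint healthy
        self._lock = threading.Lock()

    def publish(self, healthy: list[bool]) -> None:
        # The checker publishes a complete snapshot each round, not a diff,
        # so a missed or repeated update cannot leave the state inconsistent.
        if len(healthy) != self._n:
            raise ValueError("snapshot size mismatch")
        bits = 0
        for i, ok in enumerate(healthy):
            if ok:
                bits |= 1 << i
        with self._lock:
            self._bits = bits

    def is_healthy(self, i: int) -> bool:
        with self._lock:
            return bool((self._bits >> i) & 1)


def answer_failover(health: HealthBitset, primary: str, secondary: str) -> str:
    # Evaluated on every query; bit 0 is (by assumption) the primary's check.
    return primary if health.is_healthy(0) else secondary
```

For example, with `h = HealthBitset(2)` the answer is the primary; after `h.publish([False, True])` the very next query gets the secondary, with no extra bookkeeping about what changed.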
Of course, the downside of low TTLs is more queries, and we charge by the query unless you ALIAS to an ELB, S3, or CloudFront (then the cost of the queries is on us).
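Rough arithmetic on "more queries": a caching resolver that honours the TTL and sees continuous demand for a name re-fetches it at most once per TTL. A small Python sketch of that upper bound per resolver, per record (it says nothing about pricing, and real resolvers with idle periods will send fewer):

```python
import math


def max_refreshes_per_day(ttl_seconds: int) -> int:
    """Upper bound on upstream queries one caching resolver sends for one name.

    Assumes the resolver honours the TTL exactly and has continuous demand,
    so it re-fetches the record once every ttl_seconds.
    """
    if ttl_seconds <= 0:
        raise ValueError("TTL must be positive")
    return math.ceil(86_400 / ttl_seconds)  # 86,400 seconds in a day
```

So going from a 300-second TTL (at most 288 refreshes a day) to 60 seconds (1,440) or 10 seconds (8,640) multiplies the authoritative query volume by the same factor, summed over every resolver that caches the name.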
- Route 53 failover record
* primary record: Google global load balancer IP
* secondary record: Route 53 Geolocation set (really need that latency)
- Elastic Load Balancer record per region
* routes to the mirror region's GCP IP address (ELB's Application Load Balancer seems to be able to point to IPs outside AWS)
* optionally spin up mirror infrastructure in AWS
Seems brittle. Does Azure support global load balancing with external IPs?
Does anyone have such (or similar) setup actually in production? How did it work today?
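The failover part of the setup in the list above could be written as a Route 53 `ChangeBatch`. A minimal Python sketch, assuming a hypothetical record name, hosted zone ID, health check ID and geolocation record name (the `SetIdentifier` strings are made up too); the field names follow the Route 53 ChangeResourceRecordSets API, but this has not been tried against a live zone.

```python
def failover_change_batch(
    name: str,
    primary_ip: str,
    primary_health_check_id: str,
    hosted_zone_id: str,
    geo_record_name: str,
    ttl: int = 60,
) -> dict:
    """Build a PRIMARY/SECONDARY failover pair (all identifiers hypothetical)."""
    primary = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": "primary-gcp",  # hypothetical label
        "Failover": "PRIMARY",
        "TTL": ttl,
        "ResourceRecords": [{"Value": primary_ip}],  # GCP global LB IP
        # Route 53 answers with the secondary while this check is unhealthy.
        "HealthCheckId": primary_health_check_id,
    }
    secondary = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": "secondary-aws-geo",  # hypothetical label
        "Failover": "SECONDARY",
        # Alias into the geolocation record set in the same hosted zone;
        # alias records carry no TTL of their own.
        "AliasTarget": {
            "HostedZoneId": hosted_zone_id,
            "DNSName": geo_record_name,
            "EvaluateTargetHealth": True,
        },
    }
    return {
        "Comment": "GCP global LB primary, AWS geolocation secondary",
        "Changes": [
            {"Action": "UPSERT", "ResourceRecordSet": primary},
            {"Action": "UPSERT", "ResourceRecordSet": secondary},
        ],
    }
```

The returned dict would be passed as `ChangeBatch` to boto3's Route 53 client, i.e. `change_resource_record_sets(HostedZoneId=..., ChangeBatch=batch)`. The geolocation records that the secondary aliases to would still need to exist separately.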
IP addresses as Targets
You can load balance any application hosted in AWS or on-premises using IP addresses of the application backends as targets. This allows load balancing to an application backend hosted on any IP address and any interface on an instance. You can also use IP addresses as targets to load balance applications hosted in on-premises locations (over a Direct Connect or VPN connection), peered VPCs and EC2-Classic (using ClassicLink). The ability to load balance across AWS and on-prem resources helps you migrate-to-cloud, burst-to-cloud or failover-to-cloud.
Looks like you need an active VPN connection to access external IPs.
"The IP addresses that you register must be from the subnets of the VPC for the target group, the RFC 1918 range (10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16), and the RFC 6598 range (100.64.0.0/10). You cannot register publicly routable IP addresses."
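The quoted rule is easy to check mechanically, which makes the conclusion above concrete: a public GCP address is not in any allowed range, so it can only be reached through a private address over a VPN or Direct Connect. A small Python sketch using the standard `ipaddress` module; `can_register_ip_target` is a hypothetical helper, and it checks only the CIDR rule quoted, not whether a route to the address actually exists.

```python
import ipaddress

# Ranges from the quoted documentation, in addition to the VPC's own subnets.
ALLOWED_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),      # RFC 1918
    ipaddress.ip_network("172.16.0.0/12"),   # RFC 1918
    ipaddress.ip_network("192.168.0.0/16"),  # RFC 1918
    ipaddress.ip_network("100.64.0.0/10"),   # RFC 6598
]


def can_register_ip_target(ip: str, vpc_cidrs: tuple[str, ...] = ()) -> bool:
    """True if ip falls inside an allowed range or one of the given VPC CIDRs."""
    addr = ipaddress.ip_address(ip)
    networks = ALLOWED_RANGES + [ipaddress.ip_network(c) for c in vpc_cidrs]
    return any(addr in net for net in networks)
```

For instance, `10.1.2.3` and `100.64.0.1` pass, while `8.8.8.8` does not; note that `172.32.0.1` also fails, since `172.16.0.0/12` ends at `172.31.255.255`.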
I was diagnosing a networking issue from one of our service providers last Friday. For whatever indeterminate reason, DNS responses from R53 took upwards of 10-15 seconds to return. While I appreciate that the non-configurable default TTL of 60 seconds for ELB is not plucked out of thin air, and that the actual issue seemed to be on the service provider's side, the lower limit seems far too low for medium/high-latency networks. I wish it were configurable.
What's worse is that it looks like our site is the issue, so we get the complaints and I have to dig through Wireshark logs.
I've done a few unplanned DNS failovers, and I agree with this. What can be real trouble, though, is if you're running a B2B app: your customers' corporate networks can be configured in any strange way. I've met real network admins who think they need high TTLs everywhere in order to protect themselves from root DNS DDoSes.
For example, the public Wi-Fi at the last Hackspace I visited in Munich did not honour my 10-second TTL.
But in my opinion there aren't enough of them to justify not using short TTLs. It's their problem, after all, if they don't honour websites' settings: they will see downtime when nobody else does.
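To put numbers on "they will see downtime when nobody else does": a resolver that raises every TTL to some minimum keeps serving the old answer until its clamped TTL expires. A toy Python model, with a fake clock and hypothetical names throughout; it assumes the worst case, where the failover happens right after the resolver cached the old answer.

```python
class ClampingResolverCache:
    """Toy caching resolver that raises every TTL to at least min_ttl seconds."""

    def __init__(self, min_ttl: int = 0) -> None:
        self.min_ttl = min_ttl
        self._cache: dict[str, tuple[str, float]] = {}  # name -> (answer, expires_at)

    def resolve(self, name: str, now: float, authoritative: dict[str, tuple[str, int]]) -> str:
        cached = self._cache.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]  # served from cache, possibly stale
        answer, ttl = authoritative[name]
        self._cache[name] = (answer, now + max(ttl, self.min_ttl))
        return answer


def seconds_until_failover_seen(cache, name, auth_before, auth_after, step=1):
    """Cache the old answer at t=0, fail over immediately, poll until the new answer shows."""
    cache.resolve(name, 0, auth_before)
    new_answer = auth_after[name][0]
    t = 0
    while cache.resolve(name, t, auth_after) != new_answer:
        t += step
    return t
```

With a 10-second record TTL, a resolver that honours it switches after 10 seconds; one clamped to `min_ttl=300` keeps sending its users to the dead endpoint for 300 seconds.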
;; ANSWER SECTION:
s3.us-east-1.amazonaws.com. 5 IN A 126.96.36.199
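Each line of dig's ANSWER SECTION has five fields: name, TTL, class, type and record data. When the answer comes from a recursive resolver's cache, the TTL shown is the time remaining before that resolver refetches, which is why the value above is only 5. A small Python parser for such lines (`parse_dig_answer` is a hypothetical helper; it splits at most four times so record data containing spaces, such as TXT strings, stays intact):

```python
from typing import NamedTuple


class DigAnswer(NamedTuple):
    name: str
    ttl: int
    rclass: str
    rtype: str
    rdata: str


def parse_dig_answer(line: str) -> DigAnswer:
    """Split one ANSWER SECTION line into name, TTL, class, type and rdata."""
    name, ttl, rclass, rtype, rdata = line.split(maxsplit=4)
    return DigAnswer(name, int(ttl), rclass, rtype, rdata)
```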