I don't think you need to be 100% precise in every HN comment.
You can do DNS-based failover (short TTL), manually or automatically, and it's the easiest to set up using entirely your own infrastructure. This works great if you can tolerate a variable-length outage for customers, or where you're not using it to deal with an outage but rather just migrating load -- say provider A suddenly gets expensive, so you migrate away, but you don't need to hard-kill provider A, at least until any DNS cache has expired.
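A minimal sketch of the "automatically" version: health-check the primary and, when it stops answering, repoint the A record at the standby. The addresses, API endpoint, and token below are hypothetical placeholders for whatever DNS provider API you actually use, and the record should already carry a short TTL (say 60s) so caches expire quickly.

    # Sketch of automated DNS failover, assuming a hypothetical DNS provider API.
    import socket, time, json, urllib.request

    PRIMARY = "203.0.113.10"    # provider A (example address)
    STANDBY = "198.51.100.20"   # provider B (example address)
    CHECK_PORT = 443

    def healthy(ip, port, timeout=3):
        """TCP connect check; swap in an HTTP check for more fidelity."""
        try:
            with socket.create_connection((ip, port), timeout=timeout):
                return True
        except OSError:
            return False

    def update_record(ip):
        """Hypothetical provider API call; substitute your provider's client."""
        req = urllib.request.Request(
            "https://dns.example.com/api/zones/example.com/records/www",
            data=json.dumps({"type": "A", "content": ip, "ttl": 60}).encode(),
            headers={"Authorization": "Bearer <token>",
                     "Content-Type": "application/json"},
            method="PUT",
        )
        urllib.request.urlopen(req)

    while True:
        if not healthy(PRIMARY, CHECK_PORT):
            update_record(STANDBY)   # fail over; flip back manually once A recovers
        time.sleep(30)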
You can do IP-based failover via various techniques: anycast (which doesn't really work for most apps), making your own BGP announcements of the same netblock, IP address failover below the BGP level/internal to a network, ARP stealing on a subnet (not useful across providers, but good for HA), etc.
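For the within-a-subnet variant, the idea is just: claim the shared service IP and broadcast gratuitous ARP so neighbors learn the new MAC. A rough sketch (interface name and addresses are examples, requires root, Linux iproute2 and iputils arping; keepalived/VRRP does this properly in production):

    # Sketch of "ARP stealing" IP takeover when the peer stops answering.
    import socket, subprocess, time

    SERVICE_IP = "192.0.2.10"   # floating IP shared between the two hosts (example)
    PEER_IP = "192.0.2.11"      # the box that normally holds SERVICE_IP (example)
    IFACE = "eth0"

    def peer_alive(timeout=2):
        try:
            with socket.create_connection((PEER_IP, 22), timeout=timeout):
                return True
        except OSError:
            return False

    def take_over():
        # Attach the floating IP locally, then send unsolicited (gratuitous) ARP.
        subprocess.run(["ip", "addr", "add", f"{SERVICE_IP}/24", "dev", IFACE], check=False)
        subprocess.run(["arping", "-U", "-c", "3", "-I", IFACE, SERVICE_IP], check=False)

    while True:
        if not peer_alive():
            take_over()
            break
        time.sleep(5)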
You can use a smart proxy in front of your app (an F5 "Global Load Balancer", something you've developed yourself, nginx with minimal state, or an inexpensive service like Cloudflare -- or its 1000x-more-expensive competitor, Prolexic).
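The core of the "proxy you've developed yourself" option is tiny: try the first upstream, and on a connection-level failure fall through to the next. A toy sketch (the upstream addresses are placeholders; anything real should be nginx/haproxy, not this):

    # Toy failover proxy: walk a list of upstreams until one answers.
    import http.server, urllib.error, urllib.request

    UPSTREAMS = ["http://10.0.0.10:8080", "http://10.0.1.10:8080"]  # hypothetical

    class FailoverProxy(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            for base in UPSTREAMS:
                try:
                    with urllib.request.urlopen(base + self.path, timeout=2) as r:
                        body = r.read()
                        self.send_response(r.status)
                        self.send_header("Content-Length", str(len(body)))
                        self.end_headers()
                        self.wfile.write(body)
                        return
                except (urllib.error.URLError, OSError):
                    continue  # treat any upstream error as a failover trigger (a simplification)
            self.send_error(502, "all upstreams down")

    if __name__ == "__main__":
        http.server.ThreadingHTTPServer(("", 8000), FailoverProxy).serve_forever()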
You can do the best thing for non-web apps: a smart client, which knows to go down a list of servers (randomly?) and find the closest or best one. The more intelligence in the client, the better.
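A bare-bones version of that client logic: shuffle the candidate list, probe each endpoint, and keep the one that answers fastest. Hostnames and ports below are placeholders; a real client would also re-probe periodically and fail over mid-session.

    # Sketch of a "smart client" that picks the fastest reachable server.
    import random, socket, time

    CANDIDATES = [("a.example.com", 5000), ("b.example.com", 5000), ("c.example.com", 5000)]

    def probe(host, port, timeout=2):
        """Return connect latency in seconds, or None if unreachable."""
        start = time.monotonic()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return time.monotonic() - start
        except OSError:
            return None

    def pick_server():
        servers = CANDIDATES[:]
        random.shuffle(servers)            # spread load across healthy servers
        live = [(probe(h, p), (h, p)) for h, p in servers]
        live = [(lat, hp) for lat, hp in live if lat is not None]
        return min(live, key=lambda t: t[0])[1] if live else None

    print(pick_server())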
I've set up all of these except anycast (which I'd actually love to do sometime, but RIPE jacked my /24) and Prolexic (because I don't want to spend $30-100k/mo). Which is best really depends, but IMO at least having a plan (even if it takes a week to execute) to switch hosting providers is worthwhile for everyone.
His first example is round robin DNS. Sorry, but the terms DNS failover and round robin are often used interchangeably when you're dealing with business continuity.
Yes, certainly there are other options that increase the complexity, but why not start there? While the impact on users with shitty ISPs or behind proxies is unfortunate, it is relatively easy to implement and low-cost.
Going beyond that increases the complexity and cost exponentially and is certainly not easy.
Yeah, RR strictly means returning an answer set (ideally >1 record) for the client to select from each time. Being able to remove entries during an outage (which really needs a short TTL) is an optimization.
Unfortunately some stupid resolvers cache a single answer set for a long time, but for some applications you're willing to accept that 1/x of attempts fail during an outage (just hit reload, or come back in a bit), since this costs ~nothing to implement.
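The "costs ~nothing" part is that any client willing to walk the whole answer set already survives one dead address -- getaddrinfo hands back every record. A small illustration (hostname/port are placeholders):

    # Walk all RR DNS answers and connect to the first one that responds.
    import socket

    def connect_any(host, port, timeout=3):
        last_err = None
        for family, socktype, proto, _, addr in socket.getaddrinfo(host, port, type=socket.SOCK_STREAM):
            try:
                return socket.create_connection(addr[:2], timeout=timeout)
            except OSError as e:
                last_err = e               # this record is down; try the next one
        raise last_err or OSError("no addresses returned")

    sock = connect_any("www.example.com", 80)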
The basic concept works great for NS, MX, and other protocols where the client is designed to retry.