This outage made me realize that github is served over a single IP address (A re...

raphaelj · on June 8, 2021

> The internet is designed for redundancy. Wonder why these companies don't have a fail over network. Makes me wonder if cost is factor considering their already massive infra. But a single point of failure ..

Well, Internet was indeed designed for redundancy, and it worked as intended. A no point in time it failed to make you reach the server it was supposed to make you talk to.

What are failing are all the application protocols that are running on top of the network.

kayfox · on June 8, 2021

Github's DNS likely will serve up a different IP for github when there is an outage. I can't talk about the details but GitHub and the rest of Microsoft use a global load balancing system that works through DNS.

omk · on June 8, 2021

Would be interesting to know what these fail over patterns are. As DNS takes a while to propagate, I thought DNS records already indicate fail over addresses.

kayfox · on June 8, 2021

I think only MX records indicate any priority for each additional record returned, for A records theres no indication of which records have priority over others and the usual behavior of authoritative DNS servers is to rotate the order in which records for the same thing are returned, so effectively returning more than one record for the same question results in a distribution of requests to the IPs returned rather than any sort of failover behavior.

In the case of the software Microsoft uses, it monitors endpoints for the websites in question and then changes which IP(s) are returned based on the availability of those endpoints, the geographic region and other factors.

bombcar · on June 8, 2021

Some reliability systems change the routing for the IPs instead of updating the DNS as BGP can propagate faster than DNS caching.

Priority for A records would a nice feature.