This outage made me realize that GitHub is served from a single IP address (one A record) from my point of origin (India). Stack Overflow lists four A records, but all of them belong to Fastly.
The internet is designed for redundancy. I wonder why these companies don't have a failover network. It makes me wonder whether cost is a factor, considering their already massive infra. But a single point of failure ... <confused>.
> The internet is designed for redundancy. Wonder why these companies don't have a fail over network. Makes me wonder if cost is factor considering their already massive infra. But a single point of failure ..
Well, the Internet was indeed designed for redundancy, and it worked as intended. At no point did it fail to get you to the server you were supposed to talk to.
What is failing are the application protocols running on top of the network.
GitHub's DNS will likely serve up a different IP for GitHub when there is an outage. I can't talk about the details, but GitHub and the rest of Microsoft use a global load balancing system that works through DNS.
It would be interesting to know what these failover patterns are. Since DNS takes a while to propagate, I thought DNS records already indicated failover addresses.
I think only MX records indicate a priority for each additional record returned. For A records there's no indication of which records take priority over others, and the usual behavior of authoritative DNS servers is to rotate the order in which records for the same name are returned. So, effectively, returning more than one record for the same question distributes requests across the returned IPs rather than providing any sort of failover behavior.
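To make the rotation point concrete, here is a toy simulation rather than a real DNS server. The `RoundRobinZone` class and the 192.0.2.x documentation addresses are made up for illustration, and it assumes each client simply connects to the first address in the answer:

```python
from collections import Counter, deque


class RoundRobinZone:
    """Toy authoritative server: returns every A record, rotating the order per query."""

    def __init__(self, records):
        self._records = deque(records)

    def query(self):
        answer = list(self._records)
        # Move the first record to the end so the next answer starts with a different IP.
        self._records.rotate(-1)
        return answer


def simulate_clients(zone, n_queries):
    """Count which IP gets hit if every client uses the first address returned."""
    return Counter(zone.query()[0] for _ in range(n_queries))


zone = RoundRobinZone(["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"])
hits = simulate_clients(zone, 1000)
print(hits)
```

Under those assumptions the requests spread evenly over the four addresses. Nothing in the answer tells a client which address is the "backup", which is why this is load distribution and not failover.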
In the case of the software Microsoft uses, it monitors endpoints for the websites in question and then changes which IP(s) are returned based on the availability of those endpoints, the geographic region and other factors.
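As a rough sketch of that idea (not Microsoft's actual system, whose details I can't share): the `Endpoint` type, the region names, and the `pick_answers` function below are all hypothetical, and the health flag stands in for whatever monitoring a real system does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    ip: str
    region: str
    healthy: bool  # assumed to be updated by some external health monitor


def pick_answers(endpoints, client_region):
    """Return the IPs a health-aware DNS server might hand out for one query."""
    healthy = [e for e in endpoints if e.healthy]
    # Prefer healthy endpoints in the client's own region.
    local = [e.ip for e in healthy if e.region == client_region]
    if local:
        return local
    # Otherwise fall back to any healthy endpoint elsewhere.
    return [e.ip for e in healthy]


endpoints = [
    Endpoint("192.0.2.10", "india", healthy=False),  # regional endpoint is down
    Endpoint("198.51.100.20", "europe", healthy=True),
    Endpoint("203.0.113.30", "us", healthy=True),
]
print(pick_answers(endpoints, "india"))
```

So a client can see a single A record at any given moment and still be covered: the failover happens in which record the server chooses to return, not in the list the client receives.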