
Maximum theoretical downtime for a website: 30 minutes - paraschopra
http://visualwebsiteoptimizer.com/split-testing-blog/maximum-theoretical-downtime-for-a-website-30-minutes/
======
kilburn
> Our research led us to conclude that even for the best configurations in
> the world (yes, that includes Google, Amazon and all biggies), maximum
> theoretical downtime (for a few users) will be 30 minutes. You cannot
> possibly escape this limitation.

You've overestimated your own abilities. "The biggies" don't need to use DNS
failovers, they use anycast routing and the like. Therefore, their "maximum
theoretical downtime" is probably much lower than 30 minutes...

~~~
paraschopra
Even with Anycast, a portion of their users will still be routed to the old,
down server until the DNS cache expires. No? I'd love to be set straight, as
this is a problem we desperately want to solve.

~~~
skorgu
IANA network engineer, but basically: with anycasting you can announce a
single IP from two different places, and different providers will route to
whichever is 'closest'. If your datacenter goes down, that route stops being
advertised and providers re-route to the next best advertised route. BGP
convergence time is variable, but in many cases it's way less than 30
minutes.

This presumes you can essentially have two 'active' datacenters, which not
all application designs allow for. You can still use BGP as a failover
mechanism: announce the route to your production IP from DC1, and if that
dies, announce it from DC2 once you fail over. Again, you're limited by
response time and BGP convergence time, but that's faster than DNS.
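
To make the failover behavior concrete, here's a toy model in TypeScript,
not the real protocol: each provider prefers the shortest advertised path to
the anycast IP, and withdrawing a route shifts traffic to the next-best
path. The datacenter names and path lengths are invented.

    type Route = { datacenter: string; pathLength: number; advertised: boolean };

    // Two datacenters announcing the same anycast IP; providers prefer the
    // shorter path (a stand-in for real BGP path-selection rules).
    const routes: Route[] = [
      { datacenter: "DC1", pathLength: 2, advertised: true },
      { datacenter: "DC2", pathLength: 4, advertised: true },
    ];

    function bestRoute(table: Route[]): Route | undefined {
      return table
        .filter((r) => r.advertised)
        .sort((a, b) => a.pathLength - b.pathLength)[0];
    }

    console.log(bestRoute(routes)?.datacenter); // "DC1"
    routes[0].advertised = false; // DC1 dies and its announcement is withdrawn
    console.log(bestRoute(routes)?.datacenter); // "DC2" once routes re-converge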

~~~
paraschopra
Thanks for the info. I researched it a bit, and it sounds like Anycast is a
very expensive option. But anyway, I agree it could be faster than relying on
DNS routing.

------
niyazpk
AFAIK not only does the browser cache the IP address, but many DNS servers
(the user's ISP's resolver, among others) also cache it.

I wouldn't expect all the DNS servers out there to check for a new IP every
couple of minutes. Many of them manage their caches according to their own
policies, not _strictly_ according to the arbitrary constraints set by
individual domains.

BTW, shouldn't some default variant of the site load if your service is not
working?

~~~
paraschopra
Today, almost all DNS servers that cache entries respect TTL values. We ran a
number of benchmarks and observed that the TTL was respected in every case.
Ultimately it boils down to the browser caching DNS lookups.
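
For what it's worth, a check along the lines of those benchmarks is easy to
reproduce. A minimal sketch in TypeScript on Node: dns.resolve4 queries the
configured resolver directly (bypassing the OS cache), and a resolver that
respects TTLs will return a value that counts down between queries. The
hostname and the ten-second delay are arbitrary choices.

    import { promises as dns } from "node:dns";

    // Ask the configured resolver for an A record twice, ten seconds apart.
    // If the resolver caches the entry and respects its TTL, the second
    // answer's ttl field should be roughly ten seconds lower.
    async function watchTtl(hostname: string): Promise<void> {
      const first = await dns.resolve4(hostname, { ttl: true });
      console.log("first:", first); // e.g. [{ address: "93.184.216.34", ttl: 300 }]
      await new Promise((resolve) => setTimeout(resolve, 10_000));
      const second = await dns.resolve4(hostname, { ttl: true });
      console.log("second:", second); // ttl ~10s lower if cached and respected
    }

    watchTtl("example.com").catch(console.error);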

Yes, the default variant loads if our service isn't working. The slowdown
occurs when the browser tries to connect to our service to load the JS but
our servers don't respond. The timeout for this kind of failure is 60
seconds, which means the browser will only abandon its connection attempt
after a full minute.
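
That 60-second hang is exactly what a client-side timeout can guard against.
A rough sketch of the idea, not VWO's actual loader: inject the script tag
yourself and give up early, so a dead server never blocks the page for the
browser's full timeout. The URL, the 3-second budget, and showDefaultVariant
are all made up for illustration.

    // showDefaultVariant is hypothetical: whatever renders the unmodified page.
    declare function showDefaultVariant(): void;

    // Load an external script, but stop waiting after timeoutMs so an
    // unresponsive server can't stall the page for the browser's full
    // connection timeout.
    function loadScriptWithTimeout(src: string, timeoutMs: number): Promise<void> {
      return new Promise((resolve, reject) => {
        const script = document.createElement("script");
        const timer = setTimeout(() => {
          script.remove();
          reject(new Error(`gave up after ${timeoutMs} ms`));
        }, timeoutMs);
        script.src = src;
        script.onload = () => { clearTimeout(timer); resolve(); };
        script.onerror = () => { clearTimeout(timer); reject(new Error("load failed")); };
        document.head.appendChild(script);
      });
    }

    // Usage: fall back to the default variant if the A/B service never answers.
    loadScriptWithTimeout("https://example.invalid/ab-test.js", 3000)
      .catch(() => showDefaultVariant());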

------
paraschopra
Just FYI, I found this excellent guide:
<http://www.tenereillo.com/GSLBPageOfShame.htm>

