Every time there is an outage at some cloud provider, I enjoy knowing that my site has maintained 100% availability since its launch. I run 3 redundant {name servers, web servers} pairs on 3 VPS hosted at 3 different providers on 3 different continents. Even if the individual providers are available only 98% of the time—7 days of downtime per year—my setup is expected to still provide five nines availability (details: http://blog.zorinaq.com/release-of-hablog-and-new-design/#co...)
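For the curious, the arithmetic behind that claim is easy to sanity-check. Here is a quick Python sketch (it assumes the three providers fail independently, which is the optimistic case):

```python
# Availability estimate for 3 independent, geo-redundant providers.
per_provider = 0.98   # each provider up ~98% of the time (~7 days of downtime/year)
n = 3

p_all_down = (1 - per_provider) ** n   # the site is down only if all 3 are down at once
availability = 1 - p_all_down

print(f"P(all {n} down): {p_all_down:.6f}")           # 0.000008
print(f"expected availability: {availability:.4%}")   # 99.9992%
```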
Edit: It's not about bragging. It's not about the ROI. I want to (1) experiment & learn, and most importantly (2) show what is possible with very simple technical architectures. HN is the ideal place to "show and tell" these sorts of projects.
I guess the part I'm confused about here is the DNS records and DNS pinning. If your zone returns 1.1.1.1 and 2.2.2.2 and 3.3.3.3 as IP addresses to use, and I'm browsing your site while resolving to 1.1.1.1 and 1.1.1.1 goes down -- your site will appear down for me, correct?
My browser won't automatically try 2.2.2.2 or 3.3.3.3... or will it?
Yes, both Chrome and Firefox would automatically try the next IP when I checked a couple of months ago. We used to use this for load balancing, returning the IPs of our servers in random order. Even when one of the servers went down we wouldn't lose any traffic, as browsers would just use the next one.
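For anyone curious what that fallback looks like at the socket level, it's roughly the sketch below (minimal Python; a real browser also caches, races IPv6/IPv4, and so on, and the hostname here is just a placeholder):

```python
import socket

def connect_with_fallback(host, port=80, timeout=5):
    """Try each resolved address in the order returned; move on when one fails."""
    last_err = None
    for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
            host, port, proto=socket.IPPROTO_TCP):
        try:
            # create_connection accepts a (host, port) pair for both IPv4 and IPv6.
            return socket.create_connection(sockaddr[:2], timeout=timeout)
        except OSError as err:   # refused, unreachable, timed out, ...
            last_err = err       # fall through to the next address
    raise last_err or OSError("no usable address")

# sock = connect_with_fallback("example.com")  # placeholder hostname
```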
Not the OP, but won't browsers fall back to the other IPs? In olden days some browsers didn't handle multiple IPs per DNS record correctly; hopefully it works now.
It still depends, and it's a combination of two behaviors: (1) what a browser's resolver does when it receives multiple A/AAAA records, with some selecting the first unconditionally and forcing authoritatives to play spin-the-data, and (2) failure behavior, both for timeouts and for positive failures -- the difference between layer 3, 4, and 7 failures also comes into play. What happens if the connection resets after a single byte? What happens if a positive refusal comes back? What happens with a warm cache for the domain? Etc., etc. The failure matrix explodes very quickly, and I seem to recall there being several hundred scenarios to test the last time I looked into this seriously.
Last time I researched this, behavior was quite different across the board, and it's something one should test extensively when designing HA for HTTP. In some situations, the same browser on another platform will defer to the system resolver versus its own, for example, which will potentially change behavior #1 even for the same browser. Mobile is starting to perform weird tricks with TCP, too, so you really have to dig into this one to do it right. Then throw in HTTP/2 and you've magically created yourself about a decade of justifiable work ;)
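If anyone wants to start poking at that matrix, here's a rough Python sketch of per-address probing: hit each A/AAAA record separately and note whether the failure is at L3/L4 or L7. (The hostname is a placeholder, and a real test plan needs far more cases than this, including warm caches, resets mid-response, HTTP/2, etc.)

```python
import socket
import http.client

def probe_all_addresses(host, port=80, timeout=5):
    """Request "/" from every resolved address separately and report its
    failure mode: L3/L4 (timeout, refused, unreachable) vs. L7 (HTTP error)."""
    results = {}
    for *_, sockaddr in socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP):
        addr = sockaddr[0]
        conn = http.client.HTTPConnection(addr, port, timeout=timeout)
        try:
            # Send the real Host header so name-based vhosts answer correctly.
            conn.request("GET", "/", headers={"Host": host})
            results[addr] = f"HTTP {conn.getresponse().status}"   # L7 outcome
        except ConnectionRefusedError:
            results[addr] = "connection refused (L4)"
        except socket.timeout:
            results[addr] = "timed out (L3/L4)"
        except OSError as err:
            results[addr] = f"other network error: {err}"
        finally:
            conn.close()
    return results

# print(probe_all_addresses("example.com"))  # placeholder hostname
```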
Not sure; that's what I'm wondering. Sounds like a good solution if they automatically do fallback. I also wonder how browsers behave depending on how the server at a particular IP address is responding, since servers can respond in different ways (e.g. a server might respond with an error, accept a request but time out on the response, or appear totally unreachable).
Any HA solution I've seen that attempts to reliably achieve this five nines capability relies on network-level things like virtual IPs and whatnot. And I don't consider it a five nines solution if only some customers can access it. How the browser behaves in this case could be critical depending on how your visitors use the site. I would not consider a site "up" if it's only available to some people and not others.
> And I don't consider it a five nines solution if only some customers can access it.
Well, that depends on the SLA/SLO, which is really what "nines" is speaking to. Intuitively I agree, but realistically an SLA can define it otherwise and still be "valid". Doesn't make it right. Just is.
The ROI on keeping a personal blog up at 5 9s is awful... I understand the desire from a geeky perspective, but it's only really useful for personal enjoyment of the challenge, or bragging.
That's almost certainly lower than combining reputable VPS providers geo-redundantly, and likely a much higher cost for the "convenience".
And it's not just for geeky pride. You learn the most when things break. Far too many funded startups run poorly architected apps in a single AWS AZ, unlike GP.
Off by 6 orders of magnitude: actually 10,000 rps sustained for a few hours (2,500 page hits/sec × 4 req per page). These are my (admittedly very rare) peaks when the blog gets slashdotted.