Every time there is an outage at some cloud provider, I enjoy knowing that my site has maintained 100% availability since its launch. I run 3 redundant {name servers, web servers} pairs on 3 VPS hosted at 3 different providers on 3 different continents. Even if the individual providers are available only 98% of the time—7 days of downtime per year—my setup is expected to still provide five nines availability (details: http://blog.zorinaq.com/release-of-hablog-and-new-design/#co...)
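For the curious, the arithmetic behind that claim is easy to sanity-check. Here is a quick Python sketch (it assumes the three providers fail independently, which is the optimistic case):

```python
# Availability estimate for 3 independent, geo-redundant providers.
per_provider = 0.98   # each provider up ~98% of the time (~7 days of downtime/year)
n = 3

p_all_down = (1 - per_provider) ** n   # the site is down only if all 3 are down at once
availability = 1 - p_all_down

print(f"P(all {n} down): {p_all_down:.6f}")           # 0.000008
print(f"expected availability: {availability:.4%}")   # 99.9992%
```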
Edit: It's not about bragging. It's not about the ROI. I want to (1) experiment & learn, and most importantly (2) show what is possible with very simple technical architectures. HN is the ideal place to "show and tell" these sorts of projects.
I guess the part I'm confused about here is the DNS records and DNS pinning. If your zone returns 1.1.1.1 and 2.2.2.2 and 3.3.3.3 as IP addresses to use, and I'm browsing your site while resolving to 1.1.1.1 and 1.1.1.1 goes down -- your site will appear down for me, correct?
My browser won't automatically try 2.2.2.2 or 3.3.3.3... or will it?
Yes, both Chrome and Firefox would automatically try the next IP when I checked a couple of months ago. We used to use this for load balancing, returning the IPs of our servers in random order. Even when one of the servers went down we wouldn't lose any traffic, as browsers would just use the next one.
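For anyone curious what that fallback looks like at the socket level, it's roughly the sketch below (minimal Python; a real browser also caches, races IPv6/IPv4, and so on, and the hostname here is just a placeholder):

```python
import socket

def connect_with_fallback(host, port=80, timeout=5):
    """Try each resolved address in the order returned; move on when one fails."""
    last_err = None
    for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
            host, port, proto=socket.IPPROTO_TCP):
        try:
            # create_connection accepts a (host, port) pair for both IPv4 and IPv6.
            return socket.create_connection(sockaddr[:2], timeout=timeout)
        except OSError as err:   # refused, unreachable, timed out, ...
            last_err = err       # fall through to the next address
    raise last_err or OSError("no usable address")

# sock = connect_with_fallback("example.com")  # placeholder hostname
```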
Not the OP, but won't browsers fall back to the other IPs? In olden days some browsers didn't handle multiple IPs per DNS record correctly; hopefully it works now.
It still depends, and it's a combination of two behaviors: (1) what a browser's resolver does when it receives multiple A/AAAA records, with some selecting the first unconditionally and forcing authoritatives to play spin-the-data, and (2) failure behavior, both for timeouts and for positive failures -- the difference between layer 3, 4, and 7 failures also comes into play. What happens if the connection resets after a single byte? What happens if a positive refusal comes back? What happens with a warm cache for the domain? Etc., etc. The failure matrix explodes very quickly, and I seem to recall there being several hundred scenarios to test the last time I looked into this seriously.
Last time I researched this, behavior was quite different across the board, and it's something one should test extensively when designing HA for HTTP. In some situations, the same browser on another platform will defer to the system resolver versus its own, for example, which will potentially change behavior #1 even for the same browser. Mobile is starting to perform weird tricks with TCP, too, so you really have to dig into this one to do it right. Then throw in HTTP/2 and you've magically created yourself about a decade of justifiable work ;)
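If anyone wants to start poking at that matrix, here's a rough Python sketch of per-address probing: hit each A/AAAA record separately and note whether the failure is at L3/L4 or L7. (The hostname is a placeholder, and a real test plan needs far more cases than this, including warm caches, resets mid-response, HTTP/2, etc.)

```python
import socket
import http.client

def probe_all_addresses(host, port=80, timeout=5):
    """Request "/" from every resolved address separately and report its
    failure mode: L3/L4 (timeout, refused, unreachable) vs. L7 (HTTP error)."""
    results = {}
    for *_, sockaddr in socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP):
        addr = sockaddr[0]
        conn = http.client.HTTPConnection(addr, port, timeout=timeout)
        try:
            # Send the real Host header so name-based vhosts answer correctly.
            conn.request("GET", "/", headers={"Host": host})
            results[addr] = f"HTTP {conn.getresponse().status}"   # L7 outcome
        except ConnectionRefusedError:
            results[addr] = "connection refused (L4)"
        except socket.timeout:
            results[addr] = "timed out (L3/L4)"
        except OSError as err:
            results[addr] = f"other network error: {err}"
        finally:
            conn.close()
    return results

# print(probe_all_addresses("example.com"))  # placeholder hostname
```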
Not sure; that's what I'm wondering. Sounds like a good solution if they automatically do fallback. I also wonder how browsers behave depending on how the server at a particular IP address is responding, since servers can respond in different ways (e.g. a server might respond with an error, accept a request but time out on the response, or appear totally unreachable).
Any HA solution I've seen that attempts to reliably achieve this five nines capability relies on network-level things like virtual IPs and whatnot. And I don't consider it a five nines solution if only some customers can access it. How the browser behaves in this case could be critical depending on how your visitors use the site. I would not consider a site "up" if it's only available to some people and not others.
> And I don't consider it a five nines solution if only some customers can access it.
Well, that depends on the SLA/SLO, which is really what "nines" is speaking to. Intuitively I agree, but realistically an SLA can define it otherwise and still be "valid". Doesn't make it right. Just is.
The ROI on keeping a personal blog up at 5 9s is awful... I understand the desire from a geeky perspective, but it's only really useful for personal enjoyment of the challenge, or bragging.
That's almost certainly lower than combining reputable VPS providers geo-redundantly, and likely a much higher cost for the "convenience".
And it's not just for geeky pride. You learn the most when things break. Far too many funded startups run poorly architected apps in a single AWS AZ, unlike GP.
Off by 6 orders of magnitude: actually 10,000 rps sustained for a few hours (2,500 page hits/sec × 4 req per page). These are my (admittedly very rare) peaks when the blog gets slashdotted.