
Keep a static "emergency mode" site on S3 - jonthepirate
https://coderwall.com/p/68ozza
======
colmmacc
(I work on Route 53).

As __lucas has mentioned, this can be achieved with Route 53 Failover, which
we'd recommend.

Route 53 Failover is (hopefully) pretty easy to configure. Just mark your ELB
as the primary and enable target healthchecks, and add the S3 website bucket
as the secondary. We'd also suggest that you use an S3 website bucket hosted
in a different region than your ELB. This should take no more than a minute or
two of setup in our console.
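For concreteness, here is a sketch of the two failover record sets as the change batch you might submit through the Route 53 API (e.g. boto3's `change_resource_record_sets`). All names and hosted-zone IDs below are hypothetical placeholders; the alias target zone IDs depend on your ELB's and bucket's regions.

```python
import json

# Hypothetical endpoints for illustration only.
PRIMARY_ELB_DNS = "my-elb-123456.us-east-1.elb.amazonaws.com"
SECONDARY_S3_WEBSITE = "example.com.s3-website-eu-west-1.amazonaws.com"

def failover_change_batch(domain):
    """Build a Route 53 change batch: a PRIMARY alias to the ELB (with
    target health evaluation enabled) and a SECONDARY alias to the S3
    website bucket in a different region."""
    def record(set_id, role, target, zone_id, eval_health):
        return {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": domain,
                "Type": "A",
                "SetIdentifier": set_id,
                "Failover": role,
                "AliasTarget": {
                    "HostedZoneId": zone_id,  # zone ID of the alias target (placeholder)
                    "DNSName": target,
                    "EvaluateTargetHealth": eval_health,
                },
            },
        }
    return {"Changes": [
        record("primary", "PRIMARY", PRIMARY_ELB_DNS, "ZELBZONEID000", True),
        record("secondary", "SECONDARY", SECONDARY_S3_WEBSITE, "ZS3ZONEID0000", False),
    ]}

batch = failover_change_batch("www.example.com")
print(json.dumps(batch, indent=2))
```

You would pass this dict as the `ChangeBatch` argument, alongside your hosted zone ID.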

The difference it makes is that Route 53 failover doesn't depend on any "make
this change" control plane; the status of every Route 53 healthcheck is
constantly being polled from at least 16 locations and then sent (via 3
redundant networking paths) to every Route 53 DNS server. The system is always
operating at full-load, with a tremendous degree of fault-tolerance and
redundancy. We hope that makes it very robust against very "hard to fix"
internet and power problems, and also API outages.

So for an awkward "worst case" example; if there were a large networking
outage in an intermediate transit provider your customers might not be able to
reach your ELB, and likewise you may not be able to reach the API to make the
changes necessary. Route 53 failover should work anyway by detecting the
reachability problem and flipping to the pre-configured secondary - an action
which is triggered at our edge sites.

If you'd rather not use Route 53 as your primary DNS provider that's ok; all
of the above can still be achieved by using a dedicated zone on Route 53 just
for managing the failover, which you may then CNAME to, just as with ELB. Each
zone costs $0.50/month. Of course we'd also like to make this kind of
functionality easier to use and built-in, and that's something we're
constantly working on.

~~~
jedberg
Except that his way all the traffic instantly switches, and your way you have
to wait for DNS propagation, which about 15% of the users on the internet will
not pick up for over a week.

DNS is an awful way to do failover.

~~~
colmmacc
To your point; it's ok to do both.

Route 53 supports DNS TTLs as low as 0 seconds. ELB and S3 endpoints both have
60 second TTLs. My experience with flipping names like www.amazon.com doesn't
reflect the 15% figure. I've seen about 97% of web traffic honouring the TTL
and flipping quickly. Within 5 minutes, almost all of the rest follow too. We
also take CloudFront sites in and out of service for maintenance, and in 5
years I've never seen anything like a 15% straggler effect.

That said, we do see a very small number of stragglers. While resolvers
overriding TTLs hasn't shown up as a significant problem, buggy clients can
be; we come across clients now and then who either never re-resolve (various
JVMs and their infinite caches are a common cause), or only re-resolve on
failures (which is fine for failover, but not great for traffic management).
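To illustrate the well-behaved client side of this: a client that asks the OS resolver on each connection (rather than caching an address indefinitely, as some JVM configurations do) will pick up a DNS flip within one TTL. A minimal sketch:

```python
import socket

def resolve_fresh(host, port=443):
    """Ask the OS resolver each time, instead of caching the first answer
    forever. Clients that cache indefinitely never see a failover flip."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return [info[4][0] for info in infos]

print(resolve_fresh("localhost", 80))
```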

If you have a distribution time plot for the 15% figure it'd be interesting to
see; [https://www.dns-oarc.net/](https://www.dns-oarc.net/) would be a good
venue; [https://lists.dns-oarc.net/mailman/listinfo/dns-operations](https://lists.dns-oarc.net/mailman/listinfo/dns-operations) is the
open list. Ignoring TTLs for a week is very concerning; it would very likely
break many DNSSEC configurations. Is it possible you were dealing with robots?

~~~
jedberg
When we flip Netflix domains we see about a 15% straggler effect (although to
be fair only about 3% take a week, but many take around 24 hours).

~~~
revertts
How much of that 15% is driven by ISPs versus misbehaving clients (e.g.
set-top boxes)?

------
mayop100
Or better yet -- make your entire site static to begin with! This is how our
site works (firebase.com). Our entire site is static content that's generated
at deploy time and hosted on a CDN. Dynamic data is loaded asynchronously as
needed. If a server were to go down, at least all of the static pieces (which
is most of the site) would be unaffected.

We use Firebase to power the dynamic portions (obviously), but you can use
plain old AJAX requests as well.

The age of the dynamically-generated HTML page is coming to an end.

~~~
adrr
You still need dynamic "pages" for things like geo-IP redirection,
localization, etc. You could in theory load this stuff via AJAX, but that
doesn't work for search engine crawlers. But I do agree most stuff can be
static; you can push JSON fragments onto S3 and have the web page fetch them
via AJAX.

~~~
Bockit
We've done this for a lot of data vis work. Clients have access to a CMS which
lets them stage and publish their data. Doing so puts JSON files on S3, where
we also serve the site. There are some trade-offs; sometimes you miss having
that REST API, but you gain a lot too.
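As a sketch of the publish step (bucket and key names hypothetical): build the parameters you'd pass to an S3 PutObject call, e.g. boto3's `s3.put_object(**params)`, with a JSON content type and a short cache lifetime so the page picks up republished data quickly.

```python
import json

def publish_fragment_params(bucket, key, data, max_age=60):
    """Build PutObject parameters that publish a JSON fragment the page
    can fetch via AJAX. Bucket/key are hypothetical examples."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": json.dumps(data).encode("utf-8"),
        "ContentType": "application/json",
        "CacheControl": f"max-age={max_age}",
        "ACL": "public-read",  # the page fetches it anonymously
    }

params = publish_fragment_params("example-site", "data/latest.json",
                                 {"published": True, "rows": [1, 2, 3]})
```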

------
__lucas
You can do this now automatically with Route53 DNS failover
[http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/dns...](http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/dns-failover.html)

~~~
krallin
Yup - and Route 53 can now even alias the Zone Apex to an S3 bucket!

------
giovannibajo1
A drop-in solution to achieve the same is using CloudFlare as a CDN for your
website. CloudFlare has a configurable "Always Online" mode that is
automatically triggered whenever your site is down and shows the user an
offline version of the website, together with a warning message.

Obviously, if you're using CloudFlare in the first place, chances are that you
won't have too many problems with high peaks of traffic anyway. But obviously,
it can still happen, depending on how high the peak is :)

[https://support.cloudflare.com/entries/22050652-What-does-Al...](https://support.cloudflare.com/entries/22050652-What-does-Always-Online-do-)

~~~
Afforess
replying to 23david, not the parent. 23david you appear to be shadow banned.
It seems to be recent, might want to get it checked out.

~~~
23david
thanks for letting me know. emailing the mods.

~~~
Afforess
glad it's cleared up. Seemed innocent to me :)

------
svmegatron
This is a totally awesome idea. You are still in trouble if the load balancer
has problems or S3 has problems (unlikely, but _not_ impossible!). It's always
smart to have a couple of ways of failing over to something if your main site
has problems - for instance, I'm always surprised that more people don't spend
time customizing the default rails error/maintenance pages for heroku.

~~~
pquerna
I previously wrote about making a downtime page:

[http://journal.paul.querna.org/articles/2009/08/24/downtime-...](http://journal.paul.querna.org/articles/2009/08/24/downtime-page-in-apache/)

One thing you'll probably want to do is set 5xx HTTP status codes on your
static site, and try to keep it from being cached as much as possible. A
redirect to a subdomain hosted on S3 makes it more likely something like
Google will pick up on it.
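One common way to sketch the flag-file approach in Apache (paths hypothetical; `mod_rewrite` assumed loaded; `R` codes outside 300-399 abort with that status and fall through to `ErrorDocument`):

```apache
# Serve the static downtime page with a 503 while a flag file exists.
RewriteEngine On
RewriteCond %{DOCUMENT_ROOT}/maintenance.flag -f
# Exclude the maintenance page itself, so the ErrorDocument
# subrequest doesn't loop back into this rule.
RewriteCond %{REQUEST_URI} !^/maintenance\.html$
RewriteRule ^ - [R=503,L]
ErrorDocument 503 /maintenance.html
```

Returning 503 rather than 200 tells crawlers and caches the outage is temporary.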

------
brianmcconnell
App Engine is also a good way to do this. Let's say you have the bulk of your
site running on AWS (assume for whatever reason you don't want to use GAE as
your primary environment):

* Have a heartbeat task on GAE that polls your server, and if it's running, writes "serverisrunning" to memcache with a short time to live.

* (normal operation): redirect initial visitors to your AWS site.

* (server not responding to the heartbeat task on GAE, or memcache miss): serve static content from GAE, or a limited-functionality version of your app hosted on GAE (for example, a static site with a signup form).
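The steps above can be sketched as follows. The `TTLCache` class is a stand-in for App Engine's memcache (on GAE you'd use `google.appengine.api.memcache`), and the handler names and URL are hypothetical:

```python
import time

class TTLCache:
    """A tiny memcache stand-in with per-key expiry."""
    def __init__(self):
        self._store = {}
    def set(self, key, value, ttl):
        self._store[key] = (value, time.time() + ttl)
    def get(self, key):
        item = self._store.get(key)
        if item is None or time.time() > item[1]:
            return None  # missing or expired
        return item[0]

cache = TTLCache()

def heartbeat(poll_primary, ttl=30):
    """Cron task: poll the primary site; record liveness with a short TTL."""
    if poll_primary():
        cache.set("serverisrunning", True, ttl)

def handle_request():
    """Front handler: redirect to the primary if alive, else serve fallback."""
    if cache.get("serverisrunning"):
        return ("redirect", "https://www.example-primary.com/")
    return ("serve", "static fallback page")
```

Because liveness expires on its own, a dead primary (or a missed heartbeat) automatically flips traffic to the static fallback.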

This type of setup has the added bonus of automatically detecting an outage
and responding to it. While App Engine has its own downtime issues, outages
are transient. Since they migrated to the High Replication Datastore, I
haven't seen anything that lasted more than a few minutes.

------
bobfunk
S3 works for static websites, but in general the latency without CloudFront in
front is not that good.

I'm actually working on something that'll make it incredibly fast to get a
static site up and running with a powerful CDN and get form submissions
working. Will be up at [http://www.bitballoon.com](http://www.bitballoon.com)
soon.

------
EGreg
Yeah but the big question is, how can you switch the DNS over in time when
NONE of your servers can respond fast enough?

~~~
Retric
You still need to update DNS ASAP, but unless you're dealing with a physical
flood/fire you can often get something to respond on the old IP. So, depending
on the type of failure you're dealing with, setting up a static redirect is
often viable for insane levels of traffic, even if you're hosting it off of a
single underpowered CPU and limited bandwidth.

Aka the site has been running off your FIOS connection and a spare CPU, then
suddenly the traffic spikes 5,000%. What do you do? Host a single short text
only ("sorry, we were not ready for prime time, please come again"), or
redirect to a nice scalable static site you can manually update with nice
pictures of your total failures / requests for donations or whatever.

~~~
EGreg
Why don't you just always have your site static and connect to your back end
as necessary? Treat it as a webservice with uptime, etc.

~~~
Retric
Because designing v0.1 of your application from the ground up based on edge
cases is a great way to never release anything. Spending a weekend setting up
a static failover, on the other hand, has no long-term downside and lets you
put off worrying about a host of those edge cases. It's like buying a UPS for
your dev box: it's probably never going to matter, but it's cheap, so feel
free.

~~~
EGreg
Actually it's a great way to simultaneously design a website AND an api for
others to use. It's also a great way to separate concerns. It's also a great
way to reduce load on your server. In fact, it's an easy way to have some
people code a standard back end with a standard authentication so that some
other people can make front ends for the web, iphone, and more. You can use,
for example, oauth to authenticate with the back end, from any front end app.

[http://www.discourse.org/](http://www.discourse.org/) is one example of such
an approach

~~~
Retric
I don't think we are quite on the same page; having a public API etc. is
wonderful, but let's use a slightly different topic. Using some 3rd-party ORM
to talk to your database is generally a no-brainer, but v0.1 might not even
have a database yet, because persistence is not generally needed for a demo.
Why put off such a core feature? Because just changing your objects is less
friction, and the goal is to see if anyone is interested. Aka idea validation
and nothing else.

~~~
EGreg
Because it's just as easy to code your front end independently and then hook
up your back end to it. If the front end is static, it can be completely
hosted on a CDN. If your back end is unreachable the front end can just take
another code path. There's your 0.1.

------
frakkingcylons
Great idea. Putting the static site on Rackspace Cloud Files would also be
advisable as an alternative to S3 in the event of an AWS outage.

EDIT: It also takes like no effort to turn on the Akamai CDN option for Cloud
Files.

------
peterwwillis
Disaster Recovery. It's called a Disaster Recovery Site.

[https://www.google.com/search?q=disaster+recovery+site](https://www.google.com/search?q=disaster+recovery+site)

------
reeses
Your origin is extremely slow. Perhaps this is an artifact of the HN rush, but
it's slow enough that I would be looking for ways to improve home page
response time.

Ideally, under normal conditions, your 'active' landing pages should be as
fast as your static maintenance page.

------
gregd
Why not just do an origin-pull via CloudFront? No need to build a static site
on S3.

~~~
reeses
You would need to make sure you still build the 'static site generator' on
your current site (so login, search, and any other functionality dependent on
your app is not exposed).

This is relatively easy, and could possibly even just use CSS with the
understanding that yes, someone could have a bad experience.

------
mrweasel
What is the point of involving S3? Why not just run the emergency site
directly on the Apache installation doing the rewrite? Unless your traffic is
absolutely massive there shouldn't really be any need for the S3 step.

------
dustingetz
I would pay for someone to take care of this for me. I presently run a GH
Pages static blog but would like complete control of the build. I want to
upload a .zip somewhere and have things just work.

~~~
bobfunk
Mentioned it further down in the thread, but working on a really easy way
([http://www.bitballoon.com](http://www.bitballoon.com)) to get a static site
online and backed by a CDN (and we do support uploading a .zip).

What kind of control do you feel you lack when using GH Pages?

~~~
dustingetz
I want to use clojurescript on the page, (which has a compilation step), and I
don't want to track the build output in source control.

------
gmu3
I remember last year when Kony 2012 blew up for a couple days, they switched
their site to static s3 pages to handle the traffic and collect donations. I
thought it was pretty clever.

------
davidgerard
+1. The question to ask yourself is: "Is Amazon's uptime better than mine?" If
the answer is "yes", use them.

Route53 is also an excellent DNS service.

------
lttlrck
Your primary site has mobile formatting issue (IOS). The maintenance site does
not ;)

------
kmfrk
As someone said in the comments, just keep in mind that S3 is just for
_storage_, not serving. You'll need something like CloudFront for that,
although I don't know at what level of activity it's going to save you money
to use it. Maybe from the get-go?

~~~
cenhyperion
That's completely false. Hosting a static website on S3 is very well
documented.

~~~
philfreo
Yes, but the recommended approach is to put CloudFront in front of it if you
care about performance at all.

~~~
res0nat0r
I think the point of their post is that this is their "oh shit we are down"
setup, which will consist of a small static site that S3 will happily serve up
to anyone anywhere in the world very quickly. This isn't meant to be a
full-whack, highly performant copy of their existing website fronted by CDNs.

------
angersock
Very good point--a static site up now can be waaaay better than a dynamic one
that is slow and user-raging.

That said, looking through their linked startups is kind of depressing; the
firstest of the first-world problems.

EDIT: Okay, okay, not all of them, as my salesbro points out.

------
diminoten
> As far as I'm concerned, S3 static file website serving is completely
> indestructible. I only need one bland Apache server to bump requests over to
> it.

Is this generally the experience of everyone here?

~~~
ceejayoz
Nothing you can do is going to take S3 as a whole down, and it doesn't appear
to have EBS dependencies so it generally dodges the bullet when AWS has
troubles. I'm not aware of any significant S3 outage in the last five years.

~~~
vidarh
One would hope not... With the API they offer it _should_ be the type of
service that is reasonably easy to make pretty much indestructible.

------
Pxtl
I'm surprised this forum isn't full of the usual "omg, use nginx instead of
apache for rerouting".

~~~
anderspetersson
If the only thing you're doing is rerouting I'm sure Apache can handle pretty
sick traffic as well.

