
Designing AWS Architecture to Withstand Outages - danoprey
http://madeiracloud.com/blog/introduction-to-high-availability-deployment-on-aws
======
TomFrost
We've seen floods of these "What you should have done if your application was
interrupted Friday night" articles, but a common theme is to leave out the
fact that many of these sites require synchronized data storage, which Amazon
doesn't support between regions. If your organization is shelling out the
money for something like Riak, Oracle, or the mess necessary to support such
an architecture through the likes of MySQL or Postgres, this is all well and
good (save for the data transfer overhead costs).

If your application is architected to use SimpleDB, DynamoDB, SQS, or even
RDS, these simple "You should have been using Route 53 and multiple regions"
articles get increasingly frustrating. Most applications simply aren't
elementary enough to fit into that boilerplate structure, and getting around
that fact either requires switching away from Amazon's managed database
services (lots of money) or writing synchronizing scripts that play nicely
with your stack and launching them on additional servers (lots of money _and_
lots of resources).
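
For reference, the DNS half of that boilerplate really is as simple as the articles claim: a PRIMARY/SECONDARY failover record pair in Route 53. The sketch below just builds the change batch as plain dicts; the zone, domain, IPs, and health-check ID are hypothetical, and applying it for real would go through boto's `change_resource_record_sets` call with AWS credentials. Note that none of this touches the actual hard part, keeping the data stores in the two regions in sync.

```python
# Sketch of a Route 53 failover record pair (hypothetical names/IPs).
# This only builds the request payload; it makes no AWS calls.

def failover_record(name, ip, role, health_check_id=None):
    """Build one Route 53 failover resource record set as a plain dict."""
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": f"{name}-{role.lower()}",
        "Failover": role,            # "PRIMARY" or "SECONDARY"
        "TTL": 60,                   # short TTL so clients notice failover quickly
        "ResourceRecords": [{"Value": ip}],
    }
    if health_check_id:
        # The PRIMARY record needs a health check so Route 53 knows when to fail over.
        record["HealthCheckId"] = health_check_id
    return record

change_batch = {
    "Changes": [
        {"Action": "UPSERT",
         "ResourceRecordSet": failover_record(
             "app.example.com", "203.0.113.10", "PRIMARY", "hc-us-east-1")},
        {"Action": "UPSERT",
         "ResourceRecordSet": failover_record(
             "app.example.com", "198.51.100.20", "SECONDARY")},
    ]
}
```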

While ideally I'd like to see Amazon release some sort of feature for multi-
region sync, it would be interesting to see how the tried-and-true multi-
region businesses have approached this problem.

~~~
danoprey
Completely true; there's a lot to be written on the subject, and there are
some great detailed posts out there.

I think Google Compute Engine does exactly that, with a direct connection
between regions so the entire system works as one network. I'm not sure if
that will be the case outside the US, though.

~~~
rdl
I don't know if I'd trust Amazon to run links between their data centers. They
had a couple of routing-related outages this year alone (some lasting quite a
while), and adding complexity would only make that worse. You could easily lose
the interconnections between hosts in AZs in 3 distinct regions, partitioning
everything and making it irrelevant that all 3 actually stay up and are
accessible to outside users.

------
rytis
The only problem I have with this is that for a really simple system
consisting of 3 components (web+app+db) now I need to deploy (and pay for!!)
24 components.

Yes I know, business continuity and stuff. Still, just doesn't feel right
somehow.

~~~
jasonkester
Most businesses can make the conscious decision to go with the simple
3-component option and accept that they might only get three or four nines of
uptime, depending on how flaky AWS decides to be that year.

If you run a service where the worst case scenario for your site being down
for an hour on a Thursday afternoon is "millions of dollars are lost" or "our
high profile customers go out of business in a way that's directly traceable
back to us" then yes, you need all 24 of those things.

If, on the other hand, the worst case scenario for your site being down for an
hour on a Thursday afternoon is "some of our customers have to _manually_ post
to Facebook so that their friends know that they've been for a run", you can
probably shave about 21 nodes off that diagram.
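
The nines arithmetic here is easy to check, assuming (optimistically) that failures in redundant copies are independent, which real outages with shared control planes often aren't:

```python
# Back-of-envelope availability math for the "three or four nines" claim.
# Assumes independent failures; correlated outages make the real numbers worse.

def combined_availability(per_node: float, redundant_copies: int) -> float:
    """Availability of N redundant copies: the service is up unless ALL are down."""
    return 1 - (1 - per_node) ** redundant_copies

single = 0.999                               # one region at three nines
dual = combined_availability(single, 2)      # active/passive across two regions

print(f"single region: {single:.4%} uptime, "
      f"~{(1 - single) * 365 * 24:.1f} h downtime/yr")
print(f"two regions:   {dual:.6%} uptime, "
      f"~{(1 - dual) * 365 * 24 * 60:.1f} min downtime/yr")
```

Roughly nine hours a year versus half a minute, which is exactly the trade being priced: whether that difference is worth the extra 21 components.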

~~~
danoprey
Well said! Often the right thing to do in an outage is "wait it out".

------
benjaminwootton
Off topic but their product, madeiracloud.com, looks very slick.

Does anyone have any experience with it?

~~~
danoprey
Have a try, no credit card or AWS credentials required ;-)

We're still in early days but we did a Show HN a little while ago:
<http://news.ycombinator.com/item?id=3808031>

------
Loic
Interesting, but not at all new. A really interesting article on the subject
would be how to do it while having nearly consistent data across the data
centers. In the case of the proposed approach, you have simply no
communication between the DCs.

~~~
danoprey
Hi Loic. I agree; it was only meant to serve as an introduction to the subject
for people who are not familiar with it, which seems to be way too many people.

