

 When is the right time to introduce high availability for web site? - dennisgorelik
http://stackoverflow.com/questions/6338610/when-is-the-right-time-to-introduce-high-availability-for-web-site

======
DenisM
There are two questions here - data integrity and service availability.

To the first question - daily backups are too infrequent; consider incremental
SQL (transaction log) backups throughout the day, say every 15 minutes.
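
As a rough illustration of why the interval matters, here is a minimal Python
sketch comparing the worst-case data-loss window for the two schedules. The
function names are made up for this example, and it assumes every backup
succeeds and can be restored.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    """Worst case: the failure hits just before the next backup would run,
    so everything written since the previous backup is lost."""
    return backup_interval


def backups_per_day(backup_interval: timedelta) -> int:
    """How many backups a given interval produces in 24 hours."""
    return int(timedelta(days=1) / backup_interval)


daily = timedelta(days=1)
every_15_min = timedelta(minutes=15)

for label, interval in [("daily", daily), ("every 15 min", every_15_min)]:
    print(label, "| worst-case loss:", worst_case_data_loss(interval),
          "| backups per day:", backups_per_day(interval))
```

Going from daily to 15-minute backups shrinks the worst-case loss window by a
factor of 96, at the price of 96 backup runs a day instead of one.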

To the second question - 5 minutes of downtime is likely a lot less important
for your business than you think. But your hardware will fail at some point,
and that will cause several days of downtime while you order a new machine.
That is probably not acceptable, so you will need a standby machine.

Personally I run all my SQL Servers on AWS EC2, so if one of my servers fails
I can simply reattach the EC2 EBS volume to another machine and I am back in
business a few minutes later. I also do EBS disk snapshots every 15 minutes.
Even if an entire availability zone fails, I can restore the EBS disk from a
snapshot and get up and running in about an hour.

The setup you are describing will work as well; it just seems very
capital-intensive with all the hardware and software licenses. And you're
still prone to geographic failure (power failure, flood, tornado, fire, theft).

~~~
dennisgorelik
DenisM - how much did that AWS EC2 solution cost you, in terms of time spent
on setup and testing, and in hosting fees?

~~~
DenisM
Setup is a breeze. Launching a new instance takes 10-15 minutes, and from
there it's, well, the same old story of installing recent OS updates and your
own stuff.

I split this into two parts - "provisioning", that is installing generic stuff
(IIS, SQL Server, Python for scripts, etc.), and "setup", that is installing
my own databases, scripts, and ASP.NET apps.

So I created an instance, did the "provisioning" on it, and then created a
"pre-canned provisioned image". Now I can create as many EC2 instances from it
as I need, and the only thing left is deploying my own stuff - that takes
one click and however many minutes it takes to copy the data. Actually, the
data does not always have to be copied - if you are tearing down an old
machine you simply reattach the EBS volume, and it takes zero seconds. If you
are creating a copy of an older production machine, you can create an EBS
volume from a production snapshot; that volume is available immediately and
is populated from the snapshot in the background, prioritizing the parts of
the volume you're trying to access. Hence the volume is immediately online,
but is very slow at first.

Re cost, please have a look at this blog post of mine, in particular the TCO
columns of the spreadsheet:

http://blog.altudov.com/2010/11/03/amazon-ec2-reserved-instance-cost-breakdown/

------
hrasm
At 8k USD a month, each day is approx. 267 dollars in revenue on average. So
one day of downtime due to a catastrophic event costs at least that much. That
is the base cost. However, how many premium members are going to be angry if
the site goes down for a day? What about support costs for replying to angry
emails? Losses from member cancellations (I would consider those
permanent... they are not coming back)?

IMHO, a standby server is worth it.
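
To make that arithmetic concrete, here is a rough Python sketch of the estimate.
Only the 8k USD monthly revenue comes from the question; the 30-day month, the
function names, and the churn and support figures in the second call are
hypothetical placeholders to replace with your own numbers.

```python
def daily_revenue(monthly_revenue: float, days_in_month: int = 30) -> float:
    """Average revenue per day, assuming a 30-day month by default."""
    return monthly_revenue / days_in_month


def downtime_cost(monthly_revenue: float,
                  days_down: float,
                  members_lost: int = 0,
                  monthly_fee: float = 0.0,
                  months_of_lost_fees: int = 0,
                  support_cost: float = 0.0) -> float:
    """Direct lost revenue plus (hypothetical) churn and support costs."""
    direct = daily_revenue(monthly_revenue) * days_down
    churn = members_lost * monthly_fee * months_of_lost_fees
    return direct + churn + support_cost


# Base cost only: one day down at 8k USD a month.
base = downtime_cost(8000, days_down=1)
print(round(base))  # 267

# Same outage with made-up churn and support numbers added on top.
with_churn = downtime_cost(8000, days_down=1, members_lost=10,
                           monthly_fee=20.0, months_of_lost_fees=12,
                           support_cost=100.0)
print(round(with_churn))
```

Compare the result against the yearly cost of the standby machine: if churn is
anywhere near the placeholder figures, the indirect losses dwarf the base cost.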

~~~
dennisgorelik
1) Are you suggesting a standby server instead of a SQL Server cluster?

2) Another alternative could be RAID 1 or RAID 10. Catastrophic failure of two
drives at once is extremely unlikely. There would still be occasional
maintenance downtime, but it looks like maintaining a RAID solution would be
easier than maintaining a standby server.

~~~
hrasm
My apologies. I misread the whole thing as a question about fault tolerance
(which is not a bad thing to consider at all IMO). RAID will certainly help in
your case. You need to come up with an estimate of how much your losses will
be if you have a single point of failure. Usually, paid-for services have this
factored into their pricing model.

But as another comment suggests, unless you are grumpy about Azure ToS or US
laws (maybe Azure services exist outside the US) or whatever, you should take
a look at Azure services.

~~~
dennisgorelik
Would Windows Azure allow me to avoid downtime during Windows upgrades?

------
vyrotek
Have you considered moving to Windows Azure? You might be able to get high
availability for less than you're paying now.

My company runs on .NET and SQL Azure with multiple web and worker roles. It's
worked great for us so far.

~~~
darickard
I second this. Spend your time focused on your app, not your hardware.

------
patrickgzill
I wouldn't bother setting up all the fancy stuff. Get a second machine,
install Hyper-V on both and set up live migration.

Install your web app on a Windows Server 2008 VM on serverA; move the VM to
serverB, reboot serverA, then move the VM back. For applying patches to the VM
image itself, I figure it should reboot in what, about 60 seconds or less?

You can get fancier, with separate dev, staging, and production VMs, etc.,
adding a web load balancer and whatnot later. Then when you want, you can go
all-VM should you ever tire of the hardware maintenance.

