

Ask HN: SaaSers, how do you backup your customer's data? (details please) - marcamillion

Everybody that runs a SaaS app, or some online web business, where the safety and security of your customer's data is paramount, how do you secure it and back it up?

I would love to know your main host (e.g. Heroku, Engine Yard, Rackspace, MediaTemple, etc.) and who you use for your backup.

Be as detailed as possible - e.g. a quick overview of your service and the data you store (images for instance), what happens with the images when the user uploads them (e.g. they go to your Linode VPS, and posted to the site for them to see - then they are automatically sent to AWS or wherever, then once a week they are backed up to tape by the managed hosting provider, and you also back them up to your house/office).

If you could also give some idea as to what the unit cost (per GB/per user/per month) of storage is - on average, I would really appreciate that.

Getting ready to launch my app, and I would love to get some more perspective on the nitty gritty details involved.

Thanks!
======
RyanGWU82
I handle operations for a relatively large online collaboration service. We
have nearly 200GB of relational data in MySQL databases, and about 20TB of
user-generated content.

We own our own server hardware, so we have a few file storage servers for the
user-generated content. These servers run MogileFS, which is configured to
replicate 3 copies of every file across the cluster. Since we have copies on 3
different servers, we don't use RAID on the file servers. Storing 3 copies
means that we can handle a drive failure, or take a server down for
maintenance, and still have 2 copies of every file.

Every hour, we perform offsite backups with a homegrown script. The script
looks for new files and makes a backup on an offsite server. This script keeps
its own database of exactly which files have already been backed up. (We could
just use the timestamp to determine which files to copy, but that wouldn't
allow us to "backfill" backups when needed. We could also just check the
remote server for the presence of each file, but with 150 million files in
MogileFS, that would take forever!)
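
To illustrate the idea of a backup script that keeps its own record of what has been copied, here is a minimal sketch. It is not the actual homegrown script: the SQLite manifest, the table name `backed_up`, and the function names are all assumptions made for the example, and it walks a plain directory tree rather than querying MogileFS.

```python
import shutil
import sqlite3
from pathlib import Path


def open_manifest(db_path):
    """Open (or create) the local database that records backed-up files."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS backed_up (path TEXT PRIMARY KEY)")
    return conn


def backup_new_files(source_dir, dest_dir, conn):
    """Copy every file not yet in the manifest; return the relative paths copied."""
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    copied = []
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(source_dir).as_posix()
        already = conn.execute(
            "SELECT 1 FROM backed_up WHERE path = ?", (rel,)
        ).fetchone()
        if already:
            continue  # the manifest says this file is already offsite
        target = dest_dir / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)
        # Record the file only after the copy succeeded.
        conn.execute("INSERT INTO backed_up (path) VALUES (?)", (rel,))
        conn.commit()
        copied.append(rel)
    return copied
```

With a manifest like this, "backfilling" amounts to deleting the relevant rows: the next run treats those files as not yet backed up and copies them again, without ever listing the remote server.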

Our remote backups are encrypted using a symmetric key. This is actually quite
elegant. We have a single master backup key. The private key is stored in a
safe deposit box, and the public key is available on all of our file servers.
Backups are encrypted using the master public key. This means they can all be
decrypted with the master private key, but even if our servers are all
compromised, the attacker would not be able to gain access to the encrypted
backups.

We use Backup Manager to handle our MySQL database backups and other
miscellaneous server backups. Backup Manager is awesome -- it handles
everything automatically, like encrypting the backups and sending them off-
site. Backup Manager performs incremental backups of specified directories on
the servers, and runs mysqldump for our databases.

You can even pipe other tools through Backup Manager -- for example, I'm
testing Percona XtraBackup with Backup Manager for faster MySQL backups.
(Technically, XtraBackup doesn't speed up the backups -- it speeds up
restores. Without XtraBackup, it was taking almost a week to restore our 40GB
MogileFS database. It helped to tune MySQL, but even then the restore
performance was not adequate. I've just started testing XtraBackup, which may
allow us to restore the database in hours, not days.)

I strongly recommend using Backup Manager -- it's ridiculously quick to set
up, and handles all the mechanics of establishing remote backup. I've even
started using it on two of my own personal servers, storing encrypted backups
on Amazon S3. Until your data set is very large, Backup Manager should be able
to handle everything for you, including server config, source code, and user
data. It's automatic, encrypted, and very simple to use.

~~~
andrewcooke
typo: "a symmetric" (an asymmetric?)

Would you consider using duplicity instead? Unfortunately the site seems to be
down (which might be one answer...) <http://duplicity.nongnu.org/>

~~~
RyanGWU82
Errr... we use public-key encryption. :)

I haven't used Duplicity myself so I can't really compare it to Backup
Manager. I've been super pleased with Backup Manager.

~~~
andrewcooke
public key encryption is called "asymmetric". that's because you have two keys
(one public and one private).

symmetric key is when you have a single key (and have to keep it secret).

------
radu_floricica
A small shop, with a few custom web apps:

\- We have several virtual servers for hosting. Most of them from the same
hosting provider (close geographically), but also a small Slicehost server
used mostly for backup and DNS.

\- A cron script does mysqldump for each server, and targzips the applications
themselves, including uploaded content (daily, at about 4 in the morning)

\- An hour later another script makes copies of the backups to at least
another server. In one instance, it encrypts the mysql dump before copying it.

\- There's backup history for the sql dumps, but not for the application
archives (too big). Backups older than 7 days are deleted.

\- A script on my desktop computer (run on restart) downloads the latest
backups, and also synchronizes locally the stuff too big to be copied (WinSCP)

\- There's also at least one form of more "immediate" backup. In the
application logic, most updates are stored in the database. More than once
this has been useful, usually for finding and repairing user mistakes.
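
As a rough sketch of the archive and retention steps in the list above (not the actual cron scripts): the function names, the `*.sql` pattern, and the `now` parameter are assumptions for the example, and the mysqldump call and the copy to another server are left out.

```python
import tarfile
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def archive_app(app_dir, archive_path):
    """Write a .tar.gz of the application directory, overwriting any previous one.

    Overwriting mirrors the "no history for application archives" choice.
    """
    archive_path = Path(archive_path)
    archive_path.parent.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(app_dir, arcname=Path(app_dir).name)
    return archive_path


def prune_old_backups(backup_dir, pattern="*.sql", max_age_days=7, now=None):
    """Delete matching files last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    removed = []
    for path in sorted(Path(backup_dir).glob(pattern)):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return removed
```

Run from cron after the dump step, `prune_old_backups` would keep roughly a week of SQL dumps, while `archive_app` always leaves just the most recent application archive.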

------
s3graham
Main site is Linode.

MongoDB slaved to offsite location (my basement right now!), and rsync other
misc filesystem-based data daily.

Weekly dump to S3 in case of crazy double failure.

I might just drop that and switch to the Linode backups thingy, but I was a
bit nervous about what a restore would actually look like in that case.

(edit, obviously I'm still rinky-dinky so don't put too much stock in my
answer :)

