

Ask HN: What is your startup's backup policy? - vaksel

How often do you back up?

Is it automated or manual?

Do you run your own script, or use some open-source or paid solution?

Do you encrypt your backup data?

Which files do you back up? DB? Source code? Images?

How many and which destinations do you back up to?
======
patio11
I rely partially on backups and partially on wide distribution of key data.

There are at least 5 accessible copies of my source code at any moment, for
example: my laptop, my server (source control), my server's backup from
yesterday, my server's backup from a week ago, and my per-release In Case Shit
Happens tarball chilling on a Google server farm somewhere.

Which means if my apartment burned down on the same day that Slicehost's colo
facility got hit by a meteor, I would still have a copy left.

Customer records? Important but not as important as source code. I have five
redundant copies of every transaction: one at the payment processor, one at
e-junkie.com, and 3 in my database and its backups.

Email? Much less important to me. I pay Google to worry about it and assume
they will be competent.

Analytics data? Nice to have, wouldn't cry too much if I lost it all tomorrow.
Mine gets backed up as a side effect of backing up the other stuff. Google
Analytics and Clicky also keep historical data for me, and I rely on them to
make sure it doesn't vanish.

None of this is encrypted (except to the extent my service providers do it --
Paypal certainly does, for example). I don't store customer billing
information, so the most sensitive data I have is a list of names. (Source
code? Pfft. Very little of my business value is in the source code.)

~~~
jwilliams
I'm just curious: you say that source code is the most important thing to
back up, but at the end you go on to say that there is very little business
value in it. What's the distinction you're making here?

~~~
parenthesis
I think he means that he doesn't worry about encrypting third-party-hosted
backups of his source code, because the code alone wouldn't be of much value
to _someone else_, whilst it is very valuable to himself.

~~~
patio11
Got it in one.

------
gcv
For the S3 users here, if you don't mind sharing: how do you actually do it?
Do you use duplicity, Jungledisk, s3rsync.com, the s3sync utilities, tarsnap,
something homegrown? Do your backups require incremental updates, snapshots,
encryption?

~~~
aschobel
We have a trivial implementation that does a full backup and prepends a
date/time stamp to the backed-up filename.

We call it from crontab to run once a day with the following:

java -jar s3backup.jar /etc/iptables-save /etc/lighttpd/lighttpd.conf ...

You just list out which files you want backed up, and then they show up in
your bucket as: yyyyMMddHHmmssZ-filename

Code is here: <http://3banana.com/pics/S3Backup.java>

You need Amazon's Java S3 library, which you can snag from here:
[http://developer.amazonwebservices.com/connect/entry.jspa?ex...](http://developer.amazonwebservices.com/connect/entry.jspa?externalID=132)

If you want to toss it into a JAR and simplify startup, use the following
META-INF/MANIFEST.MF:

      Manifest-Version: 1.0
      Created-By: IntelliJ IDEA
      Main-Class: com.threebanana.S3Backup
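
The linked code presumably builds the key along these lines; a minimal sketch
in plain Java (standard library only; the class name is made up, and the
actual S3 upload is omitted):

    import java.io.File;
    import java.text.SimpleDateFormat;
    import java.util.Date;
    
    // Sketch: build the yyyyMMddHHmmssZ-filename bucket key described above.
    public class KeyNaming {
        public static String keyFor(File file) {
            // "Z" is the RFC 822 zone offset, e.g. 20090321143000-0700-foo.conf
            String stamp = new SimpleDateFormat("yyyyMMddHHmmssZ").format(new Date());
            return stamp + "-" + file.getName();
        }
    
        public static void main(String[] args) {
            for (String path : args) {
                System.out.println(keyFor(new File(path)));
            }
        }
    }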

~~~
silvestrov
You forgot to close the BufferedInputStream bis = new BufferedInputStream(new
FileInputStream(file)); so you'll run out of file descriptors if backing up
too many files. And bis.read(jpgByte); doesn't necessarily read the whole
file; at least some Java versions on some Solaris versions don't.
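
For reference, a sketch of the usual fix: loop until the buffer is full and
close the stream in a finally block (this borrows the variable idea from the
bug report; it's not the author's actual patch):

    import java.io.*;
    
    // Sketch: read an entire file into memory, handling the two bugs noted
    // above -- short reads from read(), and the never-closed stream. Assumes
    // the file fits in an int-sized buffer, as in the original.
    class ReadFully {
        static byte[] readFully(File file) throws IOException {
            byte[] buf = new byte[(int) file.length()];
            BufferedInputStream bis =
                new BufferedInputStream(new FileInputStream(file));
            try {
                int off = 0;
                while (off < buf.length) {
                    int n = bis.read(buf, off, buf.length - off);
                    if (n < 0) throw new EOFException("unexpected end of " + file);
                    off += n;
                }
            } finally {
                bis.close();  // release the file descriptor even on failure
            }
            return buf;
        }
    }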

~~~
aschobel
I'm so incredibly sorry about that. The contract for InputStream.read is very
clear; that was needlessly sloppy.

New version is up. Also includes the unit test that should have been there in
the first place. _grumble_

<http://3banana.com/pics/S3BackupTest.java>

------
aschobel
For customer data, we do daily automated encrypted backups to S3.

We also do more frequent backups to an internal server during the day.

Our code is in SVN, which is also backed up to S3.

We have scripts that take a vanilla Debian etch build and prepare it for
production. That way we don't have to worry about backing up the OS.

------
andrewljohnson
Our policy is to outsource IT to WebFaction. They RAID our data and back it
up daily. Thanks WebFaction!

If you are a start-up that doesn't have serious sysadmin talent on board, the
only logical call is to outsource every machine possible. Even if you are
ludicrously good hackers, you need a true Linux monkey, one who isn't needed
elsewhere coding, before you should consider not outsourcing web and email
hosting. And you should back up anything important on your personal computers
to a managed box.

I really can't say how deeply I believe this. I've worked with multiple start-
ups, and the minute you start trying to host your own website, do your own
backups, and serve your own email, you're asking for trouble. Even if you set
up everything right, just the time spent dealing with the computers could be
better spent changing the world.

Whatever you do, don't host actual email boxes in house. A telephone pole will
get hit by a truck, and you will lose email for two days, and no amount of
tech smarts will get you out of that. Trust me, it happens.

You can get a slice of a computer for $10 a month these days from WebFaction,
which comes with tech support that extends from installing PostGIS to an
optimized Postgres install to an optimized Apache static-content server, with
answers to tech support questions within the hour, even on weekends. As soon
as a machine you manage costs you an hour, your financial decision has become
a poor one.

Also, they have better security than you.

~~~
jd
I've seen how seriously some providers/hosting companies treat backups. Let's
just say that deleting backups to free up space for new accounts was
considered perfectly OK.

I completely agree it is something you -want- to outsource, because, exactly
as you say, it's not something you want to spend time on. But you depend on
your servers so heavily that you become completely dependent on your
provider.

What if your provider goes bankrupt? You'll have to move your server or move
the data, and nobody knows how it used to work. At that point you have very
few options left. What if a tech decides to upgrade the kernel and the server
doesn't boot anymore? Oops. In general, Linux techs aren't paid very well, are
often overworked, are expected to fix problems at any point in the night, and
don't particularly care about your server (no more than they care about the
100 servers next to it). Now add root access to the mix and watch what
happens.

If nobody in your startup team knows how to administer a server then you're
taking a risk. A big and unnecessary one.

ps: saying something like "an optimized Postgres install" makes no sense.
Optimization is for specific tasks. So unless the techs know exactly what kind
of queries you run, they can't do much optimization. They can, however, easily
cripple SQL performance.

------
cperciva
I do daily backups, automatically, using my own script which is a paid
solution; my data is encrypted (and signed), and I back up complete
filesystems minus easily re-downloadable stuff (e.g., the FreeBSD ports tree),
to one location (unless you mean geographic location, in which case tarsnap
counts as 2 locations).

------
tdavis
Server: iSCSI cross-country replication, snapshots (automated)

Desktop: Time Machine (hourly; automated)

All code is hosted at GitHub; I just sort of assume the EngineYard people know
how to back stuff up. Google for e-mail, though it is locally mirrored via
IMAP.

------
there
rsnapshot (<http://rsnapshot.org/>) against all of our openbsd servers over
the internet (via ssh) to an off-site machine. runs every few hours and keeps
6 hourly, 7 daily, 4 weekly, and 3 monthly backups. once a server's initial
full rsync is done, the incrementals finish very quickly even on servers with
lots of changing data (mail server messages, web server logs, etc.)
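
for reference, that retention schedule maps onto rsnapshot.conf roughly like
this (paths made up; fields must be tab-separated, and older rsnapshot
versions spell "retain" as "interval"). cron then runs "rsnapshot hourly",
"rsnapshot daily", and so on at the matching times:

    cmd_ssh        /usr/bin/ssh
    snapshot_root  /backup/snapshots/
    
    # how many of each snapshot level to keep
    retain  hourly   6
    retain  daily    7
    retain  weekly   4
    retain  monthly  3
    
    # one line per server, pulled over ssh
    backup  root@www.example.com:/  www/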

regular mysql dumps are taken from all databases on all servers every so often
in case the rsync'd mysql binary files won't restore.
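
the dumps themselves can be as simple as something like this (file naming made
up; --single-transaction gives a consistent InnoDB snapshot without locking
tables):

    mysqldump --all-databases --single-transaction --quick \
        | gzip > /backup/sql/all-databases-$(date +%Y%m%d%H%M).sql.gz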

------
shizcakes
Anecdotally, it seems like S3 is pretty popular here for backup purposes. Does
anyone care to enumerate any pros/cons they've experienced?

~~~
anotherjesse
Make sure the tool you use to post the data uses the Content-MD5 header!

Many of the first backup scripts and libraries didn't use this corruption
detection feature.
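
For the curious, the header value is the base64 of the raw (not hex) MD5
digest of the request body; S3 recomputes it server-side and rejects the PUT
on a mismatch. A sketch of computing it in Java (java.util.Base64 needs Java
8+; class and method names are made up):

    import java.io.*;
    import java.security.MessageDigest;
    import java.util.Base64;
    
    // Sketch: compute a Content-MD5 header value for a file.
    class ContentMd5 {
        static String contentMd5(File file) throws Exception {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            InputStream in = new FileInputStream(file);
            try {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    md5.update(buf, 0, n);
                }
            } finally {
                in.close();
            }
            return Base64.getEncoder().encodeToString(md5.digest());
        }
    }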

~~~
chadr
Which tool(s) do you recommend that do this?

------
truebosko
Source code is on a mix of GitHub and Subversion servers.

Customer data (basically, SQL data) is backed up daily, archived, and put up
encrypted on Amazon S3 with the date/time appended to it. The files are tiny
so we have no issue with keeping year-old ones there as the costs are
minuscule.

All other things, like documents, staff-related items, and pictures of
products, are stored on a single server that uses rdiff-backup to send them to
a second drive on the same PC. rdiff-backup is very nice; we've had a few
instances where we needed to fetch a file from 2 months ago that had been
heavily modified since, and the version history helps a lot.

------
jbyers
MySQL: replication across VPN to a server in a different state. A week's
worth of daily full backups (innobackup), compressed, encrypted, and
periodically sent to S3. As much binlog history as we have disk to keep.

MogileFS: real-time encrypted backup to S3 (in addition to multiple local
copies). This covers all of our blob-like data, keeps MySQL relatively small.

Systems: logs nightly synced to S3, otherwise no data to back up. One puppet
script away from reconstruction.

Email: local server rsync in addition to procmail failsafe copies.

Code: svn master encrypted and sent to S3 nightly.

We use boto to access Amazon services.

------
dcurtis
Daily + hourly + monthly backups to S3, my local machine, and a USB flash
drive that I keep in my wallet at all times:

<http://www.amazon.com/KingMax-Microsoft-Certified-Drive-Washer/dp/B001AMI91E>

------
InVerse
when i first read the title three meanings came to mind:

1. how do you back up data?

2. how do you back up 'progress' (eg, code commits, releases)?

3. what is your plan B (eg, this startup failed, what do i do next)?

maybe it's just me, but verbal communication is funny.

------
pclark
we keep local (on our web box), networked (NFS) and S3 backups of a) our
entire server build, b) our code base, c) our database dumps, d) our email &
configs.

we backup nightly (timed to the "quietest" period of the previous night's
activity)

on our local machine we keep: yesterday's backup, the day before yesterday's,
end-of-last-week, and start-of-month backups.

it's entirely automated, except for the bi-monthly backup check (I CANNOT
STRESS THIS ENOUGH - CHECK BACKUPS!) where we fully restore our data and code
on a spare box.

by code i mean live code - we also backup our git working code base.

------
bemmu
I do an automatic daily dump of all databases and an automatic rsync of the
dump and all code, git repository + other files to a disk in a different
location. Surely this could be improved upon, but I think it's a good start.
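
For what it's worth, the rsync leg of a setup like that can be a single cron
line (paths and host made up):

    # nightly: push dumps, code, and the git repo to the off-site disk over ssh
    rsync -az /var/backups/ backup@offsite.example.com:backups/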

------
bkbleikamp
Nightly backups to S3 and to our in-office dev machine. We also all have
copies of the code base that we are constantly pulling from Git, so the only
thing to really worry about is DBs.

------
hs
i use mercurial for everything (code, images, generated html/jpg, etc) ...
maybe i should put some in .hgignore especially the generated bits
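
a minimal .hgignore for the generated bits might look something like this
(patterns made up):

    syntax: glob
    generated/*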

so there are at least 2 copies (colo and desktop) ... and dvd

i do still tar on structural disruptions (jquery update, openbsd upgrade, data
structure change, etc)

that's per site, every morning, semi-automatic (i still prefer to ssh and then
manually hg update _shrug_)

------
zitterbewegung
I use Time Machine to back up data on my Mac OS X box. I haven't launched yet,
but when I do I'll create a mirror on S3 since I am going to use EC2.

------
jm3
for in-progress code and data, we use Dropbox for everything, with two
geographically segmented local caches. recovery takes only a few minutes.

------
gibsonf1
We backup all data to S3 every 30 minutes. We are a bit paranoid about data
loss as we host business critical data.

------
trickjarrett
I do automated backups of the dev environment nightly to an external hard
drive and to an off-site FTP location.

------
sreitshamer
I put everything in git repositories and push changes to a clone on my
Slicehost slice.

------
iheartrms
I back up to S3 using Bacula and an s3sync wrapper I wrote called
s3-backup.py

------
iheartrms
Oh, and for my purposes I do daily incrementals and monthly fulls.

------
charlesju
GitHub + Dropbox

------
viggity
two chicks at the same time

oh, oops, wrong question

------
ahoyhere
We backup a complete snapshot of our customer data every hour to S3, via a
Ruby cronjob.

We should be able to keep up this strategy because it's "just" a time tracking
app (<http://letsfreckle.com/>) with fairly lightweight data.

I was originally worried that once we got into production, this'd cause a
performance drop every time it ran, but so far it's all good.

As for our codebase, we're on GitHub and our own 4 laptops. But thinking about
it, we should have another system in place for that. _looks around nervously_

