

Ask HN: Help me fix your backups - drewcrawford
http://spreadsheets.google.com/viewform?formkey=dFpPNl9FYTFlQVFCQXowWVdhRHF2TUE6MQ&ifq

======
cperciva

      How is your data backed up?
      ...
      [X] I use Tarsnap
      ...
    

Where's the "I _am_ Tarsnap" option?

Somewhat related: Are the responses to this going to be made public somewhere?
I'd love to see what they look like (for obvious reasons) and was thinking
just last week that maybe I should post a survey to HN and Reddit.

~~~
drewcrawford
Lol. I knew it was you as soon as I saw "I use tarsnap" as the only one
checked.

I _love_ Tarsnap. I use you for a lot of very-critical-yet-small files.

My personal pain point is that I'd like to back up a few TB of data but can't
afford to--it would be nice to back up an entire workstation. I'm interested
in seeing if that's a sentiment that others share, or if it's just me.

~~~
cperciva
_My personal pain point is that I'd like to back up a few TB of data but can't
afford to--it would be nice to back up an entire workstation._

Why do you have a few TB of data on a workstation? What sort of work are you
doing?

~~~
drewcrawford
Right now about a quarter of respondents have 1TB or more, so while I'm not in
the majority, it seems I'm in good company. Another 47% of respondents are in
the 100GB-1TB bracket as of right now.

Here's the breakdown of my data (largish categories only)

==BUSINESS CRITICAL==

* 5GB of codebases I've committed to in the last 3 years.

* 50GB worth of software. Mostly development tools. Five different versions of Xcode; I actually use most of them every day (scary, I know…)

* 50GB of VM images I use for development. I could rebuild them, but I really don't want to.

* 80GB on my boot volume. Stuff I would back up if and only if I wanted a live, bootable backup. It would save me a lot of time during a restore.

* 5GB in ebooks; primarily used for reference / searching

* 20GB in music

* 20GB in scanned documents, mostly financial and legal paperwork, contracts, etc.

* 15GB of e-mail

==OTHER THINGS AVAILABLE NOWHERE ELSE==

* 250GB of samples/patches/instruments that I've built over the years and songs I've written (I play keyboards for fun). Not business critical, but I'd be sad if I lost it.

* 20GB of backups from computers I used to own growing up. The nostalgia of that game I was coding in middle-school, etc.

* 100GB in learning resources I've collected over the years that are out of print or otherwise unavailable anywhere else. Not useful to me (anymore), but sharing them with others is valuable to both me and them.

* 20GB of home video

* 13GB in personal photos

==THINGS I DON'T WANT TO DOWNLOAD AGAIN==

* 375GB of Apple Development videos. I refer to these fairly regularly. I typically watch a few each day to keep up-to-date.

* 20GB of Steam games

Looking at this list, sure, I could live without some of it. But a lot of it
is pretty important, and it would be nice to back up most of it.

~~~
aw3c2
Back up the whole thing on hard disks and store one at your bank (if they
offer that capability where you live) and another one at some relative's place.
Encrypt and seal, of course.

Update that with "slow" data every month or couple of months. Decide what
needs to be backed up daily/weekly and pay some money to do that online. How
much would it be worth to you if you lost it in a second?

------
pwim
As a software development company, all of our code is in a git repository that
has a remote on a VPS. Our documents are primarily Google docs. Important docs
are checked into git.

Everything important that we have locally also exists at some remote location.

~~~
drewcrawford
Thanks, that is great feedback. Maybe so much is web-based now that backups
aren't really relevant.

Is there any pain associated with restoring your dev environment? How long
would it take you to get your text editor reinstalled, color scheme set back
up, system preferences back, etc.?

~~~
dlsspy
Almost everything I care about is managed via git. That includes editor and
shell configs and ~/bin -- restoring my own dev environment is so easy, I just
randomly do it on different computers.
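
The bootstrap described above can be sketched as a short script. This is a hypothetical illustration, not dlsspy's actual setup: it assumes a cloned dotfiles repo whose top-level files (`vimrc`, `bashrc`, etc.) get symlinked into the home directory as their dotted counterparts.

```python
#!/usr/bin/env python3
"""Sketch of a dotfiles bootstrap: symlink every top-level file in a
git-managed dotfiles checkout into the home directory as a dotfile.
The repo layout (plain names like 'vimrc') is an assumption."""
from pathlib import Path


def link_dotfiles(repo_dir: Path, home_dir: Path) -> list[Path]:
    """Symlink each regular file in repo_dir into home_dir as a dotfile.
    Returns the list of links created; existing links are replaced."""
    created = []
    for src in sorted(repo_dir.iterdir()):
        # Skip .git (and any other hidden entries) and subdirectories.
        if src.name.startswith(".") or not src.is_file():
            continue
        dest = home_dir / ("." + src.name)  # e.g. vimrc -> ~/.vimrc
        if dest.is_symlink() or dest.exists():
            dest.unlink()
        dest.symlink_to(src.resolve())
        created.append(dest)
    return created


if __name__ == "__main__":
    # Typical use after something like: git clone <your-remote>/dotfiles ~/dotfiles
    repo = Path.home() / "dotfiles"  # hypothetical checkout location
    if repo.is_dir():
        for link in link_dotfiles(repo, Path.home()):
            print("linked", link)
```

Since the repo is the single source of truth, "restoring" a dev environment on a new machine is just a clone plus this link step, which is what makes doing it "randomly on different computers" cheap.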

Every time I make a change worth saving and want to share it, I push it to
gerrit, which means there are now two copies (one on my laptop and one "in the
cloud" on my gerrit server).

Every time a change is reviewed, verified, and blessed as part of our code
base, the reviewer or verifier pushes a button to submit the change, it's
automatically sent up to github, github fires webhooks that automatically
replicate the data down to a machine in the office, another copy on our build
master (also in the office), and then, shortly after, another copy on every
build slave.

Internally, files are stored via NFS or SMB on a Solaris box that takes
snapshots every 15 minutes. Those snapshots are not stored externally.

The biggest part of our data, though, is made up of VM images: big files that
change a lot. Restoring them hurts, but anything that loads up I/O on that box
makes VMware think the NFS server isn't responding, so it unmounts and
remounts it (it even does this when nothing's using it -- though the NFS
server itself works fine for other clients during this time).

I wouldn't mind a backup procedure there, but it'd have to not break that.

(sorry to not use your form... the further down I got, the less it applied to
me)

------
larsberg
"The biggest problem with backups is"

A missing option is the impact on the computer while I'm using it. All of the
transparent services (especially Time Machine) that I have experience with are
abysmally bad at turning themselves off when the machine is being actively
used.

~~~
altano
That's my favorite aspect of Windows Home Server's backup. It runs once a
day, while you're asleep. It'll even wake up your machine, back it up, and
then put it back to sleep. And it's smart enough to only do this if your
laptop is plugged into AC and not running on battery. And if any machine hasn't
been backed up in a while, you get a Health warning in your system tray. And
this is all pretty configurable.

How's that for low impact?

~~~
larsberg
That's awesome! I'm very jealous.

------
Skyline
Are you going to post the survey results on HN?

I bet a lot of people would find them interesting.

------
apowell
I got about halfway through the survey before realizing it was about
workstation backup, not server backup.

Time Machine + Dropbox makes workstation backup a solved problem.

Slicehost Backup + Tarsnap makes server backup _almost_ a solved problem. And
I say almost because I don't like having no choice but to store my full
bootable backups with the same provider that hosts my primary server. If I
could store bootable weekly images off-site, I'd be golden.

------
jhg
> * Required

> Looks like you have a question or two that still needs to be filled out.

If I want to skip a question, I should be able to do so.

~~~
lincolnq
Agreed. If you give me a survey with every question marked "required" and I
find anything I don't want to answer, you're out of luck on the whole survey,
because I will just click back.

------
pierrefar
One thing to point out: backups in the cloud are useless if you don't have
internet access when you need to get at the data or restore it.

If you want to fix my backups, make the data readily accessible when there
isn't a net connection between me and the service.

One time this usually bites is right after you (re-)install an OS and need
the network driver, which is only available online. I always keep a backup of
the drivers on my external hard disk for this situation.

Other services I worry about are Gmail and the rest of the Google apps. With
email, at least, you can easily download the several GB onto your hard disk,
keeping a backup for when you or the service are offline.

Dropbox solves this nicely: it's a local folder that's synched to the cloud.
If Dropbox is down or I'm offline, I still have access to my data.

------
lovskogen
Dropbox solved everything backup related for me, and I love it.

~~~
sreitshamer
On the Mac, Dropbox doesn't do so well with restoring metadata:
[http://www.haystacksoftware.com/blog/2010/06/the-importance-...](http://www.haystacksoftware.com/blog/2010/06/the-importance-of-metadata-on-the-mac/)
[http://www.haystacksoftware.com/arq/dropbox-backup-bouncer-t...](http://www.haystacksoftware.com/arq/dropbox-backup-bouncer-test.txt)

(Disclaimer -- I write backup software: <http://haystacksoftware.com/arq/>)

~~~
huhtenberg
Arq looks nice. I wasn't able to digest the (S3) pricing table, but all in all
it leaves a favourable impression of a well-designed and well thought out
program.

~~~
sreitshamer
Thanks!

S3 pricing is confusing. It boils down to this: $.10/GB per month for storage,
plus an insignificant amount in transaction costs.

After Nov 1, data transfer to S3 will be $.10 per GB.

Restores (data transfer from S3) are $.10 per GB, with the first GB of data
transfer from S3 each month free.
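
Plugging those rates into a back-of-the-envelope calculator makes the costs concrete. This is just arithmetic on the per-GB numbers quoted above (ignoring the small transaction fees); the 100 GB example size is arbitrary:

```python
# Back-of-the-envelope S3 backup cost, using the per-GB rates quoted above.
STORAGE_PER_GB = 0.10   # $/GB-month for storage
UPLOAD_PER_GB = 0.10    # $/GB transferred in (the post-Nov-1 rate)
RESTORE_PER_GB = 0.10   # $/GB transferred out
FREE_RESTORE_GB = 1.0   # first GB of transfer out per month is free


def monthly_storage_cost(gb: float) -> float:
    """Recurring cost to keep `gb` gigabytes stored for one month."""
    return gb * STORAGE_PER_GB


def one_time_upload_cost(gb: float) -> float:
    """One-time cost to transfer `gb` gigabytes into S3."""
    return gb * UPLOAD_PER_GB


def restore_cost(gb: float) -> float:
    """Cost to transfer `gb` gigabytes back out, minus the free first GB."""
    return max(gb - FREE_RESTORE_GB, 0.0) * RESTORE_PER_GB


if __name__ == "__main__":
    size = 100.0  # e.g. a 100 GB workstation backup
    print(f"store {size:.0f} GB:  ${monthly_storage_cost(size):.2f}/month")
    print(f"initial upload: ${one_time_upload_cost(size):.2f}")
    print(f"full restore:   ${restore_cost(size):.2f}")
```

At these rates, the few-TB workstation mentioned upthread would cost on the order of $100 per TB per month in storage alone, which puts the affordability complaint earlier in the thread in perspective.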

------
dcreemer
Since I've not seen it mentioned yet: I use CrashPlan to back up all of my
personal systems, both to a home server and to a remote peer. It's a bit RAM
heavy, but other than that it nicely stays out of the way & does its job. The
killer feature is trivial peer-to-peer connectivity, which makes off-site
backups easy. I'm not affiliated with the company -- just a happy customer.

~~~
huhtenberg
What is this "trivial peer-to-peer connectivity" exactly?

