How is your data backed up?
[X] I use Tarsnap
Somewhat related: Are the responses to this going to be made public somewhere? I'd love to see what they look like (for obvious reasons) and was thinking just last week that maybe I should post a survey to HN and Reddit.
I love Tarsnap. I use you for a lot of very-critical-yet-small files.
My personal pain point is that I'd like to back up a few TB of data but can't afford to--it would be nice to back up an entire workstation. I'm interested in seeing if that's a sentiment that others share, or if it's just me.
Why do you have a few TB of data on a workstation? What sort of work are you doing?
Here's the breakdown of my data (largish categories only)
* 5GB of codebases I've committed to in the last 3 years.
* 50GB worth of software, mostly development tools, including five different versions of Xcode; I actually use most of them every day (scary, I know…)
* 50GB of VM images I use for development. I could rebuild them, but I really don't want to.
* 80GB on my boot volume. Stuff I would back up only if I wanted a live, bootable backup; it would save me a lot of time during a restore.
* 5GB in ebooks; primarily used for reference / searching
* 20GB in music
* 20GB in scanned documents, mostly financial and legal paperwork, contracts, etc.
* 15GB of e-mail
==OTHER THINGS AVAILABLE NOWHERE ELSE==
* 250GB of samples/patches/instruments that I've built over the years and songs I've written (I play keyboards for fun). Not business critical, but I'd be sad if I lost it.
* 20GB of backups from computers I used to own growing up. The nostalgia of that game I was coding in middle-school, etc.
* 100GB in learning resources I've collected over the years that are out of print or otherwise unavailable anywhere else. Not useful to me (anymore), but sharing them with others is valuable to both me and them.
* 20GB of home video
* 13GB in personal photos
==THINGS I DON'T WANT TO DOWNLOAD AGAIN==
* 375GB of Apple Development videos. I refer to these fairly regularly. I typically watch a few each day to keep up-to-date.
* 20GB of Steam games
Looking at this list, sure, I could live without some of it. But a lot of it is pretty important, and it would be nice to back up most of it.
Update that with "slow" data every month or couple of months.
Decide what needs to be backed up daily or weekly, and pay for an online service to handle it. How much would it be worth to you if you lost it in a second?
This is definitely a pain point of mine. Luckily, the data doesn't change often, and if I needed to restore after data loss, it wouldn't need to happen ASAP. So I'm thinking that saving the data to separate hard disks and storing those in a safe deposit box will fulfill my needs.
Does that sound right? Or am I missing some large dangers in this solution? (I'm pretty new to this stuff)
Right now I do a versioned rsync to a server I host myself with a 4.5TB RAID5 array (just upgraded from 1TB). That was ~$400 in disks when I upgraded. Even at the current ~1TB usage, that would be $3,600 per year in Tarsnap storage alone, without the bandwidth cost.
What I'm still missing is a way to back up that server. I could just colocate a second server for a fraction of the Tarsnap cost, and I don't really want to be backing up onto physical media (tapes or drives) and shuttling that to a deposit box. What I need is Tarsnap, only 10x cheaper. I'll accept lower redundancy than S3 for it, and especially a slower turnaround time for restores. S3 seems to be built for online transactional workloads, not backups, so it's a poor fit for this. Even the reduced-redundancy version seems only marginally different.
It would be awesome if Amazon offered something analogous to a single offsite automagically-expanding RAID volume for a lot less money than S3.
And that was exactly my point. Amazon is selling apples and I want an orange. Tarsnap is an apple pie and is the perfect kind of pie I want. But I want an orange pie.
I would be ok with a service that didn't give me online access to my data. Run a very large stack of removable SATA drive slots. Copy incoming files into two of them in two locations. Store them securely when they are full. Fetch them back when I need to restore. That should cut way down on the CapEx of all the NAS/SAN hardware S3 has to run.
Subject to the pricing, reliability, and privacy infrastructure being sane, I'd be very interested in that.
What I want to avoid is having to think about backups in the normal case. I just want to set it up properly as a daily cron job and mostly forget about it. Putting physical disks in the mail or in safe deposit boxes is too much trouble; the server I want to back up isn't even usually physically near me.
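A set-and-forget daily job like that can be a single crontab line; a hypothetical example using Tarsnap, where the schedule, binary path, archive name, and backed-up directory are all assumptions:

```shell
# Run tarsnap nightly at 03:00, naming each archive by date.
# Note: % must be escaped as \% inside a crontab entry.
0 3 * * * /usr/local/bin/tarsnap -c -f "home-$(date +\%Y-\%m-\%d)" /home/user
```

Because Tarsnap deduplicates, the nightly archives only upload blocks that changed since the previous run.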
This problem is only growing as people get better cameras and start shooting lots of video and stills that take up serious space. An interesting solution would be a mix of this online service plus a NAS appliance on the home network that does fast file sharing and automatic backups, both locally and to the online service. When a disk in the appliance fails, you call the company and they send you a replacement drive with the data already on it.
Hope it's not a stupid question, but how do you deal with the "bus factor" (a.k.a. "truck factor")? In other words, if the service relies on encrypted S3 backups and the founder exits for whatever reason, is there a contingency plan?
I don't. If I get hit by a bus, Tarsnap will almost certainly cease to exist.
That said, the service runs itself quite happily on its own (I've gone for months without touching any "live" code) so the odds are very good that Tarsnap users would have plenty of time to download their data between "email goes out announcing bus hit and account balances get refunded" and "service shuts down".
I intend to solve the bus-factor issue once Tarsnap gets larger, but right now it's not financially feasible to bring someone else in.
I'm hoping that if the results are made public, they'll be anonymised at least somewhat.
Everything important that we have locally is also at some remote location.
Is there any pain associated with restoring your dev environment? How long would it take you to get your text editor reinstalled, color scheme set back up, system preferences back, etc.?
Every time I make a change worth saving and want to share it, I push it to gerrit, which means there are now two copies (one on my laptop and one "in the cloud" on my gerrit server).
Every time a change is reviewed, verified, and blessed as part of our code base, the reviewer or verifier pushes a button to submit it. The change is automatically sent up to GitHub, which fires webhooks that replicate the data down to a machine in the office, another copy on our build master (also in the office), and then, shortly after, another copy on every build slave.
Internally, files are stored via NFS or SMB on a Solaris box that takes snapshots every 15 minutes. Those snapshots are not stored externally.
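Snapshots on a 15-minute cadence like that are often driven from cron; a hypothetical sketch, assuming the Solaris box uses ZFS and a placeholder pool/dataset name:

```shell
# Snapshot tank/shares every 15 minutes, named by timestamp.
# (% must be escaped as \% inside a crontab entry.)
*/15 * * * * /usr/sbin/zfs snapshot tank/shares@auto-$(date +\%Y\%m\%d-\%H\%M)
```

ZFS snapshots are copy-on-write, so taking one is nearly free; old ones would still need a `zfs destroy` rotation policy to keep space bounded.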
The biggest part of our data, though, is VM images: big files that change a lot. Restoring them hurts, but anything that loads up I/O on that box makes VMware think the NFS server isn't responding, so it unmounts and remounts it (it even does this when nothing's using it, even though the NFS server itself works fine for other clients during that time).
I wouldn't mind a backup procedure there, but it'd have to not break that.
(sorry to not use your form... the further down I got, the less it applied to me)
A missing option is the impact on the computer while I'm using it. All of the transparent services I have experience with (especially Time Machine) are abysmally bad at backing off when the machine is being actively used.
How's that for low impact.
I bet a lot of people would find them interesting.
Time Machine + Dropbox makes workstation backup a solved problem.
Slicehost Backup + Tarsnap makes server backup almost a solved problem. And I say almost because I don't like having no choice but to store my full bootable backups with the same provider that hosts my primary server. If I could store bootable weekly images off-site, I'd be golden.
> Looks like you have a question or two that still needs to be filled out.
If I want to skip a question, I should be able to do so.
(Disclaimer -- I write backup software: http://haystacksoftware.com/arq/)
S3 pricing is confusing. It boils down to this: $.10/GB per month for storage, plus an insignificant amount in per-request transaction costs;
After Nov 1, data transfer to S3 will be $.10 per GB;
Restores (data transfer from S3) are $.10 per GB (1 GB data transfer from S3 free per month).
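To make the arithmetic concrete, here's a rough monthly-cost calculation at those rates for an example workload (100GB stored, 10GB restored; the numbers are made up):

```shell
# $0.10/GB-month storage, $0.10/GB transfer out, first GB out free.
awk 'BEGIN {
    storage = 100 * 0.10        # 100 GB stored for the month
    restore = (10 - 1) * 0.10   # 10 GB restored, first GB free
    printf "$%.2f/month\n", storage + restore
}'
```

which prints "$10.90/month"; storage dominates unless you restore often.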
Only problem: the 100GB option is too expensive for me (for now), which is a shame. Also, I'd love it if they had a much larger size as well, so I could quietly back up everything without worrying about space.
If you want to fix my backups, make the data readily accessible when there isn't a net connection between me and the service.
One time this usually bites you is when you've just (re)installed a new OS and need the network driver, which is only available online. I always keep a backup of the drivers on my external hard disk for this situation.
Another service I worry about is Gmail and the other Google apps. With email, you can at least easily download the several GB of mail onto your hard disk, keeping a backup for when you or the service is offline.
Dropbox solves this nicely: it's a local folder that's synced to the cloud. If Dropbox is down or I'm offline, I still have access to my data.