Ask HN: How do you back up your data?
52 points by wbsun on Nov 16, 2013 | 58 comments
I have hundreds of GBs of photos/videos, thousands of songs, many important docs, Mac Time Machine backups, software installers... I'm finding it really difficult to figure out an easy and reliable way to back them all up.

I currently put the photos, videos, docs and software on two portable hard disks, and some photos and docs on both Dropbox and Google Drive. But this is really inconvenient: every time I want to browse my family photos and videos, I have to grab a hard disk. I also don't trust cloud storage to be reliable, so hard disks at hand seem better. However, those disks get old and wear out, so I have to replace them every three or four years.

Just wondering: is there a better way to do backups?

I've got a Synology DS213j [1] with two WD Red 2TB [2] drives in RAID-1. All machines in my house run some sort of Linux, so they are backed up daily, encrypted, with Deja-Dup (which is just a front end to duplicity [3]).

The Synology has netatalk configured out of the box, so it shows up on the network as an AFP share and can be used as a destination for Time Machine backups (I'm not using it that way, but a couple of friends who use Macs told me it works well).

It also has a package to backup data stored on the Synology to Amazon Glacier, and that is something I'm going to enable soon in order to improve redundancy (and to have an off-site copy of my data).

On top of that, some of that stuff is also spread across Dropbox, Drive, GitHub and thumb drives. I might have three to five copies of each file at any given time.

- [1]: http://www.synology.com/products/product.php?product_name=DS...

- [2]: http://www.wdc.com/en/products/products.aspx?id=810

- [3]: http://duplicity.nongnu.org/

Another vote for Synology here. Have been using them for a good 6 years now. I think I might have had one of the first DSM setups.

Just solid NAS products.

Recently upgraded to the DS213 with WD Red 3TB disks in RAID-1. Running all sorts of services: SMB, AFP, UPnP, Surveillance, Backups, etc. Performs flawlessly. It's incredibly fast, DSM is an elegant (and powerful) UI, and I just couldn't be happier with it for all my wireless/wired clients.

I back up the Synology once a month to a standalone 3TB disk via eSATA for even more redundancy.

Another vote for Synology NAS. I have the same device in the same configuration. I love it lots, and have ended up using it for far more than just backups.

My previous setup was two large USB drives, and doing everything manually (when I remembered and/or could be bothered to), but I almost lost data this way.

Yep, the DiskStations are super slick appliances. I'm also planning on using my DiskStation's offsite backup support, though I'm not sure if I'll be uploading to S3, Glacier, or rsync.net. In the meantime I've plugged a Fantom external HDD [1] into my DiskStation and have set it to run backups there.

[1] http://www.amazon.com/gp/product/B004IYJX20/ref=oh_details_o...

Since both my wife and I use either Mac or Linux based laptops, I have a quick and dirty shell script that gets fired off by hand at the end of the day. The script uses rsync and cpio to copy all files in the laptop's home directory onto a machine running Debian with four 1TB drives in a RAID 5 + 1 spare configuration. It uses hard links to keep daily increments without consuming extra disk space for unchanged files, and goes something like:

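  # SSHKEY, BACKUPHOST and BASEDIR must be set before this runs.

  # rotate daily backups: drop the oldest, shift the rest up by one
  ssh -i "$SSHKEY" -q "$BACKUPHOST" rm -rf "$BASEDIR/home.3"
  ssh -i "$SSHKEY" -q "$BACKUPHOST" mv "$BASEDIR/home.2" "$BASEDIR/home.3"
  ssh -i "$SSHKEY" -q "$BACKUPHOST" mv "$BASEDIR/home.1" "$BASEDIR/home.2"
  # recreate home.1 from home.0 as hard links (cpio -l), so unchanged
  # files take no extra space
  ssh -i "$SSHKEY" -q "$BACKUPHOST" \
    "(cd $BASEDIR/home.0 && find . -print | cpio -dplm $BASEDIR/home.1)"

  # backup
  TMPFILE=$(mktemp /tmp/excludeXXXXXXXX)
  cat << EOF > "$TMPFILE"
  # exclude patterns, one per line (the original list was not included)
  EOF

  # rsync writes changed files to a new inode, leaving home.1's links intact
  rsync -Lazv -e "ssh -i $SSHKEY" --update --delete \
    --exclude-from="$TMPFILE" "$HOME" "$BACKUPHOST:$BASEDIR/home.0"
  rm -f "$TMPFILE"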


So you have a hand-rolled version of rsnapshot? http://www.rsnapshot.org/

Wow, guess so. I've been using my script for years, but I'm not surprised someone else has done this right.
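The script above and rsnapshot both rely on hard links. Each daily directory looks like a full copy, but files that did not change share storage with the previous day's copy. Below is a minimal local sketch of the same idea, assuming a POSIX shell with `cmp` and `ln`. The function name `snapshot` and the `latest` symlink are hypothetical. The approach resembles rsync's `--link-dest` option rather than the cpio-then-rsync order used in the script.

```shell
#!/bin/sh
# snapshot SRC ROOT NAME: create ROOT/NAME as a full copy of SRC in which
# files unchanged since the previous snapshot (ROOT/latest) are hard links
# to that snapshot's copy, so they take no extra space.
snapshot() {
  src=$1; root=$2; name=$3
  prev="$root/latest"
  new="$root/$name"

  # Recreate the directory tree first.
  ( cd "$src" && find . -type d ) | while IFS= read -r d; do
    mkdir -p "$new/$d"
  done

  ( cd "$src" && find . -type f ) | while IFS= read -r f; do
    if [ -f "$prev/$f" ] && cmp -s "$src/$f" "$prev/$f"; then
      ln "$prev/$f" "$new/$f"     # unchanged: share the previous inode
    else
      cp -p "$src/$f" "$new/$f"   # new or changed: store a fresh copy
    fi
  done

  # Point "latest" at the snapshot just taken (-n replaces the old link).
  ln -sfn "$name" "$root/latest"
}
```

Comparing every file byte-for-byte with `cmp` is slow on large trees. rsync's `--link-dest` relies on a size and modification-time check instead, which is usually the better choice in practice. Filenames containing newlines are not handled by this sketch.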

This is not answering the question, but someone please build this for me (and hopefully a million others): simple, live, off-site backup for the worst-case scenario.

I buy a pair of devices (e.g. Tonidos) with attached commodity HDDs that I split with a friend. He plugs his into his network, I plug mine in.

I dump files to my local device, but they actually get buffered and then stored overnight at his place, and vice-versa. It's a remote backup storage device, not a NAS. Not for fast file serving.

If my house burns down, he hands me the HDDs and I decrypt them with my password.

If his IP changes, his device re-registers with mine somehow. Cleverness is needed here, not a $10/month subscription. E.g. his IP is emailed to me, or to my device, or something like that. I'm sure someone knows how to make two devices find each other over the net.

Edit: Of course, it would be nice if it still worked if he was tech-illiterate, e.g. you could give the other one to your elderly parent or neighbor or whatever.

I think Crashplan's software will do this without you paying them a sub.

No, it'll back up to other computers with a Crashplan account. I don't want to depend on third parties who may or may not be around, I want to give a drive to a friend and say: Plug this in.

Dunno what to tell you. Perhaps rsync and a dynamic DNS approach would work? If you want the machines to find each other even through IP changes, you need some way for them to rediscover each other.

Backup should be a multi-tiered thing, so you don't mind if one tier goes kaput. That way you can use a third party without 'depending' on them.

I think we're talking sneakernet here...

How about http://www.bittorrent.com/sync/, where data is stored on other users' computers?

Space Monkey - http://www.spacemonkey.com/

Takes the distributed backup idea to the nth degree.

Interesting, but "All subscriptions come with a device that is owned and managed by Space Monkey".

I'd like to buy something that I own and store where I want.

I think for most people, CrashPlan to a local and a remote drive gives you the most bang for the buck. Here is what I do...

# Principles

Back up everything. Preferably to the cloud and to a local disk. Automatically.

You should test restores, but you probably won't. Have two independent backup methods for every piece of data.

You should be able to get back up and running from the previous day's backup within an hour or so.
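For the "test restores" principle, one cheap check is to restore into a scratch directory and compare the result with the original. Below is a minimal sketch, assuming GNU coreutils (`sha256sum`). `tree_sums` and `verify_restore` are hypothetical helper names and are not part of CrashPlan or any other tool mentioned here.

```shell
#!/bin/sh
# tree_sums DIR: print "checksum  ./relative/path" for every file under DIR,
# sorted by path so two trees can be compared line by line.
tree_sums() {
  ( cd "$1" && find . -type f -exec sha256sum {} + ) | LC_ALL=C sort -k 2
}

# verify_restore ORIG RESTORED: succeed only if both directories exist and
# contain the same files with identical contents.
verify_restore() {
  [ -d "$1" ] && [ -d "$2" ] || return 1
  a=$(mktemp); b=$(mktemp)
  tree_sums "$1" > "$a"
  tree_sums "$2" > "$b"
  cmp -s "$a" "$b"
  status=$?
  rm -f "$a" "$b"
  return "$status"
}
```

You would point `verify_restore` at a live folder and at the copy your backup tool restored from it. Files that changed on purpose since the backup was taken will show up as differences, so pick a folder that is rarely edited.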

# Personal

CrashPlan: I want to backup every version of every file on my local computer to a local disk and cloud storage.

SuperDuper: I also want to have bootable nightly images of my machine in case I need to restore immediately.

CloudPull: I want to have a local copy of any Google file I have (Docs, Gmail, Calendar)

IFTTT to Dropbox: And whenever possible, I want to have a copy of any other cloud-based service (e.g., Instagram)

# Servers

Linode Backup: Turnkey and presumably the Linode folks are smarter than I am.

Duplicity to S3: In case Linode folks aren't that smart, I will have nightly backups of everything.

I use one weird trick...

I have 2 USB drives. Every month or so I put my code on a USB drive and take it to my mom's house when I do laundry. I leave the latest backup there, and take the other drive home.

Cloud providers hate me.

I've been burned before, so now I use a mix of incremental networked backups and offsite storage. My home server runs ZFS, snapshotting every hour/day/week. Critical files get sent off to TarSnap each night. Friday nights, a script archives important data to a set of tar.gz files that are written out to an external HDD. Saturday mornings, I take the HDD to the bank and swap it with its counterpart in my safe deposit box. It's a bit cumbersome, but it works for me.
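The Friday-night step might look something like the sketch below: archive a directory into a dated `.tar.gz` on the external drive, then confirm that the archive can actually be read back. `archive_dir` and its arguments are hypothetical. A successful `gzip -t` / `tar -tzf` only proves the archive is readable, not that it contains everything you meant to include.

```shell
#!/bin/sh
# archive_dir SRC DEST: write SRC into DEST/<name>-YYYY-MM-DD.tar.gz, check
# that the result can be read back, and print the archive's path.
archive_dir() {
  src=$1; dest=$2
  out="$dest/$(basename "$src")-$(date +%Y-%m-%d).tar.gz"
  tar -czf "$out" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  gzip -t "$out" || return 1                 # compressed stream is intact
  tar -tzf "$out" > /dev/null || return 1    # archive index can be listed
  echo "$out"
}
```

Writing a separate dated file each week, instead of overwriting one archive, means a bad run never destroys the previous good copy.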

Locally I have a RAID 1 and back up everything to CrashPlan. I use CrashPlan with family and friends, and I plan on subscribing so I can also back up to their servers for extra protection. I also use Wuala / Google Drive for photo sharing.

Never rely on a hard disk alone. They will break. Don't rely on home-only solutions either; you'll lose everything in case of fire, flood, lightning strikes, robbery, ...

Any home-based solution has a single point of failure: its location. So any solution with an offsite backup will be better.

My suggestions:

- Buy a two-disk NAS (Synology, QNAP, LaCie or similar); you can get 8TB in some of them. Want extra safety? Go with a 4-bay NAS in RAID 10.

- Get a Crashplan yearly plan.

- Get an unlimited-bandwidth plan from your internet provider.

Any other online file sharing service will do the job but Crashplan is built for that and the price is fair.

And whatever solution you choose, the most important thing is: TEST IT. Do a simple recovery test once in a while. Set up alerts or reminders to check on your backups.
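One way to get alerts without a paid service is a daily script that complains when a backup directory has not received a new file recently. Below is a minimal sketch. It assumes your backup tool writes new or modified files on every run; that is true of many tools, but worth checking for yours. `backup_is_fresh` is a hypothetical name.

```shell
#!/bin/sh
# backup_is_fresh DIR DAYS: succeed if at least one file under DIR was
# modified within the last DAYS days (find's -mtime counts whole 24h periods).
backup_is_fresh() {
  dir=$1; days=$2
  [ -d "$dir" ] || return 1
  [ -n "$(find "$dir" -type f -mtime -"$days" | head -n 1)" ]
}

# Example use in a daily cron script; cron can mail its output to you if
# local mail delivery is configured:
#   backup_is_fresh /mnt/backup 2 || echo "No new backup in the last 2 days"
```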

> every time I have to grab the hard disk to browse my family photos and videos

That is not a backup, that's an external storage. A backup should only be accessed to sync data (preferably with versioning, so the old data is not overwritten) and to recover lost data.

My computers are synchronized with a home server that has mirrored disks; the home server is then backed up to the cloud and to my parent's desktop (off-site backup). The backed-up data totals ~2TB (mainly photos, a private git repository, and documents).

I use CrashPlan for the cloud and off-site synchronization. With a free account you can back up data to another computer (yours or a friend's). With a paid account you can back up to their cloud storage, and you get more settings to tweak. You can provide your own encryption keys, so the data should be safer (as long as you don't lose the private key). The best thing is that it offers file versioning (even with a free account, though you don't get some of the advanced settings for it), so if something really bad happens, like ransomware that encrypts your files, you can still recover the older version. The only issue is that the application needs Internet access even when used locally, so if something happens to your account, or you don't have Internet access, you cannot easily[1] access your data.

[1] http://crashplan.probackup.nl/remote-backup/support/q/how-to...

CrashPlan is excellent. You can get a "family plan" for ~$12 a month and back up as many files as you want on a bunch of computers. No file space limit, no bandwidth throttling. Online access to files, including from native mobile apps.

I've become a big fan of Arq, although your collection may be a bit large for it. CrashPlan and Backblaze do well too.

If price is an issue, and you don't care about security, I would also consider a private Flickr account for photos and iTunes Match for music, and then use Dropbox/Arq for the documents/other valuables. Photos/music are fairly safe offsite with those, then your docs will be in S3/Glacier with Arq. Not perfect, but an alright solution for cheap.

As I understand it, some of those solutions won't work because they require you to keep a copy on your machine too, no? I believe Backblaze and Dropbox do. That won't work if so.

I don't have offsite backups yet (but read on). Currently, I auto-backup my entire Fedora desktop daily at 4am using rsyncbtrfs https://github.com/oxplot/rsyncbtrfs, which does incremental backups on btrfs using subvolumes. I have daily backups for the last 4 years, and I can access any version of any file at any time with no preparation (i.e. just cd into the appropriate directory for that day).

I have also written https://github.com/oxplot/cloudnbd for offsite backup. It presents any cloud storage (via pluggable backends) as a Linux block device, which you can format with the filesystem of your choosing. You can then mount it, RAID it, etc. and do crazy stuff with it. I don't use it yet because it's not well polished (and I'm lazy), but that's my ultimate weapon. It encrypts everything on the client side too, so that should make the NSA's job a tad harder.

I pay ~$4/month for CrashPlan. I keep all my data, all of it, on one desktop PC (including periodic downloads of remote data like emails), and all my other devices (laptops, smartphones) contain only some subset of that data, passively synced with the desktop, e.g. using Dropbox. I do incremental backups, as often as every 10 minutes, to CrashPlan's cloud and to a local external HDD. I remind myself to replace the HDD every 3 years.

I've got terabytes of data and it works great. The only thing I want is a third, off-site, non-cloud backup. I don't have a solution for that, but having my data in 3 places, one of which is the cloud, is good enough for me for now.

I've got my stuff on my MacBook Air and a USB drive. The drive is partitioned in two: half for Time Machine, the other half for more photos (that don't fit on the MBA's 256GB SSD). When the MBA gets full, I move old photos to the USB drive.

MBA and the data part of the USB drive backed up with Crashplan ($5/month)

Also have some stuff on Dropbox (free account) also backed up to Crashplan - it's mostly just stuff I need in other places.

All my documents are on Google Drive. I don't back that up because I don't think you can (the desktop client doesn't give access to the actual documents). Additionally, most of my best pictures are on either Flickr or Facebook.

Previous backup method: 1. Back up working dir (~1.5 GB) daily with rsync to Raspberry Pi. 2. Back up encrypted archives of photos/videos monthly to mega.co.nz and external drive, and back up new photos/videos during month with #1 above.

New backup method: 1. Back up everything with Deja Dup daily to a $25/year, 50GB VPS (I don't have a lot of data). 2. Back up everything with Deja Dup to an external drive monthly (maybe going to weekly).

Edit: All photos are resized monthly with a script to create a ~100KB copy that lives on my laptop and phone. I can see all photos I've ever taken on my phone, but still have the full-res copy backed up.

My method is pretty close to yours:

1. Google Drive (got 2 years of 100GB space with Chromebook), was previously using Dropbox

This provides me a backup locally on the computer and one off site.

2. I have an external hard drive connected to the workstation that runs an update weekly to give me a third copy of all my data. I switch out the drive with a second external drive on a weekly basis.

If my computer dies, I have Google Drive and then the two external hard drives. If an external hard drive fails, no big deal. If Google Drive wipes out everything, still no big deal.

I backup around 70GB.

*edit: The OSes I use are Windows 8.1, ChromeOS, and OS X.
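With two external drives swapped weekly, the backup script can work out which drive is due so a swap is never silently skipped. Below is a minimal sketch. It assumes drives labelled A and B (hypothetical labels) and takes the ISO week number from `date +%V`.

```shell
#!/bin/sh
# which_drive [WEEK]: print the drive label (A or B) due for this week when
# two drives are swapped weekly. Defaults to the current ISO week.
which_drive() {
  week=${1:-$(date +%V)}
  week=${week#0}   # strip the leading zero: "08" is not valid octal
  if [ $((week % 2)) -eq 0 ]; then
    echo A
  else
    echo B
  fi
}
```

The `${week#0}` step matters because `date +%V` prints weeks like `08`, and shell arithmetic may treat a leading zero as octal. In years with 53 ISO weeks, week 53 and week 1 are both odd, so the same drive is picked twice in a row. That is harmless for a two-drive rotation, but worth knowing.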

This is how I do that for servers, but I'm experimenting with using the same backup schema for my local machine, too:

On my servers, every virtual machine is on an encrypted block device, so the data is already encrypted when it's written to the disk. Then I use Shasplit [1] and Rsync for efficient incremental backups. Restoration is a simple "cat" invocation (or use Shasplit for that), and I get back a full, running, encrypted VM with everything.

[1] http://vog.github.io/shasplit/
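Restoration being "a simple cat" hints at the general idea. You split the (already encrypted) image into chunks, name each chunk by its hash, and keep an ordered index; rsync then only has to transfer chunk files the destination does not already have. The sketch below illustrates that general technique with `split` and `sha256sum` (GNU coreutils assumed). It is not Shasplit's actual on-disk format or command-line interface, and `chunk_store` / `chunk_restore` are hypothetical names.

```shell
#!/bin/sh
# chunk_store FILE STORE [SIZE]: split FILE into SIZE-byte chunks, keep each
# chunk in STORE under its SHA-256 name, and print the ordered chunk hashes
# (the index needed to rebuild FILE). Identical chunks are stored only once.
chunk_store() {
  file=$1; store=$2; size=${3:-1048576}
  work=$(mktemp -d)
  mkdir -p "$store"
  split -b "$size" -a 6 "$file" "$work/part."
  for part in "$work"/part.*; do
    [ -e "$part" ] || continue            # empty input: no chunks
    h=$(sha256sum "$part" | cut -d ' ' -f 1)
    [ -e "$store/$h" ] || mv "$part" "$store/$h"
    echo "$h"
  done
  rm -rf "$work"
}

# chunk_restore STORE < INDEX > FILE: concatenate the chunks in index order.
chunk_restore() {
  while IFS= read -r h; do
    cat "$1/$h"
  done
}
```

Because identical chunks share a name, unchanged regions of the image produce no new files, which is what makes the incremental transfer cheap. Old chunks are never deleted here, so a real setup would need some form of garbage collection.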

I have a 500GB portable hard disk that gets appended to daily and cycled every 2 years. Plus a DigitalOcean VM that gets rsynced to daily.

Oh, and a VS320 DLT, which gets done when I can be arsed, as that has a two-decade shelf life. It and the drive live in a 400kg fire safe.

And I print all my photos that are worth keeping on 10x8 paper.

And I have CDs for music still.

Edit: I see a lot of people relying on cloud services alone. Don't do that. Shit does go missing and fail. And I bet you don't test your multi-gigabyte cloud drop until you need to restore it, which is bad juju.

Dropbox. (Google Drive is not stable enough, and I have too many things on Google already)

One mistake I made was installing Dropbox on one of my Linux servers. On that server I moved the Dropbox folder, which made Dropbox believe I had deleted everything, so it deleted most of my files. Recovering them was a mess; even after talking with Dropbox support, I couldn't recover everything :(.

Looking for a better solution, but I need the cloud as I want to be able to access some of the files from my phone.
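The Dropbox incident is a general hazard of any mirroring sync. A moved or unmounted source folder looks exactly like "the user deleted everything", and the deletion is faithfully propagated. A common safeguard in your own scripts is a pre-flight check before any sync that deletes on the far side. Below is a minimal sketch. `safe_to_mirror` and the threshold are hypothetical, and this does not change how the Dropbox client itself behaves.

```shell
#!/bin/sh
# safe_to_mirror SRC MIN: succeed only if SRC is a directory that still holds
# at least MIN files, so a moved or unmounted folder is not mirrored as
# "everything was deleted".
safe_to_mirror() {
  src=$1; min=$2
  [ -d "$src" ] || return 1
  count=$(find "$src" -type f | wc -l | tr -d ' ')
  [ "$count" -ge "$min" ]
}
```

For example, `safe_to_mirror ~/photos 1000 && rsync -a --delete ~/photos/ /mnt/backup/photos/` would skip the mirror if the photos folder suddenly held fewer than 1000 files.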

1. I keep working data on my laptop, and archived data/media on a desktop machine in my home.

2. Both are backed up to Time Machine to a network drive, and also with Arq to Glacier.

3. I keep important documents in Dropbox.

This gives me "Oh shit, I deleted a file" protection from Time Machine and Dropbox, two local copies of all media and data in case of hardware failure or machine loss, and a last-ditch offsite Glacier backup.

Works well, and Arq isn't that pricey if using only Glacier.

I have independently reached the same conclusions. Glacier for photos / videos backup and Dropbox / time machine for everything else.

Backblaze has worked well for me:


easy to restore, and works silently in the background.

I've been reading up on Backblaze, but don't they require you to keep a local copy, and only keep history for 30 days before it's wiped?

Yes. I use them; the other bad thing you might read about is that to download encrypted stuff, you need to enter your password on their server.

So I'm curious then: given those flaws, why still use Backblaze, when it doesn't seem to be a true cloud backup? It feels more like a cloud Time Machine.

Disclaimer: I'm a Backblaze employee. Backblaze tends to be extremely simple to use. After installation, you enter your email address and a password (on the client side) and you are now CORRECTLY configured. By default, Backblaze backs up any and all files on your computer. You don't have to pick and choose folders, write a script, set up a second computer at your Mom's house, nothing. There are only 3 billing options: $5/month (utterly unthrottled and no size limits), or $50 if you pay for 1 year up front, or $95 for two years.

Simple and "just works" makes some customers happy, but Backblaze isn't perfect for everybody. If you like playing with your backups, scripting them, configuring machines, excluding folders - Backblaze will drive you crazy as you fight to control it. For example, if you add a new folder of images to your computer, Backblaze will push them to our datacenter unless you EXPLICITLY exclude that folder. Some customers would like an option that all new folders ARE NOT backed up until they "add" them to the backup. Backblaze must be thought of as a complete solution - if it doesn't fit your needs, we can highly recommend other more scriptable products to you.

There is no shame in the fact that we only fit the needs of a subset of customers; our biggest challenge is communicating clearly what we do and do not do, so we don't waste your time.

The original question was "with these flaws why use Backblaze?" The answer is, only customers who view these "flaws" as good things should use Backblaze. For example, your files are encrypted at Backblaze and we absolutely don't store your password. But by default, you can "recover" your password with access to your email address. You can enable a higher level of security on Backblaze where you can NEVER recover your password and if you forget your password, you are screwed. This is NOT the best option if your Mom is using Backblaze to backup her cat pictures. Friendly and easy is a BETTER solution for some people. But if you are really, really concerned about security, look into Backblaze's "private encryption key" option.

Here is a blog post/screencast I wrote almost a year ago on how to go about using OS X TimeMachine and SMB (Samba) to do backups.


Never made it onto Hacker News, but many people have found it useful.

I have a 1TB USB drive and let Ubuntu's backup software do weekly backups of my code, docs, etc. It's a simple $100 solution.

For my laptop: I use Backblaze for everything. I have timemachine setup on a small server (running zfs and snapshotting). I also use Arq to back up more important stuff an extra time to S3.

For servers with data, which currently means just Postgres: Heroku pg backups. We download, verify and archive them on S3 and on a local server.

> I have timemachine setup on a small server (running zfs and snapshotting).

Are you talking about the Apple Time Machine app here? I'd be curious to hear about your setup if so. Right now I'm using time machine to a USB attached HDD, because my last foray into network attached volumes + time machine (using linux/samba as the volume host) was pretty painful.

I also like the idea of ZFS and periodic array scrubbing to detect invisible failures, which most RAIDesque configurations ignore.
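Without ZFS, you can still detect silent corruption (though not repair it) by keeping a checksum manifest next to the data and re-checking it periodically. Below is a minimal sketch, assuming GNU coreutils `sha256sum` (`-c` and `--quiet` are GNU options). `make_manifest`, `check_manifest`, and the `MANIFEST.sha256` file name are hypothetical.

```shell
#!/bin/sh
# make_manifest DIR: record a SHA-256 checksum for every file under DIR in
# DIR/MANIFEST.sha256 (the manifest itself is skipped).
make_manifest() {
  ( cd "$1" && find . -type f ! -name MANIFEST.sha256 -exec sha256sum {} + ) \
    > "$1/MANIFEST.sha256"
}

# check_manifest DIR: re-hash every listed file; prints only failures and
# returns non-zero if any file changed or went missing.
check_manifest() {
  ( cd "$1" && sha256sum --quiet -c MANIFEST.sha256 )
}
```

Regenerate the manifest after intentional changes, otherwise edited files will be reported as failures. Files added after the manifest was made are not checked. A ZFS scrub goes further: with a redundant copy available, it can also repair bad blocks.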

Dropbox for non-media stuff, I back up everything occasionally on a USB hard drive with rsync. I guess it's not a big deal to me because aside from my work which is in dropbox and git, I don't have anything I couldn't replace.

I don't really have anything on my computer that needs backing up. I use SkyDrive to store documents like homework, but apart from that, nothing.

Everything nowadays is easy enough to re-download from the internet at a later date.

I have a Raspberry Pi at home that I back up to via sshfs (when I am away from home). It is connected to an external 2TB USB drive.

I back up this drive and keep a copy at my parents' house in their safe.

I use Apple Time Capsule and iDrive for off-site backups (https://www.idrive.com/). Both use encryption to protect my data.

I switch between Windows and Linux quite a bit but for me it is mainly two tools: Rsync (Linux) and Robocopy (Windows).

I don't store anything online unless it is something mundane, like my dotfiles.

I only back up important documents, encrypted on an SD card. I estimate that the chances of fire or theft are not important enough to invest in anything more.

btsync [1] seems pretty promising so far. I've been syncing a NAS drive at home with a disk at the office and a small Linode instance.

Shame it's not open source as there have been a few things I'd like to tweak (like quieten down the UDP noise a bit), but it's a pretty good backup solution IMO.

[1] http://www.bittorrent.com/sync

Arq to SQ and Glacier: http://www.haystacksoftware.com/arq

What is SQ? I think you meant S3?

Yes, that was the meaning, S3. I too use Arq, to Glacier.

I use Amazon Glacier through FastGlacier. Pennies a month, and I'm reassured.

Arq -> Amazon Glacier

Time Machine + Dropbox.

I use CrashPlan

Pay for Dropbox.
