Ask HN: How do you backup your files without depending on a third party service?
160 points by sbjs on July 24, 2018 | 107 comments
I'm using Github, Dropbox and iCloud, but this makes me nervous about my data's privacy and longevity for a number of reasons. Among the three services, I have ~700GB of data.

But I got to thinking, aren't USB thumb sticks reliable and big enough nowadays to fit this much data? I could just buy two 512GB sticks and use rsync to backup to them a few times per week. This way I wouldn't have to lug around an awkward hard drive if I'm ever traveling.

What strategy do you use to keep your data safe and private for the long term? Do you have a portable solution, or do you recommend something else?

Synology or QNAP NAS. They collect backups from all computers in the house, encrypt them and ship the encrypted blobs to AWS Glacier.

Synology in particular is a fantastic platform which does way more than a traditional NAS. It's quite a polished personal cloud platform (way more so than FreeNAS) and includes hundreds of apps; even the backups into the cloud can be done in different ways.

I also place a premium on the fact that Synology is a piece of hardware. The software comes built-in and taken for granted. What this means is that it's been sitting somewhere in the dark corner of my house since 2012, self-updating and not bothering me with upgrades or expired credit card / subscription, etc. It doesn't care if I run Linux/MacOS/Windows, it's just dumb storage that all of these OSes can backup to using built-in tools.

Another tool I like is Resilio Sync [1], it's a cloudless Dropbox and it's perfect to sync /Documents and /Desktop across computers at home and at work. It's not a backup tool, but very much related, i.e. if I put a file on my desktop at work I know it will end up on Synology at home and eventually will get encrypted and backed up to Glacier, and yes, without any 3rd party servers involved.

[1] https://www.resilio.com/individuals/

+1 for synology NAS. The quality and frequency of updates has been high. I've used both rsync and Time-Machine based backups.

And I too like that it's a hardware business model: I have the sense that my purchase(s) fund development of useful features, not creepy upsell opportunities.

They had a scare several years back about a remote exploit. I think it has caused them to take security and updates seriously ever since.

+1 to Synology. Love their hardware and software.

Another advantage is that they're reasonably open (modules, plugins, etc), and the base operating system is GPL (but unfortunately not the UI / modules), so you can play with it when needed.

I had to play with the source code a few times over the years, and having the ability to understand what the operating system is doing [1][2], or be able to do some hardcore recovery through the serial console [3] is pretty remarkable for a consumer storage product.

(note: I wrote these posts 5-6 years ago, and the software has evolved a lot since then. I haven't checked the source code or used the serial console recently, so would love to hear thoughts if anyone has done it recently)

[1] https://wrgms.com/reverse-engineering-synology-openssh/

[2] https://wrgms.com/synologys-secret-telnet-password/

[3] https://wrgms.com/recovering-a-failed-synology-diskstation-d...

I would stay away from QNAP, their quality is terrible. I had a QNAP QVR and it just reeked of insecurity.

My friend has one of the smaller Synology boxes with 20TB and it is pretty nice. You can think of it as the Apple of the NAS systems out there, everything just works.

However, if you like to tinker with your NAS a bit more, FreeNAS is excellent. It's pretty comforting running zfs over standard RAID, and you have a FreeBSD system where you can run anything you want in native jails.

Agreed, QNAP os/distro/apps just feel shoddily made and maintained. If you maintain systems for a living you'll go berserk trying to track down exactly what it is doing and why, on top of the constant (weekly?) firmware security updates.

I haven't tried Synology in a long time. I'm looking to get my few TBs off of the QNAP and sell it in favour of something custom built.

To add on: Synology largely repackages open protocols into a fairly user-friendly package. Connections can be made over SMB/AFP/NFS, and their Synology Hybrid RAID for utilising different-sized disks is basically mdadm + LVM.

I think it helps if anyone's concerned with avoiding vendor lock-in.

How can you recommend Synology when their system doesn't even have FULL DISK ENCRYPTION? As long as it doesn't have it, I won't even be interested in looking at their platform.

Can you imagine it? NASes are usually targeted at business users, yet they don't have full disk encryption...

For now I'm using QNAP. In terms of features I believe QNAP is a little ahead (x86 CPU, virtualization, containers, FDE). It has everything Synology has and more.

However, their platform is badly constructed; most processes are essentially running as the root user... You get a firmware update about every 2 weeks. That doesn't mean there are bugs, though. I've been running it for 2 years and never had a problem.

Synology is great however I'd like to caution anyone considering their enterprise gear that they do not offer traditional 24/7 support. Regardless of which unit you purchase (in our case we purchased the most expensive RU model), all support tickets are triaged and prioritized equally, and there is no phone support (unless you scream loud).

Get your reseller involved, and if you don't have one; get one. They do local seminars, you can just attend to become one yourself.

At least on Synology you can have your choice of storage vendors using the built in cloud sync app; it natively supports Rsync, AWS, Glacier, B2, Amazon, DropBox, Azure, whatever.

I have a 4TB hard drive attached to an OpenWRT router (Raspberry Pis are also a popular choice). The router exposes a locked down SSH port (SFTP only) through a cheap $5 a month VPS.

I also use Jottacloud [1] who provide unlimited (yes unlimited) storage for 7.5 EUR a month. Based in Norway, they keep an up to date warrant canary.

Client side, I back up to both of these services using Duplicati [2] which offers client side encryption, block level deduplication and excellent integration with Jottacloud and SFTP. For synchronisation, I use Syncthing [3] since I don't want to expose plaintext documents to Jottacloud. All in all, I'm very pleased and it has required no maintenance since I set it up a year ago.

[1] https://www.jottacloud.com/en/

[2] https://www.duplicati.com/

[3] https://syncthing.net/

I just started experimenting with Syncthing because Dropbox no longer meets my needs.

So far I'm pretty happy and impressed, it's a decentralized piece of software that Just Works (TM). My plan for backups is to spin up a VPS somewhere and have them back it up.

I also love that it can use LAN; I can finally sync up my giant mp3 collection.

+1 on syncthing.

I have one on my computer, one on our Raspberry Mediabox (with a dedicated drive) and one on a dedicated VS.

New version using inotify really helped the Pi !

Syncthing is awesome. My main use of it is to share between my laptop and seedbox. Also if I want to share something with my friends in an unattended manner.

Jottacloud sounds interesting, how do you integrate with it, doesn't seem to offer any standard API's besides their own CLI.

Not op but apparently he uses Duplicati to upload to jottacloud: https://duplicati.readthedocs.io/en/latest/05-storage-provid...

Go by the 3-2-1 rule: 3 backups, 2 different formats, one offsite.

Similar to the military mantra: three is two, two is one, one is none. (Meaning your "spare" should already be considered in production or broken, so you need a "spare spare" to really have a "spare").

Time Machine your Macs to a drive and also perform an image backup to a second drive, keep them both in separate places and implement a rotating image or Time Machine backup.

On your PCs File History is probably adequate, and you can image the drive with something free like Acronis.

Linux there are a hundred ways, but the o'reilly 'crocodile' book (backup and recovery) has many good solutions.

Also as a tip you can get solid LTO5 mechanisms for a good price these days on eBay and they're pretty robust and hold a good chunk of bits.

Synology NAS are good machines.

If you’re tech savvy and want the easy answer then check out Tarsnap[1].

If you’re tech savvy and want to save a few bucks by doing it yourself, check out borg[2]. You’ll need to set up an external server for it but at scale (TB) it’s an order of magnitude cheaper than Tarsnap.

Regardless of off site storage I’d suggest also buying two large USB drives and formatting them with LUKS. Keep them offline and only sync them occasionally as live storage isn’t really a backup.

If you’re not tech savvy I’m convinced what you’re asking for is impossible. The best a layman can do is pick a reputable provider (like the ones you’ve listed).

[1]: https://www.tarsnap.com

[2]: https://borgbackup.readthedocs.io/en/stable/

Tarsnap is a third-party service which in turn depends on S3. The author specifically requested solutions which do not depend on a third party.

I know and I still gave the answer I thought was best.

Unless you're going to run your own servers in multiple locations (or convince your friends to let you host a NUC at their homes), you're going to rely on a third party. All that changes is the type of service that you're receiving, i.e. whether it's an object store (S3, B2, etc), a virtualized server, or even a rack in a data center.

I tried Borg for some time and it worked fine. I used to back up to a 1TB VPS. But when I decided to move to B2 I had to look for alternatives since Borg doesn't support object storage. I'm playing around with Restic[0] these days. They have recently added many nice features, e.g. compression.

Looks promising. Though I wish apps like Borg/Restic had functional GUIs, even though minimal. In fact preferably minimal. Just notifications, list of archives, info about data backed up and actual space used, scheduling etc.

[0] https://restic.net

You’ll need to set up an external borg-server ONCE. From there you can additionally sync it to wherever you want using the method you want. E.g. Google cloud. As long as it's encrypted, you don't need to bother about the hoster's activities. But forget the USB-sticks.

E.g.: Borg to local RAID every 2 hours, sync RAID with cloud every 24 hours.

> But forget the USB-sticks.

> E.g.: Borg to local RAID every 2 hours, sync RAID with cloud every 24 hours.

The problem with a fully online (non-append only) backup system is that if everything is working properly then an accidental 'rm -rf /' will propagate throughout all your supposed backups and destroy them as well. The USB drives are for that type of "Oh shit!" moment as a (possibly stale but better than nothing) backup of last resort

I use an external USB HD and automate my backups with rsync and cron. Works great.

Here's a blog post showing everything from which drive I use to the backup script: https://nickjanetakis.com/blog/automatic-offline-file-backup...

I've been using this strategy for about 20 years and it hasn't failed yet. Although in the older days I had a slaved HD and also used CDs on occasion.

I would never trust a consumer grade USB flash stick. I've had so many fail on me over the years.

I switched to ARQ (https://www.arqbackup.com/) a while back after a near miss with a hard drive failure, and think it's an amazing product. The key elements to me are that:

1. It is storage location agnostic (backup to a usb, external HD, network HD, dropbox, amazon, etc)

2. your backup data is encrypted locally before being backed up to any medium

What's great is that I can back up one set of data to an external hard drive manually as needed, another set of data to, say, Dropbox on a nightly basis, and another set to a B2 bucket on a monthly basis (all of which are encrypted so I retain my privacy/data security).

Edit: i suppose that it is a "third party service" but i think it reliably solves for the lock-in and privacy problems you reference

I use arq as well. Some things I don't like about it are:

1. There's no information about what it is doing (which files are currently being backed up, what's the current upload rate, what's the estimated completion time, how much data is left, etc.)

2. Large files (like VM images, or crypto containers) block small files from being backed up. Essentially, your backup frequency can be no more than largest_file/upload_rate (well, it's more complicated than that because of deduplication)
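To put rough numbers on point 2, here is a back-of-the-envelope sketch. It assumes the largest file has to be re-uploaded in full (no deduplication) and that the uplink is fully available to the backup; the function name and the example figures are made up for illustration.

```python
def min_backup_interval_hours(largest_file_gb: float, upload_mbit_per_s: float) -> float:
    """Lower bound on the time between completed backups, in hours.

    Assumes the largest file is uploaded in full (no deduplication)
    and the whole uplink is available to the backup.
    """
    if upload_mbit_per_s <= 0:
        raise ValueError("upload rate must be positive")
    size_megabits = largest_file_gb * 1000 * 8  # GB -> megabits, decimal units
    seconds = size_megabits / upload_mbit_per_s
    return seconds / 3600


# Hypothetical numbers: a 40 GB VM image over a 10 Mbit/s uplink.
interval = min_backup_interval_hours(40, 10)
print(round(interval, 1))  # prints 8.9
```

With deduplication only the changed blocks travel, so the real bound is usually lower than this.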

#1 the information is limited, but it is available https://goo.gl/images/9zCMNR

#2 is an issue I've encountered as well

Two FreeNAS servers on premises and in different locations with ZFS replication.

Then use SMB, NFS, Time Machine, rsync, Syncthing, sftp or anything else you like to push data to one of the FreeNAS machines.

Periodic ZFS snapshots prevent common cases of PEBCAK and ransomware.

Fully redundant, encrypted & open-source solution that can saturate 10GbE!

Just wanted to leave this excellent tool here for those replicating ZFS:


If you have ecc memory and two locations this is probably the best option.

If you don’t have a second location you can still replicate to https://www.rsync.net with all the benefits ZFS provides.

ECC is highly recommended but even a low powered CPU will do.

A lot of the replies either rely on third party services (albeit encrypted) or forget about the third part of the holy backup trinity of having something offsite.

It sounds like you're thinking of local backup while you're traveling, which is great. The portability of USB sticks seems perfect for your use case, although I think a 1TB stick is still a stretch, maybe look into a small self-powered external SSD drive. If using OSX you can encrypt it and use Time Machine to auto-backup when plugged in (easy) or just use an rsync script (manual).

But don't forget to ship these backups offsite once you're home. If you get a couple of 2TB drives to account for growth, you could get into the habit of depositing the current backup into offsite location, grabbing the old backup from offsite, and just rotating every couple of weeks. Offsite could be a local bank safe deposit, a work office, or out of town family. (I consider family second-party, not third-party.) Local backup means nothing if you have to evacuate due to flood (east coast hurricanes), fire (california wildfires), or any unforeseen disaster.

I use this method to backup 10TB of photography that I can't do online without a fiber uplink. Thankfully 10TB platter drives are possible these days for me to flat-rate mail to my parents across the country. (And yes, Synology for local backup, that seems to be the hit here.)

The common wisdom always says have an off-site backup. But what if the backup is small enough to fit in your pocket? I always have a Swiss Army knife on me anyway so if I have a backup thumb drive that’s the same size, it should fit just fine, and be as safe as my house keys. I really think we’re in a new age of self sufficiency because of advances in technology and physical storage size.

One of the biggest benefits of off-site backup is stability. That's why something like a bank's safe-deposit box or my parents' firesafe are my preferred locations. A backup in your pocket is great until you lose your keys, your pants get a hole in the pocket, you get pushed into a pond, etc.

The rsync solution doesn't have to be manual. On Mac you can install a launchd script to run using StartOnMount.
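A rough sketch of what such a launchd job definition could look like, generated with Python's `plistlib`. `Label`, `ProgramArguments` and `StartOnMount` are standard launchd keys; the label and the script path below are placeholders, not anything from this thread.

```python
import plistlib

# Hypothetical label and script path; adjust both to your own setup.
job = {
    "Label": "local.backup.rsync-on-mount",
    "ProgramArguments": ["/bin/sh", "/Users/me/bin/backup.sh"],
    "StartOnMount": True,  # launchd starts the job every time a filesystem is mounted
}

plist_bytes = plistlib.dumps(job)  # XML property list by default
print(plist_bytes.decode("utf-8"))
```

The output would typically be saved as a `.plist` file under `~/Library/LaunchAgents`. Since `StartOnMount` fires for any mounted filesystem, the backup script itself should check that the expected volume is present before running rsync.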

You can buy USB3 M.2 SSD enclosures for ~$20. They are larger than normal flash drives at about 5" long but still reasonably pocketable and much higher performance than normal flash drives. Drop in a 1TB SSD for ~$200 and you're good to go.

I had a friend in college that backed up all of his files on a flash drive but he used git rather than rsync.

> Drop in a 1TB SSD for ~$200 and you're good to go.

O_o ...For some reason, my mental model of SSD rates was stuck in 2014 at $0.50/GB. As I type this, my laptop is busy moving a VM image over to the NAS to free up some space on my 256 GB SSD.

But no, according to https://pcpartpicker.com/trends/price/internal-hard-drive/ and https://pcpartpicker.com/products/internal-hard-drive/, you can indeed find SSDs at $0.20/GB (the low end being currently dominated by Crucial MX500 drives at $0.18/GB). Looks like it's long past time to upgrade!

After my last SSD died, I'd have a hard time trusting SSD's again for backup purposes (this is probably 8-10 years ago though I suppose).

It was more the way it failed - I had one or two boots where a few errors started to pop up, and then the next boot, nothing - the disk had disappeared to the BIOS' eyes and any other tools - just dead as a doornail with no easy way of getting any of the data off it.

I assume they've improved in that regard, or are spinning platters still a safer proposition over the long term?

8-10 years ago is a very long time, especially for what was still a relatively new, low-volume product in the consumer market.

> I assume they've improved in that regard, or are spinning platters still a safer proposition over the long term?

I don't think you need to assume..just check out what statistics are out there. Sadly, I don't know of anyone publishing numbers for consumer SSDs like Backblaze does for spinning disks.

You can get even cheaper, the Micron 1100 2TB drives are only $0.15/GB at $300.

The issue with "simple" backups is that, should your house/office burn down, you lose everything, including the backups. Any alternative (like bringing the usb sticks to a safety box etc) is much more cumbersome than standard cloud-based solutions.

I have a SpiderOak subscription for 2TB that is private enough for my needs, and that's where I backup pretty much any machine i have. I also keep a low-power Linux server at home (a cubox) with two 1TB disks in raid-1, that acts as a Time Machine for the occasional quick restore. Most of my code lives in Gitlab or Bitbucket, and some business stuff in OneDrive. Family pics and videos, that are massive, get pushed every few months to Amazon Glacier.

Not if I carry the USB sticks in my pocket at all times!

I thought I was bright carrying around my portable hard drive with my backups on my backpack.

Until one day I got robbed and I lost both my laptop AND my backups... (and USB sticks, etc). That sucked.

That's a recipe for losing data even more frequently...

I haven't lost my house keys once. I'm sure I can manage to do the same with a USB stick? Especially if it has a keychain hook so I can put it on my house keys keychain!

I guess that's a strategy. It wouldn't work for me, I lose everything (including house keys a few times).

RClone ( https://rclone.org/ ) synchronizes directories (similar to rsync) with a multitude of providers and allows for client-side encryption. I use a combination of Google Cloud Storage (for really important files and pictures) with the "crypt" extension and a USB HDD hub (4 disks) with RAID 5 hooked to a Raspberry Pi for less important data. The backup scripts start on boot. It works like a charm. Oh, and rclone is written in Go, so it works on all platforms, including Windows, macOS and Linux (and also ARM).

I maintain three sets of backups - two personally, and one with a third party service.

For the personal backups, I keep one at home (second location) and another in the office. In both cases, I rely on rsnapshot (http://rsnapshot.org/). It's an oldie but a goodie; I've been using it for over 15 years. It solves the rotating backup problem, and thanks to the glory of hard links it reduces my disk usage.

I also use duplicity to gpg-encrypt files that I save into SpiderOak, but feel free to use s3 buckets. I very much like having encrypted backups just work. In both cases I have my scripts notify me via a webhook when things fail, and every 7 days regardless, just so that I don't have a heart attack wondering if anything happened.

SpiderOak is new - about 8 years. The webhook is newer, about 4. The rest is ancient and continues to work across everything, always.

I love simple software that doesn't do too much magic.

For this reason, I use rsync to a btrfs volume (both local, on a USB stick, and remote over SSH).

Then I take a btrfs snapshot, so I have versioned and deduplicated backups.

Furthermore, rsync supports gitignore-like files, so you can control what gets backed up and what doesn't in a very fine-grained way.

Tarsnap is also a great option.

For your particular use case, I'd go for both a remote server and a local pair of USB sticks to have duplicated backups in different locations all the time.

> I love simple software that doesn't do too much magic.

> For this reason, I use rsync to a btrfs volume

If btrfs were simple software redhat/fedora wouldn't have abandoned it in favor of xfs.

Then another filesystem with snapshot support.

My point is to use utilities that do one thing, and compose well.

Ok, but this approach is pushing the complexity down the stack into the kernel/filesystem. And a general-purpose performance-oriented snapshotting filesystem is going to be making a lot of compromises that aren't really ideally suited to backup duties.

Plan9's Venti/Fossil immediately come to mind as an example of simplicity in this vein.

> I love simple software that doesn't do too much magic.

> For this reason, I use rsync to a btrfs volume (both local, on a USB stick, and remote over SSH).

I love simple software too: that's why I use lmnop and throw in a little xyz.

* "Syncthing" for keeping multiple copies of the current files, across devices in different locations, that will never malfunction at the same time. This is usually enough.

* "Back in Time" for keeping older file versions in external hard drives.

The two were chosen mainly because they have decent GUIs, both support diffs, and the backups can be viewed using nothing but a file explorer.

Syncthing also provides backup versions (if updated on a different machine). So if it pulls a newer version from another machine, it will back it up first :)

Edit: Granted, time machine provides a MUCH more 'slick' interface

I store my data in 3 places:

1. A 2TB WD Mybook Live

2. A $60 per year 1 TB Amazon Drive account (contemplating moving to Backblaze)

3. A $100 per year 1 TB Google Drive account.

I use rclone to sync with Google and odrive to sync with Amazon. Stuff on my phone (including photos) gets synced to Amazon and Google using the respective photos apps for photos and foldersync for other stuff.

I use Bvckup 2 (https://bvckup2.com/) to backup on to a 4 TB external drive. I also have Duplicati (https://www.duplicati.com/) running to backup those backups onto OneDrive and Backblaze.

Just chiming in here to say bvckup2 is awesome. It's one of those rare pieces of software that has rock-solid engineering. My heart warms with glee every time I think about it.

I've been using it in place of robocopy for all my Windows backup needs for a while now, with rsync for linux. It's fast.

Hourly backups to a NAS and one of two local USB3 SSD drives that I rotate weekly.

You can get around 400 MBps write rate on the latter [1], which is pretty decent by itself, but I also use bvckup2 that does block-level delta updates and that cuts down backup time to around 10 minutes. That's on a bit less than 1TB of data mostly comprising very large encrypted file containers and VMs. Worked pretty well so far, went through a couple of restores just fine, so no complaints.

[1] https://ccsiobench.com/s/fxkH0x

What's your use case for such a strict backup regime?

I wouldn't call it strict. It still means an average of 15 min of work data lost in case of primary storage collapse.

Certainly local backup has benefits but where it falls down in practice is that it requires manual intervention from the user. Also, if you don't store the backup media in a safe place (preferably multi-homed) then an event that causes loss of the original may also destroy the backup.

Windows user here. I use the built-in robocopy utility to sync files to one of 2 external hard drives. One hard drive stays at work, the other at home, and I switch them periodically (usually, right after downloading new family photos onto the computer).

With this method, a virus or something like a fire that only affected one location would not be able to take out all copies of my data and, at most, I'd lose the most recently backed-up information.

It's not 100% bullet-proof, but it's good enough for my personal purposes.

Not exactly answering your requirement (not using 3rd party) but I use crashplan for multiple reasons:

- they support all platforms (mac, linux, windows)

- an agent running on your system detects changes, encrypts data and transparently sends it to the cloud.

- They support client-side encryption (key or password). If you lose the password or key, you're screwed.

- they support unlimited versioning of back up files. If you screw your PPT or DOC - you can restore previous version(s).

- they support unlimited destination size (i'm already at 0.5TB and counting)

- they keep deleted files.

- they support removable media attached to your system (which helps me greatly as I do lots of hi-res photography where each image is 50MB. I already got 5TB external drive ready to be sync-ed)

- everything is $120/yr


- above price is per device. So if you need to backup 2 separated computers - you'll need to pay twice, etc...

Conclusion: I've been through enough semi-shitty geeky solutions and love crashplan because it's a no-brainer to set up and forget. Of course, if they suddenly go out of business or the cloud explodes or whatever, then you're screwed. But IMHO chances are this backup will work for me better than anything I tried before.

I have always kept it simple. A storage array in JBOD (non-raid) setup. I buy two of whatever drive(s) will contain the duplicated data and just do a clean mirror using whatever backup software you desire. I happen to use 4TB HGST drives in a Mediasonic PROBOX 4 tower hooked into a PCIe card in my computer that connects them all through one multichannel eSATA cable. The software I use is Allway Sync (free for up to a certain amount of backed up data per month, after that a one time fee for lifetime updates).

I do one additional thing, and it's for third-party storage. So if you're dead set on cutting out a third-party, I understand. But hear me out. If the data is encrypted on the third-party system, it's largely a non-risk in terms of privacy unless your filenames themselves put your privacy at risk. I am fine with my filenames being seen, but the content being encrypted. So what I do is use Boxcryptor to encrypt the "destination" drive and then I use pCloud as my third-party storage provider. Both pCloud and Boxcryptor show up as physical drives on your computer, which I find convenient. So I have my primary source drive synced with the Boxcryptor drive/location and Boxcryptor is set up on my machine to encrypt the data on the fly as it is copied (surprisingly, this is not the default setting). Then, the pCloud software is set up to automatically sync that Boxcryptor directory where the files show in their encrypted format (the "drive" version shows them unencrypted because everything is done on the fly both ways) to my pCloud storage. In the end, I have 3 versions: the unencrypted source, the encrypted Boxcryptor copy, and the pCloud backup which contains the encrypted copy.

Also important to note - pCloud is based in Germany and thus subject to the EU data protection laws and even without that I find pCloud to be a much lesser risk in terms of data privacy than any company in the US.

Just because nobody has mentioned it yet, there is a file format called zpaq [1] which is incremental and journaling with good support for linux, mac and windows. It supports multi-threaded compression and error detection and recovery.

[1] http://mattmahoney.net/dc/zpaq.html

While we are on the subject I can imagine a device that would be ideal for some of my clients: A small (deck of cards/paperback book) battery powered device that connects to the office wireless network and is left on the user's desk or in a drawer overnight (when network usage is low) that communicates with the office NAS and stores a backup. The next morning the user sticks it in her purse or briefcase and leaves another for the next night. The one taken home is plugged in to charge for the next time.

This still requires the user to remember this process but it is about as minimal as I can imagine for a local backup that is mostly kept offsite.

I'm thinking about this in particular for a local non-profit that has privacy concerns with storing their data with a 3rd party.

I'm considering putting something together using a Raspberry Pi or other SBC. Having the result be under $100 USD with 128GiB-256GiB of storage would probably be necessary for it to be acceptable.

Another thought that might make it even simpler (for the user) is to have the device download the data at a throttled rate so it could be done during the workday without impacting the network too much. This way it would be unnecessary to leave it overnight or perhaps even take it out of the purse, backpack, or briefcase. It would still be necessary for the user to remember to recharge it and maybe switch it out or rotate it with other devices.

I have a home server with a lot of storage; it's running ZFS with full redundancy. If you're in the US, wait a few months and you can buy an 8TB external HDD when it goes on sale again. WD EasyStore HDDs regularly go on sale [0].

For documents and important files I usually keep an extra copy with two or more cloud storage providers. It's unlikely that these companies would suffer data loss, so the most likely cause for losing access to those files would probably be if they closed my account. Using multiple services reduces the chances of losing access to your files, since it's unlikely you'd be banned from all of them at the same time.

[0] https://www.reddit.com/r/DataHoarder/comments/7fx0i0/wd_easy...

I have a core file server that has a 9TB RAID-Z array in it. This server uses key-based authentication to connect to all of my other hosts - my desktop, VPS, work systems, etc and uses rsync to copy any changes to a locally maintained copy. It also attempts to back up laptops, if they are left on overnight.

Every night this system also reaches out to a remote system in my office that has an 8TB RAID-5 array. It "pushes" any changes, again using rsync, to this remote host. This time it uses the --link-dest argument to create nightly "snapshots." If a file hasn't changed, it simply gets hardlinked. This backup volume is also encrypted, and only the core file server has the key to decrypt it.
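For anyone who hasn't seen the hard-link snapshot idea before, here is a minimal pure-Python sketch of it. It is not rsync and not the script described above: the `snapshot` function and directory names are invented for illustration, it compares file contents (rsync by default compares size and modification time), and it ignores symlinks, permissions and deletions of the source mid-run.

```python
import filecmp
import os
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def snapshot(source: Path, previous: Optional[Path], dest: Path) -> None:
    """Copy `source` into `dest`, hard-linking files unchanged since `previous`."""
    for root, _dirs, files in os.walk(source):
        rel = Path(root).relative_to(source)
        (dest / rel).mkdir(parents=True, exist_ok=True)
        for name in files:
            src = Path(root) / name
            new = dest / rel / name
            old = previous / rel / name if previous is not None else None
            if old is not None and old.is_file() and filecmp.cmp(src, old, shallow=False):
                os.link(old, new)  # unchanged: both snapshots share one copy on disk
            else:
                shutil.copy2(src, new)  # new or changed: store a fresh copy


# Demo in a throwaway directory: one file stays the same, one changes.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    src = base / "src"
    src.mkdir()
    (src / "same.txt").write_text("unchanged")
    (src / "edit.txt").write_text("v1")
    snapshot(src, None, base / "night1")
    (src / "edit.txt").write_text("v2")
    snapshot(src, base / "night1", base / "night2")

    same_linked = os.path.samefile(base / "night1" / "same.txt", base / "night2" / "same.txt")
    edit_linked = os.path.samefile(base / "night1" / "edit.txt", base / "night2" / "edit.txt")
    old_version = (base / "night1" / "edit.txt").read_text()

print(same_linked, edit_linked, old_version)  # prints: True False v1
```

Each nightly directory looks like a full copy, but unchanged files cost no extra space, and deleting an old snapshot never affects the newer ones.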

For the remote system, I have a script that runs periodically that purges backups according to my own specification. Right now it keeps unlimited yearly backups, three years of monthly backups, six months of weekly backups, and 14 days of daily backups.
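A retention policy like this one can be sketched in a few lines of Python. This is only an illustration, not the script mentioned above: it assumes one snapshot per calendar date, keeps the oldest snapshot in each week/month/year bucket, and approximates "six months" as 183 days and "three years" as 3 × 365 days.

```python
from datetime import date, timedelta
from typing import Iterable, Set


def snapshots_to_keep(snapshots: Iterable[date], today: date) -> Set[date]:
    """Pick the snapshot dates to keep; everything else may be purged."""
    keep: Set[date] = set()
    weeks, months, years = set(), set(), set()
    for day in sorted(snapshots):  # oldest first, so the oldest snapshot in each bucket wins
        age = (today - day).days
        if age < 0:
            continue  # ignore snapshots dated in the future
        if age <= 14:
            keep.add(day)  # daily tier: last 14 days
        week = tuple(day.isocalendar())[:2]  # (ISO year, ISO week number)
        if age <= 183 and week not in weeks:
            weeks.add(week)
            keep.add(day)  # weekly tier: roughly six months
        month = (day.year, day.month)
        if age <= 3 * 365 and month not in months:
            months.add(month)
            keep.add(day)  # monthly tier: roughly three years
        if day.year not in years:
            years.add(day.year)
            keep.add(day)  # yearly tier: kept forever
    return keep


# Demo with made-up data: one snapshot per day from 2014-01-01 to 2018-07-24.
today = date(2018, 7, 24)
start = date(2014, 1, 1)
all_days = [start + timedelta(days=n) for n in range((today - start).days + 1)]
keep = snapshots_to_keep(all_days, today)
purge = sorted(set(all_days) - keep)
print(f"{len(all_days)} snapshots, keeping {len(keep)}, purging {len(purge)}")
```

Keeping the oldest snapshot per bucket means a snapshot that has been promoted to a tier stays put as new ones arrive; keeping the newest per bucket is an equally valid choice.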

Once a month I also run a script to back up the core file server to an external 8TB hard drive, again with --link-dest, creating snapshots. This hard drive is stored in a fireproof safe. It is also encrypted.

I have various levels of redundancy elsewhere, too. (Yes, RAID/redundancy is not a backup). For example, my desktop system has a RAID1 array in it.

In 2000, I installed Windows ME on my desktop system. It corrupted my hard drive somehow, and I lost all of my work prior to then. In case you can't tell, this scarred me for life, and I have worked hard to make sure it cannot happen again.

I'll add, every single storage device that I use is encrypted. Sometimes it's just with a key stored on a separate boot disk (usually an SSD). The rationale being if I ever have to RMA a disk, I don't want the manufacturer to be able to see any of my data.

On another tangent, the remote system mentioned above is actually a rather old GX260 with a SATA expansion card and 3 4TB disks. The boot disk is an ancient EIDE 40GB drive. I keep it around mostly for entertainment at this point. Here's why:

  9 Power_On_Hours          0x0012   001   001   001    Old_age   Always   FAILING_NOW 123756
That's right, the drive has been spinning for a little over 14 years.
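The arithmetic behind that figure, as a quick sketch. It assumes the raw value in the last column is plain hours and uses a 365.25-day year; the helper name is made up for illustration.

```python
SMART_LINE = "  9 Power_On_Hours          0x0012   001   001   001    Old_age   Always   FAILING_NOW 123756"

HOURS_PER_YEAR = 24 * 365.25  # assumption: average year length including leap days


def power_on_years(smart_line: str) -> float:
    """Convert the raw value (last column) of a Power_On_Hours line to years."""
    raw_hours = int(smart_line.split()[-1])
    return raw_hours / HOURS_PER_YEAR


print(f"{power_on_years(SMART_LINE):.1f} years")  # prints "14.1 years"
```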

Unfortunately I've recently hit a problem where e2fsck runs out of memory (it's a 32-bit CPU, so only 2GiB per process) checking the file system. I tried setting a scratch file, but that bombed for some reason too. So, I may finally have to upgrade the thing to a 64-bit system.

Oh well.

I've done exactly that with a couple of Transcend USB sticks and managed to kill one by deleting some files while the rsync was running (to free up enough space for the rsync to finish). I think the controllers on the sticks are just not as good as those in SSDs. At least it was an obvious problem (the drive was completely unresponsive after that), but I wouldn't make that your only copy.

The main thing I have been doing is using old hard drives, each in an inexpensive case[0] that can easily be attached to a usb to SATA/IDE adaptor[1] (there are others to consider if you don't have IDE drives, but that one seems to work well and not all of them do; for SATA only look at UASP). You might think that old drives are more likely to die randomly, but I haven't had that happen once yet (good to keep a couple of copies of stuff you care about).

[0] https://www.fasttech.com/p/1645000

[1] https://www.fasttech.com/p/8292100

I sometimes use rsync but also have an archive script that a) runs mtree with sha512 per file and leaves the file in the directory being archived, b) creates a .tar.xz with SHA256 checksum in the xz (except for already compressed stuff, then I either rsync or use gzip with a separate SHA256 sum), and c) encrypts most things via scrypt -t 5 before hitting unencrypted external disk (ext2 with limited features for maximum portability). Not exactly push-a-button level convenience but it works ok for me.
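A simplified sketch of steps a) and b) using only the Python standard library. It swaps in a plain SHA-512 manifest for mtree and a separate SHA-256 file for the checksum embedded in the xz stream, and it leaves out the scrypt encryption step entirely. The file names `MANIFEST.sha512` and `*.sha256` are assumptions.

```python
import hashlib
import tarfile
from pathlib import Path


def file_digest(path: Path, algo: str) -> str:
    """Hash a file in 1 MiB chunks so large files do not fill memory."""
    h = hashlib.new(algo)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def archive_dir(src: Path, out_dir: Path) -> Path:
    """Write a per-file manifest into src, then archive src into out_dir."""
    # a) Per-file SHA-512 manifest, left inside the directory being archived.
    manifest = src / "MANIFEST.sha512"
    files = sorted(p for p in src.rglob("*") if p.is_file() and p != manifest)
    manifest.write_text("".join(
        f"{file_digest(p, 'sha512')}  {p.relative_to(src).as_posix()}\n"
        for p in files
    ))
    # b) xz-compressed tarball plus a separate SHA-256 of the whole archive.
    archive = out_dir / f"{src.name}.tar.xz"
    with tarfile.open(archive, "w:xz") as tar:
        tar.add(src, arcname=src.name)
    checksum = out_dir / (archive.name + ".sha256")
    checksum.write_text(f"{file_digest(archive, 'sha256')}  {archive.name}\n")
    return archive
```

Both checksum files use the "digest, two spaces, name" layout, so they should be checkable later with `sha512sum -c` and `sha256sum -c`. `out_dir` must sit outside `src`, otherwise the archive would try to include itself.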

If you just use basic rsync you can end up copying corrupted files from your system over the backup before noticing that they are corrupted. I think there are various ways to avoid this, so far I just have multiple copies that I update at different times but there are likely better ways.
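One possible way to catch this before syncing, sketched under assumptions: keep a manifest of modification times and SHA-256 hashes from the previous run, and flag any file whose contents changed although its mtime did not. That is a heuristic for silent corruption, not a guarantee, and the function names are made up.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple

# relative path -> (mtime in nanoseconds, SHA-256 hex digest)
Manifest = Dict[str, Tuple[int, str]]


def scan(root: Path) -> Manifest:
    """Record mtime and content hash for every file under root."""
    manifest: Manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # read_bytes() is fine for a sketch; stream large files in practice.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = (path.stat().st_mtime_ns, digest)
    return manifest


def suspicious(old: Manifest, new: Manifest) -> List[str]:
    """Files whose contents changed although their mtime did not."""
    return sorted(
        name for name, (mtime, digest) in new.items()
        if name in old and old[name][0] == mtime and old[name][1] != digest
    )
```

The idea would be to run `scan` before each rsync, compare against the manifest saved last time, and skip the sync if `suspicious` returns anything.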

Tarsnap seems like a good option for stuff you wouldn't want to lose in a fire, but I haven't used it so far.

I use Back In Time on Linux (which uses rsync) to back up data to a USB external drive. I backup critical data on 2 USB sticks one of which I carry with me at all times, the other one in a bank locker (this version of backup runs a little behind since I have to make trips to the bank). Not the most convenient way to make backups but I find it an acceptable trade off for me.


Pros:

1. No data in the cloud.

2. Multiple copies of the backup, one of which is outside the house (to guard against fire etc.)

3. I have the most critical data with me at all times.

Cons:

1. Data still in the same city (does not guard against a major earthquake, flood etc.)

2. Manual backups (not automated on a schedule, though that can be set up on a home network; I just haven't bothered).

ZFS snapshots and clones are like git for your entire filesystem. ZFS replication over ssh is like rsync but more efficient. ZFS supports multiple compression methods so you can optimize for latency or compression ratio. Zpools let you add new disks and grow your filesystem without having to mess around reformatting. Zpools support various redundancy configurations so you can survive disk failures. It also supports checksumming so you can monitor and prevent data corruption.


A 3TB main hard disk HD01. It has everything of mine from 1998, photos, docs, code, media (self-generated) books everything. All things sorted by type in folders (Media, Own Data, Books, Video, Cloud(Dropbox, Yandex, Mega, Drive)).

An exact mirror of the above HD01, labelled as HD02. Mirror updated every 20-30 days.

An exact mirror of the above HD01 in the cloud (Yandex & Mega). Mirror updated every 20-30 days. This mirror is not downloaded back to the PC, to avoid recursion (cloud on hard disk and hard disk to cloud).

Few working folders on PC, worked upon, work in progress, mirrored back to their original place on HD01 every week or 10 days.

Stuff from two phones and one tablet gets synced to Dropbox, Yandex & Mega daily or hourly via the FolderSync app (the WhatsApp backup is synced daily, the Camera folder as soon as possible). This gets downloaded to the PC via the Dropbox or other Windows clients. This whole cloud gets synced to HD01 every week or 10 days. The PC's Chrome download folder also gets mirrored to the cloud very often.

Stuff on gitlab, github & tumblr gets downloaded 1st of every month to Google Drive via Google Apps Script with an email report of log. This also gets downloaded to PC via Drive windows app. This Drive folder is synced to HD01 in Cloud folder.

All sync profiles on PC are setup in FreeFileSync with Drive Labels, Exclusions, Inclusions, Filters & limits.

HD02 is at different house, brought in only for updating monthly.

At your scales, bluray discs are probably the most affordable per GiB year (looks like 2.5T spread out across 100x discs is about $80 right now and should last 15-25 years or more). Double or triple your data and you should just use cold hard drives. 10x or more and you should probably use magnetic tapes. Mirror everything 2-3x and do yearly integrity checks (I just write the SHA of my discs on the disc label) and your data should last a good long time.

Thermaltake makes what I lovingly call a hard drive toaster. It’s a SATA to USB disk caddy that’s vertical so gravity holds the drive in. I use all my older .75-2TB drives and Time Machine, which you can set up to alternate backups to multiple drives. I should rotate them out to a deposit box but most of my best stuff is out on the internet anyway.

We used a similar setup - for some reason the drives failed very often.

* Two 1 TB hard drives, one for each MBP. I use Time Machine

* A dozen or so Flash drives containing our wedding pictures and current copies of our taxes in locked / encrypted volumes.

* Facebook & Google photos for our favorites from vacations

* 4 hard drives scattered throughout the house and 1 in our safe

* Typically I bring a flash drive with me when we travel and leave one in the car at the airport

I recently converted all my Linux boxes to use BTRFS filesystems. That allows for snapshots of each subvolume, and you can even send them (incrementally) to another btrfs drive. Last weekend I also set up btrbk, which gives you an environment to automate the process. It lets you snapshot any subvolume on a schedule you configure, and then send backups to any destination you specify via ssh, or even to USB drives. Since these are incremental, it uses very little bandwidth. It even has an archive function for long-term storage, and just like the other backups you can define individual retention policies for each location. Did I mention it also supports non-btrfs targets, encryption, and both host/server initiated backups? Awesome. Next weekend I'll set up encrypted archives to the cloud...

If you do that (just rsync every day), what happens when your primary storage gets corrupted, and you rsync the corrupted copy over your USB backup?

Much better to do some sort of incremental or differential backup.

As someone else said, borg is a good way of doing this.

Given you're using cloud storage, you could use rclone to sync (one way, from cloud to local) to a local disk, and then use borg (on your local machine) to backup the local copies to a second drive.

Of course, it's best to have this rclone+borg setup in two locations (e.g. second location can be parents' house or kids' house) so that it's very very unlikely that the borg repo(s) get corrupted due to user error or software malfunction.

borg is awesome, I once happened to talk to the developer of this. He said sometimes people report bugs to him and it turns out these are hardware errors that the integrity check of it found...

I have an encfs folder in Dropbox. It works reasonably well, but I don't have any automatic process writing to it from two different machines, since that may cause trouble. That gives me privacy and keeps the most important files always backed up.

For a bulk backup, once per month I bring one of the 3 portable HDDs from another place, attach it to my server and start my custom backup script that saves encrypted zips of sensitive directories and rsync-backup of everything else to it. At any time 2 or 3 HDDs are outside the house.

HP MicroServer with FreeNAS at home, it pulls down backups from my vservers and pushes encrypted parts of local data back there.

Every few months I manually sync most of the NAS onto an external, encrypted HDD (just bought a fresh 3 TB one) and that will be stashed in other parts of town/out of country with family. It's my last line of defense should the apartment burn down or something. Data might be half a year old, but the last 20 years are mostly there. Also it's not too inconvenient.

Many options here for backups, but I didn't see most of them touch upon backup integrity verification and losing information through bad backups (any of the storage devices/locations suffering from bitrot or other corruption that silently stays and gets propagated) that may not be noticed until it's too late (during restore time, which may happen quite rarely).

So, how do you know your backups are good and that what you're backing up is good?

I use git-annex to do pretty much what you're describing. It takes some getting used to, but it's much better than your suggested method of rsyncing to some USB sticks.

Under the hood it's basically doing that, but everything's SHA-256 checksummed so you can check its validity, and all copies of your data also sync metadata telling them who else has copies, so it's easy to enforce policies such as requiring that some content has >N copies.

I use github for all my private code, and then I backup my entire workspace as an encrypted 7zip file to Google Drive 2-3 times a year (about 10 gigs). There are plenty of services that do client side encryption + cloud service, but Google Drive is probably the only one you can do completely for free.

As a private user, I just use either Clonezilla Live or an old version of Norton Ghost, and burn that to media (DVD-RW lately, it used to be CD-RW until that became unfeasible, and before that floppies). I prefer the RWs because that means I can use the discs from 2 or 3 images ago and recycle them.

Do it the old fashioned way. Full backup to a hard drive, store that offsite and do incremental backups on USB sticks. There are dozens of good tools.

Personally, I manage media separately from normal files, and use hard drives for media, sticks for files.

It’s easier, cheaper and more secure than any backup service.

I’m just going to speak to longevity. If you are worried about the longevity of your data I would keep it simple.

1. Primary data storage

2. Local online backup

3. Off-site backup (rotate/sync regularly)

4. Online backup - Backblaze

All hosts backed up to a NAS w RAID 1. I have an extra disk that gets hot swapped every 3-6 months and it goes into my safety deposit box. No encryption although I might add that after reading this thread.

I use duplicity to create and encrypt backups. To store them I use two USB SSD drives (in two separate locations); for off-site I use AWS Glacier.

The backups are scheduled with jobs in self-hosted Jenkins instances.

We use a Synology NAS (with OpenVPN support). It offers hardware RAID 1 as well as encryption. We backed up to it for ~4 years and it served its purpose.

Roughly: my $HOME (400 GB) is rsync'ed to an external RAID ( https://www.amazon.com/Oyen-Digital-MobiusTM-FireWire-Enclos... ), my preferred media is burned to DVD-R, a few specific files go to Dropbox and/or gdrive, and all my source code sits on Bitbucket.

Backblaze for my desktop backup.

Then for large archival files I use: https://www.backblaze.com/b2/cloud-storage.html

And just use Cyberduck to move the files over. Super cheap and gets the job done. Beauty of B2 is you can use a plethora of clients to move files: https://www.backblaze.com/b2/integrations.html

My upload bandwidth is low enough to make offsite backup not practical (~1mbps). Anyone else with the same issue? How do you handle it?

You can have two encrypted external HDDs and keep one with you and the other at a friend's or family member's house. Rotate frequently.
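For a sense of scale on the ~1 Mbps uplink, here is a back-of-the-envelope sketch. It assumes the ~700 GB from the original question, decimal units, and a link that stays saturated the whole time, which is optimistic.

```python
def upload_days(size_gb: float, uplink_mbps: float) -> float:
    """Days needed to upload size_gb (decimal GB) at a sustained uplink_mbps."""
    bits = size_gb * 8e9
    seconds = bits / (uplink_mbps * 1e6)
    return seconds / 86_400


print(f"{upload_days(700, 1):.0f} days")  # prints "65 days"
```

An initial full upload at that rate takes about two months, which is why rotating physical drives is the usual answer here.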

A bunch of RAID disks at home. If you value your data, you should own your data.

RAID isn't a backup

Context. The question was "how do you backup your files." My answer: on my RAID system at home. The RAID system IS the backup.

how about a good old hard disk!?

restic to Google Cloud Storage. Their Coldline storage costs a few euros per month for ~1 TB. And I can easily change backends if I want; restic supports many options.

The only problem is low upload speeds for residential customers.

I have multiple harddrives synced on a daily basis.

Are the hard drives stored in the same location?

Onsite NAS Google Drive

So.. a NAS? AFAIK you can't self-host google drive.

I mentally added a comma between “on-site NAS” and “Google Drive”. Might be wrong.
