
Ask HN: What are your personal backup / sync strategies? - jondot
Hi all,
Due to a recent data loss, my need for a solid, foolproof way to back up has resurrected itself.

I have:

* Google Drive (100GB tier, $2/mo)

* Amazon Cloud Drive (unlimited/any-file tier)

EDIT: I also have my home Synology NAS

My workflow/strategy that failed:

* GDrive syncs only the root files and not all folders. I keep mainly notes there.

* Once in a while I upload RAW image files from my Nikon DSLR to Amazon Cloud Drive.

* Once a month I connect my external HDD to do a Time Machine backup.

* Most of my personal projects are on GitHub and Bitbucket, and I commit and push in a sane way.

* I have 'experiments' and 'projects' folders in which I do most of my work. An 'experiment' is a project that is incubating: it doesn't have a git repo or presence. A 'project' is something that's fully maintained, has a git repo, and that I've officially decided to invest myself in.

During the HDD failure I lost around 2-3 weeks of hard work, which I spent my vacation redoing. Most of the work was POCs that I didn't consider a "project", plus commits I hadn't pushed yet (no special reason). I worked on many projects in parallel.

My thoughts right now:

* Should everything live within a git repo? (But then I need to force myself to push all the time.)

* Should everything live on Google Drive? My entire home folder? Anyone doing this?

* Then again, what should be on Amazon Cloud Drive? It is unlimited..

Please share your backup strategy if you can :)

Thanks!
======
derekp7
For me, the method I've found most comfortable is snapshot-style whole-disk
backups. The common go-to solution is rsync snapshots (a la rsnapshot or
similar utilities), which I've used in the past. However, this only does
partial deduplication (it deduplicates files that are at the same path from
one backup to the next), and it doesn't compress. To remedy that I wrote
Snebu ([http://www.snebu.com](http://www.snebu.com)), which stores
lzo-compressed files on the target using the sha1 hash as the file name
(somewhat like Git does), and has an SQLite DB to store the metadata (file
paths, ownership, backup set name/date, etc.). You therefore get full file-level dedup
even across multiple clients writing to the same backend. On the client side,
it uses standard GNU find and tar, so you can use it on Windows with the help
of Cygwin. Each backup set has a retention schedule label (daily, weekly, etc)
to make it easy to selectively purge old backups (i.e., keep the current 10
daily, 6 weekly, and 12 monthly backups).
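The content-addressed layout described above (compressed blobs named by their sha1 hash, metadata in SQLite) can be sketched roughly like this. This is a toy illustration, not Snebu's actual code: gzip stands in for lzo, and the table schema is invented:

```python
import gzip
import hashlib
import os
import sqlite3

def store_file(path, store_dir, db):
    """Store one file content-addressed: the compressed blob is named by the
    sha1 of its contents, and the path-to-hash mapping goes in SQLite."""
    with open(path, "rb") as f:
        data = f.read()
    digest = hashlib.sha1(data).hexdigest()
    blob = os.path.join(store_dir, digest)
    if not os.path.exists(blob):          # dedup: identical content stored once
        with gzip.open(blob, "wb") as f:  # gzip here; Snebu itself uses lzo
            f.write(data)
    db.execute("INSERT INTO files (path, sha1) VALUES (?, ?)", (path, digest))
    return digest
```

Because the blob name is derived from the content, two clients backing up the same file naturally share one stored copy, which is where the cross-client dedup comes from.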

I've got the client-side script tweaked so that you can run it as a standalone
setup (backing up to a local USB drive), back up to a remote Linux server via a
non-privileged account, or have the remote backup server pull backups from
your clients. The features that are currently in development for the next
version include client-side encryption, and pluggable backend replication, so
you can replicate your backup sets to cloud providers, or tape, or to another
Snebu installation.
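The retention labels mentioned earlier (keep the current 10 daily, 6 weekly, and 12 monthly sets) amount to a simple pruning rule. A rough sketch, with an invented in-memory representation of backup sets:

```python
def prune(backups, keep=None):
    """backups: list of (timestamp, label) pairs.
    Return the timestamps to delete, keeping only the newest keep[label]
    sets for each retention label (labels not listed keep nothing)."""
    keep = keep or {"daily": 10, "weekly": 6, "monthly": 12}
    by_label = {}
    for ts, label in sorted(backups, reverse=True):   # newest first
        by_label.setdefault(label, []).append(ts)
    return sorted(ts for label, stamps in by_label.items()
                  for ts in stamps[keep.get(label, 0):])
```

The real tool presumably does this against the SQLite metadata rather than an in-memory list, but the keep-N-newest-per-label logic is the same idea.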

I could definitely use some more feedback, and I probably need to do another
round of documentation improvements, but overall the code seems quite stable.
The project site is up on GitHub (github.com/derekp7/snebu) if you are
interested in contributing.

------
CamTin
I can't recommend Tarsnap
([https://www.tarsnap.com/](https://www.tarsnap.com/)) enough for personal
backups for hacker types. Everything is encrypted client-side, so if you lose
your key there is literally nothing you can do to get at your data again.
It's also deduplicated and compressed, so you can take snapshots often (on my
machines a cron job runs every six hours) without going broke, since
(similar to Git) you're only storing changes.
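A six-hourly cron job like the one described just needs a uniquely named archive per run. A minimal sketch (the archive-name scheme and paths are invented, though `tarsnap -c -f NAME path...` is the real create syntax; key and cache configuration are assumed to be in place):

```python
import datetime
import subprocess

def snapshot_cmd(paths, prefix="home", now=None):
    """Build the tarsnap invocation for one timestamped snapshot.
    Archive names must be unique, so the timestamp doubles as the snapshot id."""
    now = now or datetime.datetime.now()
    name = "{}-{}".format(prefix, now.strftime("%Y%m%d-%H%M"))
    return ["tarsnap", "-c", "-f", name] + list(paths)

if __name__ == "__main__":
    # e.g. invoked from a cron entry such as: 0 */6 * * * /usr/bin/python3 snapshot.py
    subprocess.run(snapshot_cmd(["/home/me"]), check=True)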

~~~
walterbell
How does dedupe work with client-side encryption -- are the snapshots and
diffs done on the client before encryption and upload?

~~~
CamTin
Yes, exactly. The client source is available too if you like, though it's not
really free software, since the license only allows you to use it with the
official Tarsnap service.

~~~
walterbell
Sounds useful; I need to look for FOSS software that can do the same for local
storage targets.

~~~
derekp7
As soon as I finish the client-side encryption module for Snebu, would anyone
be willing to give it a quick audit? I want to make sure I'm not making any
critical mistakes (I will be using external libraries, such as openssl for
encryption, but still...)

------
rwhitman
Personally, I have a NAS that gets a Time Machine backup every day. I have a
handful of important folders on my hard drive that I mirror to the NAS
manually (various scheduled solutions have failed me, unfortunately). Every few
months or so I do some housecleaning: I take older projects and personal files
off my laptop HD and archive them permanently on the backup drive (which is
RAID and mirrored).
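Manually mirroring a handful of folders is often just an rsync loop. A hedged sketch (the folder list and NAS destination are invented):

```python
import subprocess

def mirror_cmds(folders, dest):
    """Build one rsync command per folder: -a preserves metadata, --delete
    makes the NAS copy an exact mirror (files removed locally disappear too)."""
    return [["rsync", "-a", "--delete", folder, dest] for folder in folders]

if __name__ == "__main__":
    # Invented example destination; run each mirror in turn.
    for cmd in mirror_cmds(["~/Documents", "~/Projects"],
                           "nas.local:/volume1/backup/"):
        subprocess.run(cmd, check=True)
```

Note that `--delete` means an accidental local deletion is mirrored to the NAS too, which is exactly why keeping the archived year-end drives (below) pays off.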

I have cloud services for anything I share with clients or friends, but I
honestly don't trust them for anything critical. I actually back up Dropbox to
the NAS too. The problem is that the sync strategy across multiple clients can
wipe a folder shared between them. It's also a beast to recover from a cloud
backup.

The trick I'm most proud of is that at the end of each year I buy a new
external backup drive (now a NAS), copy the old one to the new drive, and then
label the old drive with the year and store it somewhere safe. It costs a few
dollars each year to archive the drives, but it's well worth it: they have
saved my butt more than once. I also buy a new laptop workstation every 2 years
or so and preemptively migrate everything to the new machine before the old
machine dies, and keep the old one around as a spare.

------
smt88
My code syncs to Dropbox. All of my projects (even experiments) are committed
to private Github repos. I have two machines, so Dropbox essentially creates a
physical backup on my secondary machine, which is nice.

I also have scheduled backups to an external drive, which enables file history
as well.

So for me to lose my files, my main machine, my secondary machine, my external
drive, GitHub, and Dropbox would all have to lose my data before I could
restore from any one of those locations (and all of them have incremental
backups enabled, so I keep my entire history of changes).

Edit: Most of my clients and several of my organizations use Dropbox to send
secret stuff, so Dropbox is already a single point-of-failure for my security.
However, I am planning to move to something self-hosted and encrypted at some
point. I'd love to put encrypted snapshots in Glacier from time to time and
then have my main backups somewhere more accessible.

~~~
jondot
Thanks - so your code lives on both Dropbox and GitHub? Also - you presumably
ln -s'd your workspace folder into your Dropbox folder?

~~~
smt88
Every single time I hit save, Dropbox saves another copy of the file, so I
have a complete file history. Github only has the commits.

I don't actually know how Dropbox does with symlinks. I just put my workspace
folder in my Dropbox folder.

~~~
andymurd
Dropbox works fine with symlinks, at least on Ubuntu. I do exactly what the GP
mentions.

------
NeutronBoy
Local fileserver with RAID as the primary store for files and other random
backups from various machines on my network. This is synced to Tarsnap (highly
recommended) every night. Additionally, an encrypted USB key is used for other
very important files (KeePass DB, Tarsnap keys, etc.) on an ad-hoc basis, given
that these don't change very often.

The key is to keep it simple and easy.

------
butwhy
I'm scared to back up the bulk of my data because I don't want to put it on
Dropbox or S3, since they could get hacked or otherwise divulge it to law
enforcement. And encrypting it all first is a pain.

~~~
auxym
Similar here; I don't really want to do a full backup of all my personal files
to Dropbox.

Right now I only do a local backup to a USB HD. However, I'm considering
getting an Amazon EBS volume and setting up a Ruby script or something to spin
up a micro server, compress and encrypt my files, and send them out to the AWS
volume.
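The compress-then-encrypt step being considered can be sketched with the standard library plus gpg. This is a hedged outline under stated assumptions: the recipient name and paths are invented, gpg is assumed to be installed, and the actual upload to the AWS volume is left out:

```python
import os
import subprocess
import tarfile

def pack(src_dir, out_path):
    """Compress a directory into a .tar.gz archive: the step before
    encrypting and shipping it off-site."""
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(src_dir, arcname=os.path.basename(src_dir))
    return out_path

def encrypt(path, recipient):
    """Encrypt the archive with gpg before upload (assumes gpg is installed
    and `recipient` is a key in your keyring); writes path + '.gpg'."""
    subprocess.run(["gpg", "--encrypt", "--recipient", recipient, path],
                   check=True)
    return path + ".gpg"
```

Encrypting client-side like this is what keeps the key out of the provider's hands, which is the whole objection to server-side-key services raised below.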

Backblaze/CrashPlan is cheaper, but they use a server-side key, which is
pretty pointless. Glacier would be cheaper but seems more complicated to set
up.

~~~
charsplat
FYI: CrashPlan is pretty cheap and purports to have client-side encryption.
They are closed source, but they are set up to let you provide a key in the
backup client.

~~~
AndrewAtCode42
We do have client-side encryption. You can check out our document outlining
the features here:
[https://support.code42.com/CrashPlan/Latest/Configuring/Arch...](https://support.code42.com/CrashPlan/Latest/Configuring/Archive_Encryption_Key_Security)

------
adrianhoward
For our "internal" stuff:

* Monthly whole drive snapshots to an external HD

* Daily Time Machine backups to an external HD

* Everything backed up continually with CrashPlan

* All work stuff synced continuously in DropBox

* Monthly restore-a-random-file test to sanity check

~~~
walterbell
Are your whole drive snapshots deduplicated?

~~~
adrianhoward
No. But I only keep the last four months.

