
Time-Machine-style backup with rsync - plessthanpt05
https://github.com/laurent22/rsync-time-backup
======
quesera
_The_ key feature of Time Machine is hard links to directories -- which is
only possible on modern HFS+ (and rsync doesn't even try). Some people like
the UI too, of course.

Without hard linked directories, a full --link-dest backup of a decent sized
disk, with zero file changes from the previous pass, can easily consume 100MB
(and take 45 minutes to perform).

This disk consumption might seem insignificant, today, but that's 2.4GB per
day if you run a standard Time Machine equivalent backup schedule. Of course
you might not choose to do that, because the previous hour's backup would only
finish 15 mins before the next one started, which is insane.

These numbers are from direct experience on a 2TB, approximately 60% utilized
source drive.

That said, I use rsync, not Time Machine, for my OSX backups. You'll want a
few additional switches for HFS+, and if your target drive is HFS+ also, make
sure you turn OFF "ignore ownership on this volume" in the Finder... but the
script posted here has the right general idea. Somewhere on my project list is
adding directory hardlinking to rsync.

~~~
laurent123456
The --link-dest option of rsync, used in this project, actually creates hard
links.

~~~
quesera
...to files, but not to directories, which is the key point.

------
ta_tmachine
When is this trend of "like Time Machine" backup software going to stop?

Time Machine, as can be seen at
[http://www.apple.com/support/timemachine/](http://www.apple.com/support/timemachine/),
is tightly integrated into the OS and provides a self-explanatory interface
and user experience.

This github page is for a wrapper shell script around rsync, which is _not_
like time machine.

This has been going on for a while now; see TimeVault
([https://wiki.ubuntu.com/TimeVault](https://wiki.ubuntu.com/TimeVault)),
Back In Time ([http://backintime.le-web.org/](http://backintime.le-web.org/))
or FlyBack ([http://www.flyback-project.org/](http://www.flyback-project.org/)).

Please stop telling us your backup solution is like Time Machine when it
lacks the kind of UX Time Machine offers, thanks!

~~~
hnha
Please elaborate on what you are missing from the wanna-bes.

~~~
pudquick
I would guess something like this:

[http://www.youtube.com/watch?v=RDPzVdohrck#t=1m13s](http://www.youtube.com/watch?v=RDPzVdohrck#t=1m13s)

Oh, and integration with the OS X recovery partition / OS reinstallation
mechanism that allows you to point to a Time Machine backup as the recovery
point for your re-installation.

~~~
icarus_drowning
The integration with OS reinstall works very well, and is pretty seamless from
the user's perspective. I used to do two backups-- a local TM backup and a
separate cloud backup, but I found it was actually easier to just use
Automator to mount my TM volume once a day, image it, and send that up to
the cloud. When my TM volume failed last year, I just pulled the latest image
and put it on a replacement drive, and I was back up and running.

TM has had a few problems, but by and large it is one of the quiet successes
in OS X, and probably my favorite feature of the OS. Why Microsoft hasn't put
something like it in Windows is baffling to me.

~~~
UntitledNo4
Could you please share how you did that in Automator? I'd love to do that to
my backups.

~~~
icarus_drowning
I found that just running an Automator script for the "new disk image from
selection" command on the root of the drive worked perfectly-- set an iCal
event to run that script once a day, and you're done.

Be sure to test this to make sure it restores, but in my case it works
flawlessly.
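
For anyone who'd rather script this step directly than go through Automator, macOS's hdiutil can do the same thing from the shell (a sketch; the volume path and image name are illustrative):

```shell
#!/bin/sh
# macOS only. Create a compressed, read-only disk image of the TM volume,
# ready to upload. UDZO = zlib-compressed read-only image format.
hdiutil create -srcfolder /Volumes/TimeMachine \
    -format UDZO -ov "$HOME/tm-backup-$(date +%Y-%m-%d).dmg"
```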

~~~
UntitledNo4
Thanks!

------
molecule
This sounds a lot like rdiff-backup, which uses rsync and hard links to
provide incremental backups:

[http://rdiff-backup.nongnu.org/features.html](http://rdiff-backup.nongnu.org/features.html)

~~~
hannibal5
or rsync into zfs filesystem with snapshots.

~~~
unhammer
or rsync into (optionally LUKS-encrypted) btrfs filesystem with snapshots.
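
The rsync-into-snapshots pattern is compact enough to sketch (paths are illustrative; assumes the backup disk is formatted btrfs, `current` was created with `btrfs subvolume create`, and you have root):

```shell
#!/bin/sh
# Sync into a working subvolume, then freeze it as a read-only snapshot.
# Snapshots share unchanged blocks via COW, so each one costs ~nothing.
rsync -a --delete /home/ /mnt/backup/current/
btrfs subvolume snapshot -r /mnt/backup/current \
    "/mnt/backup/snapshots/$(date +%Y-%m-%d-%H%M)"
```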

------
whalesalad
This is really cool. For me, most of my data is "in the cloud" now. Important
and frequently-accessed project files are in Dropbox, code is on Github, and
most of my music is either streamed from the web in Spotify or my actual
library is stored/streamed via iTunes Match. Because of this, I actually don't
have a considerable amount of data to keep backed-up, and a lightweight non-
time-machine solution like this looks perfect. I tried rsync before but never
got into a solid routine.

I've become obsessed with the "12factor" app approach everywhere in my digital
life, so that if a device ever disappeared, was stolen, or died, I could get a
replacement fully operational without any problems. Like an app-server dying,
just launch a new one and it will bootstrap itself.

I'm kinda crazy with my new "homelab" and started it off with an old 2U
Poweredge I got on eBay for about $200. It's cheaper than a Synology/Drobo,
has room for 6 drives, and the dual quad-core processors + 16GB of RAM is
pretty cool too. It's running FreeNAS right now in a VM with a 3TB ZFS pool.
I've created an AFP share that appears to my mac as Time Machine and over my
gigabit-network it does a pretty fast backup. I really like the FreeNAS
software. It's open-source, runs on FreeBSD, and the UI/admin tool is built in
Django.

------
fbristow
This looks great, doesn't seem to need anything other than rsync installed.

You should also check out rsnapshot:
[http://www.rsnapshot.org/](http://www.rsnapshot.org/), it does a great job
and has many of the features that this script does.

~~~
jzawodn
Agreed. I've used rsnapshot for years and am not sure how this script really
differs (better or worse). I can say that rsnapshot is a real workhorse that
gives me a lot of peace of mind.

------
andmarios
I have written a similar script that I have been using for many years now.
These sorts of backups are very convenient.

I too started with hard links, but nowadays I prefer to format my backup disk
with btrfs and use btrfs snapshots instead, though I still support hard links.

I prefer btrfs snapshots due to their copy-on-write support: if I decide to
play with a backup, I won't mess up all the other versions of it. With hard
links you should never write to your existing backups.

Lately I added a helper script to mount remote filesystems and lock mysql
databases but it could be easier to use.

Most of my code is checks to make sure I won't write somewhere I shouldn't.

[https://github.com/andmarios/mrbStudio](https://github.com/andmarios/mrbStudio)

------
beagle3
As quesera noted below, on a not-so-big modern disk with 500,000 files, the
metadata can easily be in the 50-100MB range, which adds up to >1GB for
metadata (even when nothing has changed) if you back up every hour.

You should, however, consider bup
([https://github.com/bup/bup](https://github.com/bup/bup)) - it takes less
than a minute to figure out that nothing has changed, and it deduplicates
_parts_ of files (that is, if you have a 20GB virtual machine image and
you've changed one byte in the middle of it, the next snapshot is going to
take ~10KB, not 20GB). Older releases don't keep ownership/modification
times, but there's a new version pending release soon that does.

It also works well remotely (through ssh), can do integrity checks (bup
fsck) and add redundancy (using par2; important after deduplication), and it
has a fuse frontend that makes it all accessible as a file system, as well as
an ftp frontend.

bup is teh awesome.
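
The day-to-day workflow, for the curious (commands as documented by the bup project; the repository location, save name, and paths are illustrative):

```shell
#!/bin/sh
bup init                          # create the repository (default: ~/.bup)
bup index -ux /home/me            # walk the tree, record what changed
bup save -n laptop /home/me       # deduplicated snapshot into git packfiles

bup save -r user@host: -n laptop /home/me   # or save to a remote over ssh
bup fsck -g                       # -g: generate par2 recovery blocks
bup fuse /mnt/bup                 # browse all snapshots as a filesystem
```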

------
ilikejam
Missing some options for extended attributes (SELinux, ACLs) and (on OSX)
resource forks.

Also, if you're doing multiple backups of the same data to a filesystem over
time, it's worth doing 'cp -al' from the previous backup to the current backup
destination, then rsync over the top of that - that way multiple copies of
files which haven't changed don't take up any extra space.

~~~
tzs
> Also, if you're doing multiple backups of the same data to a filesystem over
> time, it's worth doing 'cp -al' from the previous backup to the current
> backup destination, then rsync over the top of that - that way multiple
> copies of files which haven't changed don't take up any extra space

Isn't that already taken care of by its use of rsync's --link-dest option?

~~~
ilikejam
Depends if the backup's on the same filesystem as the source.

~~~
greglindahl
You don't want to hard-link backups to the non-backup copy -- the non-backup
files might change, and if it is a change that doesn't remove the file first,
it will modify the backups if the backups are hard linked.

Doesn't vim intentionally do this if it edits files that are hard linked?
emacs breaks the hard link. I'm not saying either behavior is best, but in
this case one might be surprising!

~~~
ilikejam
Good call.

------
davidcollantes
Although useful, this is nothing like Time Machine.

------
scottlu2
Link-Backup does this as well, plus it knows how to build hard links to old
backups even when the directory structure or filenames change - effectively
de-dup support. It does this by building a content-addressable index on the
destination filesystem that backup trees hard-link against.

[http://www.scottlu.com/Content/Link-Backup.html](http://www.scottlu.com/Content/Link-Backup.html)
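
The content-addressable idea is simple enough to sketch in shell (a toy illustration of the concept, not Link-Backup's actual code; the function name is mine):

```shell
#!/bin/sh
# Store each file's content once, keyed by its hash; every entry in a backup
# tree is then a hard link into the index, so renames and moves cost nothing.
store_file() {      # store_file <srcfile> <indexdir> <destpath>
    sum=$(sha256sum "$1" | cut -d' ' -f1)
    [ -e "$2/$sum" ] || cp "$1" "$2/$sum"   # new content: one copy, ever
    mkdir -p "$(dirname "$3")"
    ln "$2/$sum" "$3"                       # known content: just a link
}
```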

------
bello
If you're looking for file snapshots and versioning, I've found Back In Time
([http://backintime.le-web.org/](http://backintime.le-web.org/)) to be
awesome.

~~~
ta_tmachine
Better yet, try zfs.

~~~
XorNot
ZFS solves the hard-link problem spectacularly well (and adds a whole bunch of
data-integrity verification on top of that) with snapshots.

What I don't like about that solution is that it can't dedupe (ZFS dedupe
ultimately just doesn't work very well). Hence my interest in (and, if anyone
checks my post history, my constant spruiking of) bup - which does efficient
dedupe and outputs git pack-files. Stick that on a ZFS volume with snapshots,
and you've got block-level checksummed, versioned and deduplicated backups.

What it's all missing, of course, is a pleasant interface to use it with (one
which doesn't fall back to the thing I see way too often in a lot of these
scripts: "don't worry, we're just going to stat your entire filesystem every
20 minutes").

------
frank_boyd
Are there any advantages over Déjà Dup?

------
mikebo
When you restore a backup using this method, is the drive bootable?

