Obnam ( http://obnam.org ) is a similar tool, but it supports forgetting old generations. However, it still suffers from growing pains and tends to corrupt the backup repository when pruning old data on a remote server.
I'm really looking forward to a mature, feature-complete backup system based on git principles. Bup or obnam might become one of those.
I wrote ddar before I knew about bup. It works in a similar way, but uses SQLite and flat files for chunk storage, so removing old archives isn't a problem. I'm not aware of any corruption issues in ddar; I rely on SQLite for integrity.
ddar sounds like it could be really useful. However, it looks like the last functional change to the code base on GitHub was a year and a half ago. What do you think the future of the project is? For something like backups, I wouldn't want to invest in a tool that has no prospect of support.
I'm aware, thanks. https://github.com/basak/ddar/wiki contains the content, thanks to a kind contributor. I should probably deprecate the old URL and remove references to it.
Obnam I liked using, because I love the author's attention to testing and detail. But I found it started taking longer and longer to run a backup - to the point that it was crazy.
I switched to attic instead, which also started taking longer to run backups as more things changed, and longer still to check the archives, but it is still faster than obnam by a significant margin.
I do a monthly offsite backup. I use rdiff-backup for plain text and 7z with encryption for data I want to keep private, but I am on the lookout for more efficient solutions. I looked at bup, but two things stop me from using it: no encryption, and no ability to delete old backups to reclaim space. To me, attic (https://attic-backup.org/) looks attractive, so that is what I am going to try soon.
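For reference, the attic workflow I'm planning to try looks roughly like this (repository path, archive names and retention numbers are just examples, not tested by me yet):

```shell
# Create an encrypted, deduplicating repository
attic init --encryption=passphrase /backups/home.attic

# Each run creates a new archive; unchanged chunks are deduplicated
attic create /backups/home.attic::monthly-$(date +%Y-%m) ~/documents ~/photos

# Reclaim space by pruning old archives - the part bup can't do
attic prune --keep-monthly=12 /backups/home.attic
```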
I've used rdiff-backup daily for years, but recently needed to store backups on machines I don't control, so I gave duplicity[1] a try. It encrypts, doesn't need to be installed on the target host and is simple to use if you're already familiar with rdiff-backup and gpg. Test restores were simple, but were all first generation, so not really a good test (but there are hints you'll want to prune backups to one month's worth). I haven't figured out an acceptable way to automate it, so I've been running it manually on an infrequent basis while I evaluate it. Give it a look, if you haven't already.
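In case it's useful, the commands I've been running by hand look roughly like this (host and paths are placeholders; duplicity encrypts with gpg, and the target host only needs to speak sftp):

```shell
# Incremental, gpg-encrypted backup to a machine I don't control
duplicity ~/documents sftp://user@backuphost//srv/backups/documents

# Occasionally force a new full backup so old chains can eventually be dropped
duplicity full ~/documents sftp://user@backuphost//srv/backups/documents

# Prune anything older than a month (only complete chains are removed)
duplicity remove-older-than 1M --force sftp://user@backuphost//srv/backups/documents

# Test restore into a scratch directory
duplicity restore sftp://user@backuphost//srv/backups/documents /tmp/restore-test
```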
Thank you for the suggestion! I tried duplicity, but it uses a full+differential scheme and I cannot prune old backups unless I start a new chain. 7zip with encryption does not even do that, but I went with it for simplicity's sake for now, because duplicity did not work well with NTFS volumes.
Since I see various different backup utilities on here from time to time, I'm curious if anybody uses Bacula (or Bareos)?
It does encryption, deduplication, scheduling, pruning, etc. I use it at home and run the director and storage daemons in a jail on my FreeBSD NAS, which coordinates regular backups of all of my computers. I then back up the encrypted backups to Crashplan.
I tried other stuff (especially duplicity), but they just didn't work very well for me. Restoring encrypted backups with duplicity was especially slow and annoying and even using the duply frontend it was rather fickle.
Bacula has some disadvantages though... for one it is a bit complicated to get working. It's partitioned into multiple daemons, each of which handles a different task and which can be on different machines. This makes it flexible, but also makes the configuration more complicated. It also works best if you have a central backup server that can run the director and storage daemons 24/7. You can set it up to work all on one computer and just dump to a "dumb" storage drive, but it is really designed to work with a central server.
I've sampled a few, and I've set up and used Bacula. At the time (10 years ago now), it seemed way more complex than I needed (a lot of it seemed to be centered around backing up to tape, which was not our use case). We eventually started using BackupPC, which we were really happy with, except it didn't have a native client for backing up Windows, so there was a little weirdness with shadow drives, locked files, etc. What you got in return was an open source project, a great interface, extreme versatility (do you want to back up using rsync, reverse rsync, tar-piping over SSH, or some custom mechanism?), and extreme space savings (every backed-up file is a hard link to a single copy, across multiple backups and separate backed-up systems). The reduced network usage when using rsync also led to some interesting use cases we couldn't have done otherwise.
I've been out of that field for a while now though, so I'm not sure how much of that is common nowadays.
Last I checked bacula doesn't do deduplication in any normal sense of the word. It allows you to run a base job and then back up diffs, that's a far cry from normal dedupe.
Oh, yeah you're right. I don't use that feature, so I didn't really know the limitations. But looking at it, base jobs, while useful, are definitely not true deduplication.
S3QL may be a good fit for your needs. It's a FUSE filesystem for S3 that supports encrypted dedup snapshots. You can do backups by rsyncing your current state into the filesystem and snapshotting it. I considered using it for my backups but ended up choosing Attic, so I can't say how well it works in practice.
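From what I remember of the docs, the backup cycle looks something like this (bucket name and paths are made up; I never ran this in anger, so treat it as a sketch):

```shell
# One-time: create the encrypted filesystem inside an S3 bucket
mkfs.s3ql s3://my-backup-bucket

# Mount it like a normal filesystem
mount.s3ql s3://my-backup-bucket /mnt/s3ql

# Sync the current state in, then take a cheap copy-on-write snapshot
rsync -a --delete ~/data/ /mnt/s3ql/current/
s3qlcp /mnt/s3ql/current /mnt/s3ql/snapshot-$(date +%Y-%m-%d)

umount.s3ql /mnt/s3ql
```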
Unfortunately S3QL uses MAC-then-encrypt[1], which is pretty strongly discouraged[2]. Very nice there's a detailed writeup on the details in the docs though, wish more projects did that.
I wonder how this compares to venti. Being able to mount with FUSE can be an interesting feature, especially if it can work well even over the network (maybe with some kind of cache, like fossil).
Kind of related: what do you use to back up binary files like photos? At the moment I've just got copies on old USB drives, but I'd like to put something on AWS Glacier too.
git-annex. It supports multiple external remotes, including USB drives, Glacier[1], bup and ddar[2]. You can keep files on any combination of the external remotes for redundancy and cost management. It keeps track of what is where, so you don't have to.
The only catch is that you have to make sure to back up the (metadata-only) git repository itself also, and maintaining this backup on anything that is not a direct filesystem is painful.
[1] Through my tool, glacier-cli, for which git-annex has native support.
[2] I wrote ddar and the git-annex support for it.
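To give a flavour, a minimal setup with a Glacier special remote looks something like this (repo path, remote name and region are examples; the glacier remote needs glacier-cli on $PATH):

```shell
cd ~/photos
git init && git annex init "laptop"

# Annexed files are checked in as symlinks; content is tracked separately
git annex add .
git commit -m "add photos"

# A Glacier special remote; content is encrypted before upload
git annex initremote glacier type=glacier datacenter=us-east-1 encryption=shared

# Insist on at least two copies of everything, then push one to Glacier
git annex numcopies 2
git annex copy . --to glacier

# git-annex tracks which remotes hold each file
git annex whereis some-photo.jpg
```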
GitLab CEO here, as you mention backing up the git repository is very important. We just announced free hosting for git-annex files and repositories on GitLab.com that might be good for your use case, see https://about.gitlab.com/2015/02/17/gitlab-annex-solves-the-... for an overview.
+1 for git annex. For the git repo, if you don't have privacy concerns, you can just push to a private bitbucket repository. But I just had a script with:
git clone /my/repo /tmp/backup-git-annex
tar cvf /tmp/backup-git-annex.tar -C /tmp backup-git-annex
That way, you're left with a single tar file to backup.
- New photos live on my laptop (in my Dropbox and covered by Backblaze)
- I manually copy photos from my laptop to a RAID array on a server elsewhere in my house when I'm done editing/processing them
- The server backs up the photos to S3
S3 is the last bastion, so to speak. Normally I expect to pull any photos I need back off my server; but if the house burns down, I go to S3.
I run a cron job on my backup server that picks up all changed files (in my photo storage dir) and pushes them to S3 with s3cmd[1]. It's a pretty 'dumb' process, but works well.
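For the curious, the cron job boils down to something like this (bucket name, path and schedule are made up):

```shell
# crontab entry: sync new/changed photos to S3 every night at 03:00
# 0 3 * * * s3cmd sync --no-delete-removed /srv/photos/ s3://my-photo-archive/photos/
s3cmd sync --no-delete-removed /srv/photos/ s3://my-photo-archive/photos/
```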
The S3 <-> Glacier lifecycle stuff is pretty cool, so the plan is to eventually enable it to deep-archive older photos.
Slightly OT, but is there anything that adds transparent encryption to these backup solutions? If you're willing to live without in-file diffs, a FUSE filesystem that presents all files as GPG-encrypted (or something similar) would be interesting, but I've never seen anything convincing like it.
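The closest I've found (though it never convinced me enough to rely on it) is EncFS's reverse mode, which presents an encrypted view of an existing plaintext tree that any dumb backup tool can then copy (paths and host are examples):

```shell
# Present ~/documents as an encrypted view under /tmp/encrypted-view
encfs --reverse ~/documents /tmp/encrypted-view

# Back up the encrypted view with whatever tool you already use
rsync -a /tmp/encrypted-view/ backuphost:/srv/backups/documents/

fusermount -u /tmp/encrypted-view
```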
This is really a showstopper for me.