Bup – An efficient backup system based on the git packfile format (bup.github.io)
90 points by tambourine_man on Feb 19, 2015 | 38 comments


When I had to pick a backup system, I considered Bup until I saw that there was no way to prune old backups (https://github.com/bup/bup/blob/master/README.md#things-that...).

This is really a showstopper for me.

Obnam (http://obnam.org) is a similar tool that does support forgetting old generations. However, it still suffers from growing pains and tends to corrupt the backup repository when pruning old data on a remote server.

I'm really looking forward to a mature, feature-full backup system based on git principles. Bup or Obnam might become one of those.


I wrote ddar before I knew about bup. It works in a similar way, but uses sqlite and flat files for chunk storage, so removing old archives isn't a problem. I'm not aware of any corruption issues in ddar; I rely on sqlite for integrity.

I modeled ddar after Tarsnap.


ddar sounds like it could be really useful. However, it looks like the last functional change to the code base on GitHub was a year and a half ago. What do you think the future of the project is? For something like backups, I wouldn't want to invest in a tool that has no prospect of support.


Don't know if you are aware, but http://www.synctus.com/ddar/ is currently down.


I'm aware, thanks. https://github.com/basak/ddar/wiki contains the content, thanks to a kind contributor. I should probably deprecate the old URL and remove references to it.


Perhaps you could have it redirect to the wiki?
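For example, if the old host runs nginx, a redirect is one small server block (hypothetical config; I don't know the actual server setup):

  server {
    server_name www.synctus.com;
    return 301 https://github.com/basak/ddar/wiki$request_uri;
  }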


Obnam I liked using, because I love the author's attention to testing and details. But I found it started taking longer and longer and longer to run a backup, to the point that it was crazy.

I switched to attic instead, which also started taking longer to run backups as more things changed, and longer still to check the archives, but it is still faster than obnam by a significant margin:

https://attic-backup.org/


I do a monthly offsite backup. I use rdiff-backup for plain text and 7z with encryption for data I want to keep private, but I am on the lookout for more efficient solutions. I looked at bup, but two things stop me from using it: no encryption, and inability to delete old backups to reclaim space. To me, attic (https://attic-backup.org/) looks attractive so this is what I am going to try soon.


I've used rdiff-backup daily for years, but recently needed to store backups on machines I don't control, so I gave duplicity a try. It encrypts, doesn't need to be installed on the target host, and is simple to use if you're already familiar with rdiff-backup and gpg. Test restores were simple, but they were all first generation, so not really a good test (though there are hints you'll want to prune backups down to one month's worth). I haven't figured out an acceptable way to automate it, so I've been running it manually on an infrequent basis while I evaluate it. Give it a look if you haven't already.
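A typical round trip looks something like this (the source path and scp target are invented for illustration):

  duplicity full ~/data scp://user@backuphost//backups/data
  duplicity restore scp://user@backuphost//backups/data ~/restored
  duplicity remove-older-than 1M --force scp://user@backuphost//backups/data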


Thank you for the suggestion! I tried duplicity, but it uses a full+differential scheme and I cannot prune old backups unless I start a new chain. 7zip with encryption does not even do that, but I went with it for simplicity's sake for now, because duplicity did not work well with NTFS volumes.


You could try duply (http://duply.net/), a frontend for duplicity.

I use duply for unattended, automated backups.
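The basic flow is only a few commands (profile name invented; the conf file it creates holds the duplicity options):

  duply mybackup create         # writes ~/.duply/mybackup/conf for editing
  duply mybackup backup         # full or incremental, per the profile
  duply mybackup purge --force  # apply the configured retention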


One of the downsides of duplicity is that you have to provide your GPG private-key passphrase in an environment variable in order for dedupe with GPG encryption to work.


You don't. You can use gpg-agent with --use-agent.

  duplicity inc \
    --sign-key $PGP_PRIV_ID \
    --encrypt-key $PGP_PUB_ID \
    --use-agent \
    ${source} ${dest}


Which would require interactive usage or for you to save your passphrase in a system keychain, no?


I have to give attic my support here. I recently collapsed around 2.5 TB of nightly database backups down to about 50 GB. It is very simple to use.
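The whole cycle is only a few commands (repository path and archive name invented for illustration):

  attic init /backup/db.attic
  attic create /backup/db.attic::nightly-2015-02-19 /var/lib/postgresql
  attic prune /backup/db.attic --keep-daily=7 --keep-weekly=4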


Allow me to plug atticmatic, a wrapper for attic that provides a config file, automatic backup pruning, etc:

https://torsion.org/atticmatic/
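For the curious, the config is a small INI-style file, roughly like this (paths and retention values invented):

  [location]
  source_directories: /etc /home
  repository: user@backuphost:main.attic

  [retention]
  keep_daily: 7
  keep_weekly: 4
  keep_monthly: 6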


Since I see various different backup utilities on here from time to time, I'm curious if anybody uses Bacula (or Bareos)?

It does encryption, deduplication, scheduling, pruning etc. I use it at home and run the director and storage daemons in a jail on my FreeBSD NAS, which coordinates all of my computers together for backup regularly. I then back up the encrypted backups to Crashplan.

I tried other stuff (especially duplicity), but they just didn't work very well for me. Restoring encrypted backups with duplicity was especially slow and annoying and even using the duply frontend it was rather fickle.

Bacula has some disadvantages though... for one it is a bit complicated to get working. It's partitioned into multiple daemons, each of which handles a different task and which can be on different machines. This makes it flexible, but also makes the configuration more complicated. It also works best if you have a central backup server that can run the director and storage daemons 24/7. You can set it up to work all on one computer and just dump to a "dumb" storage drive, but it is really designed to work with a central server.
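To give a flavour of that configuration, a minimal director Job resource looks roughly like this (all names are placeholders for resources defined elsewhere in the config):

  Job {
    Name = "nightly-laptop"
    Type = Backup
    Level = Incremental
    Client = laptop-fd
    FileSet = "Home"
    Schedule = "Nightly"
    Storage = File
    Pool = Default
    Messages = Standard
  }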


I've sampled a few, and I've set up and used Bacula. At the time (10 years ago now), it seemed way more complex than I needed (a lot of it seemed to be centered around backing up to tape, which was not our use case). We eventually started using BackupPC, which we were really happy with, except it didn't have a native client for backing up Windows, so there was a little weirdness with shadow drives, locked files, etc. What you got in return was an open source project, a great interface, extreme versatility (do you want to back up using rsync, reverse rsync, tar-piping over SSH, or some custom mechanism?), and extreme space savings (all backup files are hard links to a single stored copy, across multiple backups and separately backed-up systems). The reduced network usage when using rsync also led to some interesting use cases we couldn't have managed otherwise.

I've been out of that field for a while now though, so I'm not sure how much of that is common nowadays.


Last I checked bacula doesn't do deduplication in any normal sense of the word. It allows you to run a base job and then back up diffs, that's a far cry from normal dedupe.


Oh, yeah you're right. I don't use that feature, so I didn't really know the limitations. But looking at it, base jobs, while useful, are definitely not true deduplication.


Currently, we're using Duplicity for our backups, going to Amazon S3.

I've looked a some of the newer options like Bup, Attic, Obnam etc.

However, none of them seem to support S3 as an endpoint, which is a shame. They all seem to need a full-fledged *nix server on the other end.

Does anybody know of any of these new-fangled de-dupe backup apps that work with just S3?


S3QL may be a good fit for your needs. It's a FUSE filesystem for S3 that supports encrypted dedup snapshots. You can do backups by rsyncing your current state into the filesystem and snapshotting it. I considered using it for my backups but ended up choosing Attic, so I can't say how well it works in practice.
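The snapshot step relies on s3qlcp, which makes cheap copy-on-write copies within the filesystem; roughly (bucket and paths invented):

  mount.s3ql s3://mybucket /mnt/s3ql
  rsync -a --delete /home/me/ /mnt/s3ql/current/
  s3qlcp /mnt/s3ql/current /mnt/s3ql/snap-2015-02-19
  umount.s3ql /mnt/s3ql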


Unfortunately S3QL uses MAC-then-encrypt[1], which is pretty strongly discouraged[2]. Very nice that there's a detailed writeup in the docs, though; I wish more projects did that.

1: http://www.rath.org/s3ql-docs/impl_details.html (last paragraph)

2: http://www.daemonology.net/blog/2009-06-24-encrypt-then-mac....


Not technically a backup app, but camlistore (http://camlistore.org/) kind of does that:

- you push your files to it

- it optionally encrypts the content and hierarchy

- and stores it pretty much anywhere it can handle (on your disk, on S3, on MongoDB, on GCS, on another instance of camlistore, ...)
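Uploading is a one-liner with camlistore's camput tool (file path invented; flags may vary by version):

  camput file ~/Photos/img_1234.jpg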



Here's a writeup of my quest for the perfect backup tool:

http://www.stavros.io/posts/holy-grail-backups/

Summary: Use attic.


I wonder how this compares to venti. Being able to mount with FUSE can be an interesting feature, especially if it can work well even over the network (maybe with some kind of cache, like fossil).


bup can mount the backups with FUSE too. It also has a web interface with WebDAV support, which our graphic designers use to recover their old creations.


Kind of related: what do you use to back up binary files like photos? At the moment I've just got copies on old USB drives, but I'd like to put something on AWS Glacier too.


git-annex. It supports multiple external remotes, including USB drives, Glacier[1], bup and ddar[2]. You can keep files on any combination of the external remotes for redundancy and cost management. It keeps track of what is where, so you don't have to.

The only catch is that you have to make sure to back up the (metadata-only) git repository itself also, and maintaining this backup on anything that is not a direct filesystem is painful.
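Day-to-day usage looks something like this (remote names are whatever you configured):

  git annex add Photos/
  git annex copy Photos/ --to usbdrive
  git annex copy Photos/ --to glacier
  git annex whereis Photos/img_1234.jpg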

[1] Through my tool, glacier-cli, for which git-annex has native support.

[2] I wrote ddar and the git-annex support for it.


GitLab CEO here, as you mention backing up the git repository is very important. We just announced free hosting for git-annex files and repositories on GitLab.com that might be good for your use case, see https://about.gitlab.com/2015/02/17/gitlab-annex-solves-the-... for an overview.


+1 for git annex. For the git repo, if you don't have privacy concerns, you can just push to a private bitbucket repository. But I just had a script with:

  git clone /my/repo /tmp/backup-git-annex
  tar cvf /tmp/backup-git-annex.tar /tmp/backup-git-annex
That way, you're left with a single tar file to back up.


The git-annex website says[1] it is not supposed to be a backup solution:

> git-annex is not a backup system. It may be a useful component of an archival system, or a way to deliver files to a backup system

With this in mind, can we treat it as a backup tool, assuming we also back up the git repository itself, as you mentioned?

[1] http://git-annex.branchable.com/not/


So git-annex now has support for efficiently deduplicating backends? Might be time for me to give it another look.


For "local" storage as supported by bup and ddar, yes. Generally for external remotes: not yet, AFAIK. http://git-annex.branchable.com/design/assistant/deltas/ has some design thoughts, so I think it is planned.


I have a layered photo backup process:

- New photos live on my laptop (in my Dropbox and covered by Backblaze)

- I manually copy photos from my laptop to a RAID array on a server elsewhere in my house when I'm done editing/processing them

- The server backs up the photos to S3

S3 is the last bastion, so to speak. Normally I expect to pull any photos I need back off my server, but if the house burns down, I go to S3.

I run a cron job on my backup server that picks up all changed files (in my photo storage dir) and pushes them to S3 with s3cmd[1]. It's a pretty 'dumb' process, but works well.
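The cron job boils down to something like (bucket name invented):

  s3cmd sync --delete-removed /srv/photos/ s3://my-photo-bucket/photos/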

The S3 <-> Glacier lifecycle stuff is pretty cool, so the plan is to eventually enable it to deep-archive older photos.

[1] http://s3tools.org/s3cmd


Slightly OT, but is there anything that adds transparent encryption to these backup solutions? If you're willing to live without in-file diffs, a FUSE filesystem that presents all files as GPG-encrypted (or something similar) would be interesting, but I've never seen anything convincing.


I've been using this for a while now. I even back up movies; it saves on space.



