Bup 0.01: It backs things up (apenwarr.ca)
84 points by blasdel on Jan 4, 2010 | 14 comments



This is actually very similar to the "multitape" utility I wrote back in October/November 2005 which turned into "multitar" (when I integrated it with libarchive and made several major optimizations) in January-May 2006 before turning into Tarsnap (when I added encryption and an online storage protocol).

Important differences between bup and multitape: I used C, not Python; I used a more sophisticated chunking algorithm; I used tape names rather than just numbering them.


I'm curious- Was this ever released anywhere? I can see you mentioned a multitape layer in tarsnap, but I'm curious if you released a standalone version of the archiving tool.

Hopefully you don't view bup as competition for tarsnap. There's a lot of situations such as Xen backups, where I really want to dump things to a local backup machine, then off to tape- I'm not interested in any form of hosted service, but tools that make binary diffs efficient and easy would certainly be welcome.

It might also be interesting to expand bup (or multitape/multitar, if it is public), to use the method described in your thesis paper- http://www.daemonology.net/bsdiff/

I apologize for potentially touching on a delicate subject. I certainly wouldn't want to come across as advising someone to use your theories to steal food out of your mouth, but local v. remote bkp seem to be sufficiently different markets.


Was this ever released anywhere?

No; I never really considered it to be useful except as a step towards Tarsnap.

Hopefully you don't view bup as competition for tarsnap

Not really, no. Of course, the author could follow the same path as I took, of integrating this with tar code, and end up producing a competitor to Tarsnap.

It might also be interesting to expand bup (or multitape/multitar, if it is public), to use the method described in your thesis paper- http://www.daemonology.net/bsdiff/*

That doesn't really work. Binary diffs are about comparing old and new files to produce a small patch; snapshotting compares the new file to a list of parts of the old file*. It's a tradeoff between needing more local state (with binary diffs you need to have the old file to compare against) and having larger deltas (with snapshotting, you have a new chunk even if only part of it changed).
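
A minimal sketch of the snapshotting side of that tradeoff, for illustration only. It assumes fixed-size chunks and SHA-256 hashes; `CHUNK_SIZE`, `chunk_hashes` and `new_chunks` are made-up names, and this is not the chunking algorithm used by bup, multitape or Tarsnap. The only local state is the set of hashes from the previous snapshot, and a one-byte change still costs a whole chunk.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, chosen only for illustration


def chunk_hashes(data: bytes, size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks and return the SHA-256 of each."""
    return [
        hashlib.sha256(data[i:i + size]).hexdigest()
        for i in range(0, len(data), size)
    ]


def new_chunks(data: bytes, known: set[str], size: int = CHUNK_SIZE) -> list[bytes]:
    """Return the chunks of data whose hashes are not already in `known`.

    Only the hash list of the previous snapshot is needed, not the old file.
    """
    out = []
    for i in range(0, len(data), size):
        chunk = data[i:i + size]
        if hashlib.sha256(chunk).hexdigest() not in known:
            out.append(chunk)
    return out


# Four distinct chunks: 4096 bytes of 0x00, then 0x01, 0x02, 0x03.
old = b"".join(bytes([i]) * CHUNK_SIZE for i in range(4))
known = set(chunk_hashes(old))   # the only local state kept between runs

new = bytearray(old)
new[5000] ^= 0xFF                # flip one byte inside the second chunk
to_store = new_chunks(bytes(new), known)

# One byte changed, but a full chunk has to be stored again.
print(len(to_store), len(to_store[0]))
```

A binary diff of the same two inputs could encode the change in a few bytes, but only by keeping all of `old` around to compare against.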

That said, my experience with bsdiff was certainly useful in terms of shaping how I think about efficient deltas and compression, so even though none of the ideas translate directly it definitely helped me in writing Tarsnap.


Nit: That URL needs cleansing by adding a space between it and the trailing asterisk when submitting (a problem with HN's parsing of '❄blah❄' markup that I've noticed/noted before; I seem to recall the breakage scenario as occurring when the markup is used at the end of a paragraph).

Test case: The last word of this sentence -- sans any trailing punctuation -- has the markup

Err... nope: It looks like it's only when wrapping a URL in the markup http://www.google.com/* a̶n̶d̶/̶o̶r̶ ̶w̶h̶e̶n̶ ̶t̶h̶a̶t̶ ̶U̶R̶L̶ ̶i̶s̶ ̶a̶t̶ ̶t̶h̶e̶ ̶e̶n̶d̶ ̶o̶f̶ ̶a̶ ̶p̶a̶r̶a̶g̶r̶a̶p̶h̶ ̶̶h̶t̶t̶p̶:̶/̶/̶w̶w̶w̶.̶g̶o̶o̶g̶l̶e̶.̶c̶o̶m̶/̶̶

Looks like I need to break some continuing italicization now.

Trying the same sort of markup but also wrapping a trailing space produces http://www.google.com/ , which is better..

----

Where ❄ represents an asterisk.


Backing up the file system solves the wrong problem. The real problem is that applications (for example, VMs and databases) don't automatically map onto the file system, and even sophisticated users like Jeff Atwood don't find this very easy. I can see 3 ways around this:

- Applications could register with the back-up utility, so bup (for example) knows to get a hotcopy from the svn repository and a dump from the database.

- Applications could be told to dump to specific locations on the file system on a regular basis.

- Unix magic could be used, so reading from certain parts of the file system would trigger a dump from the appropriate applications. I'm not quite sure if this is possible (I'm not a Unix weenie).
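
The first option, applications registering with the backup utility, might look roughly like the sketch below. Everything in it is hypothetical: `REGISTRY`, `register` and `collect_dumps` are invented names, bup has no such interface, and the registered command is a stand-in that just prints a line so the example runs anywhere.

```python
import subprocess
import sys

# Hypothetical registry: application name -> argv of a command that writes a
# consistent dump to stdout. Nothing here is part of bup.
REGISTRY: dict[str, list[str]] = {}


def register(name: str, dump_cmd: list[str]) -> None:
    """Record how to obtain a consistent dump for one application."""
    REGISTRY[name] = dump_cmd


def collect_dumps() -> dict[str, bytes]:
    """Run every registered dump command and return its stdout by name."""
    dumps = {}
    for name, cmd in REGISTRY.items():
        # check=True makes a failing dump raise instead of being backed up
        # as if it had succeeded.
        result = subprocess.run(cmd, capture_output=True, check=True)
        dumps[name] = result.stdout
    return dumps


# Stand-in command; a real entry would invoke the application's own dump or
# hotcopy tool instead.
register("demo-db", [sys.executable, "-c", "print('dump of demo-db')"])

dumps = collect_dumps()
print(dumps["demo-db"].decode().strip())
```

The backup tool would then store the collected output rather than the application's live files.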

I don't care which is best (they would all work). The real solution would have the following features:

- Automated nags (SVN-style) about dirty-looking locations.

- A white-list of locations to suppress nags on (the same way SVN can be set to ignore the /bin directory, and the .pyc files).

- A way to resolve the nags (i.e. telling the backup server what commands to run in order to back up certain applications).

- A file format for backup hints (left in a hidden file called .bup in the program's main directory), so applications could automatically give hints to the backup program on how to get them to dump.
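
A rough sketch of how the nags, the white-list and the hint file could fit together. All of it is assumed: the one-line `dump = <command>` hint format, the `SUSPECT` patterns that count as "dirty-looking", and the `app-admin export` command are invented for illustration. Only the `.bup` file name comes from the list above.

```python
import fnmatch
import os
import tempfile
from typing import Optional

HINT_FILE = ".bup"                         # file name taken from the list above
SUSPECT = ["*.db", "*.vmdk", "*.sqlite"]   # assumed "dirty-looking" patterns


def read_hint(directory: str) -> Optional[str]:
    """Return the dump command from a hint file, or None if there is none.

    Assumed format: one line of the form `dump = <shell command>`.
    """
    path = os.path.join(directory, HINT_FILE)
    if not os.path.isfile(path):
        return None
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            key, sep, value = line.partition("=")
            if sep and key.strip() == "dump":
                return value.strip()
    return None


def find_nags(root: str, ignore: list[str]) -> list[str]:
    """List suspect files that have no hint and are not on the ignore list."""
    nags = []
    for directory, _subdirs, files in os.walk(root):
        if read_hint(directory) is not None:
            continue  # this directory's application said how to back it up
        for name in files:
            rel = os.path.relpath(os.path.join(directory, name), root)
            if any(fnmatch.fnmatch(rel, pat) for pat in ignore):
                continue  # white-listed, so no nag
            if any(fnmatch.fnmatch(name, pat) for pat in SUSPECT):
                nags.append(rel)
    return sorted(nags)


with tempfile.TemporaryDirectory() as root:
    for sub in ("app", "scratch", "wiki"):
        os.mkdir(os.path.join(root, sub))
    for sub, name in (("app", "data.db"), ("scratch", "tmp.db"),
                      ("wiki", "pages.sqlite")):
        open(os.path.join(root, sub, name), "w").close()
    # Hypothetical hint: "app-admin export" is not a real command.
    with open(os.path.join(root, "app", HINT_FILE), "w", encoding="utf-8") as fh:
        fh.write("dump = app-admin export\n")
    nags = find_nags(root, ignore=["scratch/*"])

print(nags)
```

Here `app` is silent because it ships a hint, `scratch` is silent because it is white-listed, and only the file under `wiki` produces a nag.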

A nice GUI that auto-suggests back-up commands (with shell integration like TortoiseSVN) would be cool, but not essential.

I've tried to use "applications" consistently in the post. It could mean a database, a repo, a web server, or anything, as long as the "application" has some sane way to be backed up.

And yes, I do know that talk is cheap.


Jeff Atwood is a poor example to cite here. His data loss had nothing to do with the subtleties of how applications map onto the file system; it was just due to carelessness. His published advice to others (host images on S3 and back up your files to a different machine than the one you're on) would have prevented the whole mess.


Yeah, but his host messed up by not being able to back up virtual machines.


That's like saying his disk messed up by crashing.



I think DAR (http://dar.linux.free.fr/) already does most of that and is more mature at this point...


Dar has a lot of features (http://dar.linux.free.fr/doc/Features.html), but I don't see iterative backups listed- That's what's really useful about bup- I make daily snapshots of hundreds of VM images, and spend a LOT of disk space keeping past copies, just in case...

I generally keep complete copies from the past, since there's no easy way to say "Use this 20G image file as the base, then store the changes in this .iterativebkp file."

bup looks like a nice way to do that, but it'll need to mature a bit more (Like.. Pruning bkps..) before I could deploy it, even as a test system.


rdiff-backup (http://rdiff-backup.nongnu.org/) is very mature and efficient, using the rsync algorithm to store incrementals and save bandwidth. I use it for automated daily local/remote backups. I regularly use it to back up VMs. Restores are a snap. It has some nice features, like the ability to keep only N days of backups to save space for noncritical data.
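
The "keep only N days" idea reduces to a cutoff on snapshot timestamps. The sketch below is a generic illustration with made-up names (`split_by_age`, `keep_days`); it says nothing about how rdiff-backup implements retention internally.

```python
from datetime import datetime, timedelta


def split_by_age(snapshots, keep_days, now):
    """Partition snapshot timestamps into (keep, prune) by age in days."""
    cutoff = now - timedelta(days=keep_days)
    keep = [s for s in snapshots if s >= cutoff]
    prune = [s for s in snapshots if s < cutoff]
    return keep, prune


now = datetime(2010, 1, 4)
# Snapshots taken 0, 1, 7, 30 and 90 days ago.
snapshots = [now - timedelta(days=d) for d in (0, 1, 7, 30, 90)]
keep, prune = split_by_age(snapshots, keep_days=14, now=now)

print(len(keep), len(prune))
```

With incremental storage, pruning is less simple than deleting files, since newer snapshots may depend on data first stored by older ones.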


Offtopic - good on you for working over the Christmas break; I am finding it hard to get back into rhythm.


Not useful to me, but I love seeing stuff built that's compatible with Git.



