
Bup 0.01: It backs things up - blasdel
http://apenwarr.ca/log/?m=201001#04
======
cperciva
This is actually very similar to the "multitape" utility I wrote back in
October/November 2006 which turned into "multitar" (when I integrated it with
libarchive and made several major optimizations) in January-May 2006 before
turning into Tarsnap (when I added encryption and an online storage protocol).

Important differences between bup and multitape: I used C, not Python; I used
a more sophisticated chunking algorithm; I used tape names rather than just
numbering them.

~~~
e1ven
I'm curious- Was this ever released anywhere? I can see you mentioned a
multitape layer in tarsnap, but I'm curious if you released a standalone
version of the archiving tool.

Hopefully you don't view bup as competition for tarsnap. There's a lot of
situations such as Xen backups, where I really want to dump things to a local
backup machine, then off to tape- I'm not interested in _any_ form of hosted
service, but tools that make binary diffs efficient and easy would certainly
be welcome.

It might also be interesting to expand bup (or multitape/multitar, if it is
public), to use the method described in your thesis paper-
<http://www.daemonology.net/bsdiff/>

I apologize for potentially touching on a delicate subject. I certainly
wouldn't want to come across as advising someone to use your theories to steal
food out of your mouth, but local v. remote bkp seem to be sufficiently
different markets.

~~~
cperciva
_Was this ever released anywhere?_

No; I never really considered it to be useful except as a step towards
Tarsnap.

 _Hopefully you don't view bup as competition for tarsnap_

Not really, no. Of course, the author could follow the same path as I took, of
integrating this with tar code, and end up producing a competitor to Tarsnap.

 _It might also be interesting to expand bup (or multitape/multitar, if it is
public), to use the method described in your thesis
paper-<http://www.daemonology.net/bsdiff/*>

That doesn't really work. Binary diffs are about comparing old and new files
to produce a small patch; snapshotting compares the new file to _a list of
parts of the old file_. It's a tradeoff between needing more local state (with
binary diffs you need to have the old file to compare against) and having
larger deltas (with snapshotting, you have a new chunk even if only part of it
changed).
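
A minimal sketch of the snapshotting side of that tradeoff, assuming tiny
fixed-size chunks and an in-memory dict as the chunk store. Both are
simplifications chosen for illustration; this is not the chunking algorithm
of bup, multitape or Tarsnap, and the names `split_chunks`, `snapshot` and
`restore` are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4  # unrealistically small, so the effect is easy to see


def split_chunks(data, size=CHUNK_SIZE):
    """Cut data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def snapshot(data, store):
    """Add unseen chunks to store; return (chunk id list, bytes newly stored)."""
    ids = []
    new_bytes = 0
    for chunk in split_chunks(data):
        cid = hashlib.sha256(chunk).hexdigest()
        if cid not in store:
            # Only the chunk list is consulted; the old file itself is not needed.
            store[cid] = chunk
            new_bytes += len(chunk)
        ids.append(cid)
    return ids, new_bytes


def restore(ids, store):
    """Rebuild a file from its list of chunk ids."""
    return b"".join(store[cid] for cid in ids)


store = {}
old_ids, old_cost = snapshot(b"aaaabbbbccccdddd", store)
new_ids, new_cost = snapshot(b"aaaabbXbccccdddd", store)  # one byte changed
```

Here the second snapshot stores a whole new chunk (`new_cost` is 4 bytes)
although only one byte changed, which is the "larger deltas" half of the
tradeoff; a binary diff could encode the single byte, but only with the old
file on hand.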

That said, my experience with bsdiff was certainly useful in terms of shaping
how I think about efficient deltas and compression, so even though none of the
ideas translate directly it definitely helped me in writing Tarsnap.

~~~
pasbesoin
Nit: That URL needs cleansing by adding a space between it and the trailing
asterisk when submitting (a problem with HN's parsing of '❄blah❄' markup that
I've noticed/noted before; I seem to recall the breakage scenario as occurring
when the markup is used at the end of a paragraph).

Test case: The last word of this sentence -- sans any trailing punctuation --
has the _markup_

Err... nope: It looks like it's only when wrapping a URL in the markup
_<http://www.google.com/*> a̶n̶d̶/̶o̶r̶ ̶w̶h̶e̶n̶ ̶t̶h̶a̶t̶ ̶U̶R̶L̶ ̶i̶s̶
̶a̶t̶ ̶t̶h̶e̶ ̶e̶n̶d̶ ̶o̶f̶ ̶a̶ ̶p̶a̶r̶a̶g̶r̶a̶p̶h̶
̶_̶h̶t̶t̶p̶:̶/̶/̶w̶w̶w̶.̶g̶o̶o̶g̶l̶e̶.̶c̶o̶m̶/̶ _̶

Looks like I need to break some continuing italicization_ now.

Trying the same sort of markup but also wrapping a trailing space produces
_<http://www.google.com/> _, which is better..

\----

Where ❄ represents an asterisk.

------
wisty
Backing up the file system solves the wrong problem. The real problem is that
applications (for example, VMs and databases) don't automatically map onto the
file system, and even sophisticated users (like Jeff Atwood) don't find this
very easy. I can see 3 ways around this:

\- Applications could register with the back-up utility, so bup (for example)
knows to get a hotcopy from the svn repository and a dump from the database.

\- Applications could be told to dump to a specific location on the file
system on a regular basis.

\- Unix magic could be used, so reading from certain parts of the file system
would trigger a dump from the appropriate applications. I'm not quite sure if
this is possible (I'm a Unix weenie).

I don't care which is best (they would all work). The real solution would have the
following features:

\- Automated nags (SVN-style) about dirty looking locations.

\- A white-list of locations to suppress nags on (the same way SVN can be set
to ignore the /bin directory, and the .pyc files).

\- A way to resolve the nags (i.e. telling the backup server what commands to
run in order to backup certain applications).

\- A file format for backup hints (left in a hidden file called .bup in the
program's main directory), so applications could automatically give hints
to the backup program on how to get them to dump.
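
To make the hints-file idea concrete, here is a rough sketch that assumes an
INI-style layout for the .bup file. The layout, the `command` and `output`
keys, the example paths and the `parse_hints` helper are all made up for
illustration; bup defines no such format.

```python
import configparser
import shlex

# Hypothetical contents of a ".bup" hints file shipped by an application.
EXAMPLE_HINTS = """
[svn-repo]
command = svnadmin hotcopy /srv/svn/project /var/backups/project-hotcopy
output = /var/backups/project-hotcopy
"""


def parse_hints(text):
    """Return one dict per section: its name, dump command (argv) and output path."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    hints = []
    for section in parser.sections():
        hints.append({
            "name": section,
            # Split into an argv list so the command can run without a shell.
            "argv": shlex.split(parser[section]["command"]),
            # Where the dump lands, i.e. what the backup tool should pick up.
            "output": parser[section]["output"],
        })
    return hints


hints = parse_hints(EXAMPLE_HINTS)
```

A backup tool could run each `argv` before taking its snapshot and then back
up the `output` path; a location with no hints file and no white-list entry
would be a candidate for a nag.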

A nice GUI that auto-suggests back-up commands (with shell integration like
TortoiseSVN) would be cool, but not essential.

I've tried to use "applications" consistently in the post. It could mean a
database, a repo, website server, or anything. As long as the "application"
has some sane way to be backed up.

And yes, I do know that talk is cheap.

~~~
idlewords
Jeff Atwood is a poor example to cite here. His data loss had nothing to do
with the subtleties of how applications map onto the file system; it was just
due to carelessness. His published advice to others (host images on S3 and
back up your files to a different machine than the one you're on) would have
prevented the whole mess.

~~~
wisty
Yeah, but his host messed up by not being able to back up virtual machines.

~~~
idlewords
That's like saying his disk messed up by crashing.

------
Braaf
See also Gibak by Mauricio Fernandez

<http://eigenclass.org/hiki/gibak-backup-system-introduction>

<http://eigenclass.org/hiki/gibak-0.3.0>

------
viraptor
I think DAR (<http://dar.linux.free.fr/>) already does most of that and is
more mature at this point...

~~~
e1ven
Dar has a lot of features (<http://dar.linux.free.fr/doc/Features.html>), but
I don't see iterative backups listed- That's what's really useful about bup- I
make daily snapshots of hundreds of VM images, and spend a LOT of disk space
keeping past copies, just in case...

I generally keep complete copies from the past, since there's no easy way to
say "Use this 20G image file as the base, then store the changes in this
.iterativebkp file."

bup looks like a nice way to do that, but it'll need to mature a bit more
(Like.. Pruning bkps..) before I could deploy it, even as a test system.

~~~
qjz
rdiff-backup (<http://rdiff-backup.nongnu.org/>) is very mature and efficient,
using the rsync algorithm to store incrementals and save bandwidth. I use it
for automated daily local/remote backups. I regularly use it to back up VMs.
Restores are a snap. It has some nice features, like the ability to keep only
N days of backups to save space for noncritical data.

------
ivenkys
Offtopic - good on you for working over the Christmas break; I am finding it
hard to get back into a rhythm.

------
pwmanagerdied
Not useful to me, but I love seeing stuff built that's compatible with Git.

