
Comparison of Attic vs. Bup vs. Obnam - Brian-Puccio
http://librelist.com/browser//attic/2015/3/31/comparison-of-attic-vs-bup-vs-obnam
======
avar
From the article:

    
    
        Of particular concern is that Obnam has a theoretical collision
        potential, in that if a block has the same MD5 hash as another
        block, it will assume they are the same. This behaviour is the
        default, but can be mitigated by using the verify option. I tried
        with and without, and interestingly did not notice any speed
        difference (2 seconds, which is statistically insignificant) and
        also did not encounter any bad data on restoration. So I don't
        know why it's off by default.
    
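That dedup behaviour amounts to keying chunks by their MD5 and trusting the
key. Not Obnam's actual code, but a minimal Python sketch of the idea, with
the optional verify pass doing a byte-for-byte compare (names made up):

    import hashlib

    class ChunkStore:
        """Toy content-addressed chunk store, keyed by MD5."""
        def __init__(self, verify=False):
            self.chunks = {}   # md5 hex digest -> chunk bytes
            self.verify = verify

        def put(self, chunk):
            key = hashlib.md5(chunk).hexdigest()
            existing = self.chunks.get(key)
            if existing is not None:
                # Default: same hash, assume same chunk. With verify
                # on, byte-compare to rule out a collision.
                if self.verify and existing != chunk:
                    raise RuntimeError("MD5 collision on key " + key)
                return key
            self.chunks[key] = chunk
            return key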

Worrying about this violates Taylor's Law of Programming Probability[1]:

    
    
        The theoretical possibility of a catastrophic occurrence in your
        program can be ignored if it's less likely than the entire
        installation being wiped out by meteor strike.
    

I've seen a lot of sysadmins and programmers nitpick systems that have a
theoretical possibility of MD5 or SHA-1 collisions, but it's _amazingly_
unlikely to happen in something like a backup system where you're backing up
your own data, rather than taking hostile user data from users who might be
engineering collisions.

1\.
[http://www.miketaylor.org.uk/tech/law.html](http://www.miketaylor.org.uk/tech/law.html)
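
To put numbers on it: for an ideal b-bit hash, the birthday bound on n
random chunks is roughly n^2 / 2^(b+1). Back-of-the-envelope in Python:

    # Birthday bound: P(collision) ~= n**2 / 2**(b+1) for an ideal b-bit hash.
    n = 10**9                 # a billion unique chunks
    b = 128                   # MD5 output size in bits
    print(n**2 / 2**(b + 1))  # ~1.5e-21 -- meteor-strike territory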

~~~
Uberphallus
It's unlikely to happen by chance, but such a system can be quite vulnerable
to malicious attacks.

~~~
avar
"Quite". Let's look at the potential attack. You're running a backup system
with user-supplied data, fair enough, and one of your users has:

    
    
        1) Access to an existing object, or its checksum.
    
        2) The ability to write a *new* object that intentionally
           produces a collision with an existing object.
    

There's a trivial way to defeat this attack in practice: write objects
lazily and never re-write an object that already exists. This is what Git
does with the objects it writes, which insulates it from future SHA-1
collision attacks better than the security of SHA-1 alone would.

This means you've turned an attack where someone maliciously clobbers an
existing object into an edge case where their object simply won't get backed
up.
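
A minimal sketch of that write-once discipline (illustrative, not Git's
actual code):

    import hashlib

    def put(store, data):
        """Write-once, content-addressed insert: a later object can never
        clobber an earlier one that happens to share its hash."""
        key = hashlib.sha1(data).hexdigest()
        if key not in store:       # never re-write an existing object
            store[key] = data
        return key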

~~~
detaro
Assuming of course that the object they want to clobber is either already
backed up or processed before the malicious object. They can still attack a
new object.

------
reedlaw
I wish there was more information about the kind of data corruption caused by
Attic. Is there a related issue here? [1]

1\.
[https://github.com/jborg/attic/search?q=corrupt&type=Issues&...](https://github.com/jborg/attic/search?q=corrupt&type=Issues&utf8=%E2%9C%93)

~~~
static_noise
In addition, some form of corruption is expected to happen due to factors
besides the backup software, such as kernel bugs or hardware errors.

What I wonder about more is why the corruption of the file data wasn't
caught by the backup software, since it uses checksums for deduplication.

A third option is user error, e.g. some files changed between when the
(user's) checksums were taken and when the backup ran.
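
If the dedup checksums are stored alongside the chunks, a restore pass could
re-verify every chunk against them. A rough sketch of the idea (hypothetical,
not what Attic actually does), assuming chunks are keyed by their SHA-256:

    import hashlib

    def restore_chunk(store, key):
        """Fetch a chunk and re-check it against its dedup checksum, so
        silent corruption in the repository is caught at restore time."""
        chunk = store[key]
        if hashlib.sha256(chunk).hexdigest() != key:
            raise IOError("chunk %s is corrupt" % key)
        return chunk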

------
zobzu
YA: [http://zbackup.org/](http://zbackup.org/)

YA: [http://duplicity.nongnu.org/](http://duplicity.nongnu.org/)

~~~
static_noise
Care to elaborate how they compare? Do they fit the use case of millions of
files and terabytes of data?

~~~
StavrosK
I don't like duplicity very much; it requires you to re-upload everything
every so often (because it uses base backups and then diffs on top of them),
which won't work for my slow connection and large dataset.
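
The model is roughly this (a toy sketch, not duplicity's actual format): a
restore needs the base backup plus every diff taken after it, which is why
long chains eventually want a fresh full upload:

    def restore(chain):
        """chain = [full, diff1, diff2, ...], each mapping path -> content
        (None in a diff means the file was deleted). Restoring replays
        every diff since the last full, so long chains get slow and
        fragile -- hence the periodic full re-upload."""
        state = dict(chain[0])               # base (full) backup
        for diff in chain[1:]:               # incrementals, in order
            for path, content in diff.items():
                if content is None:
                    state.pop(path, None)    # deleted in this diff
                else:
                    state[path] = content    # added or changed
        return state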

~~~
zobzu
It doesn't require that; you choose how often it should do a full backup
(which can be never).

I do agree the CLI UX isn't always the greatest, though it "works for me"
well enough. Wouldn't mind an easier solution, tho.

------
static_noise
Isn't the real conclusion that all three tools failed the test case at some
point (data corruption/too slow/aborted)?

Which backup tools should we use for Linux?

~~~
csirac2
As a heavy btrfs user backups have always been on my mind. I run a lab with a
handful of busy VMs, all using btrfs. I was frustrated that there were no
backup solutions (at the time) which leveraged btrfs, so I created snazzer [1]
(one day soon it will support ZFS).

You might scoff, but... btrfs send/receive is _insanely_ fast and painless. To
mitigate btrfs shenanigans, snapshots end up on non-btrfs filesystems too. I
wrote a tool [2] which produces PGP signatures and sha512sums of snapshots to
achieve reproducible integrity measurements regardless of FS.

Of course, in the time it took to polish up snazzer a bit for public release,
many [3] other [4] cool [5] solutions [6] have materialized [7]... :)

[1] [https://github.com/csirac2/snazzer](https://github.com/csirac2/snazzer)

[2]
[https://github.com/csirac2/snazzer/blob/master/doc/snazzer-measure.md](https://github.com/csirac2/snazzer/blob/master/doc/snazzer-measure.md)

[3]
[https://github.com/masc3d/btrfs-sxbackup](https://github.com/masc3d/btrfs-sxbackup)

[4] [https://github.com/digint/btrbk](https://github.com/digint/btrbk)

[5]
[https://github.com/jimsalterjrs/sanoid/](https://github.com/jimsalterjrs/sanoid/)

[6]
[https://github.com/lordsutch/btrfs-backup](https://github.com/lordsutch/btrfs-backup)

[7] [https://github.com/jf647/btrfs-snap](https://github.com/jf647/btrfs-snap)
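
The measurement idea in [2] is roughly this (a toy sketch, not the actual
snazzer-measure tool; it ignores metadata and symlinks): walk the snapshot
in a fixed order and fold paths and contents into one digest, so the same
tree hashes the same regardless of filesystem:

    import hashlib
    import os

    def measure(root):
        """Deterministic sha512 over a directory tree: identical content
        in an identical layout yields an identical digest on any FS."""
        h = hashlib.sha512()
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()                  # fix the traversal order
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                h.update(os.path.relpath(path, root).encode())
                with open(path, 'rb') as f:
                    h.update(f.read())       # bind content to its path
        return h.hexdigest()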

~~~
static_noise
Thanks! btrfs is certainly something to think about. Especially compared to
ext4, it seems to make backups much easier and less painful.

------
aidenn0
So the conclusion was that the tool that corrupts your data is the fastest?

I have a backup solution that corrupts your data, but is even faster than
Attic: tar cpf - / > /dev/null

