
Zbackup: open-source, encrypted, de-duplicated, compressed backups - frenkel
https://github.com/zbackup/zbackup
======
jude-
I'm a little concerned about the use of deduplication and encryption in the
same system. If done improperly, deduplication followed by encryption
potentially leaks information--a passive adversary can make some good guesses
at what's going on with the plaintext across backups, based on how big or
small the size delta is. Maybe I'm being overly paranoid, but I didn't find a
satisfactory answer in the documentation as to how to plug this side channel.
Would an experienced user or the author be willing to comment?

~~~
mappu
The active version of this attack is totally worrying (deduplication between
multiple user accounts, alice and eve).

But the passive version above requires a privileged network position as well
as an unrealistically good idea of what was in the original backup. I agree
it's a weakness, but I'm not sure it's a cause for concern.

You might equally say something like "the amount of time taken to back up is
a timing side-channel into the size of the deduplication cache"; it's true,
but I don't think it immediately leads to any practical attacks.

Can you elaborate on a scenario?

------
Klasiaster
ZPAQ has a lot to offer and is a good recommendation. But I also found obnam
great for de-duplicated backups, since its snapshots can be mounted via FUSE
and browsed in Nautilus, or extracted with command-line tools.

Package: obnam
Version: 1.8-1
Maintainer: Lars Wirzenius <liw@liw.fi>
Depends: libc6 (>= 2.6), python (>= 2.7), python (<< 2.8), python-larch (>= 1.20131130~), python-ttystatus (>= 0.23~), python-paramiko, python-tracing (>= 0.8~), python-cliapp (>= 1.20130808~), python-fuse
Description-en: online and disk-based backup application
Obnam makes backups. Backups can be stored on local hard disks, or online via
the SSH SFTP protocol. The backup server, if used, does not require any
special software, on top of SSH.

* Snapshot backups. Every generation looks like a complete snapshot, so you
  don't need to care about full versus incremental backups, or rotate real or
  virtual tapes.
* Data de-duplication, across files, and backup generations. If the backup
  repository already contains a particular chunk of data, it will be re-used,
  even if it was in another file in an older backup generation. This way, you
  don't need to worry about moving around large files, or modifying them.
* Encrypted backups, using GnuPG.
* Push or pull operation, depending on what you need. You can run Obnam on
  the client, and push backups to the server, or on the server, and pull from
  the client over SFTP.

Homepage: [http://liw.fi/obnam/](http://liw.fi/obnam/)

------
beagle3
I've recently found
[http://mattmahoney.net/dc/zpaq.html](http://mattmahoney.net/dc/zpaq.html),
which (on paper) is as good as zbackup/bup/rdiff-backup. Haven't had time to
try it - anyone here have experience with it?

~~~
comboy
I use it for some backups (not automatic ones, just when I have some old stuff
and I want to put it "in the corner"). Very good compression ratio (it beats
bz2 and lzma in my cases - big parts are db dumps), but achieving that ratio is
also very slow.

Be careful when feeding whole directories to it, since it doesn't preserve
file attributes.

------
ncza
This looks very similar to attic; is anyone able and willing to compare their
pros and cons?

~~~
jewel
It so happens that I found zbackup in the list of apt packages just yesterday.
It doesn't have network support like attic, so you have to transport the
entire repository of data to the backup server in order to make a backup.

In practice that's unworkable for anything big. You can use rsync to transfer
to a local clone of the data on the backup server, but then the whole thing
still needs to be ingested each time.

It has a rolling checksum splitter just like attic, which is an unbelievably
effective way to split files into chunks for deduplication. It works really
well for database dumps, something that fixed-size chunking fails at
miserably.
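
Roughly, the idea looks like this (a minimal sketch, not zbackup's or attic's
actual code; the window size, boundary mask and Rabin-Karp-style hash are
illustrative choices):

    # Content-defined chunking with a rolling hash: a chunk boundary is cut
    # wherever the hash of the last `window` bytes ends in enough zero bits,
    # so boundaries depend on content rather than absolute file offsets.
    def chunk_boundaries(data: bytes, window: int = 48, mask: int = (1 << 13) - 1):
        """Yield (start, end) offsets of content-defined chunks in `data`."""
        base, mod = 257, (1 << 61) - 1
        top = pow(base, window - 1, mod)   # weight of the byte leaving the window
        h, start = 0, 0
        for i, b in enumerate(data):
            if i >= window:                # roll the oldest byte out of the hash
                h = (h - data[i - window] * top) % mod
            h = (h * base + b) % mod
            # Low 13 bits all zero: roughly one boundary per 8 KiB on average.
            if i + 1 - start >= window and (h & mask) == 0:
                yield start, i + 1
                start = i + 1
        if start < len(data):
            yield start, len(data)

Insert a few bytes near the front of a file and only the chunks around the
edit change; after the next boundary the cuts line up with the old ones again,
which is exactly why shifted data like a fresh database dump still
deduplicates against yesterday's.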

A big advantage over bup is that you can remove old backups.

Other backup software worth looking at is rdiff-backup, duplicity, burp, and
obnam, all of which are in apt.

~~~
leni536
You can easily use netcat to pipe tar over the network directly to zbackup. If
you want encryption over the network it's even easier:

    tar -c stuff/ | ssh user@example.com zbackup backup \
        /my/backup/repo/backups/backup-`date '+%Y-%m-%d'`

~~~
jewel
I should have been clearer. The problem is that zbackup won't work for large
data sets. For example, I used to use rdiff-backup for a samba filesystem at
work that had a few terabytes of files on it. Whenever someone renamed a
directory, all of the data in its subdirectories had to be backed up again.

Backup software that deduplicates solves this problem wonderfully, since it
doesn't make a difference if the files have been moved or not.

The problem with your approach is that I can't send several terabytes of data
offsite every night. I could use the rsync trick I mentioned, but now I've got
to store two copies of the data on the backup server.

~~~
Am1GO
Wrong.

You could keep only the "index" and "backups" directories in sync between
your backup client and the server; you don't have to keep "bundles"
everywhere.
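
Roughly, that workflow could look like the sketch below (hypothetical host,
paths and helper; this is not an official zbackup tool):

    # Keep only the small index/ and backups/ metadata on the client; push the
    # freshly written bundles to the server after each run, then drop them locally.
    import os, shutil, subprocess

    LOCAL_REPO = "/var/lib/zbackup-repo"
    REMOTE_REPO = "backup@server.example.com:/srv/zbackup-repo"

    def rsync(src: str, dst: str) -> None:
        subprocess.run(["rsync", "-a", src, dst], check=True)

    # Pull the metadata zbackup needs to know which chunks the server already has.
    for d in ("index", "backups"):
        rsync(f"{REMOTE_REPO}/{d}/", f"{LOCAL_REPO}/{d}/")

    # ... run `tar -c ... | zbackup backup <LOCAL_REPO>/backups/<name>` here ...

    # Push the new bundles plus updated metadata, then delete the bundle data
    # so the client never has to hold the full repository.
    for d in ("bundles", "index", "backups"):
        rsync(f"{LOCAL_REPO}/{d}/", f"{REMOTE_REPO}/{d}/")
    shutil.rmtree(os.path.join(LOCAL_REPO, "bundles"))
    os.makedirs(os.path.join(LOCAL_REPO, "bundles"))

Since only chunks that aren't already in the index produce new bundles, the
nightly push stays roughly proportional to what actually changed.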

------
Rapzid
ZFS has de-duplication features; however, the amount of RAM required for the
lookup tables can become massive, and I would generally not recommend using it
for large-ish backup systems. Does this get around the massive lookup tables
somehow?

Edit: The README addresses this under scalability. Apparently the overhead is
much smaller than for ZFS.
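
For context, the usual back-of-envelope for the ZFS dedup table (a rough
sketch; the ~320 bytes per entry is the commonly quoted rule of thumb, not an
exact figure):

    # Approximate RAM needed to keep the whole ZFS dedup table (DDT) in core.
    TIB, GIB = 1024 ** 4, 1024 ** 3

    def ddt_ram_gib(pool_tib: float, avg_block_kib: float, entry_bytes: int = 320) -> float:
        blocks = pool_tib * TIB / (avg_block_kib * 1024)
        return blocks * entry_bytes / GIB

    # A 10 TiB pool with 64 KiB average blocks wants about 50 GiB of RAM just
    # for the table, and performance falls off a cliff once it spills to disk.
    print(ddt_ram_gib(10, 64))  # -> 50.0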

~~~
Dylan16807
And even if you give ZFS enough memory, its access pattern becomes so
randomized that it's impossible to get decent throughput on hard drives.

Hopefully this system does not suffer from that.

~~~
comboy
> And even if you give ZFS enough memory, its access pattern becomes so
> randomized that it's impossible to get decent throughput on hard drives.

Any idea if the slowdown also happens for SSDs (with enough memory)?

~~~
Dylan16807
I've never really tried it on SSD, so I can't say. I just know that I couldn't
even get small data sets (sub 100 GB) to perform reasonably with dedup.

I do know that L2ARC on SSD didn't help, and ZIL never got used.

------
Heliosmaster
One small caveat: I don't think that "production ready", for backup software,
can be interpreted as "I'm not touching anything existing, so no harm done by
using me". If it's a backup system, people will rely on it, so it's either
production ready or it's not.

------
akkartik
I went googling for a survey of backup tools and ended up at.. Hacker News:
[https://news.ycombinator.com/item?id=8621372](https://news.ycombinator.com/item?id=8621372)

------
reardencode
Hmm, [https://www.tarsnap.com/](https://www.tarsnap.com/) does this...

------
je42
Mmh. The test suite seems to have a lot of gaps.

~~~
je42
-3 votes? Wow. Never expected that.

...a backup program without tests is asking to lose your data.

