
A performance comparison of Duplicacy, restic, Attic, and duplicity - acrosync
https://github.com/gilbertchen/benchmarking
======
2bluesc
The results don't look very complete, seeing as attic was abandoned over
two years ago. The master attic branch[0] has 600 commits. The fork of attic,
borg, has over 4000 commits[1], suggesting a significant amount of work has
been done to improve it.

It seems odd for the author to compare it to something abandoned (and
thankfully reborn as borg) and ignore what has happened in two years.

Would love to see similar tests run against borg.

[0] [https://github.com/jborg/attic](https://github.com/jborg/attic)

[1] [https://github.com/borgbackup/borg](https://github.com/borgbackup/borg)

~~~
acrosync
Author here. The experiments actually ran BorgBackup 1.1.0b6, as you can see
from the Setup section. We like to call it Attic out of respect for the
original Attic author.

~~~
tombrossman
You realize the name Borg comes from the Attic author, Jonas Borgström, right?
No one else calls it Attic as the two are different projects.

~~~
acrosync
I noticed that, but didn't know if Borg had another meaning. I can understand
why they forked the project, but in my opinion a name that makes the origin
more obvious would have been better.

~~~
dom0
"Borg" was chosen, because it emphasizes _collaborative_ development — and
because someone is a Star Trek fan ;)

------
atonse
Is anyone using such tools as a backup for their NAS (and then using their NAS
for Time Machine)?

That would beat having to install something like Backblaze on every family
member's machine. Cloud backup is great, but it's always better to have a
local (LAN) copy and then an off-site copy.

~~~
robotmay
I use borg (attic fork) backing up to rsync.net for my home server. All my
machines back up locally to that machine (mostly using SyncThing), then it
backs itself up every hour or so. It's not perfect but it does work really
rather well.

Borg is really nice, and rsync.net is the kind of service that's always my
favourite: it does one thing very well.

Also, they offer a discount if you use borg or attic (possibly others), as
they turn off their ZFS snapshot system and assume your software handles that.

~~~
tombrossman
How much data and what's your monthly cost? I have a 2TB storage VPS for under
$10/month. rsync.net looks very good but is possibly total overkill for my
needs. Definitely don't need >1 snapshot/day as that's what my hourly local
backup is for.

~~~
GordonS
Can I ask where you got the VPS? That's a really great price for 2TB!

~~~
tombrossman
At present I'm using a provider in Lithuania called time4vps. Overall the
service is good (assuming you are connecting from Europe) but to use their
website I have to disable my ad-blocking & privacy add-ons, which I don't have
to do on other providers' sites. Not sure why that is.

I'll probably try Delimiter once they start offering service in London, as
they also have some similar low cost + high storage plans.

------
ams6110
> duplicity has a serious flaw in its incremental model -- the user has to
> decide whether to perform a full backup or an incremental backup on each
> run. That is because while an incremental backup saves a lot of storage
> space, it is also dependent on previous backups due to the design of
> duplicity, making it impossible to delete any single backup on a long chain
> of dependent backups. So there is always a dilemma of how often to perform a
> full backup for duplicity users.

Yes and no. Duplicity has the "--full-if-older-than" option, so you can do
incrementals normally, but if your previous full backup is older than whatever
interval you define, it will do a full backup without changing the command
line. So it can run in, e.g., a cron job.
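For example, a crontab entry along these lines (the paths and remote URL are placeholder values, not from the article) would run nightly incrementals and automatically promote a run to a full backup once the newest full is a month old:

```shell
# Nightly at 03:00: incremental by default, but --full-if-older-than 1M
# forces a fresh full backup whenever the latest full is older than a month.
# /home/alice and the sftp:// target are placeholders.
0 3 * * * duplicity --full-if-older-than 1M /home/alice sftp://backup@example.com/alice
```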

~~~
willvarfar
Classic source control had this problem.

The clever trick is to re-encode the previous most-recent backup as a delta
from the current state, and do a full backup of the current state, rather than
encoding each new backup as a delta from the previous state (which becomes
slower and slower to compute the more previous states you have).

Problem solved :)
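A toy Python sketch of that reverse-delta idea (the names and the opcode-based delta encoding are invented for illustration; RCS-style version control does this with text diffs):

```python
from difflib import SequenceMatcher

def make_delta(new, old):
    # Encode `old` as instructions against `new`: either copy a span of
    # `new`, or insert literal data taken from `old`.
    ops = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, new, old).get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif j2 > j1:                      # replace / insert
            ops.append(("data", old[j1:j2]))
    return ops

def apply_delta(new, delta):
    # Rebuild the older version from the newer one plus its reverse delta.
    parts = []
    for op in delta:
        if op[0] == "copy":
            parts.append(new[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)

class ReverseDeltaStore:
    """Keep the newest snapshot in full; older versions as reverse deltas."""
    def __init__(self):
        self.latest = None
        self.deltas = []   # deltas[k] rebuilds version k from version k+1

    def backup(self, snapshot):
        if self.latest is not None:
            self.deltas.append(make_delta(snapshot, self.latest))
        self.latest = snapshot

    def restore(self, version):
        # Walk backwards from the full latest snapshot to the wanted version.
        state = self.latest
        for d in reversed(self.deltas[version:]):
            state = apply_delta(state, d)
        return state
```

Each backup only re-encodes the single previous snapshot against the new one, so the cost per run stays roughly constant instead of growing with the length of the chain.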

~~~
tatersolid
That's really expensive when your previous backup is on cloud storage.

~~~
willvarfar
To compute a delta, you need the previous version. If this previous version is
computed from a single file - the previous snapshot - then that's actually
less data and effort than if it's computed by taking an old snapshot and
replaying all the deltas up to the current time.

------
brunoqc
It's a shame Duplicacy is not free software.

~~~
acrosync
Author of Duplicacy here. To personal users, it is free software.

~~~
comice
It doesn't meet the free software foundation's definition of free software,
which I believe was the original point.

It lacks the freedom to run the program as you wish, for any purpose (freedom
0).

[https://en.wikipedia.org/wiki/The_Free_Software_Definition#T...](https://en.wikipedia.org/wiki/The_Free_Software_Definition#The_definition_and_the_Four_Freedoms)

[https://github.com/gilbertchen/duplicacy/blob/master/LICENSE...](https://github.com/gilbertchen/duplicacy/blob/master/LICENSE.md)

~~~
_wxn8
The FSF's definition of "Free Software" doesn't necessarily match the English
language's definition of "free software". Since we're speaking English, I
think it's reasonable to assume the latter meaning, as most people who haven't
encountered the FSF would.

I really wish people would capitalize things like this. "This is not Free
Software" would make the sentence unambiguous. You can't just go around
redefining the English language willy nilly and expect people to play along.

Languages do evolve naturally but this isn't natural. This is an organization
trying to influence the language to advance an agenda (though I believe it to
be a worthy agenda, it's still an agenda).

~~~
comice
"It's a shame Duplicacy is not free software."

If the commenter meant no cost it's more reasonable they'd have said "It's a
shame Duplicacy is not free". Or "It's a shame Duplicacy costs money".

Capitalising would have made it clearer yes, but I think given the language
and the context (hacker news) I was justified. But the author, @acrosync, was
also justified in assuming it meant no cost.

~~~
acrosync
Free or not free aside, my question is: does it matter to personal users
whether they get this free-for-personal-use license or one of the more
permissive licenses like MIT, BSD, or GPL?

~~~
comice
It matters to me as a personal user because my use of duplicacy might change
at some point and suddenly I'd lose rights to use it (unless I pay). I'd lose
rights to any development contributions I might have made unless I pay.

And as a personal user, I can't use any code from Duplicacy in any other
project. I can't even, say, create a package for it and get it included in
Debian.

And aside from some of these practical issues, I'm a personal user who
supports software freedom so I don't want to use something encumbered in this
way.

And as a commercial user, any development contributions I make are no longer
my own and I have to pay to make use of them.

But the worst part of it is, your license isn't very well defined. As it
stands, you may at any point stop accepting license payments from a commercial
user and they'd lose the right to use it entirely - they'd lose access to
their backups (unless they used the software without a license).

You of course have the right to choose any license you like! I just wouldn't
use duplicacy myself under the terms of that license.

~~~
acrosync
Thanks for your feedback. The reason I don't like open-source licenses is that
I don't want for-profit companies to use my software without paying. The ideal
license would be the one that requires them to pay while being appealing to
personal users like you. I don't think these two goals are irreconcilable, but
unfortunately such a license doesn't exist yet.

~~~
comice
I did wonder if being fully free might encourage more users who might fund you
in other ways, but Borg backup isn't making very much like that, so perhaps
not:
[https://www.bountysource.com/teams/borgbackup](https://www.bountysource.com/teams/borgbackup)

The AGPL license might be a step in the right direction (for your
requirements). It aims to at least ensure that if companies use the code to
provide a service to other users, they have to release their changes. You can
sell those companies a different license if they don't want to accept the AGPL
(you'd have to have a contributor agreement to assign copyright to you though,
to allow you to relicense code at your discretion like that).

Or there is the open core model (like nginx-plus), where you provide the code
under an open source license but provide some additional "enterprise" features
(like your vmware stuff) to only those that pay. I'm not a fan but it seems to
work for some.

Anyway, duplicacy sounds a great design. All the best with it!

------
crdoconnor
Performance bothers me an order of magnitude less than the potential for
obscure bugs which could lose me data.

------
dabeeeenster
Best thing about restic is not having to spend 45 minutes downloading and
manually bullshitting around with python dependencies and libraries. That's
what Go is good at, IMO.

~~~
rkrzr
`attic` and `borg` are just one `sudo apt install attic` or `sudo apt install
borgbackup` away. No "bullshitting around with python dependencies" required.
It looks like they also have packages for other platforms:
[https://github.com/borgbackup/borg/releases](https://github.com/borgbackup/borg/releases)

~~~
dabeeeenster
I was talking mainly about duplicity which has always been like pulling teeth.

------
dom0
My two (possibly biased, much like the author's) cents.

- No network-based tests; e.g. a typical fast internet connection (say 100/40
or 50/20 Mbit/s) with a few dozen ms latency to some server or cloud service.
This is of course difficult because such tests tend to be bad for
reproducibility. For a network-based test, not only time is interesting, but
total RX/TX as well.

- I'm really surprised at restic's performance. It uses _far more_ CPU than
Borg in almost all tests... and Borg is already notoriously inefficient in
its CPU usage when looking at object throughput (restic: "fast, efficient"?).
I don't mean to bash, I'm just surprised.

- restic's deduplication performance might hint at Rabin fingerprints being
worse than Buzhash, but there might be other issue(s) leading to this result.

- Besides CPU time, memory (peak) usage would be interesting.
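For reference, Buzhash is a table-driven rolling hash used for content-defined chunking. A minimal Python sketch of the idea (the window size, table seed, and boundary mask here are arbitrary toy values, not Borg's actual parameters):

```python
import random

WINDOW = 16                       # toy window size (real chunkers use larger)
MASK32 = (1 << 32) - 1
_rng = random.Random(42)          # fixed seed so the byte table is reproducible
TABLE = [_rng.getrandbits(32) for _ in range(256)]

def rotl(x, n):
    # 32-bit left rotation.
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32

def chunk_boundaries(data, bits=5):
    # Slide a WINDOW-byte window over `data`, updating the hash in O(1) per
    # byte: rotate, XOR in the entering byte, XOR out the expired byte.
    # Cut a chunk whenever the low `bits` bits of the hash are zero, giving
    # an expected chunk size of about 2**bits bytes.
    h = 0
    boundaries = []
    for i, b in enumerate(data):
        h = rotl(h, 1) ^ TABLE[b]
        if i >= WINDOW:
            h ^= rotl(TABLE[data[i - WINDOW]], WINDOW)
            if (h & ((1 << bits) - 1)) == 0:
                boundaries.append(i + 1)
    return boundaries

def chunks(data):
    # Split `data` at the content-defined boundaries.
    out, prev = [], 0
    for cut in chunk_boundaries(data):
        out.append(data[prev:cut])
        prev = cut
    out.append(data[prev:])
    return out
```

Because each boundary depends only on the preceding WINDOW bytes, inserting data near the start of a stream only disturbs nearby chunks; later boundaries realign, so the later chunks deduplicate against the original.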

> For instance, file hashes enable users to quickly identify which files in
> existing backups are changed. They also allow third-party tools to compare
> files on disks to those in the backups.

To be fair, Borg can calculate a variety of file hashes (MD5, SHA1, SHA2, ...)
on the fly with "borg list". There are "borg diff" (to compare two archives)
and "borg mount -o versions" as well, though the latter is generally
impractical for looking at a large number of archives.

> Again, by not computing the file hash helped improve the performance, but at
> the risk of possible undetected data corruption.

I can't deduce how the last part follows (", but..."). Care to explain?

------
beagle3
I think bup does support concurrent access; the concurrent deduplication
granularity is a backup set, so if two identical computers are backed up for
the first time at exactly the same time you will not get deduplication - but
that's inherently a dining philosophers kind of problem.

Also, recent bup versions allow delete. No encryption IIRC, but you can
examine it with git tools, which is a feature on its own.

------
e12e
It's a little odd to _not_ benchmark backup over the network - a backup taken
to the same physical disk as the source data isn't very useful; for that
use case, taking a filesystem snapshot[s] would probably be faster and
more useful. Perhaps in combination with a checksumming tool, like [c], or
with a filesystem like ZFS.

Also, it can be difficult in a lot of environments to sustain more than
100 Mbit/s write to a remote, off-site system - halving the stored data can
be a much bigger win then.

All that said, it's interesting to see that a) duplicity seems slow, and b)
very consistent in terms of speed. I wonder if there's some low-hanging fruit
for optimization there.

Personally I've had some luck using backupninja[b] in combination with
duplicity. It's one of the few Free alternatives that allow the backup system
to encrypt "one-way" - so that compromising the backup system doesn't
immediately give read access to encrypted backups. It's a bit complicated to
set up for separate encrypt-to and signing keys, though :/

[s] Today I would probably recommend ZFS - but I've always wanted to give
NILFS2 a real test, especially on solid-state disks:
[http://nilfs.osdn.jp/en/](http://nilfs.osdn.jp/en/)

[c] [https://github.com/Tripwire/tripwire-open-source](https://github.com/Tripwire/tripwire-open-source)

[http://aide.sourceforge.net/](http://aide.sourceforge.net/)

[https://github.com/integrit/integrit](https://github.com/integrit/integrit)
(Speaking of projects that might be fun/useful to redo in a safe language like
Rust or Go - it would appear this would be a prime example, btw. On the whole,
moving integrity to the fs, as with ZFS, might be the better option, though.)

[b]
[https://0xacab.org/riseuplabs/backupninja](https://0xacab.org/riseuplabs/backupninja)

~~~
dom0
> All that said, it's interesting to see that a) duplicity seems slow, and b)
> very consistent in terms of speed. I wonder if there's some low-hanging
> fruit for optimization there.

Duplicity is classic delta-backup. It always reads all files and calculates a
delta to a different version of the file, hence the fairly consistent
performance. Performance of deduplicating archivers is more difficult to
predict.

------
AdmiralAsshat
I would've been curious to see how BackinTime stacks up.

Duplicity/deja-dup (a GNOME frontend for duplicity) is pre-installed on most
GNOME-based DEs, which makes it convenient for end users, but I found only
being able to configure a single backup destination too limiting.

By contrast, BiT supported multiple destinations and profiles, meaning I could
have one local, one off-site, one "Personal Data" backup, one "System" backup
to fall back on if an OS update fails, etc. Its configuration options were
much more attractive.
more attractive.

------
amq
If you are using duplicity, use this moment to check if it works. There is a
serious bug which seems to affect systems with a lot of data:
[https://bugs.launchpad.net/duplicity/+bug/896728](https://bugs.launchpad.net/duplicity/+bug/896728)

------
luxpir
Could obnam be added? I use it and with a few tweaks it's very respectable in
terms of performance. Also has some good integrity checks.

------
whois
Has anyone run these on their computer? As they noted, they are the author of
Duplicacy.

~~~
stevekemp
I used to use attic to back up my personal Debian systems, but since upgrading
to the new stable (Stretch) release it is no longer available, so I switched
to borg.

I had to juggle a few things around, but it works well. As does obnam, which I
use in a couple of other places too.

------
dpc_pw
I wonder how rdedup would compare.

