
BorgBackup: Deduplicating Archiver - colinprince
https://www.borgbackup.org/
======
rsync
We[1] built borg into our environment[2] as soon as it was stable, release
software. In the years since, it has (ironically) supplanted use of rsync as
the de facto standard that our users back up to us with.

As one of our users has said[3], borg is "the holy grail" of backups as it
does everything rsync always did, and produces remotely encrypted backups that
the provider has zero insight into.

It also does not have the inefficiencies that the older, duplicity software
has.

If you are willing to go without (borg specific) technical support _and_ do
your retention with borg instead of our zfs snapshots, there is a special,
discounted rate available.[4]

[1] rsync.net

[2]
[https://news.ycombinator.com/item?id=17408624](https://news.ycombinator.com/item?id=17408624)

[3] [https://www.stavros.io/posts/holy-grail-backups/](https://www.stavros.io/posts/holy-grail-backups/)

[4]
[https://www.rsync.net/products/borg.html](https://www.rsync.net/products/borg.html)

~~~
blakesterz

       inefficiencies that the older, duplicity software has.
    

As much as I love duplicity, it does sort of suck when you get into larger
numbers of files and gigabytes of data. It's so darn slow sometimes it's
almost unusable. Good to know borg is better, I've been meaning to check that
out!

~~~
JackMcMack
It's incredibly slow (backing up 1.5TB took more than a day), but the built-in
support for PAR2 makes it worth it (for me).

Sadly, it seems like borg has no built-in support for any kind of redundancy
[1]

[1]
[https://github.com/borgbackup/borg/issues/225](https://github.com/borgbackup/borg/issues/225)

------
frio
There's so many backup systems out there these days, but none that are _quite_
perfect. The closest I've found is `rdedup`[1], which may no longer be
maintained. What I'd like in a backup solution is:

\- compressed

\- deduplicated

\- encrypted using asymmetric encryption, so that the encrypting machine
doesn't need to know a password

\- durable (using par2 or directly supporting repository repair from an
independent backup)/signed

\- composed of standard open source tools so that if the backup software goes
away, everything can still be retrieved

\- supports differing upstream data hosting providers

\- open source

There's lots of things that fill most of those requirements, but none that
tick all the boxes.

\- rdiff-backup does all of the above _except_ deduplication; it's full +
incremental which requires a management strategy

\- restic doesn't compress, doesn't use asymmetric encryption and doesn't use
par2 or similar tools for durability

\- borg doesn't use asymmetric encryption and doesn't use par2/durability
stuff; it's also pretty slow

I know people call borg the "holy grail", but I think we're still a wee way
off that emerging, personally -- even though I do really like (and use!) borg.

[1]: [https://github.com/dpc/rdedup](https://github.com/dpc/rdedup)

~~~
cookiecaper
Yeah, I've been searching for a great general archiver for a long time. I
tried borgbackup a couple of years ago and immediately hit a handful of issues
that put me off, though I can't really recall the specific technical details
clearly (I'm vaguely recollecting something about incompatible block sizes).

Whatever the details were, I came away with the distinct impression that the
borgbackup developers didn't really respect the gravity of archived records
and long-term storage. While that's probably not a totally fair conclusion
from a one-shot use, in my defense, I'll offer up the latest version of their
changelog [0], where:

* the first thing listed is an apparently serious data corruption bug that lived through several stable releases

* the second thing listed is an apparently-very-serious security vulnerability ("a flaw in the cryptographic authentication scheme used in Borg allowed an attacker to spoof the manifest ...")

* the third thing listed is another data corruption bug titled "Pre-1.0.9 data loss"

* the fourth thing listed is _another_ data corruption bug titled "Pre-1.0.4 potential repo corruption"

Note these are all post-1.0 versions. To be frank, I dare not scroll further.

Users are depending on this software to safeguard important files with the
assumption that it will be able to reproduce them bit-for-bit-intact some
years down the road. Any long-term storage software _requires_ developers with
a fanatical devotion to compatibility, longevity, and integrity if it's ever
going to be more than a toy.

I hate to pile on open-source devs who're trying their hand at making software
that a lot of people appreciate, but overall, it's hard for me to say I regret
the choice to pass it up and stick to combinations of ZFS, rsync, and the
trusty old tarball.

[0]
[https://github.com/borgbackup/borg/commit/75dcf9356334188276...](https://github.com/borgbackup/borg/commit/75dcf9356334188276d095d818e22cb4b437a456)

~~~
viraptor
I still don't know how to really judge software from changelogs, but I think
it's really hard to do and this post is not a great example either.

This is a changelog for backup software which mainly does data storage and
encryption. If it ever has bugs worth talking about they're going to be in... data
storage and encryption. If we look at rsync for a change which is much older,
simpler and mature:
[https://download.samba.org/pub/rsync/src/rsync-3.1.1-NEWS](https://download.samba.org/pub/rsync/src/rsync-3.1.1-NEWS)

\- fixing .. traversal in V3 (how did that survive so long?)

\- issues with copying attributes

(There won't be much data corruption since rsync does 1:1 copies)

Time-to-fix could give us a better idea maybe? Either way, "software doing X
has bugs in X" is normal.

~~~
cookiecaper
I mean, the issue isn't so much that _bugs exist_ \-- like you say, that's par
for the course, even among the best developers. It's about a large incidence
of severe eat-your-data bugs in such a short span of time (roughly a year) in
something that's quote-unquote stable and whose entire purpose is long-term
data storage and retrieval.

It's even worse because unlike rsync, which synchronizes filetrees from point
A to B and can be immediately confirmed to have either worked correctly or
not, borg uses a bespoke storage format that's not easily verified or operated
upon by standard utilities. You just have to trust it to pull the correct data
out when you need it. That's a heavy burden to put on a tool, and the borg
devs seem to be struggling under its weight.
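
To make the "immediately confirmed" point concrete, here is a minimal sketch of how a plain file-tree copy (the kind rsync produces) could be checked with nothing but checksums. The function names `tree_digests` and `compare_trees` are made up for this illustration, and the sketch assumes regular files only: it compares contents, not permissions, timestamps, or symlinks.

```python
import hashlib
import os


def tree_digests(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(full, "rb") as fh:
                # Read in 1 MiB blocks so large files are not loaded whole.
                for block in iter(lambda: fh.read(1 << 20), b""):
                    h.update(block)
            digests[os.path.relpath(full, root)] = h.hexdigest()
    return digests


def compare_trees(source, copy):
    """Return (missing, extra, changed) lists of relative paths."""
    src, dst = tree_digests(source), tree_digests(copy)
    missing = sorted(set(src) - set(dst))   # in source, absent from copy
    extra = sorted(set(dst) - set(src))     # in copy, absent from source
    changed = sorted(p for p in set(src) & set(dst) if src[p] != dst[p])
    return missing, extra, changed
```

Three empty lists mean the copy matches the source byte for byte. A repository in a tool-specific format cannot be checked this way without first restoring from it, which is the asymmetry being described.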

Consider the standards expected of filesystem or database maintainers. borg
repos are not really that different -- they're big, opaque chunks of bytes
that you expect to be able to produce specific data on demand with perfect
reliability.

These types of systems don't get marked stable when they're still in
primordial, eat-your-data development phases, and on the small handful of
unfortunate occasions when data loss bugs sneak in, they're a) usually limited
to some bizarre corner case; and b) taken _extremely_ seriously, almost to the
point of solemnity, and often result in major overhauls to a project's
validation and QA routine.

If any stable filesystem or database had 4 widely-applicable data
corruption bugs within a year, it'd be lights out for that project. All trust
lost, reputation irreparably ruined, angry letters to a variety of mailing
lists, permanently-increased scrutiny on any new projects or maintainers among
the same class of projects, etc.

All I'm saying is that based on my admittedly-murky prior experience, the
project's observed track record, and the high standard of care that must be
met to qualify a project for storing authoritative copies of data, I'm
_personally_ not comfortable trusting borg with anything important any time
soon. No one is obliged to share that evaluation, of course.

------
m3nu
I offer dedicated Borg hosting at BorgBase.com[1] from $5/TB. As opposed to
normal SFTP-based backup services, every backup repo is fully isolated. This
allowed us to add features, like append-only mode and monitoring for stale
backups.

We have also developed a Qt-based desktop client[2] that runs in the system
tray and makes it easier to browse archives or do restores.

For headless deployments, I highly recommend Dan's Borgmatic[3]. You could
deploy it all together with our Ansible role[4].

1: [https://www.borgbase.com](https://www.borgbase.com)

2: [https://vorta.borgbase.com](https://vorta.borgbase.com)

3: [https://torsion.org/borgmatic/](https://torsion.org/borgmatic/)

4: [https://github.com/borgbase/ansible-role-borgbackup](https://github.com/borgbase/ansible-role-borgbackup)

------
beagle3
I switched from bup backup to borg backup, because it generally works better
for my use cases; however, there are two things bup does do better:

1\. remote backup through ssh, especially de-duplicating between different
clients, especially concurrent client backups.

borg needs a compatible version installed on the other side (and compatibility
has been broken between versions in the past); bup uses bare-bones ssh+sftp,
so the other side can basically be anything.

borg will have a lot of download to client from server if multiple clients
back up to the same repository (essentially every time in most common use
cases); bup will have a minimal download.

borg maintains a repo lock, so multiple clients backing up to the same repo
will be serialized; bup does not, so it can be concurrent.

2\. Storage format

borg's format is its own format; bup's underlying data format is basically a
git repo (which you can treat as such; you may need to manually apply "cat" to
rebuild files, but "git" and "cat" are all you need).

I have heard good things about restic, but did not have a chance to evaluate
it myself.

~~~
frio
restic doesn't perform compression. Depending on your use case (I use it for
photography backups, which don't compress well), that might be OK -- but it's
something to be aware of.

------
dmd
In my testing (early 2019), NONE of restic, rclone, borg, duplicity, or
tarsnap was able to handle anything other than hobbyist workloads.

restic fell over _hard_ at around 100 terabytes; the others at around 500.

I'm backing up a little over 3 petabytes. I use Bacula. It's _awful_ , but I
haven't found anything else that can deal with that kind of volume.

~~~
JeremyNT
When I was backing up several hundred TB, I tried every open source option at the
time and found them all lacking.

What ultimately worked? Plain old rsync over ssh, to a zfs pool with snapshots
and compression.

As far as I could figure, the only notable downside of this is that the
storage device must be trusted, since it has access to all of the data, and
that you effectively needed root permissions on the storage you're copying to
for filesystem permissions which makes a multi tenant backup server cumbersome
(you could chroot or use containers or something but these solutions can
become fiddly, e.g. running multiple instances of ssh on nonstandard ports to
enable multitenancy).

~~~
dmd
I've been very tempted to dump bacula and do that, especially since my main
disk-based storage is an Oracle ZFS appliance.

But I also do backups to tape (LTO-8, in a Storagetek library), and I do like
that Bacula handles that for me.

------
bloopernova
Borg backup is amazing. I've been using it for years now, backing up a few
terabytes of data from a few dozen VMs.

The dedupe is downright magical. We've been able to remain super frugal on the
storage allocation for backups solely because of how wonderful its dedupe is.

Restores are also really nice and easy since they are just a FUSE mount.

If you have a Linux host or hosts, I can wholeheartedly recommend borg. It's
elegant, robust, and fast. And open source.
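
For readers wondering why dedupe saves so much space across similar VMs, here is a toy sketch of the general idea: data is split into chunks, each chunk is stored under the hash of its contents, and a chunk that has been seen before costs nothing extra. This is not borg's implementation. Borg's documentation describes content-defined (variable-size) chunking, while the fixed-size chunks and the `ChunkStore` class below are simplifications invented for this example.

```python
import hashlib


class ChunkStore:
    """Toy content-addressed store: each distinct chunk is kept once."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # SHA-256 hex digest -> chunk bytes

    def put(self, data):
        """Store data; return the list of chunk ids needed to rebuild it."""
        ids = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            cid = hashlib.sha256(chunk).hexdigest()
            # setdefault keeps the existing copy if this chunk was seen before.
            self.chunks.setdefault(cid, chunk)
            ids.append(cid)
        return ids

    def get(self, ids):
        """Rebuild the original bytes from a list of chunk ids."""
        return b"".join(self.chunks[c] for c in ids)

    def stored_bytes(self):
        """Total bytes actually held, after deduplication."""
        return sum(len(c) for c in self.chunks.values())
```

Backing up two inputs that share most of their chunks grows `stored_bytes()` only by the chunks that differ, which is the effect described above when many VMs carry near-identical system files.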

~~~
BlackLotus89
I benchmarked borg's and restic's deduplication on a few datasets and restic was
the winner (1-2 years ago). Did you do any comparisons with other
deduplicating backup solutions? (Benchmarked the efficiency, not the speed.
Test data set was a few hundred GB)

~~~
bloopernova
We were previously using Bacula, which was way too heavyweight for what we
wanted. I found Borg after looking into Attic, and wasn't aware of Restic at
that point.

Compared to Bacula and BackupExec, Borg was lightning fast and its disk
usage was very frugal.

------
urgeblumbling
Use, love, and support[1] borg.

[1]
[https://liberapay.com/borgbackup/donate](https://liberapay.com/borgbackup/donate)

------
albertzeyer
I'm still not totally happy with most of the backup solutions, or maybe I want
too many features all at once (sth like Perkeep). Although, otherwise, if the
tools are simple, I have to use multiple tools to cover all the features, and
there will be lots of overlap, which is maybe not too much of a problem
actually.

I collected a list here: [https://github.com/albertz/wiki/blob/master/backup-
software....](https://github.com/albertz/wiki/blob/master/backup-software.md)

------
john37386
I'm using borg and I honestly like it. Not ready to only rely on it. I try to
use the 3-2-1 principle of backup strategy and borg is really a space and
bandwidth saver! My collection grew from a few GB to a few TB. It's already more
than a year old and I was wondering whether it's better to start a fresh new
repo every year or to let the original one grow until it reaches a certain
size? Any recommendations from people who have used it for 2+ years?

------
senotrusov
I use BorgBackup to make a local backup on linux and mac machines and rclone
to upload that backup to the cloud.

It took me some time to figure it all out. I wrote a script so that I don't
have to repeat the configuration steps manually next time. Although it's quite
opinionated, it still grew to 600 lines.

Feel free to check it out, maybe it can help someone
[https://github.com/senotrusov/backup-script](https://github.com/senotrusov/backup-script)

------
cmiles74
I've been using Borg for a couple of years now (and before that, Attic[0]) and
I have been very happy with it. It's seen me through three laptops and it has
been reliable and easy-to-use.

[0]: [https://attic-backup.org/](https://attic-backup.org/)

------
hikarudo
BorgBackup is great.

One limitation that isn't mentioned often is that when doing anything with
your borg repo, RAM use increases with the number of files you have backed up.
In my case I had 15 million files, and mounting the repo took quite a bit of
time (minutes) and used 11GB of RAM.

Restic also has the same issue.
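
As a rough back-of-the-envelope reading of those numbers, the snippet below works out the memory cost per file and a projection for a larger repository. The linear scaling is an assumption made only for this sketch (the comment says RAM grows with file count, not that it grows proportionally), and GB is taken as 10^9 bytes.

```python
def bytes_per_file(ram_bytes, file_count):
    """Average RAM attributed to each backed-up file."""
    return ram_bytes / file_count


def projected_ram_gb(file_count, per_file_bytes):
    """Projected RAM in GB, ASSUMING usage scales linearly with file count."""
    return file_count * per_file_bytes / 1e9


# Figures from the comment above: 11 GB of RAM for 15 million files.
per_file = bytes_per_file(11e9, 15_000_000)
print(f"about {per_file:.0f} bytes of RAM per file")
print(f"30M files would need about {projected_ram_gb(30_000_000, per_file):.0f} GB")
```

That comes to roughly 730 bytes per file, so under the linear assumption doubling the file count would double the memory needed to mount the repository.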

------
rezgi
I use borg extensively as well. One thing I haven't found a good solution for
yet is monitoring.

How do you all keep track of whether your backups succeeded? I'd like to
receive an email if a scheduled backup didn't run. The only thing I have for
now is rsync.net's feature where they warn you if your data hasn't changed by X kb
in the last Y hours/days, but I find this lacking because multiple machines
write to my rsync.net account. It'll only warn me if none made any change, but I'd
never know if only a few failed. Same thing if there is nothing new to back up:
I get an email from rsync.net but I don't know if every backup job failed or
there are just no changes.
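
One way to get per-machine visibility is to track the time of the newest archive for each machine separately and flag any that are too old. The sketch below shows only that comparison. How the timestamps are collected (for example by reading each repository's archive list) and how the alert email is sent are left out, and the function name and machine names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def stale_machines(last_backup, max_age, now=None):
    """Return names whose newest backup is missing or older than max_age.

    last_backup maps machine name -> timezone-aware datetime of its newest
    archive, or None if no archive was found for that machine.
    """
    now = now or datetime.now(timezone.utc)
    stale = []
    for name, when in sorted(last_backup.items()):
        if when is None or now - when > max_age:
            stale.append(name)
    return stale


# Hypothetical data: three machines checked at a fixed point in time.
now = datetime(2020, 1, 10, tzinfo=timezone.utc)
seen = {
    "web": datetime(2020, 1, 9, 23, tzinfo=timezone.utc),
    "db": datetime(2020, 1, 7, tzinfo=timezone.utc),
    "mail": None,
}
print(stale_machines(seen, timedelta(days=1), now=now))
```

Because the check looks at archive times rather than changed bytes, a machine with nothing new to back up still counts as healthy as long as it created an archive, which separates "no changes" from "job failed".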

------
patchtopic
How does this compare against S3QL? Does it do multithreaded compression when
backing up?

So far S3QL is the least worst of all these
deduplicating/compressing/encrypting backup solutions I have tried, but I
haven't tried borgbackup. Despite its stated focus on cloud object storage,
S3QL also works great on NFS and local filesystems as a target as well.. and
sshfs..

------
cmiles74
I see that BorgBackup is available in Chocolatey[0] and it looks like the
current version. It's not clear to me if this is an official port or not, but
I am interested. :-)

[0]:
[https://chocolatey.org/packages/borgbackup#testingResults](https://chocolatey.org/packages/borgbackup#testingResults)

------
dano
Has anyone done a pricing comparison between borgbase, rsync.net, AWS S3, and
Wasabi for offsite borgbackup storage?

~~~
rsync
We try to stay current and competitive with pricing - currently at 1.5
cents/GB/month for the discounted, "no support"[1] "borg accounts":

[https://www.rsync.net/products/borg.html](https://www.rsync.net/products/borg.html)

What is s3 these days? 2.x cents? Plus traffic? We don't charge for
traffic/usage/bandwidth in any way ...

[1] You _do_ get technical support, just not specific support for setting up
your borg backups, which can be fairly complicated ...

~~~
frio
I (sadly) moved from you guys to Backblaze, as they're down at $0.005 (ie. 0.5
cents) per GB. They _will_ charge me (1 cent/GB) if I have to restore the
backup, but this is my extreme off-site everything-blew-up-including-the-
backups-closer-to-me backup, so, hopefully that won't eventuate.
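
To put the per-GB figures quoted in this subthread side by side, here is a small sketch that turns them into a yearly cost. The 1,000 GB data set and the single full restore are assumptions chosen for illustration, the function name is made up, and the rates are only the ones mentioned above, which may well have changed since.

```python
def yearly_cost_usd(stored_gb, storage_cents_per_gb_month,
                    restore_cents_per_gb=0.0, restored_gb=0.0):
    """Yearly cost in dollars: 12 months of storage plus any restore traffic."""
    storage = stored_gb * storage_cents_per_gb_month * 12
    restore = restored_gb * restore_cents_per_gb
    return (storage + restore) / 100  # cents -> dollars


# Assumed scenario: 1,000 GB stored for a year, restored in full once.
rsync_net = yearly_cost_usd(1000, 1.5)  # no traffic charges
backblaze = yearly_cost_usd(1000, 0.5, restore_cents_per_gb=1.0,
                            restored_gb=1000)
print(f"rsync.net borg account: ${rsync_net:.2f}")
print(f"Backblaze:              ${backblaze:.2f}")
```

Under these assumptions the totals are $180 and $70 a year, so the restore charge only closes the gap if the data is pulled back many times.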

~~~
rsync
We will always be more expensive than B2, Wasabi, et al.

You're not ever going to get immediate, personal technical support from a UNIX
engineer at those services like you do at rsync.net.

~~~
frio
Fair enough. I still use and like your service for a second off-site backup of
~40GB of very very important stuff, but was unable to ignore the cost savings
for bulk data.

------
StreamBright
How does it compare against tarsnap?

[https://www.tarsnap.com](https://www.tarsnap.com)

~~~
croon
Is that $1/4GB/month? Plus bandwidth at the same cost on top of that? That
seems wildly expensive.

------
gouggoug
I had never heard of borg backup, seems like a great backup tool.

Now, I just installed it on my mac and running `borg --help` takes about 5
seconds before outputting the help.

Same for any other `borg ...` command.

I'm not quite sure why, but that's the only command I've noticed to be running
slow on my system.

I'm running borg 1.1.10.

~~~
tenebrisalietum
is it based on python?

~~~
hyperion2010
[https://github.com/borgbackup/borg](https://github.com/borgbackup/borg)

62% C 35% Python

~~~
blattimwind
Most of the C code is vendored third party libraries, there are only 2000-3000
lines of C that are actually part of the project.

------
ocdtrekkie
Does it care about the underlying file systems? I have one Linux machine I
need to backup a directory on, and dedupe would be a huge help on this one in
particular, but I'm backing up to an SMB share that's mounted to a folder
location.

~~~
Filligree
Not generally, but I wonder if the locking will still work in your case.

------
bigdubs
Curious if this name is going to draw the ire of Paramount's copyright
lawyers.

~~~
flurdy
Borg? Hardly, there is already a Google product with the same name.

But more importantly, it is the name for a castle in several languages. So
unless Paramount is planning to sue a lot of very old places in Norway and
other countries...

~~~
capableweb
> there is already a Google product with the same name

Borg at Google is not a product per se, it's an internal tool/service. Pretty
sure you can name internal tools whatever you want.

------
clumsysmurf
I wonder if I'm the only one with this phobia: I see a tool written in a
dynamically typed language and automatically trust it less. Especially
something that is responsible for all my data. I wonder if this is a good
candidate project to be rewritten in rust.

~~~
blattimwind
It actually would be for several reasons, which I could outline if people are
interested (but I suspect they are not, so I will not go to the effort
preemptively). Also yes, there have been a number of bugs specifically due to
both dynamic typing of the code and the dynamic data structures (msgpack,
basically JSON) used.

~~~
jmiserez
I used borg for a while but the bugs you mentioned and some of the discussions
in the Github issues gave me pause. For my taste, there is still too much
development going on for me to rely on it as my main backup solution.

