
Append-only backups with restic and rclone - edward
https://ruderich.org/simon/notes/append-only-backups-with-restic-and-rclone
======
m3nu
This bugged me too last year. So I built a hosting service for Borg backups
that can have append-only access keys.

So the client machine can't change old backups. Ever. In addition you can lock
your settings down with 2FA.

[https://www.borgbase.com/](https://www.borgbase.com/)

~~~
bartman
Great service, especially with the addition of the monitoring features.

How do you store the data on your end?

~~~
m3nu
Borg stores all data as plain files in 500MB segments. Every segment has a
checksum, which you can validate using the `borg check` command.

The actual data lies on a plain vanilla RAID array.

------
dorfsmay
This is great. I find myself having to explain and argue this point way too
often:

"One issue with most backup solutions is that an attacker controlling the
local system can also wipe its old backups."

~~~
regecks
S3 also makes this kind of protection pretty easy - enable bucket versioning
and allow only PutObject.
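
As a rough illustration of the idea above, a write-only IAM policy could be built like the sketch below. The bucket name `example-backups` and the helper `put_only_policy` are hypothetical, and a real backup client will usually need more than `s3:PutObject` (for instance to list or read existing objects), so treat this as a starting point rather than a working configuration.

```python
import json


def put_only_policy(bucket: str) -> dict:
    """Build an IAM policy document that allows only s3:PutObject.

    With bucket versioning enabled, overwriting a key creates a new
    version instead of destroying the old one, so a client holding only
    this permission cannot remove earlier data.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                # Hypothetical bucket; the /* suffix targets its objects.
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


policy = put_only_policy("example-backups")
print(json.dumps(policy, indent=2))
```

Expiring old versions would then be left to a separate, more privileged identity or a lifecycle rule, not to the machine being backed up.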

~~~
bpye
B2 can do this too if S3 is a bit too spendy.

------
jeltz
Borg Backup also has this feature, and so far my experience with it has been
great.

~~~
kklimonda
With borg, make sure that you have read and considered the drawbacks of the
append-only mode:
[https://borgbackup.readthedocs.io/en/stable/usage/notes.html...](https://borgbackup.readthedocs.io/en/stable/usage/notes.html#drawbacks)
I'm curious whether restic's append-only fares better here.

~~~
Tharre
How could that possibly be improved? Append-only means no deleting or freeing
up space by definition.

~~~
indigo945
Not necessarily, because the backup server could squash backups itself, even
when the client is not allowed to do so.

------
h1d
duplicacy is another alternative. It has paid plans but is free for personal
CLI usage.

Their comparison of the performance of various cloud storage providers helped
me tremendously in deciding.

[https://github.com/gilbertchen/cloud-storage-comparison](https://github.com/gilbertchen/cloud-storage-comparison)

I make backups using multiple implementations (restic and duplicacy) to guard
against the unfortunate case of one implementation corrupting encrypted or
compressed data so that it becomes completely unrecoverable, or of some other
bug partially corrupting it.

I hate it when backups don't work when I need them, and it's not easy to do
100% verification on all backups regularly.

------
jsiepkes
Worth noting that if you run your own Restic server you can run it in append
only mode.

------
hendry
Seems a bit complex. How about the remote server adding the immutable flag on
the uploaded files?

------
ioquatix
Or just use ZFS and send/recv over the network? It's super simple, guarantees
reliability, and very easy to replicate, e.g. offline backups.

~~~
cyphar
The nice thing about restic is that you don't need to have a smart server, you
can store your backups anywhere. And it doesn't require a particular
filesystem on your host. Not to mention that deduplication is very cheap due
to its design (and unlike extent-based dedup, content-defined dedup also
handles some file changes a lot better).

ZFS, while it is definitely a great project, doesn't provide those features.

~~~
cryptonector
You can store zfs send output as a file. You don't need a ZFS filesystem to
receive into if all you're doing is backups.

~~~
akvadrako
That's not a complete solution unless you have infinite storage. You need the
ability to store 1 snapshot per month without missing any files, but they may
only be contained in one hourly diff.

~~~
cryptonector
This is true of all incremental backup systems.

~~~
cyphar
restic stores files as a Merkle tree of blobs, with snapshots just being a
particular root of the tree. This means that all snapshots are "full" (in the
sense you don't need any others to reconstruct the filesystem) but are also
"incremental" (in the sense that data isn't duplicated if it wasn't changed).
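
To make that concrete, here is a toy content-addressed store in which a snapshot is just the ID of a tree blob. It is only a sketch of the idea: the names `BlobStore`, `snapshot` and `restore` are made up for illustration, the tree is flat, and nothing is encrypted or packed, so it is not restic's actual repository format.

```python
import hashlib
import json


class BlobStore:
    """Toy content-addressed store: each blob is keyed by its SHA-256."""

    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(key, data)  # identical content is stored once
        return key

    def get(self, key: str) -> bytes:
        return self.blobs[key]


def snapshot(store: BlobStore, files: dict) -> str:
    """Store every file, then a tree blob mapping names to blob IDs.

    The returned ID is the root of the snapshot.
    """
    tree = {name: store.put(content) for name, content in sorted(files.items())}
    return store.put(json.dumps(tree, sort_keys=True).encode())


def restore(store: BlobStore, root: str) -> dict:
    """Rebuild the whole file set from one root, without any other snapshot."""
    tree = json.loads(store.get(root))
    return {name: store.get(blob_id) for name, blob_id in tree.items()}


store = BlobStore()
snap1 = snapshot(store, {"a.txt": b"hello", "b.txt": b"world"})
blobs_after_first = len(store.blobs)

# Only b.txt changes, so a.txt's blob is reused by the second snapshot.
snap2 = snapshot(store, {"a.txt": b"hello", "b.txt": b"world!"})
new_blobs = len(store.blobs) - blobs_after_first
print("blobs added by second snapshot:", new_blobs)
```

Each root restores on its own, yet the second snapshot only adds the changed file and a new tree blob.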

Yes, ZFS internally stores things in a similar fashion but I believe you can't
get that from zfs-send (which is totally fine -- that's not the use-case they
were going for, and zfs-send is incredibly useful for lots of other cases).
restic is also chunked using content-defined chunking which gives you
_much_ better deduplication than extent-based chunking (which is what ZFS, or
any in-kernel filesystem has) -- and ZFS deduplication is well-known to be
very expensive to enable while restic's deduplication is (almost) free. Of
course, restic doesn't get atomic snapshots without using a filesystem like
ZFS -- so there are obviously benefits to using both together.
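
A small experiment shows why content-defined boundaries dedup better than fixed-offset ones when data shifts. This is a toy sketch, not restic's chunker: the window size, the mask, and the use of CRC32 recomputed per window (instead of a true rolling hash) are all simplifying assumptions, and there is no minimum or maximum chunk size.

```python
import random
import zlib

WINDOW = 16   # bytes of local context that decide a boundary (toy value)
MASK = 0x3F   # cut when the low 6 bits are zero, about 64 bytes on average


def fixed_chunks(data: bytes, size: int = 64) -> list:
    """Cut at fixed offsets, the way extent- or block-based dedup sees data."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def cdc_chunks(data: bytes) -> list:
    """Cut wherever the hash of the last WINDOW bytes matches the mask."""
    chunks, start = [], 0
    for end in range(WINDOW, len(data) + 1):
        # The decision depends only on nearby content, not on the offset.
        if zlib.crc32(data[end - WINDOW:end]) & MASK == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])
    return chunks


rng = random.Random(0)
old = bytes(rng.getrandbits(8) for _ in range(4096))
new = b"X" + old  # one byte inserted at the front shifts everything after it

shared_fixed = set(fixed_chunks(old)) & set(fixed_chunks(new))
shared_cdc = set(cdc_chunks(old)) & set(cdc_chunks(new))
print(len(shared_fixed), "fixed-size chunks reused")
print(len(shared_cdc), "content-defined chunks reused")
```

With fixed offsets the inserted byte misaligns every later chunk, while the content-defined boundaries fall at the same places in the content, so everything after the first chunk is found again.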

~~~
cryptonector
ZFS is a Merkle hash tree. zfs send uses this.

~~~
cyphar
Yes, but if you are storing the output of zfs send for each snapshot
(incrementally) you won't get the benefit of it using a merkle tree on the
storage side of things (obviously it's used among other neat algorithms to
figure out what the delta between snapshots is).

If you are using zfs recv on the remote server you will get basically the same
features as restic (minus content-defined deduplication, and full-repo
encryption -- ZFS has extent-based dedup and its built-in encryption is not
"full-disk" since it reveals ZFS-level metadata). And you get real atomic
snapshots which is better than what restic can give you because it's a
userspace tool (though you can always use restic with ZFS).

I'm not sure we're actually in disagreement on how ZFS works, it's a question
of whether you can get the practicalities of the benefits without having a ZFS
server which holds your backups. If you just store the output of zfs send then
it's also hard to expire old backups, and restoring would require applying all
of the saved send payloads rather than just doing one 'zfs send' from the
remote server.
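
The expiry problem with stored incremental streams can be sketched with a toy model, where each saved payload is just a dict of changed paths. This is only an analogy for a chain of deltas, not the zfs send stream format, and `apply_chain` is a made-up helper.

```python
def apply_chain(deltas: list) -> dict:
    """Replay every saved delta in order to rebuild the latest state.

    A value of None marks a deleted path.
    """
    state = {}
    for delta in deltas:
        for path, content in delta.items():
            if content is None:
                state.pop(path, None)
            else:
                state[path] = content
    return state


# Three saved payloads: a full send followed by two incrementals.
deltas = [
    {"a.txt": "v1"},   # initial full stream
    {"b.txt": "v1"},   # b.txt only ever appears in this increment
    {"a.txt": "v2"},   # later change to a.txt
]

full = apply_chain(deltas)

# "Expiring" the middle payload silently loses b.txt on restore.
expired = apply_chain(deltas[:1] + deltas[2:])
print(sorted(full), sorted(expired))
```

Restoring means replaying the whole chain, and dropping any link changes the result, which is why a receiving side that can merge snapshots makes expiry much easier.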

~~~
cryptonector
Sure, zfs send does not actually send a section of a blockchain. It could
have, but that wouldn't have been as space-efficient.

~~~
cyphar
In order for ZFS send to be able to provide the same features as restic it
would need to output a representation of the zfs merkle tree as a flat
filesystem (but encrypted) -- which would allow a dumb server to deduplicate
the tree (and ZFS is clever enough to already know what blobs exist on the
remote side). I guess this was not done because a ZFS send stream might be
more efficient for transfer (as you said). But this means that its main use
as a backup system requires having a ZFS server on the other end (in order to
be efficient and useful as a backup store).

Again, I'm not bashing ZFS. My whole point is that restic is a neat and
interesting project specifically because it doesn't require a clever server to
provide its features -- that doesn't mean ZFS isn't a great project (far from
it).

I use ZFS on my servers and love it, and I use restic for backups.

------
casylum
Does restic support sparse files yet?

------
nine_k
Noting the obvious: if you're backing up customer data, an append-only backup
is not GDPR-compliant.

~~~
Xeago
An append-only backup is not by definition non-compliant with GDPR. It's
important that the individual can be assured that their personal data will not
be restored back to production systems (except in certain rare instances,
e.g., the need to recover from a natural disaster or serious security breach).
In such cases, the user’s personal data may be restored from backups, but the
controller will take the necessary steps to honor the initial request and
erase the primary instance of the data again.

For example: [https://www.acronis.com/en-us/blog/posts/backups-and-gdpr-ri...](https://www.acronis.com/en-us/blog/posts/backups-and-gdpr-right-be-forgotten-recommendations)

