Restic – Backups Done Right (restic.net)
593 points by tambourine_man on Oct 31, 2019 | 177 comments



Also check out Borg: https://www.borgbackup.org

And some resources on how they're different:

- https://github.com/restic/restic/issues/1875

- https://stickleback.dk/borg-or-restic/

- https://sysadministrivia.com/episodes/S4E5

The general consensus seems to be that restic is borg with more bells and whistles (backing up to various places), while borg is the more trusted tool with the longer history (just use SSH and be done with it). I personally used borg recently for a migration between computers and it worked great for me.


There is also Snebu (www.snebu.com, Github repo at github.com/derekp7/snebu). I wrote this about 8 years ago for my personal needs, because rsync / snapshot backups were running out of steam for me. What's different from Restic or Borg/Attic is that the main "smarts" lives on the backup server (the Snebu binary), whereas the client side just uses find / tar to serialize the data. Backend side compresses, deduplicates, snapshots, handles a number of systems easily, and is fairly fast due to using SQLite for the metadata catalog.

A couple of weeks ago I finished a client-side encryption module; I have about another weekend's worth of work left to integrate encryption support into the back end. The client-side encryption module is already in the Git repository, but there is no documentation for it yet. My next project is a Web based GUI (maybe an Electron based client-side GUI too, depending on what makes sense).

Oh, and the most recent release supports granular user permissions, so you can grant a host (via an SSH account on the backup server) permission to create backups, but not delete them. Or have an administrative user that can expire old backups but not read them, for example. That can help thwart crypto viruses that try to delete backups.


> […] the main "smarts" lives on the backup server (the Snebu binary), whereas the client side just uses find / tar to serialize the data. Backend side compresses, deduplicates, snapshots […]

So am I right to infer that there is a ridiculous amount of overhead in the data transfer, as opposed to some of the other software mentioned? Would you use this over the internet? Or an untrusted network connection?


Communication is handled via ssh, so yes, I would use it over the internet, as the transfers are streaming. A full manifest (list of file names and metadata [file owner, mod time, size, etc]) is sent over the wire, and a return list of files that are needed to complete the snapshot (essentially changed / new files) is returned to the client. The client then tars this smaller file list to stream to the remote server. Tar, in this case, is used to serialize the data, which gets extracted on the remote end (compressed and stored as sha1-named files [will be switching to sha-256 in the next major release]). Metadata gets put in the SQLite DB.

For data encryption (the main part is complete, but I need to expand the backend to recognize encrypted data; it should be finished shortly), the output of "tar" is piped through "tarcrypt". tarcrypt takes a standard tar stream as input, compresses/encrypts the file data, and outputs a tar file with some extended headers that contain info about the compression/encryption, including the fingerprint of the RSA public key used to encrypt the data, an HMAC, and the encrypted (passphrase-protected) private key (this can be made optional). The idea is that the encryption itself is AES-256-GCM with a random key, which is in turn encrypted with the RSA public key. That way you can have encrypted backups without needing a password sitting in plain text on the client. The RSA private key is passphrase encrypted and sent along with the tar header to the server. On restore, you will be prompted (client-side) for the passphrase. This way you can restore a client even if the keyfile is destroyed.

I plan to make the encrypted key storage optional, but that would require that you manage the key file backup separately, and doesn't get you much more security (assuming you have an adequately strong passphrase).

The server requirements are ssh access and the snebu binary installed (optionally suid to a non-privileged backup user account, so that granular permissions can be employed for other accounts). And since Snebu is written in C, with only liblzo2, libcrypt, and sqlite2 as dependencies, it is easy to get it to work with a wide variety of systems (the client side only requires a modern enough version of GNU "find" and "tar", unless encryption is used, which also requires "tarcrypt" -- modern in this case means within the last 10 years; the "find" command needs to support -printf with the appropriate parameters).


With the exception of being able to upload to different targets (which can be piggybacked onto borg with something like rclone), borg is definitely the one with more whistles, not the other way around. Restic is lagging behind borg in most features; they tend to appear in borg first, and restic maybe implements them a year later. Important features currently missing in restic which exist in borg: compression, append-only backups, extensive include/exclude patterns, flagging any directory for exclusion with a tag file inside it, and setting "--chunker-params", i.e. being able to adjust deduplication settings (chunk size, how hard it looks for duplicates, etc.) to the nature of the files and the RAM and CPU limitations during backup (plus other minor ones).

Other differences which some might consider good and others bad include:

* Borg is in Python, Restic is in Go? Python is a more widely known language, so the chance that more developers will be able to maintain it is higher, the chance of new features is higher, and the chance that bugs will be found is higher. Both still provide a static binary download, so deployment is very easy for both.

* Borg allows setting names for archives (which can still be templated with things like {now} or {username}), while Restic automatically names them with random ids. Borg's approach seems more user-friendly.

* Borg allows more choice in what encryption is used, including authenticated/non-authenticated, and including using no encryption at all (might be useful in some cases). (https://borgbackup.readthedocs.io/en/stable/usage/init.html#...)

* Borg allows specifying "max_segment_size = xxxx" on a repository, thus letting you somewhat adjust the size of the archive ("segment") files as they are stored on the server (the default being 500 MB). Restic uses its own sizing and (if I remember correctly) more often than not it produces very small files (a couple of MBs) which crowd the filesystem. In general, archive file sizes can affect both the filesystem and uploads to the cloud, where many cloud providers' APIs operate on individual files, not on parts of them.


append-only backups: It supports this with the REST backend.

flagging any directory for exclusion with a tag file inside it: It has this. It's called --exclude-caches and will exclude any directory containing a "CACHEDIR.TAG" file with the correct content.


Ok, it's certainly a good thing that it does support it now. Borg also allows you to specify the name of that file; it does not restrict it to being "CACHEDIR.TAG". Giving an option to support both "CACHEDIR.TAG" and any other filename is a good thing. (One might want to exclude other directories that way, unrelated to cache directories.)


Actually, "--exclude-caches" in restic is just an alias equivalent to the following [1]:

--exclude-if-present "CACHEDIR.TAG:Signature: 8a477f597d28d172789f06886806bc55"

You can specify the tag filename, and optionally append ":initial content" (the tag file must start with this content to avoid false tagfile matches).

[1] https://restic.readthedocs.io/en/stable/040_backup.html#incl...


Like I wrote in the original comment, borg's additional functionality is that it supports both CACHEDIR.TAG and any other filename, if that is what the user wants.


As I wrote in the previous comment, you can do that in restic with the "--exclude-if-present" option where you can specify whatever filename you want as well as optionally some content it must start with to avoid false positives.
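
For example, something along these lines should skip any directory that contains a marker file of your choosing (the ".nobackup" name here is just an arbitrary example, not anything restic mandates):

    restic backup --exclude-if-present .nobackup /home/user

Leave off the optional ":content" suffix and the file's contents aren't checked at all.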


Now I finally get it, thank you :). So Borg and Restic are basically on par considering that feature after all!


CACHEDIR.TAG seems to be based on a spec that multiple pieces of software are starting to use. Give it a quick google.

I can't think of any reason why you would need to customise the name of the file, but I'm sure there are use cases out there.


This is also a problem.

I might like to mark certain things as not backed up because they contain sensitive or otherwise unrelated information. But I might not want other tools to treat them as cache data, that is, transient and safe to delete.


Then you can just add `--exclude filename` to your backup job.


> Borg is in Python, Restic is in Go? Python is a more widely known language

One doesn't need an interpreter + supporting libraries to run a backup program. With a Go program one just drops a statically compiled binary in $PATH and it just works.

Anyway, I'm still using duplicity. It backs up over rsync, ftp, you name it. Written in Python as well, unfortunately. I don't care about S3 etc., as I don't like losing control over my backups.


To repeat my original post, both tools provide a static binary download, making it easy to deploy. Both "just work". It's a single file that you chmod and run.

Duplicity, as you may know, is not a deduplicating backup solution; it is an incremental backup solution, suffering from all the flaws and limitations thereof. If you happen to not realize the difference, I highly recommend this short but informative blog post from Backblaze: https://www.backblaze.com/blog/backing-linux-backblaze-b2-du...


Static binary as the whole interpreter and modules bundled in a fat binary? No thanks.

I know the difference, but this way we have 4G volumes we could eventually burn to DVDs if we wanted to. No backup repositories, cloud storage solutions or new and obscure network protocols involved. Deduplication is also uninteresting for us because the largest part of our backups is SQL dumps. This is why we make full backups every time.


You don't have 14 MB to spare for a static binary? What is this zealotry against interpreted languages? You know that your Go static binaries are basically the same thing: one binary with all the modules bundled into a fat file? Where is the logic here?

Also, why would deduplication be uninteresting when the backups are SQL dumps? That is EXACTLY the scenario in which deduplication would shine: it would compare each dump against the latest backup, find the individual small chunks that have changed, and back up only those. Do you actually know how deduplication works? You might want to look that up more properly.


No zealotry. I'd rather have the interpreter installed from the distribution and drop the fatpacked program with bundled libraries and everything in a bin directory. If it needs installing on Windows, then the fatpacked binary with bundled interpreter is also fine.


> With a Go program one just drops a statically compiled binary in $PATH and it just works.

Until one of the statically compiled dependencies needs a security fix.

Then people realize why shared objects were invented.


That's not why they were invented, and it's not even a good reason to use them.


I would also say that Borg is more feature-complete and mature, not the other way round.

If you don't want to run your own server, I offer a hosted Borg Backup service: https://www.borgbase.com

This would give you some Borg-specific features you can't easily get with your own VPS, like monitoring for outdated backups and restricting certain keys to append-only mode.


Borg's encryption is questionable at best. Performance-wise it's not particularly good and probably will never get better because of the large, somewhat complex Python codebase.


What's questionable about it? It seems that they use modern AEAD ciphers in a reasonable way.

I'm more concerned about the repository format and config file (i.e. attack surface, since the repo is potentially untrusted).

Performance is actually better than Restic, and performance-critical parts of Borg are written in C or use C libraries.


> What's questionable about it? It seems that they use modern AEAD ciphers in a reasonable way.

No, using one key per repository and a persistent message counter is not a reasonable design.

https://borgbackup.readthedocs.io/en/stable/internals/securi...


For what threat model does this matter?


"When the above attack model is extended to include multiple clients independently updating the same repository, then Borg fails to provide confidentiality (i.e. guarantees 3) and 4) do not apply any more)."

Edit: I've posted this a bunch of times here, pretty much every time it catches my eye when someone says this tool has good crypto, and by now I'm used to people just downvoting it and saying it doesn't matter because obviously no one would ever use it like that, the design is fine, etc. (Isn't the point of deduplication to save disk space?)


From OP's link:

> If we perform similar backups to the same remote destination over SSH (Borg) and SFTP (Restic), the initial backups take roughly the same time, but the subsequent incremental backups take something like 10x longer with Restic.

Also I'm wary of encrypting personal backups. I think the chances of me forgetting a password that I used literally once are quite high, and what's the point of a backup if you can't actually access it?


I'm worried about that too, but you can deal with it by making the system that automates your backups also test your memory of the password (while still creating a non-interactive backup even if you ignore the prompt). Alternatively, reuse a password that you regularly use offline, like a disk encryption or SSH passphrase.


Or use a password manager.


I would also like to know more about why you think Borg's encryption is questionable. I'm not familiar with Borg's internals, but there's not much one could do wrong here, as it should just be simple symmetric encryption. Does it not use a strong cipher? Is the cipher used incorrectly? Is the key derivation method weak?


restic is simpler than borg, because it doesn't require a special borg server (it just writes out to files - or block storage).


This is somewhat misleading: borg absolutely does not require its server software to be installed on the target server. It supports (and works well with) a simple sshfs mount.


This is important -- it means that restic can trivially be used with e.g. S3.


It also means that the sending client has to do all the work. restic is enormously resource-hungry compared to borg. It can be hit by the OOM killer on small VPSs.


I've loved using restic so far. I've got about 3TB of stuff backed up to S3 using the restic CLI. It is simple and just works.

If anyone knows how I could more easily restore a file, though, that would be great. I have to supply the restoration snapshot ID right now, and I'd rather just do 'latest' and have it find the newest version of the file across all of the snapshots and restore it. Is this possible?

Something like:

   restic restore -i '/path/to/file.txt' -i . latest
instead of:

   restic restore -i '/path/to/file.txt' -i . search-file-snapshot-list-for-big-sha-here


you could always mount the entire backup, using

    restic mount backup/
and pick the file(s) you are looking for from the `backup/snapshots/latest/` directory.
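
For what it's worth, restic's restore command also accepts the literal snapshot ID "latest" (optionally narrowed with --host or --path), so something roughly like this may already do what you want without mounting (the paths are placeholders):

    restic restore latest --target /tmp/restore --include /path/to/file.txt

Note that this restores the file as it exists in the newest matching snapshot; it won't go hunting through older snapshots if the file is missing there.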


I use borg on a small VPS that hosts my side project to get free backups.

Borg does incremental backups to a local directory. I sync the borg backup folder to a free Backblaze B2 bucket. The whole thing comes in around 5GB of backup, including database backups and all the hosted files and configuration files.


I started using Restic recently. It's good and I'm going to continue using it. That said, there are a couple of bad problems with it:

Firstly, if you want to prune old backups, e.g. keep the last N1 hourly backups, the last N2 weekly backups, etc., then it has that ability; however, whilst it's doing it, the client has to download and upload a tonne of data in order to repackage the backup files that contain some data that needs removing and other data which doesn't.
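
For reference, that kind of policy is expressed roughly like this (the retention numbers are placeholders); it's the implied prune/repack step, not the forget itself, that causes the heavy download/upload traffic described above:

    restic forget --keep-hourly 24 --keep-weekly 8 --prune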

Secondly, I've set up an "append only" system, where my various hosts can append to their own backups, but not overwrite or delete them. I wanted the backup server to be unable to read the backups (easy enough: don't supply the encryption keys to the backup server); at the same time, however, I wanted the backup server to be able to automatically prune old backups. It cannot do that without the key, and I don't want to give it the keys to the backups, as a compromised backup server would then mean all of my hosts' data is suddenly compromised.


How would you expect the backup server to be able to prune old encrypted backups if it can't decrypt them? How should it know what to delete?


ZFS does that with snapshots already. (ZFS is not encrypted, but the principle is the same)

I believe immutable trees would be the correct algorithm here.

The algorithm relies on a tree structure, where the leaves are the files. Every time a new snapshot is written, every changed leaf node writes a new path up to the root node, while leaving the rest of the tree intact.

When a snapshot needs to be deleted, you just remove the old root nodes and have a garbage-collection-like process delete all the old files.

EDIT: Look at the example here: https://en.wikipedia.org/wiki/Persistent_data_structure#Tree... . When xs is deleted, nodes a, c, f can also be deleted.

EDIT2: I feel like I can never make posts like these while keeping them short and sweet yet adding enough detail. ZFS can be encrypted, but accessing any file is right at its fingertips, not over a slow network connection. And there are caveats about how to encrypt the metadata, but you do not need to decrypt the whole backup to figure out what to delete.


AFAIK that is exactly how restic works. You use `restic forget` to remove the roots (snapshots), then `restic prune` garbage collects unreachable blobs.

ETA: However, nearly all data in restic is encrypted. This includes the index files. So you still need to have the encryption key to look at snapshots and walk their trees.


If the backup server has access to the tree structure, then the backup isn't fully encrypted, or at least the metadata isn't. Now you need to have a conversation about what metadata is allowed to be stored unencrypted on the server and what metadata isn't. If you want a fully encrypted deduplicated backup system, I don't see how it's possible to prune without the ability to decrypt, because there's no way to know what needs to be pruned.


> ZFS is not encrypted

It certainly can be (as of ZFS on Linux 0.8, and Oracle ZFS has its own encryption scheme).

https://github.com/zfsonlinux/zfs/pull/5769


> (ZFS is not encrypted, but the principle is the same)

Neither.


The backup server should have information about when a particular backup was made; it should be able to prune based on backup date without having the ability to read any of the data inside the backup.


This doesn't make any sense when the backups are deduplicated (or incremental). A given file/block/whatever unit of data isn't uniquely associated with a single backup.


I assume the backups are incremental, so you cannot just toss old encrypted files without losing backed up files.


Worse, it's deduplicated, using a similar algorithm to git. You can forget old backups, and theoretically that could be done without decrypting, but deleting the actual data will require decryption.


It should be possible to have an index that says "this encrypted blob belongs to this particular snapshot". It introduces a "plain text" component of course, in that the snapshot information (name/date or whatever) is readable.

But then, I assume the way it is written, there are some blobs that belong to multiple snapshots, and some bits of blobs that belong to one snapshot but not another etc.

So given the implementation, it may not be possible, but it's still a drawback of this particular implementation that people should be aware of.


I'm no encryption expert, but depending on how the data is partitioned it can be easier to securely prune encrypted data than unencrypted data. For example, securely wiping an encrypted drive is as easy as tossing the key.


If you wish to "securely" wipe an encrypted drive, it's best to write over the old data a few times using randomly generated keys, tossing the keys each time you do a write over.


That sounds pointless. If you have real crypto, tossing the key (in a way that it cannot be recovered) is absolutely enough.


It's not pointless - it's pretty standard practice.

For example, although drives in Google Cloud are encrypted at rest, once they are decommissioned the drives are physically destroyed: https://cloud.google.com/security/deletion/#ensuring_safe_an...


> When a hard drive is retired, authorized individuals verify that the disk is erased by overwriting the drive with zeros and performing a multi-step verification process to ensure the drive contains no data.

Thanks. I guess the reason is to avoid future decryption due to either a new crypto attack or to avoid being vulnerable to potential errors in key generation.


This is massive overkill. Once is enough. If the drive is already encrypted, just wipe the key.


Couldn't the client just tell the server what to delete? Which would require only downloading an (encrypted) index of what data is in use, not downloading and reuploading the data itself.


I think that's what it does, but that downloading and comparing of encrypted indexes takes time, which is what was complained about.


Perhaps by using homomorphic encryption?

https://en.wikipedia.org/wiki/Homomorphic_encryption


> Secondly, I've set up an "append only" system, where my various hosts can append to their own backups, but not overwrite or delete them.

To my knowledge that's not possible with restic. You can either have append only (using for example restic/rest-server) or pruning. But not both.
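
For anyone wanting the append-only half of that, a minimal rest-server setup looks roughly like this (if I remember the flags right; the path is a placeholder):

    rest-server --path /srv/restic --append-only

Pruning then has to happen from a separate, trusted machine that holds the repository key, which is exactly the limitation the parent comment runs into.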

My solution so far is to use a backend like rsync.net. The backup uses the normal backup + prune routine, and rsync.net on their server side creates daily undeletable snapshots of all files, deduplicating the encrypted files in the process without needing to look into them. For me that seems like a reasonable compromise for now.


I created my own restic backend using Nginx, which allows two different users to authenticate against a repo. The host being backed up knows the credentials that allow it to append. A different host knows the pruning credentials and performs that task.

I posted this in another comment, but I did this as follows (I'm the author) - https://www.grepular.com/Nginx_Restic_Backend


Another major problem is that it does not support compression: all files are stored uncompressed on the backup server (even though they are deduplicated) (sic).


You know, I actually assumed it had compression. It looks like it will get it eventually though, so I'm not too worried. That said, I can get some benefit by compressing some of my backups myself. For example, I currently back up my postgres DBs with cron jobs like:

$ pg_dump dbname | restic backup --stdin --stdin-filename dbname.dump --tag dbname -q

With restoring as simple as:

$ restic dump latest --tag dbname dbname.dump | psql dbname

I should probably stick a gzip/gunzip between those two commands.


If you stick a cheese up in there, restic will probably lose some of its ability to deduplicate.

If restic does the compression, it's not an opaque process to it. But if you do it yourself, you are essentially obfuscating the structure of the data.


gzip with the --rsyncable flag will still allow for deduplication. It uses a content-aware rolling checksum and resets the compression dictionary whenever that checksum has a certain number of zeros.

The result is a valid gzip file with a few percent less compression, but friendly to deduplication (which is what rsync does).
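
Concretely, the pg_dump pipeline from upthread would become something like this (a sketch; it assumes a gzip build that supports --rsyncable, which most Linux distributions ship):

    $ pg_dump dbname | gzip --rsyncable | restic backup --stdin --stdin-filename dbname.dump.gz --tag dbname -q
    $ restic dump latest --tag dbname dbname.dump.gz | gunzip | psql dbname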


Good point. I'll stay away from cheesing up my commands and wait for it to be built in.


What is "cheesing up"?


It's what Google speech-to-text thinks I meant when I said "gzip" into my phone.


Ha, I was thinking there's a new gzip competitor I needed to track down called cheese.


If this is a concern, why not delegate this to the file system level - just stick backups on a compressed volume and be done with it, no?


No.

You cannot stick a compression step before restic, because it will mess up the deduplication: the chunker will not find the same chunks and will not be able to find out which files have the same contents (not as easily, at least).

And you cannot stick it after restic, compressing the backup archives (or storing them on a compressing filesystem), because those are encrypted, and any encrypted file has high entropy and does not compress. (It doesn't help that restic makes the encryption mandatory.)

So no, compression really needs to be an integral step in a backup system. Restic is definitely lacking in this regard.


Ah, you are right. I forgot it was encrypting the backups.


Yes, that is the one thing that keeps me from using Restic as my backup solution.

Unfortunately it does not seem like it's a prioritised feature given that it has been planned since late 2014.


There's an open PR to optimize prune; essentially it lets you decide what percentage of waste per chunk you'll allow before repacking it. https://github.com/restic/restic/issues/2162



Restic is probably my favorite pure open source backup solution, but I've been using Duplicacy for years now and have been very happy with it. On the GitHub page (https://github.com/gilbertchen/duplicacy) for Duplicacy you can find a comparison to Restic (along with other backup solutions), which I found informative.


Unfortunately the primary author has mostly moved on to other things and hasn't appointed any maintainers. It's been three months without any commits, and there are 60+ open PRs.

The biggest missing feature of Restic is no support for compression.

Ah, well.


As far as I'm aware from the IRC channel the primary maintainer is mainly busy moving.

I do concede it is annoying having to fold in upstream PRs to my own build of it.


Discussion from September, including other committers and lack of PR attention: https://forum.restic.net/t/is-restic-dying/2103/5


There's a recent open PR adding compression support: https://github.com/restic/restic/pull/2441


That’s exactly how Borg started, after it became evident Attic was going to be abandoned. Though I’d say it's way too early for Restic to land in such a predicament.


This seems very similar to Borg Backup [1], so I'm interested to hear from others who have used both on how it compares.

More generally, I've been looking for a solution that helps distribute backups in a peer-to-peer way. I have a few friends with their own home servers, and we want to replicate backups across each other's servers for geographical redundancy. Currently, I have a script that uses rsync to copy some tar archives over daily, but this doesn't scale well as more peers want to join our backup-sharing group, since it requires them granting me SSH access.

What I need is a decentralized network to share and retrieve backups from peers. I tried using dat [2] with a Borg Backup repository inside it, but ran into some nasty issues with dat which would cause it to regularly crash and one time even corrupt the data.

Does anyone have any suggestions for such a situation?

[1] https://www.borgbackup.org/ [2] https://github.com/datproject/dat


This might not be quite what you're looking for, but Syncthing[1] is a popular P2P file sharing solution. You could use Restic to make backups to a shared Syncthing folder.

[1]: https://syncthing.net/


Thanks for the suggestion. I've tried Syncthing, but it seems to still require that users are explicitly added (i.e., no public access [1]). I'd prefer a solution where anyone could decide to start helping replicate backups, without me having to add them in some way.

[1] https://github.com/syncthing/syncthing/issues/1942


Then you should look into IPFS :)

https://github.com/ipfs/ipfs


As far as I know, IPFS objects are immutable. Is your suggestion that I publish some sort of index containing all of the IPFS links to my backups, and then my friends can automate pinning those links? I think that could work, but it would be pretty bandwidth intensive since there would be no deduplication (I'd have to also encrypt since I wouldn't want everyone to have my files).


I'm not suggesting anything specific. Personally, I think allowing permanent public access to your server backups sounds like a terrible idea, but it's your data and you choose your threat model.


CrashPlan Home used to support this, but alas I've found no alternative.


I've been looking for something similar. I've coded up a few pieces in Go and have been trying to fit all the pieces into a coherent model before starting any serious coding.

My plan was to use gRPC + go for communications, a DHT to find peers, and github's klauspost/reedsolomon for adding redundancy.

I wanted to support a simple client in go, tracking all filesystem state locally in something like sqlite and of course encrypting before upload. The local state of course would be backed up as well.

Encrypted blobs would be offered for upload to a peer 2 peer server (or in small setups it could be the same machine) and accepted if they were unique. If not unique, the client would be subscribed to that blob.

The server would then chunk up 1GB or so of blobs, run reedsolomon to add the desired level of redundancy, and start trading those chunks with peers. No peer would know if you trusted them; you might well set your server to only "trust" peers once they have shown 95% uptime and 95% reliability when challenged over a month. Reputation would be tracked for the peers you trade with, but only directly, much like torrent's tit-for-tat strategy.

The p2p server would accept uploads from any trusted clients and work to ensure the configured replication across any peers it could find.

The peer challenges would be something like asking for the sha256 of a range of bytes of a blob the peer stored for you. Maybe 100 random challenges every few hours.

The general goal is something that would "just work", create keys, get nagged to print the keys out, and have sane defaults for everything.


Borg is great as well. I use both. Main difference to me is that Restic supports s3 natively but doesn’t have compression (last time I checked).


I've been using Restic recently for backups online (to Backblaze, which it natively supports) after using Duplicity for a while. I'm really, really impressed so far; the hash-based deduplication means:

- I don't have to think about deduplicating my own data, which works well with my packrat tendencies (e.g., multiple copies of music library on snapshots of old laptops);

- Full-disk backups of multiple machines will share a lot of storage (all system files from my desktop and laptop, both running the same Linux distro);

- Restarting a long upload doesn't depend on some finicky state regarding an interrupted previous backup; it can just rescan the whole disk and skip uploading blobs that are already there, which feels much more robust to me.


FYI those are all functions of any deduplicating backup software, of which there are several out there, Restic being only one of them.


Yes, indeed; Restic just happened to be an easy-to-use deduplicating backup program that supports various cloud storage backends, including the one I happen to use (Backblaze). Not needing to patch together some sort of pipeline of dedup --> upload to remote storage is a nice selling point.


There's also Duplicacy, which I feel is a bit more mature than restic (I found it more performant, and it also has a web interface; what I miss is the mount capability that restic has, and retention policies are also easier to specify in restic), though it's paid software for non-personal usage.

But paid is good in that the author has less reason to move away from it. (As mentioned elsewhere, restic hasn't been updated in a while, but they recently changed their pricing to be much more aggressive, which I hated. I wish they'd give it a softer pricing model than cost per machine.)

https://duplicacy.com/

Their comparison of cloud storage providers gave me good insight into what to choose in terms of performance.

https://github.com/gilbertchen/cloud-storage-comparison/

I use both restic and Duplicacy, so my backups are done by multiple implementations toward multiple destinations, so as not to get bitten by one of their bugs and end up in the classic "backups aren't working when you need them the most" situation.


This looks great. I've been using Arq for years with no complaints, but I might switch.

I've also been using Syncthing for syncing machines. It's pretty great. I've got it set up so 2 laptops sync to a server, and no issues so far.


I'm a long time Arq user too. I looked at Restic but was a little disappointed it didn't support compression. There is an old, still open issue with a ton of comments: https://github.com/restic/restic/issues/21


Before you switch, be aware that Restic currently does not support compression at all.


Filippo Valsorda (Go's current crypto maintainer) took a little bit of time to look at the cryptography used by restic: https://blog.filippo.io/restic-cryptography/


Glacier/cold storage tiers on cloud providers, which usually take ~6h to retrieve files, are really cheap (I know Azure has $0.99/TB/month), but I have a hard time finding any free backup solution that supports the very high latency natively.

I remember reading a long time ago that Restic was not going to natively support cold storage solutions. Has anything changed?


You should not back up directly to Glacier; you should back up to S3 and move data from S3 to Glacier via a lifecycle policy configured on the S3 bucket.
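
For the record, that kind of lifecycle rule is set up roughly like this (the bucket name and the 30-day threshold are placeholders; see the replies below for why restoring through Glacier can still be painful):

    aws s3api put-bucket-lifecycle-configuration --bucket my-backup-bucket \
        --lifecycle-configuration '{"Rules": [{"ID": "to-glacier", "Status": "Enabled",
          "Filter": {"Prefix": ""}, "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}]}]}'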


And when I have to do incremental backups or retrieve a single file, do I have to move the whole backup from cold storage, and then back again?


I did not understand the "back again" part. Glacier allows you to retrieve individual files.


You should not do this, because getting your data back out of Glacier will be painful. You cannot use S3 tools to restore data from Glacier. I'm the author of HashBackup, and I implemented and then deprecated Glacier support because it is very difficult for anything other than WORM-type backups.


It's not painful, it's the way this specific product works.


There are a few downsides to using restic:

No support for compression yet

No support for deleting data from snapshots

No support for continuous backups (restic walks the directory tree for each backup).

No support for resilience from disk errors using par2 or similar

Backing up millions of small/empty files uses a lot of memory


Some more:

Not too many storage backends.

No GUI tool.

Hard to implement lifecycle policies like "keep the N last hourly backups and the M last weekly backups"; and not optimized for this use case (it needs quite a bit of data transfer to accomplish, due to the zero-knowledge server-side encryption).


Restic works very well for me, and I've successfully used it to restore lost files.

The one issue I've encountered with it: it uses a lot of memory, proportional to the size of the repository indexes, so if you're backing up a lot of data on a machine without a lot of RAM (such as a virtual server), you may run out of memory. Setting GOGC=20 can help slightly, but ultimately, restic needs fixing to support working on indexes larger than memory.


Borg Backup (for similar reasons as Restic) essentially shifted backup from my "I-don't-but-I-should" list to my "solved problems" list. Is there anything similar for binary distribution?

Here's what I mean. I develop a software/firmware stack that is typically delivered to users as a set of large (100M) tarballs and binary images. Even though the vast majority of the payload does not change from release to release, it has so far proven necessary to just distribute the complete final blobs.

Technologically, it's almost identical, but distribution implies a different set of access controls and an Internet-friendly user interface.

I see the idea floating around in e.g. NextCloud forums, but this seems like a relatively compact problem without obvious candidates to solve it.


I'm not entirely sure I understand what you're asking for, but it sounds like zsync (http://zsync.moria.org.uk/) is what you're looking for.


There are some programs that do this: https://github.com/systemd/casync . See the readme for similar tools.


Perhaps something like OSTree?


Thanks, OSTree sounds like a good place to start.


There are a few products in the autoupdater/patcher space that do something like this ("delta update" is the name of that feature).


I’ve had success with rsync and the “rsyncable” gzip flag in a similar situation


I use and like Restic but I wish I could easily have "write only" keys like I do with tarsnap so an intruder can't delete all my backups.

I heard there's a way to have an "append only" backup or something like that. Is it possible to still prune old backups from time to time?


Restic can operate over (among other methods) plain old SFTP[1]. Therefore it works perfectly, and always has, with an rsync.net account[2].

I personally find it reassuring that even though I might be creating and maintaining backups with sophisticated tools like borg or restic or duplicity or rclone ... at any time, and from any system I can grab those backups with dumb old SFTP.

[1] https://www.rsync.net/products/sftp.html

[2] https://forum.restic.net/t/restic-commands-for-rsync-net/216...
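
For anyone who hasn't tried it, the SFTP backend really is just a repository URL (user, host and path below are placeholders):

    restic -r sftp:user@host.example.com:/data/restic-repo init
    restic -r sftp:user@host.example.com:/data/restic-repo backup ~/Documents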


For a very good introductory read on what all this deduplicating backup software is about, how it is the "new school" compared to the "old school" of incremental backup software (where duplicity, which you might want to either also check out or migrate from, is the king) with its full/incremental model, and the advantages and disadvantages of the two, I recommend this surprisingly well written guide from Backblaze: https://www.backblaze.com/blog/backing-linux-backblaze-b2-du...


Is backing up still a thing? I prefer to keep my laptop stateless:

* I use Ansible to configure a vanilla Ubuntu 18.04 into my workstation [1]

* I keep everything that is "source code-y" in some Git repo

* I keep non-source-code-y stuff in Dropbox or Seafile (both have a restore to previous version)

I prefer everything else to be lost (e.g. some AWS, K8s credentials).

I wipe my laptop between customer projects and it works great.

[1] https://github.com/cristiklein/stateless-workstation-config


You're just outsourcing your backups, trusting github, Dropbox, etc. to have good backups.


My general rule is to be OK with any singular thing I use failing. This can be a hard drive or a company's sync product - mistakes happen even at companies. To do this I ensure for anything I care about, there are at least 2 copies that no single "thing" can touch both of.

So if some of your git repos are only hosted by a company and you have no local clones, that's a bad position to be in if the company terminates your account for whatever reason they might decide. But if you have local clones, it's fine. An ansible script can easily cover this (clone every repo you have).

If there are some files only in Dropbox, and you only have 1 computer sync'ing with Dropbox at a given time, all it takes is for Dropbox to screw something up and those files are gone. I wouldn't personally be OK with this. You might not care that much about what's in Dropbox though.

Beyond that, there are some files I don't want any service to have unless they are encrypted locally first. So I don't use Dropbox for that. Anything like that I keep in my regular documents folder. But those are mostly for my main "personal" PC, and don't need to sync those to things like laptops.

Note that recovering files in syncing services tends to have a limited time. So hopefully you notice before that time runs out. I have a friend that lost his military service documents while still using a file syncing service. We assume he accidentally deleted them years ago. Couldn't recover it.


I do this too, and am surprised not to see more recommendations for it. It's simpler than a traditional full-system backup, lessens security risks like copying credentials into a backup, and IMO keeps things better organized as well. I know Dropbox or git repos are just syncing the files I care about, in a way that I want them organized and can browse on a web interface, on mobile, or sync to another device even if it doesn't have all the same software installed, etc. Whereas with a full system backup, there's not always a clear line between system configuration and data files, so depending on what apps you're using, the data you're backing up might only be accessible again on an exact copy of the machine running all the same versions of the same apps, which is often not what I actually want.


I don't back up my machine state, just my home directory and other data.


Your laptop is not stateless, it is backed up.

What is the difference between running the Dropbox client and any other backup client?

Restic or Borg has fewer sharp edges than Dropbox does on Linux and is generally going to be less maintenance.

A stateless system is one that does not keep any (persistent) state.


I'd rather have some local backup of my photos, documents, conversation logs, emails than trusting Dropbox or Seafile to always be available to me. Even if I'm using a cloud backup it's still "a thing" to keep local backups, even if it's just for the speed of the restore.

A multi-layer backup strategy with local snapshots, TimeMachine, cloud backup seems like a more sensible approach.


If your git repo provider closes your account, would you lose your data?

You should probably be backing up monthly or quarterly so that you have a local copy of all those repositories.


A backup is a coherent copy of data on independent media. Seems that what you're doing is that.


That's pretty cool! I was thinking about this for a while, and just started converting my dotfiles repo to Ansible which would be a step forward.


That is not "stateless".


I moved from borg to restic because of the native support for B2 buckets (a lot cheaper than dedicated rsync/ssh type cloud file systems). My use case is backing up daily snapshots of my Onedrive (synced locally via rclone). As my important files are mostly immutable (photo library, pdfs), I don’t have to prune snapshots. Pruning is really slow compared to borg.
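
For reference, the B2 backend just needs the account credentials in the environment and a bucket-style repository URL (all names below are placeholders):

    export B2_ACCOUNT_ID="000111222333"
    export B2_ACCOUNT_KEY="yourApplicationKey"
    restic -r b2:my-bucket:onedrive-backup backup ~/OneDrive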

I never backup workstations or home directories, as I can always generate them again using NixOS and home manager. For source code, nothing beats git repos.


I've been using Duplicati, which has been doing a great job at keeping my backup sizes low and has a handy interface.

I've been keeping encrypted remote backups for ~300 GB worth of data, which occupies ~150 GB. On Backblaze B2, it's costing me ~$10 CAD/mo, which is a _lot_ cheaper than Tarsnap.

I might try Restic (always good to have a backup backup system), but I'm not sure how ergonomic it'd be on Windows.

https://www.duplicati.com/


My duplicati experience has not been great - I found that it couldn't recover well from backup corruption (multi-day recovery times for ~1TB of data when everything was local and USB-connected!).

The web interface is what keeps me on it.


Same here. Duplicati works great until you actually have to restore. It took me 3 hours to restore a 12GB backup. After that I quit using it.

Now I just made my home folder a shared folder on Syncthing, in send-only mode, so it constantly backs up to my NAS. Much faster, and the backup is directly readable.


Agreed, backup is slow (although recent versions are leveraging multi-core CPUs, so that's given a modest speed-up).

Syncthing in send-only is not a sufficient back-up strategy for 90% of use-cases, however. Without incremental back-ups, you're still susceptible to ransomware and data-loss should syncthing update after the event. I do, however, like to keep a "verbatim" copy in addition to my incrementals since they are less likely to have a "restoration" problem.


I've been using restic to backup my workstations for quite some time now and it works well enough.

At work I have a systemd service which runs every 10 minutes and uses an NFS repository. Performance is good. It has saved me once already, after a botched Ubuntu upgrade.

At home I have an identical service, but it runs every 6 hours and uses a Backblaze B2 repository. Performance is not great. However, I've been backing up ~20 GB for over a year now and it has cost me less than $2 _in total_, so I'd say it's worth it.
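
For anyone wanting to copy this setup, a minimal service/timer pair looks roughly like the following sketch (repository, password file and paths are placeholders; RESTIC_REPOSITORY and RESTIC_PASSWORD_FILE are restic's standard environment variables):

    # /etc/systemd/system/restic-backup.service
    [Unit]
    Description=restic backup

    [Service]
    Type=oneshot
    Environment=RESTIC_REPOSITORY=b2:my-bucket:host1
    Environment=RESTIC_PASSWORD_FILE=/etc/restic/password
    # add backend credentials as extra Environment= lines (e.g. B2_ACCOUNT_ID / B2_ACCOUNT_KEY)
    ExecStart=/usr/local/bin/restic backup /home /etc

    # /etc/systemd/system/restic-backup.timer
    [Unit]
    Description=Run restic backup every 6 hours

    [Timer]
    OnCalendar=*-*-* 00/6:00:00
    Persistent=true

    [Install]
    WantedBy=timers.target

Enable it with "systemctl enable --now restic-backup.timer".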


I recently set up Restic to back up my server to my local Synology, and it was actually surprisingly easy. Before that I was using Duplicity, which broke after a while, and it all felt a bit more tricky to get going.

Documented it a bit here, mostly so I can go back and copy / paste commands if I have to:

https://blog.notmyhostna.me/backup-cloud-server-to-synology-...


I've been looking for a solution for something like this and am not sure if Restic can do this, so please chime in.

Does anyone know of an open-source solution that acts just like Dropbox/GDrive where it detects for any changes in a specified folder and then once detected it automatically uploads to an S3 folder?


Syncthing and s3-fuse to create a local pipe to s3.

https://github.com/s3fs-fuse/s3fs-fuse


S3-Fuse has been really unreliable for me (hard-hangs, and similar) ... is that your experience as well, or has it been reliable?


I haven't used this combination, I haven't used FUSE in awhile, and I forgot to add a disclaimer. FUSE in general has been flaky for me, too. Syncthing has worked pretty well for me though, so I looked to see if Unix Philosophy™ could be used to smash a couple things together. I think that at the end of the day, they are both deceptively complicated technologies.



The biggest problem I've had with backup software is being able to verify backups when using AWS S3 Glacier, with and without a data pull. Is this possible and efficient with restic? The only software I've found that does this well is Arq.


I've been researching this use case and it sounds like Restic doesn't handle it well without adding more layers.

https://forum.restic.net/t/restic-and-s3-glacier-deep-archiv...

I've largely decided on Duplicacy over Restic, Borg or Duplicati.


I've been using this and it is the best I have found so far.

The only issue I encountered was with trying to back up 2.5TB of new data over a shitty German internet connection, which first took forever (restarting 20 times because they force your IP address to change once a day), and then, when trying to prune the backup, ran out of RAM and actually corrupted it.

I'll continue to use it, just not repeating those specific steps, but it was not immediately apparent that the backup was completely unusable. Always test your backups (regardless of whether it's about restic, borg, examplecorp expert enterprise backup, or something else)!


Restic has that option with “restic check”; I have it run weekly and email me the results.
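
Worth noting for anyone copying this pattern: if I recall the flags correctly, a plain "restic check" only verifies the repository structure, while adding --read-data (or a subset, as below) actually downloads and re-reads the pack files, which is slower and costs bandwidth but catches corrupted data:

    restic check --read-data-subset=1/10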


That sounds like a good system! I check every now and then by mounting and checking that a recent file is there, not sure why I didn't think of doing this instead.


I understand that Restic is using deduplication but what's the advantage over rsnapshot? There's AES encryption but how do you handle the credentials with automated backups?


I've been using rsnapshot for a decade or more. For a modest amount of data (a few TB) the performance is adequate, and it is easy to verify the backups. It is also easy to restore files, just go into the directory corresponding to the date of the backup. I could try to run compression in the filesystem itself to save space, but haven't done that so far.


What do you think about Restic vs Duplicati? Any personal experience?


I've been using Duplicati 2.0 beta (aka "stable") builds, not canary/testing, for 2+ years, and at least three times, on three different machines each with different backends (two using ZFS, scrubbed regularly with no data issues), I've encountered issues with database corruption which wasn't fixable with repair and rebuild. There are lots of things I really like about Duplicati (OSS, backend flexibility, dedup, encryption, active development, etc.), but I'm going to be moving to another solution for a little while until a super stable release is available. Restic was at the top of my list even before this most recent HN nod.


Check out Duplicacy. It seems very solid, none of the Duplicati like issues.


At least on Windows, the biggest difference is that Restic doesn't support VSS snapshots, so you can't use it to back up all files in a consistent manner. This makes it a lot less useful than other solutions on Windows.


Tangentially related, but I’ll lay out a problem I have for a small office I run. I want to back up and sync my files cheaply somewhere, say GDrive. I want a UI that gives me snapshots/versions of the files it backed up. Is there a good open source solution for this? Restic and friends are a little too technical for my non-technical office. Thanks


How does it compare to Rclone? https://rclone.org/


AFAIK, restic does incremental backups (diffs) and encryption. This is great if you need to go back in time, and it's safer if something, for example, corrupts your files (the backups don't get overwritten with corrupt versions). But it is more difficult to restore: the files are split into chunks for deduplication, so you are not able to actually see your backed up files; you need to run the restore process first. I am using rclone right now, and have it set up to archive old file versions (--backup-dir). It's simpler and covers my needs for now.


Actually, you can use rclone as a backend to send your blocks to wherever rclone supports. Quite nice.
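
For the curious, that just means pointing restic at an rclone remote as the repository (the remote and path names are placeholders, and rclone must already be configured):

    restic -r rclone:myremote:backups/host1 init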


I really would like to use Restic, but it seems that repos can randomly break, and it seems that this is still not resolved. https://forum.restic.net/t/recovery-options-for-damaged-repo...

Any news on this?


Solid piece of software, I use this every day!

One small issue which I need to figure out is the hosts directory being empty when using “restic mount”: https://github.com/restic/restic/issues/1869


But does it come with sharepoint integration? Because if you claim to do that right, then this is surely snake oil.


I put a tutorial together for this not too long ago. It's a very good tool, and great when paired with Minio/S3 - https://github.com/alexellis/restic-minio-civo-learn-guide


I recently set this up using Nginx as my restic backend server. Not as a proxy to a backend server, but the actual backend server. No other applications required:

https://www.grepular.com/Nginx_Restic_Backend


Fantastic software. Fast, locally encrypted before being shipped to the cloud (or a usb disk), just works.


I've been using it for about 6 months now, backing up my home directory 3-4 times per week to an external HD. Would also agree that it's fast, easy to use, and also easy to recover files. I'd add that the documentation is really well written.


I would argue that the 2nd most important part of a backup is the restoring. Many products don't focus on that enough because it's not the part that most customers will experience until there is an emergency.


From what I read, restic has memory scaling issues in certain areas (large numbers of small files, pruning operations).

That said, I do use it personally for private and small stuff.


I like HashBackup. It's not open source unfortunately, but I like the incremental encrypted backups while also being able to pull out a single file easily.


> Easy: Doing backups should be a frictionless process, otherwise you are tempted to skip it. Restic should be easy to configure and use, so that in the unlikely event of a data loss you can just restore it. Likewise, restoring data should not be complicated.

So does it have a GUI version?


Yeah, also interested in a GUI.


I was unable to find one so I guess it's just "easy" for a certain narrow group of people.


Learned about this program when pwning Registry on HackTheBox


Metadata?


Just so I understand right, and out of curiosity, what does this offer over the likes of Nextcloud (https://nextcloud.com/) or Duple (https://www.duple.io/en/)?

Since, if I understand correctly, both provide backup in addition to other functionality?


That's like asking what a markdown renderer offers that Wordpress doesn't.


Nextcloud is a different product. restic is for backup.


Could you elaborate? What more does it offer? Nextcloud and Duple do offer backup as well.


Nextcloud is a web app that lets you work with files and offers lots of other plugins too, like e-mail, calendaring, office, contacts. It's not a backup program. See https://nextcloud.com/ or https://en.wikipedia.org/wiki/Nextcloud


You might, for example, use restic or borg to back up your Nextcloud's installation and storage.


or, restic is your NAS backup while Nextcloud is your open source dropbox, roughly.



