Kopia is great, though it's worth noting for folks on Linux: non-UTF-8 paths aren't stored correctly [1] and xattrs aren't stored [2]. While most folks probably won't care about the former, the latter could cause issues (e.g., losing SELinux labels makes it difficult to restore a backup of the root filesystem on distros that use SELinux).
> losing SELinux labels makes it difficult to restore a backup of the root filesystem on distros that use SELinux
True, but one should get into the practice of creating persistent policies rather than doing ad-hoc `chcon`'s all the time, so that a `restorecon -FR /` always restores all SELinux labels to the intended state. Then the biggest hassle is just booting into an initramfs shell to mount the rootfs and `touch /.autorelabel`.
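For anyone unfamiliar, a rough sketch of that practice (the `semanage` type and path here are just illustrative, adapt to your distro; run as root):

```
# Make label changes persistent instead of using ad-hoc chcon:
semanage fcontext -a -t httpd_sys_content_t "/srv/www(/.*)?"
restorecon -FR /srv/www

# After restoring a root-filesystem backup, either relabel online...
restorecon -FR /
# ...or, if the system can't boot far enough for that, request a full
# relabel on the next boot from an initramfs/rescue shell:
touch /.autorelabel
```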
My backups include photos and files from the late nineties when I used DOS/Windows. At some point, their encoding got messed up and many file names are now not UTF-8.
I can’t be bothered to fix the names (it’s not realistically a problem), but if backup software can’t handle them, then it can’t handle backing up my data.
I wish I could upvote this more than once. Those are omissions that could lead to important information being lost in a way that you probably wouldn't notice when checking your backups.
> Yes, there's a whole bunch of things currently not captured at the filesystem level, including setuid/gid, hardlinks, mount points, sockets, xattr, ACLs, etc.
That was three years ago and it sounds like things are only slightly better.
These people are incompetent at making backup software.
We are trying to use it for large backups of a production system, and it has not been a completely smooth ride all along.
We have many files (millions) and lots of churn over ~80TB total.
Kopia has exhibited some issues:
- takes about 120GB (!) of RAM to perform regular maintenance and takes about 5 hours to do so. There are ideas floating around to cherry-pick the large inefficiencies in the GC code, but it’s yet to be worked on. I’ll try to have an internship accepted to work on this at my company.
- there’s good activity on the repository, but releases are not quick to come and PRs are not reviewed very quickly
- the local cache gets enormous, and if we try to cap it, we get huge download spikes (>10% of repo size) during maintenance. Same as above: the problem is acknowledged but yet to be solved
- the documentation is very S3-centric, and we discovered too late that tiered backup (long-term files go into cold storage) is only supported on S3, while we use Azure. We contributed a PR to implement it in June, yet to be merged (see point 2)
So, not too bad, especially for a small-ish project maintained mainly by one person (from the looks of my interactions on Slack and the commit log). The maintainer is easy to reach and will answer, but external PRs are slow. If I could use ZFS cheaply on Azure via S3, I’d use it over Kopia, but as of now, it works.
Well if you drop the Azure part (which really just means the Azure Storage S3 compatibility layer), that's a thing. Or at least some people were trying to make ZFS on object storage a thing. It'd be good as an offsite backup.
Not really dumb. I do use them too, but with Borgbackup on top (since they support it natively).
I found Borgmatic ( https://torsion.org/borgmatic/ ) to be the best way to run my backups. It takes care of everything from pruning to verifying checksums, and it integrates with some monitoring services (like Cronitor).
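For reference, the setup is basically a YAML config plus a scheduled run; a minimal sketch (the cron schedule and verbosity flag are just what I'd typically use, not anything borgmatic mandates):

```
# /etc/cron.d/borgmatic: nightly run; invoked with no action, borgmatic works
# through its configured create/prune/compact/check cycle using the settings
# in /etc/borgmatic/config.yaml.
0 3 * * * root borgmatic --verbosity 1
```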
I am using rsync to rsync.net from multiple hosts with different configurations. I run the same command on every host running variations of *nix, with no messing about with different tools needed.
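For what it's worth, the kind of invocation this boils down to (user, host, and paths are placeholders; the flags are plain rsync):

```
# Same command on every host; only the destination subdirectory differs.
rsync -az --delete --numeric-ids /home/ user@user.rsync.net:backups/$(hostname)/home/
```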
I need to store on Azure for DRP reasons: we would DRP to Azure and need the bandwidth.
Also, storing the ZFS snapshots on Blob storage would still require us to retrieve the entirety of the 80TB before being able to use it. I need native ZFS at Blobstore-competitive prices
I've been using Kopia for my personal use and for products I have helped build at a couple of enterprise backup companies! It's also used by other open-source backup projects that focus on specific ecosystems (Velero and Kanister for Kubernetes, Corso for Microsoft 365 backup).
I am obviously biased but it's pretty amazing. AMA.
Do you (sorry, but just checking) repeatedly test backups? E.g., pull monthly and bit-verify that they're correct? Are you aware of anyone testing in this way?
I can only compare it from a user-experience point of view. I tried Duplicati for my Windows laptop and was never quite happy. Kopia just worked from day one. The front end still has a few bugs here and there, particularly on Windows (Electron eating sockets, WebDAV mounts not always working); however, the backend seems very reliable (I only did one full restore, but I also have not seen any reports of problems).
It still has a lot of potential, IMHO. You'll find some hints in the docs on how to use it with AWS storage tiering, for example.
I haven't looked at Duplicati in a while, and it has evolved. While Duplicati's feature set looks similar now, I would need to benchmark both for efficiency and final backup sizes.
And, while not directly, I know a number of companies, including mine, do test restores all the time.
Not the previous poster, and I don't use Kopia, but after reading Kopia's features and docs, they seem to be on par. I use Duplicati quite extensively for personal backups, and didn't really have any issues.
Duplicati has a web interface, so with a proper authentication in place, you can use it to remotely monitor and manage backups.
Duplicati doesn't keep a local cache. It uses SQLite files for file metadata, but not for the content itself.
I like Duplicati's snapshotting mechanism. You can specify how long or how many snapshots to keep, and my anecdotal evidence is that it's archival-storage-friendly. I imagine S3 and its lifecycle management rules can make a decent and cost-effective backup solution.
I'm using the Google Drive 2TB plan, and I didn't see Kopia supporting Google Drive out of the box.
Hmm. I'm asking because I've had some trouble with Duplicati. I use it on a laptop, and it does not like being interrupted during a backup. It also doesn't fail the backup, which would be fine; instead, it gets jammed on files, particularly the large (multi-GB SQLite) file it generates to track state. It remains jammed even once the network is restored, and measures upload speeds at single-digit bytes per second. I end up having to force-kill it after multiple abort requests fail to stop the jammed backup, and there are multiple warnings that this can corrupt data / you shouldn't kill the process...
So anyway, I'm looking for alternatives.
Duplicati also, somewhat annoyingly, nails 100% CPU for a while during backup, which spins up the fans and gets my laptop very hot. I've been meaning to see if there's a simple way to modify the code to prevent this, but I'm very unfamiliar with C#.
Do yourself a favor and use ZFS as your primary backup; even though it means you'll have to replace your filesystem, it's just that good.
It's faster than any other backup software (being the filesystem itself, it already knows what's changed since the last snapshot, while external backup tools always have to scan entire directories to find what's changed), and it has battle-tested reliability with added benefits like transparent compression.
A bit of explanation of how much faster it can be than external tools. (I don't work for the service in the article or promote it.)
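To illustrate (dataset and host names here are made up): an incremental `zfs send` only walks the blocks that changed between two snapshots, so there is no directory scan at all.

```
zfs snapshot tank/data@2023-10-01
# ...a day of changes later...
zfs snapshot tank/data@2023-10-02
# Send only the blocks that differ between the two snapshots to a backup host:
zfs send -i tank/data@2023-10-01 tank/data@2023-10-02 | \
  ssh backup-host zfs receive -F backuppool/data
```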
The term "best" apparently means reliable for backup and also they don't start choking on large data sets taking huge amount of memories and roundtrip times.
They don't work against your favorite S3-compatible targets, but there are services that can be targeted by those tools, or you can just roll your own dedicated $5 Linux backup instance to avoid crying in the future.
With those 2, I don't care what other tools exist anymore.
Bupstash (https://bupstash.io/) beats Borg and Kopia in my tests (see https://masysma.net/37/backup_tests_borg_bupstash_kopia.xhtm...). It is a modern take very close to what Borg offers regarding the feature set but has a significantly better performance (in terms of resource use for running tasks, the backups were slightly larger than Borg's in my tests).
Lacks features like mounting a filesystem as read-only to restore though. I find that makes restoring files much simpler. I'll look into bupstash some more perhaps but right now am very happy with Vorta/Borg.
Personally I use borg with BorgTUI (https://github.com/dpbriggs/borgtui) to schedule backups and manage sources/repositories. I'm quite pleased with the simplicity of it compared to some of the other solutions.
I can't give you a meaningful comparison between all of those, because I haven't used all of them, but I can say that I've been pretty happy with Restic in the time I've been using it.
Do you have any odd requirements that one might serve better than the rest? If you just want bog-standard backups, any of them will probably do.
Do you know how Bacula compares to Bareos? Bacula is on my to-do list to look at (also because I need tape backups), but the Bareos fork seems to have a more modern interface - but I've not stress tested either solution. The fact that Bacula has a Debian package and Bareos does not pretty much settles it already, but just curious if someone has actually tried both.
I found Restic slower in general, though Kopia is not super fast either when you have many backup sources (in my case I have 30+ separate directories I am backing up; Kopia works amazingly fast per single directory but make them 30+ and it's not as fast because it's basically going through them one by one and it's NOT going through them in one pass, sadly).
Used it for a while, recently tried to restore some things and it failed, taking a really long time to restore some snapshots compared to other things I've tried. Switched to restic instead. Really like what kopia is but I'll wait a few more years before considering it for something, but right now I'm happy with restic.
Too bad no one besides Kopia does ECC, which is the reason I switched, but when I checked why restic didn't do it, it was because they saw what other people did, and apparently it's way too easy to make a bad implementation.
I tried to restore a ~200 GB file (stored remotely on a Hetzner Storage Box), and it failed (or at least did not finish after being left for ~20 hours; there was also no progress indicator or status I could find in the UI).
I also tried to restore a folder with about ~32 GB of data in it, and that also failed (the UI did report an error, but I don't recall it being useful).
Also, in general use, the UI would get disconnected from the repository every few days, and sometimes the backup overview list would show folders as being size 0 (which maybe indicated they failed; they showed up with an "incomplete" [or similar] tag in the UI).
yeah, I had some weirdness with the UI and disconnects as well. My takeaway from trying it was that I wouldn't want to use it for something if I need peace of mind for my data.
Just for fun, since I still had it installed and haven't gotten around to cleaning up the remote data, I updated to latest (v0.14.1) and tried the restore tasks again.
Both the single file restore and the folder restore worked, though the single file restore still didn't have any progress indicator I could see.
Looking through the changelog, nothing really stood out to me as something which would have fixed this. Not really sure what went wrong the first time around, perhaps it was network issues with Hetzner?
Any comparisons to Restic? Looks like basically the same thing but with a GUI available.
Edit: Found this very ad-hoc "benchmark" from over a year ago claiming that Kopia managed significantly better deduplication than Restic after several backups (what took Restic 2.2GB, Kopia did in <700MB). No idea if the advantage falls off outside of this particular benchmark, but if it doesn't then that's a pretty big improvement. https://github.com/kopia/kopia/issues/1809#issuecomment-1067...
Edit Edit: Never mind, this benchmark was from before Restic supported compression, which is why its size is so much larger. Feels like that should have been mentioned.
Restic only (relatively) recently implemented compression so I'd run my own tests before making a decision. I'm currently thinking about migrating off borg to restic and disk space usage is very similar based on my testing.
UX is very important for backups (considering that users can be very lazy, so the less friction, the better), so a GUI is an important component; those who don't care about UX surely have infinite ways to perform their backups.
My dealbreaker with Restic was near-realtime backup (the discussion has been open for 5 years now: https://forum.restic.net/t/continuous-backup/593); this is also a UX problem. I haven't checked if Kopia supports it (or has better support, anyway), though.
> ... was from before Restic supported compression, which is the why its size is so much larger
deduplication != compression...
So we don't actually know which has better deduplication? Compression algorithms are well-established and you can find a million people who benchmarked them all for different purposes, but deduplication algorithms I never saw a comparison of. I don't even know if these things have proper names or if people just refer to them as "rsync-like"
I just switched over to Kopia as I was not entirely happy with restic, and so far one feature that actually took me by surprise is that Kopia supports reading .gitignore files! So you can tell it to back up your projects folder and it will automatically respect whatever each project has in its gitignore. Now, that may be possible with restic, but I spent way more time trying to configure scripts and YAML files than actually running backups there... The GUI does help!
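If I remember right, the CLI equivalent is a per-directory policy telling Kopia to honor extra ignore files; something along these lines (treat the exact flag as an assumption and check `kopia policy set --help`; the path is a placeholder):

```
# Ask Kopia to treat each directory's .gitignore as an additional ignore file
# (on top of its own .kopiaignore); flag name recalled from memory, verify it.
kopia policy set /home/me/projects --add-dot-ignore .gitignore
kopia snapshot create /home/me/projects
```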
I've been using Kopia for about a year to backup sensitive data multiple times a day into an off-site encrypted storage and it's worked like a charm so far.
The presence of a WebUI is so nice compared to CLI-only tools.
I have also used it for a year or so and restored some files individually without any problem. I use remote SSH storage through Tailscale; it's been very stable, interestingly! The only small problem was that my setup broke during an update, because I had created a systemd service to start it and the parameters needed to start it changed. Apart from that, very stable for now.
I use Restic for personal backups and at work, but I thought I'd try Kopia to see if it'd be a good fit for my less techie relatives. This was about a year ago.
The UI didn't seem like a good fit for those who are less technical. I don't remember the specifics.
Does anyone have recommendations for backup services for the average user?
I used to use Restic many years ago, when it wasn't clear what the future direction of Borg (a.k.a Attic) would be since development had mostly stalled. At the time it was missing some features and Borg managed to fix its issues quickly enough that I ended up using that.
These days, from the minimal time I've spent looking at this, it seems that Borg and Restic offer basically the same feature set with similar performance. I'm curious if you (or anyone else) considered Borg and what set Restic apart for you.
Ditto for Kopia, I guess. I've never heard of it before.
> I thought I'd try Kopia [for] my less techie relatives. This was about a year ago.
> The UI didn't seem like a good fit
Didn't seem like, or wasn't based on your trying it? Did you end up trying it? If not, what do your relatives use currently, is that better (even if not ideal, since you're still asking for recommendations)?
Kopia also means a copy in Polish and the author is Polish. The first paragraph in the software's Github page also confirms the Polish origin of the name: https://github.com/kopia/kopia/
Tangentially, as far as Polish OSS names go, kopia is pretty tame. A popular duplicate-finder app is named czkawka (hiccup). Now that choice is just mean towards non-Polish speakers. :)
Oh my, this is a fantastic view of life - from the czkawka github page:
Czkawka is a Polish word which means hiccup.
I chose this name because I wanted to hear people speaking other languages pronounce it, so feel free to spell it the way you want.
This name is not as bad as it seems, because I was also thinking about using words like żółć, gżegżółka or żołądź
To be more precise it means "copy" in the noun sense, not the verb. It's a distinction that a lot of bad translations from English fail to make, like when X changed "tweet" to "post".
Where does one even acquire a VPS that makes this worth it? Most VPS pricing I've looked at is significantly more expensive than something like Backblaze or IDrive. So what even is the point of rolling your own backups if you can't get cheap terabytes in the cloud? And no, I'm not going to consider something like S3, because Amazon's pricing is obnoxious and confusing. Edit: $70/month for 3TB of S3. Significantly more expensive than all of the managed SaaS backup providers.
AFAIK Kopia has an S3 backend, so it can back up to IDrive E2. That said, I have a VPS from GreenCloud with 2 TB (and 4 cores) for $80 a year, which is very price-competitive. There are actually lots of smaller VPS providers that offer cheap storage VPSes. Lowendtalk.com is a good place to find out about offers, particularly around Black Friday.
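Connecting Kopia to an S3-compatible endpoint is roughly this (endpoint, bucket, and credentials below are placeholders, not IDrive's actual values):

```
kopia repository create s3 \
  --bucket=my-backups \
  --endpoint=s3.example-provider.com \
  --access-key=AKIA-EXAMPLE \
  --secret-access-key=EXAMPLE-SECRET \
  --password=repo-encryption-passphrase
kopia snapshot create /data
```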
Borg 2 has been in development for nearly a year and a half [1] and will probably be released early next year, i.e., early 2024 (just a guess, seeing that even RC1 is not yet released and there seems to be a lot of work to be done).
Does anyone know how Borg 1.x and 2 would compare to Kopia?
Borg 2 is usable in my experience. It’s a breaking change, so the developer seems to be mulling over whether to release what is available or add in more breaking changes at this point.
I tried Borg and liked it, but I couldn't see how to purge 'file.big' out of prior backups, i.e., one you knew for sure you didn't need any more and whose removal would recover disk space.
Love the asciinema demo. More projects should use it. Sometimes I have the urge to make a pull request on Github projects without any screenshot or video.
> Kopia does not currently support cloud storage that provides delayed access to your files – namely, archive storage such as Amazon Glacier Deep Archive. Do not try it; things will break.
Sigh… and unfortunately all too common for there to be no cold storage support.
I have been using Kopia since June 2021, when I first posted it on HN [1]
My use cases are very basic, but it just works all the time. My total backup is roughly 1TB with more than a million files at multiple locations: local, remote SFTP, and Amazon S3.
I like being able to extract selected data (directory/file) from a snapshot without restoring the whole snapshot.
De-duplication is icing on top.
The developer Jarek is very responsive on Kopia Slack as well.
Object lock support on S3-compatible storage does it for me. That was my only concern with restic, in the event of ransomware and compromised bucket credentials.
https://kopia.io/docs/advanced/ransomware-protection/
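The gist, as I understand that page (flags paraphrased from the ransomware-protection doc, so double-check against the current docs), is that you create the repository with an object-lock retention mode so blobs can't be deleted or overwritten before the retention period expires, even by someone holding the bucket credentials:

```
# Bucket must be created with Object Lock (and versioning) enabled; values are examples.
kopia repository create s3 \
  --bucket=locked-backups \
  --endpoint=s3.example.com \
  --access-key=AKIA-EXAMPLE \
  --secret-access-key=EXAMPLE-SECRET \
  --retention-mode=COMPLIANCE \
  --retention-period=720h
```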
No. Didn't seem to be amenable to setting it via the GUI, and all in all just a lot of frigging around. Haven't personally had to use Windows for a decade or so, so argh.
Uranium Backup charges extra for Windows shadow copy support, so I always assumed it was tricky to do. Kopia seems to do it with a “policy” that runs a little PowerShell script before and after entering a directory. Does it work without problems? (I just need to try it myself!)
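The mechanism, as far as I can tell from the docs, is Kopia's per-directory "actions": before/after hooks that people point at a shadow-copy PowerShell script on Windows. Roughly (the script names are placeholders, not something Kopia ships, and actions may need to be explicitly enabled when connecting the repository):

```
kopia policy set "C:\Users\me" --before-folder-action "powershell -WindowStyle Hidden -File C:\scripts\vss-create.ps1" --after-folder-action "powershell -WindowStyle Hidden -File C:\scripts\vss-cleanup.ps1"
```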
I like bupstash but it doesn't compress well. Admittedly I haven't tried it in several months but Borg absolutely eats it for breakfast with zstd compression.
Very different products and can't compare them. Veeam is enterprise-grade and used for a larger variety of mission-critical workloads. Kopia is meant for end-user backups (though folks use it for a bunch of other things too).
rclone is way underrated and should be used by a lot more people. But the various UIs are still a bit technical. Though if I were writing backup software nowadays, I'd probably write a skin over rclone.
The state of backup tech is surprisingly bad, and runs OS deep. Even with modern solutions like restic, you have no guarantee of a consistent backup[0] unless you pair it with a filesystem that supports snapshots like btrfs, ZFS, etc, which basically no one other than power users are even aware of. Interestingly, Windows ships snapshot support by default[1], but you need admin privileges to use it.
Also, it's unclear to me what happens if you attempt a snapshot in the middle of something like a database transaction or even a basic file write. Seems likely that the snapshot would still be corrupted. So for databases you're stuck using db-specific methods like pg_dump.
All this complexity makes it very difficult to make self-hosting realistic and safe by default for non-experts, which is the problem I'm having.
The advantage and disadvantage of Linux simultaneously is that you need to pick such a filesystem or work around the limitations, but it's your choice. The OS underneath really should not be responsible. Apple solves it with APFS snapshots and Microsoft has Volume Shadow Copy (which requires NTFS or ReFS).
I personally use compose for all my services now and back up my compose.yaml by stopping the entire stack and running a restic container that mounts all volumes in the compose.yaml.[1] It's not zero downtime, but it's good enough, and it's extremely portable since it can restore itself.
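Concretely, the loop is something like this (volume names, repository path, and password handling are placeholders; a sketch of the approach rather than my exact script):

```
#!/bin/sh
# Stop the stack so files are quiescent, back up the named volumes with a
# throwaway restic container, then bring the stack back up.
docker compose down
docker run --rm \
  -v myapp_db:/data/myapp_db:ro \
  -v myapp_uploads:/data/myapp_uploads:ro \
  -v "$PWD/compose.yaml:/data/compose.yaml:ro" \
  -v /srv/restic-repo:/repo \
  -e RESTIC_REPOSITORY=/repo \
  -e RESTIC_PASSWORD=change-me \
  restic/restic backup /data
docker compose up -d
```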
Unfortunately, this is not true. You need to grab all the DB files (WAL, etc.) in a consistent manner. You can't grab them while writes are in progress. There are ways, though. Look at what Kanister does with its recipes to get consistent DB snapshots to get a sense of the complexities involved in doing it "right."
> Unfortunately, this is not true. You need to grab all the DB files (WAL, etc.) in a consistent manner. You can't grab them while writes are in progress.
Perhaps you could be more specific, because the former is exactly what a filesystem snapshot is meant to do, and the latter is exactly what an ACID database is meant to allow assuming the former.
> Look at what Kanister does with its recipes to get consistent DB snapshots
I looked at a few examples and they mostly seemed to involve running the usual database dump commands.
FreeBSD had a pretty decent option in the base system two decades ago - FFS snapshots and a stock backup tool that would use them automatically with minimal effort, dump(8). Just chuck `-L` at it and your backups are consistent.
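For the curious, the whole thing was about one line (the output path is a placeholder):

```
# -L takes a filesystem snapshot first, so the dump is consistent even on a live system.
dump -0uaL -f /backup/root.dump /
```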
Now of course it's all about ZFS, so there's at least snapshots paired with replication - but the story for anything else is still pretty bad, with you having to put all the fiddly pieces together. I'm sure some people taught their backup tool about their special named backup snapshots sprinkled about in `.zfs/snapshot` directories, but given the fiddly nature of it I'm also sure most people just ended up YOLOing raw directories, temporal-smearing be damned.
I know I did!
I finally got around to fixing that last year with zfsnapr[1]. `zfsnapr mount /mnt/backup` and there's a snapshot of the system - all datasets, mounted recursively - ready for whatever the backup tool of the year is.
I'm kind of disappointed in mentioning it over on the Practical ZFS forum that the response was not "why didn't you just use <existing solution everyone uses>", but "I can see why that might be useful".
Well, yes, it makes backups actually work.
> Also, it's unclear to me what happens if you attempt a snapshot in the middle of something like a database transaction or even a basic file write. Seems likely that the snapshot would still be corrupted
A snapshot is a point-in-time image of the filesystem at a given point. Any ACID database worth the name will roll back the in-flight transaction just like they would if you issued it a `kill -9`.
For other file writes, that's really down to whether or not such interruptions were considered by the writer. You may well have half-written files in your snapshot, with the file contents as they were in between two write() calls. Ideally this will only be in the form of temporary files, prior to their rename() over the data they're replacing.
For everything else - well, you have more than one snapshot backed up, right?
> Also, it's unclear to me what happens if you attempt a snapshot in the middle of something like a database transaction or even a basic file write. Seems likely that the snapshot would still be corrupted.
You just quiesce the database first. Any decent backup engine has support to talk to a DB and pause / flush everything.
Depends on your database size, type, change rate, etc. Dumping the database to a file is fine for toy and small cases, but not for a 1+TB store that's under heavy writes.
I know that Tape Backups are not hip and sexy, but CloudNordic showed us just last month why they still matter even in 2023 and beyond, so you'd definitely want to look at an additional solution for your large servers, with a proper rotation/retention strategy (e.g., GFS). You _need_ offline backups, if you think you don't, you just got lucky for now - or have data that can be recreated from other sources.
For an online hot/warm solution, I'd use sending ZFS Snapshots into a backup server to then compress and encrypt them there, though keep in mind that for running systems, it may still not be enough (e.g., backing up a running Postgresql server through a file system snapshot may not be enough - there's an entire section in the documentation about backup options).
That said, it's good to have more options, and you really want to use something for your personal stuff as well, so the more options there are, and the more user-friendly/turnkey they are, the better!
Just be aware that backup solutions in a corporate/network environment are more complicated than just copying some files across. And also remember: Good companies test their backups - but great companies test their restores.
> backing up a running Postgresql server through a file system snapshot may not be enough - there's an entire section in the documentation about backup options
> An alternative file-system backup approach is to make a “consistent snapshot” of the data directory, if the file system supports that functionality (and you are willing to trust that it is implemented correctly). The typical procedure is to make a “frozen snapshot” of the volume containing the database, then copy the whole data directory (not just parts, see above) from the snapshot to a backup device, then release the frozen snapshot. This will work even while the database server is running. However, a backup created in this way saves the database files in a state as if the database server was not properly shut down; therefore, when you start the database server on the backed-up data, it will think the previous server instance crashed and will replay the WAL log. This is not a problem; just be aware of it (and be sure to include the WAL files in your backup). You can perform a CHECKPOINT before taking the snapshot to reduce recovery time.
Yeah, I think I really didn't like the "The server thinks it's crashed and will run a recovery" part, but they do go out of their way to call out "It's fine, just make sure the WAL is also backed up".
Can't even claim that this is a recent addition, since it's been documented like that since PostgreSQL 8, which was released in 2005.
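For concreteness, the procedure the docs describe boils down to something like this (the dataset name is made up; a sketch, not a vetted recipe, and the WAL files must live on the same snapshotted volume or be captured separately):

```
# Optional: checkpoint right before the snapshot to shorten WAL replay on restore.
psql -U postgres -c 'CHECKPOINT;'
# Atomic point-in-time snapshot of the volume holding the data directory (incl. WAL):
zfs snapshot tank/pgdata@nightly-$(date +%F)
# Back up the snapshot contents with whatever tool you like, then destroy the snapshot.
```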
[1] https://github.com/kopia/kopia/issues/1764
[2] https://github.com/kopia/kopia/issues/544