Ask HN: Best Centralized Backup Solution
133 points by bkgh on Dec 16, 2018 | 67 comments
What is the best free software for managing and scheduling backups of folders and databases across many clients? I want a centralized system that has fine-grained scheduling and can show the backup status of folders and databases on different clients. Ideally it would have plugins to support backups using different tools such as PostgreSQL (pg_dump), MySQL, Redis, SSH, and so on. Backups should be storable in different places such as S3, Dropbox, and local folders.

ZFS snapshots!... I _wanted_ to use "rsync.net" for this. They use FreeBSD with ZFS snapshots in a jail. The idea is that you get your stuff onto the account however you like over SSH (e.g. rsync), and then the snapshots serve two purposes: snapshots through time, and read-only access in case you get 0wned.

I couldn't use rsync.net though, because all its datacenters are outside of the EEA. I'm currently looking into doing this myself on a simple VPS; ZFS on Linux has matured quite well, it seems. I'm a bit new to doing chroot jails on Linux, but the ZFS snapshot part is very easy if that's enough for you.

    apt install zfsutils-linux
It's pretty trivial to make a pool, put it in a user directory, and then make snapshots. You could easily write a script to schedule the snapshots, or there are at least two tools already around to schedule this for you via either cron or systemd timers: zfsnap or zfs-auto-snapshot respectively.
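
As a rough illustration of the "script to schedule the snapshots" idea (this is not how zfsnap or zfs-auto-snapshot work internally), the sketch below only plans the `zfs snapshot` and `zfs destroy` command lines for one run and executes nothing. The `auto-` prefix, the dataset name, and the function names are assumptions made up for the example; the list of existing snapshots is assumed to come from something like `zfs list -H -t snapshot -o name` for a single dataset.

```python
from datetime import datetime, timezone

PREFIX = "auto-"  # hypothetical prefix marking snapshots created by this script


def snapshot_name(dataset: str, now: datetime) -> str:
    """Return a lexicographically sortable name, e.g. tank/backups@auto-20181216T030000Z."""
    return f"{dataset}@{PREFIX}{now.strftime('%Y%m%dT%H%M%SZ')}"


def snapshots_to_destroy(existing: list, keep: int) -> list:
    """Return script-created snapshots that fall outside the newest `keep`."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    # Ignore snapshots that were not created by this script (no PREFIX).
    ours = sorted(s for s in existing if s.split("@", 1)[-1].startswith(PREFIX))
    return ours[:-keep]


def plan(dataset: str, existing: list, keep: int, now: datetime) -> list:
    """Build the zfs command lines for one scheduled run; nothing is executed here."""
    new = snapshot_name(dataset, now)
    commands = [["zfs", "snapshot", new]]
    for old in snapshots_to_destroy(existing + [new], keep):
        commands.append(["zfs", "destroy", old])
    return commands
```

A cron job or systemd timer could call `plan(...)` with `datetime.now(timezone.utc)` and pass each command to `subprocess.run`; keeping planning separate from execution makes the retention logic easy to check before anything is destroyed.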

RE databases: with some extra work you could also use ZFS on the source server and take a snapshot of the database (once you invoke the correct commands to lock it) rather than do a dump. This would be very fast because it avoids duplicating the data in a dump, and therefore could be done much more frequently. You do, however, have the additional complexity of then syncing the snapshot to another server's ZFS pool; there are tools for this, but I haven't bothered going this far.

Yes, this is the part of our ZFS snapshots that many people don't appreciate: they are immutable from your perspective. If an attacker gains access to your rsync.net credentials and destroys your local copy and your remote copy, you still have seven days (or more) to discover this and access your historical snapshots, which are online, live, and browsable.

> I couldn't use rsync.net though, because all its datacenters are outside of the EEA

Honestly curious, what's wrong with Switzerland? It's not technically in the EEA but it's part of EFTA and the single-market so has to meet the same requirements as an EEA country.

You are absolutely correct, this is essentially for meeting GDPR requirements and Switzerland (as far as I can tell) is under the same legal requirements in that regard despite not being in the EEA.

The problem is our customers are not end-users, they have their own policies and a significant number of them have a much more rigid interpretation of GDPR data storage rules which we've been forced to integrate into our own. This is one of the many problems of overreach due to the fear that GDPR creates IMO.

Note that chroot is not a security feature, it can be compromised in many ways. You probably need containers like lxc/lxd

I thought it was reasonably secure (consider this would only include minimal binaries required for rsync without an interactive shell), but I am relatively ignorant in this area, would you care to elaborate?

LXC/D was on my radar but all of this is a little more complexity than I was anticipating dealing with myself (the security side of locking down the rsync account at least). I may even settle for "restricted rsync". This is why I value rsync.net, they do this for me - and for a much lower cost I might add (I am not a security expert).

Indeed. For clarification, FreeBSD jails are considered safe, since they add capabilities checks on top of the chroot. Chroot itself, on the other hand, is relatively easy to break out of.

Not sure if you have ever used it, but you might enjoy FreeBSD :) Using jails is not at all that difficult. To be honest I find it easier to use jails than Docker. I've never used LXC so I don't know how it compares with regards to ease of use.

Thanks. I've used FreeBSD before, but have far more experience using Linux in production, I may have to give it another go some time for specific purposes like this.

BackupPC is the best one: https://backuppc.github.io/backuppc/

You get all the professional features out of the box (full/incremental backups, deduplication, compression...) and everything is automated.

Wow. Open source, free, compression & deduplication, web interface, no client software needed, actively developed... WHY DIDN'T I KNOW ABOUT THIS BEFORE?! I need to try it out!

I have used BackupPC for quite a while. It works great but doesn't scale very well when you get to TBs upon TBs of data and 100s of millions of files. I ended up having to run multiple servers.

If you want something along these lines that scales better, rsnapshot works wonders. Rsnapshot is much simpler: it doesn't create a large pool of files to compare and de-dupe against like BackupPC does; it relies on rsync's built-in deduplication features instead. Technically, BackupPC uses less storage space, so if that's the concern, use it. But rsnapshot is my tool of choice.

EDIT: I will mention that my experience is based on dealing with very large sets of data across numerous backup targets, and it dates from at least 4 years ago. I have not tried any of the latest updates that may make this solution or its performance better. Most people in my position probably would have had the funds to leverage an enterprise vendor solution at that scale. For what BackupPC does, it was pretty amazing to work with and rock solid in terms of functionality at smaller scales.
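
For anyone wondering how a pool-less approach can still save space: rsnapshot-style snapshots share unchanged files between snapshot directories through hard links. The sketch below is only a toy illustration of that idea, not rsnapshot's or rsync's actual code; the function name `take_snapshot` and the directory layout are made up for the example, and it ignores symlinks, permissions, and deletions of whole directories.

```python
import filecmp
import os
import shutil
from pathlib import Path
from typing import Optional


def take_snapshot(source: Path, previous: Optional[Path], target: Path) -> None:
    """Copy `source` into `target`, hard-linking files unchanged since `previous`."""
    for root, _dirs, files in os.walk(source):
        rel = Path(root).relative_to(source)
        (target / rel).mkdir(parents=True, exist_ok=True)
        for name in files:
            src = Path(root) / name
            dst = target / rel / name
            old = previous / rel / name if previous is not None else None
            if old is not None and old.is_file() and filecmp.cmp(src, old, shallow=False):
                os.link(old, dst)       # unchanged: both snapshots share one inode
            else:
                shutil.copy2(src, dst)  # new or changed: store a fresh copy
```

Each snapshot directory then looks like a full copy, but only new or changed files take up additional disk space. The deduplication is per whole file and per path, which is why a shared pool like BackupPC's can still end up smaller.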

No idea why this was downvoted. I don't have recent experience setting up something like this but 10 years ago BackupPC was the answer, and I guess it is still good.

Does anyone have a different experience?

I used BackupPC at a place, ohh, 8 or 9 years ago now, and it was pretty rock solid then (90/10 Mac/PC clients), backing up to a Linux server with some AppleScript to manage keys and such.

> No idea why this was downvoted


How can you downvote a reply?

You need a minimum karma, 500, I believe.

It's 501 for some reason.

No client software, so I would have this installed on a VM on our server and it just goes to crawl and pull files from all the PCs?

Then does the hard part with diffs and etc?

Isn’t this already built in to Windows and I assume Linux(es)?

(I apologize for my ignorance here, not a sysAdmin)

It’s been around since 2001. No idea why it isn’t more popular. (Haven’t tried it myself.)

BackupPC works really well; the drawback is that it is kind of slow if you have lots (millions) of files.

To be fair that's because it uses rsync, and rsync is itself very slow when dealing with millions of small files.

I think you're looking more for the "scheduling, managing pull from different clients" than the ability to backup to S3, Dropbox, local folder etc.

There are plenty of software projects to choose from for the latter. Borgbackup, Restic, Plain old NFSv3 and rsync, Tarsnap, Backblaze B2 etc.

I simply use a combination of Cron, rsync and Ansible to make backups to our central NAS, and that central NAS is mirrored to our off site NAS (with extensive snapshots and tape backups).

I had to use Netbackup for the Tape system which does everything you listed - managing and scheduling backup procedure in "clients" and does fine-grained schedule, show backup status etc. But, I HATE that thing with all my gut. It is one complicated pile of crap that I have to fight all the time to make it write to the damn tapes. Nothing like plain old simple Unix tools.

And if rsync doesn't quite work for what you need, or for a target you want to use, rclone is always an option as well.

Yes, but I have used rclone via Restic, which encompasses more functionality, I think: compression, deduplication, encryption, etc.

Self promotion ahead: I worked on a SaaS product to back up databases to S3, called DBacked. After some time I decided to open-source most of it. It's not made to back up files, only databases. You can look at the open-source project here: https://github.com/dbacked/agent and the SaaS product here: https://dbacked.com/

Looks cool!

Though for DBs I like to have near-realtime backups, like barman ( https://www.pgbarman.org ) does for PostgreSQL or Ottomatik ( https://ottomatik.io/ ) for MySQL. Both work on the same principle of taking a full backup (pgdump, mysqldump, etc.) and then streaming logs (WAL in the case of PostgreSQL and binlogs in the case of MySQL). These logs allow you to roll your full backup forward to a certain point in time. Since the logs are streamed in real time, you have a near-realtime backup with point-in-time recovery.
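
To make the "roll forward" idea concrete, here is a deliberately tiny model of point-in-time recovery. It is not how barman, PostgreSQL, or MySQL implement it: the database is assumed to be a plain key/value dict, and `LogRecord` and `restore` are names invented for this sketch. It only shows the principle of full backup plus replaying logged changes up to a chosen moment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    timestamp: float  # when the change was committed
    key: str
    value: object     # None stands for a delete in this toy model


def restore(base: dict, base_time: float, log: list, target_time: float) -> dict:
    """Rebuild the state at `target_time` from a full backup plus streamed log records."""
    if target_time < base_time:
        raise ValueError("target predates the full backup; use an older backup")
    state = dict(base)
    for rec in sorted(log, key=lambda r: r.timestamp):
        if rec.timestamp <= base_time:
            continue  # change is already contained in the full backup
        if rec.timestamp > target_time:
            break     # stop replaying at the recovery target
        if rec.value is None:
            state.pop(rec.key, None)
        else:
            state[rec.key] = rec.value
    return state
```

Because the log is shipped continuously, the most you can lose is whatever had not been streamed yet, and you can stop the replay just before a bad change (say, an accidental delete) instead of only restoring to the time of the last full backup.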

I don't know of an open source alternative for ottomatik BTW. So if anyone knows one...

I like https://www.urbackup.org for Windows boxes. Does incremental file and image backups. Next system for the Linux world will most likely be https://restic.net with a rclone backend that saves to B2. Idea is to have an append-only service that is safe from ransomware deleting backups for extortion purposes.

I recommend everyone to read The Tao of Backup. It's a lovely but smart writeup concerning seven most crucial aspects of backup.


I like using Syncthing to sync files to another (sometimes multiple) location and then use restic (https://restic.net/) :)

I tried Syncthing but it didn't seem very reliable; I had to restart it regularly to keep things in sync. In the end I gave up and moved back to Dropbox.

For the second part of your requirement - supporting different backup providers - I'm a huge fan of rclone: https://rclone.org

I like to rsync into a BTRFS volume, and then take a snapshot.

It's simple, fine-grained (you can use .rsyncignore files like .gitignore files) and versioned.

Why do you choose not to use Btrfs' send and receive commands? They do what rsync does, a bit more efficiently, unless I am missing something.


In addition to what has already been said: the last time I used btrfs in this manner, the protocol had issues with some extended attributes, which showed up as problems with binaries that rely on file capabilities[0]. Most notably, it made the iputils ping[1] utility unusable after branching off a previously made snapshot for further use. This does not appear to be common in other distributions, but on Arch the ping utility uses the CAP_NET_RAW capability to avoid the need for setuid, which appears to be more common elsewhere. Nonetheless, I think it is something one would want to know the consequences of, as I am sure there are quite a few more binaries with capabilities stored in their files' extended attributes, or that use extended attributes in another way, such as security labels for SELinux. I just rechecked whether this is still the case and found this patch[2], but I am not sure it has landed yet.

[0] https://linux.die.net/man/7/capabilities

[1] https://linux.die.net/man/8/ping

[2] https://patchwork.kernel.org/patch/10388011/

For my usage, I have BTRFS on my backup host, but not on all (or even any) of the machines I want to back up. They're variously macOS, Windows, Linux ext4, FreeBSD, and OpenBSD.

rsync has the upside of being universal.

btrfs send will only work on btrfs filesystems, and you probably also want to back up other stuff.

Go for the btrbk tool - automates snapshots, and sends / receives them incrementally via SSH to a Server, with pretty fine-grained retention settings. Awesome.

There are a lot of good descriptions in this book:

Backup & Recovery - Inexpensive Backup Solutions for Open Systems


It's from 2009, but going through the comments here it seems not much has changed. It will help you choose the right tools, and with many of the open source ones, configure them.

I personally use Bacula: https://www.baculasystems.com/

But in truth, it's a bit complex to set up. You have 3 daemons (agent, director and storage), each with its own configuration.

These configuration files must match exactly (a bit like a Nagios configuration) (http://www.bacula.org/2.4.x-manuals/en/main/Conf-Diagram.png).

Adding SSL is also a bit annoying, as there are many moving parts.

But once set up, it works quite well and bconsole is nice.

It's also quite easy to install since it's already available in most distributions.

You can set backup policies on a per-job basis and do full, incremental and differential backups. You can also set pre- and post-backup scripts, and do the same for restorations. It is quite comprehensive, albeit a bit complex.

At home, I use it with an old LTO-1 tape recorder/bank (IBM 3581-H17) repaired with pieces of bicycle tube (I find these kinds of robots fascinating, a bit anachronistic in our age, but still relevant).

In the end, I'm not sure if I would recommend it, but it's definitely a viable option.

I've been playing with Keybase for backups. It's neat because you don't need to deal with any proprietary APIs or scripts (but you do have to install Keybase). The backups are encrypted and can be accessed from another device as well either through your own account or as part of a Keybase Team. For now you get 250 gigs free storage.

I started using Backblaze B2 on servers for storing database backups and so far it's been great.

Now I wonder if `b2 sync` is a good candidate for more traditional desktop backups (documents, lots of photos...). Does anyone have feedback on it?

B2 can't do simple things like move or copy,[1] so in my opinion it's really bad for those kinds of applications. For example, if you move a 5GB video file, you shouldn't have to delete+re-upload the entire thing just to change the path.

[1]: https://github.com/Backblaze/B2_Command_Line_Tool/issues/525

Using a file storage service like B2 or S3 as a file system is a bad idea anyway. You won’t be able to use features such as deduplication and incremental backup.

It’s better to use an external tool such as Restic or Duplicacy. Then moving a 5GB file is just a matter of changing a few KB in an index file.
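
A minimal sketch of why a move becomes cheap with such tools, assuming a content-addressed chunk store. This is a simplification: real tools like Restic and Duplicacy use their own chunking schemes and repository formats, while `ChunkStore`, its tiny fixed chunk size, and the file paths here are invented purely for illustration.

```python
import hashlib


class ChunkStore:
    """Toy content-addressed store: chunks keyed by SHA-256, plus a path index."""

    def __init__(self, chunk_size: int = 4):
        self.chunk_size = chunk_size
        self.chunks = {}   # digest -> chunk bytes (what would live in B2/S3)
        self.index = {}    # path -> ordered list of chunk digests
        self.uploaded = 0  # bytes actually added to the store so far

    def backup(self, path: str, data: bytes) -> None:
        digests = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:  # only unseen content is uploaded
                self.chunks[digest] = chunk
                self.uploaded += len(chunk)
            digests.append(digest)
        self.index[path] = digests

    def restore(self, path: str) -> bytes:
        return b"".join(self.chunks[d] for d in self.index[path])
```

Backing up the same bytes under a new path adds an index entry but leaves `uploaded` unchanged, which is the "few KB" effect described above. The trade-off raised in the replies still applies: the stored data is no longer a plain file tree that anyone can browse without the tool.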

Depends what you're backing up IMO.

In the case of my home video and pictures, I have terabytes of data that is largely unique+incompressible so the "dedupe" step on every backup program I've used so far takes a ridiculously long time and yields approximately zero space savings.

Incremental backup I also don't care about because `s3 sync` already only syncs what changed, and I don't care about restoring to previous versions because I want it to be an append-only store, which S3 versioning gives me.

I also need it to be simple for my family to recover data from in case I die suddenly, so I don't want them to have to decode some binary format to get at our pictures and video.

Makes sense. I wasn’t thinking about this use case.

I'm using B2 on both my desktop and a couple of servers. On the desktop I use Cloudberry backup, but that has some irritating "features", such as sometimes not deleting remote files, getting out of sync, and having to sync manually. I'm still using it because I'm cheap and don't wanna spend money on something else. But B2 itself is great!

rdiff-backup works like magic. I love it.

It's a command line tool that does all stuff related to backup perfectly. It's in the Debian repos.

It syncs a given directory to a backup directory with an additional dir for metadata and history. It's super fast, even over the wire. It seems to achieve this by only sending the changed parts of files. I wonder if it also uses compression?
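
On "only sending the changed parts of files": the general technique is a signature/delta/patch exchange. The sketch below is a simplified version with fixed block boundaries, so unlike the rsync-style rolling checksum it will not resynchronize after an insertion shifts the data. It is not rdiff-backup's actual code, and `BLOCK`, `signature`, `delta`, and `patch` are names chosen for this example.

```python
import hashlib

BLOCK = 4  # tiny block size so the example is easy to follow


def signature(old: bytes) -> dict:
    """Receiver side: map the hash of each block of the old file to its block index."""
    return {hashlib.sha256(old[i:i + BLOCK]).hexdigest(): i // BLOCK
            for i in range(0, len(old), BLOCK)}


def delta(new: bytes, sig: dict) -> list:
    """Sender side: emit ("copy", block_index) or ("data", literal_bytes) operations."""
    ops = []
    for i in range(0, len(new), BLOCK):
        block = new[i:i + BLOCK]
        idx = sig.get(hashlib.sha256(block).hexdigest())
        ops.append(("copy", idx) if idx is not None else ("data", block))
    return ops


def patch(old: bytes, ops: list) -> bytes:
    """Receiver side: rebuild the new file from the old file plus the delta."""
    return b"".join(old[arg * BLOCK:(arg + 1) * BLOCK] if op == "copy" else arg
                    for op, arg in ops)
```

Only the `("data", ...)` operations carry file content over the wire; the `("copy", ...)` operations are just small references to blocks the receiver already has.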

You can always change your directory back to any state you backed it up.

It has tons of nice features for checking the integrity of the backup, recovering single files from a given point of time etc.

Did I mention how much I love it?

GitLab is surprisingly good for that. I keep backup schedules in it, executed as CI pipelines. GitLab Runner executes my custom Ansible code in a Docker container, using an image prepared for that purpose.

Thanks for sharing the feedback with the community. We are glad to hear how GitLab helps you.

The backup hosting service (https://www.borgbase.com/) I'm building for BorgBackup will show you the last backup of all repositories in one place. It will also alert you on missed backups.

For backing up databases, you'd need to install small packages, like `automysqlbackup`. We already provide an Ansible role for setting up automatic backups, which could be extended to apply your schedule or add "plugins".

burp (https://burp.grke.org/) comes to mind, as well. Windows and UNIX-like only, no macOS.

"For Mac users, burp is available in Homebrew. See the quick start page for more information." - https://burp.grke.org/download.html

Also, I can't imagine how something could be for Unix-likes and not target Mac OS, since it is literally a certified Unix.

Writing a backup tool and supporting an OS properly is a lot more work than just writing portable software that can run on the portable standard APIs supported by an OS. People expect their backups to preserve file attributes, access control policies, etc.

Beyond traditional POSIX file attributes like ownership and access mode, there are POSIX ACLs. Linux also has extended attributes and SELinux policies. Windows has a different ACL model, and NFSv4 ACLs on Linux are more like those. A Mac has "resource forks" which have no real analogue in Linux or POSIX systems, and a backup/restore that ignores these would be very useless to most Mac users.

While I agree that macOS is a UNIX, backup tends to be a very sensitive form of software (think scalpel, not hammer), and I would not dare to use it on a platform without certification by its vendor (in this case an "OK" by the author).

We do not live in a perfect world. But backups can have exactly zero errors.

But thanks for pointing out that a Homebrew package exists. I did not see this.

If it is targeting UNIX then it must work on Mac. Echoing your words, UNIX has stringent requirements to be UNIX and macOS fulfilled those requirements.


As a side note for fun, Linux is not UNIX and that’s intentional. Freedom is nice. ;)

Linux is not UNIX™, but it is a Unix in most meaningful senses of that term. My opinion would be that making much of the distinction would be pedantic in a negative sense; there does not seem to be a practical reason for this.

> Linux is not UNIX and that’s intentional

Isn't that the recursive acronym itself? :)


I'm currently using Duplicacy and it works pretty well as far as my workload goes. Has anyone also tried rclone who could draw a quick comparison?

Unison is good for syncing between 2 locations. Personally I would treat DBs and other files differently.

restic with s3 back-end


Using RDS seems like the best option.

Many startups use it, and I think backup is also easy.

He didn't say he's using AWS. Or did you mean something different?

He can migrate to AWS.

Managing database backup manually when there is something like RDS and now RDS serverless seems like a lost cause.

Unfortunately not everyone can migrate to AWS, and not everyone would want to either.

There are plenty of reasons companies still host everything on their own servers, and can't/don't want to move their infrastructure to a cloud service.
