Best Linux server backup system? (gist.github.com)
152 points by drKarl on March 16, 2015 | 127 comments



I found the most important idea regarding backups was to stop searching for the one perfect tool that does it all. The most productive step was to accept that for every [group of] machine[s] and every application / scenario there are different best solutions.

Unfortunately, most backup software designers seem to think in an extremely one-dimensional way about how backups should work, and I do not know of a single tool that offers all the flexibility you need for real-life backups.

There is really room for invention here.

IMHO BackupNinja shows the right direction: it is a meta-tool that helps you manage several different backup strategies. This is the way to go, but it should be generalized and have an API and several front-end options (web, Qt, REST).

Also, some more brains could be put into the application: a nice wizard would ask you to define how you would like your backup to work and then select the right tools for you:

  [ ] source
  [ ] it is a database
  [ ] which one: ________________
  [ ] very big files (oh, I already know about this)
  [ ] Mickysoft client
  [ ] Outlook
  [ ] other crapsoft that needs special handling
  [ ] look up the plugin repo for the best way to handle this
  [ ] Linux / *BSD
  [ ] destination
  [ ] encrypt backups
  [ ] versioning
  [ ] frequency
  [ ] make backup files browsable by filesystem tools
  [ ] also browsable for users (readonly)
  [ ] make backups available via samba
  [ ] make backups available via nfs
  [ ] make backups available via web gui
  [ ] where to browse: ________________
  [ ] import csv with usernames
  [ ] auto-generate login link for users (no future support hassle)
  [ ] decide which is the best (set of) tool(s)
  [ ] and just do it and let me do the real work
Certainly there is some more to it, but maybe you get the idea.

If you are bored and do not know what your next project should be, please free the world from all this backup pain (and the wasted hours and weeks) and build it. Thanks, and good luck!


Sounds like your vision for what you want out of a backup system is almost identical to mine - I would add one thing - it would be lovely if this process could store its configuration for each backup item / set in YAML.

I think YAML-based configuration would make it very easy to work with once created, and offer endless possibilities for integration and generation from/to other applications.
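To make it concrete, a per-item definition could look something like this (purely a sketch - the keys and tool names are made up, not taken from any existing tool):

  backup_items:
    - name: customer-db
      source:
        type: postgresql          # handled by a database-dump plugin
        host: db01.example.com
      destination:
        type: sftp
        url: sftp://backup@backuphost/vault/customer-db
      encrypt: true
      versioning: 30 days
      frequency: daily
      expose:
        - samba                   # read-only browsable share of the backups
      tool: auto                  # the wizard resolves this to e.g. duplicity or rsnapshot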

If someone made this, I would donate money to them.

Bonus points for a nice visualisation of backups and when they expire etc...


So we implemented a variant of Berkeley Lab Checkpoint/Restart [0] on terminal.com. You can snapshot RAM state at any given moment and commit it to disk without interrupting operations. You can use it to, for example, start a Spark cluster [1] with a dataset already in memory. I've tested it with a lot of software, so I'd say it works irrespective of the application, but you're welcome to test.

When you use our snapshotting, RAM state, CPU cache and disk state are all captured and can be resumed seconds later. This all happens without a hypervisor.

This sort of obviates the need for doing any configuration storage (you snapshot systems at an initial state and then you can bring up new machines at that initial state without config files. If you need to pass an argument to a machine on boot you can do it programmatically by passing the shell commands [2]).

[0] http://crd.lbl.gov/departments/computer-science/CLaSS/resear... [1] https://www.terminal.com/snapshot/c81e6215eba5799335a45b6936... [2] https://blog.terminal.com/tutorial-terminal-startup-scripts-...


So a metatool that is flexible in terms of backup ingest? Almost like backup plugins?

Check out Deltaic, designed precisely for this:

https://github.com/cmusatyalab/deltaic


This tool certainly looks interesting and does support several sources for the backup.

I was describing a meta-tool that would act like an experienced backup admin and set up the whole process / environment. E.g. this tool would configure network shares or a web server with authentication for accessing the data that Deltaic restored.

However, such a tool does not exist, so I will study deltaic when there is some more time to waste with backup research. :)


I see what you're saying...kind of like an AI or an "automation" tool/framework for backup.

I was focusing on the lower-level building blocks.


I have used BackupPC (http://backuppc.sourceforge.net/) for many years now.

Pros: it's stable, supports several different transfer methods, is completely simple (and still flexible) to admin, and it's totally reliable. It does file deduplication and compression and manages its own pool very efficiently. I've had it responsible for networks of 50+ systems before and it worked without any trouble. It has a sensible and reliable email notification system. It gets the basics really right, which for some reason seems to be a problem with a lot of other backup software. The documentation is good. The developer is friendly, has been working on it for over a decade now, and is easy to reach by email. There's a quiet mailing list with some folks that have used BackupPC for Truly Large Networks and know it about as well as the developer. Version 4.0 is pretty awesome. It has saved my butt a few times and a client's butt at least twice. Also, it's free, assuming you have a server somewhere to run it on; it's not a SaaS or PaaS or YaWaaS, so there's no monthly cost.

Cons: it doesn't do encrypted backups. It's written in Perl, so that's probably a deal breaker for some people who don't know any better. Initial setup can be a bit of a pain, especially if you're new to it. The web interface doesn't use jQuery or LESS or CSS3 transitions or a lot of stock photography, so some people might find it scary-looking. It doesn't hold your hand, you'll have to be comfortable with the CLI every once in a while if you need to do something fancy (like, say, restore a batch of files to their original locations using a text file as input -- which I've done with it, btw). It won't make you coffee in the morning.


Added. I only included the cons, and of those only the lack of encryption. Most (or all) of the solutions in the list are CLI...


BackupPC is great for multiple servers, as it does de-duplication.

Backup data rests on a separate, encrypted partition.


Hello, I'm the author of Snebu (the first item mentioned on the list) -- I've been trying to figure out how to get more exposure for it before I let it fly mainstream (i.e., submitting packages to the various distros, etc).

On the complaint that it doesn't do encryption -- that is one item that I'd like to seek some advice on. My plan is that if you want encryption, then use a LUKS-encrypted filesystem on the target (communications to the target are already encrypted with ssh). The main reason is that I'm not a cryptographer, and even if I use existing libraries there is still a strong chance that I'd miss something and end up using them wrong. Just for example -- to do it right, you would add a unique salt to each object you are encrypting (from the client side). That would then render any type of deduplication useless on the backend side, since multiple files with the same contents would have different encrypted contents.

That being said, I am adding code that lets you have replicas to other storage devices (tape, cloud, other disk based storage). So you would do your primary backup to a local disk device (that possibly has an encrypted volume), then the secondary stage would be packing files together into an (encrypted) object, for sending to a remote location.

I've got a small list going, after I redo the web site I'll put up a planned feature list along with comparisons with other backup utilities. If anyone has ideas to contribute, either drop me an email or open an issue on the Github page.

Thanks.

Edit: On the encryption side, am I correct in thinking that multiple files with the same contents should encrypt to different streams (via a random salt)? Also, should the file names themselves be encrypted? Finally, metadata, such as mod date, file size, and checksum -- should that all be encrypted too? Thanks.


>if you want encryption, then use a LUKS encrypted filesystem on the target

If you control the server the main purpose of encryption is to protect the backups even if somebody gains access to the backup server. An encrypted filesystem isn't a good solution because by design the backup server has the key. This only protects against intruders that are foolish enough to power down the server. As long as the server is running, an encrypted filesystem offers no protection at all.


Encryption can be implemented flexibly on systems you control. When it comes to encrypting backups, the toughest issue is doing it safely on target systems you don't control (duplicity attacks this problem well). I'd recommend starting there. Encrypted replication is a very interesting idea, and actually helps with the problem of maintaining local & remote backups. I usually stagger them, but being able to replicate to arbitrary targets would be very convenient (I don't like to rsync backups after the fact, because you risk corrupting the target if there is a local failure).


One of the items I struggle with is that I want to keep the current client-side simplicity of Snebu -- on the client, the only thing that is required is bash, find, and tar (although GNU find is required, and older versions of find don't support enough options). However, I think I have a potential solution, which I'll work on this weekend.

The solution is: on the client side, include an optional encrypting tar filter, which takes a tar file as input, and encrypts the file contents of each file within the archive, delivering a tar file output with regular headers, but compressed encrypted files. I'll probably have to take some liberties with the tar file format, but as long as the snebu backend is similarly modified (to recognize a pre-compressed and encrypted file segment), then I should be able to get it to work without any major compromises.

The only issues with my current plan are: 1) Multiple files with the same contents will still be de-duplicated, which may leak some information (if an attacker already knows file A's contents, and B is marked as being a copy of A, then the attacker will know what file B is). 2) The metadata (file name, size, owner, possibly SHA1) will still be visible. Although I may just use the SHA1 of the encrypted version as the file reference. 3) Sparse files will still show where the "holes" in the files are, unless you tell it not to preserve file sparseness.


I like your solution of sending to an already encrypted volume. Let another dedicated crypto program handle that. I'll definitely look at this.


Here's a slight variation -- normally, the client side runs a "find" command to generate a file manifest and sends it to the "snebu" command on the server side, which then generates a list of files that it doesn't already have. The client then responds with a tar command output of those files, sending that back into the "snebu" command on the server (all via ssh).

The variation: Run snebu locally, and have the file vault loaded on an iscsi device targeted to the remote backup server. Then you can run LUKS locally, and not have to worry about trusting the remote server. Just make sure you send a copy of the sqlite database file (containing the backup catalog) to the remote volume after a backup. However, this still doesn't address having a remote encrypted volume that multiple clients can send to (in that case, you will need to do the other approach of sending to a local server under your control, then replicating it with a plugin that supports encryption [to be developed yet]).

Note, this method should work for any of the other backup utilities that don't support encryption also.


I honestly can't understand why distro makers like Canonical don't have some out-of-the-box solution that works like Time Machine. Every step one has to configure oneself can lead to a mistake and thus to potential data loss. IMO an OS without a very-simple-to-set-up backup system that supports at the very least remote storage, incremental backup and restore during reinstallation, is incomplete.


Well, there is something; assuming that you refer to an Ubuntu Desktop backup engine with a simple interface, there is Deja Dup.

The problem, though, is that the last time I checked, it was excessively simple - it didn't support individual file selection.


It shouldn't be desktop-only. It could be a well tested script with a good CLI that also has a simple GUI. Something like a backup shouldn't have many options anyway. Apple has the right ideas there IMO: target volume selection and defining backup exceptions are pretty much all you need. I'd prefer if the target were simpler than a sparsebundle though, which should totally be doable with an rdiff or rsync backed script solution.


Deja Dup is a GUI for duplicity.


I've been using duplicity for a few years now and have been quite satisfied. It did seem slow when I first started using it. I learned to live with it; you run backups overnight anyways.

From what I can remember, a lot of the slowness came from GPG, and specifically files being compressed before encryption. Disabling compression speeds things up, but trades off disk space (and security).
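If anyone wants to try that, duplicity can pass options straight through to gpg, so something along these lines should skip gpg's compression (a sketch; the backup target URL is a placeholder):

  duplicity --gpg-options="--compress-algo=none" /home/user sftp://backup@backuphost//srv/backups/home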

I wonder what it'd take to reengineer the thing to take advantage of multicore -- being CPU-bound on a single core is I think what makes it slow.


After having a look at duplicity [1], it actually seems pretty decent. I still wonder why research is needed for such a thing, instead of Canonical taking it, installing a good default script, adding some GUI in their installation image (choosing your sftp backup target, username, password (managed by keyring)), so everyone would already be on the right path.

[1] https://help.ubuntu.com/community/DuplicityBackupHowto


Something like zfsnap? (https://github.com/zfsnap/zfsnap) No special GUI but you can just browse to /.zfs/snapshot/<timestamp> and manipulate the files like normal files.


I'm a bit confused by this. So is ZFS a requirement or not? Do I have to reformat an existing Linux to ZFS in order to use it? Or is that only for the target volume? Even OSX, which it claims to support? I've never seen anyone using ZFS in production there to be honest.


Yes ZFS is a requirement, and it is designed to run on any OS that runs ZFS. You don't need the full machine to be running ZFS, just whatever volume you want to snapshot.

It is a bit unclear if you haven't been looking at other snapshot management utilities. Many other tools have various dependencies which may or may not work on your chosen operating system.

So generally the requirements are:

  - ZFS running on the volume you want to snapshot
  - Bourne shell
  - Gregorian calendar
As far as OS X goes, I'm not sure how many people run ZFS in production but I know there were some people working on it. I've never used it but it still seems to be updated: https://github.com/openzfsonosx/zfs
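For what it's worth, the underlying mechanics are just plain ZFS snapshots plus the hidden .zfs directory, roughly like this (a sketch with an assumed pool/dataset name, not zfsnap's own CLI):

  # take a snapshot, list snapshots, and pull a single file back out
  zfs snapshot tank/home@2015-03-16
  zfs list -t snapshot
  cp /tank/home/.zfs/snapshot/2015-03-16/projects/notes.txt ~/notes.txt.restored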


I would also add that Tarsnap is probably the best option if you only need to back up a small amount of data (a few KBs, a few MBs or even a few hundred MBs): you get top-notch encryption, and with the picodollar pricing and stellar dedup you will pay much less than with any other provider, since you pay per usage. For instance, rsync.net is a good destination (it can be used with some of the tools in the list, like duplicity or attic) but you need to buy at least 50GB, so for small backups it's not worth it.


We love tarsnap and think it is wonderful. We want to live in a world where people like Colin are selling things for picodollars.

Our strong suit is the ability to point any SSH tool you like at our storage (rsync, mostly, but some folks point duplicity or unison at it).[1]

Also, as we run on ZFS and have daily/weekly snapshots enabled by default, you can just forget about incrementals or versions or datasets ... just do a dumb mirror to us every day and we'll maintain, live, browseable, in your account, what is essentially an offsite "Time Machine".

We have an HN readers discount. Email and ask about it.

[1] http://www.rsync.net/resources/howto/remote_commands.html


Yes, and I think your service is great. Just saying that it only starts to make sense when you need to back up 50GB or more... if you just want to back up like 10MB, or 200MB or even 5GB... On the other hand, the more you need to back up the more sense it makes, since your prices go down per GB.


You could add rsnapshot (http://www.rsnapshot.org/) to your list.

While you might disqualify it due to its lack of built-in encryption, note that since one of your comments implies you control the server where the backups end up in cold storage, you can encrypt the disk/array where the backups are stored independently of the backup tool used to copy the data to the backup server.
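A minimal version of that is just LUKS underneath the backup store (a sketch; the device and mount point are placeholders):

  # one-time setup of an encrypted backup volume
  cryptsetup luksFormat /dev/sdb1
  cryptsetup luksOpen /dev/sdb1 backups
  mkfs.ext4 /dev/mapper/backups
  mount /dev/mapper/backups /srv/backups   # point rsnapshot's snapshot_root here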


Thanks, that is definitely an option, but I'd say that built-in encryption is a big plus.


We've been using rsnapshot for years.

The offsite backups are on USB drives that are encrypted.

We just created a shell script to automate that, and run the database dumps prior to running rsnapshot.


I have been using ZFS and rsync for around a decade. I wrote management tools on top of it so that it is maintenance-free, integrates with Nagios, and has a web UI including recoveries. https://github.com/tummy-dot-com/tummy-backup

It uses ZFS to do the heavy lifting of managing deltas and deduplication and rsync to do the snapshots. Combined with backup-client (https://github.com/realgo/backup-client) it can run as non-privileged users and trigger database dumps or snapshots, LVM snapshots of virtual machine instances, etc...

We have had it running internally on clusters of 10 backup servers, and several external backup servers over ~a decade, and it has worked very well.


As an aside, this started a decade ago as a personal backup script using rsync and hardlinks for a few personal machines. But hardlinks really start falling apart once you get a lot of machines, say more than 10, or large systems with lots of files.

At one point we switched to BackupPC, but ended up switching back to this after around a year. BackupPC implemented its own rsync code, which (maybe this is fixed now) didn't support incremental file-data transfers in the rsync version 3 protocol, so large systems could take hours to build the file index and then hours to re-walk the file-system to send the data. Larger systems were taking longer than 24 hours and tons of IOPS to backup.

It also wasn't very efficient: when we switched back to ZFS, we consolidated 4 BackupPC servers down to one with ZFS holding the same data. The biggest issue there was log and database files; big files that had small changes resulted in the whole file getting stored multiple times. Particularly Zope ZODB files killed us: they are append-only and we had users with 2+GB files that had small changes every day.


Another aside, it looks like BackupPC 4, which is in alpha state, includes rsync v3 code and a new back-end format, so it probably will be much improved in all regards.


I appreciate that you created an HN account to comment on this thread! Thanks for your input!


"Long time listener" and all that. Finally had something to say.


Back in 2005 I used Bacula to set up a distributed backup system for a big company. I haven't worked there in years, but AFAIK it still runs well and manages around ~50 machines (both servers and desktops running different operating systems).

Yes, it's hard to approach (we spent ~2 weeks learning and testing it before deploying to production), but it had the features we needed:

* scheduling

* retention policies (store for 1 year then rotate, multiple tapes, etc...)

* backup on DST tapes

That said, if you just need to back up a single server, Bacula isn't the right solution or, at least, is overkill :-)


Yep, Bacula looks like an option if you have to centralize the backup of a lot of servers, but it looks like it's more oriented to tape backup, isn't it?


As far as I remember, you can use file storage and set the size of the file itself as well. So, for example, you might have 4GB file images to burn onto DVD. Of course, it really shines with tape backups :-) One nice feature of multi-volume backups in Bacula is the fact that it remembers (through its Catalog) where to find a backup, given the file name and/or a date. So it might ask you: "Insert volume X-Y" to restore the data you need.


I have been using Amanda Network Backup for ~5 years now, and am extremely happy with the results. Some of the highlights of my decision to use this tool include:

- Uses native OS tools (tar, gzip, gpg in my case) for the actual backups and restores, and includes the actual command used to create the archive in the header of the archive! You can use this to restore it without Amanda in the event of an emergency.

- Supports on-disk backups in a holding area for quick restores

- Supports S3 as a virtual tape library

- Supports vaulting (i.e. moving an archive from one tape library to another)

- Your choice of client, server, or no compression & encryption

- Highly scriptable

- Works over SSH, among other methods

- Catalog data is easily backed up itself via simple OS commands, and stored in S3.

The systems I back up are all in AWS, so this is ideal for me. I've frequently thought it would be ideal to adapt Amanda's script agents to creating EBS snapshots, but I simply haven't had the time. It's on my someday-maybe list. Remember to vault your backups to another region!

(Edits: formatting)


My experience with attic (this seems to be a trend with all of the dedup systems you reviewed) is that it also takes a very long time to restore large amounts of data; _however_, you have the option of restoring individual files via a FUSE file system, which is _immensely_ useful.

An example was that a restore of ~200 GiB of VM snapshots took over a day from a NAS to the server in question.

Usually, the backups take about an hour to write out, so reading data from attic does take significantly longer.

This is probably because dedup is non-trivial to restore from (it can involve lots of random reads/disk seeks).
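For single-file restores, the FUSE mount is the thing to use. If I remember the syntax right, it goes roughly like this (repository path and archive name are assumptions):

  attic mount /mnt/nas/attic-repo::vms-2015-03-15 /mnt/restore
  cp /mnt/restore/var/lib/libvirt/images/web01.qcow2 /var/lib/libvirt/images/
  fusermount -u /mnt/restore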


You have obnam: `Really slow for large backups (from a benchmark between obnam and attic)` , but then, no section for attic itself.

It's probably also wise to roughly group the backup systems by algorithm-class (e.g. separate rsnapshot from rdiff-backup from duplicity from attic/obnam/zbackup) since they result in different bandwidth and storage properties. Duplicity will need a full re-upload to avoid unbounded growth of both storage and restore-time, but such a re-upload is prohibitive for DSL users.


Good suggestion, I'll try to do that grouping.


http://www.jwz.org/doc/backups.html

I've been using the above as a really half-assed way of doing backups on a server, using a NAS instead of a USB HDD. Very glad to benefit from the experience of others here.


If you're using Linux, it's something a lot like that. If you're using Windows, go fuck yourself.

: )

jwz's method is fine for disaster recovery, but it doesn't work for recovery from other kinds of errors (including human) since it only saves the last copy of the files.

In our case, the most important backups are the databases (servers can be rebuilt from config management), and having past copies definitively helps.

With rdiff-backup, we can restore the databases from any day for the past couple of years, and since it's incremental, it doesn't really take up much space.
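Restoring from a given point in time is a one-liner (a sketch; the paths are placeholders):

  # pull back the database dumps as they were 30 days ago
  rdiff-backup -r 30D /srv/backups/dbdumps /tmp/dbdumps-30-days-ago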


I use rdiff-backup as my main backup solution. It's efficient and restoring from the latest backup is reasonably fast and simple. But I've found that the further you go back in time, reconstructing the file from all the deltas takes so long I don't consider it useful as an archiving solution. At least that seems to be true for large and/or complex collections of files. Have you actually restored large databases from years ago in less than a few hours?


We usually only restore revisions a few weeks old. That said, rdiff-backup works by taking the current mirror file and applying past diffs, so you could probably keep an older mirror (and current_mirror file from the rdiff-backup-data directory) to speed up the process.


I use OpenVPN to tunnel iSCSI on a remote machine. Then I mount the iSCSI device using LUKS. Finally I rsync into the LUKS mount point. Encrypted, incremental, bla bla. Works pretty well. Here's the script:

    zx2c4@thinkpad ~ $ cat Projects/remote-backup.sh 
    #!/bin/sh
    
    cd "$(readlink -f "$(dirname "$0")")"
    
    if [ $UID -ne 0 ]; then
            echo "You must be root."
            exit 1
    fi
    
    umount() {
            if ! /bin/umount "$1"; then
                    sleep 5
                    if ! /bin/umount "$1"; then
                            sleep 10
                            /bin/umount "$1"
                    fi
            fi
    }
    
    unwind() {
            echo "[-] ERROR: unwinding and quitting."
            sleep 3
            trace sync
            trace umount /mnt/mybackupserver-backup
            trace cryptsetup luksClose mybackupserver-backup || { sleep 5; trace cryptsetup luksClose mybackupserver-backup; }
            trace iscsiadm -m node -U all
            trace kill %1
            exit 1
    }
    
    trace() {
            echo "[+] $@"
            "$@"
    }
    
    RSYNC_OPTS="-i -rlptgoXDHxv --delete-excluded --delete --progress $RSYNC_OPTS"
    
    trap unwind INT TERM
    trace modprobe libiscsi
    trace modprobe scsi_transport_iscsi
    trace modprobe iscsi_tcp
    iscsid -f &
    sleep 1
    trace iscsiadm -m discovery -t st -p mybackupserver.somehost.somewere -P 1 -l
    sleep 5
    trace cryptsetup --key-file /etc/dmcrypt/backup-mybackupserver-key luksOpen /dev/disk/by-uuid/10a126a2-c991-49fc-89bf-8d621a73dd36 mybackupserver-backup || unwind
    trace fsck -a /dev/mapper/mybackupserver-backup || unwind
    trace mount -v /dev/mapper/mybackupserver-backup /mnt/mybackupserver-backup || unwind
    trace rsync $RSYNC_OPTS --exclude=/usr/portage/distfiles --exclude=/home/zx2c4/.cache --exclude=/var/tmp / /mnt/mybackupserver-backup/root || unwind
    trace rsync $RSYNC_OPTS /mnt/storage/Archives/ /mnt/mybackupserver-backup/archives || unwind
    trace sync
    trace umount /mnt/mybackupserver-backup
    trace cryptsetup luksClose mybackupserver-backup
    trace iscsiadm -m node -U all
    trace kill %1


Your backup script has no exit code checking and relies on sleeps!


What I would like is a set of small programs from which a backup script could be crafted. Actually, most of the tools are available: incremental backups with rsync, encryption with openssl, archiving/compression with tar/gzip/7-zip. But such scripts end up quite verbose and error-prone. A better solution should be available.


Yes, that is another option, but that would be in the category "roll your own". You can start rolling your own with some shell scripts, or with some python, etc

Actually I think this is how most solutions started...


Seconding obnam. Speed it up with:

lru-size=1024

upload-queue-size=512

See http://listmaster.pepperfish.net/pipermail/obnam-support-obn...


Using /dev/urandom instead of /dev/random helps, too. http://listmaster.pepperfish.net/pipermail/obnam-support-obn...


This makes me highly suspicious of the encryption implementation of obnam.


Obnam uses gpg for encryption.

The only novel thing is its use of symmetric encryption keys which are used to encrypt the data and are included in the backup repository, encrypted by your regular private gpg key. This allows giving additional gpg keys access to the backup after it has been made. http://liw.fi/obnam/encryption/

(Which is useful for eg, backing up a server. You can make a dedicated gpg key for that server, but give your personal gpg key access to the backups to restore later.)

Anyway, the generation of the symmetric encryption key is what needs an entropy source. AFAIK this is done once per repository.


Yes I see this now. It wasn't as obvious at first glance. However, I'm surprised this issue with random wasn't caught much, much sooner.


Thanks, added this info to the list


If you are looking for a way to drive it all, this is working very nice for me: https://github.com/meskyanichi/backup


Would it be comparable to BackupNinja?


Yes, but simpler imho


Attic seems good. Its encryption makes me uneasy (https://github.com/jborg/attic/blob/master/attic/key.py), and I don't feel qualified to review it:

  * use of pbkdf2(passphrase)[0:32] as encryption key?
  * is AES correctly used? there are many pitfalls
I would be much more comfortable with it using gpg for encryption.


Using gpg is not so awesome. Obnam uses gpg. Since Obnam invokes gpg in batch mode, if you want to have a passphrase, you have to use gpg-agent, which at least for me took more effort than I found reasonable to set up on a GUIless server.

Furthermore, all the crypto config depends on gpg defaults or your gpg.conf. Whether this is good or bad depends on whether you are OK with gpg's defaults that are chosen for a non-Obnam use case and whether you like tweaking gpg config.

While figuring this out, I started wishing that Obnam used libsodium instead of gpg to avoid configuration and especially gpg-agent. (libsodium didn't exist when Obnam was created.)


> you have to use gpg-agent, which at least for me took more effort than I found reasonable to set up on a GUIless server.

Did you try Keychain¹? I've used it in the past to auto-sign deb packages, and it was simple to set up.

¹ http://www.funtoo.org/Keychain


I didn't.

Having to be aware of tools like this is the problem when you face the requirement of having to set up gpg-agent and you don't already know how to do so in an environment where a desktop environment from your distro hasn't done it for you.


Duplicity allows using gpg in a rather painless way:

  PASSPHRASE="myBackupGpgKeyPassphrase" duplicity ...
I use one gpg key per machine for backups, so having the passphrase in cleartext on that machine is not much of a problem.


Good point; I'm no cryptologist either, so I can't find flaws in it or verify its robustness... Maybe someone in the community has the required encryption knowledge to say something about this?


I tried to find the best solution (simple, secure, incremental, reliable...) and could not find the perfect candidate either.

I finally ended up sending gpg-encrypted tar files to a remote backup machine (you may want to cross-backup machines), without using any intermediate file.


    #!/bin/bash

    # list of local directories to be backed up
    DIRS="/home /media"

    # destination directory target
    DEST="/data/backups"

    # gpg encryption email
    GPGEMAIL="homer@example.com"

    # remote ssh as user@machine
    REMOTESSH="homer@backup.example.com."

    # remote ssh additional args
    REMOTESSHARGS="-i /root/.ssh/id_backup"

    for i in ${DIRS} ; do
        if test -d "$i" ; then
            f=${DEST}/$(echo $i | tr '/' '_' | sed -e 's/^_//').tgz.gpg
            tmp=${DEST}/_tmp
            echo "backing up $i to remote $f encrypted with gpg" >&2
            /bin/tar cf - ${i} \
                | /usr/bin/gpg --quiet --batch --encrypt --compress-algo zlib --recipient ${GPGEMAIL} -o - \
                | /usr/bin/ssh ${REMOTESSHARGS} -o BatchMode=yes -o Compression=no ${REMOTESSH} \
                    "cat > $tmp && mv $tmp ${f}"
        else
            echo "error: $i does not exist" >&2
        fi
    done


If you mention Bacula then you also need to mention Amanda http://www.amanda.org/


Oh, the countless hours I've wasted on Amanda - it's just too old now.


While for many, backup is better on disks these days, tape is far better for archival and Amanda does this quite well for Linux.


The list misses IBM Tivoli TSM and Legato Networker, backup systems that work in practice and don't fail on you.

Other than that, my vote goes to ZFS snapshots - reliable.


I have a few years experience with TSM and always found it to be pretty fast and very reliable. Like most IBM software, it looks overly complicated and obtuse at first glance, but once you overcome a few "standard" obstacles(1), you realize there's some pretty serious engineering under there and it's actually quite likable. It's not cheap though...

(1) IBM software seems to be designed by smart people for really hard scenarios, but it all gets buried under tons of enterprisey crap and UIs designed by monkeys...

1. Drop the excruciatingly slow and clunky web console and learn to use the CLI (really, it's horrible);

2. Resist the urge to throw your arms up in disgust because the syntax and concepts seem foreign (it probably won't look so foreign to mainframe people, and it isn't actually bad, just different);

3. ??

4. Realize the CLI and storage concepts are actually very powerful.


Thanks for the suggestions. I'm more inclined to use Open Source solutions, though, which Tivoli and Legato are not.

I agree with ZFS but I think it shines as a last resort backup (like rsync.net offers) and not as a main backup system since it doesn't do incremental and deduplication and so on...


ZFS is also a good way to get consistent, fast backups of databases by snapshotting. Thanks to ZFS copy-on-write it's incremental by nature, as you only have to send over incremental delta streams. Thus for site-to-site backups it's usually very fast! Another advantage compared to rsync is that you do not need to traverse the whole directory tree in order to find the differences; I believe it's similar to Btrfs in that aspect.

Then there is the fact that ZFS has checksums, so when you write to disk you know what you get; otherwise you can get silent corruption. RAID5-60 for me is a gamble that you can get hidden write errors, unless there are checksums in software at a higher layer.

Always scrub your ZFS source and backup pool.
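The site-to-site flow is basically this (a sketch; pool/dataset names and the host are assumed):

  # snapshot, then send only the delta since yesterday's snapshot
  zfs snapshot tank/db@2015-03-16
  zfs send -i tank/db@2015-03-15 tank/db@2015-03-16 | ssh backuphost zfs recv backup/db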


I am not sure I understand your comment, but just to be clear: we (rsync.net) have ZFS snapshots enabled in your account, by default.

You can request any schedule of days/weeks/months you want, and their (very small) space usage counts against your paid quota. The first 7 dailies are always free (don't count).

What this means is you can just do a dumb rsync to us. No incrementals, no expiring datasets, no logic at all - just rsync to us every night and our snapshot rotation schedule does the rest for you. Just browse right in (using the SSH/SFTP based tool of your choice[1]) and grab a file from 6 days ago.

Email us and ask about the HN readers discount for new customers.

[1] Like filezilla, or SSHFS, or our windows drive mapper


Snapshots can be incremental, right? Just don't 'chain' too many snapshots of snapshots.


It's admittedly been 8-10 years since I was responsible for any serious backups so my experience is quite out of date, but Legato Networker is at the end of the day the best backup solution I've ever used. Yes it costs quite a bit of money and it's not trivial to set up, but once you've got it working it works so much better and easier than anything else I've ever used.


Using bacula with a twin cabinet Quantum tape library with two LTO4 tape drives (all fibre channel), looking to replace with an equivalent cabinet with LTO6. Bacula really has a difficult learning curve but the features are equivalent to other enterprise-grade pieces of software like networker, etc.


I'm surprised nobody has mentioned Back In Time[0], especially since a lot of people have mentioned rsync.

Back In Time is basically just a GUI wrapper around rsync that manages snapshots for you. I used to have an rsync script that did the same thing manually, but it's great to have a tool that manages the snapshots for you so you don't have to remember to update the symlinks individually.

Unlike rsync[1], Back In Time does incremental backups, so you save on storage space if your files don't change often.

[0]https://wiki.archlinux.org/index.php/Back_In_Time

[1] by default, that is - you can use rsync to achieve incremental backups, which is what BIT does


Has anyone tried BareOS? - http://www.bareos.org

It has a modern looking web-ui too: http://www.bareos.org/en/bareos-webui.html


My personal recipe: rsync -avPH --delete-before

And then make hard-links of all files to another folder, named BACKUP_yymmdd

This way, the backup is incremental, and you have snapshots of older versions of the backup, where there is structural sharing between snapshots.
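In script form that recipe is roughly (paths are placeholders):

  # mirror, then freeze today's state as a hard-linked snapshot
  rsync -avPH --delete-before /data/ /backup/current/
  cp -al /backup/current /backup/BACKUP_$(date +%y%m%d)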


That is similar to what rsnapshot does.


But that is not encrypted, right?


If you run it over ssh it is. Or do you mean stored in an encrypted way?


Both: securely transmitted (ssh) and stored in an encrypted way, but it's better if the encryption happens on the client, not on the server.


We're using bacula for backing up ~20 servers and VMs onto an LTO5 tape library. While it might be difficult to set up initially, that setup has now been running for over ten years (modulo changes accommodating new hardware).

It does provide lots of flexibility, though, you can restrict who gets to restore which files onto which servers, for example, or the ability to backup to different targets.

Regarding clock sync, that hasn't been a problem for ages: “Note, on versions 1.33 or greater Bacula automatically makes the necessary adjustments to the time between the server and the client so that the times Bacula uses are synchronized.”


How are you backing up VMs? Using some kind of hypervisor API, or just run bacula file in VMs themselves?


Currently, we just run bacula inside the VMs, which has the benefit that VM owners can simply use bconsole inside the VM to restore files from a previous date.


I'm quite happy with duplicity for backing up to S3. Its incremental backups fit in well with the S3 charging model, so you only pay for restores. I did find that it's important to set it to do a full backup from time to time, e.g. once per month. Otherwise, a restore will take very long and also be more expensive. I discovered this the hard way, by doing my second test restore 7 months after the last full backup. Fortunately it was a test restore. I use the duply wrapper script to make setting it up even simpler.
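Duplicity has a --full-if-older-than option for exactly this, e.g. (a sketch; the bucket name is a placeholder):

  # start a fresh full chain once a month so restores don't replay months of increments
  duplicity --full-if-older-than 1M /var/www s3+http://my-backup-bucket/www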


Yet another option is the stateless desktop, similar to mobile devices (and actually to some degree Windows 8+). The ability to do a reset/refresh without affecting user data or apps; and full system reset where all updates and user data are removed, e.g. returning to an OS state defined by a read-only image (which itself can be atomically replaced for major upgrades).

What makes things a pain on Linux is applications have parts strewn all over the filesystem. My stuff has a /home, why isn't there an /OShome, and an /Appshome?


> What makes things a pain on Linux is applications have parts strewn all over the filesystem.

All of them reside under `/usr`. At least with my distro.

> My stuff has a /home, why isn't there an /OShome, and an /Appshome?

With monolithic repositories,* how would you code it?

* I.e. system and apps come from the same source, unlike e.g. OS X where the distinction is clear.


Interesting overview; some of these I have to look into. But nobody/nothing mentions the option of having the application that handles the data you want to back up perform the backups itself.

Imho it's ideal to have the application handling the data also create the backups, and then transport them to remote locations via any means viable, or again via any of the backup systems mentioned in the article. With the application backing things up versus dedicated backup software, I tend to think there is less tuning required, less chance of external locks, etc.


That is assuming there is a single source of data in need of backup and that the data is generated/handled by an application. Data could be from many different sources and formats: databases, images, video, audio, pdf, text...


That is true. I did assume :)

But the same applies to that. These 'other data formats' from other sources too, could also be replicated from the software handling them, assuming again! :o)

edit

Speaking from a perspective where I have seen many parties go wrong with their '3rd party backup application', I tend to try to replicate most data from the applications themselves to other (safe) locations, rather than rely on a 3rd party app which got set up in 2005 but has received little attention in the meantime. Hence I prefer to look at the application generating/receiving/handling the data.


I back up some important files with rsync to local and remote server(s) daily as a TGZ archive and keep the last 14 days, last 10 weeks (1/week), last 11 months (1/month), and 1 yearly forever. Very much based off of this: https://nicaw.wordpress.com/2013/04/18/bash-backup-rotation-... ... appears to work pretty well for a small amount of data.


When I tried duplicity it worked great for all of my servers but one. I am not sure what caused the issues on the one server, possibly clock skewing, but backups kept consistently getting corrupted. I'm now a happy bup user, and as far as I'm aware they're working on pruning old backups.

Attic seems promising. Does anyone have any experience of that versus bup?


Attic still has problems with large backups (when the block index gets >2GB). There are also several minor issues due to the tool being relatively new (file selection is lacking). Check its issues page: https://github.com/jborg/attic/issues/


To work around some of attic's more UI-level shortcomings, I made a wrapper script called atticmatic that adds a declarative config file, excludes file, etc:

http://torsion.org/atticmatic/



Recently collapsed about 2.5 TB of nightly backups into 30 GB. So far it has been an amazing tool.


One thing to mention is that Duplicity supports S3 as an endpoint.

So does Arq (OSX backup software), as well as Glacier.

This makes large-size backups very cost-effective.

I mean, some of the newer ones (e.g. Attic, Bup) certainly do look good, but you need a full-fledged Linux server at the other end, as opposed to being able to just shove it into S3/Glacier - this to me is a drawback.


I had issues with Duplicity and Glacier files in my S3 buckets - Duplicity thought they were regular S3 files, tried to fetch, and failed. This was a while back, though, so I'd imagine it's fixed.


SpiderOak may be an option; it has a headless mode.

https://spideroak.com/faq/questions/67/how_can_i_use_spidero...


Problem with Snebu: I think it only does file-level deduplication, which means storing something like virtual machine snapshots or images will not deduplicate well.

You need block-level deduplication to work well with things like VM images.


That is correct -- at this time it is only doing file level dedup, due to overhead and complexity of block level (or better yet, variable block level) deduplication. The file level dedup is done by computing the sha1 checksum of the file, and using that as the name to store the file contents under.
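In shell terms the idea is roughly this (a sketch of the concept, not Snebu's actual code):

  # identical file contents hash to the same name, so they get stored only once
  sha1=$(sha1sum "$file" | cut -d' ' -f1)
  [ -e "$vault/$sha1" ] || gzip -c "$file" > "$vault/$sha1"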

I've got a solution in the works specifically for KVM images, that hopefully I'll be able to finish up the next time I get a week off of work. The way I'm planning on handling that (at least for LibVirt based VMs) is to use libguestfs to create an equivalent to the `find` command (with all the -printf options that it supports), and to generate tar file output of selected files from the VM. (Snebu is designed around only requiring find/tar on the client side -- so anything that can produce this output will work). Although I don't like that libguestfs actually fires up a VM in order to work with the image files -- I may work on an approach to read the qcow2 file images directly. Again, looking forward to a vacation this spring so I can pound out that module.


As some commenters mentioned, a meta-backup system that can manage multiple different ingest methods would be Deltaic:

https://github.com/cmusatyalab/deltaic


I have been using backup2l together with rsync for ages and it has been working great. It doesn't natively support encryption though.

http://backup2l.sourceforge.net


Added, with the drawback of lacking encryption


The way we always did it was to add a custom driver and insert a few lines of gpg calls.

So while yes, there's no "native support" it's definitely not hard to add.

While this is just a random google result[1] as I don't have access to the aforementioned snippet, you'll get the idea I hope.

[1]: http://www.iniy.org/?p=151


I've been using backup2l with a custom driver that just calls "tar | gpg". I wouldn't flag it as "lacking". Backup2l can be used with any archiver.

If duplicity had an option to choose a backup ancestor, you could use backup2l with duplicity.

backup2l is strikingly simple and very effective at managing restores from its incremental backup hierarchy.

In one of my tests (more than 2 years ago), the storage used by daily, 3-level hierarchical tar (such as backup2l) vs Duplicity over a 1-year period was only 15% greater, yet it gives much more redundancy in case of failure.

If you consider how simple archives are and can be restored in case of problems, it's almost a no brainer.


Backup Ninja - https://labs.riseup.net/code/projects/backupninja

And it has a neat curses based GUI called NinjaHelper


It seems it is more a way of integrating different backups, which can be useful of course. For instance, it seems it can use Duplicity. Is that so?


Yes indeed, I've found it quite useful as a reliable and easy to configure 'metabackup' system if you will.

That's correct, Backupninja can use Duplicity as a backend: https://labs.riseup.net/code/projects/backupninja/wiki/Dup


Still one of my all time favorite approaches is the DIY-style snapshot-based system with rsync and hard links, which keeps snapshot sizes small:

This example keeps four days' worth of filesystem backups:

  rm -rf backup.3
  mv backup.2 backup.3
  mv backup.1 backup.2
  cp -al backup.0 backup.1
  rsync -a --delete source_directory/  backup.0/
More details: http://www.mikerubel.org/computers/rsync_snapshots/

Trinkup is a script that automates this approach, somewhat like rsnapshot:

https://gist.github.com/ei-grad/7610406


I'm a big fan of the Ruby Gem Backup: https://meskyanichi.github.io/backup/v4/


How about adding btrfs send/receive to your list?


Added, do you have some more info on that one, drawbacks, performance?


De-dupe, will only transfer blocks changed since last snapshot.

Performance, it's super damn fast. Because the ZFS snapshot itself is just a stream you can also compress it on the wire.

Flexibility. You don't need to send the snapshot to another ZFS server. You could just take the ZFS snapshot streams and store them on say S3 compressed and re-assemble them. This would take custom tooling but it's definitely possible.

You can restore super quickly to an older version if you haven't deleted the local snapshots.

Similar to above but if you don't want to restore (as in make said snapshot the current active dataset) you can create a writable clone of one of the snapshots to play with it before committing to a restore or just to "go back in time" and play with something.

As a bonus you also now have your data on ZFS which has tons of great benefits apart from the point in time snapshots.
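On the btrfs side, which the original suggestion was about, the flow is very similar (a sketch; subvolume paths and the host are assumed):

  # read-only snapshot, then send only the delta relative to the previous one
  btrfs subvolume snapshot -r /data /data/.snapshots/2015-03-16
  btrfs send -p /data/.snapshots/2015-03-15 /data/.snapshots/2015-03-16 \
    | ssh backuphost btrfs receive /backup/data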


I'm using zfs-auto-snapshot[1] and zfs send/recv to backup my work desktop machine to a remote server. The naive approach of just doing 'zfs send -R -I snap1 snap2 | ssh remote zfs recv -dF pool'[2] has a number of drawbacks:

* Assumes remote is available 100% of the time. Recovery from downtime at the remote probably requires some manual intervention to get things back in sync.

* If the remote is in a different physical location with limited bandwidth between the hosts, compressing the stream on the fly isn't going to be particularly efficient with bandwidth.

I've built some scripts to help[3], dumping the necessary incremental streams into a local directory and then compressing them with lrzip[4]. This decouples the host and remote systems and the zfs streams compress really well: a 33Mb incremental stream I have here compresses to 4.4Mb with lrzip. Once you have a directory of compressed streams you can push them wherever you want (a remote server where you convert them back into zfs snapshots, giving you a live filesystem; S3 buckets etc.) You also are able to restore using standard operating system tools.

I'd assume btrfs is comparable, but haven't tried it myself.

[1]: https://github.com/zfsonlinux/zfs-auto-snapshot

[2]: see https://github.com/adaugherity/zfs-backup for example

[3]: currently in my local fork of zfstools at https://github.com/mhw/zfstools/blob/master/bin/zfs-save-sna...

[4]: http://ck.kolivas.org/apps/lrzip/README


I updated the gist with some comments and additional solutions, and added Markdown formatting for readability on mrcrilly's suggestion.



It's still in alpha, but looks interesting, I'll keep an eye on it!


Attic seems most promising if it wasn't for the python 3 requirement and me still being on RHEL6.


What about installing Python3 manually? http://stackoverflow.com/questions/8087184/installing-python...


Problem with installing anything manually is patch management. And we take patch management very seriously.


I like using rsnapshot. I think it's a great rsync wrapper.


I'd suggest checking out Flexbackup (http://flexbackup.sourceforge.net/). It's super simple (always a plus), I've been using it for many years without fail and you can integrate encryption (http://rolandtapken.de/blog/2011-01/encrypted-files-flexback...).


I believe the answer is FreeBSD.



