
Best Linux server backup system? - drKarl
https://gist.github.com/drkarl/739a864b3275e901d317
======
UserRights
I found the most important idea regarding backups was to stop searching for
the one perfect tool that does it all. The most productive step was to accept
that for every [group of] machine[s] and every application / scenario there
are different best solutions.

Unfortunately most backup software designers seem to think in an extremely
one-dimensional way about how backups should work, and I do not know of a
single tool that offers all the flexibility you need for real-life backups.

There is really room for invention here.

IMHO BackupNinja shows the right direction: it is a meta-tool that helps you
to manage several different backup strategies. This is the way to go, but it
should be generalized and have an API and several GUI options (web, qt, rest).

Also, some more brains could be put into the application; e.g. a nice wizard
that asks you to define how you would like your backup to work and selects
the right tools for you:

    
    
      [ ] source
      [ ] it is a database
      [ ] which one: ________________
      [ ] very big files (oh, I already know about this)
      [ ] Mickysoft client
      [ ] Outlook
      [ ] other crapsoft that needs special handling
      [ ] look up the plugin repo for the best way to handle this
      [ ] Linux / *BSD
      [ ] destination
      [ ] encrypt backups
      [ ] versioning
      [ ] frequency
      [ ] make backup files browsable by filesystem tools
      [ ] also browsable for users (readonly)
      [ ] make backups available via samba
      [ ] make backups available via nfs
      [ ] make backups available via web gui
      [ ] where to browse: ________________
      [ ] import csv with usernames
      [ ] auto-generate login link for users (no future support hassle)
      [ ] decide which is the best (set of) tool(s)
      [ ] and just do it and let me do the real work
    

Certainly there is some more to it, but maybe you get the idea.

If you are bored and do not know which should be your next project, please
release the world from all these backup pains (and wasted hours and weeks) and
build it, thanks! Good Luck!

~~~
mrmondo
Sounds like your vision for what you want out of a backup system is almost
identical to mine - I would add one thing - it would be lovely if this
process could store its configuration for each backup item / set in YAML.

I think YAML-based configuration would make it very easy to work with once
created and would offer endless possibilities for integration and generation
from/to other applications.

If someone made this - I would donate them money.

Bonus points for a nice visualisation of backups and when they expire etc...

~~~
josh2600
So we implemented a variant of the Berkeley Lab Checkpoint/Restart [0] on
terminal.com. You can snapshot RAM state at any given moment and commit it to
disk without interrupting operations. You can use it to, for example, start a
Spark cluster [1] with a dataset already in memory. I've tested it with a lot
of software, so I'd say it works irrespective of the application, but you're
welcome to test.

When you use our snapshotting, RAM state, CPU cache and disk state are all
captured and can be resumed seconds later. This all happens without a
hypervisor.

This sort of obviates the need for doing any configuration storage (you
snapshot systems at an initial state and then you can bring up new machines at
that initial state without config files. If you need to pass an argument to a
machine on boot you can do it programmatically by passing the shell commands
[2]).

[0] [http://crd.lbl.gov/departments/computer-
science/CLaSS/resear...](http://crd.lbl.gov/departments/computer-
science/CLaSS/research/BLCR/) [1]
[https://www.terminal.com/snapshot/c81e6215eba5799335a45b6936...](https://www.terminal.com/snapshot/c81e6215eba5799335a45b69360d14e26bac50358c369066d15b159ac704a33d)
[2] [https://blog.terminal.com/tutorial-terminal-startup-
scripts-...](https://blog.terminal.com/tutorial-terminal-startup-scripts-and-
multiverses/)

------
thaumaturgy
I have used BackupPC
([http://backuppc.sourceforge.net/](http://backuppc.sourceforge.net/)) for
many years now.

Pros: it's stable, supports several different transfer methods, is completely
simple (and still flexible) to admin, and it's totally reliable. It does file
deduplication and compression and manages its own pool very efficiently. I've
had it responsible for networks of 50+ systems before and it worked without
any trouble. It has a sensible and reliable email notification system. It gets
the basics really right, which for some reason seems to be a problem with a
lot of other backup software. The documentation is good. The developer is
friendly, has been working on it for over a decade now, and is easy to reach
by email. There's a quiet mailing list with some folks that have used BackupPC
for Truly Large Networks and know it about as well as the developer. Version
4.0 is pretty awesome. It has saved my butt a few times and a client's butt at
least twice. Also, it's free, assuming you have a server somewhere to run it
on; it's not a SaaS or PaaS or YaWaaS, so there's no monthly cost.

Cons: it doesn't do encrypted backups. It's written in Perl, so that's
probably a deal breaker for some people who don't know any better. Initial
setup can be a bit of a pain, especially if you're new to it. The web
interface doesn't use jQuery or LESS or CSS3 transitions or a lot of stock
photography, so some people might find it scary-looking. It doesn't hold your
hand, you'll have to be comfortable with the CLI every once in a while if you
need to do something fancy (like, say, restore a batch of files to their
original locations using a text file as input -- which I've done with it,
btw). It won't make you coffee in the morning.

~~~
drKarl
Added. I only included the cons, and of those only the lack of encryption.
Most (or all) of the solutions in the list are CLI...

------
derekp7
Hello, I'm the author of Snebu (the first item mentioned on the list) -- I've
been trying to figure out how to get more exposure for it before I let it fly
mainstream (i.e., submitting packages to the various distros, etc.).

On the complaint that it doesn't do encryption -- That is one item that I'd
like to seek some advice on. My plan is that if you want encryption, then use
a LUKS-encrypted filesystem on the target (communications to the target are
already encrypted with ssh). The main reason is that I'm not a cryptographer,
and even if I use existing libraries there is still a strong chance that I'd
miss something and end up using them wrong. Just for example -- to do it
right, you would add unique salt to each object you are encrypting (from the
client side). That would then render any type of deduplication useless on the
backend side, since multiple files with the same contents would have different
encrypted contents.
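
To make that concrete: even without an explicit salt, any encryption that is
safe to use here randomizes its output per object (gpg, for instance, picks a
fresh random session key on every run), so identical plaintexts stop looking
identical to a content-addressed store. A tiny illustration (the recipient
address is made up):

    
    
      echo "same contents" > a ; cp a b
      gpg --batch --recipient backup@example.com --encrypt -o a.gpg a
      gpg --batch --recipient backup@example.com --encrypt -o b.gpg b
      sha1sum a.gpg b.gpg    # two different checksums for identical plaintext
    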

That being said, I am adding code that lets you have replicas to other storage
devices (tape, cloud, other disk based storage). So you would do your primary
backup to a local disk device (that possibly has an encrypted volume), then
the secondary stage would be packing files together into an (encrypted)
object, for sending to a remote location.

I've got a small list going, after I redo the web site I'll put up a planned
feature list along with comparisons with other backup utilities. If anyone has
ideas to contribute, either drop me an email or open an issue on the Github
page.

Thanks.

Edit: On the encryption side, am I correct in thinking that multiple files
with the same contents should encrypt to different streams (via a random
salt)? Also, should the file names themselves be encrypted? Finally, metadata,
such as mod date, file size, and checksum -- should that all be encrypted too?
Thanks.

~~~
wampus
Encryption can be implemented flexibly on systems you control. When it comes
to encrypting backups, the toughest issue is doing it safely on target systems
you don't control (duplicity attacks this problem well). I'd recommend
starting there. Encrypted replication is a very interesting idea, and actually
helps with the problem of maintaining local & remote backups. I usually
stagger them, but being able to replicate to arbitrary targets would be very
convenient (I don't like to rsync backups after the fact, because you risk
corrupting the target if there is a local failure).

~~~
derekp7
One of the items I struggle with is that I want to keep the current client-
side simplicity of Snebu -- on the client, the only thing that is required is
bash, find, and tar (although GNU find is required, and older versions of find
don't support enough options). However, I think I have a potential solution,
which I'll work on this weekend.

The solution is: on the client side, include an optional encrypting tar
filter, which takes a tar file as input, and encrypts the file contents of
each file within the archive, delivering a tar file output with regular
headers, but compressed encrypted files. I'll probably have to take some
liberties with the tar file format, but as long as the snebu backend is
similarly modified (to recognize a pre-compressed and encrypted file segment),
then I should be able to get it to work without any major compromises.

The only issues with my current plan are: 1) Multiple files with the same
contents will still be de-duplicated, which may leak some information (if an
attacker already knows file A's contents, and B is marked as being a copy of
A, then the attacker will know what file B is). 2) The metadata (file name,
size, owner, possibly SHA1) will still be visible. Although I may just use the
SHA1 of the encrypted version as the file reference. 3) Sparse files will
still show where the "holes" in the files are, unless you tell it not to
preserve file sparseness.
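
Not the filter itself, but the effect it aims for can be approximated with
stock tools -- a rough, non-streaming sketch (recipient address and file names
are made up; the real filter would rewrite the tar stream on the fly):

    
    
      workdir=$(mktemp -d)
      tar -C "$workdir" -xf backup.tar
      # encrypt each regular file's contents; names, sizes and owners stay visible
      find "$workdir" -type f | while read -r f; do
        gpg --batch --yes --recipient backup@example.com --encrypt -o "$f.gpg" "$f" && mv "$f.gpg" "$f"
      done
      tar -C "$workdir" -cf backup-encrypted.tar .
      rm -rf "$workdir"
    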

------
m_mueller
I honestly can't understand why distro makers like Canonical don't have some
out-of-the-box solution that works like Time Machine. Every step one has to
configure oneself can lead to a mistake and thus to potential data loss. IMO
an OS without a very-simple-to-set-up backup system that supports at the very
least remote storage, incremental backup and restore during reinstallation, is
incomplete.

~~~
pizza234
Well, there is something; assuming that you refer to an Ubuntu Desktop backup
engine with a simple interface, there is Deja Dup.

The problem, though, is that the last time I checked, it was excessively
simple - it didn't support individual file selection.

~~~
m_mueller
It shouldn't be desktop-only. It could be a well tested script with a good CLI
that also has a simple GUI. Something like a backup _shouldn't_ have many
options anyway. Apple has the right ideas there IMO: target volume selection
and defining backup exceptions are pretty much all you need. I'd prefer if the
target were simpler than a sparsebundle though, which should totally be doable
with an rdiff or rsync backed script solution.

~~~
Ded7xSEoPKYNsDd
Deja Dup is a GUI for duplicity.

~~~
phunge
I've been using duplicity for a few years now and have been quite satisfied.
It did seem slow when I first started using it. I learned to live with it; you
run backups overnight anyways.

From what I can remember, a lot of the slowness came from GPG, and
specifically files being compressed before encryption. Disabling compression
speeds things up, but trades off disk space (and security).
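
If you want to experiment with that, duplicity can pass options straight
through to gpg -- a sketch (the target URL is made up, and I haven't
benchmarked this exact invocation):

    
    
      duplicity --gpg-options="--compress-algo=none" /home/me file:///mnt/backups/me
    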

I wonder what it'd take to reengineer the thing to take advantage of multicore
-- being CPU-bound on a single core is I think what makes it slow.

------
drKarl
I would also add that Tarsnap is probably the best option if you only need to
back up a small amount of data (a few KB, a few MB or even a few hundred MB):
you get top-notch encryption, and with the picodollar pricing and stellar
dedup you will pay much less than with any other provider, since you pay per
usage. For instance, rsync.net is a good destination (it can be used with some
of the tools in the list, like duplicity or attic), but you need to buy at
least 50GB, so for small backups it's not worth it.

~~~
rsync
We love tarsnap and think it is wonderful. We want to live in a world where
people like Colin are selling things for picodollars.

Our strong suit is the ability to point any SSH tool you like at our storage
(rsync, mostly, but some folks point duplicity or unison).[1]

Also, as we run on ZFS and have daily/weekly snapshots enabled by default, you
can just forget about incrementals or versions or datasets ... just do a dumb
mirror to us every day and we'll maintain, live, browseable, in your account,
what is essentially an offsite "Time Machine".
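
(For what it's worth, that "dumb mirror" can literally be a single cron line --
the host name and paths below are placeholders:)

    
    
      0 3 * * * rsync -az --delete -e ssh /home/ youruser@yourserver.rsync.net:home/
    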

We have an HN readers discount. Email and ask about it.

[1]
[http://www.rsync.net/resources/howto/remote_commands.html](http://www.rsync.net/resources/howto/remote_commands.html)

~~~
drKarl
Yes, and I think your service is great. Just saying that it only starts to
make sense when you need to back up 50GB or more... If you just want to back
up like 10MB, or 200MB or even 5GB... On the other end, the more you need to
back up the more sense it makes, since your prices per GB go down.

------
pwg
You could add rsnapshot
([http://www.rsnapshot.org/](http://www.rsnapshot.org/)) to your list.

While you might disqualify it due to lack of built-in encryption, note that
since one of your comments implies you control the server where the backups
enter cold storage, you can encrypt the disk/array where the backups are
stored independently of the backup tool used to copy the data to the backup
server.
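
For example (a rough sketch -- device, filesystem and mount point are made
up), the backup tool never needs to know the target is encrypted:

    
    
      cryptsetup luksFormat /dev/sdb1
      cryptsetup luksOpen /dev/sdb1 backups
      mkfs.ext4 /dev/mapper/backups
      mount /dev/mapper/backups /srv/backups    # point the backup tool's target here
    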

~~~
drKarl
Thanks, that is definitely an option, but I'd say that built-in encryption is
a big plus.

~~~
ansible
We've been using rsnapshot for years.

The offsite backups are on USB drives that are encrypted.

We just created a shell script to automate that, and run the database dumps
prior to running rsnapshot.
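
Something along these lines (a minimal sketch -- the database, paths and
interval name are assumptions):

    
    
      #!/bin/sh
      set -e
      # dump the database first so the snapshot picks up a consistent copy
      pg_dump -U backup mydb | gzip > /var/backups/db/mydb.sql.gz
      # then take the rotating snapshot (interval as defined in rsnapshot.conf)
      rsnapshot daily
    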

------
linsomniac
I have been using ZFS and rsync for around a decade. I wrote management tools
on top of it so that it is maintenance-free, integrates with Nagios, and has a
web UI including recoveries. [https://github.com/tummy-dot-com/tummy-
backup](https://github.com/tummy-dot-com/tummy-backup)

It uses ZFS to do the heavy lifting of managing deltas and deduplication and
rsync to do the snapshots. Combined with backup-client
([https://github.com/realgo/backup-client](https://github.com/realgo/backup-
client)) it can run as non-privileged users and trigger database dumps or
snapshots, LVM snapshots of virtual machine instances, etc...

We have had it running internally on clusters of 10 backup servers, and
several external backup servers over ~a decade, and it has worked very well.

~~~
linsomniac
As an aside, this started a decade ago as a personal backup script using rsync
and hardlinks for a few personal machines. But hardlinks really start falling
apart once you get a lot of machines, say more than 10, or large systems with
lots of files.

At one point we switched to BackupPC, but ended up switching back to this
after around a year. BackupPC implemented its own rsync code, which (maybe
this is fixed now) didn't support incremental file-data transfers in the rsync
version 3 protocol, so large systems could take hours to build the file index
and then hours to re-walk the file-system to send the data. Larger systems
were taking longer than 24 hours and tons of IOPS to backup.

It also wasn't very efficient: when we switched back to ZFS, we consolidated 4
BackupPC servers down to one, with ZFS holding the same data. The biggest
issue there was log and database files; big files that had small changes
resulted in the whole file getting stored multiple times. Zope ZODB files in
particular killed us: they are append-only and we had users with 2+GB files
that had small changes every day.

~~~
linsomniac
Another aside, it looks like BackupPC 4, which is in alpha state, includes
rsync v3 code and a new back-end format, so it probably will be much improved
in all regards.

------
apeacox
Back in 2005 I used Bacula to set up a distributed backup system for a big
company. I haven't worked there in years, but AFAIK it still runs well and
manages around ~50 machines (both servers and desktops running different
operating systems).

Yes, it's hard to approach (we spent ~2 weeks learning and testing it before
deploying it in production), but it had the features we needed:

* scheduling

* retention policies (store for 1 year then rotate, multiple tapes, etc...)

* backup on DST tapes

Perhaps, if you just need to back up a single server, Bacula isn't the right
solution or, at least, is overkill :-)

~~~
drKarl
Yep, Bacula looks like an option if you have to centralize the backup of a lot
of servers, but it looks like it's more oriented to tape backup, isn't it?

~~~
apeacox
As far as I remember, you can use file storage and set the size of the file
itself as well. So, for example, you might have 4GB file images to burn to
DVD. Of course, it really shines with tape backups :-) One nice feature of
multi-volume backups in Bacula is the fact that it remembers (through its
Catalog) where to find a backup, given the file name and/or a date. So it
might ask you: "Insert volume X-Y" to restore the data you need.

------
_cudgel
I have been using Amanda Network Backup for ~5 years now, and am extremely
happy with the results. Some of the highlights of my decision to use this
tool include:

- Uses native OS tools (tar, gzip, gpg in my case) for the actual backups and
restores, and includes the actual command used to create the archive in the
header of the archive! You can use this to restore it without Amanda in the
event of an emergency (see the sketch after this list).

- Supports on-disk backups in a holding area for quick restores

- Supports S3 as a virtual tape library

- Supports vaulting (i.e. moving an archive from one tape library to another)

- Your choice of client, server, or no compression & encryption

- Highly scriptable

- Works over SSH, among other methods

- Catalog data is easily backed up itself via simple OS commands, and stored
in S3.
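
That last-resort restore is worth spelling out: an Amanda dump file starts
with a 32 KiB text header, and the header itself records the exact command
needed to unpack the rest of the stream. A rough sketch (the file name and
compression are assumptions here -- read your own header first):

    
    
      # the header is plain text and names the exact restore command to use
      head -c 32768 /holding/20150301/myhost._home.0 | strings | head -n 20
      # skip the 32 KiB header and feed the raw stream to the native tools
      dd if=/holding/20150301/myhost._home.0 bs=32k skip=1 | gzip -dc | tar -xpf -
    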

The systems I backup are all in AWS, so this is ideal for me. I've frequently
thought it would be ideal to adapt Amanda's script agents to creating EBS
snapshots, but I simply haven't had the time. It's on my someday-maybe list.
Remember to vault your backups to another region!

(Edits: formatting)

------
theonewolf
My experience with attic (this seems to be a trend with all of the dedup
systems you reviewed) is that it also takes a very long time to restore large
amounts of data; _however_ , you have the option of restoring individual files
via a FUSE file system, which is _immensely_ useful.

An example was that a restore of ~200 GiB of VM snapshots took over a day from
a NAS to the server in question.

Usually, the backups take about an hour to write out, so reading data from
attic does take significantly longer.

This is probably because dedup is non-trivial to restore from (it can involve
lots of random reads/disk seeks).

------
mappu
You have obnam: `Really slow for large backups (from a benchmark between obnam
and attic)` , but then, no section for attic itself.

It's probably also wise to roughly group the backup systems by algorithm-class
(e.g. separate rsnapshot from rdiff-backup from duplicity from
attic/obnam/zbackup) since they result in different bandwidth and storage
properties. Duplicity will need a full re-upload to avoid unbounded growth of
both storage and restore-time, but such a re-upload is prohibitive for DSL
users.

~~~
drKarl
Good suggestion, I'll try to do that grouping.

------
throwawayaway
[http://www.jwz.org/doc/backups.html](http://www.jwz.org/doc/backups.html)

I've been using the above as a really half-assed way of doing backups on a
server, using a NAS instead of a USB HDD. Very glad to benefit from the
experience of others here.

~~~
icebraining
_If you're using Linux, it's something a lot like that. If you're using
Windows, go fuck yourself._

: )

jwz's method is fine for disaster recovery, but it doesn't work for recovery
from other kinds of errors (including human) since it only saves the last copy
of the files.

In our case, the most important backups are the databases (servers can be
rebuilt from config management), and having past copies definitely helps.

With rdiff-backup, we can restore the databases from any day for the past
couple of years, and since it's incremental, it doesn't really take up much
space.
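
For example (paths made up), pulling the dumps directory back as it stood 30
days ago is a one-liner:

    
    
      rdiff-backup -r 30D /srv/backups/db-dumps /tmp/db-dumps-30-days-ago
    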

~~~
wampus
I use rdiff-backup as my main backup solution. It's efficient, and restoring
from the latest backup is reasonably fast and simple. But I've found that the
further you go back in time, the longer reconstructing files from all the
deltas takes - so long that I don't consider it useful as an archiving
solution. At least that seems to be true for large and/or complex collections
of files. Have you actually restored large databases from years ago in less
than a few hours?

~~~
icebraining
We usually only restore revisions a few weeks old. That said, rdiff-backup
works by taking the current mirror file and applying past diffs, so you could
probably keep an older mirror (and current_mirror file from the rdiff-backup-
data directory) to speed up the process.

------
zx2c4
I use OpenVPN to tunnel iSCSI on a remote machine. Then I mount the iSCSI
device using LUKS. Finally I rsync into the LUKS mount point. Encrypted,
incremental, bla bla. Works pretty well. Here's the script:

    
    
        zx2c4@thinkpad ~ $ cat Projects/remote-backup.sh 
        #!/bin/sh
        
        cd "$(readlink -f "$(dirname "$0")")"
        
        if [ $UID -ne 0 ]; then
                echo "You must be root."
                exit 1
        fi
        
        umount() {
                if ! /bin/umount "$1"; then
                        sleep 5
                        if ! /bin/umount "$1"; then
                                sleep 10
                                /bin/umount "$1"
                        fi
                fi
        }
        
        unwind() {
                echo "[-] ERROR: unwinding and quitting."
                sleep 3
                trace sync
                trace umount /mnt/mybackupserver-backup
                trace cryptsetup luksClose mybackupserver-backup || { sleep 5; trace cryptsetup luksClose mybackupserver-backup; }
                trace iscsiadm -m node -U all
                trace kill %1
                exit 1
        }
        
        trace() {
                echo "[+] $@"
                "$@"
        }
        
        RSYNC_OPTS="-i -rlptgoXDHxv --delete-excluded --delete --progress $RSYNC_OPTS"
        
        trap unwind INT TERM
        trace modprobe libiscsi
        trace modprobe scsi_transport_iscsi
        trace modprobe iscsi_tcp
        iscsid -f &
        sleep 1
        trace iscsiadm -m discovery -t st -p mybackupserver.somehost.somewere -P 1 -l
        sleep 5
        trace cryptsetup --key-file /etc/dmcrypt/backup-mybackupserver-key luksOpen /dev/disk/by-uuid/10a126a2-c991-49fc-89bf-8d621a73dd36 mybackupserver-backup || unwind
        trace fsck -a /dev/mapper/mybackupserver-backup || unwind
        trace mount -v /dev/mapper/mybackupserver-backup /mnt/mybackupserver-backup || unwind
        trace rsync $RSYNC_OPTS --exclude=/usr/portage/distfiles --exclude=/home/zx2c4/.cache --exclude=/var/tmp / /mnt/mybackupserver-backup/root || unwind
        trace rsync $RSYNC_OPTS /mnt/storage/Archives/ /mnt/mybackupserver-backup/archives || unwind
        trace sync
        trace umount /mnt/mybackupserver-backup
        trace cryptsetup luksClose mybackupserver-backup
        trace iscsiadm -m node -U all
        trace kill %1

~~~
nodata
Your backup script has no exit code checking and relies on sleeps!

------
vbezhenar
What I would like to use is some kind of set of small programs from which a
backup script might be crafted. Actually, most of the tools are available:
incremental backups with rsync, encryption with openssl, archiving/compression
with tar/gzip/7-zip. But such scripts tend to be quite verbose and error-prone.
A better solution should be available.
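
A hand-rolled pipeline out of exactly those small tools looks something like
this (all names and paths are assumptions, and the error handling is exactly
the part that's missing):

    
    
      tar -cf - /etc /home \
        | gzip -9 \
        | openssl enc -aes-256-cbc -salt -pass file:/root/.backup-pass \
        | ssh backup@backuphost "cat > /backups/$(hostname)-$(date +%F).tar.gz.enc"
    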

~~~
drKarl
Yes, that is another option, but that would be in the category "roll your
own". You can start rolling your own with some shell scripts, or with some
python, etc

Actually I think this is how most solutions started...

------
nodata
Seconding obnam. Speed it up with:

lru-size=1024

upload-queue-size=512

See [http://listmaster.pepperfish.net/pipermail/obnam-support-
obn...](http://listmaster.pepperfish.net/pipermail/obnam-support-
obnam.org/2014-June/003086.html)
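
(These can also go straight on the command line -- a sketch, with the
repository URL and paths assumed:)

    
    
      obnam backup --repository sftp://backup@backuphost/srv/obnam-repo \
          --lru-size=1024 --upload-queue-size=512 /home /etc
    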

~~~
hsivonen
Using /dev/urandom instead of /dev/random helps, too.
[http://listmaster.pepperfish.net/pipermail/obnam-support-
obn...](http://listmaster.pepperfish.net/pipermail/obnam-support-
obnam.org/2015-February/003421.html)

~~~
feld
This makes me highly suspicious of the encryption implementation of obnam.

~~~
joeyh
Obnam uses gpg for encryption.

The only novel thing is its use of symmetric encryption keys which are used to
encrypt the data and are included in the backup repository, encrypted by your
regular private gpg key. This allows giving additional gpg keys access to the
backup after it has been made.
[http://liw.fi/obnam/encryption/](http://liw.fi/obnam/encryption/)

(Which is useful for eg, backing up a server. You can make a dedicated gpg key
for that server, but give your personal gpg key access to the backups to
restore later.)

Anyway, the generation of the symmetric encryption key is what needs an
entropy source. AFAIK this is done once per repository.
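
Conceptually it's the same trick you can do by hand with gpg (a hand-rolled
analogy, not what Obnam actually runs; pinentry/agent details vary by gpg
version):

    
    
      # 1. generate a random symmetric key
      head -c 32 /dev/urandom > repo.key
      # 2. encrypt the backup data with that symmetric key
      gpg --batch --symmetric --passphrase-file repo.key -o data.tar.gpg data.tar
      # 3. wrap the key to one or more gpg identities; adding another
      #    --recipient later grants that key access to the same backups
      gpg --encrypt -r server@example.com -r me@example.com -o repo.key.gpg repo.key
    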

~~~
feld
Yes I see this now. It wasn't as obvious at first glance. However, I'm
surprised this issue with random wasn't caught much, much sooner.

------
barttenbrinke
If you are looking for a way to drive it all, this is working very nicely for
me:
[https://github.com/meskyanichi/backup](https://github.com/meskyanichi/backup)

~~~
drKarl
Would it be comparable to BackupNinja?

~~~
barttenbrinke
Yes, but simpler IMHO.

------
Nusyne
Attic seems good. Its encryption makes me uneasy
([https://github.com/jborg/attic/blob/master/attic/key.py](https://github.com/jborg/attic/blob/master/attic/key.py)),
and I don't feel qualified to review it:

    
    
      * use of pbkdf2(passphrase)[0:32] as encryption key?
      * is AES correctly used? there are many pitfalls
    

I would be much more comfortable with it using gpg for encryption.

~~~
hsivonen
Using gpg is not so awesome. Obnam uses gpg. Since Obnam invokes gpg in batch
mode, if you want to have a passphrase, you have to use gpg-agent, which at
least for me took more effort than I found reasonable to set up on a GUIless
server.

Furthermore, all the crypto config depends on gpg defaults or your gpg.conf.
Whether this is good or bad depends on whether you are OK with gpg's defaults
that are chosen for a non-Obnam use case and whether you like tweaking gpg
config.

While figuring this out, I started wishing that Obnam used libsodium instead
of gpg to avoid configuration and especially gpg-agent. (libsodium didn't
exist when Obnam was created.)

~~~
icebraining
_you have to use gpg-agent, which at least for me took more effort than I
found reasonable to set up on a GUIless server._

Did you try Keychain¹? I've used it in the past to auto-sign deb packages, and
it was simple to set up.

¹ [http://www.funtoo.org/Keychain](http://www.funtoo.org/Keychain)

~~~
hsivonen
I didn't.

Having to be aware of tools like this is the problem when you face the
requirement of having to set up gpg-agent and you don't already know how to do
so in an environment where a desktop environment from your distro hasn't done
it for you.

------
xroche
I tried to find the best solution (simple, secure, incremental, reliable...)
and could not find the perfect candidate either.

I finally ended up sending gpg-encrypted tar files to a remote backup machine
(you may want to cross-backup machines), without using any intermediate file.

(sorry if formatting breaks)

    
    
      #!/bin/bash
      
      # list of local directories to be backed up
      DIRS="/home /media"
      
      # destination directory target
      DEST="/data/backups"
      
      # gpg encryption email
      GPGEMAIL="homer@example.com"
      
      # remote ssh as user@machine
      REMOTESSH="homer@backup.example.com."
      
      # remote ssh additional args
      REMOTESSHARGS="-i /root/.ssh/id_backup"
      
      for i in ${DIRS} ; do
        if test -d "$i" ; then
          f=${DEST}/$(echo $i | tr '/' '_' | sed -e 's/^_//').tgz.gpg
          tmp=${DEST}/_tmp
          echo "backing up $i to remote $f encrypted with gpg" >&2
          /bin/tar cf - ${i} \
            | /usr/bin/gpg --quiet --batch --encrypt --compress-algo zlib --recipient ${GPGEMAIL} -o - \
            | /usr/bin/ssh ${REMOTESSHARGS} -o BatchMode=yes -o Compression=no ${REMOTESSH} \
              "cat > $tmp && mv $tmp ${f}"
        else
          echo "error: $i does not exist" >&2
        fi
      done
    

------
tomaac
If you mention Bacula then you also need to mention Amanda
[http://www.amanda.org/](http://www.amanda.org/)

~~~
mrmondo
Oh, the countless hours I've wasted on Amanda - it's just too old now.

~~~
ewindisch
While for many, backup is better on disks these days, tape is far better for
_archival_ and Amanda does this quite well for Linux.

------
acd
This misses IBM Tivoli TSM and Legato Networker, backup systems that work in
practice and don't fail on you.

Other than that, my vote is on ZFS snapshots - reliable.

~~~
drKarl
Thanks for the suggestions. I'm more inclined to use Open Source solutions,
though, which Tivoli and Legato are not.

I agree with ZFS but I think it shines as a last resort backup (like rsync.net
offers) and not as a main backup system since it doesn't do incremental and
deduplication and so on...

~~~
acd
ZFS is a also a good way to get consistent fast backups of databases by
snapshotting. By ZFS copy-on-write its incremental by nature as it you only
have to send over deltas incremental streams. Thus for site-to-site backups
its usually very fast! Another advantage compared to rsync is that you do not
need to directory traverse the whole file system in order to find the
differences, I believe its similar to BTRFS in that aspect.

Then there is the feature that ZFS has checksums, so that when you write to
disk you know what you get otherwise you can get corruption. RAID5-60 for me
is a gamble that you can get hidden write disk errors unless there is
checksums in software on a higher layer.

Always scrub your ZFS source and backup pool.
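
A typical incremental send/receive pair looks like this (pool and dataset
names are made up):

    
    
      zfs snapshot tank/data@2015-02-20
      zfs send -i tank/data@2015-02-19 tank/data@2015-02-20 \
        | ssh backuphost zfs receive backup/data
    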

------
aduitsis
Using bacula with a twin cabinet Quantum tape library with two LTO4 tape
drives (all fibre channel), looking to replace with an equivalent cabinet with
LTO6. Bacula really has a difficult learning curve but the features are
equivalent to other enterprise-grade pieces of software like networker, etc.

------
chimeracoder
I'm surprised nobody has mentioned Back In Time[0], especially since a lot of
people have mentioned rsync.

Back In Time is basically just a GUI wrapper around rsync that manages
snapshots for you. I used to have an rsync script that did the same thing
manually, but it's great to have a tool that manages the snapshots for you so
you don't have to remember to update the symlinks individually.

Unlike rsync[1], Back In Time does incremental backups, so you save on storage
space if your files don't change often.

[0][https://wiki.archlinux.org/index.php/Back_In_Time](https://wiki.archlinux.org/index.php/Back_In_Time)

[1] by default, that is - you can use rsync to achieve incremental backups,
which is what BIT does

------
mrmondo
Has anyone tried BareOS? - [http://www.bareos.org](http://www.bareos.org)

It has a modern looking web-ui too: [http://www.bareos.org/en/bareos-
webui.html](http://www.bareos.org/en/bareos-webui.html)

------
amelius
My personal recipe: rsync -avPH --delete-before

And then make hard-links of all files to another folder, named BACKUP_yymmdd

This way, the backup is incremental, and you have snapshots of older versions
of the backup, where there is structural sharing between snapshots.
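
Roughly (with made-up paths):

    
    
      rsync -avPH --delete-before /data/ /backup/current/
      cp -al /backup/current /backup/BACKUP_$(date +%y%m%d)
    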

~~~
drKarl
But that is not encrypted, right?

~~~
amelius
If you run it over ssh it is. Or do you mean stored in an encrypted way?

~~~
drKarl
Both: securely transmitted (ssh) and stored in an encrypted way - but it's
better if the encryption happens on the client, not on the server.

------
mmarx
We're using bacula for backing up ~20 servers and VMs onto an LTO5 tape
library. While it might be difficult to set up initially, that setup has now
been running for over ten years (modulo changes accommodating new hardware).

It does provide lots of flexibility, though: you can restrict who gets to
restore which files onto which servers, for example, or back up to different
targets.

Regarding clock sync, that hasn't been a problem for ages: “Note, on versions
1.33 or greater Bacula automatically makes the necessary adjustments to the
time between the server and the client so that the times Bacula uses are
synchronized.”

~~~
kklimonda
How are you backing up VMs? Using some kind of hypervisor API, or do you just
run the bacula file daemon in the VMs themselves?

~~~
mmarx
Currently, we just run bacula inside the VMs, which has the benefit that VM
owners can simply use bconsole inside the VM to restore files from a previous
date.

------
kokey
I'm quite happy with duplicity for backing up to S3. It does incremental
backups, which fits in with the S3 charging model, so you only pay for
restores. I did find that it's important to set it to do a full backup from
time to time, e.g. once per month; otherwise a restore will take very long
and also cost more. I discovered this the hard way, by doing my second test
restore 7 months after the last full backup. Fortunately it was a test
restore. I use the duply wrapper script to make setting it up even simpler.
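
For reference, duplicity can also be told to roll a new full chain
automatically -- a sketch (the bucket name is made up; older duplicity
versions use the s3+http:// scheme):

    
    
      duplicity --full-if-older-than 1M /srv/data s3+http://my-backup-bucket/srv-data
    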

------
cmurf
Yet another option is the stateless desktop, similar to mobile devices (and
actually to some degree Windows 8+). The ability to do a reset/refresh without
affecting user data or apps; and full system reset where all updates and user
data are removed, e.g. returning to an OS state defined by a read-only image
(which itself can be atomically replaced for major upgrades).

What makes things a pain on Linux is applications have parts strewn all over
the filesystem. My stuff has a /home, why isn't there an /OShome, and an
/Appshome?

~~~
SSLy
> What makes things a pain on Linux is applications have parts strewn all over
> the filesystem.

All of them reside under `/usr`. At least with my distro.

> My stuff has a /home, why isn't there an /OShome, and an /Appshome?

With monolithic repositories,* how would you code it?

* I.e. system and apps come from the same source, unlike e.g. OS X where the distinction is clear.

------
digital-rubber
Interesting overview; some of these I have to look into. But nobody mentions
the option of having the application that handles the data you want to back
up perform the backups itself.

IMHO it's ideal to have the application handling the data also create the
backups, and then transport them to remote locations via any means viable, or
again via any of the backup systems mentioned in the article. With the
application backing things up rather than separate backup software, I tend to
think there is less tuning required, less chance of external locks, etc.

~~~
drKarl
That is assuming there is a single source of data in need of backup and that
the data is generated/handled by an application. Data could be from many
different sources and formats: databases, images, video, audio, pdf, text...

~~~
digital-rubber
That is true. I did assume :)

But the same applies to that. These 'other data formats' from other sources,
too, could also be replicated by the software handling them - assuming again!
:o)

 _edit_

Speaking from a perspective where I have seen many parties go wrong with
their '3rd party backup application', I tend to replicate most things from
the applications themselves to other (safe) locations, rather than rely on a
3rd party app which got set up in 2005 but has received little attention
since. Hence I prefer to look at the application
generating/receiving/handling the data.

------
Istof
I back up some important files daily with rsync to local and remote server(s)
in a TGZ archive, and keep the last 14 days, last 10 weeks (1/week), last
11 months (1/month), and 1 yearly forever. Very much based off of this:
[https://nicaw.wordpress.com/2013/04/18/bash-backup-
rotation-...](https://nicaw.wordpress.com/2013/04/18/bash-backup-rotation-
script/) ... appears to work pretty well for a small amount of data.

------
tokenizerrr
When I tried duplicity it worked great for all of my servers but one. I am not
sure what caused the issues on the one server, possibly clock skewing, but
backups kept consistently getting corrupted. I'm now a happy bup user, and as
far as I'm aware they're working on pruning old backups.

Attic seems promising. Does anyone have any experience of that versus bup?

~~~
tenfingers
Attic has still problems for large backups (when the block index gets >2gb).
There are also several minor issues due to the tool being relatively new (file
selection is lacking). Check it's issues page:
[https://github.com/jborg/attic/issues/](https://github.com/jborg/attic/issues/)

~~~
witten
To work around some of attic's more UI-level shortcomings, I made a wrapper
script called atticmatic that adds a declarative config file, excludes file,
etc:

[http://torsion.org/atticmatic/](http://torsion.org/atticmatic/)

------
victorhooi
One thing to mention is that Duplicity supports S3 as an endpoint.

So does Arq (OSX backup software), as well as Glacier.

This makes large-size backups very cost-effective.

I mean, some of the newer ones (e.g. Attic, Bup) certainly do look good, but
you need a full-fledged Linux server at the other end, as opposed to being
able to just shove it into S3/Glacier - this to me is a drawback.

~~~
ceejayoz
I had issues with Duplicity and Glacier files in my S3 buckets - Duplicity
thought they were regular S3 files, tried to fetch, and failed. This was a
while back, though, so I'd imagine it's fixed.

------
akampjes
Spideroak may be an option; it has a headless mode.

[https://spideroak.com/faq/questions/67/how_can_i_use_spidero...](https://spideroak.com/faq/questions/67/how_can_i_use_spideroak_from_the_commandline/)

------
theonewolf
Problem with Snebu: it only does file-level deduplication _I think_ which
means storing something like virtual machine snapshots or images will not
deduplicate well.

You need block-level deduplication to work well with things like VM images.

~~~
derekp7
That is correct -- at this time it is only doing file level dedup, due to
overhead and complexity of block level (or better yet, variable block level)
deduplication. The file level dedup is done by computing the sha1 checksum of
the file, and using that as the name to store the file contents under.
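
The idea in miniature (a sketch of the concept, not Snebu's actual on-disk
layout):

    
    
      sha1=$(sha1sum "$file" | awk '{print $1}')
      dest="/var/backups/pool/$sha1"
      # identical contents hash to the same name, so the second copy costs nothing
      [ -e "$dest" ] || cp "$file" "$dest"
    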

I've got a solution in the works specifically for KVM images, that hopefully
I'll be able to finish up the next time I get a week off of work. The way I'm
planning on handling that (at least for LibVirt based VMs) is to use
libguestfs to create an equivalent to the `find` command (with all the -printf
options that it supports), and to generate tar file output of selected files
from the VM. (Snebu is designed around only requiring find/tar on the client
side -- so anything that can produce this output will work). Although I don't
like that libguestfs actually fires up a VM in order to work with the image
files -- I may work on an approach to read the qcow2 file images directly.
Again, looking forward to a vacation this spring so I can pound out that
module.

------
theonewolf
As some commenters mentioned, a meta-backup system that can manage multiple
different ingest methods would be Deltaic:

[https://github.com/cmusatyalab/deltaic](https://github.com/cmusatyalab/deltaic)

------
joch
I have been using backup2l together with rsync for ages and it has been
working great. It doesn't natively support encryption though.

[http://backup2l.sourceforge.net](http://backup2l.sourceforge.net)

~~~
drKarl
Added, with the drawback of lacking encryption

~~~
wink
The way we always did it was to add a custom driver and insert a few lines of
gpg calls.

So while yes, there's no "native support", it's definitely not hard to add.

While this is just a random Google result[1], as I don't have access to the
aforementioned snippet, you'll get the idea I hope.

[1]: [http://www.iniy.org/?p=151](http://www.iniy.org/?p=151)

------
mrmondo
Backup Ninja -
[https://labs.riseup.net/code/projects/backupninja](https://labs.riseup.net/code/projects/backupninja)

And it has a neat curses based GUI called NinjaHelper

~~~
drKarl
It seems it is more a way of integrating different backups, which can be
useful of course. For instance, it seems it can use Duplicity. Is that so?

~~~
mrmondo
Yes indeed, I've found it quite useful as a reliable and easy to configure
'metabackup' system if you will.

That's correct, Backupninja can use Duplicity as a backend:
[https://labs.riseup.net/code/projects/backupninja/wiki/Dup](https://labs.riseup.net/code/projects/backupninja/wiki/Dup)

------
pwenzel
Still one of my all time favorite approaches is the DIY-style snapshot-based
system with rsync and hard links, which keeps snapshot sizes small:

This example keeps four days' worth of filesystem backups:

    
    
      rm -rf backup.3
      mv backup.2 backup.3
      mv backup.1 backup.2
      cp -al backup.0 backup.1
      rsync -a --delete source_directory/  backup.0/
    

More details:
[http://www.mikerubel.org/computers/rsync_snapshots/](http://www.mikerubel.org/computers/rsync_snapshots/)

Trinkup is a script that automates this approach, somewhat like rsnapshot:

[https://gist.github.com/ei-grad/7610406](https://gist.github.com/ei-
grad/7610406)

------
pwelch
I'm a big fan of the Ruby Gem Backup:
[https://meskyanichi.github.io/backup/v4/](https://meskyanichi.github.io/backup/v4/)

------
andor
How about adding btrfs send/receive to your list?

~~~
drKarl
Added, do you have some more info on that one, drawbacks, performance?

~~~
jpgvm
De-dupe, will only transfer blocks changed since last snapshot.

Performance, it's super damn fast. Because the ZFS snapshot itself is just a
stream you can also compress it on the wire.

Flexibility. You don't need to send the snapshot to another ZFS server. You
could just take the ZFS snapshot streams and store them on say S3 compressed
and re-assemble them. This would take custom tooling but it's definitely
possible.

You can restore super quickly to an older version if you haven't deleted the
local snapshots.

Similar to above but if you don't want to restore (as in make said snapshot
the current active dataset) you can create a writable clone of one of the
snapshots to play with it before committing to a restore or just to "go back
in time" and play with something.

As a bonus you also now have your data on ZFS which has tons of great benefits
apart from the point in time snapshots.
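
For instance (a sketch with made-up names), the stream can simply be captured
as a compressed file and re-assembled later with `zfs receive`:

    
    
      zfs send tank/data@2015-02-20 | gzip > /staging/tank-data-2015-02-20.zfs.gz
      # and later, to bring it back:
      gunzip -c /staging/tank-data-2015-02-20.zfs.gz | zfs receive tank/restored
    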

------
drKarl
I updated the gist with some comments and additional solutions, and added
Markdown formatting for readability on mrcrilly's suggestion.

------
andrewchambers
[https://github.com/restic/restic](https://github.com/restic/restic)

~~~
drKarl
It's still in alpha, but looks interesting, I'll keep an eye on it!

------
INTPenis
Attic would seem the most promising if it weren't for the Python 3
requirement and me still being on RHEL6.

~~~
drKarl
What about installing Python3 manually?
[http://stackoverflow.com/questions/8087184/installing-
python...](http://stackoverflow.com/questions/8087184/installing-python3-on-
rhel)

~~~
INTPenis
Problem with installing anything manually is patch management. And we take
patch management very seriously.

------
geoka9
I like using rsnapshot. I think it's a great rsync wrapper.

------
npaquin
I'd suggest checking out Flexbackup
([http://flexbackup.sourceforge.net/](http://flexbackup.sourceforge.net/)).
It's super simple (always a plus), I've been using it for many years without
fail and you can integrate encryption
([http://rolandtapken.de/blog/2011-01/encrypted-files-
flexback...](http://rolandtapken.de/blog/2011-01/encrypted-files-flexbackup)).

------
wantab
I believe the answer is FreeBSD.

