
Backups suck (2013) - JensRantil
http://3ofcoins.net/2013/11/14/backups-suck-a-rant/
======
danieldk
I agree with many points of the article. I am slowly moving back from cloud-
based services to simple local (though synced) text-based files. E.g. I
switched from Google Mail to plain IMAP, which I fetch with isync, read with
mutt(-kz) and index with notmuch.

I generally do not do presentations and documents in Google Drive or Microsoft
Office anymore. When possible, I stick to Markdown and use pandoc to convert
to PDF. Since it's plain files, it is easy to backup. And git provides
excellent versioning. Most people can also read and write Markdown without too
much trouble.
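That workflow is just two commands; a sketch, with made-up filenames:

```shell
# Convert the Markdown source to PDF (pandoc needs a LaTeX install for
# PDF output):
pandoc slides.md -o slides.pdf

# The source stays plain text, so git versions it cleanly:
git add slides.md
git commit -m "Update slides"
```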

I also decided to move my task planning back to local, using Task Warrior and
Dropbox for sync.

For backups of my MacBook and Mac Mini, I am now mostly using Arq [1], which
can back up to nearly anywhere (from SSH to Amazon S3) and has a sensible
interface. The data format is documented, and they have an open source
command-line restore utility in case Arq disappears. It can also restrict
backups to known Wifi networks, etc.

If I wasn't on Mac, I would probably look at Attic.

[1]
[http://www.haystacksoftware.com/arq/](http://www.haystacksoftware.com/arq/)

~~~
ekianjo
> I also decided to move my task planning back to local, using Task Warrior
> and Dropbox for sync.

Smart, I hadn't thought about using TaskWarrior with Dropbox. Pretty good idea!

------
XERQ
I've seen many of my clients set up their own backup systems and have those
fail at the worst times. Last month a large client of ours called our managed
support team at 3AM saying they hired the wrong developer who completely
trashed their database and hosed their entire application. They had their own
backup system in place and it silently failed, but luckily they ordered our
internal backup solution as a secondary. We were able to get them restored in
5 minutes; if they hadn't had our solution in place, they would've had to
spend weeks fixing what the developer broke.

Current Linux backup solutions are not made for humans. Have a look at the
mondorescue guide[1]; nobody is going to read that and comprehend it with full
mastery, meaning you're leaving yourself open to losing data. VPS providers
offer backups that are usually in the same datacenter, which means you're SOL
if there's a disaster. Those same providers also don't allow you to restore
single files/directories from snapshots; usually you have to launch a new
instance or revert everything back to a snapshot.

[plug] We ended up creating a simple Linux backup solution[2] that's as simple
as copying and pasting a single command to install, notifies you if your
backups aren't running, handles snapshots, and is secure. Restoring your data
is a single command away, so you can focus instead on building your startup
rocketship. Our mission is to make data loss a thing of the past. [/plug]

[1] [http://www.mondorescue.org/docs/mondorescue-
howto.html](http://www.mondorescue.org/docs/mondorescue-howto.html)

[2] [https://jarvys.io](https://jarvys.io)

~~~
donmcronald
How do you handle a scenario like this...

I have a Redmine install that I want to backup. It uses both a database and
the file system. If I attach a file to an issue, the attachment is stored on
the FS with a reference in the DB.

I don't know how Redmine deals with keeping the FS vs the DB consistent, but
assume it uses some type of transaction across both when an attachment is
added.

How do you back that up without possibly getting the FS and the DB out of
sync? Ex: What if you snapshot the FS right after an attachment is written,
but before the reference is added to the DB? The transaction in the app will
succeed, but your backup is inconsistent compared to what the app expects.

How can you get a truly consistent backup without either a) stopping the app
or b) integrating with the app to make sure you're not breaking assumptions
needed for consistency?

Basically, almost every backup solution I've ever seen is crash consistent at
best. How is yours different?

~~~
XERQ
Our core value is simplicity of use: instead of having to read a novel of a
manual written by a crusty UNIX sysadmin and then combine it with storage
yourself, you can create an account and copy-and-paste a single command
that does all the work for you. We've even tested it with people who've never
used Linux before and they were able to install it and restore a file without
much direction.

With that said, we do provide the ability to hook your own scripts and
commands into the backup process itself; for instance, to back up MySQL you'd
just put the mysqldump command (with the relevant DB and user/pass info) into
the hook script, uncleverly titled 'run-before-backup.sh', in the config
directory.
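A minimal sketch of what such a hook might contain (database name, user,
password, and dump path are hypothetical examples, not JARVYS defaults):

```shell
#!/bin/sh
# run-before-backup.sh: dump the database to a plain file before the file
# backup runs, so the dump gets captured along with everything else.
# --single-transaction gives a consistent dump of InnoDB tables without
# locking the whole server.
mysqldump --single-transaction -u backupuser -p'secret' mydb \
    > /var/backups/mydb.sql
```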

I personally don't have any experience with Redmine. Application-specific
backups are outside the scope of our initial launch, but as we get more user
feedback we'll be able to build plugins that integrate with specific
applications. In the meantime, I'd both: 1) pester the developers to provide
a simple backup solution for Redmine, and 2) look into either putting it on a
VM that you can snapshot, or use something like ZFS snapshots and send/receive
it to a remote location.
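On the consistency question, one hedged sketch of the ZFS route: if Redmine's
attachment directory and the MySQL data directory live on the same dataset, a
single snapshot captures both atomically. The read lock only holds while the
client session is open, hence everything happens in one heredoc (pool and
dataset names are made up):

```shell
# Quiesce MySQL, snapshot the dataset, release the lock. The mysql client's
# "system" command runs a shell command while the session (and therefore
# the lock) is still open.
mysql -u root <<'EOF'
FLUSH TABLES WITH READ LOCK;
system zfs snapshot tank/redmine@nightly;
UNLOCK TABLES;
EOF
```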

~~~
moe
Um. From your docs:

 _C. RESTORE FROM ANOTHER SERVER'S BACKUP

This feature is currently in development._

So... My server goes up in a puff of smoke and I can't access the backup?

~~~
XERQ
This is a beta product, and we put that up on the docs so people can
understand in its current state what it's capable of and what it isn't. One
thing we didn't put in the docs is that we can change the target of a new
system to an old server's backup. So in the case of your server going up in
smoke, you would simply open a uservoice ticket (or email me using my profile
info), fire up a new server installed with JARVYS (disabling the cronjob until
you're restored), and we'd change the target UUID to that of the new server,
allowing you to restore from your old backups.

Soon you will be able to simply select any server on your account and restore
from it to any other server on your account, and the docs will be updated with
how to do it.

------
mattbee
Good rant :)

As the co-owner of a hosting provider with its own data centre I might be
biased, but I totally reject the "You can’t trust your datacenter anymore when
you’re in the cloud" argument. You _need_ your backup server to do some
lifting for you to make half of that wishlist work. If you can't trust your
provider not to read your data while it's unencrypted some of the time, you're
using the wrong provider, and making backups much harder than they need to be.

Also a lot of that wishlist seems to ignore network realities - you can do a
lot more with a backup server in the next rack, or one that's the end of a
private link, than you can with a very remote storage provider, so insisting
on the same tools no matter where your backups are located seems a bit
hopeful.

I've worked on a backup system for our customers called "byteback", which I'm
slowly finishing off and documenting; it will only be 2-3000 lines of Ruby
when finished.

It currently leans on rsync, ssh and btrfs's copy-on-write snapshots to keep
efficient copies of entire servers. There's a server-side pruning algorithm
that allows the disc to fill up with daily snapshots for multiple servers,
then prunes the least "useful" ones.

It's trying to be zero-configuration, so it builds its backup set from the
list of "local" filesystems, so that you can copy the whole snapshot back
quickly to restore a system.

The only other feature I'm going to need is to automatically drive "snapshot"
functionality when it finds it on the server - e.g. for LVs, btrfs subvolumes
and other points on the filesystem where it can make a safe snapshot, it
should do so automatically.
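The keep-the-most-useful-snapshots idea can be sketched in a few lines of
shell. This toy version only keeps the newest N directories named by date;
byteback's real "usefulness" scoring is presumably smarter, and the directory
layout here is invented:

```shell
#!/bin/sh
# prune_snapshots KEEP DIR: delete all but the KEEP newest snapshot
# directories under DIR, assuming they are named YYYY-MM-DD so that
# lexical sort order equals chronological order.
prune_snapshots() {
    keep=$1
    snapdir=$2
    ls "$snapdir" | sort -r | tail -n +$((keep + 1)) | while read -r old; do
        rm -rf "$snapdir/$old"
    done
}

# Demo on a throwaway directory:
demo=$(mktemp -d)
for d in 2013-11-11 2013-11-12 2013-11-13 2013-11-14; do
    mkdir "$demo/$d"
done
prune_snapshots 2 "$demo"
ls "$demo"   # only the two newest dates remain
```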

The zero-configuration rationale is just that where we've had backup failures,
it's been manual misconfigurations and misguided attempts to be "efficient"
that caused important files to be missed. So I'm trying to bake in defaults
and features to cover every mistake we've ever made :)

As the name implies it's made for our customers, and our defaults, but I'm
pretty sure it'll work for a lot of other server use cases as well. I'm still
going over a few 10s of live backups fixing problems and adding new defaults
as I find them. If anyone's interested, shout and I'll try to push on with the
documentation and put it up in its current state.

~~~
mpasternacki
Original ranter here, thanks :)

Regarding "trusting the datacenter": it was an overstatement, but while I can
risk my own data, I am double-wary with my clients' data, and triple-wary with
my clients' users' data. And while I don't _assume_ that cloud provider is
_necessarily_ evil and unreliable, I need to survive losing a provider (be it
because provider went bankrupt, or because I tried out GAE and got locked out
in a Google Checkout mishap, or for whatever other reason), and I cannot rule
out the possibility of a leak (think Dropbox: hindsight's 20/20, but I don't
believe I can predict which cloud storage will be the next Dropbox). An
extra bonus of good encryption is that I can spread storage over cheaper, but
less reliable providers.

Your remark about ignoring network realities is a good one, thanks! I work
mostly with cloud or remote servers; it's been at least 10 years since the
last time I was in the same room as a server I manage (and back then it was an
office Samba file/print server). Maybe my perspective shows here, and it may be just as
limiting as older tools just phrasing everything as "tapes" and
"autochangers". I am closely looking at FreeBSD/ZFS right now, and this may
not fit well with `zfs send`-based backups.

I'd be definitely interested in taking a closer look at Byteback! Since it's
based on btrfs' snapshots, it may also work well with zfs (and may be exactly
the plumbing I am about to write soon). If you manage to publish it, please
let me know. Not sure if my email is listed on my HN profile: it's maciej at
pasternacki dot net. Thanks!

~~~
ScottBurson
Have you looked at Tahoe-LAFS? I don't think it's a complete solution, but it
might be part of one.

------
prohor
There is another very interesting tool:
[http://www.boxbackup.org/](http://www.boxbackup.org/) . Unfortunately, it
looks like it is dead now, like many other open-source tools of this kind. So
at some point I wrote a backup script around rsync, which also works with
snapshots and so has strong data deduplication. Well, it is also dead, but at
least it is short enough for anyone to fix it. If anyone is interested, here
it goes: [http://okrasz-techblog.blogspot.com/2011/02/backing-up-
with-...](http://okrasz-techblog.blogspot.com/2011/02/backing-up-with-
rsync.html)

~~~
dspillett
There is a decade-old guide to using rsync for backups and snapshots at
[http://www.mikerubel.org/computers/rsync_snapshots/](http://www.mikerubel.org/computers/rsync_snapshots/)
which I originally based my hand-rolled arrangements on. It hasn't been
updated since 2004 but is still relevant. There are tools that make this more
hand-holdy if you prefer to do less work/thinking yourself, like rsnapshot.
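The core trick from that guide is hard-link rotation: each snapshot hard-links
unchanged files against the previous one, so a full-looking tree costs almost
nothing. A self-contained miniature with cp (rsync --link-dest automates the
same thing, including the unlink-before-write step):

```shell
#!/bin/sh
# Hard-link snapshot rotation in miniature, on throwaway directories.
src=$(mktemp -d)
backups=$(mktemp -d)
echo "v1" > "$src/file.txt"

# First snapshot: a plain copy.
cp -a "$src" "$backups/snap.0"

# Next snapshot: hard-link everything against the previous snapshot, so
# unchanged files share disk blocks instead of being duplicated.
cp -al "$backups/snap.0" "$backups/snap.1"

# For a changed file, unlink first, then copy: writing through the hard
# link would silently corrupt snap.0 as well (rsync avoids this by writing
# to a temp file and renaming it into place).
echo "v2" > "$src/file.txt"
rm "$backups/snap.1/file.txt"
cp -a "$src/file.txt" "$backups/snap.1/file.txt"
```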

For extra safety against hack+delete+ransom attacks, I make sure my backup
servers and main kit have different credentials and can't talk directly to
each other at all - this way if someone hacks into my mains they can't
automatically get at my backups, and vice versa. I have an intermediate backup
location: the active machines push data to it and the backup servers pull from
there; it can't log in to either of the other sets. For automated backup
testing, which I recommend you find time to set up, some data goes the other
way (backups push to the intermediate, other sites pull from there).

~~~
rsync
The "Mike Rubel" guide is a great one, and one that we have pointed customers
at for years - especially for his explanations of "rsync snapshots".

FWIW, we finally wrote our own "rsync HOWTO", which is ironic, given that we
ran rsync.net for almost a decade without one.[1] It is _NOT_ rsync.net
specific, which is why I am mentioning it here. Just our attempt at a simple,
concise rsync HOWTO. It includes crontab explanations and examples, as well as
all of the SSH key generation steps.
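Not their HOWTO verbatim, but the general shape such a setup takes is roughly
this (host names, user names, and paths are invented):

```shell
# Generate a dedicated key pair and install the public half on the
# backup host:
ssh-keygen -t rsa -b 4096 -f ~/.ssh/backup_key -N ""
ssh-copy-id -i ~/.ssh/backup_key.pub user@backuphost

# Then a crontab entry (crontab -e) to push /home nightly at 02:30:
# 30 2 * * * rsync -az -e "ssh -i $HOME/.ssh/backup_key" /home/ user@backuphost:backups/home/
```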

"For extra safety against hack+delete+ransom attacks..."

rsync.net customers get protection from this in two ways. First, all accounts
have ZFS snapshots enabled by default, and ZFS snapshots are absolutely
immutable. Only local root can destroy them, and only with a snapshot-specific
destruction command.

Second, we do server side "pulls" for all customers who request it, so you can
have your backups at rsync.net without any credentials on your end for an
attacker to use.

[1]
[http://rsync.net/resources/howto/rsync.html](http://rsync.net/resources/howto/rsync.html)

~~~
dspillett
Nice. I'll have to remember that next time I'm rearranging things and
reconsidering options.

------
freework
The problem is that backing up data is fundamentally a management task. It
requires skills to actually do right. People view backing up their computer in
the same way they view changing the oil in their car. Just like there is no
button you press inside the car that says "change oil", there is never going
to be a backup solution that is easy to use for the regular user. At least not
one that does a complete job of backing everything up.

Here is how I handle 'backups': I never do it. If the data is important, I
make sure I store it on a device that is already redundant. If I take a photo
that I really want to keep, I'll send it to Dropbox or Google Drive or Gmail
or some place like that. Neither my macbook, nor my iphone, nor my linux
laptop is the canonical living place of anything important. If any of those
devices were to disappear, I would lose stuff, but nothing important. I've
sort of subconsciously migrated towards this system over the past decade or so
of hard drive failures and lost phones.

~~~
acveilleux
Redundancy != Backups. As a recovering sysadmin, I want to state that fact
again and again until it sinks in.

The difference is that backups should provide point-in-time snapshots.

Now some of the things you suggest provide some of that (Dropbox has some
versioning) and manually copying things to multiple locations effectively
makes a snapshot of that thing. Backup systems should do this automatically,
on a schedule and comprehensively.

It says a lot that the best I've used was Time Machine, and it's not especially
powerful. It does however "just work." I've wasted hundreds of hours of my
life managing various backup software and they all pretty much sucked. They
did however save my bacon a number of times.

------
Rapzid
Backing up a few servers isn't that hard. I managed the backups (and pretty
much rewrote the entire system) at a VPS provider. We essentially went in and
backed up every LV found that didn't match certain patterns. You definitely
want some sort of snapshot functionality to help you out, and LVM provides
that. It also makes it a piece of cake to add another server to the backups:
as long as it has LVs, they will get backed up. There was of course a lot of
logic to handle edge cases: DRBD, NTFS, alerting, rsync includes/excludes,
etc.
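The LV-snapshot-then-copy cycle described here boils down to something like
the following (VG/LV names, sizes, and destinations are illustrative):

```shell
# Take a copy-on-write snapshot of the logical volume, so the backup reads
# a frozen image while the guest keeps writing:
lvcreate --snapshot --size 5G --name data-snap /dev/vg0/data

# Mount it read-only and copy it off-box:
mount -o ro /dev/vg0/data-snap /mnt/snap
rsync -a /mnt/snap/ backuphost:/backups/data/

# Tear the snapshot down; COW snapshots cost write performance while alive.
umount /mnt/snap
lvremove -f /dev/vg0/data-snap
```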

The trouble comes when you need to back up an entire business, including all
the VPSs, every night. The sheer volume of files and lstat() calls was killing
us. I remember when we crossed the number of files we had enough RAM for XFS
to cache the metadata for, and performance started to plummet. In the end we
ended up moving to ZFS backup boxes, tons of RAM, rsync --inplace, and
snapshot send/receive to replicate between data centres.
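The snapshot send/receive replication mentioned here looks roughly like this
(pool, dataset, and host names are made up); after the first full send, each
night only the delta between two snapshots crosses the wire:

```shell
# Initial full replication of a snapshot to the second datacentre:
zfs snapshot tank/backups@2013-11-13
zfs send tank/backups@2013-11-13 | ssh dc2 zfs receive tank/backups

# Nightly incremental: send only what changed between the two snapshots.
zfs snapshot tank/backups@2013-11-14
zfs send -i tank/backups@2013-11-13 tank/backups@2013-11-14 \
    | ssh dc2 zfs receive tank/backups
```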

------
iamtew
Over-a-year-old article, but I'm still curious: what options are there for
backups these days?

~~~
xorcist
That depends on what your requirements are. Do you need application-specific
backups (which?), windows support, mac support, hierarchical storage, tape
support?

For personal/development use on Linux, you should absolutely take a look at
obnam (and the similar attic and bup). They are simple tools that do dedup,
encryption and cloud storage.

~~~
iamtew
Right, I was quite sparse with details...

I do mainly Linux stuff, and obnam looks nice and feature-packed enough to fit
my current needs, I'll dig in to that a bit. Cheers :-)

~~~
ThatGeoGuy
To be honest, I had the same question last year, and wrote a short guide on
how I (eventually) came to do backups [1]. Funnily enough, I didn't even
notice the topic article until today, so I'm seeing it almost exactly a year
later (only off by two days, really).

Of course, this is very Linux-centric, but it fits my needs and likewise
allows me to rsync back anything from the backup at any time, even if I
accidentally delete all of /bin/ or something equally stupid. I've since
somewhat updated the setup (I use a fully encrypted system and set cryptsetup
to use keyfiles on the main disk for the backups in addition to a set
password), but the fundamentals are all there.

[1] [https://thatgeoguy.ca/blog/2013/12/26/encrypted-backups-
in-d...](https://thatgeoguy.ca/blog/2013/12/26/encrypted-backups-in-debian/)
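The keyfile arrangement described above might look roughly like this (device
names and paths are invented, and the guide itself may differ in the details):

```shell
# Create a random keyfile, lock down its permissions, and enrol it as an
# additional LUKS key alongside the existing passphrase:
dd if=/dev/urandom of=/root/backup.key bs=512 count=4
chmod 0400 /root/backup.key
cryptsetup luksAddKey /dev/sdb1 /root/backup.key

# Later, the backup disk unlocks non-interactively:
cryptsetup open --key-file /root/backup.key /dev/sdb1 backupdisk
```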

------
PhantomGremlin
Let's question a major premise. Does backup _have_ to be open source??? Maybe
there are good commercial solutions available?

There hasn't been a great deal of discussion of that topic so far. Surely
there are people using commercial software who can offer "non marketing copy"
opinions on those products?

~~~
arca_vorago
"Does backup have to be open source??? Maybe there are good commercial
solutions available?"

I've dealt with almost every major commercial backup solution, and in my
opinion, all of them suck. Why? Mostly because they aren't intuitive. I've
seen knowledgeable techs get confused by the terminology during setup and
during testing/restoring: differential, incremental, archive bit, etc.

Another note is cost. At one of the support startups I was at, we licensed a
backup solution that we customized so the customers thought it was our own
in-house product, when it wasn't really. Many of the bigger names can get
really $$$ really fast (looking at you, EMC).

Backup doesn't _have_ to be open-source, but honestly I don't know how much I
would trust anything else. I have, reluctantly at first, recently embraced the
Google Drive ecosystem, combined with Vault for regulatory compliance, and the
versioning that comes with Google makes me feel more confident about documents
stored there than about documents put on random servers and backed up to other
places.

The problem with your original question, though, is that open-source solutions
still suck as well. That's why, depending on the complexity and needs of the
situation, I will generally just roll my own using rsync, git, rsnapshot,
etc., in a way that is transparent to the user. E.g., the user uses file
server A, which backs up to backup 1, with backup 1 backing up to backup 2.

I would also like to point out the growing issue of RAID. In my mind, at the
HDD sizes we're at now, ZFS or a similar solution is the only way to go.
RAID-5 is dead to me.

------
sustrai
If you like RSnapshot, maybe you'll like ElkarBackup:
[http://github.com/elkarbackup/elkarbackup](http://github.com/elkarbackup/elkarbackup)

It uses RSnapshot and the GUI is an easy-to-use web interface.

------
junto
Something that would make cloud backups easier would be the ability to manage
my cable connection's upload versus download speed.

Sometimes I just want to upload backups and not have it take 3 weeks.

