

Let's Stop Talking About "Backups" - JoelSutherland
http://www.joelonsoftware.com/items/2009/12/14.html

======
gaius
It's not us you need to tell Joel, it's your business partner. And if this is
your way of telling him, isn't it a little passive-aggressive to do it in a
blog post?

~~~
spolsky
I think that a lot of people think they have backups, but they've never
restored them, so I thought it would be a good practice if everyone started
thinking in terms of "have we restored" rather than "are we backing up." Of
course, there's no question that the thought process started when Jeff
Atwood's personal blog was lost, but don't think for a minute that the only
way I communicate with him is by blogging... we talk all the time, over skype,
over email, over FogBugz, and sometimes, when there's something other people
can learn, in public on the internet.

~~~
telemachos
The weird part is not that you made a blog post about this issue now.

The weird part is that you didn't mention the 400 pound gorilla in the room.
("I've been thinking about this for a long time, but it really hit home
recently when...")

To my mind, that's why the post seems odd or passive aggressive. (It's a
relatively short post as well. It just feels clipped somehow. The reader
inevitably says, "What's _not_ here?")

~~~
mechanical_fish
It may be difficult to remember now that Twitter is all the rage, but
essayists often aim for that "timeless" quality. You want the essay to seem as
relevant three years from now as it is today.

The web has more than enough content that feels stale after a week. Better to
aim for something with a slightly longer shelf life.

~~~
pyre
Yea, but how is "my friend/business partner recently lost his entire blog due
to poor backups, and it got me thinking..." not 'timeless?'

~~~
mattmanser
Not poor backups, poor restores. You're missing the point of the article!

~~~
oscardelben
It's also poor backup. They thought the backup was being done by their hosting
company while in reality it was failing silently.

~~~
Periodic
A good backup is one that can be restored.

------
DenisM
I find it disturbing how many people decided to comment or up-vote on the
Joel-Atwood angle, or insult each other.

Remember: great people discuss ideas, normal people discuss events, shallow
people discuss other people.

------
ghshephard
What I find somewhat charming/quaint about Joel's posts on "Operations" over
the last six years that I've been reading him is that he is, slowly but
surely, discovering the "Art of Operations" - albeit at a glacial pace.

Most people who work in production operations environments of any scale
discover what has taken Joel the better part of a decade in the first two to
three years of their career.

I almost feel like that airplane passenger sitting beside Fred Brooks Jr. Brooks
saw him reading his book, "The Mythical Man-Month", and asked the guy (who had
no idea who he was sitting beside) what he thought of it. The gentleman
responded that it was basically a summary of things he knew already. Joel is a
giant in the industry, but he does have a tendency to discover/restate the
obvious.

"It's not backups, but the restores that matter" - is kind of the mantra of
every single person who has ever been responsible for backups.

Then, you go to _any_ class on running a production environment, and you
discover things like RPO (recovery point objective), RTO (recovery time
objective), dress rehearsals, etc., and the whole "It's restores that matter"
begins to look quaint.

~~~
lifeisstillgood
the "Art of Operations" suggests a book or similar that you are referring to -
is there one, or am I reading too much into some double quotes?

~~~
josephkern
Yes there is a book: The Practice of System and Network Administration, Second
Edition. This is the best Operations book I've ever bought. Worth every penny.

~~~
vdm
Because Google isn't links:

[http://www.amazon.com/Practice-System-Network-
Administration...](http://www.amazon.com/Practice-System-Network-
Administration-Second/dp/0321492668/)

------
wglb
This is good food for thought. Let's also add to this a concept from a
different realm: Everybody has at least two dns servers listed in their
/etc/resolv.conf, right? The reason is that in case one of them goes down,
there is the other one.

So this seems like a good lesson to take about backups. Mebbe three? One by
your hosting provider, one at tarsnap, one on a separate dat tape, one on a
usb stick?

A good point is though that even something as big as a dat tape looks pretty
small by the standards of what we need to back up today.

------
Sukotto
Also this, similar take on the horror of backups gone wrong:
<http://www.penny-arcade.com/comic/2005/8/10/>

Don't blindly rely on your partner to do it... Trust, but verify.

------
Goladus
It's helpful to boost signal for this message. It's an old message, but
planning and practicing system restores can be as expensive in terms of
equipment and manpower as actually making the backups. This leads to _a lot_
of neglect.

------
loupgarou21
... shouldn't it be common practice to test your backup system to make sure
that the restore procedure meets the requirements of the client (company,
etc.)?

The IT company that I work for creates a backup system based on the
requirements of our clients and then demonstrates the whole backup and restore
procedure to make sure that it falls in line with what our client actually
wants. It's really not difficult to do. Sure, some of the restore procedures
may be slower (depending on other requirements, such as cost,) but the client
knows that will be the case and signs off on it.

------
mark_l_watson
Common sense and obvious. In the 1980s I worked on a large DARPA project where
a huge hit was taken because our admins never tried to restore from backups.
It is the kind of lesson that is (hopefully) learned with just one bad
experience.

This is another reason why I like EC2 deployments: it is fairly easy to take
your backups (automated deployment scripts, application, data) and spin up
another copy of your whole system (except for flipping the DNS). Make sure
those EBS-backed EC2 AMIs are really bootable and functioning :-)

~~~
DenisM
What happens if Amazon runs out of machines, has a system-wide corruption that
was noticed too late or decides to be dicks about something or other?

~~~
mark_l_watson
Good questions.

I would think that if they are making money with AWS then they will keep
buying more servers.

I usually trust S3 for backups and restores but periodically back my own data
off of AWS to local storage.

I expect both Amazon and Google infrastructure services to experience outages
from time to time. However, they have far more resources and expertise than I
do to provide scalable services for a low cost.

~~~
DenisM
I was thinking about making a setup where my S3 backups are automatically
mirrored to Rackspace or something like that. That would be really neat.

------
hexis
I tend to be a little paranoid about backups, so I have a few different disks
backing up my main, desktop, machine. But I also use one of my backups to sync
data to my laptop, not quite a full "restore" due to a big size difference in
the respective drives. But, generally speaking, the two machines are in sync
and I can be sure that at least one of my backups works reasonably well.

------
DannoHung
This whole ordeal is getting me motivated to actually buy a cloud backup
service (personal use, not business use). I was thinking of Carbonite or
Backblaze. Anyone have any experience with those?

~~~
slig
I used carbonite in early 2006 and the software was horrible. Eventually, I
tried to uninstall and the process failed. I got a half installation that
wouldn't work and couldn't be removed. I didn't try too hard after that,
because I was planning to format and start over.

------
jsz0
Full disk image backups are a good solution for this problem. No worries about
partial backups or a complex restoration process. It's totally inefficient but
storage is cheaper than man hours.

~~~
peterwwillis
I'm assuming you're talking about some kind of atomic file-or-block-level
backup such as LVM snapshots? Large files such as databases can change while
reading them over a long period of time, so a standard disk image or file copy
wouldn't be reliable for a live system.

~~~
ryanpetrich
zfs send/receive is amazing. I wish other filesystems had it.

------
orblivion
As if I didn't lose enough sleep pondering backups already.

------
lazyant
Testing your (off-site) backups is an obvious first item in many lists
<http://watsec.com/article/49>

------
dnsworks
What he's saying is, "We failed miserably at having good process and
procedures. Because of this we are going to lecture others, and point the
blame at everybody but ourselves, in hopes that they'll stop pointing out how
much credibility we lost over this."

------
aw3c2
Pointless write-up about linguistics. He says "restore" is the important
thing, not the "backup". Well, duh.

~~~
itgoon
Not really. That's actually one of my favorite "health check" questions: when
was the last time you restored?

Most places have very reliable backup procedures. Most of those have very poor
restore procedures - I'd say about half fail when put to the test.

~~~
aw3c2
To me that is like saying "You can put money in your bank account but you
should make sure you can get it back". Of course you have to test your backups.

------
jfoutz
backups are for suckers. keep the data on a few different spinning disks. if
you can solve data synch between two sites, just keep your data synched.

it's much better to ask yourself how long it takes to replicate your existing
system than how to back up. pxe boot to a kernel that you can install over the
network with, bcfg2 to get the thing up to spec, start copying data.

a _lot_ of machines can be back and configured in 5 minutes.

that said, i'm not you. i don't have terabytes of data to do statistics on.
maybe there are other horrible details i'm forgetting. fast rebuilding is a
pretty awesome strategy for a lot of cases.

~~~
mechanical_fish
_maybe there are other horrible details i'm forgetting_

Yes.

Why should I bother to write this? I'll outsource the task to the authors of
_High Performance MySQL, Second Edition_ , page 475:

 _Backup Myth #1: "I Use Replication As a Backup"_

 _This is a mistake we see quite often. A replication slave is not a backup.
Neither is a RAID array. To see why, consider this: will they help you get
back all your data if you accidentally execute DROP DATABASE on your
production database? RAID and replication don't pass even this simple test.
Not only are they not backups, they're not a substitute for backups. Nothing
but backups fill the need for backups._

~~~
diego
This is the incremental backup script I use on my Linux box at home, a quick-
and-dirty imitation of what Time Machine does. Obviously $HOME/backup is a
different physical disk. Feel free to improve on this.

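--------

    #!/bin/bash
    # Incremental backup: each run makes a dated snapshot that hard-links
    # unchanged files against the previous one (rsync --link-dest).

    HOME=        # left blank in the original post; set to the directory that holds backup/
    myaddress=   # alert recipient; not defined in the original post

    date=$(date "+%Y-%m-%dT%H:%M:%S")

    rsync -aP --link-dest="$HOME/backup/current" /home "$HOME/backup/back-$date"
    rsync -aP --link-dest="$HOME/backup/current" /etc "$HOME/backup/back-$date"

    # Point "current" at the snapshot just taken.
    rm -f "$HOME/backup/current"
    ln -s "back-$date" "$HOME/backup/current"

    # Percentage of the backup disk (sdb1) in use, from the Use% column of df.
    USED=$(df -lk | grep sdb1 | awk '{print $5}' | tr -d '%')

    # Alert me if the backup disk is getting full.
    T=80
    if [ "$USED" -gt "$T" ]
    then
        df -lk | mail -s "disk alert $T% capacity" "$myaddress"
    else
        echo "backup disk is at or below $T% full"
    fi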
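
Given the point of the article, a script like this probably wants a restore
check next to it. What follows is only a rough sketch, not something from the
script above: `verify_restore` is a made-up helper name, and it assumes the
live tree is quiet enough that an unchanged file still matches its snapshot
(on a busy system some differences are expected and would need filtering).

```shell
#!/bin/sh
# Hypothetical helper: "restore" a snapshot into a scratch directory and
# compare the result with the live tree it was taken from.
# Usage: verify_restore SNAPSHOT_DIR LIVE_DIR
# Exit status: 0 if identical, 1 if they differ, 2 if the check itself failed.
verify_restore() (
    snapshot=$1
    live=$2

    # Scratch area standing in for the machine you would restore onto.
    scratch=$(mktemp -d) || exit 2

    # Restore step: copy the snapshot contents off the backup disk.
    cp -a "$snapshot/." "$scratch/" || { rm -rf "$scratch"; exit 2; }

    # Verification step: recursive comparison against the live data.
    diff -r "$scratch" "$live" >/dev/null
    status=$?

    rm -rf "$scratch"
    exit "$status"
)
```

With the layout the backup script produces, a call might look like
`verify_restore "$HOME/backup/current/etc" /etc`, run right after the backup
finishes and mailing an alert on a non-zero exit status.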

~~~
Locke1689
You should think about keeping an offsite backup if possible. A second disk
won't necessarily help if your computer gets dropped in a pool, house catches
on fire, etc.

