If anything, I'd say this is a sign not to work with CrystalTech.
The failures in this case appear to be:
1. Not taking the robustness of his blogs seriously enough (a LOT of people make this mistake, especially with their own content).
2. Being overly trusting of the procedures of his hosting provider. He thought he could trust his hosting provider's "backups" since they are a big company with lots of customers and he paid money for said backups; it turns out he got ripped off.
3. Forgetting the maxim that you have to own your core competencies, and that existence should always be a core competency.
If it had been me instead of Jeff Atwood, would it be more their fault? Or would I just be less disingenuous?
Nobody's calling them names, or threatening to take business elsewhere or anything. But it's good to get sunlight in there, show them consequences, when they fail you. When a company providing you with service fails, you're allowed to scream it from the rooftops if you feel like it.
As someone else pointed out in another comment, there's a good analogy to be drawn from Atwood's own words here:
Fair enough. But he's not complaining about looking foolish (nor is grandparent as far as I can see). He's complaining about losing data.
In that link he isn't saying "it's your fault no matter what the world throws at you, suck it up." He's saying, "look harder at yourself before you decide somebody else is to blame." Not an issue here; can we agree it is their fault?
Is his tweet. Sounds like blaming to me. I suppose it's a matter of personal taste, but if you write about how you're serving your images off S3 and how everyone should be making redundant backups, and then get bitten by the fact that you wrote about these things but somehow didn't actually do them, you're not looking very hard at yourself.
My simple point is - if you're an advice-giver who's been shown to not follow his own advice on backups, don't compound it by being an advice giver who doesn't follow his own advice on humility.
I thought we were arguing about whether he cares about looking foolish.
Of course he's blaming them. There's reason to blame them. Regardless of what he did or didn't do, they screwed up. In fact I think he's being very restrained and humble taking 50% of the blame. I'd be fucking pissed if that happened to me. Even if I restored from backups in 30 minutes (extremely optimistic).
"if you're an advice-giver who's been shown to not follow his own advice on backups, don't compound it by being an advice giver who doesn't follow his own advice on humility."
Again, the humility he was referring to was to assume first that it was your fault until you determine otherwise for sure. What's that got to do with them screwing up?
You're basically saying: if I act holier-than-thou and advocate backups, then Slicehost or Amazon can give me crappy service because I'm not allowed to 'blame' them or mention how they suck. Which is really weird.
(I feel for the guy. I'm sitting here yakking because I can't connect to my EC2 slice for the last n hours.)
I didn't say he deserves crappy service or that his provider didn't screw up by firing his VM into Neptune. What I'm saying is (in 17 different ways, at this point) - if you are a vocal advocate but don't actually practice or believe what you preach, you're not really an advocate or useful commentator but a hack. When circumstances (that incidentally and in part happen to be someone else's fault) expose you as somewhat of a hack, it's a little uncouth to be pointing fingers at others for their much more minor screw-ups. Being a hack is a bigger screw-up than accidentally (or through negligence) firing someone's VM into Neptune, that's all.
A mistake like this may be a sign that they will never mess up this way again.
It's also a sign that they lack experience and competence. So while your odds of suffering this particular problem are significantly lower, your odds of hitting any of the infinite number of other possible problems are still troubling.
Reason 1: I pay for my bank account one way or another. It's the bank's responsibility to keep the system running and secure, not mine. Sometimes we just can't do everything ourselves and have to rely on others.
Reason 2: Many of us host something somewhere. How many do backups? How many of us check the backups? How many check the backup-checking process? (enter infinite recursion) You have to stop at some stage. You probably don't have enough time, or your time is not worth enough to check that.
He can recover most of his blog from the caches, because it was quite popular. I bet he'll be back with an almost complete archive in less than a week.
The rest of us will be quite happy to ensure that what we think is happening, actually is.
It's akin to a pedestrian getting hit by a car and then saying "I had the right of way". Yeah, maybe so, but now you're in the hospital breathing through a tube.
Those of us who checked for traffic before crossing the street are at home watching TV.
If my income is dependent on data, I make sure it gets backed up. If it's an active project, I use my backups to build my dev environments so it's fairly obvious when a backup has failed.
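As an aside, a minimal sketch of that idea, assuming a MySQL-backed site and made-up paths and credentials; the point is simply that the dev environment breaks loudly if last night's dump is missing or corrupt:

    #!/bin/sh
    # Hypothetical: rebuild the dev database from last night's dump, so a missing
    # or corrupt backup shows up immediately as a broken dev environment.
    set -e
    DUMP="/backups/blog-$(date -d yesterday +%F).sql.gz"
    test -s "$DUMP"                       # dump must exist and be non-empty
    gunzip -c "$DUMP" | mysql -u dev -pdevpass blog_dev
    echo "dev database rebuilt from $DUMP"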
However, it really isn't fair to blame this kind of problem on the end user.
Are website developers also expected to keep the servers secure? If Apache isn't patched and up to date, is that the website admin's fault?
Service providers are paid to do a job. This one failed tremendously, and should lose a large chunk of business for it. I think Atwood is being far too kind in accepting 50% of the responsibility for the data loss.
To put it another way: the only reason to maintain your own backups of your site data -- aside from healthy paranoia -- is because you expect your service provider to fail at doing their job. And if that's the case, shouldn't you be finding a service provider that does it better?
If you don't expect your service provider to fail, you don't know anything about service providers.
Our colocation facility has redundant generator systems. They're tested regularly, and have handled failures previously. Yet, when the power went out, three of the backup generators failed, and our site (as well as Craigslist, Yelp, and others) was out for 45 minutes.
The cause? A bug in the backup generator's software: http://365main.com/status_update.html
Shit happens. Sometimes it's not your fault. You still need to prepare for it.
Getting hacked isn't as potentially catastrophic as not having backups. With backups, being hacked can be recovered from and the service provider changed.
>To put it another way: the only reason to maintain your own backups of your site data -- aside from healthy paranoia -- is because you expect your service provider to fail at doing their job.
A business doesn't expect its premises to burn down, but most have fire insurance in the event that this happens. Even if I don't expect my service provider to fail, there's no way to know that they won't and it makes sense to deal with this risk if the cost of dealing with it is reasonable and the cost of not dealing with it is catastrophic.
As others have pointed out, it is your responsibility to check your bank statements to ensure that all is secure. But also, with banks you're dealing with money, which is easily replaceable. If your card is compromised that money is gone, but if it's their fault (or criminal fraud) the bank will replace it. Your data is not replaceable -- when it's gone, it's gone.
"Many of us host something somewhere. How many do backups? How many of us check the backups? How many do check the backups checking process?"
True, you do have to stop somewhere -- but there is due diligence and there is negligence. Making offsite backups and periodically testing them is due diligence. On the other hand, completely relying on someone else to back up your critical data is negligence. If your data is at all important to you, then you need to have a copy. But at some point you have to accept that you've done enough.
You are responsible for reading your statement and ensuring that all activity is valid, much in the same way that you are responsible for ensuring the viability of your recovery strategy.
"Reason 2: Many of us host something somewhere. How many do backups?"
Anyone who wants to keep their data badly enough to pay (time, money) to do so. Coding Horror is a popular technology blog, and there's a significant cost in traffic and credibility when it fails.
Jeff Atwood lacked cognizance of the risks and made an incredibly poor technical and business decision in failing to validate correctness of his backups and implement a suitable recovery strategy.
My point is that you can spend lots of hours trying to back up your data and verify its correctness. But unless you put it back into the actual working environment and check every single bit of it, you cannot be sure it was a proper backup - not with 1GB of information and certainly not with 1TB. And then, after you've verified that you can verify that you have backups, something will fail and a bunch of people will say "if you didn't check the backups properly, it's your fault". I've seen backup systems fail in amazing ways and will probably never again believe that you can be "sure".
After all, their liability will almost certainly be less than the value of your data, in which case there are two reasons to keep extra copies: business continuity and direct economics.
Sure, it should be the bank's responsibility to keep accounts safe. However, the fact that bank accounts are considered to be "safe" is in large part due to systemic factors like the existence of the Fed (as lender of last resort), the FDIC, the general political climate of "too big to fail", etc. As mentioned above, banks actually do go under a lot; it's just not as noticeable to the clients as it was before [the central banks], but the cost is there. (And there's a whole discourse there of whether it's a good idea for the monetary system to have this kind of environment in the long run.)
In any case, nothing of the sort applies to hosting providers (nor even can be applied, since insuring unique data is not the same as insuring amounts of money) -- so assuming that keeping data there is as safe as keeping money in the bank is not a viable comparison.
But for backups, at least in this (early) era of managed hosting/cloud, it's probably not too onerous to test your plans etc...
It's a lesson on what can and can't be "just bought".
Security and backups both need some top-down, hands-on involvement. Also, per recent US experience, dikes and air defenses are important too.
The people who made your car were legally required to test your model of car by crashing it into another one and making sure the airbags work. This is also too expensive to test yourself. One does not modify an airbag system at all without retesting it.
Your hosting provider has no legal requirement to test their backup system. At best, they have a contractual obligation. And they don't know how to test recovering your site, because testing whether it works is a different process for each site. Additionally, it's cheap to test backup procedures. Most people have a spare computer somewhere (maybe not in the data center), and it should only take a few hours to restore a copy of your site. Once a year. For Chris's sake, you could probably do it in the background and put a movie on.
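A rough sketch of what that once-a-year rehearsal could look like, assuming a tarball-plus-MySQL-dump backup and a spare box with a web server pointed at the restored copy (all paths and names are made up):

    # Hypothetical restore rehearsal: unpack the latest backup, load the database,
    # and smoke-test that the restored site actually serves a page.
    mkdir -p /var/www/restored-site
    tar xzf /backups/site-latest.tar.gz -C /var/www/restored-site
    gunzip -c /backups/db-latest.sql.gz | mysql -u root -p restored_site
    curl -fsS http://localhost/ > /dev/null && echo "restore looks sane"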
You can't outsource liability.
If YOU are not testing YOUR backups, YOU fail.
(I backup my Slicehost account with their official service. If they die and my backup dies with it, that's fine; I don't have the time or money to do anything better.)
So you've tested whether your toaster has proper grounding and other safety precautions, in case it short-circuits? I bet you haven't, and that's why you should stop repeating that stupid soundbite. We all 'outsource' liability all the time: we pay others to perform services for us and hold them responsible for the proper execution of those services. This includes hosting content and backing up that content.
Nope. I have a fully paid and verified homeowner's insurance policy, though, that covers any probable loss from a toaster fire.
It's sort of like having an offsite backup of my important personal stuff.
Your example is really not a parallel to a data backup. It's difficult to fully test all of the toaster's failure modes in a non-destructive way. It's easy to set up an automated backup to a remote location. Given Gmail's storage limits, you could even tar and gzip your files and email the backup to your Gmail account (a rough sketch of this follows below).
A better example in a "homeowner" realm is the flood pans that you can buy to put under your water heater or washing machine. You'd like to be able to trust that the manufacturer made a waterproof device, but at the same time it's cheap and easy to insure yourself against the most common failure modes.
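Here is a hedged sketch of that tar-and-email idea from above (paths and address are made up; mutt is just one tool that can attach files from a script, and Gmail caps attachment sizes, so this only suits small sites):

    # Hypothetical: nightly tarball mailed to a Gmail address as a small offsite copy.
    STAMP=$(date +%F)
    tar czf "/tmp/site-$STAMP.tar.gz" /srv/http
    echo "nightly backup attached" | mutt -s "site backup $STAMP" -a "/tmp/site-$STAMP.tar.gz" -- you@gmail.com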
The sound bite is not stupid. People who believe you can outsource all the messiness of keeping a website alive are continually bitten in the ass by EC2, Rackspace, etc. failures.
And look, I agree with you in many ways - people need to take responsibility for their own backups, sure. But there is a division of responsibility. I mean, even if all my backups are 100% perfect, I am still trusting the hosting provider to, well, keep the power on. Pay the peering bill. Keep the server temp down. Not go bankrupt tomorrow.
You can just follow this chain as far as you want. Whether you like it or not, you're utterly dependent on the DNS root server admins. There is nothing at all you can do to prepare yourself for their failure. I bet you can't generate your own electricity or grow your own food, either.
All of civilisation is built on co-dependency and delegation of responsibility. It's the only way to do anything complex. At some point, you must delegate. Atwood should have checked - but he was paying his host to do it. That's like having an employee whose job it is to do backups. At some point, you just have to let go and trust them. Otherwise you can never really do anything; you're caught up in checking minutiae. I can give examples of this kind of leadership failure until my keyboard breaks.
2. Download my S3 backup script (or anyone's S3 Backup script) - http://github.com/leftnode/S3-Backup
3. Set up a cron job to push hourly/daily/whatever tarballs of your vhost directory to S3.
Spend, like, $10 a month. That's 30GB of storage, 30GB up and 30GB down. Now, I know that may not be a lot, but I doubt codinghorror.com had that much data.
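For anyone who wants the gist without the linked script, a hedged sketch of the cron step using s3cmd instead (a separate tool, not the script above; bucket name and paths are made up):

    # Hypothetical crontab entry (one line): hourly tarball of the vhost, pushed to S3 with s3cmd.
    # Note that % must be escaped as \% inside a crontab.
    0 * * * * tar czf /tmp/vhost.tar.gz /var/www/vhosts/example.com && s3cmd put /tmp/vhost.tar.gz s3://my-backup-bucket/vhost-$(date +\%F-\%H).tar.gz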
In case anyone else is interested:
I wrote a bash script that uses s3sync (not written by me). You can thank the s3sync community for that! :-) http://s3sync.net/wiki
Confidentiality is only one aspect of security. Authenticity is also important in some cases: You might care if the NSA edits your backups so that after restoring them it looks like you said something you didn't really say.
A tip for folks using leftnode's setup: I didn't notice $gpgRecipient hidden at the end of the config file and chased around a bit looking for it.
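For what it's worth, a minimal standalone sketch of what that encryption step amounts to (key ID and filenames are made up); signing as well as encrypting also covers the authenticity concern above:

    # Hypothetical: sign and encrypt the tarball before it leaves the machine.
    gpg --sign --encrypt --recipient backups@example.com \
        --output site-backup.tar.gz.gpg site-backup.tar.gz
    # On restore, decrypting also verifies the signature.
    gpg --decrypt site-backup.tar.gz.gpg > site-backup.tar.gz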
I use it to backup all of my databases and entire vhosts directory every hour and night.
Let me know if you find any bugs!
Dead simple to specify what to backup, drop the config in /etc/safe and add a cron job.
This does not sound like the tweet of a man with backups.
Why would anyone consider a backup on the same VM a backup at all?
All your data should be kept in two places, hopefully geographically disparate in case there's a building fire or horrible storm or what not.
You should also test it on a regular basis to make sure it exists (and so you know what to do when the shit hits the fan)
[Also makes me wonder how to test my host's backups]
Make your own, don't rely on someone else.
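One cheap way to make your own, sketched under the assumption that you have shell access on a second machine in another location (hostnames and paths are placeholders):

    # Hypothetical nightly offsite copy: push the site files and the latest DB
    # dump to a machine somewhere else over ssh.
    rsync -az --delete /srv/http/ backupuser@offsite.example.com:/backups/http/
    rsync -az /backups/db-latest.sql.gz backupuser@offsite.example.com:/backups/db/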
Cache of the relevant blog post:
He also frequently said he was the world's worst coder, so...
If we are describing human conversation, then - um - I call bullshit. If you give advice, you most certainly imply (for a suitably non-formal definition of 'imply') authority, knowledge, etc. about the matter you give advice on.
Your example - or the way you spin it out - is very odd. If you give directions to some coffee shop, we should be able to assume that you know where the coffee shop is. I suppose at the very least we should be able to assume that you believe that you know where the coffee shop is. The guy gave advice on backing up - he passed along someone else's advice, and said it was essential advice. The relevant switch for your analogy would be asking him to be able to describe the code in rsync or dd or whatever. Nobody is doing that. They're simply pointing out that he didn't practice what he preached (with various degrees of unfortunate serves-him-right schadenfreude, but never mind that for the moment).
Answering a question implies a claim of authority. Offering advice on the internet (unsolicited) is the equivalent of building a freaking billboard with directions to the coffee shop.
I learned it the hard way.
It should be. It might even be in some of the plans. But most of the time it doesn't happen.
I have some backups and I'll try to get it up and running ASAP!"
Handling a failing disk is the job of a sysadmin, not a programmer.
Sometimes I'm extremely shocked at how cavalier professionals are about maintaining your data, and I've worked with plenty.
And it isn't just the small ones either.
For the love of - insert your favorite deity here - please try to restore your backups, and try to do so on a regular basis. If not, all you might have is the illusion of a backup.
It is a very easy trap to fall into, and I really am happy this guy writes about it, because it seems there are still people who feel that if their data is in the hands of third parties it is safe.
That goes for your stuff on flickr, but it also goes for your google mail account and all those other 'safe' ways to store your data online. In the end you have to ask who suffers the biggest loss if your data goes to the great big back-up drive in the sky, the service provider or you. If the answer is 'you' then go and make that extra backup.
Sigh. When will we ever learn?
Unfortunately we don't have the images, but it looks like most of the site is back up at least. It will probably be more work for him to re-integrate it into the CMS though.
It ends in what is now irony:
"If backing up your data sounds like a hassle, that's because it is. Shut up. I know things. You will listen to me. Do it anyway."
I'm not going to imply anything negative about him for losing the site temporarily. We all have "learning experiences"...
I just hope that when he does come back, he posts an insightful analysis of how he could have done more for his own reliability, rather than point fingers at a vendor.
That's not too far from saying that it's your fault if you get hit by a car while walking across a crosswalk because you didn't jump out of the way fast enough.
They failed, they failed in a catastrophic way, and they deserve to have it made public knowledge and to lose business over it. He should do a better job backing up his own stuff, but he's right to be angry with his hosting provider and to call them out.
An analysis posted after the fact can lay out where the technical failures took place, and that is the right time to describe issues with the host.
Pointing fingers in anger is very frowned upon in all organizations I work with. It implies a lack of ownership of your own products and a lack of maturity in handling your business affairs. Organizations that pursue blame before solutions do not have a positive culture -- they have a fear-based culture.
To answer your direct question, nothing he says will change the fact that the vendor screwed up. Your analogy is correct in that it is the driver's fault if someone hits you, but flawed in that you cannot absolve yourself of all responsibility for your own safety when crossing streets.
When a vendor screws up, the level of professionalism that you portray when dealing with the situation says a great deal about yourself.
I really took issue with your initial comment because "pointing the finger" has some connotation of assigning blame unfairly or unreasonably; I 100% agree that companies with a culture of blame are poisonous, and that people should err on the side of accepting too much responsibility rather than too little, and do their best to not pass blame on to other people.
There's certainly a line, however, across which I think it's reasonable to call someone else out. Where that line is depends on the situation and your relationship: the bar for doing it within your team is astronomically high (you should basically always deal with those things internally), within even the same company is still incredibly high (likewise), but it's lower when it comes to vendor relationships. Where you draw the line is probably different from where I draw it.
So while I agree that he should have waited, I don't think that publicly expressing his anger with his hosting provider after the fact would count as "pointing the finger" or "passing the buck" or otherwise indicative of a lack of personal responsibility; to me it would be understandable frustration and anger at having been so dramatically let down by a third party you were contracting with. And honestly, that sort of negative publicity is one of the strongest checks we have on companies, be they hosting companies or retail stores or any other type of establishment.
More to the point: even if he did have backups, if they really lost his data and were unable to recover it themselves, I think he'd still be justified in outing their failure publicly after the fact. But again, you're definitely right that he should have waited and given the host a chance to resolve the issue before saying anything.
/me does a git pull from his weblog
--PidGin128 via bmn
When people make mistakes and save over the top of their work, there among the smug laughter are a few people on the user's side, reminiscing about long-forgotten systems which had versioning filesystems by default, so recovery was only a moment away.
I haven't seen any replies in this thread along those lines - everyone is putting it firmly on the system administrator or the user. Would it be so hard during an install for a program to say "and now enter an encryption password and an ssh server address where I can backup to nightly"? If it's so simple you can script it yourself in a few minutes, isn't it so simple that many/most systems should come with that themselves?
It's long past time that the "computer lost my data" / "well, you've only yourself to blame" exchange was considered an old-fashioned attitude.
I didn't actually lose data like Jeff, but the datacenter the server was in decided to kill the power to the machine 2 weeks before the scheduled date (poor processes and a move to a different building) and I didn't get any notification before it happened. It took another 3 weeks for them to ship my machine back to me. Because I had nightly backups I was able to restore email and the photo site to a new Linode instance in a few hours. Without those backups I would have been hurting bad.
<p>inane comment showing <b>strong naivete</b></p>
<p>Confident summary with <b>conclusive statement</b> displaying no comprehension of the quoted text</p>
<p>Buy a Visual Studio Plugin from my sponsor!</p>
I guess I can not link to the blog post...
Well, I sort of can:
This assumes you're running your blog on your own VPS.
Do not use SSH with Expect.
Do not use them here or there.
You will not use them anywhere.
ssh server 'tar zc /srv/http' > http-backup.tar.gz
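One small tweak worth considering: date-stamp the output so each run doesn't overwrite the only copy you have.

    # Keep dated copies instead of clobbering a single file on every run.
    ssh server 'tar zc /srv/http' > "http-backup-$(date +%F).tar.gz"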
But I wonder if this eventually turns into a case study for "How to back up an entire site using the archives and search engine caches".
I know it's hard, but everything is _preserved there_ anyway. So it _is possible_.
EDIT: Back at http://superuser.com/questions/82036/recovering-a-lost-websi...
This is really bad news for any service, but seriously... codinghorror.com (shakes head)
Rule 16. If you fail in epic proportions, it may just become a winning failure.
and then looking at cached results?