Jeff Atwood asks for help restoring a site from Internet caches

idlewords · on Dec 11, 2009

I found about 200 of his pages (with images and other dependencies) in the Pinboard archives, and forwarded them along to the guy.

I wonder if there is a nice market niche for a 'panic button' recovery tool that scrapes web caches, the Internet Archive, and so on immediately after you lose a site.
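As a rough illustration of what one piece of such a 'panic button' tool might look like, here is a minimal sketch that asks the Internet Archive whether it holds a snapshot of a lost page. It assumes the Wayback Machine availability endpoint and its JSON response shape; the function names `availability_url` and `closest_snapshot` are made up for this example, and the network request itself is left to the caller.

```python
import json
from urllib.parse import urlencode

# Assumption: the Internet Archive's public availability endpoint.
WAYBACK_API = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the query URL asking for the snapshot closest to `timestamp`.

    `timestamp` is an optional YYYYMMDD string, e.g. the day before the loss.
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return WAYBACK_API + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return the archived URL from an availability response, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None
```

A real tool would fetch `availability_url(...)` for every known page of the lost site, then download each snapshot along with its images and other dependencies.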

pavs · on Dec 11, 2009

So if I were to make a business out of this, would you sue me for stealing your idea à la TC? :)

idlewords · on Dec 11, 2009

Just give me a freebie when all my stuff disappears and I'll call it even.

pavs · on Dec 11, 2009

So we have a legally binding contract then? Awesome!

ericb · on Dec 12, 2009

What are the Pinboard archives?

idlewords · on Dec 13, 2009

Pinboard.in crawls and stores people's bookmarks as a paid service.

A number of Pinboard users had bookmarked articles from Atwood's site before the data loss, so I had a stored copy of all the page content and dependencies for those articles.

niyazpk · on Dec 12, 2009

I posted a solution here (http://bit.ly/4AGCju) (to get back the images), but it looks like nobody likes it. Maybe that was a bad way to approach the problem.

Anyway I am looking forward to more news on how they are going to resolve the issue.

karanbhangui · on Dec 12, 2009

I saw this on superuser too (http://superuser.com/questions/82036/recovering-a-lost-websi...); I'm assuming it was you. Fairly clever solution :)

yannis · on Dec 12, 2009

Brilliant!

tectonic · on Dec 12, 2009

Clever!

xenophanes · on Dec 11, 2009

it's easy to retrieve google caches with a ruby script. here's one i used in the past:

http://pastie.org/739757

edit: if you use this, add a sleep! whoops. i didn't get banned though, shrug.
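The "add a sleep" advice can be sketched as a small throttled loop. This is not the pastie script (which was Ruby); it is a hypothetical Python illustration in which `fetch_all`, the `fetch` callback, and the five-second default delay are all assumptions chosen for the example.

```python
import time


def fetch_all(urls, fetch, delay_seconds=5.0, sleep=time.sleep):
    """Fetch each URL in turn, pausing between requests.

    `fetch` is a caller-supplied function that takes a URL and returns the
    page content; `sleep` is injectable so the loop can be tested without
    actually waiting.
    """
    results = {}
    for index, url in enumerate(urls):
        if index > 0:
            # Pause before every request except the first one.
            sleep(delay_seconds)
        results[url] = fetch(url)
    return results
```

Passing the fetcher in as a parameter keeps the throttling logic separate from whichever cache or archive is being queried.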

pronoiac · on Dec 11, 2009

Warrick works better for that, at least: http://warrick.cs.odu.edu/warrick.html

It sleeps in between queries, so you don't get temporarily banned from Google.

I think it's not currently working for Yahoo or MSN/Bing. Fixing that might be easier than doing everything else manually.

Edit: I've gotten a response from Frank McCown, creator of Warrick, that he's looking into it.

Edit 2: He'll try to update it next week.

tectonic · on Dec 12, 2009

Warrick looks like exactly what he needs.

pvg · on Dec 11, 2009

His biggest problem appears to be the images (and possibly other resources included in the pages). It's pretty much a given he'll be able to recover the text itself.

rayvega · on Dec 12, 2009

The permanent loss of the images makes it a greater tragedy since half the content in any given post of his consists of images.

pvg · on Dec 12, 2009

There are many, many images in the Pinboard archive, a couple of hundred posts' worth. I don't know if he also has other sources from which to retrieve them; he doesn't seem to have grabbed them from Pinboard yet. But a good chunk of his stuff will be recovered, images and all.

bmm6o · on Dec 12, 2009

He wrote a blog post (maybe more than one) about how he was hosting his images from Amazon S3. Did he not follow through, or did he switch away from that?

wglb · on Dec 12, 2009

Old saying is "If it ain't tested, it is broken." Old corollary: "If it is tested, it might still be broken." And: "If it is a backup, it might still be broken."

Seems like a good idea to occasionally spend the time to fully restore your backups onto a sparkly clean image and see how well it fares.

My sympathies.

thorax · on Dec 11, 2009

Yeah, time to double-check your own backup procedures, everyone. You don't want to be posting similar questions, right?

Remember the old saying: if you haven't tested restoring your backups, then you don't really have backups. (Not that I've ever been good about this myself.)
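One way to act on that saying is to restore a backup into a scratch directory and compare it against the live data. The sketch below is only an illustration under simple assumptions: both trees fit on local disk, files are small enough to read whole, and the names `checksums` and `restore_problems` are invented for this example.

```python
import hashlib
import os


def checksums(root):
    """Map each file's path (relative to `root`) to its SHA-256 digest."""
    sums = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            relative = os.path.relpath(path, root)
            with open(path, "rb") as handle:
                sums[relative] = hashlib.sha256(handle.read()).hexdigest()
    return sums


def restore_problems(original_root, restored_root):
    """Return (missing, changed) relative paths found in the restored tree."""
    original = checksums(original_root)
    restored = checksums(restored_root)
    # Files present in the original but absent after the restore.
    missing = sorted(set(original) - set(restored))
    # Files present in both trees whose contents differ.
    changed = sorted(
        path for path in original
        if path in restored and original[path] != restored[path]
    )
    return missing, changed
```

If both lists come back empty, the restore at least reproduced every file; it says nothing about whether the application actually runs from the restored copy.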

brc · on Dec 11, 2009

sorry would help but I'm off to do a backup

pan69 · on Dec 11, 2009

I guess no podcast this weekend.

pavs · on Dec 11, 2009

This week's podcast: How to empathize when someone screws up.

Edit: from his tweet -

"on the podcast this week (unpublished, because I suck) @spolsky and I discussed "worst case scenario" biz outcomes. prophetic!"

http://twitter.com/codinghorror/status/6582023681

thras · on Dec 12, 2009

These seem to be up on archive.org through 2008. The rest are available through Google Reader.