Hacker News
Jeff Atwood asks for help restoring a site from Internet caches (superuser.com)
59 points by RyanMcGreal on Dec 11, 2009 | 23 comments



I found about 200 of his pages (with images and other dependencies) in the Pinboard archives, and forwarded them along to the guy.

I wonder if there is a nice market niche for a 'panic button' recovery tool that scrapes web caches, the Internet Archive, and so on immediately after you lose a site.
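As a rough sketch of one piece of such a tool, assuming the Internet Archive's public Wayback availability endpoint (`https://archive.org/wayback/available`, which takes a `url` and an optional `timestamp`), the snippet below builds a lookup for each lost page and pulls the closest archived copy out of the JSON reply. The function names are hypothetical, and the actual network request is left out.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint: the Wayback Machine's public availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url: str, timestamp: str = "") -> str:
    """Build an availability-API query URL for one lost page."""
    params = {"url": url}
    if timestamp:
        # YYYYMMDD[hhmmss]: the API answers with the snapshot closest to this time.
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text: str) -> Optional[str]:
    """Return the archived copy's URL from an availability-API reply, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None  # nothing archived for this URL
```

A real panic-button tool would loop this over every URL the lost site used to serve and fall back to other caches for anything the archive lacks.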


So if I were to make a business out of this, would you sue me for stealing your idea à la TC? :)


Just give me a freebie when all my stuff disappears and I'll call it even.


So we have a legally binding contract then? Awesome!


What are the Pinboard archives?


Pinboard.in crawls and stores people's bookmarks as a paid service.

A number of Pinboard users had bookmarked articles from Atwood's site before the data loss, so I had a stored copy of all the page content and dependencies for those articles.


I posted a solution here (http://bit.ly/4AGCju) to get back the images, but it looks like nobody likes it. Maybe that was a bad way to approach the problem.

Anyway, I'm looking forward to more news on how they're going to resolve the issue.


I saw this on superuser too (http://superuser.com/questions/82036/recovering-a-lost-websi...); I'm assuming it was you. Fairly clever solution :)


Brilliant!


Clever!


It's easy to retrieve Google caches with a Ruby script. Here's one I used in the past:

http://pastie.org/739757

Edit: if you use this, add a sleep! Whoops. I didn't get banned, though. Shrug.
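The pastie script isn't reproduced here, but the fix from the edit (sleeping between requests) can be sketched generically. In this Python sketch, `fetch_politely` is a hypothetical name, `fetch` is a stand-in for whatever actually downloads a cached page, and the 5-second default delay is an arbitrary assumption, not a documented Google limit.

```python
import time
from typing import Callable, Dict, Iterable, Optional


def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Optional[str]]:
    """Fetch each URL in turn, pausing between requests; failed fetches map to None."""
    results: Dict[str, Optional[str]] = {}
    for index, url in enumerate(urls):
        if index > 0:
            # The "add a sleep!" fix: wait before every request after the first.
            sleep(delay_seconds)
        try:
            results[url] = fetch(url)
        except OSError:
            # urllib's URLError/HTTPError subclass OSError; note the miss and move on.
            results[url] = None
    return results
```

In real use `fetch` might wrap `urllib.request.urlopen`; passing `sleep` in as a parameter keeps the function testable without actually waiting.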


Warrick works better for that, at least: http://warrick.cs.odu.edu/warrick.html

It sleeps in between queries, so you don't get temporarily banned from Google.

I think it's not currently working for Yahoo or MSN/Bing. Fixing that might be easier than doing everything else manually.

Edit: I've gotten a response from Frank McCown, creator of Warrick, that he's looking into it.

Edit 2: He'll try to update it next week.


Warrick looks like exactly what he needs.


His biggest problem appears to be the images (and possibly other resources included in the pages). It's pretty much a given he'll be able to recover the text itself.
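Once the text of a post is back, its images and other dependencies still have to be listed so each one can be looked up separately (in the Pinboard copies, the Wayback Machine, and so on). A minimal sketch using Python's standard `html.parser`, assuming the recovered pages are plain HTML; `ResourceCollector` and `page_resources` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collects absolute URLs of images, scripts and stylesheets in one page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.resources: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        values = dict(attrs)
        if tag in ("img", "script") and values.get("src"):
            # Resolve relative paths like "/images/foo.png" against the page URL.
            self.resources.append(urljoin(self.base_url, values["src"]))
        elif tag == "link" and (values.get("rel") or "").lower() == "stylesheet" and values.get("href"):
            self.resources.append(urljoin(self.base_url, values["href"]))


def page_resources(html: str, base_url: str) -> List[str]:
    """Return the dependency URLs referenced by a recovered HTML page, in document order."""
    collector = ResourceCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.resources
```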


The permanent loss of the images makes it a greater tragedy since half the content in any given post of his consists of images.


There are many, many images in the Pinboard archive, a couple of hundred posts' worth. I don't know if he also has other sources from which to retrieve them; he doesn't seem to have grabbed them from Pinboard yet. But a good chunk of his stuff will be recovered, images and all.


He wrote a blog post (maybe more than one) about how he was hosting his images from Amazon S3. Did he not follow through, or did he switch away from that?


An old saying goes: "If it ain't tested, it is broken." Old corollary: "If it is tested, it might still be broken." And: "If it is a backup, it might still be broken."

Seems like a good idea to occasionally take the time to restore your backups onto a sparkly clean image and see how well it fares.

My sympathies.
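One way to make the "restore onto a clean image" test concrete is to record checksums of the live files, restore the backup somewhere fresh, and compare the two. The sketch below assumes the site and the restore are plain directory trees; the function names are hypothetical, and SHA-256 is simply one reasonable choice of digest.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def checksum_manifest(root: Path) -> Dict[str, str]:
    """Map every file under root (as a relative POSIX path) to its SHA-256 hex digest."""
    manifest: Dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Reads each file whole; fine for a blog, not for multi-gigabyte archives.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def compare_manifests(original: Dict[str, str], restored: Dict[str, str]) -> List[str]:
    """List every file that is missing, altered, or unexpected in the restored copy."""
    problems: List[str] = []
    for name, digest in original.items():
        if name not in restored:
            problems.append(f"missing: {name}")
        elif restored[name] != digest:
            problems.append(f"corrupt: {name}")
    for name in sorted(restored.keys() - original.keys()):
        problems.append(f"unexpected: {name}")
    return problems
```

For example, `compare_manifests(checksum_manifest(Path("/var/www")), checksum_manifest(Path("/mnt/restore-test")))` (paths illustrative) should return an empty list; anything else is a backup that would have let you down.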


Yeah, time to double-check your own backup procedures, everyone. You don't want to be posting similar questions, right?

Remember the old saying: if you haven't tested restoring your backups, then you don't really have backups. (Not that I've ever been good about this myself.)


Sorry, would help, but I'm off to do a backup.


I guess no podcast this weekend.


This week's podcast: How to empathize when someone screws up.

Edit: from his tweet -

"on the podcast this week (unpublished, because I suck) @spolsky and I discussed "worst case scenario" biz outcomes. prophetic!"

http://twitter.com/codinghorror/status/6582023681


These seem to be up on archive.org through 2008. The rest are available through Google Reader.
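To pull everything archive.org holds for a period in one go, the Wayback Machine's CDX server can list captures under a URL prefix. The sketch below assumes the CDX endpoint and its `url`, `output=json`, `from`, `to`, `filter` and `collapse` parameters as publicly documented (worth double-checking against the CDX server docs); the helper names are hypothetical and `example.com/blog/` stands in for the real site.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode

# Assumed endpoint: the Wayback Machine's CDX server.
CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def cdx_query(url_prefix: str, year_from: str, year_to: str) -> str:
    """Build a CDX query listing successful captures under url_prefix in a date range."""
    params = {
        "url": url_prefix + "*",      # trailing * requests a prefix match
        "output": "json",
        "from": year_from,
        "to": year_to,
        "filter": "statuscode:200",   # skip captured errors and redirects
        "collapse": "urlkey",         # keep one capture per distinct URL
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def snapshot_urls(response_text: str) -> List[str]:
    """Turn a JSON CDX reply (header row first) into Wayback snapshot URLs."""
    rows = json.loads(response_text)
    if not rows:
        return []
    header, records = rows[0], rows[1:]
    urls: List[str] = []
    for record in records:
        row: Dict[str, str] = dict(zip(header, record))
        urls.append(f"http://web.archive.org/web/{row['timestamp']}/{row['original']}")
    return urls
```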



