
Is there any chance you could release some of the code you used to scrape it? I'm interested in archiving a site and wondering what you used to run the crawl. Just scripting a lot of wgets?

Yes, just a bunch of wgets. That's the principle anyway.

But it is quite a bit more involved, because you somehow have to avoid fetching duplicates and retrying stuff that simply doesn't exist. Then there's the problem that the URLs weren't case-sensitive, so the same page is reachable under many spellings, which causes wget to retrieve much more than necessary.
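
For illustration, here is a minimal Python sketch of that kind of bookkeeping. It is not the GeoCities code; the names `normalize` and `CrawlLog` are made up for this example, and it assumes the server really does treat host and path case-insensitively, so lowercasing them gives one key per page. It does no downloading itself, it only decides whether a URL is worth handing to wget.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Return one canonical key per page.

    Assumption: the server ignores case in host and path, so both are
    lowercased. The query string is left alone and the fragment is dropped.
    """
    parts = urlsplit(url)
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path.lower(), parts.query, "")
    )


class CrawlLog:
    """Hypothetical helper that remembers what was fetched and what is missing."""

    def __init__(self) -> None:
        self.fetched: set[str] = set()
        self.missing: set[str] = set()

    def should_fetch(self, url: str) -> bool:
        # Skip anything already retrieved or already known not to exist.
        key = normalize(url)
        return key not in self.fetched and key not in self.missing

    def mark_fetched(self, url: str) -> None:
        self.fetched.add(normalize(url))

    def mark_missing(self, url: str) -> None:
        # Call this after a 404 so the URL is never retried.
        self.missing.add(normalize(url))


log = CrawlLog()
log.mark_fetched("http://www.geocities.com/Area51/Index.html")
log.mark_missing("http://www.geocities.com/Area51/gone.html")

# Same page under a different spelling: not fetched again.
print(log.should_fetch("http://www.geocities.com/area51/INDEX.HTML"))
# Known 404 under a different spelling: not retried.
print(log.should_fetch("http://WWW.GEOCITIES.COM/AREA51/GONE.HTML"))
# Genuinely new page: fetch it.
print(log.should_fetch("http://www.geocities.com/Area51/other.html"))
```

In a real crawl the two sets would need to be persisted to disk so that a restart doesn't lose them, and lowercasing is only safe when the server is known to be case-insensitive.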

The code I wrote is pretty GeoCities-specific; I highly doubt it has any value outside of that (other than as a sustained DDoS, maybe ;) ).
