Hacker Newsnew | comments | show | ask | jobs | submit login

Last time I checked, Geocities didn't support HTTP keep-alive. They do however support gzip compression, which wget doesn't support. Also, as you noticed, including "Googlebot" in the User-Agent works around the bw limit.

Did you archive regional versions (de.geocities.com, etc.) too? I see that (any-subdomain).reocities.com currently is just an alias for reocities.com. I have ~ 12 GB of archived data from de.geocities.com (couldn't find more links and didn't have the time later on), saved in the same format as wget (mtime set according to Last-Modified header etc.) if you are interested.

Anyway, good work!




I thought about doing those too (esp. uk is large) but never got around to it, so yes, if you have it I would be much obliged.

Can you send me an email on how to receive the data ?

-----


We snagged a bunch of the other domains.

-----




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact

Search: