... and you wouldn't have.
Reminiscent of the Compaq one:
BIOS NOT (C) IBM 1982
Or something to that effect!
Here is what I use, now modified with your trick:
wget -r -nv -np -nc -i "$URLFILE"
With a separate process filling the URL file in batches of 50,000 files.
That way one wget process does a boatload of work instead of a new one being fired up for every file. It also helps with reusing connections.
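For anyone curious, a minimal sketch of that batching setup (file names here are illustrative, not the ones I actually use): split(1) cuts the URL list into 50,000-line chunks, then each chunk is fed to one wget run, so a single process handles the whole batch and keeps connections alive across files. The wget call is only echoed here so the sketch is safe to run as-is.

```shell
# Illustrative URL list; in practice $URLFILE is filled by the separate process.
URLFILE=urls.txt
seq 120000 | sed 's|^|http://example.org/page|' > "$URLFILE"

# Cut the list into 50,000-line chunks named batch.aa, batch.ab, ...
split -l 50000 "$URLFILE" batch.

# One wget invocation per chunk (echoed here instead of executed).
for f in batch.*; do
    echo "would run: wget -r -nv -np -nc -i $f"
done
```

With 120,000 demo URLs this produces three chunks, so three wget runs instead of 120,000 single-URL invocations.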
Did you archive the regional versions (de.geocities.com, etc.) too? I see that (any-subdomain).reocities.com is currently just an alias for reocities.com. I have ~12 GB of archived data from de.geocities.com (couldn't find more links and didn't have the time later on), saved in the same format wget uses (mtime set according to the Last-Modified header etc.), if you are interested.
Anyway, good work!
Can you send me an email on how to receive the data?