
Yes, absolutely. That's how most of the work got done. The biggest problem once you start distributing the work is avoiding duplication.

I took some shortcuts there, so I'm fairly sure that a portion of what I've downloaded is duplicated, but that will be resolved in a merge step.

Right now the files are spread out over 7 machines: the one I started on is the 'master', and there are 6 others that each hold a portion of the data.

Each of those has been told to fetch only from a restricted area of geocities, but the master one had no such restrictions, so chances are there is some duplication between the master and the individual slaves.
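To illustrate the kind of restriction described above, here is a minimal sketch of one way to split a crawl so that no two machines fetch the same area. It is not necessarily how this crawl was set up. It assumes the "area" is the first path segment of the URL, and the names `area_of`, `worker_for`, and `should_fetch` are hypothetical.

```python
import hashlib
from urllib.parse import urlparse

NUM_WORKERS = 6  # the six non-master machines mentioned above


def area_of(url: str) -> str:
    """Return the first path segment, used here as the 'area' key (an assumption)."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[0].lower() if segments else ""


def worker_for(url: str, num_workers: int = NUM_WORKERS) -> int:
    """Map a URL to a worker index via a stable hash of its area."""
    digest = hashlib.sha1(area_of(url).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_workers


def should_fetch(url: str, my_index: int) -> bool:
    """True only on the single worker that owns this URL's area."""
    return worker_for(url) == my_index
```

Because the hash is stable across machines, every worker reaches the same answer without any coordination. An unrestricted machine, like the master here, would still overlap with the others.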

Merging all the data and importing the user accounts is going to take a couple of days at least, since it's quite a collection of files. I have no stats yet, but when I'm done I'll do a write-up on the main statistics.
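For the merge itself, a rough sketch of one possible approach is below. It assumes each machine's download sits in its own directory tree with the same relative layout, and that two files count as duplicates when they share a relative path and a SHA-256 digest. The function names are hypothetical, and this is not necessarily the merge step that will actually be used.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are not loaded into memory at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def merge_trees(sources, dest):
    """Copy files from each source tree into dest.

    Identical files at the same relative path are skipped as duplicates.
    Files at the same path with different content are reported, not overwritten.
    Returns (copied, skipped_duplicates, conflicts).
    """
    dest = Path(dest)
    copied = skipped = 0
    conflicts = []
    for src in map(Path, sources):
        for path in sorted(src.rglob("*")):
            if not path.is_file():
                continue
            rel = path.relative_to(src)
            target = dest / rel
            if target.exists():
                if file_digest(target) == file_digest(path):
                    skipped += 1
                else:
                    conflicts.append(rel)
                continue
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
            copied += 1
    return copied, skipped, conflicts
```

Listing the master tree first means its copy wins whenever a worker fetched the same file. The conflicts list is kept separate because the same page fetched at two different times may legitimately differ.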
