To keep your Y Combinator dataset of posts ( http://news.ycombinator.com/item?id=182374 ) and users ( http://news.ycombinator.com/item?id=183706 ) current and indexed, try our command-line and web service search utilities. The 15KB archive is available at http://www.rushy.com/shetland-pub20080508.tar.gz
Looking through the utilities, they seem to pull the data directly from this site.
Would it not be more efficient to co-ordinate with pg on some kind of organised periodic data dump that could be mirrored efficiently?
I've got a decent amount of space on a decent box with a sufficiently fat connection. Mirroring pg's data dump there wouldn't be a problem, and it would save this site from being spidered by umpteen different people.