Hacker News new | past | comments | ask | show | jobs | submit login

> > Google have manually mocked up their early product? How would

> Crawl an intentional community (remember webrings?) or other small directed subset of the web and see if you're able to get better search results using terms you know exist in the corpus, rather than all of the Internet.

But that isn't a mock up, it's the real thing but on a smaller dataset. If you're going to do the real thing anyway, why not run it on all the data you can?

After all, the throttling factor to release is in the engine, not in the dataset. If you're going to write the full engine anyway, there's nothing to be saved by limiting it to a subset of the data.




> why not run it on all the data you can

Because more data requires more cleaning and standardization (with more edge cases). It also requires a bigger scale to obtain and process.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: