
Announcing the Public Terabyte Dataset project - MaysonL
http://bixolabs.com/2009/11/01/announcing-the-public-terabyte-dataset-project/
======
adatta02
"This is a high quality crawl of top web sites, using AWS’s Elastic Map
Reduce, Concurrent’s Cascading workflow API, and Bixolab’s elastic web mining
platform."

So, two questions: what exactly is a "high quality" crawl, and who is going to
be the judge of "top websites"?

