
CommonCrawl: an open repository of web crawl data that is universally accessible - abhishektwr
http://www.commoncrawl.org/
======
abhishektwr
Just a pointer: the code for the CommonCrawl project is available on GitHub:
<https://github.com/commoncrawl/commoncrawl>

------
pooyak
Thread on HN from when Common Crawl was announced; interesting info there:
<http://news.ycombinator.com/item?id=3209690>

------
fungi
If you are into such things, then maybe <http://yacy.net/> (p2p crawler and
search) will be useful to you as well.

------
Titanous
The latest data available is from 2010-09-25, which seems to be too old to be
useful for most things.

------
rgrieselhuber
It would be great to hear more about the tools they are using to crawl, and
to see the crawl potentially opened up to more people who want to contribute
computing resources.

------
emilis_info
This one may also be interesting for open data devs: <http://scraperwiki.com/>

------
Aloisius
I hear a lot of people are crunching on CommonCrawl's data. It'll be
interesting to see the type of stuff people come up with!

------
nithinag
This looks really nice!

