Common Crawl announces Open Source Big Data code contest winners (commoncrawl.org)
27 points by Aloisius 1743 days ago

I like how this contest shows that anyone can ask questions that previously only Google, Microsoft (Bing) and a handful of 2nd tier search engines could ask. Now if we can just get a simple query language for it all. I could then pump in $100 in quarters, ask my question, and wait a couple hours. Isn't this what the information age was supposed to get us?

Linking Entities to Wikipedia is awesome. I love the idea of Online Sentiment Towards Congressional Bills, but it's too bad they didn't show their results.

Don't worry, we're working on it - we should have our results up by this weekend. We're planning on doing something like http://www.albertwavering.com/projects/commoncrawl/bill.html to show our results, but I would love to hear suggestions.

Oh cool. How many bills are you going to show? Will there be histograms or some kind of visualization or just the lists?

We are looking at about 50 bills this time around. We really wanted to do a histogram, but we didn't have time to solve the problem of distinguishing between when things are crawled and when things are published.

So cool you are putting together a presentation of results!!

Glad you like my Linking Entities to Wikipedia entry. Still needs some extra work to make a complete, ready-to-use corpus out of the 5 billion webpages.

The YC crowd needs to pay attention to this. There will be a new wave of startups based on the common crawl as it develops.

Is the "Facebook infection" code open source? There is no link to the code.

