

Ask HN: Learning by doing: Hadoop - phenomenon

Hi,<p>I picked up Hadoop a couple of months back and am finding different ways to use it so as to apply what I have learned so far (MapReduce, Hive, Pig, etc.).<p>As Hadoop is really used in environments where the data to be queried is large, I started looking around for that kind of data. I came across the Wikipedia data (available for download).<p>Now I am trying to list out the questions that I could ask of this data.<p>What are the questions that you want answered from the data available in the Wikipedia dumps?<p>This will help me write some useful MapReduce code, Hive queries, or Pig scripts to improve my skills.<p>I just feel that learning by doing is the best form of learning.<p>Thanks.
======
peder541
You could replicate
<http://en.wikipedia.org/wiki/Most_common_words_in_English> using Wikipedia as
your corpus.
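
A minimal sketch of that word count in the Hadoop Streaming style, written as
plain Python so it can be tested locally first (the function names and the
split into a map and a reduce phase are my own; in a real job the mapper and
reducer would be separate scripts fed via stdin/stdout):

```python
import re
import sys


def map_words(lines):
    """Mapper: emit a (word, 1) pair for every word in the input text.

    Lowercases and splits on non-letter characters, roughly how a
    word-frequency study would tokenize Wikipedia article text.
    """
    for line in lines:
        for word in re.findall(r"[a-z']+", line.lower()):
            yield word, 1


def reduce_counts(pairs):
    """Reducer: sum the counts per word.

    Hadoop Streaming sorts mapper output by key before the reduce phase,
    so a real reducer sees each word's pairs grouped together; here we
    just aggregate into a dict for a local run.
    """
    counts = {}
    for word, n in pairs:
        counts[word] = counts.get(word, 0) + n
    return counts


if __name__ == "__main__":
    totals = reduce_counts(map_words(sys.stdin))
    for word, total in sorted(totals.items(), key=lambda kv: -kv[1]):
        print("%s\t%d" % (word, total))
```

Once the logic works locally on a small dump extract, the same mapper/reducer
pair can be submitted to the cluster with the Hadoop Streaming jar.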

~~~
eb0la
More stuff to find: 1. Most-referenced Wikipedia articles. 2. Most-referenced
external websites in Wikipedia. 3. Calculate the degrees of separation
from Kevin Bacon (that's en.wikipedia.org/wiki/Kevin_Bacon ) for a given
Wikipedia article.

#1 is interesting. #2 is valuable for SEO. #3 makes a good post on HN and will
get you hired somewhere.
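
#3 boils down to breadth-first search over the article link graph; on Hadoop
it becomes an iterative job where each MapReduce pass expands the frontier by
one hop. A single-machine sketch of the underlying idea, assuming the link
graph has already been extracted from the dump into an adjacency dict (the
article names in the usage example are just illustrative):

```python
from collections import deque


def degrees_of_separation(links, start, target):
    """BFS over an adjacency dict {article: [linked articles]}.

    Returns the hop count from start to target, or None if target is
    unreachable. Each BFS level here corresponds to one MapReduce pass
    over the full graph in the distributed version.
    """
    distance = {start: 0}
    queue = deque([start])
    while queue:
        article = queue.popleft()
        if article == target:
            return distance[article]
        for neighbor in links.get(article, []):
            if neighbor not in distance:
                distance[neighbor] = distance[article] + 1
                queue.append(neighbor)
    return None


# Illustrative usage on a toy link graph:
toy_links = {
    "Kevin_Bacon": ["Footloose"],
    "Footloose": ["Kenny_Loggins"],
}
```

The distributed version keeps a per-node state (distance, visited flag) in the
records themselves and reruns the job until no distances change, since a single
pass can only propagate distances one hop.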

