

Mining Wikipedia with Hadoop and Pig for Natural Language Processing - ogrisel
http://blogs.nuxeo.com/dev/2011/01/mining-wikipedia-with-hadoop-and-pig-for-natural-language-processing.html
How to build multilingual training corpora for OpenNLP Named Entity Recognition using the Apache BigData tools.
======
torial
A warning for others: there isn't any technical discussion; it is really just
a demo.

~~~
ogrisel
Have you scrolled down the page a bit? There are commented source code samples,
plus links to the generated outputs, the source code, and instructions for
running it yourself if you wish.

