

Ask HN: Reference or Reading materials for Learning Hadoop - vbv

I am trying to learn Hadoop and was wondering whether there are any references, tutorials, or papers that HNers use that would make learning Hadoop simpler, more efficient, and more productive.
======
eshvk
So learning Hadoop can be split up into several pieces:

1\. Learning the idea of Map-Reduce. This is fairly easy and you could browse
through the original research paper (Dean and Ghemawat's "MapReduce:
Simplified Data Processing on Large Clusters") and figure that out.

2\. Learning the weird, wild animal called Hadoop (with its multiple API
clusterfuck). This is going to be much harder. Presuming you know Java, the
first thing you want to do is get a Cloudera VM (because you don't really want
to spend time learning how to install hadoop at first) and start figuring out
how to build Word Count inside the VM. This should give you some insight (not
much though) into how the API works.

3\. Figure out more complicated stuff you want to do with Hadoop and start
working on it. Get a copy of Tom White's Hadoop book (From what I remember six
months back, the API was hopelessly outdated but the ideas are awesome) and
Jimmy Lin's book on text processing with Map Reduce (
<http://lintool.github.com/MapReduceAlgorithms/> ). Personally, I loved
Jimmy's book not because of the machine learning content but because of the
design patterns for Hadoop that he had embedded in there.
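
To make the Map-Reduce idea and the Word Count exercise a bit more concrete,
here is a minimal sketch in plain Java. It deliberately does not use the
Hadoop API at all: it just runs the map, shuffle, and reduce steps in memory
on one machine. The class name `WordCountSketch` and the method names are made
up for illustration, and the tokenizing rule (lowercase, split on non-word
characters) is an assumption, not anything Hadoop prescribes.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory illustration of the Map-Reduce idea; not Hadoop API code.
public class WordCountSketch {

    // Map step: emit a (word, 1) pair for every word in one input line.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle step: group every emitted value under its key.
    // In a real cluster the framework does this between the map and reduce tasks.
    public static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce step: collapse all values for one key into a single count.
    public static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Driver: run map over every line, shuffle, then reduce each key.
    public static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(emitted).entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("the quick brown fox", "the lazy dog", "The end");
        System.out.println(wordCount(lines));
    }
}
```

The Hadoop version has the same shape, with the map and reduce steps moved
into a mapper and a reducer class and the shuffle handled by the framework,
which is why it is worth getting this picture straight before fighting the API.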

~~~
vbv
Thank you for your reply. I have ordered Tom White's book and it should be
here in 5 days, I hope. In the meantime I am reading Jimmy Lin's book
online. One of the problems I am trying to solve is that I have 100+ VMs
collecting logs, and I want a better way to parse those logs and make sense
of them. Because they are all running against a single machine, it's hard to
understand what happened in which VM at any given time. Many thanks for the
information.

