

Ask HN: Setting up a cluster to learn Hadoop (using Amazon EC2)? - tomrod

I've heard from various people that a good way to learn Hadoop is to set up a cluster on Amazon EC2. Do any other HNers have experience with this, or tutorials/resources available to learn more about this before I commit resources to Amazon?

Thanks in advance!
======
stevencorona
Are you looking to learn the ops side, or do you just want to play with Hadoop
to learn how it works and how to create jobs?

If you want to learn the "ops" side, EC2 is great for building clusters,
tearing them down, testing different configurations, etc.

If you just want to learn how to start using Hadoop and writing jobs, don't
even bother setting up a cluster. You can use it directly with Eclipse to run
a single-node setup on your machine. That is way less complex and much easier
to start learning with.

~~~
tomrod
Definitely not looking to learn the ops side--I want hands-on experience with
Hadoop. Is working with Hadoop just an Eclipse plugin then, like with Python?
Can I run it outside of Eclipse (I tend to stick to vim or geany for coding)?

------
robdoherty2
Echoing the previous comment somewhat-- what about Hadoop do you want to
learn?

If you want to get practice with MapReduce patterns themselves, but not spend
much time worrying about the ops associated with setting up Hadoop clusters,
you can play with Elastic MapReduce (EMR) on AWS.

There is even a great Python-based open source tool called mrjob
(<http://packages.python.org/mrjob/index.html>) made by some folks at Yelp
that makes using EMR really easy. There are a bunch of great tutorials already
out there.
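
For anyone who wants a feel for the map-reduce pattern before touching a cluster or EMR, here is a rough sketch of a word count in plain Python. It uses only the standard library, not Hadoop or mrjob; the function names (mapper, shuffle, reducer, run_job) are made up for illustration, and the in-memory shuffle step only stands in for the grouping that a real framework does across machines.

```python
from collections import defaultdict
from itertools import chain


def mapper(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group every emitted value under its key.

    A real framework does this across machines; here it is a dict in memory.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce step: collapse all values seen for one key into a total."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(mapper(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reducer(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog", "the fox"]
    print(run_job(sample))
```

Once the mapper/reducer split makes sense on a toy input like this, the same two-function shape is roughly what you end up writing for a real job, with the framework handling the grouping and distribution.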

