

Infochimps cluster chef - helwr
http://github.com/infochimps/cluster_chef

======
mrflip
DrawnToScale, Infochimps and Cloudera are having a [Hadoop on Chef on Cloud]
hack day this Monday before the Hadoop Summit. Holler (@mrflip) if you're
interested in the hack day, or if you'll be at Hadoop Summit and want a demo.

We've been using cluster_chef for a month+ at Infochimps, and it's awesome.
Want a throwaway 4-machine DB cluster to pound on? bam. done. Need to shut
down two dozen m1.large instances, spin them back up as 30 c1.xlarge to
process a massive CPU-intensive job, then put the whole thing away for the
weekend? It spot-prices, instantiates, provisions and designates all the
nodes, including EBS volume attachment and service discovery. The kind of
thing you'd allot a platoon-day for is now more like an intern-morning, most
of it waiting for the spot-price bid to come through.

------
hooande
From my understanding, the biggest barrier to using hadoop/cassandra is that
they are very difficult to install. Unless you have a lot of experience with
similar technologies, the time spent learning new concepts isn't worth it for
most startups.

Simple question: Does chef make it any easier to install/configure hadoop and
cassandra? Or does this make it easier to deploy them once you already have
things set up?

~~~
mrflip
Much easier to configure and install.

* You get Pig, Wukong and Dumbo out of the box. Don't learn Java hadoop, use
  one of those.
* Standardized setup which really helps if you have to seek IRC or other
  outside help
* Reasonable default parameters tuned for each of the EC2 instance sizes.
* Spinning up and tearing down a cluster is so much easier: experiment away.
  EC2-backed instances help this too.

Now of course you're trading the complexity of setting up Hadoop for the
complexity of setting up chef and poolparty and EC2 -- but those are far less
esoteric, and they either work or they don't.

