Why Hadoop MapReduce needs Scala

posco · on March 28, 2012

If you're interested in Scalding, Edwin Chen's awesome recommendation post can't be beat:

http://blog.echen.me/2012/02/09/movie-recommendations-and-mo...

oacgnol · on March 28, 2012

Having written MR jobs in the original Java format, I can tell you that it's quite valuable to do so before learning a DSL like Scalding or Pig.

I found http://www.michael-noll.com/tutorials/running-hadoop-on-ubun... to be a very good quickstart tutorial for Hadoop.

eshvk · on March 29, 2012

This is very true. It definitely helps to think at multiple layers of abstraction when writing Pig code so that you can optimize it carefully. Ultimately, though, I have been rather unsatisfied with how far I can push Pig. There are performance gains to be had by writing raw Java code (I understand the irony of referring to Java as if it were C), and there are some optimizations that are relatively harder to do in Pig than in Java. I am pretty excited about Scoobi, though. If it does allow me to write in a more concise language without sacrificing the speed or power of the underlying Hadoop API, that would be awesome.

oacgnol · on March 29, 2012

Agreed. Some of the performance tuning with Hadoop can only be done if you know how it works underneath and how the clusters are set up. I'm not sure whether Pig/Scalding/Scoobi can do this already, but it'd be nice if they could abstract tunable settings on a per-job basis as well.

By the way, hook 'em!

michaelochurch · on March 28, 2012

My personal experience is that, while Java has a lot of great technologies associated with it, such as Hadoop, they're hard as hell to learn. The problem isn't the systems. It's that large systems in Java become cluttered with accidental complexity, which makes easy things hard to write and read, and hard things nearly insurmountable. I'm a major fan of any effort to support the best parts of the Java ecosystem on these new, far better JVM languages.