

Why Hadoop MapReduce needs Scala - mischa_u
http://speakerdeck.com/u/agemooij/p/why-hadoop-mapreduce-needs-scala

======
posco
If you're interested in Scalding, Edwin Chen's awesome post on movie
recommendations can't be beat:

[http://blog.echen.me/2012/02/09/movie-recommendations-and-mo...](http://blog.echen.me/2012/02/09/movie-recommendations-and-more-via-mapreduce-and-scalding/)

------
oacgnol
Having written MR jobs in the original Java format, I can tell you that it's
quite valuable to do so before learning a DSL like Scalding or Pig.

I found
[http://www.michael-noll.com/tutorials/running-hadoop-on-ubun...](http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/)
to be a very good quickstart tutorial for Hadoop.

~~~
eshvk
This is very true. It definitely helps to think at multiple layers of
abstraction when writing Pig code so that you can optimize it carefully.
Ultimately though, I have been rather unsatisfied with the limits to which I
can push Pig. There are performance gains to be had by writing raw Java code
(I understand the irony of referencing Java as if it were C), and there are some
optimizations you can do in Java that are relatively harder to do in Pig.
I am pretty excited about Scoobi though. If it does allow me to write in a
more concise language without sacrificing speed or power of the underlying
Hadoop API, that would be awesome.

~~~
oacgnol
Agreed. Some of the performance tuning with Hadoop can only be done if you
know how it works underneath and how the clusters are set up. I'm
also not sure whether Pig/Scalding/Scoobi can do this already, but it'd be
nice if they could abstract tunable settings on a per-job basis as well.

By the way, hook 'em!

------
michaelochurch
My personal experience is that, while Java has a lot of great technologies
associated with it, such as Hadoop, they're hard as hell to learn. The problem
isn't the systems. It's that large systems in Java become cluttered with
accidental complexity, which makes easy things hard to write and read and hard
things nearly insurmountable. I'm a major fan of any effort to support the
best parts of the Java ecosystem on these new, far better, JVM languages.

