This is very true. It definitely helps to think at multiple layers of abstraction when writing Pig code so that you can optimize it carefully. Ultimately, though, I have been rather unsatisfied with how far I can push Pig. There are performance gains to be had by writing raw Java code (I understand the irony of referencing Java as if it were C), and some optimizations are relatively harder to do in Pig than in Java. I am pretty excited about Scoobi, though. If it does allow me to write in a more concise language without sacrificing the speed or power of the underlying Hadoop API, that would be awesome.
Agree. Some of the performance tuning with Hadoop can only be done if you know how it works underneath and how the clusters are set up. I'm not sure whether Pig/Scalding/Scoobi can do this already, but it'd also be nice if they could abstract tunable settings on a per-job basis.
My personal experience is that, while Java has a lot of great technologies associated with it, such as Hadoop, they're hard as hell to learn. The problem isn't the systems. It's that large systems in Java become cluttered with accidental complexity, which makes easy things hard to write and read, and hard things nearly insurmountable. I'm a major fan of any effort to support the best parts of the Java ecosystem on these new, far better JVM languages.
http://blog.echen.me/2012/02/09/movie-recommendations-and-mo...