
Goodbye MapReduce, Hello Cascading - soundsop
http://blog.rapleaf.com/dev/?p=33
======
jacobscott
This is cool. For anyone who has cascading experience: what tuning can you do
for the Hadoop jobs/does it autotune? How does performance compare to running
multiple MapReduce jobs in sequence?

It would be awesome to see this compared to Microsoft's Dryad
(<http://research.microsoft.com/research/sv/Dryad/>) which also supports DAG-
like large scale computing. I don't think Dryad is publicly available
though...

------
mmcgrana
I'd be interested in learning what computations they do that "require up to
TEN MapReduce jobs to execute in sequence". As a point of comparison, the
Google MapReduce paper from 2004 says that their production web search
indexing system "runs as a sequence of five to ten MapReduce operations."
[labs.google.com/papers/mapreduce-osdi04.pdf]

~~~
fizx
I routinely run composite jobs of thousands of map-reduces. Imagine an
iterative machine learning algorithm where each epoch is a map-reduce job.
Imagine a meta-job that runs dozens of these. It's not too bad with Hadoop's
job control system. Where Cascading really would shine (haven't tried it) is
when these jobs require data joins.

Edit: This could be a case of map-reduce fail(tm). But I don't think so. I
imagine Google's PageRank computation takes more than ten iterations to
converge, and each iteration is a map-reduce job.
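The pattern described above, where each iteration of an algorithm like PageRank is one map-reduce job and you loop until convergence, can be sketched in plain Python. This is a hypothetical single-machine illustration, not Hadoop or Cascading code: each loop pass does a "map" phase (emit rank contributions along outlinks) and a "reduce" phase (sum contributions per page), standing in for one submitted MapReduce job.

```python
from collections import defaultdict

def pagerank(links, damping=0.85, tol=1e-6, max_iters=100):
    """Iterate map/reduce passes until ranks converge.

    links: dict mapping each page to its list of outlinks.
    Returns (ranks, number_of_iterations). Each loop iteration
    corresponds to one map-reduce job in the chained-jobs setup
    discussed above.
    """
    pages = set(links) | {d for outs in links.values() for d in outs}
    n = len(pages)
    ranks = {p: 1.0 / n for p in pages}

    for iteration in range(1, max_iters + 1):
        # "Map" phase: every page emits rank/len(outlinks)
        # to each of its outlinks.
        contribs = defaultdict(float)
        for page, outs in links.items():
            if outs:
                share = ranks[page] / len(outs)
                for dest in outs:
                    contribs[dest] += share

        # "Reduce" phase: sum the contributions per page and
        # apply the damping factor.
        new_ranks = {p: (1 - damping) / n + damping * contribs[p]
                     for p in pages}

        # Convergence check decides whether to chain another job.
        delta = sum(abs(new_ranks[p] - ranks[p]) for p in pages)
        ranks = new_ranks
        if delta < tol:
            break
    return ranks, iteration
```

On a real cluster each pass would be a separate job submission (the ranks written to HDFS between passes), which is exactly the kind of driver-loop bookkeeping that job-control frameworks handle.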

------
mikkom
Looks like someone is trying to implement Erlang in Java.

