Say you wanted to generate a heatmap using MapReduce jobs. How would you do it? You'd probably need something like this:
1. Map location data points to (region -> weight)
2. Reduce (region -> weight) to (region -> sum of weights)
3. Map data points to (region -> 1)
4. Reduce (region -> 1) to (region -> sum of points in region)
5. Shuffle output of #2 and #4
6. Reduce (region -> sum of points) and (region -> sum of weights) to (region -> average weight)
The Pipeline API makes it easy to describe the dependencies between these separate MR jobs and to wait for each segment to complete before triggering the next, and it lets you reuse this logic as part of a larger computational workflow.
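To make the data flow between the six steps concrete, here's a rough in-memory sketch in plain Python. It is not the Pipeline API or the Mapper framework: `map_reduce`, `region_of`, and `heatmap` are hypothetical helpers made up for illustration, and the 1x1 grid bucketing is just an assumption.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Tiny in-memory stand-in for one MR job: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        key, value = mapper(record)
        groups[key].append(value)
    return {key: reducer(values) for key, values in groups.items()}


def region_of(point):
    # Hypothetical bucketing: truncate (x, y) to a 1x1 grid cell.
    x, y, _weight = point
    return (int(x), int(y))


def heatmap(points):
    # Steps 1-2: (region -> weight), then (region -> sum of weights).
    weight_sums = map_reduce(points, lambda p: (region_of(p), p[2]), sum)
    # Steps 3-4: (region -> 1), then (region -> number of points).
    counts = map_reduce(points, lambda p: (region_of(p), 1), sum)
    # Steps 5-6: line up both outputs by region and divide.
    return {region: weight_sums[region] / counts[region] for region in counts}


# Each point is (x, y, weight).
points = [(0.2, 0.7, 4.0), (0.9, 0.1, 2.0), (3.5, 1.2, 5.0)]
print(heatmap(points))
```

The last step can't start until both earlier jobs have finished, and that is the kind of dependency you'd otherwise have to track by hand across separate MR jobs.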
The Mapper framework/MapReduce integration part is not ready yet, but we're getting there. Release early/often~
ps. For those of you who know how to do a heatmap in a single MR: I'm just trying to demonstrate why you may need to pass inputs/outputs between MR jobs.
When you break your application up into a set of tasks, which GAE pretty much requires, it's hard to execute them in any particular order. Your program is not executing sequentially, so there aren't any while loops or if-then statements to make sure calculations happen in a certain sequence. So you need a higher-level kind of system that can order everything. Node.js, which is a similar sort of event-driven system (but not distributed), has a number of similar libraries listed at https://github.com/ry/node/wiki/modules under "Flow control / Async goodies".
What I mean by this is that we're not doing the same thing as Cascading (http://www.cascading.org/), which requires you to transform your problem into the tuple-space domain. Stream processing frameworks like Cascading are for green-field implementations that maximize incremental performance.
On the other hand, the Pipeline API is task oriented. Developers use it with a procedural approach. The focus is on parameter and return value passing and scheduling. It's easy to reuse your existing code in this framework. Think of it as something closer to a parallelizable Bash than a data processing framework.
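As a loose analogy for the task-oriented style (parameters in, return values out, with a scheduler deciding when each step can run), here's a small sketch using Python's standard `concurrent.futures`. This is not the Pipeline API itself. The function names are made up for illustration, and it runs on local threads rather than distributed tasks.

```python
from concurrent.futures import ThreadPoolExecutor


# Ordinary functions standing in for existing code you want to reuse.
def sum_weights(weights):
    return sum(weights)


def count_points(weights):
    return len(weights)


def average(total, count):
    return total / count


def run_workflow(weights):
    with ThreadPoolExecutor(max_workers=2) as pool:
        # These two steps don't depend on each other, so they can run in parallel.
        total_future = pool.submit(sum_weights, weights)
        count_future = pool.submit(count_points, weights)
        # The final step waits on both results before it runs.
        return average(total_future.result(), count_future.result())


print(run_workflow([4.0, 2.0, 6.0]))
```

The functions themselves know nothing about scheduling. The workflow only describes which return values feed which parameters, which is the sense in which this is closer to a shell script than to a tuple-stream model.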