

Why Isn't There a Map-Reduce Language? - stuartaxelowen

This has been eating at me lately: emerging cloud data processing systems seem increasingly similar. They rely on mostly functional transformations of datasets, arranged in directed acyclic graphs and run on clusters. For example:

- Apache Spark
- Apache Storm
- Microsoft Dryad
- Apache Hadoop
- etc.

Beyond that, they differ in how data moves in, out, and around each system, but that seems less important. If we had a single language, we could get greater portability between cluster computing solutions, and more work could go into optimizing performance instead of designing the next way of talking about MapReduce. Apache Pig exists, but it does not seem general enough.

Is there anything else out there?
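To make the idea concrete, here is a minimal sketch, assuming a hypothetical backend-neutral API (the `Pipeline`, `Step` and `run_local` names are invented for illustration and belong to no existing system). The job is described purely as data, a linear chain of functional steps (the simplest DAG), and a separate executor decides how to run it. A portable language would keep the description fixed and swap the executor for one targeting Spark, Hadoop, etc.; here the only executor runs everything in memory on one machine.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from functools import reduce
from typing import Callable, List

# Hypothetical backend-neutral plan: each step is a pure function applied
# to a dataset. Nothing here is tied to a particular cluster system.

@dataclass(frozen=True)
class Step:
    kind: str      # "map", "flat_map", "filter", or "reduce_by_key"
    fn: Callable

@dataclass
class Pipeline:
    steps: List[Step] = field(default_factory=list)

    def _add(self, kind: str, fn: Callable) -> "Pipeline":
        # Return a new plan instead of mutating this one.
        return Pipeline(self.steps + [Step(kind, fn)])

    def map(self, fn): return self._add("map", fn)
    def flat_map(self, fn): return self._add("flat_map", fn)
    def filter(self, pred): return self._add("filter", pred)
    def reduce_by_key(self, fn): return self._add("reduce_by_key", fn)

def run_local(pipeline: Pipeline, data: list) -> list:
    """Hypothetical local executor: runs the plan sequentially in memory."""
    for step in pipeline.steps:
        if step.kind == "map":
            data = [step.fn(x) for x in data]
        elif step.kind == "flat_map":
            data = [y for x in data for y in step.fn(x)]
        elif step.kind == "filter":
            data = [x for x in data if step.fn(x)]
        elif step.kind == "reduce_by_key":
            # Group (key, value) pairs, then fold each group's values.
            groups = defaultdict(list)
            for k, v in data:
                groups[k].append(v)
            data = [(k, reduce(step.fn, vs)) for k, vs in groups.items()]
        else:
            raise ValueError(f"unknown step kind: {step.kind}")
    return data

# The classic word count, written once against the neutral API.
word_count = (
    Pipeline()
    .flat_map(lambda line: line.split())
    .map(lambda w: (w.lower(), 1))
    .reduce_by_key(lambda a, b: a + b)
)

print(sorted(run_local(word_count, ["the cat", "The dog"])))
# [('cat', 1), ('dog', 1), ('the', 2)]
```

The design choice worth noting is the split between plan and executor: because the steps are plain functions with no shared state, a distributed executor could partition the data and run each step in parallel without changing the plan.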
======
scott_s
I work on Streams at IBM. We target streaming, not MapReduce, but we may be
closer to what you're thinking of. Our language, SPL, has three main
abstractions: operators, tuples and streams.
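As a rough illustration of those three abstractions, here is a Python sketch. It is not SPL syntax, and the operator names (`source`, `keep`, `transform`) are hypothetical. Assume a tuple is a record with named attributes (a dict here), a stream is a possibly unbounded iterator of tuples, and an operator consumes a stream and produces a new one.

```python
import itertools
from typing import Callable, Dict, Iterable, Iterator

# Hypothetical Python analogue of operators, tuples and streams (not SPL).
Record = Dict[str, object]    # a "tuple": named attributes
Stream = Iterator[Record]     # a "stream": lazily produced tuples

def source(lines: Iterable[str]) -> Stream:
    """Source operator: turn raw log lines into tuples."""
    for line in lines:
        level, _, msg = line.partition(" ")
        yield {"level": level, "msg": msg}

def keep(stream: Stream, pred: Callable[[Record], bool]) -> Stream:
    """Filtering operator: pass through only tuples matching pred."""
    for t in stream:
        if pred(t):
            yield t

def transform(stream: Stream, fn: Callable[[Record], Record]) -> Stream:
    """Per-tuple transformation operator."""
    for t in stream:
        yield fn(t)

def is_error(t: Record) -> bool:
    return t["level"] == "ERROR"

# Operators compose into a graph; nothing runs until tuples are pulled.
lines = ["ERROR disk full", "INFO ok", "ERROR timeout"]
shout = transform(keep(source(lines), is_error),
                  lambda t: {**t, "msg": str(t["msg"]).upper()})
print(list(shout))
# [{'level': 'ERROR', 'msg': 'DISK FULL'}, {'level': 'ERROR', 'msg': 'TIMEOUT'}]

# Because operators are lazy, the same graph works on an unbounded source.
endless = itertools.cycle(["ERROR a", "INFO b"])
print(list(itertools.islice(keep(source(endless), is_error), 2)))
# [{'level': 'ERROR', 'msg': 'a'}, {'level': 'ERROR', 'msg': 'a'}]
```

The second usage is the part that distinguishes a streaming model from batch MapReduce: the input never ends, so each operator must work tuple by tuple rather than over a complete dataset.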

Development resources:
[https://developer.ibm.com/streamsdev/](https://developer.ibm.com/streamsdev/)

Academic paper on the language, SPL:
[http://www.scott-a-s.com/files/ibm_tr_2014.pdf](http://www.scott-a-s.com/files/ibm_tr_2014.pdf)

Some code, to give you more of a feel for what it looks like; project head:
[https://github.com/scotts/streamsx.demo.logwatch](https://github.com/scotts/streamsx.demo.logwatch);
actual source code:
[https://github.com/scotts/streamsx.demo.logwatch/blob/master...](https://github.com/scotts/streamsx.demo.logwatch/blob/master/language/com.ibm.streamsx.demo.logwatch.language/LogWatch.spl)

