It's been talked about a few times in this thread already, so I'll try not to rehash too much. We think Spark is way cool but disagree that it's a complete replacement for MapReduce. People elsewhere in this thread have described pretty well their use cases that didn't work with Spark.

My experience is that Hive is pretty widely used; I have a lot fewer data points about Pig. We're not particularly attached to either of these. That sentence was more to indicate that we'd be adding some sort of interfaces for specific data types.

Seems like we got our history wrong here. When was YARN added? We'll add a correction.

Hope this helps clear things up.

YARN was introduced with Hadoop 2.0, by MAPREDUCE-279: https://issues.apache.org/jira/browse/MAPREDUCE-279

Note that YARN also supports Docker containers as of Hadoop 2.6...

YARN-1964: Create Docker analog of the LinuxContainerExecutor in YARN https://issues.apache.org/jira/browse/YARN-1964


For the most part, however, I'll stick with http://www.dbms2.com/2014/04/30/spark-on-fire/. (The mid-2014 PR blitz I thought I was engineering didn't actually happen.)

