Hacker News

This sounds bizarre. Spark is the replacement for MapReduce, with Map and Reduce being just 2 of its 15 (?) primitives. Who needs another MapReduce engine?
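To illustrate the point about primitives, here is a minimal plain-Python sketch (not Spark itself; the function names merely mirror the RDD API for illustration) of two operations Spark offers beyond classic map and reduce, flatMap and reduceByKey, composed into the canonical word-count job:

```python
from collections import defaultdict
from functools import reduce

def flat_map(f, xs):
    # flatMap: apply f to each element, then flatten the resulting lists
    return [y for x in xs for y in f(x)]

def reduce_by_key(f, pairs):
    # reduceByKey: group (key, value) pairs by key, then fold values per key
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return {k: reduce(f, vs) for k, vs in groups.items()}

# Word count: one flatMap, one map (the comprehension), one reduceByKey
lines = ["spark has many primitives", "map and reduce are just two"]
words = flat_map(str.split, lines)
pairs = [(w, 1) for w in words]
counts = reduce_by_key(lambda a, b: a + b, pairs)
print(counts["map"])  # -> 1
```

In Spark itself the same pipeline would be three chained RDD transformations; the point is that map and reduce are two entries in a much larger operator set (filter, join, groupByKey, cogroup, and so on).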

The bit about "We'll get around to Hive and Pig eventually" is also odd, as they're approaching legacy status too. (Albeit not there yet; please recall that I'm somewhat of an Impala skeptic, and of course there's the Tez booster to Hive.)

Also bizarre are some of the history claims in the article, e.g. that YARN is part of more or less original Hadoop.

It's been talked about a few times in this thread already, so I'll try not to rehash too much. We think Spark is way cool, but disagree that it's a complete replacement for MapReduce. There were people elsewhere in this thread who described pretty well the use cases that didn't work for them with Spark.

My experience is that Hive is pretty widely used; I have a lot fewer data points about Pig. We're not particularly attached to either of these. That sentence was more to indicate that we'd be adding some sort of interface for specific data types.

Seems like we got our history wrong here; we'll add a correction about when YARN was added.

Hope this helps clear things up.

YARN was introduced with Hadoop 2.0, by MAPREDUCE-279: https://issues.apache.org/jira/browse/MAPREDUCE-279

Note that YARN also supports Docker containers as of Hadoop 2.6...

YARN-1964: Create Docker analog of the LinuxContainerExecutor in YARN https://issues.apache.org/jira/browse/YARN-1964


For the most part, however, I'll stick with http://www.dbms2.com/2014/04/30/spark-on-fire/ (the mid-2014 PR blitz I thought I was engineering didn't actually happen).
