
Hadoop 3: Comparison with Hadoop 2 and Spark - viktoriia_sh
https://activewizards.com/blog/hadoop-3-comparison-with-hadoop-2-and-spark/
======
trengrj
This article (like a lot of others) confuses Spark and Hadoop. To clarify:

Spark is a data processing engine. It works with many persistent storage
layers but provides none itself, so it can run on object storage (S3 etc.)
as well as on HDFS (the Hadoop file system).

The right comparison is Spark vs. MapReduce, or Spark's SQL engine vs. the
other SQL engines that run on Hadoop (Impala, Hive LLAP, Presto).

YARN is used for resource scheduling in the Hadoop world, but Spark has other
options such as its built-in standalone scheduler, Mesos, and now Kubernetes.
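
To make the layering concrete, here is a minimal sketch (mine, not from the
article) of how the same Spark job could be submitted against different
schedulers and storage layers. Only the --master value and the input URI
change. The script name job.py, the host names, the bucket, and the paths are
all hypothetical placeholders; the helper spark_submit_args is made up for
illustration and just builds the command line rather than running it.

```python
# Sketch: one Spark job, several scheduler + storage combinations.
# job.py, the hosts, the bucket, and the paths are hypothetical placeholders.

def spark_submit_args(master, input_uri, app="job.py"):
    """Build a spark-submit argument list; only master and input URI vary."""
    return ["spark-submit", "--master", master, app, input_uri]

combos = [
    # "Spark on Hadoop": YARN for scheduling, HDFS for storage
    ("yarn", "hdfs:///data/events"),
    # Kubernetes for scheduling, S3 object storage via the s3a connector
    ("k8s://https://k8s.example.com:6443", "s3a://example-bucket/events"),
    # Spark's own standalone scheduler, still reading from S3
    ("spark://master.example.com:7077", "s3a://example-bucket/events"),
    # Mesos for scheduling, HDFS for storage
    ("mesos://mesos.example.com:5050", "hdfs:///data/events"),
]

for master, uri in combos:
    print(" ".join(spark_submit_args(master, uri)))
```

Only the first combination involves Hadoop components on both sides, which is
why "Spark vs. Hadoop" is not a like-for-like comparison.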

When people say Hadoop, the technical meaning is often HDFS + Hive + YARN +
MapReduce (for legacy) + implicitly Spark on Hadoop. All Hadoop distributions
support Spark.

So the claim that Spark is faster than Hadoop is like saying Postgres is
faster than Google Cloud.

------
apichenok
This is useful. Thanks for the comparison table, guys.

------
DossTheFlame
Nice article, thank you. Could you make comparison infographics for the main
data science languages and frameworks, please?

