

Bagel: Large-scale graph processing on Spark  - yarapavan
https://github.com/mesos/spark/pull/48

======
rxin
I dogfooded Spark by writing a distributed machine learning aligner. It is
tightly integrated with Scala. It obviously still has a long way to go, but it
does provide some very nice properties, e.g.:

\- Supports the map-reduce style computation model (i.e. map, reduce, groupByKey
style computations). It is important to differentiate map-reduce
infrastructure (e.g. Google MapReduce, Hadoop MapReduce) from the computation
model (e.g. map/reduce functions).

\- For many use cases, minimal changes are required to go from serial code to
distributed, parallelized code.

\- A very nice REPL interface that is convenient for exploring data.

\- Much better support for iterative (machine learning) computations through
in-memory caching of data sets and minimal per-iteration scheduling overhead.

\- Good integration with existing infrastructure, e.g. Hadoop HDFS.
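The computation model in the first point can be sketched with plain Scala collections; Spark's RDD API exposes analogous operations (map, groupByKey, reduceByKey) over distributed data sets. The word count below is an illustration of the model only, not actual Spark code:

```scala
// Word count in the map-reduce computation model, on plain Scala collections.
val lines = List("to be or not to be", "to do is to be")

val counts = lines
  .flatMap(_.split(" "))        // map phase: emit one record per word
  .map(word => (word, 1))       // key each word with a count of 1
  .groupBy(_._1)                // shuffle: group records by key
  .map { case (word, pairs) =>  // reduce phase: sum the counts per key
    (word, pairs.map(_._2).sum)
  }

println(counts("to"))  // prints 4
```

In Spark, the same pipeline would start from an RDD (e.g. `sc.textFile(...)`) and use `reduceByKey(_ + _)` for the final step; calling `.cache()` on an intermediate RDD keeps it in memory across repeated scans, which is what makes the iterative workloads mentioned above fast.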

To find out more about the framework: <http://www.spark-project.org/>

~~~
HairyFotr
Is Spark really as fast in real use as the graph on the site claims? I don't
think I've ever seen anything beat Hadoop by that margin before.

~~~
danjo
We currently use Spark for generating reports that scan the same data set
multiple times. We got a 20-30x performance gain over Hadoop+Hive. Spark's
ability to keep data sets in memory works very well for our use case.

