
Storm : a Realtime Computation System Similar to Hadoop - EzGraphs
https://github.com/nathanmarz/storm/wiki
======
terhechte
Previous Discussion: <http://news.ycombinator.com/item?id=3014039>

------
Xorlev
It's barely similar; what they share is being fault-tolerant systems for
scaling computation. Storm provides real-time streaming computation. Your
spouts provide infinite streams of tuples, small objects that hold serialized
values of other types. Bolts then consume each tuple and emit 0 or more new
tuples from it.

You could liken it to a streaming MapReduce whose steps you can rearrange into
a directed graph of data flows, called a topology.
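
To make the spout/tuple/topology idea concrete, here is a minimal sketch in
plain Python. It is not Storm's API: every name in it (sentence_spout,
split_bolt, count_bolt, run_topology) is hypothetical, generators stand in for
distributed streams, and the "topology" is just a linear chain in one process.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical names throughout; this only imitates the shape of a topology.


def sentence_spout() -> Iterator[Tuple[str]]:
    """Spout: an unbounded source of one-field tuples."""
    sentences = ["the cow jumped", "", "over the moon"]
    i = 0
    while True:  # never terminates on its own, like a live stream
        yield (sentences[i % len(sentences)],)
        i += 1


def split_bolt(stream: Iterable[Tuple[str]]) -> Iterator[Tuple[str]]:
    """Bolt: emits zero or more (word,) tuples for each input tuple."""
    for (sentence,) in stream:
        for word in sentence.split():  # an empty sentence emits nothing
            yield (word,)


def count_bolt(stream: Iterable[Tuple[str]]) -> Iterator[Tuple[str, int]]:
    """Bolt: keeps running counts and emits (word, count) per input tuple."""
    counts: Dict[str, int] = {}
    for (word,) in stream:
        counts[word] = counts.get(word, 0) + 1
        yield (word, counts[word])


def run_topology(limit: int) -> List[Tuple[str, int]]:
    """Wire spout -> split -> count and take the first `limit` outputs."""
    return list(islice(count_bolt(split_bolt(sentence_spout())), limit))
```

Since the spout never ends, the caller decides how much output to take, e.g.
run_topology(7). A real topology would also allow branching and merging
between steps, and would run them in parallel across machines.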

Re: Spark, it's a totally different paradigm: something like a MapReduce that
takes advantage of memory locality, where Hadoop takes advantage of disk
locality. Hive on Spark is a pretty beastly system.

------
_jmar777
Storm is actually not similar to Hadoop at all. I think this title resulted
from a misreading of the README, which states: "Storm is a distributed
realtime computation system. Similar to how Hadoop provides a set of general
primitives for doing batch processing, Storm provides a set of general
primitives for doing realtime computation."

/nitpick

------
dkhenry
So which is better, Storm[1] or Spark[2]?

[1] <http://storm-project.net/>
[2] <http://www.spark-project.org/>

~~~
cf
They are for different use cases. Storm is more for realtime data, like
analyzing Twitter or bidding on ads. Spark is very much an in-memory
map-reduce, designed for batch computations. Spark makes sense when you just
have a lot of data you want to analyze or get statistics on.

------
EzGraphs
JRuby DSL and Integration for Storm here:
<https://github.com/colinsurprenant/redstorm>

------
t-crayford
How is this better than Hadoop?

~~~
lmirosevic
Apples and oranges. Hadoop is a framework for parallel batch processing of
existing data, like server logs and sales statistics. Storm is a framework for
processing streaming data in realtime, like Twitter's live stream of tweets.
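
The batch vs. streaming difference can be sketched with a toy word count in
plain Python. This uses neither Hadoop nor Storm; the function names are
made up for illustration, and everything runs in a single process.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List


def batch_word_count(lines: List[str]) -> Dict[str, int]:
    """Batch style: all the data already exists; one answer at the end."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return dict(counts)


def streaming_word_count(lines: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Streaming style: yield an updated answer after every arriving item."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
        yield dict(counts)  # snapshot of the running totals so far
```

The batch function needs the full list before it returns anything. The
streaming one accepts any iterable, including one that never ends, and always
has a current answer available.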

