This looks interesting. Questions:

(1) What do you mean by a processing topology -- is this a data dependency graph?

(2) How does one define a topology? Is this specified at deployment time via the jar file, or can it be configured separately and on the fly?

(3) Must records be processed in time order, or can they be sorted and aggregated on some other key?




1. A processing topology is a graph of how data flows through workers. The roots of the topology are "spouts" that emit tuples into the topology (an example spout is one that reads off of a Kestrel queue). The other nodes take in tuples as input and emit new tuples as output. Each node in the graph executes as some number of parallel threads spread across the cluster.
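
To make the graph shape concrete, here's a minimal sketch using the Java TopologyBuilder API; the spout and bolt classes (SentenceSpout, SplitSentenceBolt, WordCountBolt) are hypothetical stand-ins, the integer arguments are parallelism hints, and exact signatures can vary a bit between Storm versions:

    import backtype.storm.topology.TopologyBuilder;
    import backtype.storm.tuple.Fields;

    public class WordCountTopology {
        public static void main(String[] args) {
            // Wire spouts and bolts into a topology graph.
            TopologyBuilder builder = new TopologyBuilder();

            // Root of the graph: a spout emitting sentence tuples
            // (e.g. read off a queue); 5 is a parallelism hint.
            builder.setSpout("sentences", new SentenceSpout(), 5);

            // Bolts take tuples as input and emit new tuples downstream.
            builder.setBolt("split", new SplitSentenceBolt(), 8)
                   .shuffleGrouping("sentences");   // randomly distribute tuples

            builder.setBolt("count", new WordCountBolt(), 12)
                   .fieldsGrouping("split", new Fields("word"));  // partition by word
        }
    }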

2. To deploy a topology, you give the master machine a jar containing all your code along with the topology object. The topology can be constructed dynamically at submit time, but once it's submitted it's static.
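
Continuing the builder from the sketch above, submission might look roughly like this (the topology name and worker count are made up, and method names may differ slightly by version):

    // inside main(), after wiring up the builder as above
    // (uses backtype.storm.Config and backtype.storm.StormSubmitter)
    Config conf = new Config();
    conf.setNumWorkers(4);   // hypothetical worker count

    // The topology object is built at submit time, then stays fixed
    // until it is killed and a new one is deployed.
    StormSubmitter.submitTopology("word-count", conf, builder.createTopology());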

3. You receive the records in the order the spouts emit them. Things like sorting, windowed joins, and aggregations are built on top of the primitives that Storm provides. There's going to be a lot of room for higher-level abstractions on top of Storm, à la Cascalog/Pig/Cascading/Hive on top of Hadoop. We've already started designing a Clojure-based DSL for Storm.
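
As an illustration of building an aggregation on those primitives, a running word count is just user code in a bolt: Storm delivers tuples one at a time and the bolt keeps its own state. This is a sketch against the Java bolt interface; the base class and helper names may differ between Storm versions.

    import java.util.HashMap;
    import java.util.Map;
    import backtype.storm.topology.BasicOutputCollector;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseBasicBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;

    // The aggregation itself is user code: Storm just hands tuples
    // to execute() in the order the upstream component emitted them.
    public class WordCountBolt extends BaseBasicBolt {
        private final Map<String, Integer> counts = new HashMap<String, Integer>();

        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            String word = tuple.getStringByField("word");
            Integer count = counts.get(word);
            count = (count == null) ? 1 : count + 1;
            counts.put(word, count);
            collector.emit(new Values(word, count));  // emit the running count
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word", "count"));
        }
    }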



