This looks interesting. Questions:

(1) What do you mean by a processing topology -- is this a data dependency graph?

(2) How does one define a topology? Is this specified at deployment time via the jar file, or can it be configured separately and on the fly?

(3) Must records be processed in time order, or can they be sorted and aggregated on some other key?




1. A processing topology is a graph of how data flows through workers. The roots of the topology are "spouts" that emit tuples into the topology (an example spout is one that reads off of a Kestrel queue). The other nodes take in tuples as input and emit new tuples as output. Each node in the graph executes as some number of parallel threads spread across the cluster.
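
To make the graph shape concrete, here's a minimal sketch using the Java TopologyBuilder API; the spout and bolt classes (SentenceSpout, SplitSentenceBolt, WordCountBolt) are hypothetical stand-ins, the integer arguments are parallelism hints, and exact signatures can vary a bit between Storm versions:

    import backtype.storm.topology.TopologyBuilder;
    import backtype.storm.tuple.Fields;

    public class WordCountTopology {
        public static void main(String[] args) {
            // Wire spouts and bolts into a topology graph.
            TopologyBuilder builder = new TopologyBuilder();

            // Root of the graph: a spout emitting sentence tuples
            // (e.g. read off a queue); 5 is a parallelism hint.
            builder.setSpout("sentences", new SentenceSpout(), 5);

            // Bolts take tuples as input and emit new tuples downstream.
            builder.setBolt("split", new SplitSentenceBolt(), 8)
                   .shuffleGrouping("sentences");   // randomly distribute tuples

            builder.setBolt("count", new WordCountBolt(), 12)
                   .fieldsGrouping("split", new Fields("word"));  // partition by word
        }
    }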

2. To deploy a topology, you give the master machine a jar containing all your code along with the topology object. The topology can be constructed dynamically at submit time, but once it's submitted it's static.
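
Continuing the builder from the sketch above, submission might look roughly like this (the topology name and worker count are made up, and method names may differ slightly by version):

    // inside main(), after wiring up the builder as above
    // (uses backtype.storm.Config and backtype.storm.StormSubmitter)
    Config conf = new Config();
    conf.setNumWorkers(4);   // hypothetical worker count

    // The topology object is built at submit time, then stays fixed
    // until it is killed and a new one is deployed.
    StormSubmitter.submitTopology("word-count", conf, builder.createTopology());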

3. You receive the records in the order the spouts emit them. Things like sorting, windowed joins, and aggregations are built on top of the primitives that Storm provides. There's going to be a lot of room for higher-level abstractions on top of Storm, à la Cascalog/Pig/Cascading/Hive on top of Hadoop. We've already started designing a Clojure-based DSL for Storm.
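
As an illustration of building an aggregation on those primitives, a running word count is just user code in a bolt: Storm delivers tuples one at a time and the bolt keeps its own state. This is a sketch against the Java bolt interface; the base class and helper names may differ between Storm versions.

    import java.util.HashMap;
    import java.util.Map;
    import backtype.storm.topology.BasicOutputCollector;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseBasicBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;

    // The aggregation itself is user code: Storm just hands tuples
    // to execute() in the order the upstream component emitted them.
    public class WordCountBolt extends BaseBasicBolt {
        private final Map<String, Integer> counts = new HashMap<String, Integer>();

        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            String word = tuple.getStringByField("word");
            Integer count = counts.get(word);
            count = (count == null) ? 1 : count + 1;
            counts.put(word, count);
            collector.emit(new Values(word, count));  // emit the running count
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word", "count"));
        }
    }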



