
History of Apache Storm and Lessons Learned - plinkplonk
http://nathanmarz.com/blog/history-of-apache-storm-and-lessons-learned.html
======
jtheory
I think I get the concept (and it's interesting); but I'm a bit baffled by the
terminology:

> That led me to the idea of "spouts" and "bolts" – a spout produces brand new
> streams, and a bolt takes in streams as input and produces streams as
> output.

Spout I can grasp; why "bolt"? Why these two, together? I'd have really
appreciated an extra sentence or two here as to why he chose these terms --
naming is so crucial, and the fact that both "spouts" and "bolts" sound like
sources/producers to me (one of water, another of electricity...?) made it
harder to grasp how Storm works.

~~~
siddboots
I always assumed that "bolt" refers to the threaded fasteners, not electrical
bolts. The analogy they are going for is that they are the components
responsible for "bolting" spouts together (although, the mixing of metaphors
bothers me a little bit... why are we using bolts to fasten fluids together?)

To be super clear about it, a "topology" in Storm is a directed acyclical
graph. The nodes are called "bolts" and the edges are called "spouts".

~~~
dwenzek
> a "topology" in Storm is a directed acyclical graph. The nodes are called
> "bolts" and the edges are called "spouts"

Not really. Edges are streams and nodes are either spouts or bolts. Spouts
have no predecessors and inject external data into the system. Bolts gather
one or more input streams and produce zero or more output streams.
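
A rough sketch of that model in plain Python, in case it helps. This is not Storm's API; the `Topology` class and its method names are made up purely for illustration, and everything runs in a single process.

```python
from collections import defaultdict


class Topology:
    """Toy model: nodes are spouts or bolts, edges are streams."""

    def __init__(self):
        self.spouts = {}                  # name -> iterable of source tuples
        self.bolts = {}                   # name -> function(tuple) -> list of tuples
        self.streams = defaultdict(list)  # producer name -> consumer bolt names

    def add_spout(self, name, source):
        # A spout has no predecessors; it injects external data.
        self.spouts[name] = source

    def add_bolt(self, name, fn, inputs):
        # A bolt subscribes to one or more input streams.
        self.bolts[name] = fn
        for producer in inputs:
            self.streams[producer].append(name)

    def run(self):
        emitted = defaultdict(list)       # node name -> everything it emitted

        def emit(node, tup):
            emitted[node].append(tup)
            for consumer in self.streams[node]:
                for out in self.bolts[consumer](tup):
                    emit(consumer, out)

        for name, source in self.spouts.items():
            for tup in source:
                emit(name, tup)
        return dict(emitted)


topo = Topology()
topo.add_spout("sentences", ["the cat", "the dog"])
topo.add_bolt("split", lambda s: s.split(), inputs=["sentences"])
topo.add_bolt("shout", lambda w: [w.upper()], inputs=["split"])
topo.add_bolt("sink", lambda w: [], inputs=["shout"])  # a bolt with zero outputs
result = topo.run()
```

Here "sentences" is the only spout, and "sink" is a bolt that consumes a stream without producing one, so it never shows up in `result`.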

These names are a bit confusing, but only during the very first steps of a
project. Every developer quickly grasps the concepts.

I nevertheless have a concern regarding spouts and bolts. In practice, when a
topology is refactored, bolts frequently have to be transformed into spouts.
This occurs when a topology is split in two to ease deployment, and when a
persistent queue is added along a stream. This highlights that the distinction
between spouts and bolts is a bit artificial.

------
discardorama
Great read.

An aside: when Nathan started work on Storm, there was a competing streaming
platform being worked on inside Yahoo, called "S4". It had many similarities
to Storm, and some key differences.

It is still in the incubator:
[http://incubator.apache.org/s4/](http://incubator.apache.org/s4/) (I have no
connection with S4, just know some people who worked on it)

~~~
dimon
The key advantage Storm brought to the table was, as briefly mentioned in the
article, the at-least-once processing guarantee it offers thanks to its
efficient tracking of tuples and their descendants (see
[https://storm.incubator.apache.org/documentation/Guaranteein...](https://storm.incubator.apache.org/documentation/Guaranteeing-message-processing.html)).
To my knowledge S4 offers no such guarantees. I think this made Storm more
attractive for many use cases; it did for me.
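
For anyone curious how the tracking stays cheap: as I understand the linked doc, Storm keeps a single 64-bit value per spout tuple and XORs tuple ids into it as they are emitted and acked; when the value reaches zero the whole tree is done. A simplified sketch of that idea in Python (the `Acker` class and its method names are my own, and the real protocol batches these updates differently):

```python
class Acker:
    """Toy tracker: one integer per spout tuple, updated by XOR."""

    def __init__(self):
        self.pending = {}  # root id -> XOR of ids still outstanding

    def track(self, root_id, tuple_id):
        # Called when a tuple anchored to this root is emitted.
        self.pending[root_id] = self.pending.get(root_id, 0) ^ tuple_id

    def ack(self, root_id, tuple_id):
        # XOR-ing the same id a second time cancels it out.
        self.pending[root_id] ^= tuple_id
        if self.pending[root_id] == 0:
            del self.pending[root_id]
            return True    # every tuple in the tree has been acked
        return False


acker = Acker()
root = 1
acker.track(root, 0xA)         # the spout tuple itself
acker.track(root, 0xB)         # two descendants emitted by a bolt
acker.track(root, 0xC)
first = acker.ack(root, 0xA)   # tree not finished yet
second = acker.ack(root, 0xB)
third = acker.ack(root, 0xC)   # last outstanding tuple
```

Any root still left in `pending` after a timeout would be replayed from the spout, which is where the "at least once" comes from. Real ids are random 64-bit values, so an accidental zero is extremely unlikely.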

------
clubhi
I've been using Storm for a few years in production. Thanks for all the hard
work Nathan.

You and Kyle Kingsbury are two "younger" guys whose writing I always enjoy
reading.

------
skybrian
How does it compare to Apache Kafka? (I haven't used either.)

~~~
akbar501
Kafka is a high performance, persistent message queue.

Storm is distributed stream processing.

The two work together very well. Often, a server will publish messages to
Kafka (or another queue), which are then read and processed by Storm.
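
A toy version of that pipeline in Python, with an in-memory `queue.Queue` standing in for Kafka (a real topic is persistent and distributed, which this is not). The function names `publish`, `queue_spout` and `count_bolt` are invented for the example.

```python
from collections import Counter
from queue import Queue, Empty

log = Queue()  # stand-in for a Kafka topic


def publish(message):
    # The "server" side: append an event to the queue.
    log.put(message)


def queue_spout():
    # The "spout" side: drain whatever is currently queued.
    while True:
        try:
            yield log.get_nowait()
        except Empty:
            return


def count_bolt(stream):
    # A trivial processing step: count events by name.
    counts = Counter()
    for event in stream:
        counts[event] += 1
    return counts


for event in ["click", "view", "click"]:
    publish(event)
counts = count_bolt(queue_spout())
```

The point of the queue in the middle is that the publisher and the processing side never talk to each other directly.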

~~~
throwaway1979
Interesting question and response. I haven't used either but read about both
systems independently. A bit more googling and I found:

[http://www.slideshare.net/gschmutz/kafka-andstromeventproces...](http://www.slideshare.net/gschmutz/kafka-andstromeventprocessinginrealtime)

Slides 42 and 43 describe an architecture with all three: Hadoop, Storm and
Kafka. Seems like a beast, and I don't have applications for it, but it's a
pretty cool combo of technologies.

~~~
erichmond
[http://manning.com/marz/](http://manning.com/marz/) - this is often the
trifecta used in the "lambda architecture", a term coined by Nathan Marz. This
is also the system they built at BackType for their real-time analytics.

~~~
phunge
Also worth checking out
[http://radar.oreilly.com/2014/07/questioning-the-lambda-arch...](http://radar.oreilly.com/2014/07/questioning-the-lambda-architecture.html)
from Jay Kreps, who's part of the Kafka braintrust.

------
dmritard96
Nice read. At my previous job we used Storm for all sorts of cool things. It
worked great and it was a pleasure to be the one at the company that got to
really cozy up with it. Deployment (with or without storm-deploy) was also
well thought out.

------
Terr_
I've seen Samza's documentation comparing itself to Storm; I wonder what the
reverse perspective is...

~~~
wink
As I understood it, Samza ties in with Hadoop/HBase, so if you already have
that, it might be easier (or even possible at all) to integrate, whereas Storm
has nothing to do with that and only does stream processing.

~~~
bhangi
Samza can run on a Hadoop 2 cluster using YARN. As far as I know, there is no
other connection to Hadoop or HBase.

------
schultz9999
Looks like we overloaded the site...

