
Apache Storm 1.0 released - wener
https://storm.apache.org/2016/04/12/storm100-released.html
======
ausjke
So, a newbie question: when do I need this? The only thing I understand is that
Hadoop is for batch processing while Storm is for realtime big-data processing.

~~~
pixelmonkey
If you're a Python user, I gave a talk at PyData called "Beating the GIL with
Python" that covers all the tooling for parallel computing in Python, from the
simplest (e.g. multiprocessing) to the more complex (Storm/Spark), with many
tools in between, and what the trade-offs are. That may help you out.

[https://www.youtube.com/watch?v=gVBLF0ohcrE](https://www.youtube.com/watch?v=gVBLF0ohcrE)

The rough summary is, if you need to do distributed *multi-node* computation
with *very low latency* -- that is, end-to-end data processing time
measured in milliseconds or seconds -- then Storm will help you out. The
"competitive" product in open source right now is pyspark + Spark Streaming.

The benefit of Storm is that it is an older (more mature) project with many
production deployments. My team at Parse.ly wrote the open source Python +
Storm integration libraries, pystorm and streamparse, which help Python
programmers make use of Storm for large-scale stream processing:
[https://github.com/Parsely/streamparse](https://github.com/Parsely/streamparse).
We also wrote pykafka, which is a Kafka client for Python:
[https://github.com/Parsely/pykafka](https://github.com/Parsely/pykafka).
These projects are related, since Kafka is often the streaming data source used
as the input to a Storm topology.

~~~
iagooar
> The rough summary is, if you need to do distributed multi-node computation
> with very low latency -- that is, end-to-end data processing time measured
> in millis or seconds -- then Storm will help you out.

I still don't really know what kind of computation that is. Any specific,
real-world examples? I'm genuinely curious.

~~~
pixelmonkey
We use it for real-time web/content analytics. You can see our product tour @
[http://parse.ly/tour](http://parse.ly/tour).

Other examples of "common" streaming data with real-time use cases:

- Twitter firehose

- Network packet analysis

- Financial market data

- Device and sensor data
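
To make "real-time analytics" a bit more concrete, here is a minimal single-process sketch of the kind of computation involved: counting events per key over a sliding time window, as you might for "most-viewed pages in the last minute". The class name `SlidingWindowCounter` and its methods are hypothetical, invented for illustration; this is not Storm or streamparse code, and it assumes events arrive in non-decreasing timestamp order. A system like Storm is what you reach for when this sort of logic has to be spread across many nodes.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Count events per key over the last `window_seconds` (hypothetical helper)."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()    # (timestamp, key), oldest first
        self.counts = Counter()  # key -> number of events still in the window

    def add(self, timestamp, key):
        """Record one event, then evict anything that has aged out."""
        self.events.append((timestamp, key))
        self.counts[key] += 1
        self._expire(timestamp)

    def _expire(self, now):
        # Assumes timestamps never go backwards, so the oldest event is at the left.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def top(self, n):
        """Return the n most frequent keys currently in the window."""
        return self.counts.most_common(n)


if __name__ == "__main__":
    window = SlidingWindowCounter(window_seconds=60)
    for ts, url in [(0, "/a"), (10, "/b"), (30, "/a"), (65, "/b")]:
        window.add(ts, url)
    print(window.top(1))
```

Each `add` does a small, bounded amount of work per event, which is what keeps end-to-end latency low compared with recomputing counts in a periodic batch job.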

------
rar_ram
Excellent work, guys! I had developed a custom caching solution for Storm in
v0.9.6. It's great that distributed caching is being supported now. I would
love to see some more documentation and examples on it.

------
therealdrag0
Wow. Good work Storm people! Some nice features here.

Anyone privy to what changes they made to yield the performance improvements?

~~~
lovelearning
From the issue log [1]:

"Profiling a simple speed of light topology, shows that a good chunk of time
of the SpoutOutputCollector.emit() is spent in the clojure reduce() function..
which is part of the ACK-ing logic. Re-implementing this reduce() logic in
java gives a big performance boost in both in the Spout.nextTuple() and
Bolt.execute()"

[1]:
[https://issues.apache.org/jira/browse/STORM-1539](https://issues.apache.org/jira/browse/STORM-1539)
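
For context on why a reduce() sits on the emit path at all: as far as I understand Storm's acking design, each spout tuple's tree is tracked by XOR-ing tuple ids into one running value, which returns to zero once every tuple that was emitted has also been acked. The sketch below illustrates that idea only. It is not the code from STORM-1539; `AckTracker` and its method names are hypothetical, and it assumes tuple ids are non-zero and effectively unique (Storm uses random 64-bit ids).

```python
from functools import reduce
from operator import xor


class AckTracker:
    """Toy XOR-based tracker for tuple trees (hypothetical, for illustration)."""

    def __init__(self):
        self.pending = {}  # root id -> running XOR of tuple ids seen so far

    def _update(self, root_id, tuple_ids):
        # Fold the ids into the running value; this is the reduce step.
        value = self.pending.get(root_id, 0) ^ reduce(xor, tuple_ids, 0)
        if value == 0:
            # Every id has been seen twice (emit + ack): the tree is complete.
            self.pending.pop(root_id, None)
            return True
        self.pending[root_id] = value
        return False

    def emit(self, root_id, tuple_id):
        """Spout emits the root tuple of a new tree."""
        return self._update(root_id, [tuple_id])

    def ack(self, root_id, acked_id, child_ids=()):
        """Bolt acks one tuple and reports any child tuples anchored to it."""
        return self._update(root_id, [acked_id, *child_ids])


if __name__ == "__main__":
    tracker = AckTracker()
    tracker.emit("root", 0b1010)
    tracker.ack("root", 0b1010, child_ids=[0b0110, 0b0011])
    tracker.ack("root", 0b0110)
    print(tracker.ack("root", 0b0011))
```

Because this fold runs for every emitted tuple, even a small constant overhead in it is multiplied by the full tuple rate, which fits the profiling result quoted above.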

------
pknerd
Spark || Storm?

Your take, guys?

