Wonder why this is getting posted today in particular?
The quick summary here is that Heron was a ground-up rewrite of Apache Storm done by an internal team at Twitter. As an open source project history refresher, Apache Storm was originally built by a startup called BackType, and the project was led by Nathan Marz, the technical founder of BackType. Then BackType was acquired by Twitter, and Storm became a major component for large-scale stream processing (of tweets, tweet analytics, and other things) at Twitter.
I wrote a summary of the "interesting bits" of Apache Storm here: https://blog.parse.ly/storm/
However, at a certain point, Nathan Marz left Twitter, and a different group of engineers tried to rethink Storm inside Twitter. There was also a lot of work going on around Apache Mesos at the time. Heron is kind of a merger of their "rethinking" of Storm with an effort to make Storm-like Heron clusters manageable using Mesos.
But I don't think Heron really took off. Meanwhile, Storm got very, very stable in the 1.x series, and then had a ground-up rewrite from Clojure to Java in the 2.x series, mainly to improve performance even more. The last stable/major Storm release was in 2020.
Storm provides a stream processing programming API, a multi-lang wire protocol, and a cluster management approach. But certain cluster computing problems can probably be better solved at the infrastructure layer today. (For example, Storm was developed before the whole container + Docker + k8s focus in cloud ops.) That said, it's still a very powerful system; on my team, we process 75K+ events per second across hundreds of vCPU cores and thousands of Python processes with sub-second latencies by combining Storm and Kafka with our open source Python project, streamparse: https://github.com/Parsely/streamparse
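For anyone who hasn't looked at the multi-lang wire protocol mentioned above, here is a rough sketch of the framing as I understand it from the Storm docs: the worker and the subprocess exchange JSON messages over stdin/stdout, each terminated by a line reading "end". The helper names (`write_message`, `read_message`, `handle_tuple`, `demo`) are made up for illustration, the handshake and heartbeats are left out, and in-memory streams stand in for the real pipes. This is not streamparse's or Storm's actual code.

```python
import io
import json

END = "end"  # assumption from the multi-lang docs: each message ends with an "end" line


def write_message(stream, message):
    """Serialize one message as JSON, then terminate it with an 'end' line."""
    stream.write(json.dumps(message))
    stream.write("\n" + END + "\n")


def read_message(stream):
    """Read lines up to the 'end' marker and parse them as one JSON document."""
    lines = []
    while True:
        line = stream.readline()
        if line == "":
            return None  # stream closed before a complete message arrived
        line = line.rstrip("\n")
        if line == END:
            return json.loads("\n".join(lines))
        lines.append(line)


def handle_tuple(incoming):
    """Toy 'bolt': upper-case the first field, anchor the emit to the input, then ack."""
    emit = {
        "command": "emit",
        "anchors": [incoming["id"]],
        "tuple": [incoming["tuple"][0].upper()],
    }
    ack = {"command": "ack", "id": incoming["id"]}
    return [emit, ack]


def demo():
    """Simulate one round trip with in-memory streams instead of real pipes."""
    worker_to_bolt = io.StringIO()
    write_message(
        worker_to_bolt,
        {"id": "42", "comp": "1", "stream": "default", "task": 9, "tuple": ["hello"]},
    )
    worker_to_bolt.seek(0)

    bolt_to_worker = io.StringIO()
    for reply in handle_tuple(read_message(worker_to_bolt)):
        write_message(bolt_to_worker, reply)
    bolt_to_worker.seek(0)

    replies = []
    while True:
        message = read_message(bolt_to_worker)
        if message is None:
            return replies
        replies.append(message)
```

Calling `demo()` returns the two messages the toy bolt would send back: an anchored emit and an ack for tuple "42". The point is only that the protocol is simple enough for any language with JSON and stdio to speak.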
The core problems Storm solves: modeling data processing as a computation graph; high-speed network communication between threads, processes, and nodes; message delivery guarantees and retry capabilities; tunable parallelism; built-in monitoring and logging; and much more.
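To make a couple of those concrete (the computation graph and the delivery guarantees with retry), here is a toy, single-process sketch in plain Python. It is not Storm's API: `Spout`, `FlakySplit`, `WordCount`, and `run_topology` are hypothetical names, the "graph" is just a linear chain, and there is no networking or parallelism. It only shows the ack/fail/replay idea behind at-least-once processing.

```python
from collections import deque


class Spout:
    """Emits each source item with a message id and replays it on failure."""

    def __init__(self, items, max_retries=3):
        # Each pending entry is (message id, payload, attempts so far).
        self.pending = deque((i, item, 0) for i, item in enumerate(items))
        self.max_retries = max_retries
        self.in_flight = {}
        self.dropped = []

    def next_tuple(self):
        if not self.pending:
            return None
        msg_id, item, attempts = self.pending.popleft()
        self.in_flight[msg_id] = (item, attempts)
        return msg_id, item

    def ack(self, msg_id):
        # Fully processed: forget the message.
        del self.in_flight[msg_id]

    def fail(self, msg_id):
        # Not fully processed: queue it for replay, up to max_retries attempts.
        item, attempts = self.in_flight.pop(msg_id)
        if attempts + 1 < self.max_retries:
            self.pending.append((msg_id, item, attempts + 1))
        else:
            self.dropped.append(item)


class FlakySplit:
    """Splits a sentence into words, but fails the first time it is called."""

    def __init__(self):
        self.calls = 0

    def __call__(self, sentence):
        self.calls += 1
        if self.calls == 1:
            raise RuntimeError("simulated transient failure")
        return sentence.split()


class WordCount:
    """Terminal bolt: counts words and emits nothing downstream."""

    def __init__(self):
        self.counts = {}

    def __call__(self, word):
        self.counts[word] = self.counts.get(word, 0) + 1
        return []


def run_topology(spout, bolts):
    """Push each spout tuple through the chain; ack only if every bolt succeeds."""
    while True:
        emitted = spout.next_tuple()
        if emitted is None:
            break
        msg_id, value = emitted
        try:
            batch = [value]
            for bolt in bolts:
                batch = [out for item in batch for out in bolt(item)]
            spout.ack(msg_id)
        except Exception:
            spout.fail(msg_id)


def demo():
    spout = Spout(["storm and kafka", "storm and heron"])
    split, count = FlakySplit(), WordCount()
    run_topology(spout, [split, count])
    return spout, split, count
```

In `demo()`, the first sentence fails once in the split step, gets replayed after the second sentence, and still ends up counted. The failure here happens before any counting, so nothing is double-counted; in a real at-least-once system a replay can repeat downstream side effects, which is why idempotent bolts matter.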
(Also, I'd be remiss if I didn't mention -- if you're interested in stream processing and distributed computing, we are hiring Python Data Engineers to work on a stack involving Storm, Spark, Kafka, Cassandra, etc.) -- https://www.parse.ly/careers/python_data_engineer
I was in charge of the Twitter data platform team at the time we developed Heron and deprecated Storm. The Mesos component of your retelling is not quite right. Take a look at this comment I wrote around the time we started talking about Heron, addressing the same misconception: https://news.ycombinator.com/item?id=10056479
Is there a discussion anywhere of the decision to dump Clojure for Java for "performance reasons"?
I'm not even a Clojure user, but my impression was that it was pretty performant. I remember a discussion saying they didn't even really need the JVM's invokedynamic because they were doing pretty well without it, which made me think it was close to pure JVM speed.
A lot of Storm was written in Java, but the "core" was written in Clojure. There wasn't so much a "decision" to dump Clojure as a community "opportunity" to do so. My understanding is that Alibaba, one of Storm's production adopters, did a ground-up port from Clojure to Java, which they called JStorm. They then donated/offered that implementation to the Apache Storm project, and the project decided to base the Storm 2.x line on it. So Storm 1.x still has the Clojure core, with lineage to the original BackType release, but 2.x is sourced from JStorm. A big focus of Storm 2.x was high-scale performance, latency, and backpressure management. I also heard that some folks in the open source Storm community suspected it might be easier to find contributors/committers for Storm if it were implemented in Java. Meanwhile, Heron sprang up as a performance-focused Storm alternative with API compatibility to Storm, before Storm 2.x took shape.