Storm's first birthday (nathanmarz.com)
115 points by nathanmarz 1832 days ago | 24 comments

What I'd like to understand at a glance about Storm is durability, idempotency, and specifically those two things in the face of all your standard failure modes.

Is Storm something that is going to get 99.9% of my data there, which is good enough for some side-channel processing like ad targeting, or is it something that's "D" durable in the strongest sense of the word? If I'm ultimately delivering data to something that is five-nines durable, or better yet to S3 with its 10-11 nines of durability, is Storm going to be my "low point" of durability?

Read the wiki:

"Guarantees no data loss: A realtime system must have strong guarantees about data being successfully processed. A system that drops data has a very limited set of use cases. Storm guarantees that every message will be processed, and this is in direct contrast with other systems like S4."


Storm supports transactional topologies that can provide durability. They depend on a repeatable spout, that is, one from which you can request the failed input again. https://github.com/nathanmarz/storm/wiki/Transactional-topol...
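To make "repeatable" concrete, here is a minimal sketch in plain Python. It is not Storm's API: `ReplayableSource`, `next_tuple`, `ack`, `fail`, and `run` are hypothetical names chosen for illustration. The assumption is simply that the source remembers every emitted item until it is acknowledged, so a failed item can be requested again.

```python
from collections import deque


class ReplayableSource:
    """Toy stand-in for a repeatable spout (hypothetical, not Storm's API)."""

    def __init__(self, items):
        self._queue = deque(enumerate(items))  # (msg_id, payload) pairs not yet emitted
        self._pending = {}                     # msg_id -> payload, emitted but not yet acked

    def next_tuple(self):
        """Emit the next item and remember it until it is acked."""
        if not self._queue:
            return None
        msg_id, payload = self._queue.popleft()
        self._pending[msg_id] = payload
        return msg_id, payload

    def ack(self, msg_id):
        """Processing succeeded: forget the item."""
        self._pending.pop(msg_id, None)

    def fail(self, msg_id):
        """Processing failed: put the item back at the front for replay."""
        if msg_id in self._pending:
            payload = self._pending.pop(msg_id)
            self._queue.appendleft((msg_id, payload))


def run(source, process):
    """Drive the source until every item has been processed successfully."""
    results = []
    while True:
        emitted = source.next_tuple()
        if emitted is None:
            break
        msg_id, payload = emitted
        try:
            results.append(process(payload))
            source.ack(msg_id)
        except Exception:
            source.fail(msg_id)  # the item will be emitted again
    return results
```

Note that this sketch retries forever if `process` always fails for some item; a real system would need a retry limit or a dead-letter path.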

Actually transactional topologies are superseded by Trident, which is much easier to use:


Here's an explanation on how Trident achieves fully fault-tolerant, idempotent semantics:
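As a rough illustration of the general idea (a sketch, not Trident's API and not taken from the linked explanation): one common way to make updates idempotent is to store the id of the last applied batch next to each value and skip any batch whose id is already recorded. The class and method names below are hypothetical. The sketch assumes batches are applied in order and that a replayed batch carries the same id and the same contents as the original.

```python
class TxidCounterState:
    """Counter that stores the last applied batch id next to each value."""

    def __init__(self):
        self._store = {}  # key -> (last_txid, count)

    def apply_batch(self, txid, counts):
        """Apply per-key increments for batch `txid`; a replay of the same txid is a no-op."""
        for key, delta in counts.items():
            last_txid, current = self._store.get(key, (None, 0))
            if last_txid == txid:
                continue  # this batch was already applied to this key
            self._store[key] = (txid, current + delta)

    def get(self, key):
        """Return the current count for `key` (0 if never updated)."""
        return self._store.get(key, (None, 0))[1]


state = TxidCounterState()
state.apply_batch(1, {"a": 2, "b": 1})
state.apply_batch(1, {"a": 2, "b": 1})  # replay after a failure: ignored
state.apply_batch(2, {"a": 3})
print(state.get("a"), state.get("b"))
```

Because the check is per key, a batch that failed halfway through can be replayed safely: keys already updated are skipped and the rest are applied.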


Nice to see this. I started trying to implement Storm towards the beginning of this year, since it seems like something I've wanted (and partially reimplemented) so many times. I found it so time-consuming to find good resources or a community for miscellaneous support that I gave up and reimplemented simple parts of it as I needed them.

I'm off to buy the books and bookmark this page for future reference. Great work Nathan.

Sorry to hear that you had trouble finding support for your issues. The mailing list is very well trafficked so hopefully you have more luck there next time.

No need to apologize, it was quite early in the development of the project (and expected). I'm just glad to see the progress you've made in a year and look forward to implementing Storm to solve a few of our issues in the near future.

Even if you have no use for Storm, I'd strongly suggest you check out the wiki on GitHub and browse the source code. It is IMO the nicest, cleanest, most exciting OSS project out there.


Can you expand on that? What's nice and clean and exciting about it?

Someone else posted a link to Apache Kafka, developed at LinkedIn.


It seems like they address similar use cases... it would be interesting to see a comparison.

Oh easy questions:

The code itself is nice and clean; take a look yourself (files chosen randomly) [1][2]

It's exciting because if you've ever done any real time processing of data streams at scale or even been daunted by the idea of it, and you read the wiki, it's clear storm is an exciting option. It does for Big Data Streams what Hadoop did for Big Data.

Kafka is interesting, but they don't really address similar use cases. They are more complementary, and in fact Storm has a spout implementation for Kafka [3].

[1] https://github.com/nathanmarz/storm/blob/master/src/clj/back...
[2] https://github.com/nathanmarz/storm/blob/master/src/jvm/back...
[3] https://github.com/nathanmarz/storm-contrib/tree/master/stor...

They are more complementary and in fact storm has a spout implementation for kafka

I would add to this that a lot of Storm users (myself included) use Storm together with Kafka - that is, Kafka is used to get data into (and possibly out of) Storm while Storm does the actual processing. Kafka is more along the lines of Kestrel and RabbitMQ.

One of my major wishlist items for Storm is an end-to-end tutorial on how to get it working for a non-JVM language. I'm not a JVM coder, nor do I know much about it, and the documentation on how to get it working for Python is hard to follow at best, IMHO.
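For what it's worth, here is a minimal sketch of what a non-JVM component boils down to, as I understand Storm's multi-language protocol: the JVM side and the script exchange JSON messages over stdin/stdout, each message terminated by a line containing only `end`. The function names are hypothetical, the message fields (`id`, `tuple`, `command`, `anchors`) are my reading of that protocol rather than something stated in this thread, and the initial handshake is omitted, so treat this as an illustration and not a drop-in bolt.

```python
import io
import json


def read_message(stream):
    """Read lines until a line containing only 'end', then JSON-decode them."""
    lines = []
    for line in stream:
        line = line.rstrip("\n")
        if line == "end":
            return json.loads("\n".join(lines))
        lines.append(line)
    return None  # stream closed before a full message arrived


def write_message(stream, message):
    """JSON-encode a message and terminate it with an 'end' line."""
    stream.write(json.dumps(message) + "\nend\n")
    stream.flush()


def split_sentence_bolt(inp, out):
    """Toy bolt: split the first field of each incoming tuple into words."""
    while True:
        msg = read_message(inp)
        if msg is None:
            break
        for word in msg["tuple"][0].split():
            # Anchor each emitted word to the input tuple so a failure can be traced back.
            write_message(out, {"command": "emit", "anchors": [msg["id"]], "tuple": [word]})
        write_message(out, {"command": "ack", "id": msg["id"]})


# Demo with in-memory streams instead of real stdin/stdout.
demo_in = io.StringIO('{"id": "1", "tuple": ["hello storm"]}\nend\n')
demo_out = io.StringIO()
split_sentence_bolt(demo_in, demo_out)
print(demo_out.getvalue())
```

In a real deployment the same loop would run against `sys.stdin` and `sys.stdout`, with the script launched by the JVM side of the topology.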

There's a chapter of about 9 pages on that in the new Storm book (more of a large pamphlet really). http://www.amazon.com/Getting-Started-Storm-Jonathan-Leibius...

There was a Python interface to Storm announced at PyCon called "Umbrella", but as far as I can tell it was never released or heard from again.

Keep up the good work, Nathan. I presented Storm as a tech-talk at Formspring and no question, you made my job easy by building a platform that makes it easy to get a topology running without a ton of boilerplate.

I've used Storm for a couple of projects and really liked it. It significantly simplifies building real-time data pipelines and makes deploying them KISS easy. I'm really excited to see some improved metrics coming out as well. If I had one gripe with Storm 0.7, it was that I wanted more metrics to monitor cluster performance. Congrats on a year and keep up the good work, Storm is awesome!

I know this has been repeated ad nauseam, but I'd really like to see blogs put what their product is somewhere on the page.

I've been using Storm for a few weeks and it works great, especially Trident, which I've been using for the past 2 or so weeks. Nathan has also been very helpful answering my n00b questions :)

So Trident is apparently a high-level API for Storm, but I don't see any mention of it in the Storm book. Is that right?

I am very interested in Storm, but wondering if the book is already out of date.

It is a little bit out of date, in that it doesn't cover anything in 0.8.0, but most of the basic stuff didn't change; it was more a matter of new features being added.

Are there plans to integrate Storm with Datomic?

Not by the core team, though I think you could make a pretty cool Trident state implementation for Datomic.
