

Storm: distributed and fault-tolerant realtime computation (slides) - nathanmarz
http://www.slideshare.net/nathanmarz/storm-distributed-and-faulttolerant-realtime-computation

======
michaelschade
I was front row at his talk during Strange Loop (he did an amazing job on the
talk; watching Storm and friends be open-sourced halfway through was
fantastic) and tried to write fairly detailed yet correct notes. I was typing
pretty quickly though, so I can't guarantee 100% accuracy, but hopefully they
go along well with the slides.

[http://mschade.me/notes-storm-twitters-scalable-realtime-
com...](http://mschade.me/notes-storm-twitters-scalable-realtime-comput)

------
nirvana
I'm trying to absorb all this, and maybe I'm missing something, but they kind of
lost me at "master node". I have plans to do this kind of realtime processing in
the near future (starting the project in 4-6 months) and have been thinking
about how to do it.

At this point, I think that there are some good ideas here, but that I would
want to build it using erlang and javascript/coffeescript workers, using
riak pipe to distribute the load.

riak pipe lets you set arbitrary topologies by using a function to determine
which node a piece of data should be processed on. But there are no master
nodes... The supervisor process in storm is analogous to the vnode in riak
pipe. You could do fittings (bolts) in erlang, or have them run an erlang_js
or erlv8 virtual machine and run javascript.
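
To make the "function picks the node" idea concrete, here is a minimal sketch in Python. It is not riak pipe's API (which is Erlang); `pick_node` and the node names are hypothetical, and the plain hash-modulo scheme is an assumption chosen for brevity rather than what riak pipe actually does internally.

```python
import hashlib


def pick_node(key: str, nodes: list[str]) -> str:
    """Hypothetical routing function: hash the key and map it onto a node.

    The same key always lands on the same node, so no central
    coordinator is needed to decide where a piece of data goes.
    """
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


# Assumed node names, for illustration only.
nodes = ["node-a", "node-b", "node-c"]
routed = {word: pick_node(word, nodes) for word in ["storm", "riak", "pipe"]}
```

Any process that knows the node list can compute the destination on its own, which is the property that removes the need for a master in this style of design.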

Certainly there are higher level features that would have to be replicated to
replicate the functionality of storm. I think Storm is really instructive in
terms of pointing out something new (as google's map-reduce paper did, etc.)

I'm not sure it is the solution I'm looking for, but I think that I may adopt
some of its ideas... and I wanted to let people know about riak pipe, which is
very new and lower profile, yet very flexible and interesting.

~~~
nathanmarz
The master node coordinates the Storm cluster. It distributes code around the
cluster, monitors topologies, and reassigns any tasks that have failed. Check
out the wiki for more information about how Storm works and how to use it:
<https://github.com/nathanmarz/storm/wiki>
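
As a rough illustration of the "reassigns any tasks that have failed" part, here is a toy sketch in Python. It is not Storm's code or scheduling policy; `reassign_failed`, the task and worker names, and the least-loaded rule are all assumptions made for the example.

```python
def reassign_failed(assignments: dict[str, str], alive: set[str]) -> dict[str, str]:
    """Return a new task -> worker map with tasks on dead workers moved.

    Hypothetical policy: each orphaned task goes to the live worker
    that currently has the fewest tasks (ties broken by worker name).
    """
    if not alive:
        raise ValueError("no live workers to assign tasks to")

    # Count the tasks already running on each live worker.
    load = {worker: 0 for worker in alive}
    for worker in assignments.values():
        if worker in alive:
            load[worker] += 1

    result = {}
    for task, worker in sorted(assignments.items()):
        if worker not in alive:
            # Worker is gone: pick the least-loaded live worker instead.
            worker = min(sorted(load), key=lambda w: load[w])
            load[worker] += 1
        result[task] = worker
    return result


# Assumed example state: worker "w2" has died, "w3" has just joined.
assignments = {"t1": "w1", "t2": "w2", "t3": "w2"}
new_assignments = reassign_failed(assignments, alive={"w1", "w3"})
```

A real coordinator also has to detect failures (for example through heartbeats) and persist its state, which this sketch leaves out entirely.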

Some of the things that Storm does are non-trivial, especially guaranteeing
data processing. I recommend thinking more about whether you want to
re-implement these kinds of things on other technologies.

