

Twitter will open-source Storm, BackType's "Hadoop of Real-Time Processing" - canistr
http://www.readwriteweb.com/enterprise/2011/08/twitter-will-open-source-storm.php

======
nathanmarz
There are a lot more details about Storm in the actual announcement:
[http://engineering.twitter.com/2011/08/storm-is-coming-
more-...](http://engineering.twitter.com/2011/08/storm-is-coming-more-details-
and-plans.html)

If you have any questions about Storm, feel free to ask me here or on the
Google group (<http://groups.google.com/group/storm-user>).

~~~
jackowayed
I'm interested in hearing about how you guys upgrade topologies in production
(assuming you do). It's designed to be able to run forever, but obviously once
in a while you find a bug, want to track a new stat, etc. I guess if you're
pulling data off of a queue, you might be able to get away with letting things
queue up for a few seconds as everything restarts with the new code and then
catching up. Is that how you handle it, or do you do something more clever?

~~~
nathanmarz
Currently you let things queue up while you redeploy, but I'm working on a new
feature that lets you "swap" two topologies. The new one is deployed in an
inactive state, and then the two topologies are swapped. This lets you
minimize the downtime to almost nothing.
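
A minimal sketch of the swap idea described above, for illustration only. Every name here (`Topology`, `Cluster`, `deploy_inactive`, `swap`) is hypothetical and is not Storm's API; it only shows a new topology being deployed in an inactive state and then exchanged with the live one, so the queue is never left without a consumer for long.

```python
from collections import deque


class Topology:
    """Hypothetical stand-in for a running topology (not a Storm class)."""

    def __init__(self, version):
        self.version = version
        self.active = False
        self.processed = []

    def process(self, message):
        if not self.active:
            raise RuntimeError("inactive topology cannot process messages")
        self.processed.append(message)


class Cluster:
    """Hypothetical cluster that holds one live and one standby topology."""

    def __init__(self, topology):
        topology.active = True
        self.live = topology
        self.standby = None

    def deploy_inactive(self, topology):
        # The new code is deployed but receives no messages yet.
        topology.active = False
        self.standby = topology

    def swap(self):
        # The only interruption is this flip between the two topologies.
        old, new = self.live, self.standby
        old.active = False
        new.active = True
        self.live, self.standby = new, None
        return old

    def drain(self, queue):
        # Feed everything currently queued to whichever topology is live.
        while queue:
            self.live.process(queue.popleft())


queue = deque(["a", "b"])
v1 = Topology("v1")
v2 = Topology("v2")
cluster = Cluster(v1)
cluster.drain(queue)          # v1 handles "a" and "b"

cluster.deploy_inactive(v2)   # v2 is deployed but idle
queue.append("c")
cluster.drain(queue)          # v1 is still live and handles "c"

cluster.swap()                # v2 takes over
queue.append("d")
cluster.drain(queue)          # v2 handles "d"
```

Under these assumptions the old topology keeps consuming until the moment of the swap, which is what keeps the backlog small compared with stopping and redeploying.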

------
jackowayed
The Twitter Engineering Blog post is way more interesting:
[http://engineering.twitter.com/2011/08/storm-is-coming-
more-...](http://engineering.twitter.com/2011/08/storm-is-coming-more-details-
and-plans.html)

Despite having a "master" node, it sounds like this actually has no single
points of failure. Since all the state for the master is in ZooKeeper, I think
you could just fail over to another server running the master if your first
main one gets messed up. Pretty cool. (I may be totally wrong here ... All my
distributed systems knowledge has come from being around people who know about
distributed systems.)
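
A rough sketch of that failover reasoning, under loud assumptions: a plain dict stands in for ZooKeeper, and `SharedStore`, `Master`, and `assign` are hypothetical names, not Storm or ZooKeeper APIs. The point is only that a master holding no state of its own can be replaced by another process that reads the same external store.

```python
class SharedStore:
    """Hypothetical stand-in for ZooKeeper: state that outlives any one master."""

    def __init__(self):
        self._data = {}

    def write(self, key, value):
        self._data[key] = value

    def read(self, key, default=None):
        return self._data.get(key, default)


class Master:
    """Hypothetical master that keeps nothing in memory between calls."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def assign(self, task, worker):
        # Read, copy, modify, write back: the store is the source of truth.
        assignments = dict(self.store.read("assignments", {}))
        assignments[task] = worker
        self.store.write("assignments", assignments)

    def assignments(self):
        return self.store.read("assignments", {})


store = SharedStore()

primary = Master("master-1", store)
primary.assign("spout-0", "worker-a")
primary.assign("bolt-0", "worker-b")
del primary                      # simulate the first master dying

backup = Master("master-2", store)
recovered = backup.assignments() # the backup sees everything the primary wrote
backup.assign("bolt-1", "worker-a")
```

This leaves out everything hard about a real failover (detecting the failure, electing exactly one new master), which the sketch does not attempt to model.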

Though one thing that's not especially satisfying is that his answer for
when a Bolt needs to store state is "use a database". I guess the Hadoop
answer is that if your reducer fails, it just runs it again, and there's no
real analog to doing that when your bolts are meant to run infinitely.

~~~
nathanmarz
That's correct, the design will make it easy to cluster the master node later
on.

------
ldng
I'll believe it when I see it. Not that Twitter never opens some of its code
(<http://twitter.com/about/opensource>) but ... in the past they've said
they'll open some tools and ended up not doing it (I'm thinking about Crane
here).

It's nice to even think about open-sourcing things, and even better when they
actually do, though :-)

~~~
squarecog
(I am one of the authors of Crane) The thing that happened with Crane is that
it grew a lot of Twitter-specific cruft and separating that from the
generally-useful bits is a serious undertaking -- and I'm not sure how much
useful stuff would be left once we cut the twittery bits out. Now and then I
chat with the Sqoop folks about how we can merge the two but that path isn't
clear either, so it's in a bit of a limbo as far as being open-sourced.
Sorry about that.

~~~
ldng
Not a problem you have to be sorry about, it happens. But the thing here is
that announcements like that create expectations. So, IMHO, you shouldn't
announce too far ahead; announce when you know for sure there will be an actual
release. Otherwise grumpy sceptics like me criticize ;-)

------
adw
This is an almighty big deal. I missed it (everyone was talking about
distributed computing), but what Hadoop's done is make conventional BI
providers look increasingly unimpressive. Systems like Storm are going to do
the same - as the Twitter Engineering blog post points out - to CEP.

------
jwr
Frankly, I'm getting tired of the hype. Just release it already. It has been
promised for a long time now, with no code in sight. Now new owners have re-
promised it and it is a hackernewsworthy item all of a sudden?

Don't get me wrong - the description sounds great and I'll be one of the first
users. But this is not techcrunch.

------
michaelschade
Perhaps this will help quell some of the skepticism about them actually
releasing it–the abstract for the talk in which Nathan will be releasing Storm
as open-source: [https://thestrangeloop.com/sessions/storm-twitters-
scalable-...](https://thestrangeloop.com/sessions/storm-twitters-scalable-
realtime-computation-system)

------
mark_l_watson
Twitter acquired a good team with BackType. Too bad for BackType's customers
though. Not the end of the world, but yet another example of the perils (in
addition to the benefits) of using other companies' web services.

~~~
spooneybarger
I think that goes further than 'a company's web services'.

Every business should be looking at its weaknesses. If you are using someone
else's service at the core of your business, that service could go away at any
time, so you should have a contingency plan. The same goes if you source
something for manufacturing from another country, or have your factories in
another country.
What if that country becomes unstable? What will the impact on your business
be?

There is nothing wrong with building your business on something that might
disappear; you just need to understand that risk.

------
bengl3rt
_If_ this is true, I'm glad that Twitter is choosing to honor a promise that
Backtype made pre-acquisition.

