Your criticism is totally fair. People have been curious about Storm so we wanted to provide a little bit of information about it. We'll have demos soon, and of course it will be open sourced within a few months.
If you're curious about our credibility, I think our other open source projects speak to the quality of software we produce:
Now, I agree that it's kind of a bummer we can't play with it right now, but the fact that you guys made this and are going to open source it is already awesome in itself.
That said, I'd love to see some code released, even if it isn't ready for primetime.
Our HBase cluster (3 boxes serving 30 human oracles, each submitting data at a rate of 1 record every 5-10 seconds) choked frequently - i.e., it stopped accepting new records. Ultimately what I had to do was have the human data go into Postgres, with a cron job flushing it into HBase every half hour or so.
I'll emphasize that this is probably my fault. I'm not claiming HBase doesn't scale to 30 concurrent users - clearly Facebook demonstrates it can. But I couldn't figure out how to make that happen. HBase is a complex system and I make no claim of understanding it.
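For anyone curious what that workaround looks like, here is a rough sketch of the pattern (stage writes in a relational store, then flush them in bulk on a schedule). It is not my actual setup: sqlite3 stands in for Postgres so the snippet runs on its own, `sink` is a hypothetical callable standing in for whatever writes a batch to HBase, and the table and function names are made up for illustration. In practice `flush` would be what the cron job calls.

```python
import sqlite3


def make_staging(conn):
    # Staging table for incoming human-entered records (hypothetical schema).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staging ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, item_id TEXT, tag_id TEXT)"
    )


def submit(conn, item_id, tag_id):
    # Cheap, reliable write path: one small insert per submitted record.
    with conn:
        conn.execute(
            "INSERT INTO staging (item_id, tag_id) VALUES (?, ?)",
            (item_id, tag_id),
        )


def flush(conn, sink):
    # Hand everything staged so far to the bulk store in a single batch.
    rows = conn.execute(
        "SELECT id, item_id, tag_id FROM staging ORDER BY id"
    ).fetchall()
    if not rows:
        return 0
    # If the sink raises, nothing is deleted and the next run retries.
    sink([(item_id, tag_id) for _, item_id, tag_id in rows])
    with conn:
        # Delete only what was sent; rows added meanwhile have larger ids.
        conn.execute("DELETE FROM staging WHERE id <= ?", (rows[-1][0],))
    return len(rows)


conn = sqlite3.connect(":memory:")
make_staging(conn)
submit(conn, "dress_a", "ruched_collar")
submit(conn, "dress_b", "ruched_collar")

bulk_store = []  # stand-in for the real bulk store
flushed = flush(conn, bulk_store.extend)
```

The point of the design is that the interactive write path never touches the bulk store, so a stall there only delays the next flush instead of blocking the people entering data.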
ElephantDB + MaryJane are simple. There is almost nothing that can go wrong - put together they probably amount to 5000 lines of code and have as many as 10 minimally interacting configuration options. The effort required to manage them is minimal - I had EDB working flawlessly in less than a day.
HBase is an enterprise tool. It works well if you are Facebook and can put a couple of people on maintenance duty. It's overkill if you are Styloot (my stealth mode startup, currently smaller than Backtype).
The data I'm loading is stuff like tags - e.g., <itemid>\t<tagid>. In human terms, "Dress A has a ruched collar." Mapreduce can handle data like this, even when it comes unordered.
The data I'm reading is computational results based on the loaded data - e.g., an index: <tagid>\t[<itemid1>, <itemid2>, ...] (where each itemid has been tagged with tagid). E.g., "here are all the dresses with a ruched collar."
(Actually, we do considerably more than this, and we don't need Hadoop just for an index. But an index is the simplest example I could give.)
The original data is very boring. It's only after aggregation and calculation that it becomes worth reading.
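To make the index example above concrete, here is a minimal single-machine sketch in Python. It assumes tab-separated `<itemid>\t<tagid>` input lines as described; the function name and the sample ids are invented for illustration, and the real job would be a MapReduce grouping by tagid rather than an in-memory dict.

```python
from collections import defaultdict


def build_index(lines):
    # Map each tag id to the sorted list of item ids carrying that tag.
    index = defaultdict(set)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        item_id, tag_id = line.split("\t")
        index[tag_id].add(item_id)
    # Input order does not matter; sorting makes the output deterministic.
    return {tag_id: sorted(items) for tag_id, items in index.items()}


# Unordered sample records (made-up ids).
records = [
    "dress_a\truched_collar",
    "dress_c\tsleeveless",
    "dress_b\truched_collar",
    "dress_a\tsleeveless",
]

index = build_index(records)
for tag_id, item_ids in sorted(index.items()):
    print(f"{tag_id}\t{item_ids}")
```

Looking up `index["ruched_collar"]` is then the "all the dresses with a ruched collar" query.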
Response - http://twitter.com/#!/kevinweil/status/73263430873792512
If you can't make it open source, at least write a serious paper to support the claims, like Google did for Bigtable.
A lot of people think their systems are scalable and fault-tolerant. Most are not. And from the information provided, we can't tell.
We're a startup — we're not going to write an academic paper supporting the claims in the post. Nevertheless, Storm's an exciting project many people are curious to learn more about; that's why we've written something about it now.
We have a demo coming soon, and Storm itself will be open sourced soon enough.
It seems like this is buzz-worthy (like http://mailchimp.com/omnivore/), but this pitch is nerd-focused, not potential-customer focused. If you pitch to nerds, you want a github link. If you pitch to potential customers, highlight the benefits that are now possible due to this innovation.
At least in our batch, we got drilled this repeatedly: Don't talk features. Talk benefits.