
Storm sounds great, but this post probably should have waited until it was actually open-sourced. As it is, it just comes across as naked self-promotion based on a technology that could for all we know be vaporware.

(I'm the author of Storm)

Your criticism is totally fair. People have been curious about Storm so we wanted to provide a little bit of information about it. We'll have demos soon, and of course it will be open sourced within a few months.

If you're curious about our credibility, I think our other open source projects speak to the quality of software we produce:

https://github.com/nathanmarz/cascalog
https://github.com/nathanmarz/elephantdb

I think people here are a little too harsh. Storm sounds like an amazing product and I can't wait to play with something like that. Right now, we run a bunch of cron jobs every minute with intense MapReduce queries on mongodb to generate relatively up-to-date analytics. Something like this would be immensely useful. (As well as Mongo's new 2.0 Aggregation pipeline features.)

Now, I agree that it's kind of a bummer we can't play with it right now, but the fact that you guys made this and are going to open source it is already awesome in itself.

As a happy user of ElephantDB, I'd say people are definitely too harsh. ElephantDB is awesome - my company has completely replaced HBase with ElephantDB and MaryJane (a lightweight way of putting data into Hadoop that we wrote, https://github.com/stucchio/MaryJane- ).

That said, I'd love to see some code released, even if it isn't ready for primetime.

Can you say anything about what made ElephantDB + MaryJane better than HBase for your workload? (Occasional batch loads that then need random reads but not random inserts?)

Absolutely - I need batch loads and random reads. The term "insert" is somewhat meaningless here - I have random appends, and a periodic MapReduce job compiles the randomly appended data into structured data to be served via ElephantDB. The structured data requires random queries. In principle, HBase should have filled my needs completely. But in practice, I couldn't make it work.

Our HBase cluster (3 boxes serving 30 human oracles, each submitting data at a rate of 1 record every 5-10 seconds) choked frequently - i.e., it stopped accepting new records. Ultimately what I had to do was have the human data go into Postgres, with a cron job flushing it into HBase every half hour or so.

I'll emphasize that this is probably my fault. I'm not claiming HBase doesn't scale to 30 concurrent users - clearly Facebook demonstrates it can. But I couldn't figure out how to make that happen. HBase is a complex system and I make no claim of understanding it.

ElephantDB + MaryJane are simple. There is almost nothing that can go wrong - put together they probably amount to 5000 lines of code and have as many as 10 minimally interacting configuration options. The effort required to manage them is minimal - I had EDB working flawlessly in less than a day.

HBase is an enterprise tool. It works well if you are Facebook and can put a couple of people on maintenance duty. It's overkill if you are Styloot (my stealth mode startup, currently smaller than Backtype).

Thanks! So on each batched load, is the previous data rewritten with interleaved new data? Or is the key ordering such that that's never necessary?

Each batched load has no ordering. But the data I'm loading is not the same as the data I'm reading.

The data I'm loading is stuff like tags - e.g., <itemid>\t<tagid>. In human terms, "Dress A has a ruched collar." Mapreduce can handle data like this, even when it comes unordered.

The data I'm reading is computational results based on the loaded data - e.g., an index: <tagid>\t[<itemid1>, <itemid2>, ...] (where each itemid has been tagged with tagid). E.g., "here are all the dresses with a ruched collar."

(Actually, we do considerably more than this, and we don't need Hadoop just to build an index. But an index is the simplest example I could give.)
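To make the index example concrete, here is a minimal in-memory sketch of the transformation described above: unordered `<itemid>\t<tagid>` records in, a tag-to-items index out. This is a toy stand-in for the MapReduce job (the function name and sample data are illustrative, not from Styloot's actual pipeline):

```python
from collections import defaultdict


def build_tag_index(records):
    """Build an inverted index mapping tag_id -> sorted list of item ids.

    `records` are (item_id, tag_id) pairs in no particular order,
    mirroring the <itemid>\t<tagid> lines loaded into Hadoop.
    """
    index = defaultdict(set)
    for item_id, tag_id in records:
        index[tag_id].add(item_id)
    # Sort for deterministic output, like the result of a reduce step.
    return {tag: sorted(items) for tag, items in index.items()}


# Unordered input: "Dress A has a ruched collar", etc.
records = [
    ("dressA", "ruched-collar"),
    ("dressB", "ruched-collar"),
    ("dressA", "sleeveless"),
]
index = build_tag_index(records)
print(index["ruched-collar"])  # ['dressA', 'dressB']
```

The real job would run as a map (emit tag_id as key) followed by a reduce (collect item ids per tag), with the result batch-loaded into ElephantDB for random reads.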

The original data is very boring. It's only after aggregation and calculation that it becomes worth reading.

Cool, great to see you're making use of EDB. Would love to hear more about how you're using it, how the transition was, etc. mm@backtype.com.

I'm still waiting on Twitter's rainbird (http://www.slideshare.net/kevinweil/rainbird-realtime-analyt...) to come out!

Me too. I asked them the other day about it.

Response - http://twitter.com/#!/kevinweil/status/73263430873792512

Also, the lack of any scalability charts or diagrams of architecture is suspicious.

If you can't make it open source, at least write a serious paper to support the claims, like Google did for Bigtable.

A lot of people think their systems are scalable and fault-tolerant. Most are not. And from the information provided, we can't tell.

We've released open source projects (most notably ElephantDB and Cascalog) in the past that are successfully used in production by us as well as other companies. You should check them out if you're interested in a measure of quality, though I understand your concern.

We're a startup — we're not going to write an academic paper supporting the claims in the post. Nevertheless, Storm's an exciting project many people are curious to learn more about; that's why we've written something about it now.

We have a demo coming soon, and Storm itself will be open sourced soon enough.

I absolutely understand the issue of being resource constrained.

It seems like this is buzz-worthy, (like http://mailchimp.com/omnivore/), but this pitch is nerd-focused, not potential-customer focused. If you pitch to nerds, you want a github link. If you pitch to potential customers, highlight the benefits that are now possible due to this innovation.

At least in our batch, we got drilled this repeatedly: Don't talk features. Talk benefits.

Don't call it "the Hadoop of" anything if it is not open source. Hadoop is notable as an open source project, not as a new idea...

Precisely. I read the whole post looking for a link to the source on github or something, and then the last sentence was just a huge letdown.
