What I'd like to understand at a glance about Storm is its durability and idempotency, specifically how each holds up in the face of all the standard failure modes.
Is Storm something that will get 99.9% of my data there, which is good enough for side-channel processing like ad targeting, or is it durable in the strongest sense of the word (the D in ACID)? If I'm ultimately delivering data to something that is five 9s durable, or better yet to S3 at 10-11 9s, is Storm going to be my "low point" of durability?
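On the idempotency half of the question: a system whose delivery guarantee works by replaying failed messages delivers each message at least once, so duplicates have to be absorbed downstream. Below is a minimal sketch of an idempotent sink. It assumes every message carries a unique, stable ID. `IdempotentCounterSink` and its `msg_id` argument are hypothetical names, not Storm APIs. A real sink would persist `seen_ids` alongside the counts rather than keep them in memory.

```python
class IdempotentCounterSink:
    """Counts messages per key, ignoring redeliveries of an already-applied message.

    Hypothetical example class, not part of Storm. A real sink would store
    seen_ids and counts together in one durable store.
    """

    def __init__(self):
        self.seen_ids = set()  # message IDs that have already been applied
        self.counts = {}       # key -> number of distinct messages applied

    def write(self, msg_id, key):
        """Apply a message once; return False if it was a duplicate."""
        if msg_id in self.seen_ids:
            return False  # a replayed message: already counted, so skip it
        self.seen_ids.add(msg_id)
        self.counts[key] = self.counts.get(key, 0) + 1
        return True


sink = IdempotentCounterSink()
# "m1" is delivered twice, as it might be after a timeout-triggered replay.
for msg_id, key in [("m1", "ad-42"), ("m2", "ad-42"), ("m1", "ad-42")]:
    sink.write(msg_id, key)
print(sink.counts)  # {'ad-42': 2}
```

The skip is only safe if the dedup set and the counts are updated together. If they are updated separately, a crash between the two writes reopens the window for double counting.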
"Guarantees no data loss: A realtime system must have strong guarantees about data being successfully processed. A system that drops data has a very limited set of use cases. Storm guarantees that every message will be processed, and this is in direct contrast with other systems like S4."
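Storm's processing guarantee is built on tracking each spout tuple's tree of descendant tuples. According to Storm's "guaranteeing message processing" documentation, an acker task keeps a 64-bit "ack val" per spout tuple. It XORs in the ID of every tuple created in the tree and every tuple acked. Each ID goes in exactly twice, so the value returns to zero once the whole tree is processed. A tree that never reaches zero times out and is failed back to the spout, which can then replay it. The toy model below illustrates only the XOR bookkeeping. `AckTracker`, `new_edge_id`, and their methods are hypothetical names, not Storm's implementation.

```python
import random


def new_edge_id():
    """Random 64-bit tuple ID; random IDs make an accidental early zero negligible."""
    return random.getrandbits(64)


class AckTracker:
    """Toy model of XOR-based tuple-tree tracking (hypothetical, not Storm's code)."""

    def __init__(self):
        self.ack_vals = {}  # spout (root) message ID -> running XOR of tuple IDs

    def _xor(self, root_id, tuple_id):
        val = self.ack_vals.get(root_id, 0) ^ tuple_id
        if val == 0:
            self.ack_vals.pop(root_id, None)
            return True  # every created tuple has been acked: tree complete
        self.ack_vals[root_id] = val
        return False

    def anchor(self, root_id, tuple_id):
        """Record a newly created tuple (the spout tuple or an anchored child)."""
        return self._xor(root_id, tuple_id)

    def ack(self, root_id, tuple_id):
        """Record an ack; XORing the same ID a second time cancels it out."""
        return self._xor(root_id, tuple_id)

    def pending(self):
        """Roots still in flight; on timeout these would be failed back to the spout."""
        return list(self.ack_vals)


tracker = AckTracker()
# Small fixed IDs keep the demo reproducible; real IDs would come from new_edge_id().
spout_tuple, child_a, child_b = 0x1F, 0x2C, 0x35
tracker.anchor("msg-1", spout_tuple)
tracker.anchor("msg-1", child_a)  # a bolt emits two children anchored to the spout tuple
tracker.anchor("msg-1", child_b)
results = [
    tracker.ack("msg-1", spout_tuple),  # children still outstanding
    tracker.ack("msg-1", child_a),      # child_b still outstanding
    tracker.ack("msg-1", child_b),      # XOR back to zero: tree complete
]
print(results)  # [False, False, True]
```

XOR is commutative, so the order in which anchors and acks reach the tracker doesn't matter. This is also why the guarantee is "at least once": a replayed tree may re-run work that had partially succeeded.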
Nice to see this. I started trying to implement Storm towards the beginning of this year, since it's something I've wanted (and partially reimplemented) many times. But finding good resources or a community for miscellaneous support was so time-consuming that I gave up and reimplemented simple parts of it as I needed them.
I'm off to buy the books and bookmark this page for future reference. Great work, Nathan.
No need to apologize; it was quite early in the project's development (and that was expected). I'm just glad to see the progress you've made in a year, and I look forward to using Storm to solve a few of our issues in the near future.
The code itself is nice and clean; take a look for yourself (chosen randomly).
If you've ever done real-time processing of data streams at scale, or even been daunted by the idea, reading the wiki makes it clear that Storm is an exciting option. It does for Big Data streams what Hadoop did for Big Data.
Kafka is interesting, but it doesn't really address similar use cases. The two are more complementary; in fact, Storm has a spout implementation for Kafka.
I would add to this that a lot of Storm users (myself included) use Storm together with Kafka - that is, Kafka is used to get data into (and possibly out of) Storm while Storm does the actual processing. Kafka is more along the lines of Kestrel and RabbitMQ.
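One reason the pairing works is that Kafka retains messages in a log addressed by offset. A spout reading from it can re-read anything that failed downstream instead of losing it. Below is a minimal sketch of that replay logic, with a plain Python list standing in for a single Kafka partition. The method names loosely mirror Storm's `ISpout` contract (`nextTuple`, `ack`, `fail`), but `ReplayingLogSpout` is a hypothetical illustration, not the storm-kafka spout.

```python
from collections import deque


class ReplayingLogSpout:
    """Toy spout over an in-memory log standing in for one Kafka partition.

    Hypothetical illustration; method names only loosely mirror Storm's ISpout.
    """

    def __init__(self, log):
        self.log = log          # append-only list of messages; index = offset
        self.next_offset = 0    # next offset that has never been emitted
        self.pending = set()    # offsets emitted but not yet acked
        self.retry = deque()    # failed offsets waiting to be re-emitted

    def next_tuple(self):
        """Return (offset, message) to emit, or None if nothing is available."""
        if self.retry:
            offset = self.retry.popleft()  # replay failures before new data
        elif self.next_offset < len(self.log):
            offset = self.next_offset
            self.next_offset += 1
        else:
            return None
        self.pending.add(offset)
        return offset, self.log[offset]

    def ack(self, offset):
        self.pending.discard(offset)

    def fail(self, offset):
        # The log still holds the message, so it can simply be re-read and resent.
        if offset in self.pending:
            self.pending.discard(offset)
            self.retry.append(offset)

    def committed_offset(self):
        """Lowest offset still in flight, else the next unread one."""
        in_flight = self.pending | set(self.retry)
        return min(in_flight) if in_flight else self.next_offset


spout = ReplayingLogSpout(["a", "b", "c"])
spout.next_tuple()               # emits (0, 'a')
spout.next_tuple()               # emits (1, 'b')
spout.ack(1)                     # 'b' fully processed
spout.fail(0)                    # processing of 'a' failed downstream
replayed = spout.next_tuple()
print(replayed)                  # (0, 'a')
print(spout.committed_offset())  # 0
```

`committed_offset` is a point a consumer could safely resume from after a crash, because everything below it has been acked. Resuming there may replay some messages above it that were already processed, so downstream idempotency still matters.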
One of my major wishlist items for Storm is an end-to-end tutorial on getting it working with a non-JVM language. I'm not a JVM coder and don't know much about the platform, and the documentation on getting it working with Python is hard to follow at best, IMHO.
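The core of non-JVM support is Storm's multilang protocol. The worker talks to a subprocess over stdin/stdout using JSON messages, each followed by a line containing `end`. The sketch below is based on a reading of the multilang protocol documentation and covers three things:

- the framing
- the initial handshake (write an empty file named after the process's PID into `pidDir`, then reply with `{"pid": ...}`)
- an `ack` command

It omits emits, task-ID replies, heartbeats, and other details that vary across Storm versions. The helper names are hypothetical, and the demo uses in-memory streams rather than a real Storm worker.

```python
import io
import json
import os
import tempfile


def read_message(stream):
    """Read one message: JSON text followed by a line containing only 'end'."""
    lines = []
    while True:
        line = stream.readline()
        if not line:
            raise EOFError("stream closed before 'end'")
        line = line.rstrip("\r\n")
        if line == "end":
            return json.loads("\n".join(lines))
        lines.append(line)


def send_message(stream, msg):
    """Write one message as JSON plus the 'end' terminator, then flush."""
    stream.write(json.dumps(msg) + "\nend\n")
    stream.flush()


def handshake(instream, outstream):
    """Answer the initial handshake: create an empty PID file, then report the PID."""
    setup = read_message(instream)
    pid = os.getpid()
    open(os.path.join(setup["pidDir"], str(pid)), "w").close()
    send_message(outstream, {"pid": pid})
    return setup["conf"], setup["context"]


def ack(outstream, tuple_id):
    """Tell Storm the tuple with this ID was processed."""
    send_message(outstream, {"command": "ack", "id": tuple_id})


# Demo with in-memory streams standing in for a real worker's stdin/stdout.
pid_dir = tempfile.mkdtemp()
fake_stdin = io.StringIO(
    json.dumps({"conf": {}, "context": {}, "pidDir": pid_dir}) + "\nend\n"
    + json.dumps({"id": "t1", "comp": "spout", "stream": "default",
                  "task": 3, "tuple": ["hello"]}) + "\nend\n"
)
fake_stdout = io.StringIO()
conf, context = handshake(fake_stdin, fake_stdout)
tup = read_message(fake_stdin)  # a tuple delivered to the bolt
ack(fake_stdout, tup["id"])
print(fake_stdout.getvalue())
```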
Keep up the good work, Nathan. I presented Storm as a tech talk at Formspring, and without question you made my job easy by building a platform that gets a topology running without a ton of boilerplate.
I've used Storm for a couple of projects and really liked it. It significantly simplifies building real-time data pipelines and makes deploying them KISS-simple. I'm really excited to see improved metrics coming out as well. If I had one gripe with Storm 0.7, it was that I wanted more metrics for monitoring cluster performance. Congrats on a year, and keep up the good work. Storm is awesome!