MillWheel: Fault-Tolerant Stream Processing At Internet Scale [pdf] (unitn.eu)
62 points by pushkargaikwad 1536 days ago | 8 comments

It's a good time for stream processing infrastructure. Twitter has just released Summingbird, and LinkedIn has released Samza:

- https://blog.twitter.com/2013/streaming-mapreduce-with-summi...

- http://samza.incubator.apache.org/

Of the three, I'm most interested in Summingbird. Unlike Google's system, I can actually download it, and it seems to provide a better query abstraction than Samza. I haven't spent much time investigating any of these systems, so I might be wrong in this assessment.

Isn't there also at least one other stream processing framework running atop Hadoop / HDFS?

Edit: I guess I was thinking of Spark [1], but it doesn't really fit. Any others?

[1] http://spark.incubator.apache.org/

I'm not sure what you'd be able to do with Google's system if you could download it; the code is thoroughly dependent on the rest of Google's idiosyncratic internal codebase.


Probably a commercial arrangement. Clicking [scribd] takes you to a version hosted by scribd.com. It's probably quite useful, since tiny servers that would otherwise handle an academic's papers perfectly well can easily be flooded when something hits the HN front page.

Sorry, I was being obtuse. I meant that Google's code is not OSS, whereas Summingbird and Samza are. So while MillWheel might be great in theory, in practice I'm not going to use it, and I'm not motivated enough to attempt to reimplement it.

I think @timclicks was replying to a deleted OT comment of mine asking why there was an automatically generated [scribd] in the submission title when the document isn't hosted on scribd. I deleted the comment when I realised that it's not part of the title but is actually a (useful?) link to view the doc in scribd. I didn't want to add noise to the discussion, but it seems I failed...

Is Internet scale bigger or smaller than web scale?

The web (HTTP) is a subset of the Internet, so by definition Internet scale would be bigger.
