Of the three, I'm most interested in Summingbird. Unlike Google's system, I can actually download it, and it seems to provide a better query abstraction than Samza. I haven't spent much time investigating any of these systems, so I might be wrong in this assessment.
I'm not sure what you'd be able to do with Google's system if you could download it; the code is thoroughly dependent on the rest of Google's idiosyncratic internal codebase.
Probably a commercial arrangement. Clicking [scribd] takes you to a version hosted on scribd.com. That's probably quite useful, since tiny servers that would handle an academic's papers very nicely, thank you, can easily be flooded by something hitting the HN front page.
Sorry, I was being obtuse. I meant that Google's code is not OSS, whereas Summingbird and Samza are. So while MillWheel might be great in theory, in practice I'm not going to use it, and I'm not motivated enough to attempt to reimplement it.
I think @timclicks was replying to a deleted off-topic comment of mine asking why there was an automatically generated [scribd] in the submission title when the document isn't hosted on Scribd. I deleted the comment when I realised that it's not part of the title but is actually a (useful?) link to view the doc on Scribd. I didn't want to add noise to the discussion, but it seems I failed...
- https://blog.twitter.com/2013/streaming-mapreduce-with-summi...
- http://samza.incubator.apache.org/