

MillWheel: Fault-Tolerant Stream Processing At Internet Scale [pdf] - pushkargaikwad
http://db.disi.unitn.eu/pages/VLDBProgram/pdf/industry/p734-akidau.pdf

======
noelwelsh
It's a good time for stream processing infrastructure. Twitter has just
released Summingbird, and LinkedIn has released Samza:

\- [https://blog.twitter.com/2013/streaming-mapreduce-with-
summi...](https://blog.twitter.com/2013/streaming-mapreduce-with-summingbird)

\- [http://samza.incubator.apache.org/](http://samza.incubator.apache.org/)

Of the three, I'm most interested in Summingbird. Unlike Google's system, I can
actually download it, and it seems to provide a better query abstraction than
Samza. I haven't spent much time investigating any of these systems, so I
might be wrong in this assessment.

~~~
gizzlon
Isn't there also at least one other stream processing framework running atop
Hadoop / HDFS?

Edit: I guess I was thinking of Spark [1], but it doesn't really fit. Any others?

[1] [http://spark.incubator.apache.org/](http://spark.incubator.apache.org/)

------
bradhe
Is Internet scale bigger or smaller than web scale?

~~~
WayneDB
The web (HTTP) is a subset of the Internet, so by definition Internet scale
would be bigger.

