Wow ... this is awesome! Quoting from the paper, "live and breathe under the assumption that we will never know if or when we have seen all of our data, only that new data will arrive, old data may be retracted, and the only way to make this problem tractable ...". That's another amazing mind shift!
Why is there no related work section in this paper? I'm also not sure that simply calling it 'The' dataflow model is very friendly to all the previous dataflow models for parallelism that have been developed over the last four decades or so. What makes this implementation so definitive that it doesn't even need a qualified name, and why aren't any of the others even worth a mention?
They're good summaries, but after thinking about this a fair bit, I don't think they're enough to trump HN's preference for original sources. Anyone who wants to can read both, since when we change a URL we include the previous one in the comments, as above.