A really interesting proposal to unify stream processing and relational querying and to extend SQL with a semantics over time-varying relations.
In particular, the authors (from the Apache Beam, Flink, and Calcite projects) stress the need to:
* blur the boundary between streams and tables by using the single concept of a time-varying relation;
* make explicit the difference between event time and processing time, using monotonic watermarks to tie the two;
* control when these continuously varying relations are materialized, avoiding unnecessary work and producing only timely outcomes.
These ideas are not new per se, but here they are pushed further and nicely combined.
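The first idea can be sketched in a few lines of Python (the data and names are illustrative, not from the paper's formalism): a time-varying relation is simply a classic relation at each point in processing time, and a stream is the sequence of changes between those snapshots.

```python
# Illustrative change log of a relation: each entry records the
# processing time at which a row became visible.
events = [
    (1, ("alice", 3)),
    (2, ("bob", 5)),
    (4, ("alice", 7)),
]

def snapshot(at):
    """The relation's contents as of processing time `at`.

    A table query evaluates one snapshot; a streaming query
    continuously re-evaluates over successive snapshots.
    """
    return [row for t, row in events if t <= at]

assert snapshot(3) == [("alice", 3), ("bob", 5)]
```

The point of the unification is that the same relational query applies unchanged to either view; only the materialization strategy differs.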
However, I'm wondering whether SQL is the right tool for the task. For instance, Listing 2 seems complex compared to the query expressed in plain English. In particular, I disagree that "anyone who understands enough SQL to solve a problem in a non-streaming context still has the knowledge required to solve the problem in a streaming context as well".
A must-read paper for anyone keen on stream processing: Spark, Flink, Beam, Kafka Streams...
> However, I'm wondering whether SQL is the right tool for the task. For instance, Listing 2 seems complex compared to the query expressed in plain English
For me, Listing 2 is more complex and less intuitive than Listing 1 (CQL). Yet this is a general problem when we try to adapt relational (and SQL) concepts to such tasks (windowing, grouping, aggregation, etc.). One solution is to switch to a kind of column algebra rather than relational algebra, as described in [1] and [2] (it has also been applied to stream processing).
I agree that Listing 2 is a little verbose, but once you understand the syntax (not much different from vanilla SQL), it's pretty powerful. I think writing queries this way will ultimately allow for more expressiveness than something more concise. That isn't to say that this particular SQL is necessarily the best, but I think it works well for the problem at hand.
Sure, SQL is powerful, and I agree that any alternative would bring its own complexity. However, my concern is more about encapsulation than expressiveness or conciseness. Even if a watermark is used the same way in different queries, I have to repeat the same query clause again and again, as I already do with join conditions. Not a fundamental issue, but something that makes it harder to focus on the kernel of a query.
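One way to regain some encapsulation today is to factor the shared clause out at query-construction time. A hypothetical Python helper (the clause text is illustrative, not the paper's exact EMIT grammar):

```python
def emit_after_watermark(query: str) -> str:
    # Hypothetical helper: append the emission clause shared by many
    # queries, so it is written once instead of in every query body.
    # (Clause text is illustrative, not the paper's exact grammar.)
    return query.rstrip().rstrip(";") + "\nEMIT AFTER WATERMARK;"

q = emit_after_watermark(
    "SELECT team, SUM(score) FROM Scores GROUP BY team"
)
```

Of course, this pushes the encapsulation outside SQL itself, which is exactly the commenter's complaint: the language offers no native way to name and reuse such a clause.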
> However, I'm wondering if SQL is the right tool for the task.
In my anecdotal experience, yep, it's a data analysis language everyone in the industry knows - BAs, product managers, developers, and data scientists.
Having a standard SQL extension definitely eases adoption and migration from one tool to another. But rather than piling feature upon feature onto SQL, maybe we need a simpler way to extend and compose features.
On that point, I see a contradiction in the future-work section of the paper, where it is said that "Experience has also shown that pre-built solutions are never sufficient for all use cases; ultimately, users should be able to utilize the power of SQL to describe their own custom-windowing TVFs". But how can a SQL user add their own custom operator?
It seems that the main proposal is to make SQL watermark-aware. This additional "knowledge" allows groups and windows to be produced while taking late and out-of-order records into account.
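A minimal Python sketch of that idea (window size, data, and the watermark source are assumptions for illustration; real systems derive watermarks from the source, not from observed event times): a tumbling window is emitted only once the watermark passes its end, so out-of-order records arriving before that point are still included.

```python
from collections import defaultdict

WINDOW = 10  # tumbling window size in event-time units (illustrative)

def tumbling_sum(records):
    """records: iterable of (event_time, key, value), possibly out of
    order. Windows [start, start + WINDOW) are emitted once the
    watermark passes their end."""
    windows = defaultdict(int)   # (key, window_start) -> running sum
    watermark = float("-inf")    # crude watermark: max event time seen
    emitted = []
    for event_time, key, value in records:
        watermark = max(watermark, event_time)  # monotonic by construction
        start = (event_time // WINDOW) * WINDOW
        windows[(key, start)] += value
        # Emit every window whose end the watermark has now passed.
        for k_s in sorted(windows):
            if k_s[1] + WINDOW <= watermark:
                emitted.append((k_s[0], k_s[1], windows.pop(k_s)))
    return emitted

# The record with event time 5 arrives after the one with event time 7,
# yet is still folded into window [0, 10) before it is emitted.
assert tumbling_sum([(1, "a", 2), (7, "a", 3), (5, "a", 4),
                     (12, "a", 1)]) == [("a", 0, 9)]
```

In this sketch a record arriving after its window was already emitted would open the window again and yield an extra partial result; real systems instead choose explicitly whether to drop such records or emit a refinement, which is precisely what the paper's materialization controls are for.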