
The SQL filter clause: selective aggregates - MarkusWinand
http://modern-sql.com/feature/filter
======
MichaelBurge
At an old job, me and another person wrote an in-house SQL-like dialect that
compiled to access our C++ data structures. It was kind of nice just throwing
whatever junk you wanted into your SQL without having to worry about any
standards.

Want a filter clause? Got it.

Need a weighted distinct count? Now it's part of the language.

Also, you can rephrase a lot of these SQL features as subqueries. It's
surprising how many database bugs you can find when you do it. Not so much in
Postgres, but I probably found a dozen in Redshift. I mean rephrasing this:

    
    
        select
          sum(x) filter (where x < 5),
          sum(x) filter (where x < 7)
        from
          generate_series(1,10,1) s(x)
        ;
    

as this(where t is either a permanent table, view, or CTE as appropriate):

    
    
        with t as (
          select * from generate_series(1,10,1) s(x)
        )
        select
          (select sum(x) from t where x < 5),
          (select sum(x) from t where x < 7)
        ;
    

Though for Postgres the filter will likely have better performance.

~~~
brudgers
Depending on the number of years ago, it may have been ahead of its time. C#
and Scala have SQL like language abstractions a few years ago via LINQ. Scala
has something similar.

~~~
duaneb
Scala doesn't have anything like this where the query is compiled monadically
before execution, although the data structures do expose excellent ways to
chain functional combinators. Squeryl (and there's a similar competitor)
implement this for SQL, and I'd say it's 80% there--still a lot of rough edges
fitting dynamic queries (e.g. number of tables joinable in a single statement)
into a static language.

~~~
virtualwhys
> Scala doesn't have anything like this where the query is compiled
> monadically before execution

Nonsense, Scala is arguably leading the way in the statically typed composable
query dsl department; see Slick [1] and Quill [2] among others.

Outside of Haskell's Esqueleto I am not aware of any statically typed query
dsl that comes remotely close to the aforementioned. LINQ to SQL/Objects
provides static query generation but doesn't compose; beyond that, what is
there?

[1] [https://github.com/slick/slick](https://github.com/slick/slick) [2]
[https://github.com/getquill/quill](https://github.com/getquill/quill)

~~~
Arnavion
>LINQ to SQL/Objects provides static query generation but doesn't compose

How do you mean? The result of a LINQ query is a sequence that can be used as
the input of another LINQ query.

~~~
virtualwhys
By LINQ query I mean LINQ to _SQL_ , not the basic LINQ collections
functionality (the equivalent in Scala being for comprehensions, BTW).

If you have an example of .NET based query composition to share I'd like to
see it; last I checked there was no such functionality available.

~~~
MichaelGG
LINQ to SQL ends up creating an expression on IQueryable. You can build this
expression up any way you want, including in bits and pieces. You can tack on
Wheres and Selects any time you like. It's more composable than pain text SQL
since the parts can go in any order.

It's only after you force it to materialize (I.e. call ToList) that you can't
further "compose" it.

------
Xophmeister
Syntactic sugar that's only natively supported in PostgreSQL? The site's
banner says, "A lot has changed since SQL-92". Be that as it may, but it seems
no one has really bothered catching up. I wonder why that is...

My guess is that such extensions, while useful, are somewhat marginalised
features in terms of usage. Thus, no one ever learns them formally and just
Googles for what they need -- if it comes up -- and get the CASE solution, in
this case (pun unintentional). Hence perpetuating that pattern. Also, of
course, the CASE solution is a lot more powerful as the returned expression,
that gets fed into the aggregate function, can be basically anything.

~~~
driusan
I think the most well-thought-out explanation I've seen for why widespread
support for the SQL standard seemed to stop after SQL-92 is this one:

[http://www.wiscorp.com/is_sql_a_real_standard.pdf](http://www.wiscorp.com/is_sql_a_real_standard.pdf)

~~~
Xophmeister
This was quite interesting; thank you for posting... I may be cheeky and
submit it to HN.

~~~
driusan
I'm fairly sure I originally saw it on HN.

------
westurner
You can do these with Ibis and various SQL engines:

* [http://docs.ibis-project.org/sql.html#aggregates-considering...](http://docs.ibis-project.org/sql.html#aggregates-considering-table-subsets)

* [https://github.com/cloudera/ibis/tree/master/ibis/sql](https://github.com/cloudera/ibis/tree/master/ibis/sql) (PostgreSQL, Presto, Redshift, SQLite, Vertical)

* [https://github.com/cloudera/ibis/blob/master/ibis/sql/alchem...](https://github.com/cloudera/ibis/blob/master/ibis/sql/alchemy.py) (SQLAlchemy)

------
codegeek
"with" is not supported by mysql but sqlite supports it ? wow, didn't know
that.

~~~
goldenkey
You can still do many subselect queries without with. Its really only an issue
with the more complex usages.

~~~
Xophmeister
I personally find CTEs much easier to read than subqueries.

~~~
wesd
and you can use them in multiple places. I'm not sure if you have the same
subquery twice if the query optimizer uses a single dataset.

------
rubyfan
Is this all that different from Hive's analytic functions? I don't think I can
put a WHERE filter right in the OVER clause though.

 _SELECT a, COUNT(b) OVER (PARTITION BY c) FROM T;_

[https://cwiki.apache.org/confluence/display/Hive/LanguageMan...](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics)

------
ibejoeb
For those on Oracle or MS SQL, be aware of the PIVOT and UNPIVOT features.
It's not as general as this, but it's terse and works well for the case, i.e.,
running an aggregate across specific buckets.

------
ams6110
If I'm understanding this correctly, you can also do this sort of thing as a
UNION of aggregate queries, each one with its own WHERE clause corresponding
to the desired FILTER.

------
d33
It would be nice to see an example for this kind of query.

BTW, SQL seems like a terrible language. For example, why would it need to
enforce ordering for keywords like ORDER BY, WHERE, GROUP BY? So many times I
had made a mistake that boiled down to putting them in the correct order...

~~~
return0
> SQL seems like a terrible language.

The reason you give is too trivial to justify calling it "a terrible language"

~~~
mamcx
That reason is in fact a good one. The order is "wrong" from the point of view
of a developer and _maybe_ ok from the POV of the original intent of SQL: The
_end-user_ was supposed to run SQL against databases. Few ad-hoc queries, even
complicated, can be managed. But using SQL all the time, and the language is
_clearly_ the wrong thing to talk to the DB (is like JS: Is what you have
instead of what you want).

Also: The fact that everyone run (even if eventually fail) for a ORM instead
of SQL, or the way people imagine NOSQL is "better" -is very telling is NO
_SQL_ \- show this.

The relational model is so simple and powerfull, but SQL is not a good seller
of the idea.

~~~
bluejekyll
Is there another language agnostic option?

I suppose it would be interesting if you could use something like the llvm to
target a DB's core features, but through your language of choice.

I haven't seen anything like that, if it exists. SQL is that standard API
today, and ORMs provide the language specific implementation (though sometimes
in ways that make it inobvious what the generated SQL will look like).

~~~
pjungwir
I actually like SQL, but I'm glad people are willing to ask, "Can we do
better?" I think for a working programmer, SQL is the only practical choice,
but Tutorial D is an interesting attempt to improve (by C.J. Date, described
at length in his books Database in Depth and The Third Manifesto), with an
open source implementation called Rel. There is also a commercial database
called Dataphor that uses D4, a cousin to Tutorial D. There is also a newer
project called Andl that is another non-SQL relational query language.

Personally, although I'm not a researcher, I've become intrigued by ways we
model change-over-time in relational databases. Snodgrass's book _Developing
Time-Oriented Database Applications in SQL_ is a good entry point to all that,
and more recently we have SQL2011's Temporal standard. I still have a lot to
read here myself, but I am keen to see what happens in the next ten years in
how the relational model can incorporate _change_.

~~~
smt88
SQL is a lot like JavaScript: a lot of people hate working with it, but it's
the only practical choice, as you say.

It's interesting to note that transpiling to SQL is even more popular than the
recent trend of transpiling to JavaScript. Every major language has a query
builder (many of which end up with syntax similar to Andl). We're abstracting
away from SQL and have been for a long time.

With that in mind, I wonder whether we need a new query language so
desperately. SQL is so highly optimized in the major RDBMS that any new
language would probably compile internally to SQL, meaning it has the same
semantics (and is essentially sugar) _OR_ it would be way behind SQL in terms
of optimization.

Am I just totally off base here?

~~~
pjungwir
After I wrote that comment I thought it might be worth submitting this Andl
story to HN about Andl-on-Postgres: [0]. They write, "Relational expressions
are converted into equivalent SQL and passed back to the Postgres SPI
interface for execution," so you are right about transpiling. But the software
world is more mature than it was 20 years ago, so it seems reasonable for a
project to get a toehold by building on top of existing systems. That needn't
limit its future growth necessarily. It sounds like Andl will work on more
than just Postgres, so I can imagine them eventually building a "native"
version too. Using existing RDBMSes is a pretty good way to get an MVP though.
:-)

[0]
[https://news.ycombinator.com/item?id=11802917](https://news.ycombinator.com/item?id=11802917)

~~~
smt88
> _Relational expressions are converted into equivalent SQL and passed back to
> the Postgres SPI interface for execution_

I actually don't think I like that approach.

If I transpile my own queries (meaning I use a query builder), then I can
cache the generated SQL. There would be some placeholders for arguments, but
the query itself doesn't change.

Andl's approach could add significant overhead in certain applications where
lots of queries are being executed at once.

And then what happens to EXPLAIN support? Is it harder to debug an Andl query?

