
Fun with SQL: Window Functions in Postgres - bgentry
https://www.citusdata.com/blog/2018/06/01/fun-with-sql-window-functions-in-postgresql/
======
btilly
Window functions are actually part of the SQL 2003 standard.

Here is the best way that I've found to think about them. Sometimes you run
across a SQL problem that makes you think, "This would be easy if I just
sorted my table, and ran a program over it breaking it into partitions then
processing each partition in order." Whenever you do that, window functions
will solve that problem.

Note that in practice I've found that queries using window functions often are
best done by something of a form: SELECT ... FROM (SELECT ...) foo WHERE ...
GROUP BY ... where the window functions are all in the inner query, and the
outer query's group by collapses each partition into a single row.

~~~
blattimwind
Window functions are super handy to turn a correlated subquery into a join,
because they make group-wise limits fairly easy (just put a ROW_NUMBER()
concerning the relevant partition, i.e. what would be your GROUP BY in the
correlated subquery, in your query and select from that query filtering on
rank; then join on whatever you previously correlated on).

Basically: SELECT ... FROM (SELECT ..., ROW_NUMBER() OVER (PARTITION BY
my_group [ORDER BY...]) AS rank) AS foo WHERE foo.rank <= 10

Without window functions you can usually only do this using a correlated
subquery, which usually is much slower for fairly obvious reasons.

~~~
da_chicken
Yup, if all you ever learn is one windowing function, the "greatest N per
group" pattern with ROW_NUMBER() is easily the most common.

------
dbkaplun
There's a nice set of challenges for learning window functions and testing
your window function knowledge:
[https://www.windowfunctions.com/](https://www.windowfunctions.com/)

------
pjungwir
I've noticed that ntile(n) can give unexpected results when there are fewer
input rows than n, so cume_dist might be a safer choice (or percentile_disc or
percentile_cont). I wrote some more details here:

[https://stackoverflow.com/questions/27883999/how-to-
calculat...](https://stackoverflow.com/questions/27883999/how-to-calculate-
percentile-in-postgres/39535688#39535688)

This behavior means that ntile(100) is not the same thing as percentile.

But I've always wondered if I was just misunderstanding something. I wonder
what others think?

~~~
da_chicken
ntile() is one of those functions that just never seems to work the way users
expect regardless of the RDBMS. I don't know if the spec's definition was just
poorly thought out or what, but it's often flaky, poorly performing, or both
in many RDBMSs.

In PostgreSQL, you should generally use cume_dist(), percentile_cont() and
percentile_disc(). Most later releases of other RDBMS have similar window
functions that work more like users expect. I'm not certain what's up, but
it's definitely led me to believe that ntile() was just poorly thought out at
the spec level.

