
Window Functions in PostgreSQL - gabriel_b
http://www.gab.lc/articles/window_functions_postgresql.php
======
JohnBooty
Window functions are often a good answer to the "why SQL?" question.

Here's a real-world case where I used them lately. Suppose a user has an inbox
of messages from other users. I'd like to show the user her latest
conversations (aka "message threads"). Ten conversations per page, and I'd
like to show the three latest messages per conversation. That's fairly easy
with window functions like dense_rank() and row_number() - and it's flippin'
fast... just a few milliseconds on a table with thousands of rows and I could
probably get it lower than that with some index optimization.

    
    
      // "conversations" is a CTE (common table expression)
      // "messages" is the actual table
      // we don't need a CTE here but it's probably the most readable way
      // query is a little hairy because of all the greatest() and least()'s
      WITH conversations 
        as (
        select
          row_number() over (partition by greatest(from_id,to_id),
            least(from_id,to_id) order by created_at desc) 
            as conversation_message_number,
          max(id) over (partition by greatest(from_id,to_id),
          least(from_id,to_id)) as conversation_latest_message_id,
        mes.*
      from messages mes where 1234 in (mes.to_id, from_id) // 1234 is the member's id
      )
    
      // get conversations 1 through 10, and get the 3 latest messages
      select 
        dense_rank() over (order by conversation_latest_message_id) 
          as conversation_number, *
      from conversations
      where (conversation_number between 1 and 10) 
        and (conversation_message_number <= 3)
    
     

The alternatives are not a lot of fun. You could fairly easily implement this
logic in Ruby/Java/whatever but you'd have to pull a _ton_ of rows out of the
database and sort them in your app tier, which is kind of ugly performance-
wise even if you're not paying the cost of converting each row into a model
object w/ ActiveRecord/Hibernate/etc. I'm not familiar with NoSQL
implementations but I'm guessing you'd have to pull all those records into
your app tier and do it there as well?

~~~
Amezarak
> You could fairly easily implement this logic in Ruby/Java/whatever but you'd
> have to pull a ton of rows out of the database and sort them in your app
> tier, which is kind of ugly performance-wise even if you're not paying the
> cost of converting each row into a model object w/
> ActiveRecord/Hibernate/etc.

This, a thousand times this. I have done a lot of performance work with a
variety of web applications, and this is a common theme - pulling more data
into the app tier than you need to.

Putting even a small amount of effort into writing good queries will almost
always have a massive performance impact on your application. If performance
is at all a concern, this is one of the first places you should start looking
- you should be trying to bring over the smallest amount of data from the
database as possible.

I've found, though, that a lot of developers have conceptual problems dealing
with set-based operations. Not only does it look cleaner (depending on the
framework and design) to front-load the application, I think that many
developers simply find it much more intuitive to bring the whole dataset in
and then filter with some loops and conditionals and then use the 25% of
fields they pulled in that actually need to be displayed. If you forced them
to push more work onto the database, they'd simply write imperative-looking
SQL.

I'm not real sure how to get people to think more in set terms. Once you get
there though, well-written SQL is beautiful. Query optimizations for the most
part become obvious - though there will always be further optimizations
database experts can make because they have a better understanding of the
database engine - just try to make sure you're always operating on the
smallest possible sets of data to get the desired results. It's often not hard
at all to write highly performant queries even if they have huge numbers of
joins in large tables. And window functions are a big help when it comes to
doing this!

Sorry for the rant - I just see this so much. :)

~~~
JohnBooty

      I've found, though, that a lot of developers have 
      conceptual problems dealing with set-based operations.
    

Yes! I can't help but think that a lot of people are drawn to NoSQL (or find
themselves opposed to SQL for other reasons) simply because they don't
understand SQL. As somebody who avoided databases early in his career I
definitely understand this.

    
    
      well-written SQL is beautiful
    

Count me as a member of the club that believes this. Not every problem can or
should be handled with a relational data store, but for those that do, I don't
think the resultant set-based operations could be expressed much more cleanly
or succinctly than SQL allows.

~~~
buckbova
These NoSQL document stores seem to be a realization of that object database
dream of years past. It's not exactly one-to-one (objects to documents) but it
gets you there quicker than a relational database could.

I enjoy writing well crafted SQL queries as a currently employed database
architect. But I don't find it to be all that elegant. I do think it bridges
the gap between lay person and expert by being somewhat readable declarative
language. For those programmers out there it offers some procedural aspects,
differing from vendor to vendor.

Features like window functions and CTEs have helped to limit the temp tables,
derived tables, subqueries, etc required to achieve the desired output. One
can now make a much more terse query.

------
endersshadow
I find that window functions are one of the least used aspects of modern SQL,
despite them being incredibly useful. I've used them in T-SQL [1] and PL/SQL
[2], as well (MySQL does not do window functions). If you're doing lots of SQL
work, get very familiar with the OVER clause. It's a lifesaver.

[1]: [http://msdn.microsoft.com/en-
us/library/ms189461.aspx](http://msdn.microsoft.com/en-
us/library/ms189461.aspx)

[2]: [http://www.java2s.com/Tutorial/Oracle/0320__Analytical-
Funct...](http://www.java2s.com/Tutorial/Oracle/0320__Analytical-
Functions/0580__Window-Functions.htm)

------
rebelde
Whenever I see functions like these and PostgreSQL's JSON functions, I get the
feeling that MySQL/MariaDB is not progressing, or not nearly as fast.

For those who have worked with both, do you find the PostgreSQL-only functions
and features to be extremely useful or does it not really matter much?

~~~
cwyers
It really depends on your needs. I still am kind of in shock that Postgres is
willing to add these baroque sorts of features, but still doesn't have an
UPSERT.

~~~
neilc
UPSERT is not a simple feature to implement correctly:

[http://www.pgcon.org/2014/schedule/events/661.en.html](http://www.pgcon.org/2014/schedule/events/661.en.html)

------
ufmace
I get the idea that not too many people know how to use window functions, even
though they're a slick solution to some potentially messy problems. I've used
them several times to deal with duplicate data issues.

Got a table where you want to put a unique index on a column that has some
duplicate values? It's a pretty easy fix to use a window query along with
group by and having to get a list of rows with duplicate entries for the
column and use row_number to assign a unique number to each duplicate. Then,
use string concat to tack that number onto the end of the string. Select the
new string and the item ID, put it in a CTE, and connect it to an update query
to actually change all the names in one query.

Goes kinda like this: with DrawingDupes as (select d.DrawingId, d.RevTag,
'_DUPLICATE_' \+ cast((ROW_NUMBER() over (partition by d.Number, d.RevTag,
d.UnitId order by d.DrawingId)) as varchar) NewX from (select Number, RevTag,
UnitId from Drawing group by Number, RevTag, UnitId having COUNT(*) > 1) t
join Drawing d on d.Number = t.Number and d.RevTag = t.RevTag and d.UnitId =
t.UnitId) update DrawingDupes set RevTag = RevTag + NewX;

------
msluyter
Nice writeup, though I would have liked some examples using "partition by".
The main gotcha I recall -- and it's somewhat confusing the first time you run
across it -- is that things like last_row() don't work as expected without the
magic incantation:

over (partition by foo order by blah BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED
FOLLOWING)

This forces the query to evaluate the function over the entire partition,
rather then just over the current window. The article does mention this point,
but it bears re-iteration.

------
pjungwir
I'm a huge fan of window functions, but one restriction surprises me: in
Postgres, you can't put a window function into a WHERE clause. Does anyone
know if there is a good reason for this, or is it just a bit of missing
functionality?

~~~
JohnBooty
I'd love to know this too. I provided an example elsewhere in this discussion
of a potentially-complex query I implemented with window functions.

I wanted to do something like returning every row with a row_number between 1
and 3 - but discovered I couldn't put this into the WHERE clause.

The solution I came up with was to define a CTE where one of the columns is
the result of the row_number() function, and then I did a SELECT on the CTE
that included the WHERE functionality I needed.

------
midmoon
cant u use any function as window function (not only 11 ) ? i do alot stuff
like sum(1) over(PARTITION BY index1,index2 ORDER by ...) for creating
statistics. I wish i had that infos some time before :-)

~~~
JohnBooty
Yes, things like sum() can work on partitions. :)

"Window functions" refer to something a little different... specifically to
functions like row_number() and rank() that refer to the position/ordering of
rows inside a partition.

------
kalimatas
Thanks for sharing!

