Hacker News new | past | comments | ask | show | jobs | submit login

What i like most about window functions is that they give me a way to do a sort of 'extended group by' which i have always wanted.

If you want to know the highest salary in each department, that's easy:

  select department, max(gross_salary)
  from salary
  group by department
If you want to know who it is who earns that salary, you might try to do this:

  select department, first_name, max(gross_salary)
  from salary
  group by department
But this doesn't work, because it's meaningless to ask for first_name in a situation where you're grouping by department. You could ask for an aggregation of all names, but there's no straightforward way to ask for the name of the person who earned that salary. You end up having to write a join against the group by, as in the article, which is pretty grim, and falls apart if you want to order by multiple columns to break ties.

Window functions let you re-frame this kind of group by like this:

  select department, gross_salary
  from (
    select *, row_number() over (partition by department order by gross_salary desc) as n
    from salary
  ) _
  where n = 1
Because the outer query is no longer a group by, you can select any columns you like. The natural query works fine:

  select department, first_name, gross_salary
  from (
    select *, row_number() over (partition by department order by gross_salary desc) as n
    from salary
  ) _
  where n = 1
This only works where the group by is based on an aggregate function that picks one value, like min or max. I somewhat think it was a mistake to model that kind of thing as an aggregation in the first place. If SQL had a way of picking one row from a group, rather than aggregating over it, that would be immensely useful.



> If SQL had a way of picking one row from a group, rather than aggregating over it, that would be immensely useful.

You can do this with a LATERAL join, if you want to avoid the jankiness of window functions. Lateral joins are just a programmatic way to introduce a correlated subquery. For example

    SELECT department, first_name, gross_salary FROM
        (SELECT DISTINCT department FROM salary) depts,
        LATERAL (
            SELECT first_name, gross_salary 
            FROM salary 
            WHERE department = depts.department 
            ORDER BY gross_salary DESC
            LIMIT 3
        )
This uses a limit of 3 to show off top-3 instead of just argmax, but you could clearly set that to one if you wanted. This construction can be pretty handy if you need the per-group rows to be something other than what a window function could handle.


> If SQL had a way of picking one row from a group, rather than aggregating over it, that would be immensely useful.

Well, there is a way which is window functions :) as shown by you.

The idea to expect exactly one first name of the person with the biggest salary is kinda wrong, since there can be more than one person, and this can obviously not described as a single column per group.

Note that aggregating is not limited to min, max, sum, etc. Postgres, for example, has array_agg which aggregates individual columns from each record of a group into an array, if that becomes necessary.


> The idea to expect exactly one first name of the person with the biggest salary is kinda wrong, since there can be more than one person, and this can obviously not described as a single column per group.

That's true. For this idea to work, there would need to be some framework for picking exactly one row from a group. The simplest thing would be to raise an error if there were multiple candidates, but perhaps there are better ways. I think the operation of picking exactly zero or one things from a group is so common that it's worth making provision for it.


With Kinetica you can use the 'arg_max' aggregate function

  select department, max(gross_salary), arg_max(gross_salary, first_name)
  from salary
  group by department
https://www.kinetica.com/docs/concepts/sql.html#sql-aggregat...


Here's an alternative I use all the time (without window functions) that I don't think is widely known! Works on SQL Server and I think is more performant?:

  SELECT department, first_name, salary
  FROM salary AS s
  WHERE s.[salary] = (
    SELECT MAX(ex.[salary])
    FROM salary AS ex
    WHERE s.[department] = ex.[department])




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: