
Seldomly Used SQL: The Bool Aggregate Functions - neovintage
http://neovintage.org/2016/05/04/the-bool-aggregate-functions/
======
nickpeterson
I don't know if postgres allows one to treat boolean as 0 and 1 natively but
in t-sql, the equivalent would be something like:

    
    
      DECLARE @orders TABLE (
          category VARCHAR(5),
          express_delivered INT 
      );
      
      INSERT INTO @orders (category, express_delivered) VALUES ( 'food', 1);
      INSERT INTO @orders (category, express_delivered) VALUES ( 'food', 0);
      INSERT INTO @orders (category, express_delivered) VALUES ( 'shoes',0);
      INSERT INTO @orders (category, express_delivered) VALUES ( 'shoes',0);
      INSERT INTO @orders (category, express_delivered) VALUES ( 'auto', 1);
      INSERT INTO @orders (category, express_delivered) VALUES ( 'auto', 1);
      INSERT INTO @orders (category, express_delivered) VALUES ( 'book', 1);
      INSERT INTO @orders (category, express_delivered) VALUES ( 'book', 0);
      
      SELECT 
          category, 
          CASE WHEN SUM(express_delivered) = 0 THEN 0 ELSE 1 END AS ever_been_express_delivered
      FROM @orders
      GROUP BY category
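
In Postgres itself, where `express_delivered` can be a real boolean, the article's `bool_or()` does this in one step (a sketch, assuming an `orders` table with a boolean column):

    
    
      SELECT
          category,
          bool_or(express_delivered) AS ever_been_express_delivered
      FROM orders
      GROUP BY category;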

~~~
aetherson
That's fine for an "or" aggregation, but it's harder for an "and" aggregation,
in which case I guess you'd have to say:

select category, case when sum(express_delivered) = count(express_delivered)
then 1 else 0 end as always_been_express_delivered from etc.
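
In Postgres that "and" aggregation is exactly what `bool_and()` is for (a sketch, assuming a boolean `express_delivered` column):

    
    
      SELECT
          category,
          bool_and(express_delivered) AS always_been_express_delivered
      FROM orders
      GROUP BY category;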

~~~
MSM
I think instead of trying to shoehorn the "AND" that he discusses in the
article into a case statement, you'd just check each category to see if
there has been a non-express-delivered package.

    
    
      SELECT 
          category, 
          ISNULL(NonExpress.NonExpress, 0) AS ever_been_non_express_delivered
      FROM @orders O
      OUTER APPLY (
         SELECT TOP 1 1 as [NonExpress]
         FROM @orders O2
         WHERE O2.category = O.category
            AND express_delivered = 0
      ) NonExpress
      GROUP BY category
    

Edit: Changed the logic up, the original code's field was contradicting my
logic.

------
netcraft
Very cool. It looks like you can also use `every()` instead of `bool_and()` -
and also note that you can provide any boolean value to these, so you can do
something like `every(column = 'value')...`
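
For instance (a sketch; `status` here is a hypothetical column):

    
    
      SELECT
          category,
          every(status = 'shipped') AS all_shipped
      FROM orders
      GROUP BY category;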

------
rhinoceraptor
I love `json_build_object()`.

With `json_build_object()`, you can select fields as a JSON object and not
have to do that JSON building in your app.

Here's a snippet demonstrating building, and then querying a JSON object:

    
    
      with location as (select json_build_object(
      	'street1', street1,
      	'street2', street2,
      	'city', city,
      	'state', state,
      	'zip', zip
      ) as loc from my_table)
      
      select
      	loc ->> 'street1' as street1,
      	loc ->> 'street2' as street2,
      	loc ->> 'city' as city,
      	loc ->> 'state' as state,
      	loc ->> 'zip' as zip
      from location;

~~~
kbenson
Honest question, why do you consider this better? I know there are development
strategies that prefer to put as much model control as possible in the DB, but
there are others that prefer to keep the DB more clearly as data, and not the
end representation of that data (it's a trade-off; I don't see one as clearly
better than the other: one allows for tighter control of data, the other
allows more scalability and the use of more tooling).

I guess my question is: to you, what's the main benefit of building JSON in
the DB instead of your app?

~~~
matthewmacleod
There are some quite impressive performance gains that can be exploited -
especially beneficial if you're using a framework with higher overheads. Using
Rails, for example, it's possible to generate JSON output directly from
Postgres and stream it to a client, instead of instantiating zillions of
ActiveRecord objects just to query flat fields.

There is an obvious trade off of flexibility, but this technique is beneficial
in that it allows rich object graphs through the ORM where required, while
performing well for simple operations.

~~~
kbenson
I see that more as working around the limitations of the ORM, not really as an
inherent strength of doing it in the DB. For example, in Perl, using
DBIx::Class, I would get around that problem by applying another ResultClass
to the ResultSet (the HashRefInflater ResultClass), so instead of inflating
results into DBIx::Class objects, it returns a hash directly. With some of the
helper classes, this is as simple as changing $result_set->search( \%criteria
)->all; to $result_set->search( \%criteria )->hri->all;. This allows you to
move the serialization task to the controller, and in what should be a fairly
efficient way. I can't imagine ActiveRecord doesn't have something equivalent.

Now, where I could see returning JSON being _really_ useful is if you can
build complex structures out of it, in a way that follows constraints. E.g.

    
    
        {
          "title" : "Cool Hand Luke",
          "released" : "1967-11-01",
          "cast" : [
            { "name": "Paul Newman" },
            { "name": "George Kennedy" },
            ...
          ],
        }
    

If that can be efficiently generated in the DB, it could help immensely with
the current state of prefetching relationships, which to my knowledge
currently requires either redundant data through joins, which is increasingly
inefficient the more items you relate (such as with DBIx::Class's prefetch),
or multiple queries (ActiveRecord's :include), which involve either processing
between queries or duplicating portions of results in subsequent queries to
get the correct subset of relations.
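
Postgres can build that shape in a single query by nesting `json_agg()` inside
`json_build_object()` (a sketch against hypothetical `movies`, `actors`, and
`roles` tables):

    
    
      SELECT json_build_object(
          'title', m.title,
          'released', m.released,
          'cast', (
              SELECT json_agg(json_build_object('name', a.name))
              FROM roles r
              JOIN actors a ON a.id = r.actor_id
              WHERE r.movie_id = m.id
          )
      ) AS movie
      FROM movies m;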

------
xxbondsxx
Doesn't this become less valuable as the data grows? You'll essentially
always have at least one "true" value, and at that point you're basically
doing the query of:

SELECT category FROM orders WHERE express_delivered = TRUE GROUP BY category

Also, are these aggregate functions as efficient?

~~~
mjevans
It still has to get the data from every row, unless you happened to make an
index that correlates the two columns in just the right order.

This is where asking a more carefully framed question would be more useful.

A question like 'in the last quarter, how many sales did we have for each
category and shipping type?' You can then take the results and calculate more
useful values like the percentage of express shipments, etc.
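
Something along these lines (a sketch; `created_at` is a hypothetical
timestamp column):

    
    
      SELECT category, express_delivered, count(*) AS orders
      FROM orders
      WHERE created_at >= now() - interval '3 months'
      GROUP BY category, express_delivered;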

------
cjauvin
It's actually the first time I'm seeing the `group by 1, 2` syntax.. is that
Postgres-specific?

~~~
coredog64
"group by 1,2" is for lazy people who never have to support an application in
production. It tells the database to group by the columns at those ordinal
positions in the select list, and is supported by most engines.
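
That is, these two queries are equivalent (sketch):

    
    
      SELECT category, express_delivered, count(*)
      FROM orders
      GROUP BY 1, 2;
      
      -- same grouping, spelled out
      SELECT category, express_delivered, count(*)
      FROM orders
      GROUP BY category, express_delivered;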

~~~
ak4g
Much like SELECT * or JOIN foo USING(bar), it's the kind of thing that greatly
speeds up interactive queries, but shouldn't ever end up in source control.

~~~
JoeCamel
Why shouldn't "JOIN foo USING (bar)" be used?

~~~
ak4g
Because with USING, you're precluded from using the full table.column
identifier in your JOIN condition. Without that, a future ALTER TABLE that
adds another "bar" column to an existing table in the FROM-list will cause a
previously-working query to break.

[https://gist.github.com/AdamG/1128b86fdafca1e53a4bd5253a6882...](https://gist.github.com/AdamG/1128b86fdafca1e53a4bd5253a6882a2)
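
A minimal sketch of the two styles (hypothetical tables):

    
    
      -- fragile: depends on dept_id being the only shared column name
      SELECT e.name
      FROM employees e
      JOIN departments d USING (dept_id);
      
      -- robust: every column in the join condition stays fully qualified
      SELECT e.name
      FROM employees e
      JOIN departments d ON d.dept_id = e.dept_id;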

------
jnpatel
Neat! I use the Python equivalents, `any()` and `all()` quite frequently.
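
The parallel is direct: `any()` behaves like `bool_or()` and `all()` like
`bool_and()` when you group by hand (a sketch using the thread's example data):

```python
from collections import defaultdict

# (category, express_delivered) pairs from the example upthread
orders = [
    ("food", True), ("food", False),
    ("shoes", False), ("shoes", False),
    ("auto", True), ("auto", True),
    ("book", True), ("book", False),
]

# group the boolean flags by category
by_category = defaultdict(list)
for category, express in orders:
    by_category[category].append(express)

# any() ~ bool_or(), all() ~ bool_and()
ever_express = {c: any(flags) for c, flags in by_category.items()}
always_express = {c: all(flags) for c, flags in by_category.items()}
```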

------
kristianp
The title should be "seldomly used Postgres-specific SQL".

