Why not directly allow each column to be specified as “NULLS DISTINCT”? unique (...

ww520 · on April 14, 2023

For the ORDER BY in grouped table (F868), it's kind of weird. I guess it's a special case with narrow usage.

In the example, product_id is unique (as PK) along with product_code being unique. There's one product_id for each product_code. A Group By on product_id will produce a list of groups where each group has one product_id and one product_code. A subsequent Order By on the groups by the product_code would work. I assume if there're multiple product_code's per product_id, the query won't work.

    CREATE TABLE product (
        product_id int PRIMARY KEY,
        product_name varchar,
        product_code varchar UNIQUE
    );

    SELECT product.product_id, sum(product_part.num) ...
    GROUP BY product.product_id
    ORDER BY product.product_code;

amluto · on April 14, 2023

Agreed. I read:

> it turned out that this was technically not allowed.

And thought “no, it turned out this query made no sense and had ill-defined results”.

But the rest of the description is odd:

> In particular, it was not allowed to have a grouped table ordered by a column that is not exposed by the SELECT list of the grouped table.

And I’m wondering if the example query was wrong and they meant:

    SELECT product.product_id, sum(product_part.num)
    FROM product JOIN product_part ON product.product_id = product_part.product_id
    GROUP BY product.product_id, product.product_code
    ORDER BY product.product_code;

In which case the change makes sense and is inoffensive.

Dylan16807 · on April 14, 2023

It's already grouping by the primary key of `product`. Why would you need to add another column in `product` to the group by?

jbverschoor · on April 14, 2023

If that’s the very specific case, then they should be clear about it. But my interpretation was that our order by ungrouped column

If it’s about preferring by non selected columns, that’d be fine.

Something is wrong somewhere imo

amluto · on April 14, 2023

I assume that query optimizers are able to detect a GROUP BY of a primary key and other columns in the same table and optimize accordingly. But using this trick to change the set of valid queries seems odd. I think I'd rather see special syntax for it:

    GROUP BY table.*

evanelias · on April 14, 2023

> you’re allowed to use values you haven’t grouped by and haven’t aggregated? Sounds like MySQL to me

The default settings of MySQL have forbidden this since MySQL 5.7, released in October of 2015: https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.ht...

brabel · on April 14, 2023

Who comes up with an argument for considering nulls distinct??

Imagine a programming language where `if (variable == null)` always returned `false` because nulls were considered distinct. What kind of reasoning goes into thinking that's a great idea?

> If nulls are considered distinct, then having more than one of them won’t cause a unique constraint violation.

Gosh, I would've thought the spec was fine as it was, having to spell out that the DB should consider two null values equivalent for UNIQUENES constraints is like spelling out that "two integers that have the same mathematical value should be considered equivalent". I guess they need to add that in the spec if it's missing as well.

andrius4669 · on April 14, 2023

NULL is more of a declaration that 'there is no value', than mathematical value on its own. if you compare NULL with something you don't get false, you get NULL (which is often considered same as false).

some implementations carry the same "uncomparable" logic when checking for unique constraints, while others drop the ball here.

programming language wise, it's kinda like null pointer or float NaN, both of which are also occasionally complained about, eg. indexing maps by float values can do bogus things in case of NaNs.

edit: clarified some things

SoftTalker · on April 14, 2023

Imparting any meaning at all to NULL including non uniqueness or distinction seems fraught with problems. NULL is NULL that’s it. It means nothing else.

Dylan16807 · on April 14, 2023

If it can't be either behavior, then should nullable columns not be allowed in unique indexes at all?

SoftTalker · on April 14, 2023

No ideally not. If they are they must be ignored.

If other columns in the row index are unique regardless of the value of the nullable column, then the row is logically unique. So if in that case the NULL value is ignored, then OK. But two NULLs neither match each other nor are they distinct from each other. You just cannot say either way.

masklinn · on April 14, 2023

A few DBs actually do that.

It can be rather limiting, though also less surprising (at runtime) than either other behaviour.

DangitBobby · on April 14, 2023

The index only should apply over non-null values. I'm not sure how anything else makes any practical sense.

masklinn · on April 14, 2023

That is equivalent to the DISTINCT semantics.

First, it should be noted that the reason this toggle was added is because the spec was ambiguous and different databases have different defaults, specifically:

- BigQuery and Db2 forbid nullable columns in indexes, solving the issue

- most databases default to NULLS DISTINCT

- SQL Server defaults to NULLS NOT DISTINCT

- as usual Oracle huffes glue in the corner (it generally uses NULLS NOT DISTINCT, except in the case where every column in the constraint is null in which cases it uses NULLS DISTINCT —- I assume because it just doesn’t apply the constraint at all)

And of course something else makes sense, because SQL’s NULL covers multiple use cases (UNKNOWN, missing, intentionally left out) they leave a lot to the details of the modelling.

For instance let’s say you’re modelling an FS-like structure, you have records which have a parent and a name, and the name needs to be unique within a parent.

That constraint needs to apply the the top-level items still, and while you could model top-level-ness through a special-case hard-coded record, you could just as well model it by leaving the parent field NULL, in which case you’d need `UNIQUE (parent, name)` where “parent” is NULLS NOT DISTINCT. This is especially common under under organic growth.

Adding a separate boolean flag is redundant and distinctly ugly, I’m not sure expressions are supported by every db (and they’re not free) and while sometimes a sentinel value can be used, that’s not the case in general.

paulddraper · on April 14, 2023

2024 :)