The ORDER BY in a grouped table (F868) is kind of weird. I guess it's a special case with narrow usage.
In the example, product_id is unique (as the PK) and product_code is unique too, so there is exactly one product_code for each product_id. A GROUP BY on product_id produces a list of groups where each group has one product_id and one product_code, so a subsequent ORDER BY on product_code is well defined. I assume that if there were multiple product_codes per product_id, the query wouldn't work.
CREATE TABLE product (
    product_id int PRIMARY KEY,
    product_name varchar,
    product_code varchar UNIQUE
);

SELECT product.product_id, sum(product_part.num) ...
GROUP BY product.product_id
ORDER BY product.product_code;
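As a sketch of the functional-dependency argument, here's the thread's schema run through SQLite (which happens to accept an ORDER BY column that isn't in the SELECT list of a grouped query; stricter engines may reject it). The `product_part` table and its sample rows are my invention, just to make the query runnable:

```python
# Demo: GROUP BY a key column, ORDER BY a functionally dependent column.
# SQLite (stdlib sqlite3) is permissive here; this is a sketch, not a
# claim about what the SQL standard requires.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT,
    product_code TEXT UNIQUE
);
CREATE TABLE product_part (         -- hypothetical detail table
    product_id INTEGER REFERENCES product(product_id),
    num        INTEGER
);
INSERT INTO product VALUES (1, 'widget', 'B'), (2, 'gadget', 'A');
INSERT INTO product_part VALUES (1, 10), (1, 5), (2, 7);
""")

rows = con.execute("""
    SELECT product.product_id, sum(product_part.num)
    FROM product JOIN product_part USING (product_id)
    GROUP BY product.product_id
    ORDER BY product.product_code
""").fetchall()
# Each group holds exactly one product_code (product_id determines it),
# so the ordering is well defined: code 'A' (id 2) before 'B' (id 1).
```

Because product_id is the primary key, every group contains exactly one product_code value, which is what makes the ORDER BY unambiguous even though product_code is neither grouped nor aggregated.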
> it turned out that this was technically not allowed.
And I thought “no, it turned out this query made no sense and had ill-defined results”.
But the rest of the description is odd:
> In particular, it was not allowed to have a grouped table ordered by a column that is not exposed by the SELECT list of the grouped table.
And I’m wondering if the example query was wrong and they meant:
SELECT product.product_id, sum(product_part.num)
FROM product JOIN product_part ON product.product_id = product_part.product_id
GROUP BY product.product_id, product.product_code
ORDER BY product.product_code;
In which case the change makes sense and is inoffensive.
I assume that query optimizers are able to detect a GROUP BY over a primary key plus other columns of the same table and optimize accordingly. But using this trick to change the set of valid queries seems odd. I think I'd rather see special syntax for it.
Who comes up with an argument for considering nulls distinct??
Imagine a programming language where `if (variable == null)` always returned `false` because nulls were considered distinct. What kind of reasoning goes into thinking that's a great idea?
> If nulls are considered distinct, then having more than one of them won’t cause a unique constraint violation.
Gosh, I would've thought the spec was fine as it was. Having to spell out that the DB should consider two NULL values equivalent for UNIQUENESS constraints is like spelling out that "two integers with the same mathematical value should be considered equivalent". I guess they need to add that to the spec as well, if it's missing.
NULL is more a declaration that 'there is no value' than a mathematical value of its own. If you compare NULL with something you don't get false, you get NULL (which is often treated the same as false).
Some implementations carry the same "incomparable" logic over when checking unique constraints, while others drop the ball here.
Programming-language-wise, it's a bit like a null pointer or float NaN, both of which are also occasionally complained about; e.g. indexing maps by float values can do bogus things when NaNs are involved.
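To make the "uncomparable" behavior concrete, here's SQLite (whose default matches NULLS DISTINCT): two NULLs slide past a UNIQUE column, while two equal non-NULL values collide. A small sketch, not a statement about every engine:

```python
# NULLS DISTINCT in practice: SQLite lets multiple NULLs into a
# UNIQUE column, because NULL = NULL is not true.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (code TEXT UNIQUE)")
con.execute("INSERT INTO t VALUES (NULL)")
con.execute("INSERT INTO t VALUES (NULL)")        # accepted: NULLs are distinct

duplicate_rejected = False
try:
    con.execute("INSERT INTO t VALUES ('x')")
    con.execute("INSERT INTO t VALUES ('x')")     # rejected: real duplicate
except sqlite3.IntegrityError:
    duplicate_rejected = True

null_count = con.execute(
    "SELECT count(*) FROM t WHERE code IS NULL").fetchone()[0]
# null_count is 2: both NULL rows went in without violating UNIQUE
```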
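The NaN analogy can be shown directly in Python: since NaN is not equal to anything, including itself, two NaN keys end up as separate map entries (each `float("nan")` is a distinct object):

```python
# NaN map keys behave much like "distinct" NULLs in a unique index:
# equality never holds, so "duplicates" coexist.
nan_a = float("nan")
nan_b = float("nan")

d = {}
d[nan_a] = "first"
d[nan_b] = "second"

assert nan_a != nan_a      # NaN is not even equal to itself
assert len(d) == 2         # two identical-looking keys coexist
assert d[nan_a] == "first" # lookup still works, via object identity
```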
Imparting any meaning at all to NULL, including non-uniqueness or distinction, seems fraught with problems. NULL is NULL, that's it. It means nothing else.
If the other columns in the index are unique regardless of the value of the nullable column, then the row is logically unique; so if the NULL value is ignored in that case, OK. But two NULLs neither match each other nor are distinct from each other. You just cannot say either way.
First, it should be noted that the reason this toggle was added is because the spec was ambiguous and different databases have different defaults, specifically:
- BigQuery and Db2 forbid nullable columns in indexes, solving the issue
- most databases default to NULLS DISTINCT
- SQL Server defaults to NULLS NOT DISTINCT
- as usual, Oracle huffs glue in the corner (it generally uses NULLS NOT DISTINCT, except when every column in the constraint is null, in which case it uses NULLS DISTINCT; I assume because it just doesn't apply the constraint at all)
And of course either behaviour can make sense: because SQL's NULL covers multiple use cases (unknown, missing, intentionally left out), a lot is left to the details of the modelling.
For instance let’s say you’re modelling an FS-like structure, you have records which have a parent and a name, and the name needs to be unique within a parent.
That constraint still needs to apply to the top-level items, and while you could model top-level-ness through a special-case hard-coded record, you could just as well model it by leaving the parent field NULL, in which case you'd need `UNIQUE (parent, name)` where "parent" is NULLS NOT DISTINCT. This is especially common under organic growth.
Adding a separate boolean flag is redundant and distinctly ugly; I'm not sure expression indexes are supported by every DB (and they're not free), and while a sentinel value can sometimes be used, that's not the case in general.
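For what it's worth, here's what the expression-index workaround mentioned above looks like in SQLite, which defaults to NULLS DISTINCT and (as far as I know) has no `NULLS NOT DISTINCT` clause. The `-1` sentinel is an assumption: it only works because real parent ids are never negative:

```python
# Emulating UNIQUE (parent, name) with NULLS NOT DISTINCT semantics
# via a unique index over a COALESCE expression (SQLite sketch).
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE node (
    id     INTEGER PRIMARY KEY,
    parent INTEGER REFERENCES node(id),   -- NULL means top level
    name   TEXT NOT NULL
);
CREATE UNIQUE INDEX node_parent_name
    ON node (coalesce(parent, -1), name); -- -1: sentinel for "no parent"
""")

con.execute("INSERT INTO node (parent, name) VALUES (NULL, 'etc')")
try:
    con.execute("INSERT INTO node (parent, name) VALUES (NULL, 'etc')")
    clash = False
except sqlite3.IntegrityError:
    clash = True   # second top-level 'etc' is rejected, as intended
```

With a plain `UNIQUE (parent, name)` constraint in SQLite, both top-level 'etc' rows would have been accepted, which is exactly the failure mode the NULLS NOT DISTINCT option fixes.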
So that means you're allowed to use values you haven't grouped by and haven't aggregated? Sounds like MySQL to me.