
Handling Null Values in Postgres - davidquilty
https://www.percona.com/blog/2020/03/05/handling-null-values-in-postgresql/
======
ainar-g
They forgot "IS NOT DISTINCT FROM", which is, IMHO, the best god damn operator
ever invented. It works like "=", but adequately responds to NULL:

    
    
      postgres=# SELECT 1 IS NOT DISTINCT FROM 1;
       ?column? 
      ----------
       t
      (1 row)
      
      postgres=# SELECT NULL IS NOT DISTINCT FROM NULL;
       ?column? 
      ----------
       t
      (1 row)

~~~
carterehsmith
"SELECT NULL IS NOT DISTINCT FROM NULL" ?column? \---------- t

That bothers me. It is like saying that "infinity" (+inf) equals other
infinity. Which is just wrong.

Some other SQL dialects will return "UNKNOWN" rather than "t" (true), which
imho makes way more sense.

~~~
Tostino
...I think you are misunderstanding the whole reason for the IS NOT DISTINCT
FROM operator, which is exactly that.

"SELECT NULL = NULL" will return NULL, use that if you want the standard way.
There are many cases where you want it to return true though, and the
boilerplate for that sucks, so IS NOT DISTINCT FROM has a very useful place in
the dialect.

------
jeffdavis
As I wrote before:

[http://thoughts.davisjeff.com/2009/08/02/what-is-the-deal-
wi...](http://thoughts.davisjeff.com/2009/08/02/what-is-the-deal-with-nulls/)

"NULL semantics are designed to make you think you understand them, and think
that the semantics are intuitive, and think that it’s part of some ingenious
consistent system for managing missing information. But none of those things
are true."

Any explanation you come up with does not generalize to other cases. NULL is
just a bunch of special cases piled on top of each other, and really only kind
of works because SQL is not designed to be very extensible (and to the extent
it is, the standard doesn't give much guidance on how NULL should behave in
these extensions).

~~~
iaabtpbtpnn
SQL needs to have NULL, because otherwise how does an outer join work? And
once NULL is there, you need rules for how it interacts with other values.
What is the alternative?

~~~
patrec
Why would you need outer joins any more than nulls?

~~~
pavel_lishin
Show me all schools in the system, and their associated teachers, including
schools which have no teachers.

~~~
patrec
Easy:

    
    
       select school, (select array_agg(teacher) from teachers where teachers.school = schools.school) from schools;
    

If you don't like nulls any more than outer joins, you'd probably want:

    
    
       select school, (select coalesce(array_agg(teacher), '{}'::text[]) from teachers where teachers.school = schools.school) from schools;

~~~
derefr
You realize that joins are a part of the formal definition of relational
algebra, while subqueries aren't, right? SQL has had outer joins since the
beginning, but subqueries came only gradually, and more limited SQL engines
tend still to not support them.

~~~
patrec
Why would SQL (fundementaly) need outer joins? "Because it has them", is not
really a compelling answer in this context.

A "pure" relational model (like the one C.J. Date has been going on about for
decades) has no outer join.

------
adamzochowski
An easy way to think about database nulls is how you would answer a question
in real life about unknowns.

I have a friend James, and another friend Jane. I don't know their ages. Are
they of the same age?

Or Jane has a car. James has a blue car. Do car colours differ?

SQL provides human answers: explicitly answers "I don't know".

Other languages treat null as if it was just another value, providing very
confusing answers. I wish all programming languages would use SQL style of
null.

~~~
kingdomcome50
Eh... it's exceedingly common to want to determine whether or not two
expressions both evaluate to "unknown" (e.g. null = null). I literally wrote a
query that needed to respect null equality yesterday. Extending your examples
it would be the equivalent of asking:

"Is the answer to 'Are they of the same age?' the same as the answer to 'Do
car colors differ?'"

Both answers are "I don't know". So... are they the same answer? The human in
me says "Yes".

Of course you will recognize the similarity between the above and your first
example. The interesting part is in the subtle difference between:

"Are they of the same age?" \--> "I don't know"

and

2\. "Is the answer to 'What is James' age?' the same as the answer to 'What is
Jane's age?'" \--> "Yes"

The answer to number 2 relies on the fact that I _know_ I don't know _either_
of their ages. This extra piece of knowledge allows me to "lift" null into a
value rather than the absence of. It's the difference between:

`Maybe<T>.value == Maybe<T>.value`

and

`Maybe<T> == Maybe<T>`

~~~
irishsultan
The answer to "Are they the same age?" is not really "I don't know". Try
answering "I don't know" to a question during a quiz or an exam, it's very
unlikely that your answer will be considered correct (even if it's actually
true that you don't know it).

You don't know the answer to the question "Are they the same age" (but it's
either true of false), and you don't know the answer to the question "Do car
colors differ" (it's either true or false). If you don't know the answer to
either question then you don't know whether the answers differ either.

------
jeffdavis
Quiz:

    
    
        -- Find all customers with total orders
        -- less than $100 in 2019
        SELECT c.name
          FROM customers c LEFT JOIN orders o
          ON (c.id = o.customer_id)
          WHERE DATE_TRUNC('year', o.ts) = '2019-01-01'::timestamptz
          GROUP BY c.name
          HAVING SUM(o.price) < 100;
    

Can you find the problem?

[ SPOILER ]

The problem is if there is a customer with no orders. The left join will
produce NULLs for the attributes of "orders", and SUM() will sum them up to
NULL because it thinks NULL means "nothing". The predicate "NULL < 100" will
then think the NULL means "unknown" and produce the result "unknown". Then,
the HAVING clause treats NULL like FALSE, and does not produce the row.

~~~
skrebbel
This is not really a problem with NULLs though, but primarily with how SQL
forces you to put any kind of result in a rectangular table-shaped resultset.
Because of this, a LEFT JOIN returns a single "order" row both if there is
exactly 1 order for that customer, but also when there are no orders. That's
just nuts.

If a JOIN could somehow return hierarchically structured data then there
would've been no gotchas here.

~~~
jeffdavis
Well, it's many problems, that's the fun of the example.

But yes, I think left join is fundamentally about grouping, and grouping would
be better solved with set-valued attributes or relation-valued attributes.

I wouldn't call that "hierarchically-structured data" because that has the
connotation that the data is actually stored in a hierarchy, which I think is
generally a bad idea (though not always).

------
earthboundkid
Null dates and null numbers are sometimes useful. Null strings? No. It
basically never makes sense to distinguish between text/varchar '' and NULL.
If you really need to distinguish between "no middle name" and "middle name
not provided" (why would you need that?), just add a boolean
middle_name_known. NULL creeps into data in all sorts of ways, so you can't
trust that the people doing your entry are actually using the semantics you
want for null text. Add a boolean field and make them explicitly commit to the
semantics.

~~~
cavanasm
Came here for this. I've had so many struggles with big enterprise data where
engineers who design the schema think empty strings and nulls should have
meaningful distinctions, but the humans involved in creating that data (who
are not engineers) somehow break those assumptions, so when the data migrated
elsewhere, inevitably it gets collapsed back down to null and empty string
being identical.

------
teddyh
> _it is quite evident that NULL is defined as “0” in C language._

It is actually the other way around. The bare numeral 0, in C, is defined to
be the null pointer when the numeral 0 is encountered in a pointer context.
This does not mean that the null pointer actually has anything to do with the
value 0. The succeeding if clause in the C program in the article, the one
which prints “NULL is 0”, is not even using any pointers! Since the C
preprocessor is text-based, the token sequence “if(0 == NULL)” is replaced by
the preprocessor with “if(0 == 0)”, which the compiler will interpret as an
_integer_ comparison, not a pointer comparison.

For further reading, see [http://c-faq.com/null](http://c-faq.com/null)

------
tabtab
They way null strings are dealt with under default functions and operators has
been a huge industrial mistake. It creates code bloat and mistakes that
otherwise would be rare. Oracle mostly got it right.

And in general, "x <> 5" should be "true" if x is null, and not return null.

The justifications are mostly theoretical, not practical. I'd rather have 10
bugs from misuse of theory than 100 from real world. Theorists seem to magnify
the "sin" value of theory-related bugs compared to actual bugs. I count them
the SAME, and under that scoring formula, we are doing it wrong.

------
crimsonalucard
Null may have technical meanings in languages like C++ (0) but essentially the
conceptual meaning of a null is "No Value" in every language.

It's a huge mistake in every language including SQL.

If you have an entity of a type. Say Int. What does NULL mean in context of
that type?

    
    
      X = NULL
      Y = NULL + 1
    

It doesn't matter what language you operate in there is no NULL + 1, because
NULL is not an INT. It just doesn't exist. So your programming language, if
it's good, it should throw an error. If it's a shit it will continue to
propogate the y deep into your program and you won't even know X was a null.

If you think a default to zero works imagine this:

    
    
      X = IntFunctionThatErrorsOutAndReturnsUnknownValueAkaNull()
      Y = X + 3
    

Does it make any sense to you that Y will be 3? X was unknown! You don't even
know that the function errored out! Either
IntFunctionThatErrorsOutAndReturnsUnknownValueAkaNull should have thrown an
error or it should have returned a special type called an Optional Type.
Throwing an error isn't ideal because you won't know something is wrong until
runtime so the Optional Type is the better deal here. Either way neither
approach is taken and instead you have a null and Y propagates through your
program as if NOTHING happened.

Unfortunately SQL doesn't offer The Optional type or sum types therefore we
are stuck with Null.

So if you do this in a language that defaults nulls to zeros:

    
    
       x = getFirstSqlColumnAsList("SELECT * FROM integerTableWithOnlyNullValues;")
       y = x[0] + 1
    

boom you get y a garbage value propagating through your entire program thanks
to SQL nulls. SQL isn't innocent here.

Now imagine Optional types with exhaustive pattern matching:

    
    
      x: Maybe(int) = getFirstSqlColumnAsList("SELECT * FROM integerTableWithOnlyNullValues;")
      y = x match {
             Just x: Just(x[0] + 1),
             Nothing: Nothing
           }
    

If you can't understand why the above code is so much better than the previous
code than you need to look up the Maybe Monad with algebraic data types.
Basically the code above makes sure that there is Zero chance that an error is
unhandled.

~~~
ibrar74
Yes, that's the problem this blog try to explain. NULL means "no value", but
this is not true for all the languages. So you better know what is the value
of NULL in the language you are trying to use.

------
say_it_as_it_is
Unless you're using an ORM that is replacing "= NULL" to "IS NULL" for you,
you should know that there's a configuration within Postgres to change to do
so instead:
[https://postgresqlco.nf/en/doc/param/transform_null_equals/](https://postgresqlco.nf/en/doc/param/transform_null_equals/)

~~~
jeffdavis
I don't recommend that setting, personally. If you do use it, I recommend it
only for migration purposes with the idea that you go back to the default
setting soon.

Configuration settings that change SQL semantics are just too dangerous.

From the official docs:

"Note that this option only affects the exact form = NULL, not other
comparison operators or other expressions that are computationally equivalent
to some expression involving the equals operator (such as IN). Thus, this
option is not a general fix for bad programming."

[https://www.postgresql.org/docs/12/runtime-config-
compatible...](https://www.postgresql.org/docs/12/runtime-config-
compatible.html#GUC-TRANSFORM-NULL-EQUALS)

I know it seems tempting to use this setting as a convenience, but I strongly
recommend against it.

~~~
say_it_as_it_is
Thanks. "= NULL" is not benign, either. It causes its own set of issues.

------
eMSF
>The output of the above program will be “NULL is 0”, so it is quite evident
that NULL is defined as “0” in C language.

To be pedantic, all that's evident from the observed behaviour is that in C,
NULL compares equal to the integer literal 0. The expressions may be, and
usually are not mutually interchangeable both ways, as they (may) have
different types.

------
mcdoker18
Can someone explain the meaning to use 'x == NULL' expression? Why does not
PostgreSQL prohibit usage of this expression?

~~~
ibrar74
It is valid syntax, you are comparing an "x" with "no value" and result will
be "no value"

~~~
mcdoker18
I know that it's a valid expression, but are there any use cases for it?

------
mfrye0
One thing that threw me off is upserts with NULL in Postgres. NULL != NULL so
it will continually create new rows.

------
asah
30 yrs in the business and I never run into NULL semantics issues... because I
virtually never use NULL (!)

my experience: it's far better to explicitly reserve a value for NULL. In
fact, I go the other way and ask when NULL is worth adding?

example: in your datatype NULL-values exist but there's no easy way to
represent them otherwise, i.e. you can't reserve a value because they're truly
all used by real data.

example: you have a slew of BOOLEAN columns and NULL is a legitimate third
value in the dataset. The alternative (char(1)) can be a lot of overhead vs
the bitmask optimizations that most SQL engines offer (but test this!!!)

hope this helps!

p.s. having studied the history and known some of the people responsible... if
we gave them a time machine, they'd first kill Hitler... and then undo this
decision.

------
jeffdavis
I wish SQL had used a Maybe/Option type, or other algebraic data types,
instead of NULL. So much safer and cleaner.

~~~
earthboundkid
What would the behavior of the option type be? I don't see how it would behave
differently than what SQL does today.

~~~
jeffdavis
Option types can require exhaustive matching. So where there can be a NULL,
you would have to address it. The syntax can make it pretty easy to handle
typical cases.

------
exabrial
Related: Did you know that MySQL has a null safe operator? Absolutely life
changing:
[https://www.knowledgewalls.com/johnpeter/books/mysql/nullsaf...](https://www.knowledgewalls.com/johnpeter/books/mysql/nullsafe-
equal-and-not-equal-operators-in-mysql)

