
Null Values in SQL Queries - maynman
https://mitchum.blog/null-values-in-sql-queries/
======
juped
>For example, Oracle database won’t allow you to have an empty string. Anytime
Oracle database sees an empty string, it automatically converts the empty
string into a NULL value.

Damn. This is how you do enterprise.

I might be the only person who likes SQL nulls. If you learn how they work up
front, they're useful and not really that confusing. But if I ran into weird
behaviors like this, I might hate them too.

~~~
SigmundA
No I am tired of this WHERE (a = b) or (a IS NULL AND b IS NULL)

Null should equal null like in every other programming language even SQL group
by do null equals null which is even more inconsistent.

~~~
WorldMaker
Something to petition ANSI about.

(Microsoft's SQL Server still defaults to the non-ANSI NULL behavior where a =
b when both are NULL, and that's something that still pings on checklists of
SQL Server if it follows ANSI standards. SQL Server is kind enough to let you
enable/disable the behavior, and likely that would persist even after the
default switches to meet the standard as the docs assure will happen "in some
future version".)

~~~
zvrba
Actually I find this a very practical behavior. It allows me to have NULLs in
columns participating in UNIQUE indexes. At least with indexes, it's very
intuitive and useful.

------
wefarrell
Null values and inequality are extremely counterintuitive (in postgres at
least). If you run the query:

    
    
      SELECT * FROM my_table WHERE my_column != 5
    

You would expect it to return rows that have a null value for my_column, since
null is not 5. However that is not the case.

~~~
magicalhippo
I think the issue here is that SQL should have more "NULL variants" to express
why there is no concrete value.

A NULL value technically means it's unknown. An unknown value might be 5,
hence why it's not in the result set. Some abuse NULL to mean "value doesn't
exist". But a value that doesn't exist can't be 3, or 42, or any other value
that's different from 5, so in that regard shouldn't be part of the result set
either.

Others again abuse NULL to mean "doesn't apply". And in that case I think it
makes sense to include the row in the result set. For example, if I write a
query to get all people who's middle name is not "William", I'd most likely
want people without middle names included.

Maybe we should have introduced NEX (non-existing) and NAP (non-applicable) as
possible values in addition to NULL?

~~~
tabtab
Re: _I think the issue here is that SQL should have more "NULL variants" to
express why there is no concrete value._

No, that would muddy things in my opinion, like it did to JavaScript. Instead,
have more operations/functions for dealing with them in a more "normal" way,
so that we can say "WHERE x <> 5" and get results one expects. I'm not sure
the syntax, and my drafts would take a lot of time to explain. To give a
taste, maybe have something like "WHERE ~x <> 5" in which the tilde converts
x's value to the type's default, such as a blank in the case of strings.

If the different reasons for "emptiness" matter, then usually it suggests the
need for a "status" column of some kind so that queries can be done on the
reasons. I'd need to study domain specifics to recommend something specific.

~~~
magicalhippo
But that would mean you need to be aware that the column can have this
property, no?

Continuing with my middle name example. Say I and everyone I knew had middle
names, so I write a database including a required middle name column. Later I
discover not everyone has middle names, and so I need to relax the
restriction.

In your case, I would change the column to accept NULLs, and I'd have to
remember to go over my query to add the ~ operator.

In my case, I'd change the column to accept NAPs (or whatever) and since a NAP
value would behave differently to a NULL for <> (and other operators), I
wouldn't need to change my query.

~~~
tabtab
Re: _I 'd have to remember to go over my query to add the ~ operator._

I'd almost always use it no matter what. In fact if starting SQL over via a
DeLorean, I'd reverse it to require "~" to make it null-sensitive.

It's analogous to case-sensitive comparing. The vast majority of the time you
don't want case-sensitive comparisons such that case-insensitive should be the
DEFAULT, and you only add extra specifiers/functions if and when you need to
do case-based comparing.

If not, then you either end up with SQL full of "toUpperCase(x)", or problems
keep popping up if somebody typed stuff wrong or the UI coders messed up.

Similarly, if your SQL is full of NVL() or DENULL() or whatnot, it means the
language was poorly factored for your domain, or even most domains. It flunked
D.R.Y. in design, such as comparisons having the wrong default behavior.

~~~
magicalhippo
Yeah that'd be interesting.

------
xivzgrev
Not sure what big deal is. You learn somewhere along the way that you check
for null values with “is” vs “=“. Done, write it on a sticky note if you need,
and move on.

“Why isn’t it consistent??” - well a lot of systems have a lot of bat shit
crazy inconsistencies, some times there for good reason. You learn to keep
them straight and get your shit done.

If you want to learn the “why” every time you encounter a system design quirk,
be my guest but you may be going down a time intensive rabbit hole with little
pay off for yourself.

~~~
Jaxan
Sometimes it’s easier to remember the why than the what. Then it does make
sense to learn about it!

------
mwexler
Null values are so important in representing data. But they cause so much
confusion in a) unexpected behaviors in queries and b) inconsistent handling
across various engines... I sometimes wish <whisper> that they hadn't been
included in the spec at all </whisper>. But then I come to my senses again,
and go fix yet another bugged query for an analyst who didn't account for
nulls in the data.

~~~
paulryanrogers
Does it make sense to coalesce them away in a view? I thought most analysts
are given star schemas implemented by views or ETL'd data anyway.

~~~
mwexler
Depends on the level of sophistication of the analyst, and if nulls have a
meaning or value to the result.

Also, at a certain point, knowing that nulls are present gives you yet another
measure of dq: not knowing if they are present and hidden vs. visible and
countable can be the difference between a wrong answer vs. just an uncertain
one.

------
tabtab
A pet peeve of mine is concatenating null strings. It's like a poison pill
that nulls the whole result. 99.99% of the time that's NOT what one wants
domain-wise. Maybe the standard should make another concatenation operator
that treats null strings as zero length strings. Sure, one can de-null each
string in the expression, but that's ugly anti-DRY code. Please fix it, I
haaate that.

~~~
paulryanrogers
Agreed that it makes the pipe concat operator a lot less useful. Now
PostgreSQL and MySQL both have CONCAT_WS which does replace NULLs with empty
strings. It's also nice when you do need a common separator between all
elements.

------
sashavingardt2
Now here's a blast from the past! 20 years ago this was common knowledge. Now
it's making headline news on HN. SQL is back with the vengeance!

------
irrational
We recently moved from Oracle to Postgres. We had thousands of queries written
based on the way Oracle handles NULLs and empty strings. It took us the better
part of a year to rewrite all of them to the Postgres way. I am so glad to be
off of Oracle.

------
altitudinous
I miss my past Oracle career, I've diagnosed this "= NULL" rather than "IS
NULL" in so many broken queries, slow queries because of the way Oracle
indexing handles NULL.

There is a lot of discussion in this thread about whether this implementation
of null checking in Oracle is appropriate, analysing it, but the current
implementation is just fine, it has been tested by time.

The internet does tend to rehash the same arguments over and over!!! The
internet forgets. I remember these arguments 20 years ago.

~~~
lisper
[Ignore this comment. It was posted by mistake. I'm only leaving it here for
the historical record.]

> the current implementation is just fine, it has been tested by time.

No, it isn't "just fine". It is broken. Just because something has been broken
for a very long time and has spawned an entire industry devoted to dealing
with the fact that it is broken does not change the fact that it is broken.

~~~
altitudinous
Do you have substantial experience with Oracle? or are you just blindly going
on what everyone else says?

There is no mention of outer joins in this thread, no mentions of the ability
to minus results of one query from another which are _basic_ constructs which
handle many of the issues that are discussed here. It says that the people
here are inexperienced with Oracle. Everyone here trying to resolve issues
using inner joins. Inexperience.

If people here had experience, not only would these topics have been
discussed, but the real issues with NULL would have been discussed, one of
which I mention in my previous post.

~~~
lisper
Sorry, I made a mistake: I thought I was responding to a different comment.
(I'm currently on a very slow internet connection.) The "broken" thing I meant
to refer to was Oracle's conflation of null and the empty string. But I think
I hit the wrong "reply" link. Sorry about that.

------
michannne
Another one is MIN and MAX ignore NULL values, which make for some interesting
rollback scenarios.

I also swear I have seen a gotcha involving UPDATE WHERE IN and not throwing
an error where it should have, which is why I always quadruple check my update
statements, but I wasn't able to reproduce it and couldn't find any
information online. I haven't seen the issue in so long I forgot what it was,
but it would update all rows in your table even if your WHERE clause was
proper.

~~~
robocat
Also OR/AND can return non-null results even if NULL is one side of the
operator:

    
    
        (NULL AND 0) gives 0
        (0 AND NULL) gives 0
        (NULL AND 1) gives NULL
        (1 AND NULL) gives NULL
        (NULL AND NULL) gives NULL
        (NULL OR 0) gives NULL
        (0 OR NULL) gives NULL
        (NULL OR 1) gives 1
        (1 OR NULL) gives 1
        (NULL OR NULL) gives NULL

------
pjdorrell
Theoretically NULL means "unknown" value. As it happens, most business
applications do not have any requirement to deal with "unknown" values. These
applications are only interested in acting on requests where all the required
data are provided by the person responsible for entering the data. For
example, when I transfer money from one bank account to another, the amount of
the transfer can't be "unknown", the sending account can't be "unknown", and
the receiving account can't be "unknown".

These same applications do have requirements to deal with empty values.
Sometimes an empty value means "I haven't yet entered this value in the to the
UI". But in that case the UI won't let you submit the form until you have
supplied a valid value.

In other cases an empty value is a valid value. For example, "who is your
spouse?" and the answer is "I'm not married".

Sometimes NULL represents "irrelevant", like for "who is your spouse?", where
some of the records in the table represent people who can have spouses, and
some of the records represent other person-like entities that aren't actually
people and therefore they can't have spouses.

Given that NULL is _not_ being used to represent "unknown" values, and there
is a requirement to represent empty values, and you don't want to have a whole
extra column just to represent "emptiness", the most straightforward way to
implement empty values is to use NULL. So that is what happens.

And you have to remember to use "is" instead of "=" when you want to test your
empty NULL values for equality with other empty NULL values - because your SQL
database is pretending that NULL really means "unknown", and it doesn't want
to say that one unknown value is equal to another unknown value, because that
would be theoretically incorrect.

------
salzig
there is something "missing". The SQL spec specifies `null = null` to be
"unknown", where i sometimes expect "true". For MSSQL this can be configured
using `SET ANSI_NULLS { ON | OFF }`. AFAIK MySQL can't be configured. Don't
know about Postgres.

~~~
hobs
For what its worth, don't do this - pretty much all db code and practitioners
expect three valued logic, not two.

~~~
salzig
until you have to work with a database created by an insane guy. Never needed
it outside of that one project. (edit: small hint: composite primary key where
parts can be null)

~~~
hobs
I have nothing to say to this but simply "I am so sorry."

------
hanche
NULLs in subselects do bite me with distressing regularity: Writing

    
    
      SELECT ... FROM ... WHERE blah NOT IN (SELECT foo FROM bar);
    

getting no hits until I slap my forehead and add WHERE foo IS NOT NULL to the
subselect.

------
kords
DynamoDB, which is NoSql, also doesn't accept empty strings. But at least,
Oracle automatically converts the empty string into NULL, comparing with
DynamoDB which would actually fail the query.

------
paulryanrogers
With some columnar databases NULLs are 'free' because they are a default,
absent state or compressed away. Can be another reason to prefer them with
very large datasets.

------
lesserknowndan
In MySQL, NULL values are useful when using CONCAT_WS (concatenation with
separator) or GROUP_CONCAT because NULL values will be ignored - so you don’t
get e.g., “one,,two”.

------
Andromeda88
I was dealing with NULLs whole day on MySQL workbench. It wasn't considering
int as NULL value. Needed to make all empty cells 0 to be able to import data
properly.

