
The Meaning of Null in Databases and Programming Languages (2016) - raaij
https://arxiv.org/html/1606.00740
======
carapace
Not to be flip but the crux of the issue is (I think I've got this correct):

[https://en.wikipedia.org/wiki/Open-
world_assumption](https://en.wikipedia.org/wiki/Open-world_assumption)

vs.

[https://en.wikipedia.org/wiki/Closed-
world_assumption](https://en.wikipedia.org/wiki/Closed-world_assumption)

Which means we're dealing with philosophy, no?

------
ken
> The relational null represents the absence of a value in a field of a
> record; whereas the programming language null represents one of the possible
> values of a variable.

This sounds like splitting hairs, especially as they never seem to explicitly
define what "value" means. How is it _not_ a value? It sure looks like a
"value" to me.

It's not just the occasional textbook that uses this nomenclature. In SQL
itself, you type "INSERT INTO mytable VALUES (NULL);" to insert a NULL. Did
the designers of SQL not understand what "value" means?

They also claim that databases are special in that they use "three-valued
logic" \-- which would seem to imply that NULL is a third _value_. Does
"value" have different meanings in different contexts? For the word whose
precise definition is central to their thesis, it has some remarkable
linguistic flexibility.

> Here is a partial list of the reasons why NULL can occur. 1. The field has a
> value but it isn't known... 2. The field has no value because it isn't
> applicable...

These are all reasons I've seen people use null in programming languages, too.

> The field is not allowed to have a value. For example, the root node of a
> tree has no parent node. The fact that the field is NULL in this case is not
> due to missing information.

Are they really trying to claim this is somehow unique to relational
databases? Trees work exactly the same way in programming languages.

~~~
nerdponx
I think I understand what they're trying to say.

The value of a variable with type "non-nullable 32-bit integer" is an element
of the space {-2 __31, ..., 2 __31 - 1}. The value of a variable with type
"nullable 32-bit integer" is an element of the space {-2 __31, ..., 2*31 - 1}
U {NULL}.

There's a "main space", and an extra singleton space tacked on, containing the
"NULL" element.

So NULL isn't a value in the "main space", and therefore in some sense "isn't
a value". I think it's wrong for the same reasons you stated, but I think this
is what they were trying to describe.

------
anw
This is a very interesting article, especially since I can now appreciate the
meaning, having spent the last year doing both SQL schema design and backend
development that works off that schema.

The main point here is that Language null is a possible value, whereas
Database NULL is a composite with several different meanings, which can often
trip up people who are not familiar with how some SQL systems treat it (NULL
<> NULL, etc)

The important takeaway from this article is that DB NULL can arise for a
variety of reasons (taken from the article):

1\. The field has a value but it isn't known.

2\. The field has no value because it isn't applicable.

3\. A field value could have been inferred but was not inferred, because of an
overriding requirement.

4\. The field has a value but it is not within the domain.

5\. The field value cannot be determined due to an exception.

As the author suggested, "NOT SET" does a good job carrying the meaning behind
what NULL is used to accomplish (I also think "NOT GIVEN" or "NOT AVAILABLE"
could work and keep the idea.

~~~
ken
I don't understand how those aren't also descriptions of Language null.
Haven't you seen programs that use null in those ways (and several others)? I
certainly have.

------
rskar
Fun factoid, VBA/VB6 dichotomizes between the "NULL" of value (keyword Null)
versus the "null" of reference (keyword Nothing).

    
    
        Dim x
        x = Null
        
        Debug.Print "x", x
        Debug.Print "x = x", x = x
        Debug.Print "x <> x", x <> x
        Debug.Print "x = 0", x = 0
        Debug.Print "0 = x", 0 = x
        Debug.Print "0 + x", 0 + x
     

(Yep, all of the above result in Null.)

    
    
        Dim y
        y = Nothing
        Debug.Print "y", y
    

(That one results in an "Object variable not set" error.)

~~~
perl4ever
VBA seems to have null, empty, "" and nothing, all of which have different
semantics. But only variants can be null or empty, I believe; if you have a
typed variable like a Date or a String, you're out of luck.

And then, I was using either Power Automate or SharePoint/odata and it turned
out "null" does not have the "normal" null semantics, and to test for null,
you simply check if something is equal to it.

I think I remember from using SQL Server that there is a option that you can
treat null either way.

Microsoft has a weird relationship with null.

On the other hand, I'm sure someone has complained about Oracle's idea that
empty strings are null values. And once you think about it, SQL's handling of
nulls is weirdly inconsistent, because the special handling of them doesn't
always apply in aggregates.

Somewhere I had a book by C.J. Date/Hugh Darwen in which I remember a rant
about how nulls are a terrible offense against the true relational model.

~~~
rskar
>But only variants can be null or empty...

That's true, only Variants can be Null (Null is a state of Variant); however,
Empty is translated to whatever default value makes sense for some of the
(non-Object, non-user-defined) data types:

    
    
        Dim s As String, d As Date, x As Single, y As Double, i As Integer
        s = Empty: d = Empty: x = Empty: y = Empty: i = Empty
        Debug.Print s, d, x, y, i
    

(The above compiles and runs, and - no surprise - numerics are made zero, and
string made "".)

>I'm sure someone has complained about Oracle's idea that empty strings are
null values.

I have certainly complained about that: I mean, what's not to love about
conflating "this field intentionally left blank" with "no certain value could
be obtained for this field"? /s

>...a rant about how nulls are a terrible offense against the true relational
model...

By way of Wikipedia, found this: [https://www.dcs.warwick.ac.uk/~hugh/TTM/TTM-
TheAskewWall-pri...](https://www.dcs.warwick.ac.uk/~hugh/TTM/TTM-TheAskewWall-
printable.pdf). A fine read. I am left wondering about how OUTER JOINs would
be handled without NULLs. I am now also wondering if it was OUTER JOINs that
made NULLs seem necessary.

~~~
perl4ever
"I mean, what's not to love about conflating "this field intentionally left
blank" with "no certain value could be obtained for this field"

Well, probably because Oracle was the first database I used a lot, it just
seems natural to me and Microsoft's distinction between the empty string and
null is annoying.

On the other hand, if I were to try to rationally defend Oracle's way of doing
it, it would be something along these lines:

You can subdivide the concept of "this field doesn't have a normal value" into
an _infinite_ number of reasons. So if you're not going to have _zero_ "null-
like" options, and you're not going to have _one_ , then where does it stop?
When you have two, or three, or four, etc. that seems like you've gone down
the wrong path no matter how good your intentions are. It vaguely reminds me
of the "zero, one, infinity" rule.

------
vivekseth
Although this is relevant for languages like C where NULL == 0, other
(typically more modern languages) treat Null similar to how databases treat
it. Ex: Swift, Kotlin, Rust

~~~
klodolph
This can’t be correct.

At least in Rust, null (or None) is not treated at all like how databases
treat it. If you have T: PartialEq<T>, then there’s an implementation for
Option<T>: PartialEq<Option<T>>.

With this implementation, None == None. The result is true. None != None is
false.

In an SQL database, (null = null) is null. Neither true nor false. (null !=
null) is also null. Neither true nor false.

This is an enormous difference, and it is basically THE gotcha for working
with null in SQL databases.

If you want to translate that into a "traditional" programming language, the
closest I can give you is Haskell, where you can think of SQL equality as
being normal equality lifted into the Maybe applicative functor. (If that
doesn’t make sense, you’re not a Haskell programmer, don’t worry about it.)

    
    
        sqlEq :: (Applicative f, Eq a) => f a -> f a -> f Bool
        sqlEq = liftA2 (==)
    

This has a generalization of the tri-value semantics.

    
    
        > Nothing `sqlEq` Nothing
        Nothing
        > Nothing `sqlEq` Just 4
        Nothing
        > Just 4 `sqlEq` Just 5
        Just False
        > Just 4 `sqlEq` Just 4
        Just True
    

I say “generalization” because this works in any applicative functor.

~~~
vivekseth
Thanks for the explanation, I forgot that null != null in databases. I was
thinking more about how null != 0 and is a separate concept in languages with
optional values. You’re right that optionals in these languages don’t quite
work the same as in databases

~~~
Sniffnoy
But it's not NULL != NULL. It's that NULL == NULL is NULL. That's different!
NUll != NULL is also NULL, for instance.

~~~
vivekseth
Maybe a concise way to describe it is that in databases NULL represents the
absence of a value. Since NULL is not a value, operators like == and != don't
really make sense on NULL, and also produce NULL as an output.

------
brummm
How the hell is there no proper pdf to read this on the arxiv?

~~~
recursive
Don't know much about arxiv, but reader mode in Firefox works pretty well
here.

