
On null - ingve
https://www.sicpers.info/2018/05/on-null/
======
cubano
After 35 years of hearing debates and suppositions on null in C, C++, and
<insert modern programming language here>, I feel that "null discussions" now
metaphorically model null itself, as they share all the same features...

reference undefined state ... check! contain no useful information ... check!
cause confusion in programmers ... check! are defined differently in each
instance ... check!

I think I could go on, but perhaps you get my point.

~~~
forgottenpass
>After 35 years [] I feel that "null discussions" now metaphorically model
null itself, as they share all the same features...

I think the same goes for a lot of unfortunate design decisions throughout
computing that get bikeshedded with the benefit of hindsight.
Like, yeah, sure, there's this foot-gun. And, yeah, we've probably found
better options in the intervening years, or ways to use the feature more
delicately where it is the right option.

But hot takes make for better conversation than reasonable discussion. Even on
Hacker News, the person I want to hear from most is the one rolling their eyes
and closing the tab without taking the time to write a comment.

~~~
AdieuToLogic
I was one of the "eye rolling" crew, but was struck by your comment. So below
is my perspective on nulls, as learned over several decades and multiple
programming languages/paradigms.

Conceptually, null is a sentinel value representing nonexistence within a
value domain. Memory pointers/object references are the archetype of null's
use, but as others have pointed out SQL's null support is another. The key
takeaway, though, is the concept of "sentinel value." For a (pardon the pun)
non-null example, consider the sentinel value "-1" in the domain of integer
file descriptors returned by the "open" POSIX function[0].

Sentinel values, by definition, put the onus of discovery on the code which
receives them, since they can be the result of any computation that produces a
value in the respective domain. This complicates usage, but it must be done
for programs to be correct. As a result, it is very common to normalize
sentinel value discovery through the constructs available in the language
being used, in an attempt to minimize defects related to "missing a sentinel."

In my experience, this naturally led to modelling the expectation of sentinel
values in its own container type, one where the cardinality is either 0 or 1.
Common names for a container with 0/1 cardinality are "Option", "Maybe", and
"Nullable." Again, the key takeaway here is that these types in this context
serve to normalize the detection of sentinel values wherever a domain value
can be produced.

Depending on the language used, "lifting" sentinel values into an option type
might not be worth the effort. But the handling of sentinels must always be
addressed.
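
For a concrete illustration, here is a rough Swift sketch of that lifting (the
openFile wrapper is hypothetical, not anything from the article):

    import Darwin   // use `import Glibc` on Linux

    // Lift the -1 sentinel returned by POSIX open(2) into an Optional,
    // so callers cannot silently ignore the "no descriptor" case.
    func openFile(_ path: String) -> Int32? {
        let fd = open(path, O_RDONLY)
        return fd == -1 ? nil : fd
    }

    if let fd = openFile("/etc/hosts") {
        // ... use fd, then release it
        close(fd)
    } else {
        // the absence must be handled explicitly here
    }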

0 -
[https://en.wikipedia.org/wiki/Open_%28system_call%29#C_libra...](https://en.wikipedia.org/wiki/Open_%28system_call%29#C_library_POSIX_definition)

------
GolDDranks
Null definitely has a meaning, as a missing value, often encoded by Optional
types. What many programming languages got wrong is that null is NOT a
subtype of all of the other types. (It's not a bottom type.) Subtypes are
required to fulfill the same interface as their supertypes, but null fails at
this: it doesn't have the same minimum set of fields, it doesn't respond to
the same methods, etc.

The only way I can think of to make this sound and well-typed is to have null
support _any_ field and _any_ method, always returning null. This doesn't fit
well with how many of the languages _actually_ behave.

~~~
jakear
That is what Objective C provides (the author here calls it “viral null”). It
works, but I think a better approach is the Swift way. There, a nullable type
is precisely an Optional, as you stated, and it doesn’t respond to anything
besides “isNull?” and “get” (the latter crashing the program if the former
returns true). This way, when you need to say “I might not know”, your type
signature will express that, and the type system will guarantee everyone
abides by it.

~~~
greiskul
Agreed, the Swift way is the correct way. For cases where you do want the
Objective C behavior, you can just use the ?. operator. This way you get
compile-time guarantees from the type system, and where you actually want the
"viral null" you can use it, and it's very explicit to people reading the
code what it's doing.
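
As a rough Swift sketch of both (User, Address and lookup are made-up names):

    struct Address { let city: String }
    struct User { let address: Address? }

    // The signature says "I might not know" up front.
    func lookup(_ id: Int) -> User? {
        return id == 42 ? User(address: Address(city: "Oslo")) : nil
    }

    let user = lookup(7)

    // The compiler forces a decision before the value can be used.
    if let u = user {
        // ?. opts in to "viral null" locally, and ?? supplies a fallback.
        print(u.address?.city ?? "unknown city")
    } else {
        print("no such user")
    }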

------
jpz
It's clear that null has semantic and syntactic meaning.

If one cares to make the analysis, one could say that it's a messy encoding
of the None of the Maybe monad: at the call site you need to do some
reasoning, but it effectively encodes Some / None.

It simply does not communicate anything to me to hear someone say "logically-
speaking, there is no such thing as null" - such a statement really does
appear to be word games. It has no explanatory power, and it cannot ever be
categorically true.

~~~
wellpast
> It simply does not communicate anything to me to hear someone say
> "logically-speaking, there is no such thing as null"

I've seen tons of complex and unwieldy code in my day. Often the complexity
comes from illogic and incorrect modeling of how things are. So when someone
points out a logical unnecessity, for a programmer to dismiss it as
meaningless is to miss out.

(In some logics,) there _is_ no such thing as null, and therefore you don't
need its representation to write effective programs. That's actually a
valuable thing to say, because from there you can start asking questions like:
why _does_ it exist here or there? And then you can proceed to write more
reasonable, less complex programs.

I've had to fight ill-devised null semantics too many times to count in my
programming life.

To get concrete: there are things we know and things we don't. Most often,
null is used as a placeholder to say "I don't know." This leads to complexity,
because I'll then have to write code to branch on this one other value case
(or at minimum defend against it, like the article points out). But if it's
not logical per se, then what _else_ can we do? What other way of programming
more closely reflects logic and our reasoning processes? That's a path to
writing better, more maintainable programs.

One revelation I got when having to model data in NoSQL after years of SQL is
that while SQL makes you decide on and write down NULL values for entity
properties, NoSQL does not. If you don't know it, you don't write it down.
It's that simple. And that is much more in line with the way I contend with
the things I don't know in my life. I don't go around reiterating my lack of
knowledge by referring to some logically non-existent value; I generally
simply don't _say_ things I don't know. Following this reasoning, I end up
writing programs that are much more ergonomic to the way people _actually_
think.

~~~
jpz
When one writes code which is close to the processor (for instance, when one
has a reason to write in C), one can represent an unallocated memory structure
with a null pointer, or one can use a boolean flag.

A null value means something in that context. I don't see how it means
nothing.

If I have a tail-recursive function, I probably do not need null. But if I
have a loop which may run 0 or more times, and need to catch the absence of
runs, I need to model a None result.

In that instance, the null has semantic meaning. I can use an algebraic type
of course, but for the task I am doing, that may not be practical.
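
A small Swift sketch of that case (the names are illustrative): the loop may
match nothing, and the Optional return is exactly the None result for it.

    // Returns the last reading above the threshold, or nil if the loop
    // never produced one (the "ran zero matching times" case).
    func lastAbove(_ threshold: Double, in readings: [Double]) -> Double? {
        var result: Double? = nil
        for r in readings where r > threshold {
            result = r
        }
        return result
    }

    lastAbove(3.0, in: [1.2, 4.7, 2.9])   // Optional(4.7)
    lastAbove(3.0, in: [])                // nil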

Saying that null logically doesn't exist is not true if my context is the
Linux kernel. Such an unqualified statement seems overly categorical to me.

I do like your analogy with NoSQL; I think it's an excellent point.
Nullability in a JSON document, for instance, is better modelled with absence.
However, with a language which is compile-time type-checked, those absent
values may need to be present in materialised classes as nullable values
(options, if you like), unless you're dictionaries all the way down like
Python.
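
For instance, in Swift (a sketch; the Profile type is made up), an absent key
in the document materialises as an Optional in the typed struct:

    import Foundation

    struct Profile: Codable {
        let name: String
        let age: Int?        // the key may simply be absent in the JSON
    }

    let json = #"{"name": "Ada"}"#.data(using: .utf8)!
    let p = try! JSONDecoder().decode(Profile.self, from: json)
    print(p.age as Any)      // nil: absence becomes the "I don't know" value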

------
triska
Regarding the following quote from the article:

" _A statement either does hold (it has value 1, with some probability p) or
it does not (it has value 0, with some probability 1-p; the Law of the
Excluded Middle means that there is no probability that the statement has any
other value)._ "

This holds in classical logic, for example, but it is not a universal
principle in logic. In particular, the statement does not apply to
_intuitionistic logic_:

[https://en.wikipedia.org/wiki/Intuitionistic_logic](https://en.wikipedia.org/wiki/Intuitionistic_logic)

The fact that (A ∨ ¬A) is _not_ a tautology in intuitionistic logic reflects
the limits of logical proof: By Gödel's first incompleteness theorem, if a
consistent formal system is sufficiently powerful, then it can express
statements that can be neither proven nor refuted within the system.
Intuitionistic logic lets you reason about such situations.

If you are interested in learning a programming language where you can
conveniently express "anything at all" without even a trace of "null", a
_logic programming_ language may be suitable. For example, in Prolog, a
logical variable stands for _anything at all_ , and as such is truly a
variable as in predicate logic.

------
jakear
What’s the point? Yes, Null can have meaning. Even from a purely mathematical
standpoint it increases the entropy of a value, and thus carries more
information. I don’t think anyone is arguing that.

But if you don’t explicitly expose that additional entropy/information to the
programmer and the type system, the meaning vanishes. If every type implicitly
admits this additional value, we lose the semantic understanding we would get
from seeing that a particular method returns a “bool?” rather than a “bool”.
Instead we are forced either to assume everything is a “bool?” and add a bunch
of unnecessary checks, or to assume nothing is and miss out on whatever extra
value a nullable gives us. This is, again, very uncontroversial.

------
ian1321
This seems pretty simple to me. You can either use null, and expect checks
before any reference to a nullable value (of course, these checks are usually
skipped). Or you can avoid null and use something like Option/Maybe to encode
a missing value.

What this really comes down to is: do you prefer to have guarantees, or do you
think that "less code"/"fewer useless checks" is better?

In my mind, the only way to write robust, maintainable code is to not use null
and use Option/Maybe.

------
bjpbakker
The author disputes the claim that, logically speaking, there is no such thing
as a Null value.

> I disagreed with the statement in the post that:

>> Logically-speaking, there is no such thing as Null

> [..]

> Boolean logic

> [..]

> It isn’t true [..] it isn’t false either. We need another value, that means
> “this cannot be decided given the current state of our knowledge”.

Null as used by many computer languages is not some logical choice. As in the
example, sometimes a value may have more choices than just True | False.
Logically speaking, that means the value has three choices: True | False |
Undecided.

Some computer languages let you encode this more conveniently than others. Even
falling back to Null to represent the possibility isn't necessarily bad.
However, that doesn't make Null a logical thing, but just the representation
of this choice for a computer.

~~~
mjburgess
>However, that doesn't make Null a logical thing, but just the representation
of this choice for a computer.

In that sense, "True" as encoded in programming languages isn't a "logical
thing" either.

Programming languages are their own domain of syntax/semantics, which can
encode other domains (e.g. mathematics).

The type Trivalent = { True, False, Null } is a perfectly reasonable encoding
of a system of logic.

~~~
Y_Y

        enum Bool 
        { 
            True, 
            False, 
            FileNotFound 
        };

------
beat
Way back in the 1990s, I had to deal with a null-vs-0 bug in C. It was caused
by the Sybase API for BLOBs, along with some careless programming by someone
years before me.

Sybase, at least back then, stored BLOBs as 8k chunks. So when creating a row,
if you passed NULL (in C) for the pointer to the data to store in the BLOB,
then the BLOB became null. But if you passed 0 as the pointer, then Sybase
allocated 8k and stored an empty value.

For years, we had stored data in BLOB fields, sometimes empty. We'd never
noticed the 8k allocation into empty fields, because it was swamped by the
ones with real data. But we pulled the data out into files and added a file
name field to find it (for performance reasons), and adjusted the size of the
databases, because disk was expensive back then!

I had implemented the transition, and had argued for removal of the BLOB
column entirely, but our cautious lead dev rejected it, out of fear the
missing column would break other things. Testing showed everything working
perfectly. Then we put it into production...

...and saw that we had about four hours before we ran out of rapidly-filling
database space. And we could not gracefully roll back, for business reasons.
So I had four hours to find and fix the bug.

The real problem here wasn't NULL itself. That was a reasonable way to
communicate a state to Sybase. But other issues turned it into a problem.
First, the C programmer's habit of treating 0 and null pointers
interchangeably. They are usually but not _always_ equivalent! This bug had
actually been sitting latent in our system for five years at that point - we
just never noticed it.

The other problem was the lava flow antipattern: not removing a clearly
unnecessary database field out of fear of downstream repercussions. That one
bothers me more, and to this day, I try to be good about not keeping
apparently-dead code around out of fear.

------
blackbrokkoli
Sorry if off-topic, but regarding the original post this is referring to:

Does anyone else immediately lose interest in the article at the word
"mansplaining"? Like, why do we need to bring this into a discussion about
logic and theoretical CS? Also, if you post something in a public manner and
people state their disagreement, that's not mansplaining...

~~~
rootbear
The author of the original article does appear to be male, and I also found
his use of "mansplaining" off-putting. I think he should have used a term like
"nerdsplaining" which doesn't gratuitously bring gender into the discussion.

~~~
forgottenpass
If someone wanders into a group of nerds to take an "I'm right" stance on a
nerdy topic, what do you expect? No matter how delicately the responses are
phrased, the underlying "well akshully" is going to show through.

------
SomeHacker44
> Either approach will work. There may be others.

This was a pithy summary. Yes, the languages discussed (Ruby, ObjC, Java,
Haskell) are all indeed Turing-complete and hence can express what any other
one can express.

What the article didn't discuss is ... basically everything there is about the
"nullness" discussion going on in the software engineering community.

I use nullable languages a lot (Ruby, Java, C#, Clojure, Common Lisp, etc.)
and null-free languages a little, but for a long time (Haskell, more recently
others). If I absolutely want to write correct code without relying on more
tests than code, extensive code coverage tools, and other external support,
I'd rather do it in Haskell. Then I don't have to worry about nullness except
when I want to, and the type system checks that I properly handled it.

------
masklinn
> If we know that the set has a maximum cardinality of 1, then we have
> reinvented the Optional/Maybe type: it either contains a value or it does
> not.

unit has a cardinality of 1: it has a single value, (). Option types have a
cardinality of 2 (if you ignore the sub-value): an empty value and a full
value. This can trivially be shown by the bijection between boolean and unit
option.
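
For instance (a Swift sketch), the bijection is just:

    // Bool and Void? (i.e. unit option) each have exactly two values.
    func toOptional(_ b: Bool) -> Void? {
        if b { return () } else { return nil }
    }
    func toBool(_ o: Void?) -> Bool {
        return o != nil
    }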

> In an implementation language like Java, Objective-C or Ruby, a null value
> is supplied as a bottom type

1. Ruby's nil is just a singleton instance of NilClass, and it's not a
subtype of "all other types in the system" any more (or less) than Fixnum or
String is.

2. A bottom type is empty by definition; null can't be a "bottom type" since
it's a value (of sorts). `void` would be a bottom if it were something
approaching a proper type.

------
_bxg1
I wholeheartedly agree with the author that representing "nothing" or
"unknown" is essential. I read the previous article and had my feelings put
into better words by this one.

But I think - beyond having wildly inconsistent notions of null - many
languages do a much worse job of dealing with null than they could, which is
why it causes so many errors in the wild. Java is obviously the archetypical
example. It sounds from the comments, though, like Objective-C and Swift and
Ruby have interesting strategies which I didn't know about.

------
bgongfu
There are as many kinds of null as there are programming languages. Cixl [0]
doesn't derive Nil from every other type; instead, Nil and all other types
derive from Opt, which means that user code may specify Opt to accept nulls,
or any other type to have nulls automatically trapped.

[0] [https://github.com/basic-gongfu/cixl#optionals](https://github.com/basic-gongfu/cixl#optionals)

------
asavinov
Formally, null can be defined as the empty tuple NULL=⧼⧽, as opposed to normal
values, which are non-empty tuples like ⧼5.0⧽. It is analogous to having the
empty set ∅={} and non-empty sets like {5.0, 6.0, ...}.

    
    
        TUPLES       SETS
        NULL=⧼⧽       ∅={}
        combinations collections
    

Obviously, there is a duality between tuples and sets, but currently there is
no theory based on this property that treats them as equal constructs (tuples
are defined as ordered sets, simply ignoring their duality to sets).

Since variables need to allocate some memory to store their values, it is
necessary to introduce a mechanism to deal with empty values and then somehow
interpret them.

There was another discussion of nulls some time ago:
[https://news.ycombinator.com/item?id=17028878](https://news.ycombinator.com/item?id=17028878)
(The Logical Disaster of Null)

------
moron4hire
I really like C#'s optional Nullable<T> monad for value types. I wish I could
trigger a mode that allowed reference types to be non-nullable by default and
still work with Nullable<T>.

------
zaarn
I quite like how PG treats Null values in its SQL, as described in the blog
post: undecided. It allows me to express some very interesting properties in
systems.

Suppose we have a permission system for some webapp. Using Null for the
permission to access a user profile, we can express three end states: return
403 Forbidden (FALSE), return 404 Not Found (NULL), and return 200 OK (TRUE).

That way, a user who is not logged in or has no specific permissions will not
know a profile exists. However, if they have been added to the deny list, they
get a deny error. Otherwise, they see the profile.

It's quite neat in that way.
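
The same three-valued idea can be sketched outside SQL, e.g. in Swift (the
status mapping here is just illustrative):

    // true -> allowed, false -> explicitly denied, nil -> pretend it doesn't exist
    func status(for permission: Bool?) -> Int {
        switch permission {
        case true?:  return 200   // OK
        case false?: return 403   // on the deny list
        case nil:    return 404   // hide that the profile exists at all
        }
    }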

------
emilfihlman
This is an issue of encoding.

Live in C (or assembly) and there are no issues with what is NULL.

------
cousin_it
Maybe the solution is to ban null pointers, but give every type its own
default value. That sounds weird, but hear me out.

There are two kinds of types:

1) Exposed data like numbers, strings, structs or containers. These will often
go on the disk or wire. Best practice says such types shouldn't have any
required parts, otherwise updating the code in production becomes hell. For
example, see how Protocol Buffers V3 bans "required" fields and uses default
values everywhere.

2) Opaque objects with behavior. Best practice says such objects should be
mockable, which makes them de facto nullable, because you can write a mock
that behaves like null (throwing an exception on every method, or even looping
forever).

This is the same as the old distinction between types with deep equality and
types with pointer equality (or no equality operation), or the distinction
between serializable and non-serializable types, etc.

So the idea is that we shouldn't have null pointers per se, but we should
allow programmers to construct a default value for any type, which could be
spelled like {} or null. Its implementation should vary depending on type. For
data-like types the default value should be something like 0, "" or [], and
for behavior-like types it should throw an exception on every method. That
would work in both static and dynamic languages and cover all uses of null
that I've seen, in both production and test code, without losing any
convenience or semantic niceness.
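
A rough Swift sketch of that idea (DefaultConstructible, Store and NullStore
are hypothetical names, not an existing feature):

    // Every type names its own "empty" value instead of sharing one null.
    protocol DefaultConstructible {
        static var defaultValue: Self { get }
    }

    // Data-like types default to something like 0 or "".
    extension Int: DefaultConstructible {
        static var defaultValue: Int { 0 }
    }
    extension String: DefaultConstructible {
        static var defaultValue: String { "" }
    }

    // Behavior-like types default to an object that traps on use.
    protocol Store { func load(key: String) -> String }
    struct NullStore: Store {
        func load(key: String) -> String { fatalError("default Store used") }
    }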

