
What Is Null? (2010) - tosh
http://wiki.c2.com/?WhatIsNull
======
_hardwaregeek
Non nullability is probably one of the most important concepts I've learned.
People can talk your ear off about macros or about borrow checking or whatever
cool feature is in their favorite languages. But non nullability isn't a
feature as much as the removal of a terrible one: types being automatically
unifiable with null.

What does that mean? Well basically that every single (reference) type in most
languages comes with an implicit "and null". String in Java? More like "String
and null". Car object? Actually it's "Car and null".

Why is this a bad thing? Well null is a very specific type with a very
specific meaning, but because it's automatically included with every type,
people end up using it for a bunch of situations where a different type or
value would work. Let's take a simple parser. A naive implementation, upon
reaching the end of the string, might just return null. After all, nothing has
been found. But that's not a null value, that's an EndOfString value! The
moment you pass that value out of the context of the function, you need to
remember that null means EndOfString. Or maybe the string you're passing in is
a null value in the first place. It'd be tempting to return null, right?
Except you've now lost information on whether the string itself was null, or
if something happened in the parse function that caused it to return null.

That isn't to say null is wholly evil. There are certainly uses for null. But
it's often way better to contain its use with Option or Maybe, essentially
wrappers that detail "hey, this value could be null". These wrappers are _not_
unifiable with regular values, which forces you to think about where values
can and cannot be null.
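
To make the parser example concrete, here is a minimal Python sketch. The names `Token`, `EndOfString`, `NoInput` and `next_token` are made up for illustration, and Python only checks these annotations if you run a type checker, so treat it as a picture of the idea rather than an enforced guarantee.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Token:
    text: str


@dataclass(frozen=True)
class EndOfString:
    """Explicit marker: the parser ran out of input."""


@dataclass(frozen=True)
class NoInput:
    """Explicit marker: the caller passed None instead of a string."""


ParseResult = Union[Token, EndOfString, NoInput]


def next_token(source: Optional[str], pos: int) -> ParseResult:
    """Return the whitespace-delimited token at or after pos."""
    if source is None:
        return NoInput()          # the input itself was missing
    rest = source[pos:].lstrip()
    if not rest:
        return EndOfString()      # nothing left to parse
    return Token(rest.split()[0])
```

The caller can now tell "no input" apart from "end of input" without remembering what a bare null was supposed to mean.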

I totally understand if language designers want to omit features that they
deem unnecessary or overcomplicated. I get it if you want a language sans
generics or sans macros. But I don't understand keeping a feature that has
caused far too much pain and encouraged far too many bad practices.

~~~
WalterBright
If you don't have a null, then you have to set aside some value for the type
as the default value. With null, you get a seg fault if you try and use it.

Without a null, you'll need to create a special default value which performs
the same function - giving an error if you try to use it.

Floating point values have NaN for this, UTF-8 code units have 0xFF. The D
programming language uses NaN and 0xFF to default initialize these types. NaN
is good because it is "sticky", meaning if a result is computed that depended
on a NaN, the result is NaN as well.

Some people complain about this, wanting floats to be 0.0 default initialized.
But then it would be nearly impossible to tell if an unintended 0.0 crept into
the calculation.
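
A small Python sketch of the "sticky" behaviour, assuming a hypothetical `UNSET` marker and `average` helper (neither is from D): the NaN survives the arithmetic and can be detected at the end, while a stray 0.0 just produces a plausible-looking wrong answer.

```python
import math

UNSET = float("nan")  # hypothetical stand-in for "never initialised"


def average(values):
    """Plain arithmetic mean; no special handling of NaN."""
    return sum(values) / len(values)


with_nan = average([1.5, UNSET, 2.5])   # the NaN propagates through sum and division
with_zero = average([1.5, 0.0, 2.5])    # silently skewed by the unintended 0.0

poisoned = math.isnan(with_nan)
```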

> More like "String and null"

Null's replacements, Maybe and Optional, still have the extra check.

~~~
dbaupp
Another way to address that problem is to not default initialise at all.
There's then no need to worry about a default value for pointers, or about NaN
vs. 0.0 for floats.

In addition, null doesn't always give a segfault, as I'm sure you're aware. In
C and C++, using (dereferencing) null is undefined behaviour, and a segfault
is the best-case result. The code may have been optimised so that there's not
a direct dereference (which would segfault) but instead other operations that
rely on the pointer being non-null.

~~~
XMPPwocky
Not to mention "fun" cases like this (contrived example) code-

    
    
      // Read sparse array from file
      uint64_t len = read_u64();
      uint8_t *buf = malloc(len);  // deliberately not checked for NULL
      while (more_data_remaining_in_input_file()) {
        uint64_t pos = read_u64();
        if (pos >= len) { abort(); /* hackers detected! */ }
        buf[pos] = read_u8();  // if buf is NULL, this writes to address pos
      }
    

which will very reliably compile into a write-what-where primitive when passed
a gigantic `len`. malloc() fails and returns NULL (yes, it'll do this even on
Linux when virtual address space is exhausted), and nasal demons emerge
rapidly from there.

~~~
WalterBright
While this can happen, it's theoretical. In 40 years of dealing with null
pointers, I've never seen one happen where the offset is outside the protected
null page.

The reason is simple - very few allocated objects are that large, and for the
arrays that are, they get filled from offset 0 forwards.

The real problem with C is not null pointers, it's buffer overflows caused by
C arrays inescapably decaying to pointers, so array overflow checks can't be
done:

[https://www.digitalmars.com/articles/b44.html](https://www.digitalmars.com/articles/b44.html)

~~~
XMPPwocky
Out of curiosity, have you been trying to make things work, or have you been
trying to break things?

Because- here's the thing. I've been messing around with software security
stuff for 5 years or so, and I've seen exploitable bugs related to a pointer
unexpectedly being null twice.

There's a big difference between the kind of bugs you find "organically", when
somebody's trying to use the software normally, and the kind of bugs you find
when you're crafting (or fuzzing) absurd inputs that make no sense and that no
ordinary software would produce. Perhaps this is why I've seen more of these
bugs despite my much shorter career?

~~~
WalterBright
> I've seen exploitable bugs related to a pointer unexpectedly being null
> twice.

I've never heard of one, what I hear about endlessly are buffer overflows.

Can you give more information about these? I want to learn more.

~~~
XMPPwocky
Yeah, definitely! The one that comes to mind was in a game's texture loader -
because of players being able to use custom "spray paint" images, this was
exposed to untrusted code.

It unpacked mipmaps from a packed texture file into a malloc'd buffer, with an
option to skip the first N mipmaps. (If I remember correctly, it'd then go
back and upscale the largest mipmap to fill in the ones it skipped.)

Mipmaps were stored largest first, consecutively- so the first, say,
512x512xsizeof(pixel) would be the biggest mipmap, then you'd have
256x256xsizeof(pixel) bytes for the second-biggest one, etc, down to some
reasonable (i.e. not 1x1px) minimum size.

The issue came when a texture's dimensions were specified as being so large
that malloc would fail and return NULL. Normally, this wouldn't be an issue
(besides a denial of service) - but by skipping the first N mipmaps, you'd
instead write to (where x and y are the dimensions of the texture)

    
    
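      def addr(n, x, y, pixel_size_in_bytes=3):
        # Byte offset just past the first n (skipped) mipmaps of an x-by-y texture.
        out = 0
        for i in range(n):
          out += x*y*pixel_size_in_bytes
          x >>= 1  # each mipmap level halves both dimensions (512 -> 256 -> ...)
          y >>= 1
        return out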
    

By choosing x, y, and N carefully (I used a SMT solver to help) you could
overwrite a function pointer and get it called before the later upscaling
operation ran (since that _would_ access 0x0 and crash).

It's definitely a unique bug, but this sort of thing does happen in real code.

Making malloc() and friends panic or similar on failure instead of returning
NULL would fix most of these bugs- but it does sort of seem like the whole
idea of sentinel values and in-band signalling is hazardous.

Go-style 'f, err := os.Open("filename.ext")' has its appeal from that
perspective- you can forget to check "err" before doing things with "f", but I
assume the Go ecosystem has good tooling to catch that.

Also probably worth noting that arguably this bug _is_ related to C arrays
being just pointers wearing fancy pants- as long as you can't get a slice
where the pointer is NULL but the length is nonzero.

~~~
WalterBright
Yup, exploitable all right. Thanks for the explanation!

Usually, what I do is just make a shell around malloc() that checks for
failure and aborts the program with a message. Then, and only then, it gets
converted to a slice.

I'll still maintain, however, that buffer overflows are a couple orders of
magnitude more common. Your case involves loading a file essentially backwards,
which seems awfully obscure. When I load files piecemeal, I'm using memory
mapped files, not a malloc'd buffer.

------
rogual
One thing that intrigues me about null is that conceptions of null seem to
divide into two main families.

In the first, a type nullable(T) is constructed from a type T by simply
adjoining a "Null" value to it. So, a nullable integer type could take the
values {Null, 0, 1, -1, 2, ...}

In the second, nullable(T) is more like a "box" that you have to get the value
out of. This is what Haskell is doing with its Maybe type (Just & Nothing).
Here the values of our nullable int would be {Nothing, Just 0, Just 1, ...}

That's actually a fairly big difference. For instance, in the first model of
null, nullable(nullable(T)) = nullable(T), but in the second, those are
distinct types.

Another big distinction is that in the first model, nullable(T) is a superset
of T, where in the second, it's not.

I haven't seen anyone give a name to this distinction or talk about when the
first is more useful vs. the second.

~~~
kmill
The first is a union type, and the second is a disjoint union type. The second
is also known by the names of tagged unions, sum types, and discriminated
unions.

For the first to be useful, there needs to be some hidden disjoint union to
implement it (how else would you know it's an int vs the adjoined null?) but
the type equivalence rules would let unions collapse: union(T,Null,Null) =
union(T,Null).

Here's an illuminating example I saw once: consider a class with a member that
represents a value that is known or unknown, and it might take some time to
initialize the value. There are three states: uninitialized, initialized but
unknown, and initialized and known. This can be modeled by
nullable(nullable(T)). If the nullable collapses, you can't tell the
difference between uninitialized and unknown anymore.
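
Here is a rough Python sketch of that three-state example. `Just`, `Nothing` and `describe` are hypothetical names, not a standard library API. It also shows the collapsing behaviour of the plain union model: Python's own `Optional[Optional[int]]` flattens to `Optional[int]`.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Just(Generic[T]):
    value: T


@dataclass(frozen=True)
class Nothing:
    pass


def describe(field) -> str:
    """Interpret a nested Maybe: the outer layer is 'initialized?', the inner is 'known?'."""
    if isinstance(field, Nothing):
        return "uninitialized"
    if isinstance(field.value, Nothing):
        return "initialized but unknown"
    return f"known: {field.value.value}"


# Plain (untagged) unions collapse: the nested Optional is the same type.
collapses = Optional[Optional[int]] == Optional[int]
```

With the boxed version, `Nothing()` and `Just(Nothing())` stay distinct values, so the two "empty" states can still be told apart.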

~~~
BlackFly
I like having a semantic difference for async values (maybe not computed yet)
and for optional values (maybe no value).

Thus you get Future<Optional<T>>. It is of course helpful since the "Do this
when a value happens, do that when a failure happens" algorithms can be
encoded onto the future. Obviously you don't want people implementing busy
waits.

Edit: as to your overall point, I agree that this just cannot be modeled with
null. The intention is just not capturable.

------
dreamcompiler
Common Lisp addresses some of these problems out of the box. It returns a
second value from some functions to distinguish between "nothing was found"
vs. "A value was found, and it is nil."
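
For readers who don't know CL, the idea can be approximated in Python by returning a pair, loosely mirroring how GETHASH returns a value plus a "was it present" flag. The `lookup` helper below is a hypothetical illustration, not a translation of any CL function.

```python
def lookup(table: dict, key):
    """Return (value, found) so a stored None differs from a missing key."""
    if key in table:
        return table[key], True
    return None, False


settings = {"proxy": None}            # key present, value deliberately "nil"

proxy_result = lookup(settings, "proxy")
timeout_result = lookup(settings, "timeout")
```

Both lookups give `None` as the value; only the second element of the pair says whether anything was actually found.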

CL also uses a special reserved value to mean "unbound" so it's always clear
when a symbol or instance is uninitialized vs. initialized with nil. It's not
possible for the programmer to portably see this value or assign it but there
are functions to find out if a container is unbound and set it to such.

Having said that, problems remain. I prefer to use maybe and error monads in
my CL programs now rather than just returning nil. That solves most of the
remaining issues.

------
SigmundA
I regularly need two types of "null" and both JSON and XML among others
provide it:

1\. Unknown value: the user (or whatever) could not provide a value so it was
left blank, or it had a value and was purposely removed (as in an update), but
it was set by the other side specifically to null.

2\. The value was not provided, that is, it was not set by the other side:
missing property or element (undefined). This usually means do not modify the
value if doing an update, distinct from setting it to null.
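
A minimal Python sketch of that update rule, assuming a hypothetical `apply_update` helper and made-up field names: a key present with JSON null clears the field, while a key absent from the payload leaves the stored value alone.

```python
import json


def apply_update(record: dict, update_json: str) -> dict:
    """Apply a partial update: explicit null clears, missing keys are untouched."""
    update = json.loads(update_json)
    result = dict(record)              # copy so the original record is not mutated
    for field, value in update.items():
        result[field] = value          # JSON null arrives as None and overwrites
    return result


record = {"name": "Ada", "phone": "555-0100"}

cleared = apply_update(record, '{"phone": null}')    # case 1: set to null
untouched = apply_update(record, '{}')               # case 2: not provided
```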

~~~
ACow_Adonis
For the record, I get this all the time in my work and it's super frustrating
that none of the "science or stats languages" (with the possible exception of
SAS of all things) natively support multiple types of nil/missing data.

I often need not applicable, unknown, unsupplied, zero types, other,
undefined/ nonsense/error, theoretically knowable but not currently present,
missing, etc, depending on the context.

~~~
mindB
Also for the record, Julia has all of 0, Missing (indicating data that is
unknown), Nothing (indicating data non-existence), floating point NaN (as well
as +Inf and -Inf of course), and exceptions for actual errors in Base and the
Standard Library. If you need more than that, user-defined types are just as
performant and relatively trivial to implement.

Not sure if you were including Julia in "science or stats languages", but
there it is anyway.

~~~
ACow_Adonis
Somewhat off topic, but my main problem with Julia is that my
colleagues/correspondents won't understand it, it's not installed anywhere I
need it, and my impression is they tried to make it MATLAB'y as though that
was a positive rather than a negative.

If I wanted a performant, compiled, solution that allowed me to program up the
answer myself that wasn't installed anywhere and everyone else couldn't
understand, I'd just cut out the MATLAB syntax and install SBCL lisp :p

~~~
mindB
If it's a helpful perspective to you, here's an economics researcher who had
been using Common Lisp for scientific computing and why he switched to Julia.
I found it helpful for choosing between the two when first selecting a
language for personal use:

[https://tamaspapp.eu/post/common-lisp-to-julia/](https://tamaspapp.eu/post/common-lisp-to-julia/)

------
inlined
I’m sad this didn’t include the mongo query language. My favorite Mongo WAT is
the query {x: null}. It will return all documents where x:

1\. Is not in the document

2\. Is equal to the literal null value

3\. Is a list that contains a literal null value
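
The three cases can be modelled in plain Python like this. It is only a model of the behaviour listed above, written as a hypothetical predicate `matches_x_null`; it is not MongoDB's implementation and uses no MongoDB API.

```python
def matches_x_null(doc: dict) -> bool:
    """Model of which documents the query {x: null} is described as returning."""
    if "x" not in doc:
        return True                      # 1. x is not in the document
    x = doc["x"]
    if x is None:
        return True                      # 2. x is the literal null value
    # 3. x is a list containing a literal null
    return isinstance(x, list) and any(item is None for item in x)


docs = [{}, {"x": None}, {"x": [1, None]}, {"x": 0}, {"x": []}]
hits = [d for d in docs if matches_x_null(d)]
```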

------
Waterluvian
I ran into a very frustrating but unsurprising Null case last week: a value on
the client having three states: "not retrieved yet", "no value", and "value".

JavaScript's ugly parts called to me, "use undefined!" but that's such a bear
trap. I ended up with a separate enum of the possible states for the value's
variable.
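
A sketch of that separate-enum approach, written in Python rather than the original client code. `LoadState`, `Remote` and `render` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class LoadState(Enum):
    NOT_RETRIEVED = auto()   # request has not completed yet
    NO_VALUE = auto()        # server answered: there is no value
    HAS_VALUE = auto()       # server answered with a value


@dataclass(frozen=True)
class Remote:
    state: LoadState
    value: Optional[str] = None   # only meaningful when state is HAS_VALUE


def render(field: Remote) -> str:
    """Pick display text from the explicit state, never from value being None."""
    if field.state is LoadState.NOT_RETRIEVED:
        return "loading..."
    if field.state is LoadState.NO_VALUE:
        return "(none)"
    return field.value
```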

------
ben509
It's a trap!

You think it's safer to return null than raise an exception.

After all, the caller just needs to check. Of course, sometimes they don't.

So if you're _lucky_ it blows up in some completely unrelated part of your
application. Being null, it carries _no_ information as to where or what the
error was.

If you're not lucky, it's saved to disk. Now you have a silent null somewhere
in your data that blows up potentially months after the actual error.

~~~
heavenlyblue
Well... No null check is essentially a tech debt that you are going to pay.

The beauty of tech debt is that most of the time the universe is responsible
for making you pay up for not making the right decisions in the first place.

Sometimes you get lucky (we find other ways of dealing with the issue)...
Sometimes you don’t.

------
H8crilA
The thing which is not.

In most mathematical logic systems "false" or "⊥" has the nice property that
anything can be derived from it (⊥ -> p, for any sentence p). I find it funny
that the undefined behavior of _null_ dereferencing works the same way in C -
literally anything can happen to the program, so the compiler is free to
assume (derive about the current state of execution) anything it wants.

~~~
silasdavis
Anything can be derived from p or not p, not from false.

~~~
thegeomaster
This is just a corollary of the fact that "p or not p" is false (for any value
of p).

~~~
pretty_lorelei
"p _and_ not p" is false, "p or not p" is always true.

~~~
thegeomaster
Oops. Went a bit on autopilot with that one.
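
For anyone who wants the statements in this subthread checked mechanically, here is a small Lean 4 sketch using only core definitions. It shows that anything follows from False, that "p and not p" is contradictory (so anything follows from it too), and that "p or not p" holds classically.

```lean
-- Ex falso quodlibet: from False, any proposition p follows.
example (p : Prop) : False → p := fun h => False.elim h

-- "p and not p" can never hold.
example (p : Prop) : ¬(p ∧ ¬p) := fun h => h.2 h.1

-- Hence anything at all follows from "p and not p".
example (p q : Prop) : (p ∧ ¬p) → q := fun h => absurd h.1 h.2

-- "p or not p" is always true (excluded middle, classical logic).
example (p : Prop) : p ∨ ¬p := Classical.em p
```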

------
verytrivial
I remember having a realisation regarding null in SQL data models that is
probably obvious to other people who paid attention to the "History of SQL"
parts of the lecture, but was along the lines that tables are sets, and all
relations are sets, so where a relation has no value, you are really just
talking about the empty set, and there is only one of those. i.e. You can
remove all null "values" from your tables by normalising on that column --
null was just where the join is now empty. (But obviously don't actually do
that, it was more a thought experiment for a null-less nirvana.)

~~~
ben509
Also, there is a QL that treats all values as sets, EdgeQL[1], and they get
null for free.

[1]: [https://edgedb.com/docs/edgeql/overview#everything-is-a-set](https://edgedb.com/docs/edgeql/overview#everything-is-a-set)

------
kissgyorgy
In Python we sometimes differentiate the different meanings like "whatever",
"default" or "not set yet" by defining a singleton object and comparing by
identity, because None would be an acceptable value and we need to differentiate
the meaning. Examples:

[https://github.com/marshmallow-code/marshmallow/blob/c1506cc...](https://github.com/marshmallow-code/marshmallow/blob/c1506cc48edf927fa7074f237f951a11f04df7c0/src/marshmallow/utils.py#L35-L38)

[https://github.com/python-attrs/attrs/blob/eda9f2de6f7026039...](https://github.com/python-attrs/attrs/blob/eda9f2de6f7026039c3eee80297f0c95382d51d8/src/attr/_make.py#L67-L70)
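
A generic sketch of the singleton pattern, not copied from either project linked above. `MISSING`, `_Missing` and `get_setting` are hypothetical names.

```python
class _Missing:
    """Private class whose single instance means 'no value was supplied'."""

    def __repr__(self) -> str:
        return "<MISSING>"

    def __bool__(self) -> bool:
        return False


MISSING = _Missing()


def get_setting(settings: dict, key: str, default=MISSING):
    """Look up key; None is a legitimate stored value and a legitimate default."""
    value = settings.get(key, MISSING)
    if value is not MISSING:        # compare by identity, never by equality
        return value
    if default is MISSING:          # caller gave no default at all
        raise KeyError(key)
    return default
```

Because the check is `is MISSING`, a stored `None` and an explicit `default=None` both work as ordinary values.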

------
trollied
I adore SQL NULL.

The problem is that people don't use it correctly & the empty string can be a
nightmare.

Sorry, not adding much to this, but it's a significant problem in the real
world. People often don't understand NULL.

~~~
Koshkin
There are many forms of non-existence. Maybe there should be more than one
NULL.

~~~
mr_toad
Since null doesn’t equal null and is also not not equal to null in SQL, there
are effectively unlimited nulls, all possibly different.
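
This is easy to see with Python's built-in `sqlite3` module. The table name `t` and column `x` are made up for the example, and SQLite is only one engine, though the comparison results shown follow standard SQL three-valued logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Comparing NULL with NULL yields NULL (unknown), which Python sees as None.
eq, neq, is_null = conn.execute(
    "SELECT NULL = NULL, NULL <> NULL, NULL IS NULL"
).fetchone()

conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,), (None,)])

# "= NULL" never matches a row; "IS NULL" is the test that works.
matches_eq = conn.execute("SELECT COUNT(*) FROM t WHERE x = NULL").fetchone()[0]
matches_is = conn.execute("SELECT COUNT(*) FROM t WHERE x IS NULL").fetchone()[0]
```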

------
jmchuster
Great talk by Sandi Metz on the topic - Nothing is Something
[https://www.youtube.com/watch?v=OMPfEXIlTVE](https://www.youtube.com/watch?v=OMPfEXIlTVE)

------
riazrizvi
I consider null to properly mean 'a value outside some value domain', and nil
to mean 'zero value in some value domain', and both these values are useful,
and can be assigned to variables of that value domain. So C/C++'s definition
of NULL pointers is a historical transgression that has now stuck, as far as I
am concerned.

Examples: p=null-pointer should mean invalid address, p=nil-pointer should mean
zero offset/address. l=nil-list should be empty list, l=null-list is invalid
list. So null-list could also be called undefined-list. c=null-character is
correct as a character, and works well as a terminator. s=null-string is an
undefined string, s=nil-string is the empty string "".

------
chewxy
> Nothing: In HaskellLanguage the other possible value of the 'Maybe'
> datatype.

Wouldn't null be bottom?

~~~
Ao7bei3s
No.

First, null is generally a value, not a type.

Second, think of bottom types as "does not return".

Taking C as an example: C doesn't have a bottom type, only void which is a
unit type (it has a single value, which is anonymous in C). However you can
tell the compiler that you _actually_ meant the bottom type:

    
    
        /* In a definition, GCC wants the attribute before the declarator. */
        __attribute__((noreturn)) void panic(void) {
            while(1) {
            }
            __builtin_unreachable();
        }
    

Scala is also interesting, in that it supports all three (Null, Unit,
Nothing).
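
Python's type hints make the same distinction, which may help as a comparison: `None` is an ordinary value, while `typing.NoReturn` marks a function that never returns normally. The helper names `fail` and `first_char` below are hypothetical.

```python
from typing import NoReturn, Optional


def fail(message: str) -> NoReturn:
    """Never returns normally: it always raises."""
    raise RuntimeError(message)


def first_char(s: Optional[str]) -> str:
    """None is a value we must handle; fail() is a call that never comes back."""
    if s is None:
        fail("no string given")
    return s[0]
```

A type checker can use the `NoReturn` annotation to conclude that `s` is a `str` after the `if`, since the `fail` branch cannot fall through.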

~~~
afiori
Bottom should rather be the empty type, and one consequence of that is that if
you happen to build an x of type bottom, then x sort of belongs to every type.

~~~
Ao7bei3s
That's the formal definition, and an interesting corollary with interesting
consequences (from a typing perspective, a function of any return type can
always opt to halt instead of returning. Note that in a lazily evaluated
language like Haskell the caller can still continue, as long as it does not
evaluate the result).

But I didn't go with that because it doesn't answer the question (is null the
bottom type? no.), and in the context of return types (the main use case,
though not the only) it's equivalent to what I said (no possible return value
-> cannot return).

------
Grustaf
Baby don’t hurt me

------
alkonaut
The two biggest PL design flaws are implicit conversions, and nulls.

They are mostly the same flaw.

------
enriquto
What ever happened to c2.com ?

It used to be one of the best sites on the internet. Now I cannot even read it
with an ad blocker.

~~~
wffurr
Works fine for me. You have to enable JS now tho.

~~~
enriquto
Not really. I am using umatrix, and after "enabling everything", all that it
shows is the spiral loading thingy.

~~~
jraph
Alternative way to access the content:

1) curl -s
[http://c2.com/wiki/remodel/pages/WhatIsNull](http://c2.com/wiki/remodel/pages/WhatIsNull)
| jq -r .text

or:

2) Go to
[http://c2.com/wiki/remodel/pages/WhatIsNull](http://c2.com/wiki/remodel/pages/WhatIsNull)

In the web console :

    
    
        document.body.textContent = JSON.parse(document.body.textContent).text
        document.body.style.whiteSpace = "pre-wrap"
    

\---

I hope one day we will have some kind of standardized markup language that
allows presenting text in a browser in a plug and play way… text/plain for the
most basic stuff would do, and we could add some kind of tags to have stuff
like italic / bold parts and titles, and links, too.

~~~
dreamcompiler
This "markup language" idea sounds intriguing. Perhaps we should form a
committee to design it, and then a consortium to standardize it. It could even
be promulgated World Wide!

~~~
wruza
Please add simple scripting for form validation.

~~~
jraph
You mean you want to run arbitrary code on a user's computer behind their
back? Sounds terrible.

Forming the committee does not need script validation.

~~~
dreamcompiler
We could run the code in a chroot environment that simulates the whole
machine. It would be a safe place to play. We could call it something cute
like "sandbox." Naturally it would be resource limited so code couldn't
consume all your memory or CPU cycles. That would be completely under user
control. People might want code to show ads, but the user could limit them to
100 ms of load time, 5 seconds of wall clock time, and no more than 10% of the
screen.

~~~
jraph
What browser would go to great lengths to implement such a sandbox to run
code which the user does not care about and for which they have no use anyway?

People put "no ads" signs on their mail boxes. Do you really think they'd see
such a thing as a _feature_? Crazy stuff, no browser in its right mind would
shoot itself in the foot like that. People would immediately turn to the
competition, should such a thing happen.

What next? Document viewers weighing several megabytes? Containing a
full-fledged _garbage collector_?

I'd say let's keep things simple. People like simplicity, and stuff that works
and that does not freeze or crash every other hour.

------
paulddraper
Null is the absence of something.

Typically given as a sentinel value; i.e. a union with the set of possible
non-null values.

Null references were called by their creator, Tony Hoare, his "billion-dollar
mistake".

Those that share this belief prefer the more composable Option/Maybe pattern
rather than a sentinel value.

[https://www.lucidchart.com/techblog/2015/08/31/the-worst-mis...](https://www.lucidchart.com/techblog/2015/08/31/the-worst-mistake-of-computer-science/)

