
Null References: The Billion Dollar Mistake - wheresvic3
https://www.infoq.com/presentations/Null-References-The-Billion-Dollar-Mistake-Tony-Hoare/
======
Matthias247
Out of all possible gotchas in programming languages, I still find null
pointers the easiest to discover and fix. You directly see when and where
it happens, and the fix is usually straightforward.

Compared to that, invalid pointers (stale references) are a lot more painful,
since programs might continue to work for a while. Managed languages do at
least prevent those.

Multithreading issues are imho the biggest pain points, since they are
introduced so easily and often go unnoticed for a long time. The number of
languages that prevent those is unfortunately not that big (Rust, plus purely
single-threaded languages like JS, plus pure functional languages).

~~~
int_19h
You directly see where the _null dereference_ happens. But that's not
necessarily where the problem actually is, because a null pointer can flow
through a lot of code before it actually gets dereferenced. So "program
continues to work for a while" is also a thing with them.

In a language like C, a null pointer can also become an invalid non-null
pointer pretty easily with pointer arithmetic.

~~~
cylon13
Easy, just add "if (thing == null) return;" to the top of the function that
crapped out on a null reference and close the ticket! /s

~~~
int_19h
Sarcasm aside, this is often how it gets fixed in poor-quality codebases! And
if it returns a reference itself, it ends up being:

    if (thing == null) return null;

more often than not. Which leads to an even more entertaining debugging trip
next time...

~~~
cylon13
Oh for sure. I wish I could say I came up with that sarcastic quip through
sheer creativity, but in reality it's because I've seen it so many times.

------
LorenPechtel
Count me amongst those who do not think they're a mistake. You need to
indicate no-data-here in some fashion. If you try to use that no-data, then
having your program blow up from a null reference is a feature to me: in the
vast majority of cases it's better to go boom than to silently continue doing
something wrong. In the few cases where that's not true, you can trap the
exception and go on.

The real solution is what has been done with C# in recent years--have the
compiler track whether a field can contain a null or not and squawk if you try
to dereference something you haven't checked. That causes it to blow up in the
best place--compile time rather than runtime.

~~~
philwelch
I think it's perfectly fine to have option types, and that's exactly what
languages without null references end up doing. What null references end up
doing is accidentally turning _all_ types into option types and making it
impossible to have non-option types.
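
For illustration, a minimal Rust sketch of that distinction (the function names here are hypothetical):

```rust
// A &str parameter can never be "missing": the type guarantees a valid
// string, so no null check is needed or even expressible.
fn greet(name: &str) -> String {
    format!("Hello, {}!", name)
}

// When absence really is possible, the type says so, and the caller is
// forced to handle both cases.
fn greet_opt(name: Option<&str>) -> String {
    match name {
        Some(n) => format!("Hello, {}!", n),
        None => String::from("Hello, stranger!"),
    }
}

fn main() {
    println!("{}", greet("Ada"));       // prints "Hello, Ada!"
    println!("{}", greet_opt(None));    // prints "Hello, stranger!"
}
```

With implicit nulls, every reference parameter behaves like `greet_opt` whether you want it to or not.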

------
x3ro
This comes up again and again in one form or the other, yet new languages
still seem to be making the same mistake. Of all languages I've touched, Rust
seems to be the only one that mostly circumvents this problem. Are there other
good examples?

~~~
jrockway
I assume two reasons: efficiency, and because an efficient implementation of
mutable state would have the same problem.

Right now, a single sentinel value makes a pointer null or not null (0x0 is
null, everything else is not null). This is exactly how you'd implement a
stricter type, like "Maybe". Encoded as a 64-bit integer, "Nothing" would be
represented as 0x00000000 and "Just foo" would be represented as 0xfoo. No
object may be stored at the sentinel value, 0x00000000. Exactly the same as
what we have now, and provides no assurances that 0xfoo is actually a valid
object.

Meanwhile, Haskell, which "doesn't have null", crashes for exactly the same
reason your non-Haskell program crashes with a null pointer exception:

        f :: Num a => Maybe a -> Maybe a
        f (Just x) = Just (x + 41)

This blows up at runtime when you call f Nothing, because f Nothing is defined
as "bottom", which crashes the program when evaluated.

It's exactly the same as languages with null pointers:

        func f(x *int) *int {
            result := *x + 41
            return &result
        }

And the solution is the same, your linter or whatever has to tell you "hey
maybe you should implement the Nothing case" or "hey maybe you should check
the null pointer".

Where I'm going with this is that you need to develop entirely new datatypes
and have an even stricter type system than Haskell. Maybe Rust is doing this,
but it's hard. We all know null is a problem, but calling null something else
doesn't make the problems go away.

~~~
anderskaseorg
> It's exactly the same as languages with null pointers:

Four huge differences:

1\. You don’t need to pass around ‘Maybe a’ everywhere. If null isn’t expected
as a possible value (which usually it isn’t), you just pass around ‘a’, and
when you do use ‘Maybe’ it actually means something.

2\. The Haskell compiler can, and does (with -Wall), tell you that your
pattern match is non-exhaustive. You don’t need a separate “linter or
whatever”. This is possible because the needed information is present in the
type system, and doesn’t need to be recovered with a complicated and
incomplete static analysis pass.

3\. If you do this anyway, the error is thrown at exactly the point where
‘Maybe a’ is pattern-matched, not at some random point several function calls
later where your null has already been coerced into an ‘a’.

4\. This program is defined to throw an error; it’s not undefined behavior
like in C that could result in something weird and unpredictable happening
later (or earlier!).

Also, Rust optimizes away the tag bit of ‘Option’ under common circumstances;
for example, ‘None: Option<&T>’ (an optional reference to ‘T’) is represented
internally as just a null pointer, which is safe because ‘&T’ cannot be null.
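
That layout claim can be checked directly; a small Rust sketch using `std::mem::size_of`:

```rust
use std::mem::size_of;

fn main() {
    // Option<&T> costs no extra space: the compiler reuses the impossible
    // null bit pattern of &T to encode None (the "niche" optimization).
    assert_eq!(size_of::<Option<&u64>>(), size_of::<&u64>());

    // A type with no spare bit patterns does pay for the tag.
    assert!(size_of::<Option<u64>>() > size_of::<u64>());

    println!("Option<&T> is pointer-sized");
}
```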

~~~
jrockway
> You don’t need to pass around ‘Maybe a’ everywhere.

You don't need to pass pointers around everywhere. Languages with null still
have value types that cannot be null.

> You don’t need a separate “linter or whatever”.

Optional compiler flags count as "whatever" to me.

> it’s not undefined behavior like in C that could result in something weird
> and unpredictable happening later (or earlier!)

C++ doesn't define this, but the OS does (and even has help from the CPU).

Anyway, my TL;DR is that it's easy to have a slow program that passes
everything by value, or easy to have a fast program that uses pointers or
references. Removing the special case of null is meaningless, because you can
still have a pointer to 0x1, which is probably just as bad as 0x0. This goes
back to my original answer to the question "why don't more languages get rid
of null", which was "it's harder than it looks." I think I'm right about that.
If it were easy, everyone would be doing it.

~~~
temac
> Languages with null still have value types that cannot be null.

Not all languages.

> C++ doesn't define this, but the OS does (and even has help from the CPU).

That's not how it works anymore: the way C / C++ front-ends interact with
the optimizers yields overly "optimized" results. See the classic
[https://t.co/mGmNEQidBT](https://t.co/mGmNEQidBT)

------
seventh-chord
"Making everything a reference: The Billion Dollar Mistake" is the talk I want
to see

~~~
dorfsmay
Everything in Python is a reference, and there's no null pointer issues.

~~~
auxym
I've certainly had some "None" errors in Python.

I think the difference comes from dynamic vs static typing. In Python, you
sort of get into the habit of "defensive" programming: checking inputs to your
function, catching Nones, etc.

In java, you tend to rely more on the type system. If it typechecks/compiles,
there's a good chance it's OK. That is, until you get a null value that's not
handled.

That's the root issue I think: If null is an acceptable value per the type,
then the same type system should force you to handle it. As do the type
systems in ML languages for option types, for example.

~~~
JakobProgsch
The first line is why I'm not 100% convinced of the severity of this mistake
compared to the alternatives. The problem fundamentally is the use of magic
values/numbers to represent the concept of "no value". You don't need explicit
language support to have that concept and the bugs it causes. I guess having
that as an intrinsic concept in the language makes it more likely that people
use it badly. On the other hand debuggers etc. also intrinsically understand
this and segfaults due to null pointers are usually very easy to localize once
you see them. On the other hand if a "bad programmer" introduced their own
magic non-value in a supposedly safe language, debugging that becomes way more
confusing.

~~~
andrepd
No, that's not the "fundamental problem". The fundamental problem is a type
system that lies. A "pointer to string" is not actually a pointer to a string,
it's a pointer to a string _or to nothing_. If your api returns a pointer of
the latter type, it should signal this by making the return type
"maybe-pointer to string" (although it has the same memory representation as
"pointer to string"). Then, if the user tries to dereference a maybe-pointer
(that is, to use a maybe-pointer as a pointer), the type system can statically
catch this and flag it as a simple type-check compilation error. The user must
first check if it's null through a function that casts a maybe-pointer to a
pointer.

Nothing about this precludes the usage of sentinel values.
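
A rough Rust sketch of this scheme, where `Option<&str>` plays the maybe-pointer role (`first_line` is a made-up helper):

```rust
// The return type honestly says "a reference to a string, or nothing",
// even though it is represented in memory like a plain reference.
fn first_line(text: &str) -> Option<&str> {
    text.lines().next()
}

fn main() {
    let doc = "hello\nworld";

    // first_line(doc).len() would be a compile-time type error: an
    // Option<&str> is not a &str. The explicit check below is the
    // "cast" from maybe-pointer to pointer that the type checker demands.
    if let Some(line) = first_line(doc) {
        println!("{}", line); // prints "hello"
    }
}
```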

------
decafbad
C.A.R. Hoare couldn't foresee the consequences 55 years ago. That's a small
mistake. We should blame the language designers who didn't bother to handle
the problem after it had become obvious.

~~~
littlecranky67
Lots of mainstream languages nowadays support non-nullable types, e.g.
TypeScript and C# (taken from F#).

------
rzwitserloot
This old chestnut again.

There is an inherent problem in designing processes and writing code to
capture them: The notion of not-a-value.

There are a great many ways to solve it. The most common ones are 'null' and
'Optional[T]'. Neither makes the problem magically go away. If a process is
designed (or a programmer writes it) thinking 'ah, well, here, not-a-value
cannot happen', but it can, then... you have a bug.

Some language features might make it possible to help reduce how often it
occurs, but eliminate it? I don't think so.

Imagine, for example, in an Optional-based language, that you just map a
lambda over the optional, and the behaviour of the optional is to simply,
silently do nothing if it's optional.none. That'd be a much harder bug to
find than a null pointer error. (Errors with stack traces pointing at the
problem are obviously vastly superior to mysterious do-nothing behaviour with
no logs or traces of any sort!)
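
For what it's worth, Rust's `Option::map` behaves exactly like this hypothetical; a small sketch:

```rust
fn main() {
    let x: Option<i32> = None;

    // map silently does nothing on None: no error, no stack trace,
    // the closure is simply never called.
    let y = x.map(|n| n + 1);
    assert_eq!(y, None);

    // By contrast, expect fails loudly at the point of use:
    // x.expect("x was None");  // would panic with that message here

    println!("silent no-op result: {:?}", y);
}
```

Whether the silent short-circuit or the loud failure is correct depends on the situation, which is exactly the point being argued.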

Some other creative solutions:

* [Pony](https://www.ponylang.io/) tries to be very careful about registering when an object is 'valid' and when it isn't, and when you write code, you have to say which state the objects you interact with can be in. This lets you avoid a lot of the issues... but pony is quite experimental.

* In java you can annotate any usage of a type with nullity info, and then compiler linter tools will simply tell you that you have failed to take into account a potential null value. You are then free to ignore these warnings if you're just writing test code, or know better. This avoids clogging up the works with optional, but as the java ecosystem shows, you can't just snap your fingers and have 30 years of massive community effort instantly be festooned with 'might-not-hold-a-value' style information. At least the annotation style gives the hope of being backwards compatible (to be clear, optional, for java? Really bad idea).

* in ObjC, if you send a message to a null pointer, it silently does nothing, in contrast to virtually all other languages with null types where attempting to message a null ref causes an error or even a core dump.

* Just write better APIs. Have objects that represent blank state (empty strings, empty collections, perhaps dummy streams which provide no bytes / elements, etc). For example, in java: Java's map (a dictionary implementation) has the `.get(key)` method which returns the value associated with that key, and returns `null` if there is no such value. About 6 years ago another method was added in a backwards compatible fashion (so, all java map implementations got this automatically): `getOrDefault(key, defaultValue)`. This one returns the provided default value if key isn't in the map. You'd think optionals provide a general mechanism for this, but, in scala, you have both: There's `someMap get(key)` which returns an optional, so to get the 'give me a default value' behaviour, that'd be `someMap.get(key).getOrElse(defaultValue)`, but maps in scala also have the java shortcut: `someMap.getOrElse(key, defaultValue)`. Sufficient thought in your APIs mostly obviates the issues.
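
As a rough sketch, the same pattern in Rust, where `HashMap::get` returns an `Option` and `unwrap_or` plays the role of `getOrDefault` / `getOrElse`:

```rust
use std::collections::HashMap;

fn main() {
    let mut scores: HashMap<&str, i32> = HashMap::new();
    scores.insert("alice", 10);

    // get returns Option<&i32>, making the missing-key case explicit;
    // unwrap_or supplies the caller's chosen default.
    let a = scores.get("alice").copied().unwrap_or(0);
    let b = scores.get("bob").copied().unwrap_or(0);

    println!("alice={} bob={}", a, b); // prints "alice=10 bob=0"
}
```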

null is not a billion dollar mistake. It is a solution to an intrinsic
problem, with advantages and disadvantages relative to other solutions.

~~~
JoshMcguigan
The goal of `Optional[T]` is not to "make the problem magically go away", in
fact that is almost the opposite of the goal.

Optional[T] exists to make it very obvious when a value is nullable. Having
non-nullable types as default, with Optional[T], allows a developer to model a
system more accurately. This is helpful both to the compiler as well as anyone
else who reads/maintains that code.

> Imagine, for example, in an Optional based language, that you just map the
> optional to a lambda to execute on the optional, and the behaviour of the
> optional is to then simply silently do nothing if it's optional.none. That'd
> be a much harder to find bug than a nullpointer error. (errors with stack
> traces pointing at the problem are obviously vastly superior to mysterious
> do-nothing behaviour with no logs or traces of any sort!).

This is just one of the things a developer could decide to do when faced with
an optional which is none. It is up to the language design to make it easy to
express this behavior (or any other behavior they might choose) without
hiding it.

~~~
strictfp
Sure, it's up to the language design, but in practice a `None` gets similar
treatment to an empty collection, usually effectively short-circuiting the
remaining calculations. As the parent poster pointed out, this might either
be the behavior you want, or it might actually mask the error, depending on
the situation. By this logic, optionals aren't better than null refs, just
different. The same argument holds for exceptions vs optionals.

~~~
JoshMcguigan
In my experience, languages with strict non-null guarantees (and optional
types), do the exact opposite of "mask the error". If anything, they are
sometimes faulted for being too verbose.

The idea is, by explicitly marking things which can be null (wrapping them in
an Option[T], for example), you can be sure that everything else is not null.
This alone relieves the developer of a large cognitive load.

Further, the language can provide syntax to make handling optional types
obvious without being painful. Rust match statements are one example of this.
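
A minimal sketch of that, assuming a toy `describe` function:

```rust
fn describe(x: Option<i32>) -> &'static str {
    // match is checked for exhaustiveness: deleting either arm below
    // is a compile error, so the None case cannot be silently forgotten.
    match x {
        Some(_) => "present",
        None => "missing",
    }
}

fn main() {
    println!("{}", describe(Some(3))); // prints "present"
    println!("{}", describe(None));    // prints "missing"
}
```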

Can you provide a specific example of how using an optional type makes a
potential "missing-thing" type of bug harder to see?

------
mlangenberg
I was expecting someone to mention the Crystal programming language.

In Crystal, types are non-nilable and null references will be caught at
compile time.

[https://crystal-lang.org/2013/07/13/null-pointer-exception.html](https://crystal-lang.org/2013/07/13/null-pointer-exception.html)

I certainly recognize that many bugs in Ruby programs announce themselves as
`NoMethodError: undefined method '' for nil:NilClass`. So being able to catch
that before releasing code is a very welcome addition, in my opinion.

------
teh_klev
InfoQ has some gems, but their video content presentation is terrible (tiny
box, or full screen):

[https://www.youtube.com/watch?v=YYkOWzrO3xg](https://www.youtube.com/watch?v=YYkOWzrO3xg)

------
RickJWagner
No comment on the Null References, but I will say I _love_ the time-index
provided for the video. I wish every video had these!

------
olliej
Null termination is still easily much worse. At least the general case of null
dereferences today (less so earlier) is a page fault.

------
agumonkey
Should every domain have a Nil element instead?

~~~
fhars
No, obviously not. Every domain having a Nil element is exactly the problem
null references have introduced (at least for the call by reference parts of
the affected languages).

~~~
agumonkey
Null is a single nil for all; I meant that having a null per domain would
force people to think about what it means to have nothing in that field and
to handle it. Maybe I'm too naive.

~~~
augusto2112
Sometimes you don't want to allow a value to be null at all, but with null
references you can't represent that at the language level.

~~~
agumonkey
But for numbers, a zero is not considered null, because it is handled by the
operators' rules.

~~~
fhars
Numerical zero has nothing to do with the issue discussed here. What you are
proposing is to add another “number” to types like int and float that results
in the program crashing whenever you try to add it to another number.

~~~
strictfp
It does! It's the numerical "null object", just like the empty string and
empty collection.

------
microcolonel
Null references are not a mistake, they make perfect sense. Letting nullable
types be dereferenced directly is the mistake.

Null references are at the core of a great number of sensible datastructures,
and they're a natural fit for conventional computers.
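
As a sketch of both points, a toy Rust linked list whose "null next pointer" is an `Option<Box<Node>>` (which the compiler lays out as a plain nullable pointer internally), without ever exposing a raw null to the programmer:

```rust
// A singly linked list: the classic null-terminated structure, with the
// terminator expressed in the type rather than as a magic pointer value.
struct Node {
    value: i32,
    next: Option<Box<Node>>,
}

fn sum(list: &Option<Box<Node>>) -> i32 {
    match list {
        Some(node) => node.value + sum(&node.next),
        None => 0, // the "null" end of the list, handled explicitly
    }
}

fn main() {
    let list = Some(Box::new(Node {
        value: 1,
        next: Some(Box::new(Node { value: 2, next: None })),
    }));
    println!("{}", sum(&list)); // prints "3"
}
```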

~~~
int_19h
There are two separate concepts here that often get conflated.

There's the null reference in the sense of a special pointer value (usually
all bits set to 0) that means "this doesn't point to anything". That's a
useful low-level tool that allows for compact representation of many
important data structures.

And then there's the null reference in the sense of type systems. To be more
specific, "null reference" here is really short for "every reference in the
type system is implicitly nullable". And that is the billion dollar mistake.

An explicitly nullable reference type that requires an explicit check on
dereference, or option types that use null pointers under the hood, are
obviously not the problem.

~~~
throwaway2048
To put it a bit more compactly: why is Boolean logic "True, False, Null"?
