Null References: The Billion Dollar Mistake – Tony Hoare (2009) [video] (infoq.com)
48 points by quickfox on May 30, 2016 | hide | past | favorite | 79 comments


One of my absolute favorite things about Rust is that nulls can't be used to circumvent the type system. They must be explicitly accounted for via an option type in all cases. It's very hard to go back to a language that doesn't do this after getting used to it. I get so frustrated by constant null dereference errors when that entire category of error can be avoided.


I'd rather have explicit separation of concerns for UB-causing behavior than a generic monad approach.

Perhaps the examples I've seen in Rust just don't do that, because part of the appeal of Rust is that you sort of don't have to.

That makes the cognitive load acceptable and documents all the UB prevention in source code. It's a habit arrived at from using formal instantiation protocols that were driven from outside the device being made, starting with RFC 1695.


The problem isn't null itself, but that many languages use nullable types by default.

Null is a damn useful construct, and while there may be better solutions than using a null reference, they're often not as clear as using null, or not worth the time investment to develop properly.

So yea, I generally try to avoid nulls and completely understand how they've caused a lot of problems, but I'm not willing to say we shouldn't have them or use them when appropriate.


You don't just not use nulls, you replace them with something like Optionals. With a bit of language support, they can compile down to nulls but provide a layer of type safety and convenience functions.


It's perfectly fine to use nulls in languages that encode them in their type system, e.g. Kotlin.


Exactly. The problem is not that there is a null, it is that it is a valid value for all types (e.g. you can pass null to a function that expects a String).

Allowing nulls explicitly, like with a "String?" type, is the way to go in languages where Option is not an option.


Isn't it, in practice, basically the same thing though? The theoretical difference is this:

1. Option is a regular data type/object that represents "a collection of at most one value". It's similar to a list, set, heap, tree or whatever other collection types you have, except it cannot contain more than one value.

2. With explicit nulls I guess any data type (e.g. Integer) will automatically get a clone of itself (a different type named Integer?) where null is also a valid member.

From a purely theoretical standpoint, I like #1 better.


In practice, it's not the same thing.

Here's a Java example where I want the type system to enforce that a method in `ClassB` can only be called from `ClassA`. However, the fact that `null` circumvents the type system makes this pattern just wishful thinking.

    class ClassA {
        private static final Witness witness = new Witness();

        final static class Witness {
            private Witness() {}
        }

        void callClassBMethod(ClassB classB) {
            classB.onlyClassACanCallThisMethod(witness);
        }
    }

    class ClassB {
        void onlyClassACanCallThisMethod(ClassA.Witness witness) {
            // ...
        }
    }
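To make the failure concrete, here is a minimal, self-contained sketch (with hypothetical names, not the classes above) showing that any caller can invoke the "protected" method by passing null:

```java
// Minimal repro: the Witness constructor is private, yet null
// still inhabits the Witness type.
class Guarded {
    static final class Witness {
        private Witness() {}
    }

    static String restricted(Witness w) {
        return "called with " + w;  // never dereferences w, so no NPE
    }
}

public class Intruder {
    public static void main(String[] args) {
        // Compiles and runs fine: null circumvents the witness pattern.
        System.out.println(Guarded.restricted(null));  // prints "called with null"
    }
}
```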


I think Option[String] and "String?" are semantically both a union type: they both effectively add one possible value to the underlying type, along with a constructor for naturally extending functions on that type. Kotlin's "?" operator is equivalent to Scala's map or Haskell's fmap, etc.


  Option[Option[String]] != String??


My impression, though I'm not sure, is that Kotlin simply won't allow you to do a double "??". Scala etc. will technically allow Option[Option[Something]], but in practice you would almost never want to use it, and can easily avoid it with flatMap.


The whole point of this was to show that Scala's types preserve the structure of the computation.

It might not be very interesting in the Option[Option[String]] case but imagine Try[Either[String, Int]] or List[Future[Double]].

It's a very important distinction.

Collapsing cases is one of the primary reasons exceptions sometimes get a bad rap, and Kotlin (and Ceylon) do the same with ? (and |, &) at the value level.


Option[Option[String]]? Is this a Church Numeral?


The downside of `Option` is that it's a wrapper, and as such, you need to `flatMap` (or similar) whenever you want to access the wrapped value.

By encoding `null` in its type system, Kotlin lets you manipulate these values directly which leads to code that is much less noisy and just as safe.
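To illustrate the wrapper overhead being described, here is a sketch in Java with hypothetical domain types; each step of the traversal has to go through map/flatMap, where Kotlin's nullable types would allow a plain `?.` chain:

```java
import java.util.Optional;

// Hypothetical domain types, just to show the wrapping overhead.
record City(String name) {}
record Address(Optional<City> city) {}
record User(Optional<Address> address) {}

public class WrapperNoise {
    public static void main(String[] args) {
        User user = new User(Optional.of(new Address(Optional.of(new City("Oslo")))));

        // With a wrapper type, every step goes through map/flatMap...
        Optional<String> name = user.address()
                                    .flatMap(Address::city)
                                    .map(City::name);

        // ...where Kotlin's nullable types would allow roughly
        //   user.address?.city?.name
        System.out.println(name.orElse("unknown"));  // prints "Oslo"
    }
}
```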


Not really.

The main strength of the first approach is that Option is only one type out of many error-handling structures.

Not every error is handled appropriately by Option/?.

If you have a language like Kotlin where they hard-coded one way of handling errors, it feels very unidiomatic to pick a better fitting error handling type, while in languages where errors are handled by library code, it's a very natural approach.


> Not every error is handled appropriately by Option/?.

Which is expected since these two constructs are not aimed at handling errors: they manage missing values.

> If you have a language like Kotlin where they hard-coded one way of handling errors

No, no. `?` is not for handling errors.

Kotlin is as agnostic as Scala for managing errors: you are free to use exceptions, dumb return values or smarter ones (`Either`, `\/`, `Try`, ...).


Yeah, it's just that, if you look at every language ever designed, when a language ships with a built-in construct, developers will use and abuse it on every occasion, and every other approach lingers in obscurity.

> Which is expected since these two constructs are not aimed at handling errors: they manage missing values.

Which is a very small part of handling errors in general. As Kotlin offers special syntax for only this case, developers tend to shoehorn many errors into the "missing-value" design to get the "nice" syntax even if a different approach would have been more appropriate.

> Kotlin is as agnostic as Scala for managing errors: you are free to use exceptions, dumb return values or smarter ones (`Either`, `\/`, `Try`, ...).

That's not true in practice:

Just have a look at funktionale: despite providing almost the same error-handling types as Scala (partially due to the blatant copyright violations), almost nobody uses it. This is a direct result of having a "first-class" construct in the language: it turns library-based designs into second-class citizens.

That's the thing Scala got right, and many of the copy-cat languages got wrong.


> Which is a very small part of handling errors in general

Missing values are not errors.

If you look up a key on a map and that key is not present, it's not an error.

> partially due to the blatant copyright violations

Uh copyright what? On an API?!?


> Missing values are not errors.

Call it whatever you want. ? only covers a small subset of interesting "conditions" while tremendously hurting "conditions" which could be handled in a better way.

> Uh copyright what? On an API?!?

Implementation. The copying of slightly buggy exception strings makes it even more obvious that files were copied verbatim, with just enough syntax changes to turn Scala code into Kotlin code, while replacing the original license and authors with different ones.

PS: Feel free to comment on the actual points I made.


> PS: Feel free to comment on the actual points I made.

Sure.

I think the idea that APIs (or implementations, as you said) can be copyrighted is completely insane, and I can't believe any software engineer would be okay with it. Which makes me think you're not a software engineer, and that's okay, but please read up on the issues; this is super important for our profession.

I can't believe the US made that a law, and it makes me sure that I will never want to move there.


> I think the idea that APIs (or implementations, as you said) can be copyrighted is completely insane, and I can't believe any software engineer would be okay with it. Which makes me think you're not a software engineer, and that's okay, but please read up on the issues; this is super important for our profession.

I think you are super confused here. This is not about APIs. Copyright is what allows software developers to enforce a license of their choice. Without copyright, the license is just a text file without meaning. I suggest you read up on the FSF's position on this if you want to have an example.

> Sure.

(Still waiting for you to comment on the points I have made.)


Eiffel was one of the very first ones to have that feature actually, around 2006 when the language was revised for the ECMA standard.

https://www.eiffel.org/doc/eiffelstudio/Differences%20betwee...


This is, e.g., what Rust does. Another advantage of this is that you can also use them with types that aren't pointers (e.g. chars, integers, enums).


I would like to read about this. Got any links?


Option<T> is just a normal enum in Rust. Enums in Rust can also have parameters; those are usually called tagged unions in non-Rust-speak. It's either Some(x) or None. It has a pretty easy definition: https://doc.rust-lang.org/std/option/enum.Option.html

Storing None for an Option<&T> as a null, and Some(ref) just as the reference, is a special optimisation that the Rust compiler does. Usually you have a selector which tells you which variant of the enum it is.


Null has other uses, e.g. as below for cyclically dependent initialisation:

  Tree *a = new Leaf();
  Tree *b = new Leaf();
  Tree *c = new Node(a, b);
  a.parent = b.parent = c;


First, to some extent, your code is more a demonstration of the weakness of imperative vs declarative code: Haskell, for example, lets you declare recursive structures directly, since it doesn't enforce an order of execution where none is needed.

    data Tree = Leaf Tree | Node Tree Tree
    a = Leaf c
    b = Leaf c
    c = Node a b
But more importantly, there is no reason you can't do this with Optionals.

    Tree *a = new Leaf();
    Tree *b = new Leaf();
    Tree *c = new Node(a, b);
    a.parent = b.parent = new Some(c);


On a related note and at the risk of sounding obtuse here - if you didn't have "null" wouldn't you just have to replace it with something else?


Just to add my two cents to the discussion, TypeScript is implementing non-nullable types [1]. I think it'd be convenient for any language being used for large projects to support this feature.

[1] https://github.com/Microsoft/TypeScript/pull/7140


Some functional languages use non-nullable types by default: Haskell, ML, OCaml, for example.

Java has the Checker Framework, which supports `@Nullable` annotations and will allow only those vars/fields to contain `null`.


Why are option types that force you to check the value for null good, but checked exceptions are frowned upon?


The arguments I've heard against checked exceptions are that they're hard to compose.

If you write the `send_mail` procedure you might only want it to throw `MailException`s of various kinds. But if you use a TCP procedure inside, you'll have to declare `send_mail` to also throw `ConnectionError`s, thus revealing its implementation. To correctly hide/abstract over the implementation, you have to internally catch any `ConnectionError`s thrown by the TCP procedure, and re-throw them as `MailException`s. That's a lot of manual work that shouldn't be needed.

Another similar problem is what someone else mentioned: if you don't actually know which specific exceptions your method can throw until runtime, what do you declare?

----

Note that this is also the case for some "null alternatives". Sure, the `Option`/`Maybe` type is easy to compose, but as soon as you start inserting error information in there (with an `Either` type) you run into some of the same problems. This is acknowledged by language communities where that is practised, and some of them prefer unchecked exceptions to `Either`-style types for that reason.


Unless there's a full discipline built into the system for addressing TCP errors that your "mail" thingy can rely on, you'll need to address it.

And how can you not know the specific exceptions you need to handle? Don't you need to test all of those?

I am sure you didn't mean it to, but this reminds me of meetings where people say "oh, we don't have to worry about that. TCP is reliable."

Twitch :)


In your example, unless your send_mail procedure is generic over transport in some way, surely its use of TCP is an inherent part of its behaviour, not an implementation detail?

You're going to have to handle them as TCP errors anyway, even if they're lifted to a different type, so why not just throw them as they are?


A couple reasons off the top of my head: it's usually easier to read and reason about code that has fewer possible control flows. Exceptions are an alternative control flow outside of the normal one, and you have to deal with them at any level of a call stack that can trigger them somewhere further down, maybe wrap things in try/catch/finally to make sure that this alternative control flow does not break the normal cleanup (resource release, etc.) that your code does.

Or, in the case of checked exceptions, at the very least add 'throws' clauses, which unfortunately in some cases leaks implementation details. E.g. it's not uncommon to have low-level code that throws exceptions (e.g. IOException if the hard drive blows up) and high-level code that is where you actually deal with it (show the user a popup to tell them that their hard drive blew up). A checked exception forces any intermediate layer of code to also deal with the fact of this exception, even if all it does is add a throws clause. Refactoring the low level in a way that adds or removes a checked exception now involves trudging through every layer of the call chain to deal with this, even if there's only really one layer at the top that actually cares. Making the exception unchecked means this isn't forced, but the alternative control flow might still mess things up.

An algebraic data type like a Maybe or Option (or other 'nillable'), unlike an exception, follows the normal control flow and is explicit about the fact that the value might be missing (similar to when an exception is checked), but it's up to each layer of intermediate code to decide if that should be considered exceptional or not (like an unchecked exception). If all they do is pass the value on, they can be written exactly the same.

TL;DR: Unlike a checked exception, an Option type doesn't break the normal control flow (easier to reason about, not as prone to not cleaning things up) and allows each layer of code in a call chain to do as much or as little error-handling as it deems appropriate.
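A minimal sketch of that last point, with hypothetical method names: the intermediate layer needs neither a throws clause nor a try/catch, because absence travels through the normal control flow:

```java
import java.util.Optional;

public class Layers {
    // Low level: the value may simply not be there.
    static Optional<String> load(String key) {
        return "greeting".equals(key) ? Optional.of("hello") : Optional.empty();
    }

    // Intermediate layer: no throws clause, no try/catch; it neither
    // knows nor cares whether absence is "exceptional" here.
    static Optional<String> shout(String key) {
        return load(key).map(String::toUpperCase);
    }

    public static void main(String[] args) {
        // Only the top layer decides how to treat the missing case.
        System.out.println(shout("greeting").orElse("<missing>"));  // HELLO
        System.out.println(shout("nope").orElse("<missing>"));      // <missing>
    }
}
```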


Because the type systems of most languages that have checked exceptions cannot properly accommodate them when higher-order functions are utilized (i.e. "I throw everything that f throws except E, and also everything that g throws except T").


The difference is that currently, there's no way to express "this (reference type) variable really cannot be null", which makes it painful. It's the equivalent of having "throws Exception" on every method, declared for you implicitly.


> It's the equivalent of having "throws Exception" on every method, declared for you implicitly.

You imply that this behavior is unreasonable, but that's the approach Scala takes (all exceptions are unchecked), and a lot of users seem to like it. So I think the parent's question still stands.


That's an interesting point, and a fair question.

I didn't actually want to imply that lacking checked exceptions is unreasonable. I was merely trying to paraphrase the null problem in terms of checked exceptions, which the person I was replying to seems to favour. It was just an attempt to create an "aha" moment, showing what kind of pain is created by allowing null references.

A proper answer would probably have a lot more substance to it, but in short I get the feeling that null-free programming is a lot easier to achieve in general than exception-free programming. In fact, in most cases, not allowing nulls seems to be the implied default, yet it's hard (or impossible) to declare this explicitly in most languages, like Java. It just happens to be an extremely common problem that can be solved in a nice way, unlike exception checking in general.

TL;DR: The added safety that is given by allowing the type system to reason about nulls - when looked at in the light of the amount of boilerplate code that it creates - compares favourably to checked exceptions.


Currently in some (well, unfortunately many) languages. C++ references have always been non-nullable, and in Scala the only way you should get nulls is from Java libraries.


Parent means something closer to pointers (in C++) when they say "reference type". In other words, if a type contains null, then we must always check locally that values of that type are not null before it is safe to use them (or manually maintain contracts).

With option types, we check whether it is empty once, and if it is not, we unwrap it and use it as if it were never nullable. Any function that uses the unwrapped value downstream is oblivious to the fact.


C++ references can easily be null (T& foo = *ptr_i_thought_was_good;).

But they provide a useful social convention: if it is a pointer then it is your job to check for NULL. If it is a reference, then it is the job of the other guy to ensure it is not null. And the rules of the language mean that null references (as opposed to pointers) are rare.


Not without UB.


So in other words my comment should have said that: in C++ it is easy to create something that is either a null-reference or some other, much more unpredictable, undefined thing.

Well then! That just goes to show how much safer C++ references are than I had previously supposed.


If you never dereference a null pointer, then there will be no UB of that type.


I guess it counts as dereferencing. Here is what standard says

"A reference shall be initialized to refer to a valid object or function. Note: in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the “object” obtained by dereferencing a null pointer, which causes undefined behavior"


They preserve the structure of the computation.


I posted a very closely related article about this last year (http://mattdavey.logdown.com/posts/259807-avoiding-the-billi...) - it's C# focused but the idea should carry over to similar languages. It summarizes 5 strategies for dealing with nulls, rather than just jumping straight into monads.


I think the section dealing with the Optional monad sort of glosses over their composability, which is one of the primary reasons we use monads. Multiple functions that all return an optional can be composed; for instance, if we had

  Optional<WidgetFriend> GetWidgetFriend(Widget widget)
and WidgetFriend had an Optional<FriendName> property, we could simply write [1]

  from widget in _widgetRepository.FindById(widgetId)
  from friend in _widgetRepository.GetWidgetFriend(widget)
  from name in friend.Name
  select name;
and get an Optional<FriendName>, without having to repeat all the tedious pattern matching of Some or None for every call. If any of those calls returns a None, the entire evaluation short-circuits and we get a None result at the end.

Of course, we can still write

  Optional<Widget> widget = null;
which is the problem with not fixing this issue on a language level.

[1] You'd have to implement Select and SelectMany, instead of Map and Bind, to get query syntax like this.


I think you'd like C#'s LINQ syntax and null-coalescing operator. While obviously they don't get away from null, they go a long way toward making this type of syntax and null safety much easier to implement.


So that's the famous Tony Hoare. Nice trip down memory lane there.

Who was the guy at the end from the Erlang community? (The one who drew a chart with the axes useful/useless versus unsafe/safe that he claimed he got from Simon Peyton-Jones?)


I have a question about null being able to subvert types, using Java as an example: if I write foo.toUpperCase() and foo is a String, the type check succeeds but will produce an NPE at runtime. Why doesn't javac catch that foo is not set, i.e. is null? Why is null above type checks?


Because whether foo's value is null is in most cases only detectable at runtime, precisely because Java doesn't have the capability to express disjoint type sets in its type system the way that, say, Haskell or F# do. This inability means that the programmer can't tell the compiler which code paths produce null vs which code paths produce a String. As a result, the compiler has to assume that all code paths could produce both.

There are some trivial cases where the compiler could figure it out for you but most of those cases are not very useful in practice.


Interesting, is there a formal name for this kind of type system that has the ability to look at the code paths and express disjoint type sets?

Is this a trend in newer languages in general or an FP specific thing?


Typically it's referred to as an algebraic data type: https://en.wikipedia.org/wiki/Algebraic_data_type. Any type system that can express this will be able to do the above.

It's not necessarily new, since the ML family of languages has had it for a while now. It's also not limited to FP languages, but it's best known from the ML family. Rust has it, as does Swift, I think.
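For a Java-flavoured sketch of the idea: recent Java (21+, which postdates this thread, so this is purely an illustration) can express such a disjoint type set with sealed interfaces and records, and the compiler checks that a switch covers every variant:

```java
// A disjoint "string or nothing" type, with hypothetical names.
sealed interface MaybeString permits Present, Absent {}
record Present(String value) implements MaybeString {}
record Absent() implements MaybeString {}

public class AdtDemo {
    static String describe(MaybeString m) {
        // Exhaustiveness is checked: forgetting a case is a compile error.
        return switch (m) {
            case Present p -> "got " + p.value();
            case Absent a  -> "nothing";
        };
    }

    public static void main(String[] args) {
        System.out.println(describe(new Present("hi")));  // got hi
        System.out.println(describe(new Absent()));       // nothing
    }
}
```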


I see. Thank you for the follow up. This is really helpful. Haskell implements this data type which is why it has come up in this discussion. I need to spend some time with something from the ML family.


> formal name

non-shitty

> a trend in newer languages

yes, only discovered a few decades ago

:-D


I understand making a snarky comment in addition to some meaningful commentary but really two useless comments? Why even take the time to do so?


Sorry, was meant to be a light-hearted response.


Nulls are not a mistake. They are a trade-off as so many other things in language design.


> Nulls are not a mistake. They are a trade-off as so many other things in language design.

A bad trade-off – i.e., one where the downsides outweigh the upsides – is a mistake.

Nulls are not a necessity: Several languages demonstrate that there are better alternatives.


Benefits of nullable by default: -

Drawbacks: numerous

How is this a trade-off?


Null as a default is useful / a big simplification when it comes to reflection/meta-programming.

Consider this piece of pseudo-Java/Spring code and think how you would do it if the platform you were using forced you to either declare a and b as optional (which they are not) or assign them some placeholder value between the moment the IoC container instantiates the class X and the moment it wires in the values for a and b.

   public class X {
       @Autowired
       ServiceA a;

       @Autowired
       ServiceB b;

       @PostConstruct
       void init() {
           // a and b are instantiated by the IoC container and
           // are fully valid for the rest of the execution
       }
   }


Constructor injection?


Yes. But then you have to define an explicit constructor. Also, X may be autowired into ServiceA or B, making it impossible for the IoC container to create the intended object graph without splitting construction and injection into two passes.
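For comparison, a constructor-injected sketch of the earlier class (with hypothetical stub services standing in for container-managed beans); the fields become final and are never observable as null, at the cost of the explicit constructor:

```java
// Stub services (hypothetical) standing in for Spring-managed beans.
class ServiceA {}
class ServiceB {}

public class X {
    private final ServiceA a;
    private final ServiceB b;

    // The container calls this constructor, so a and b can be final
    // and are never null after construction.
    public X(ServiceA a, ServiceB b) {
        this.a = a;
        this.b = b;
    }

    boolean wired() {
        return a != null && b != null;
    }

    public static void main(String[] args) {
        System.out.println(new X(new ServiceA(), new ServiceB()).wired());  // true
    }
}
```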


1. Lombok @RequiredArgsConstructor?

2. Do you not consider cyclic dependence a code smell? At least I have always avoided cyclic dependencies.


I personally consider the example I gave as being a nicer way of doing dependency injection than the more verbose constructor based injection.

But in general yes; cyclic dependencies are a bad idea.


If the guy who invented them calls them a mistake, then maybe they are, or were at least to him.


I was talking about them in a general context; for a particular language / a particular setting of course they may be a mistake.


You're gonna need nulls until databases stop supporting nulls. And everyone would need to use 6th normal form for that to happen, and nobody wants that.


Not true. Nullable values in a database can be dealt with (and I would argue _should_ be dealt with) using a maybe/option[1] type. This is exactly what Slick[2] does, for example.

[1] https://en.wikipedia.org/wiki/Option_type [2] http://slick.lightbend.com/


Could you explain why you believe the presence of a NULL in the database necessitates a NULL pointer somewhere in the code?


I believe the author's intent is if you read a null value out of a database, how would you represent it in code?


I think it depends on where the NULL is and what it represents.

If NULL value is in a field that is used for JOIN-ing, I don't think you would "represent" the NULL value as much as you would simply have a lack of data. For example, if you had some set of results from a query that contained a JOIN on a field and some values were NULL, those records with the NULL value would not be in the result set.

If we do receive results with possibly NULL values, I believe they could either be represented with an appropriate zero value -- e.g. "" or 0 -- or with an optional type like another commenter suggested.


.NET uses DBNull.Value, which is not null.

Also 6th normal form?? What is that?



I work with an engineering application that uses a 5th/6th normal form database that is implemented in standard SQL server/Oracle DBs. It is awful and painful. A simple query with maybe five properties and 2 relationships takes upwards of 20 joins. Yet the database only has 4 tables...


You store each letter in the names of the columns of your tables in a different data center.


Null references to memory are what is being discussed, which is completely different from variant/option types. Changing databases has nothing to do with NULL references to memory.



