The Logical Disaster of Null (conery.io)
115 points by matthewwarren on May 9, 2018 | 287 comments



>I’m not sure Null should have a place in programming.

The essay is conflating 2 different concepts of "null."

The Tony Hoare quote is about null _references_ which is about aliases to computer memory. (E.g. the dreaded NPE or null pointer exception.) He's not talking about _logical_ nulls such as "tri-state booleans" or "missing data".

The logic-variant of Null usage for "unknown" "missing" "invalid" values is unavoidable in programming. This is why that null concept shows up repeatedly in different forms such as NaN in floating point, NULL in SQL language, etc. If you invented a theoretical language without logical Nulls, the users of your language would reinvent nulls using worse techniques such as homemade structs with an extra boolean "HasValue" field. E.g.:

  struct NullableInteger {
    int x;
    bool hasvalue;
  };
The "hasvalue" field becomes a "null by convention". Other programmers might not even code a verbose extra boolean variable and instead use (dangerous) sentinel values such as INT_MAX "32767" or negative value such as "-1" as the "pseudo null" to represent missing data. A lot of old COBOL programs had 99999 as some out-of-range value to represent missing data. We need nulls in programming languages because they are useful to model real-world (lack of) information. All those other clunky techniques will reinvent the same kinds of "null" programming errors!

The orthogonal aspect is making the type system aware of nulls so that the compiler can see when possible null conditions were not checked. A compiler error then forces the programmer to put in the explicit defensive code to handle the possibility of null values.
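To make that concrete, here is a minimal sketch of compiler-enforced null awareness in Kotlin (Kotlin is used purely as an illustration; the function is made up):

    // `name` is declared as String? (nullable); a plain String could never hold null.
    fun greet(name: String?): String {
        // return "Hello, " + name.uppercase()               // compile error: name might be null
        return "Hello, " + (name?.uppercase() ?: "stranger")  // the compiler forces a decision
    }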


The point is that option types let you explicitly state which values are nullable, but null references mean that every value is potentially null even if it doesn't make sense.


And that having everything be 'implicitly nullable' is categorically worse for maintainability, readability and comprehensibility than using explicit Option<T> types at the boundaries, such as the implementation in F# [1], with its expressive matching syntax.

It is now common when creating a library to rely on manual checking: coders explicitly write code to check and throw ArgumentNullException for each object parameter in every public method, and then write dozens or hundreds more pointless unit tests to ensure this null-exception behaviour works as expected. All of this is unnecessary in languages that demand an explicitly stated Option<T> wherever an optional value is meaningful.

Instead of requiring all of the above for safety (an approach that will silently fail to report any errors if the null argument checks are partially or completely omitted), we have, and should prefer, languages where the compiler or interpreter will not let us pass null into a function where null makes no sense, because that fact is encoded in the function definition.

At the library boundaries, which are comparatively rare to internal code, we can simply use the explicit Option type.

I should note I have one objection to the F# implementation: ideally the properties { .IsSome, .IsNone, .Value } should not exist, as they promote bad programming practices.
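To illustrate the objection with a rough analogue outside F# (a Kotlin sketch, nothing more): .Value is the moral equivalent of the !! operator, which reintroduces exactly the crash the type was meant to rule out.

    fun lengthOrZero(email: String?): Int {
        // return email!!.length           // like .Value: throws at runtime if the value is absent
        return email?.length ?: 0          // like matching on the option: absence handled explicitly
    }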

[1] https://fsharpforfunandprofit.com/posts/the-option-type/


That is only the case in reference-centric languages such as Java, C#, JS, etc. In value-based languages, "values" can't be null, while "references" can.

For instance a `std::string` or a `double` in C++ can never be null; a pointer to either can be, but that's far from idiomatic.


That approach doesn't really work, because C++ conflates where a value is stored with whether the value can semantically be null. Sometimes you want a nullable value that's on the stack. Sometimes you want a non-null value in the heap. Both these things are hard to do in C++.


Neither of those is hard to do in C++. The first one is std::optional. The second one is a reference or a non-nullable smart pointer like https://github.com/dropbox/nn


It's easy to do _since less than a year ago_


And probably not in your compiler yet. Or if it is, you'll blow up on thousands of other compilers when people try to build it themselves.


boost::optional has been available since 2003. There's no excuse.


You don't need option types for the compiler to enforce safety around nulls. It's possible to model null as its own type in the type system to enforce safety.


In such a type system, is every variable and function potentially null, or does its type have to be declared as being potentially null (or not declared as not null)? Would the latter case not be, in practice, just like using option types, and would the former case require ubiquitous null checking?

My understanding so far is that option types have two advantages over the way current mainstream languages handle the null case: a) you can find the potential uses of null values statically, and also ensure that the programmer writes some code that at least nominally handles the case; b) you can avoid this burden in cases where null is not an option. Does null-as-its-own-type improve on this?


I mean the latter. And in practice, it's not just like using option types because it works well with flow typing. Being able to express your nil checking in ordinary control flow instead of special optional methods is often cleaner (think `raise "foo" if bar.nil?` as a guard statement and for the rest of the method body bar is no longer nil). All absolutely safe and checked at compile-time. If you then allow calling methods on nil you can even implement the "try" method that most Optional implementations have, which is pretty nice sometimes even when you have flow typing.

In Crystal we go a step further and allow type unions between arbitrary types (of which nil is just an ordinary empty struct in the stdlib which you can call methods on). I love how it works in practice.
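For readers who haven't seen flow typing: Kotlin's smart casts are a rough analogue (a sketch, not Crystal):

    fun describe(bar: String?): String {
        if (bar == null) throw IllegalArgumentException("foo")  // guard, like `raise "foo" if bar.nil?`
        // from here on the compiler treats `bar` as a non-null String
        return bar.uppercase()
    }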


This means you need a different type system though.


Yeah, its a good option for new languages though.


It's the same mistake as checked exceptions all over again. If you make null a special case in the type system then anything that interacts with the type system has to know about null or will handle it wrong. Far better to have the type system work with plain (non-nullable) values, and implement a plain old library type like Maybe/Option for use in places that need absence semantics; that way your concerns are properly separated and you can spend your type system complexity budget in more generally applicable ways.
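For the avoidance of doubt, such a library type needs no special language support; a minimal Kotlin sketch (not a production implementation):

    // An ordinary sum type defined in user code -- no compiler knowledge of "null" required.
    sealed class Maybe<out T>
    data class Just<T>(val value: T) : Maybe<T>()
    object None : Maybe<Nothing>()

    fun <T> Maybe<T>.getOrElse(default: T): T = when (this) {
        is Just -> value
        None -> default
    }
The rest (exhaustive matching, map, and so on) is ordinary library code.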


Just because null is a type in the type system doesn't mean it has to be a special type.


That's the only way I've ever seen it implemented. If you're proposing some idea for having null in the type system as a normal type (that doesn't boil down to just being equivalent to Option/Maybe), can you be more concrete?


Being able to create arbitrary union types between different types, like crystal (which I work on). The difference between that and option/maybe is that it ends up playing nicely with flow typing without any special cases for optional/maybe in the flow typing.


You shouldn't need special cases for optional/maybe in the first place; optional/maybe are simple sum types, you would want whatever flow-typing system you have to do the right thing with user-defined "x or y or z" types, and if you do that then it'll automatically work just as well for optional/maybe.

(I don't like inclusive union types because they're noncompositional - code you write with them isn't parametric any more - which seems particularly awkward in a compiled language - how do you compile code that forms unions involving type parameters?)


You shouldn't be able to use a type without giving all the necessary type parameters, so you can't form a union type with unbound type parameters in so it's not a problem? I'm not 100% sure what you're asking...

I'm not 100% sure how you can integrate sum types with a flow-typing system, perhaps with pattern matching?


> You shouldn't be able to use a type without giving all the necessary type parameters, so you can't form a union type with unbound type parameters in

So inside the body of a generic method (or class) you can't form unions involving the method's type parameters? That works but makes them much less useful as a language feature - most language features work as normal within a generic method.

> I'm not 100% sure how you can integrate sum types with a flow-typing system, perhaps with pattern matching?

Whatever you do for union types should work, surely. Indeed it ought to be simpler since you have more information to work with - with a sum type if the thing is B then you know it's not A, whereas with a union type it's possible for the thing to be both A and B.


In crystal, the compiler always knows the concrete values of the type parameters of any generic before typing it. I really am interested in knowing more about what you said about union types being non compositional because I'm definitely not a type theorist. The crystal compiler wasn't written by type theorists and the typing algorithm is completely home-grown.

And regarding sum types I was just thinking syntactically instead of in terms of the type system.


> In crystal, the compiler always knows the concrete values of the type parameters of any generic before typing it.

Hmm. Does that mean you can't have a generic method in a separate compilation unit from where it gets used?

> I really am interested in knowing more about what you said about union types being non compositional because I'm definitely not a type theorist.

I'm not really a type theorist, I'm just thinking about e.g. if you have a method like:

    def doSomething[A](userSuppliedValue: A) = {
      val x = if(someCondition) Some(userSuppliedValue) else None
      x match {
        case Some(a) => y ...
        case None => z ...
      }
      x match {
        case None => v ...
        case Some(a) => w ...
      }
    }
then the compiler knows that if we take branch y we will also take branch w and if we take branch z we will also take branch v. Whereas if you do the same thing with an inclusive union type, that's true most of the time but not when userSuppliedValue is null.


We don't have compilation units, all code is compiled at once in Crystal. It's a little painful in terms of compile times with large codebases but nowhere near C++ compile times yet :) We're thinking about caching and various modifications to the crystal compiler to make compilation faster.

I'm confused by your example too. If you mean replacing matching None with testing for nil, then you'd have to replace Some(T) with an `else`, then the compiler absolutely can prove that if you take one branch you take the other - given that x is not assigned to (not sure why the compiler would want to prove that though). And after all that I'm not sure how your example relates to, or what you even mean by "noncompositional".


> I'm confused by your example too. If you mean replacing matching None with testing for nil, then you'd have to replace Some(T) with an `else`, then the compiler absolutely can prove that if you take one branch you take the other

If you use an inclusive union, the type of x becomes A | nil (or however you write it). The trouble is that you don't know whether A might also be a type that includes nil, and userSuppliedValue might be nil already. So if you write in match/case style your branching behaves inconsistently in that case (because x is both an A and a nil) - and if you write in "if(x == nil)" style then you can't infer that branch y will be taken when someCondition is true.

> And after all that I'm not sure how your example relates to, or what you even mean by "noncompositional".

What I mean is that viewing these types as type-level functions, they don't compose. For option/maybe-style types, composition works as expected: you can know the behaviour of X[Y[A]] by reasoning about X and Y independently. That doesn't work for inclusive unions, because whether A | nil is a different type from A depends on what type A is.
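For what it's worth, Kotlin's nullable types show the same effect (a sketch; `headOrNull` is made up):

    // T? collapses when T is itself nullable, so two different situations become one value:
    fun <T> headOrNull(xs: List<T>): T? = if (xs.isEmpty()) null else xs[0]

    val a: String? = headOrNull(listOf<String>())         // null because the list was empty
    val b: String? = headOrNull(listOf<String?>(null))    // null because the element was null
With a nested option type the two outcomes stay distinct (None versus Just of an absent element).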


Ah! I understand completely now! Interestingly I've only very rarely noticed that being a problem in practice. I'm not quite sure what to say about it more than that because it's definitely a real problem but I haven't noticed it being a pain point in practice. Most type parameters in crystal are only used for collection APIs which never union the type parameter with nil.


Interesting. I imagine that's the Ruby influence - on paper at least, Crystal's featureset is almost ML-family, but sounds like the culture must be rather different.

Being able to move all your concerns into the type system is really nice, and the difference between a type system that does what you expect 95% of the time and one that does 100% of the time is huge - but I don't know any way to get there except gradually. I've been doing Scala for 8 years and it probably took 4 before I realised how important this stuff was even in the simple case of Option - and that's coming from a background of Java before then and already being dubious of things that step outside the type system.

Language inconsistencies matter, but only in large codebases, and by the time you have those large codebases it's probably too late to change the language. I think Scala gets a lot right, I think something like Idris gets more right. I can make the case for why those things are the right things in terms of theoretical purity/consistency. But when it comes to practical differences I don't know how to convince anyone other than by saying "go maintain a 300kloc system in an ML-family language for 5 years and see how much easier it is" :/.

Best of luck. I hope Crystal finds some things that work well, and that we all add something to the language design state of the art. I do think we're getting gradual progress and consensus - if nothing else, almost every new language has at least some form of sum typing and pattern matching these days - but I guess progress on the aspects that matter for big systems is inherently going to be slow, because it's only after we've built those big systems that we can see what's wrong (and what's right) with them.


Interestingly crystal has neither sum types or pattern matching (and it doesn't need them) :)

And yeah, nobody has any experience maintaining 300kloc+ systems in crystal. And I'm sure we get loads of things wrong and there are lots of pain points. I hope we find some type theorists to pick through the problems when that happens too, because crystal's type system is really quite different to anything I've ever seen before. And it would be nice to see what comes out of that theory wise.


> Interestingly crystal has neither sum types or pattern matching (and it doesn't need them) :)

Well, the use cases will be there, and any general-purpose language will need answers for them. One can of course achieve the same effect with other language features (e.g. Java's famous use of the "visitor pattern" for those cases), but there's a cost in verbosity and maintainability.


Crystal can essentially have a sum type by representing it as a union of different record types. It's not common to do in crystal though, at least not with 3+ union entries.

And pattern matching isn't really needed if you have flow typing + OO (at least I haven't found a usecase for pattern matching which crystal doesn't already have an answer to)


> The essay is conflating 2 different concepts of "null."

I'd take it a step further, and posit that The Fine Article's rage comes from its author not grokking the distinction, or even recognizing there is one.

Of course null is going to seem frustratingly and dangerously wrong if that's your concept of it.


Null is dangerously wrong. Languages that avoid it give a much nicer experience because you explicitly write your model considering which values can be missing.


And boom, you just reinvented a new concept of "missing", exactly as he said.

PS: To disambiguate, I was speaking of null as a value in a nullable type. Maybe I got you wrong.

Also I guess Null is way less of a problem in a strictly typed language than in JavaScript, for instance.


I didn't reinvent anything; the concept of an optional value is well ingrained in the real world. If you are registering on a website the email address is mandatory while the home address is optional. If you are sending a parcel the home address is mandatory and the email address is optional. If lazy programmers, instead of modelling the optional concept in the type system, use a null, the result is the huge number of null pointer exceptions riddling a lot of software and the useless "defensive guards" everywhere.


> If lazy programmers instead of modelling the optional concept in the type system use a null the result is the huge amount of null pointer exceptions

Null values and null pointers aren't fundamentally the same thing, though some programming languages may be implemented in a way such that use of one exposes the other. A language can have a Null value that never results in NPEs.


I think that's kind of his point - Optionals/tagged unions are "Null values", but only in the places where they semantically make sense.

The problem with nulls isn't strictly null references, in the sense of "a pointer that points to nothing" or "a pointer that throws a null pointer exception", it's that languages with nulls implicitly make null an element of every type. They tend to do this because variables are allowed to have null references, but if you changed the semantics of variables to technically forbid null references, but still had a null value as an implicit element of every type, you'd still have the problem of needing to do runtime checks for null everywhere.


But how do you handle those optional values then? Let's say you run a classifieds site, some of your users will probably prefer not to specify a price, it has to be an optional value. Or age field on a dating sites. How do you filter that data, or handle sorting on that field, without using null values to mark those that are undefined?


That has been done with option types in functional languages since at least the 80s. It's not hard.


e.g:

  sum = 0
  for entry in entries:
    case (entry):
      null: pass
      number n: sum += n
    endcase
in pseudocode. Several languages have offered this since forever (eg. Haskell and Maybe).

The fun part is that the type system enforces that you check all possible cases -- or only lets you explicitly unwrap potentially null values.

So it's not like a C programmer checking if (x == NULL) manually.
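Applied to the classifieds example upthread, the same idea in a language with compiler-checked nullable types (a Kotlin sketch; the Listing type is invented):

    data class Listing(val title: String, val price: Int?)      // price is explicitly optional

    fun averagePrice(listings: List<Listing>): Double? {
        val prices = listings.mapNotNull { it.price }            // drop the "price on request" ads
        return if (prices.isEmpty()) null else prices.average()
    }

    fun cheapestFirst(listings: List<Listing>): List<Listing> =
        listings.sortedBy { it.price }                           // listings without a price sort first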


Option types and comprehensive pattern matching in F#. If you just need to filter them you can simply use Seq.choose.


Or you write sunny-day code and wonder why everybody else's code is riddled with those strange defensive structures.


> If you invented a theoretical language without logical Nulls, the users of your language would reinvent nulls using worse techniques such as homemade structs with an extra boolean "HasValue" field.

Alternatively, you can only use languages which offer actual tagged unions, and avoid the ridiculousness of these 'hasvalue' fields.


Your comment seems to be missing the point of the parent. It wasn't saying that adding this field was a good thing to do. In fact, it relies on the extra field being self-evidently ridiculous -- it was saying that there's no way to avoid doing something.


The parent said any solution implemented within the language would be worse. Decades of experience with actual tagged unions would seem to indicate otherwise


>Alternatively, you can only use languages which offer actual tagged unions,

I tried to address explicit-null-handling programming concepts such as Option<T> or tagged unions in my last paragraph.

Nevertheless, it didn't seem like the essay was about non-nullable concepts; instead, he was writing about the idea of "null" as a data representation for unknown values being fundamentally flawed because of various strange boolean operations.

My point is that "null" as a data representation mapping to "missing" is an irreducible complexity which is orthogonal to Option<T>. E.g. F# Option<T> still has ".None" which is a "null by another name". Yes, the programming language support is superior because it forces the programmer to handle possible nulls instead of crashing. Safety around nulls enforced by the compiler is progress. (The null handling as a verb assisted by compiler checks.) However, it didn't remove the need to represent a null as a state -- or null as a noun. Why do I separate those 2 verb-vs-noun concepts? Because he used examples with constants such as "10 * null" returning a "null" instead of throwing an error as violating math logic. And "10 < nil" violating 2-value boolean logic. Therefore he plays with the idea that nulls in the language shouldn't exist. Let's rewrite one of his examples using variables instead of constants:

  x = Option<int>
  y = Option<int>
  x = null
  y = 10 * x
What should "y" be? The essay says that's not the question to ask. He's wondering why multiplication doesn't throw an error. The idea of tagged unions and Option<int> really doesn't address "data" / "mapping" / "noun" concept of null and how it fits into certain intuitions about math and booleans.

This irreducible complexity is why "null as noun" will get reinvented as "99999", "-1", etc if you don't have null (whether spelled as ".None", "Nil", or some other standard way). Even if you have an array of Option<int>s and want to serialize to disk, you still need to write "null as data concept" in some fashion (empty string, sentinel value, extra "hasdata" field etc).
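A tiny illustration of the serialization point, sketched in Kotlin:

    val scores: List<Int?> = listOf(10, null, 30)
    // Whatever the in-memory type is called, the serialized form still needs a token for "missing":
    val serialized = scores.joinToString(prefix = "[", postfix = "]")   // "[10, null, 30]"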

As a side note, I find it interesting that some commenters believe the author is talking about Optional<T> or a similar concept and that I missed his point. I ask those readers to look at the essay again and confirm that it makes no mention of "Option", "Optional", "Maybe", "sum types", "tagged unions", etc. His essay seems to be based on the flawed concept of data representation of unknown values. A key sentence near the top:

>Logically-speaking, there is no such thing as Null, yet we’ve decided to represent it in our programs.


> Logically-speaking, there is no such thing as Null, yet we’ve decided to represent it in our programs.

But he's right. It shouldn't be an inherent part of any data. Sometimes, something not being present is a reasonable logical state to be in. For these times, there are tagged unions. In other cases, it is absolutely insane to simply add on the possibility of the NULL value to every data type in a language.


>For these times, there are tagged unions.

Again, tagged unions do not solve the author's examples of Aristotle logic puzzles and inconsistencies around null/None/Nil/Nothing/etc.

He's not talking about "type safety" enforced by static compiler checks, or opt-in nulls as the default vs opt-out, etc. Therefore, compiler enforced explicit pattern matching code on tagged unions misses the point.

I suggest you click on the Stackoverflow q&a link in the author's essay to get a better idea of what his philosophical conversation is about.

>In other cases, it is absolutely insane to simply add on the possibility of the NULL value to every data type in a language.

That's a totally valid position but the author isn't talking about that. He's questioning the _meaning_ of "null" in light of observing (in his view) non-intuitive behavior with math and boolean operators.

Your "tagged unions" doesn't address his philosophical conversation at all.

My point is that if the author removed "null" from a hypothetical language because it's "weird", the users of such a hypothetical language would recreate the "null" again. This has nothing to do with discriminated unions or Option<T>.


I think the way C# handles it is actually good. Any operation with null should result in null. Throwing errors willy-nilly just constrains you into doing extra null-checks in dozens of places instead of only where it is needed.


I’m not sure that you understand how options work, tbh.

Your example would be nonsensical in any language I’m aware of.


It's just pseudocode. Whether one explicitly expands out the "x" as "x.GetValueOrNone()", or the type system implicitly converts/coalesces any "null" value of "x" to propagate the null to "y", or one has to explicitly write pattern-matching code, is not relevant, because none of those syntax details affect the author's thesis.


I find it weird that people keep making languages with static typing where all types can also be null instead of using an option type. Doing so nullifies a significant amount of the correctness checking static typing provides.

It makes some sense for interoperation when living inside .NET or the JVM, but this was a known problem when those systems were designed.


1. It's hard to support circular data structures in an imperative language without null pointers.

2. Restricting a language to non-circular data structures will not be popular.

3. Having a nullable and non-nullable version of every pointer or reference type is not usually super popular either.

Phrased differently: nullable types are actually a fairly reasonable compromise from a set of not fully appealing choices.


Circular data types are perfectly fine with optional types. You can have optional<*raw_pointer> for example which after compilation may even be equivalent to a simple null, since valid pointers are not-null.

This was discussed in rust some time ago: https://mail.mozilla.org/pipermail/rust-dev/2013-March/00330...


Having nullable and non-nullable versions of every type is fine if you have decent syntax for it. Take Kotlin for example:

https://kotlinlang.org/docs/reference/null-safety.html
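A small sketch of how that looks in practice, covering the linked-structure case from point 1 above as well (illustrative only):

    // Nullable exactly where absence is meaningful, non-nullable everywhere else.
    class Node<T>(val value: T, var next: Node<T>? = null)   // `next` may legitimately be absent

    fun <T> makeCycle(a: T, b: T): Node<T> {
        val first = Node(a)
        val second = Node(b, next = first)
        first.next = second          // a circular structure built from explicit nullable links
        return first
    }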


You also need type polymorphism (which Java and most of the languages under discussion did not have/did not start with), or it will drive you crazy.

And you may start needing fancier versions of polymorphism to express things like "this accepts a T=Foo or Optional<Foo>, and returns Bar<T>".

In passing, I remain to be convinced that the x?.length/etc syntax (and equivalents in other variations of Optional) is really a good idea - to me, it feels like an error-prone way of papering over the fact that explicit null (aka absent-optional) checks are just too painful to program with. Phrased differently, I see no reason to believe that the appropriate handling of null, sufficiently common to warrant special support, is just to feed it up through as the result of the whole computation whenever it occurs in any part.


Well, not really. What Kotlin has is exactly an Optional type. It's just a different syntax for it.


There's one difference between nullable/non-nullable types and Optional type, which is nesting, right?

I'd assume that while Kotlin's String? is equivalent to Optional<String>, Kotlin has no equivalent to Optional<Optional<String>>.

Personally, I think I'd prefer Kotlin's approach, because that means you can do `maybeString = "exists"`, which is more readable than `maybeString = Optional::Exists("exists")`


Sure. In Haskell you join them, but just not allowing it is perhaps better.

Also, you can still nest them in Kotlin, since you can have a nullable type inside generics.
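Concretely (a sketch):

    // String?? collapses to String?, but nesting survives through any generic wrapper:
    data class Box<T>(val value: T)

    val present: Box<String?>? = Box(null)    // a box that holds null
    val absent: Box<String?>? = null          // no box at all -- still distinguishable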


Interesting, I haven't heard this argument before. So what is it about null that makes it more feasible here than for example option types/tagged unions?

Also, regarding your third point, IMO the question mark syntax in for example TypeScript and C# is fine.


Way back when the JVM/CLR were originally being designed, I'm not sure that Option types as a concept were quite as proven, and both languages were trying to be "safe" languages that didn't have any complicated or new concepts in them. Option types might not be seen as complicated or new now, but they probably were then, when they were strictly in the domain of fairly academic languages.


I'm not entirely sure where or when it was introduced, the first mention of it I could find was May 1989 (pg 25)

ftp://ftp.cs.princeton.edu/reports/1989/220.pdf


Destroying a language's safety for a single use case is a terrible trade-off.

There are various possible solutions for circular structures that do not require destroying all static safety with nulls everywhere.


Do people keeping doing that? All the new languages I see seem to embrace optional.


Go is one of the worst/weirdest. It has semantics around uninitialized variables that most programs will rely on for correctness. And it also has nil.


And nils in Go might be different. That's even more weird.


Right, interface Nils compare on type too, right? This is bananas.


Go is bad in that it uses pointers to make something nullable, but I don't think the fact that uninitialised non-nullable values are zero is that bad. Otherwise uninitialised would be another type of null...


  Doing so *nullifies*
pun intended?


I use dynamically typed languages the majority of the time; I never really saw the advantage of static typing. I do, however, rely on the database for typing quite heavily.


I also usually prefer dynamic typing. When I do use languages with static typing, I really want the compiler to be able to tell me that `foo` is definitely an Int and not Null.


C# has an option type. You can make any struct type nullable with a ?.


The issue is you can't make reference types non-nullable (yet, they're working towards that end for 8.0).


Those interested in how C# 8.0 approaches the problem might appreciate Mads Torgersen's thoughts on the matter [0] (relevant part starts at 5:40).

[0] https://channel9.msdn.com/Blogs/Seth-Juarez/A-Preview-of-C-8...


Do you know how they're solving the problem of the language expecting a `default(T)` to exist for every type?


You can get a pretty complete overview at https://blogs.msdn.microsoft.com/dotnet/2017/11/15/nullable-...

But essentially they're not solving it: you will get a warning on explicit default-ing of a non-nullable reference, you will get a warning on implicitly initialising/defaulting class fields, but you will not get warnings when doing so for structs or arrays.


Thanks. That looks quite interesting.


> The logic-variant of Null usage for "unknown" "missing" "invalid" values is unavoidable in programming. This is why that null concept shows up repeatedly in different forms such as NaN in floating point, NULL in SQL language, etc

As an aside here, there is also a school of thought that NULL in SQL was a mistake, see e.g. https://www.dcs.warwick.ac.uk/~hugh/TTM/Missing-info-without... and http://thethirdmanifesto.com/ (Hugh Darwen and Chris Date)


> All those other clunky techniques will reinvent the same kinds of "null" programming errors!

Worse, instead of blowing up and yelling "this is wrong, fix it!", they'll pretend to work and give you the wrong results.

I imagine if null is the "billion dollar mistake", then null never having existed - and relying on sentinels - would've been the trillion dollar mistake.


My company uses a very expensive and common accounting software to run the entire business. It uses a sentinel date value that represents no-end-date. That date is 2025-12-31. I suppose in the 80's when the system was first designed they never imagined that the system would still be in use in 2018. There is a slim chance we will have changed software by then but that chance gets smaller every day.


I imagine businesses plan ahead and make deals for more than 7 years. It's already too late if your sentinel value is within the range of your real values.


Is it a sentinel value or a sentinel range? There's nothing fundamentally wrong with e.g. having a service begin on May 1st 2028 (2028-05-01) and continue until forever (2025-12-31). The logic will only fail if you assume that 2025-12-31 and all subsequent dates are all equivalently forever.

Doing things that way means every time you check the value of a date, the very first check needs to be "is it the sentinel value?". But that's kind of the concept of a sentinel value anyway.


But there is something wrong with having some services actually end on 2025-12-31 and other services that are marked as ending on 2025-12-31 because they continue forever. Once 2025-12-31 is in the range of valid end dates, you can't tell the difference between the two.


> having a service begin on May 1st 2028 (2028-05-01) and continue until forever (2025-12-31).

That doesn't work, as most of the program logic simply checks whether a date is between two dates. It basically just treats 2025-12-31 as the end of all time. I suspect they will patch it before then and change that date to something else.


>I imagine if null is the "billion dollar mistake", then null never having existed - and relying on sentinels - would've been the trillion dollar mistake.

The whole idea is that you don't have to rely on sentinels -- and with optionals and pattern matching we haven't had to, ever since the 80s or so.


> The logic-variant of Null usage for "unknown" "missing" "invalid" values is unavoidable in programming.

On the contrary it’s perfectly avoidable. You just need to use an option type with a language that supports comprehensive pattern matching and won’t even compile if you don’t explicitly consider the case of a missing value. If your model is flawed and you don’t want to explicitly specify that some value is optional then the problem is in your way of thinking. And there is no need to invent a language that doesn’t support nulls, they already exist.


It's just another name for Null in practice. Or a "typed null" to be more precise. It's got some nice properties, but really it's the good old tri-state.

As the OP says, it is a "logic-variant of Null".


> It's just another name for Null in practice.

No, it's not.

> Or a "typed null" to be more precise.

To the extent this is true, it is a fundamental difference.

> It's got some nice properties, but really it's the good old tri-state.

No, it's not. Because the Boolean type contains only True and False, not Null and the Option<Boolean> type (likewise, Option<Option<Boolean>>, etc.) does not contain True or False. There's no weirdness in doing logic operations on an Option<Boolean>, or a pair of them, because it is a type error to attempt it because Option<Boolean> is not Boolean.


Sure, they're different things. The comment wasn't meant to be in a vacuum, but as a response to option being a better logic 3-state.

I agree that for other reasons it's better, but conceptually they are used to represent the same thing. (Logic variant of null)


>really it's the good old tri-state

Yes it is, the point they make is that Option<Boolean> is explicitly tri-state in contrast to regular Boolean which should be two-state.


That style of "null" has a _lot_ of advantages over the common null found today. The perfect is the enemy of the good - Option/Maybe types aren't perfect, but they're a lot better than null references.


In languages with pointers, it's not really possible for the compiler to know when null needs to be checked for. Consider this fragment:

void f(char *x) {g(&x); printf(x);}

It's possible that calling g(&x) might cause x to no longer be null, even if x originally was null. It would then be inappropriate for the compiler to complain that the next instruction attempts to use x without checking whether x is null.


OP here - I was going to go into that - it's an interesting story. There are null references, as you say, but there's also the null object pattern and the null type. I decided to just focus on the idea of null, which all three of those things represent. A pointer points to a null space, a type represents a value that can optionally be null, and the object pattern allows you to deal with null in your program directly (like Ruby's nil).

So... yeah I dig it man :) I just left some things out to keep on a pace.


I feel like the idea of null is fine, but the implementations are idiosyncratic. Null is an unknown/unknowable/invalid value. You can't know if null = null. This should be an exception or other error. You can't know if 10 = null. Null is unknown. It might be 10, it might not. This should be an exception or other error. You can't have a meaningful result of 10 * null. A pointer to null is meaningless.

Languages which lack a way to signify an error other than null can be an issue. Most modern languages have option types and/or exceptions, both of which provide good ways to deal with error conditions.


>This is why that null concept shows up repeatedly in different forms such as NaN in floating point, NULL in SQL language, etc.

NULL in SQL is also problematic -- e.g. as it behaves wrt aggregations and such. Explicitly user defined unknown values are better.


> The Tony Hoare quote is about null _references_ which is about aliases to computer memory. (E.g. the dreaded NPE or null pointer exception.) He's not talking about _logical_ nulls such as "tri-state booleans" or "missing data".

I'm struggling to understand what you mean here. My understanding is that there is no distinction between "logical null" and a "null reference" in terms of the problem we're discussing--as soon as you introduce nulls as a placeholder for values that are not yet initialized you have to deal with the logical implications of null being a member of those types, no? It's been a while since I watched the talk, but scanning the transcript from the talk we're discussing (https://www.infoq.com/presentations/Null-References-The-Bill...) the way Prof. Hoare talks about them seems to be the same in terms of their logical impact on the type system. I quote:

25:55 One of the things you want is to be able to know in a high level language is that when it is created, all of its data structure is initialised. In this case, a null reference can be used to indicate that the data is missing or not known at this time. In fact, it's the only thing that can be assigned if you have a pointer to a particular type.

...

27:40 This led me to suggest that the null value is a member of every type, and a null check is required on every use of that reference variable, and it may be perhaps a billion dollar mistake.

And another thing you wrote which I don't understand:

> We need nulls in programming languages because they are useful to model real-world (lack of) information. All those other clunky techniques will reinvent the same kinds of "null" programming errors!

Isn't that exactly backwards? The entire point Prof. Hoare was trying to make is that using nulls to model the "real-world (lack of) information" causes us to have to contend with the logical problems inherent in making null a member of every type. Additionally, sum types seem to model this in a logically consistent way very well, as other commenters have noted.


Totally agree; further to that, if you had int+bool to represent nullable values, you'd also have to sort out the ridiculousness of having a "null" that still carries a value, which is also error-prone.


Another issue is that the concept of NULL is ill-defined, if you really think about it. There is usually no way to distinguish between use of NULL to represent (1) A missing value, (2) An unknown value, or (3) a default value.

These are not trivial distinctions; knowing the semantic intent of a NULL can aid in reasoning about a code-base and avoiding subtle errors, while not knowing it can hamper both.


Correct. The real evil is sentinel values.

That said, the memory layout of Maybe/Optional (assuming it can be stack allocated) will look much like what you outlined, though of course the API can be more user friendly.


> That said, the memory layout of Maybe/Optional (assuming it can be stack allocated) will look much like what you outlined, though of course the API can be more user friendly.

But it's really not. First of all, most Haskell compilers will optimize away usages of Maybe that are known to be Nothing or Just. Secondly, GHC at least uses tagged pointers for small sum types, including Maybe.


Rust will do optimisation on the layout of Option-looking data types to simplify them down to a pointer size if possible.


I agree, one of the nicest changes in programming for me is a plugin for IntelliJ and PHP that does static analysis and flags "foo may be null" via an inspection (it actually does a hell of a lot more than that; truly stellar plugin, "Enhanced EA Inspections", if anyone has a PHP codebase and uses IntelliJ btw).

It catches so many stupid cases where I'd have called bar on foo where foo could be undefined.

TypeScript handles it pretty well given the constraints it has to work in.


I don't think it's conflating anything but trying to explain to you the difference between the two concepts. It's saying that Null for all reference types by default is a bad idea, and representing "missing" in a tagged union (sum type, whatever) leads to better code. Your worry about inconsistent type use for the same concept doesn't really apply when combined with syntactic sugar for optionals.


Optionals are a better alternative to null. They compose better (i.e., they nest) and play nicely with data abstraction (i.e., you can define an abstract type that hides the fact that its underlying representation is optional), unlike null.


A correction, there are no "tri-state booleans". Booleans are two state logic systems. Extended Booleans require 2 ^ n states where n is an integer >= 1.

Any other number of states automatically ensure that the logic is non-Boolean or non-Extended Boolean.



Boolean has a specific mathematical/logical meaning and semantics. The moment you add a third state, it is no longer Boolean.

The interesting thing is that positive powers of 2 will also allow consistent extended Boolean logic and semantics.


I have a pet peeve whenever people bring up the old "Billion dollar mistake" quote. Calling it a mistake means that it could have been avoided. But we couldn't have avoided null pointer errors (or some other name for the same thing) any more than humans could have avoided the bronze age. In the 60s and 70s we didn't have the "technology" to avoid null errors. By technology I mean production-ready languages with a static typing layer sophisticated enough to implement a Maybe/Optional type. Those languages weren't production ready until maybe around the 80s. And most of those compilers were themselves implemented in unsafe-null languages like C.

But anyway, that's history, now we have nice languages that have safe nulls, so the interesting questions moving forward are: why are people still creating new languages that have unsafe nulls (looking at you Go), and why are people still choosing to use languages with unsafe nulls?


There were better languages but Unix won, and with that - C. A mistake we (whoever “we” are) repeated with JavaScript - we let the language win that was “there”, not the one we really needed.

Worse is better. Ease of deployment over ease of development.

As for the Q: why design languages with null (that is, incomplete or flawed type systems) now? I have no idea. I think the reasoning is that "worse is better" succeeded for JS, php, C, so it's a viable path.

"Maybe an X" and "Definitely an X" are more different than string and number. If a language pretends string and number are distinct but at the same time has no distinction for "maybe X", then it doesn't have a very good type system.

Note that it doesn’t necessarily need to avoid null values for this. Non-nullables is mostly equivalent although less elegant. That is, “String s” means a string or null, while “String! s” means a non-null string (example from a C# vNext syntax).


> we let the language win that was “there”, not the one we really needed.

Different explanation of history: Something "wins" because it is precisely what is needed.


It does - but I think sometimes something should have "won" that solved the short-term problem 95% as well but solves the problems over the next 4 decades better. The question is why individuals and companies would pay short-term for gains that accrue to others.


Kotlin has null, but also nullability in the type system. They kept it in for interop purposes. Also, optionality is very common, so having it integrated into the language makes it more convenient and efficient.


The creator has a different story; he portrays it as very much avoidable:

"My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement."


There really isn't anything sophisticated about having sum types, ALGOL 68 had them in some form for crying out loud.


The real billion dollar mistake was nullable references in Java and C#. By then we should have known these things need an extra type-specifier. But Java was meant to be simple and easy.

References in C++ are non-nullable so the precedent already existed.

I hope C# gets around to solving this problem -- Microsoft is working on it -- but it's much harder to retroactively solve it.


> References in C++ are non-nullable so the precedent already existed.

Wrong analogy. References in C++ are not first-class entities. Conceptually, they're _alias names_ for existing objects. Unlike C# and Java "references", you cannot "reset" a C++ reference to "point" to another object. Because it's not a pointer. It does not "point". It's an alias. `void f(int& ref)` in C++ is the same as `void f(ref int x)` in C# (the equivalent doesn't exist in java). Inside `f`, you cannot change `x` to "point to some other int" because it's not a pointer!

C# and Java "references" are semantically the same as C++ pointers minus arithmetic.


Yes, C++ can have non-nullable smart pointers just fine if that is what you want. They would drop the reset() call, throw on assignment of a null T*, static_assert on nullptr_t, require full initialization on construction, and have maximum exception safety.

This could've been done since 2003 but wasn't. There are attempts to bring it into language as part of C++ Core Guidelines project.

This is possible because unlike Java and C# not everything can be null.


> C# and Java "references" are semantically the same as C++ pointers minus arithmetic.

But you can write a shitton of C++ programs that won't use pointers at all - not even shared/unique ones. In contrast, you can't avoid C# and Java references.


Can you show an example of a linked list data structure without pointers?


My C++ is rusty but you'd do it exactly like you do in ML. Something like:

    #include <utility>

    // A list is either Nil or a Cons of a head plus a reference to its tail --
    // no pointers anywhere. (Sketch: C++ forbids virtual member templates, so the
    // generic fold visitor becomes a stateful visitor interface here.)
    template <typename T>
    class Visitor {
      public:
        virtual ~Visitor() = default;
        virtual void visitNil() = 0;
        virtual void visitCons(const T& head) = 0;
    };

    template <typename T>
    class List {
      public:
        virtual ~List() = default;
        virtual void visit(Visitor<T>& v) const = 0;
    };

    template <typename T>
    class Nil : public List<T> {
      public:
        void visit(Visitor<T>& v) const override { v.visitNil(); }
    };

    template <typename T>
    class Cons : public List<T> {
        T head;
        const List<T>& tail;   // a reference, never null
      public:
        Cons(T head, const List<T>& tail) : head(std::move(head)), tail(tail) {}
        void visit(Visitor<T>& v) const override {
            v.visitCons(head);
            tail.visit(v);
        }
    };


Thanks, that's neat. But how would you build a list of arbitrary length and free it later? You'll use pointers, one way or another. The only way to avoid pointers I could imagine is to use the stack with recursion, but that's a completely crazy and limited programming style.


In more than ten years of programming in C++ I never once had to implement a linked list. That's 100% irrelevant to programming practice. The only reason to implement one is for academic purposes; otherwise just use `std::{s,}list`.


Using std::list which uses pointers is using pointers.


... no it's not. Else by that logic you'd be using pointers in any programming language.


You don’t even need fancy static type checkers to avoid this. You literally just have to not have a null and not allow initialization of variables without values. It is more work to add null to your language than to not have null.


>not allow initialization of variables without values

Then you've thrown in the towel when it comes to optimality. Not saying this is necessarily bad (maybe you're inventing the next Python), but acknowledge this isn't really an acceptable option for the next C.


Ironically you can make nullsafe methods in Go

    type X struct{}

    func (x *X) y() {}

    func main() {
    	var x *X = nil
    	x.y() // no error
    }
Also it's not unsafe. Go doesn't allow unsafe memory access unless you use the unsafe package.

It is incorrect though. And Go does allow you to write bugs.


Why is null in Go unsafe?


    res, err := http.Get("wcpgw") // err is ignored; on failure res is nil
    defer res.Body.Close()        // so this dereferences a nil pointer

    ...

    panic: runtime error: invalid memory address or nil pointer dereference
    [signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x5ed9af]


Presumably because there’s no language-level way to guarantee a pointer to be non-nil at compile time, combined with nil dereferencing being a crash. Not unsafe in the security sense, but definitely unsafe in the “can I count on my program to run” sense.


> In the 60s and 70s we didn't have the "technology" to avoid null errors.

Yes we did.


I disagree with the author here, and while I admit that Null failures are a pain, it's not true to say they don't exist in other languages. "Maybe" and "Option" etc may force you to deal with them before they will compile, but that's not the same thing as saying the concept of Null does not exist in those languages. I'm sure there are some languages out there without any concept of Null, but they would have their own shortcomings for the lack of it.

It's also wrong to suggest that Null has no place in "logic". Boolean logic is one type of logic, but it's not the only type.

Finally, the examples of how Null works in various languages are really poor. "Why doesn't Ruby coerce nil into zero?" Because it doesn't coerce any types implicitly. Why should nil be the exception? How is that expectation "logical"?


I think you misunderstood something about the languages mentioned like Swift or Haskell. They do have a way to represent the concept of "nothing", or "missing" using an enumeration (called Optional in Swift, Maybe in Haskell).

What they don't have is the global concept of Null that sits at the root of the type hierarchy. This means every type in a language like C# or Javascript has to have Null as one of its members. If you want to define some operators/functions on the members of the type, you always have to consider Null as a member. Like the author listed: How do you compare Null, how do you negate Null? Those questions shouldn't even be asked, because Null is not negatable or comparable. Maybe the problem is that Null shouldn't have been admitted as a member of a type that you consider comparable or negatable.

So in Swift/Haskell, when you define a type you don't consider "nothing" as one of its members.

The mental model in Null-using-languages is that a Type is a sort of blueprint that can be stamped out to make instances of that type. From that point of view its natural to consider you may be missing an instance.

In language like Swift or Haskell the concept of a Type is closer to a set in math; in this case the set of all possible values of that type. This is why I said above that Null has to be considered as a member of a type. I’m using math language to re-interpret the type-as-blueprint world view, to show why it leads to illogical results.

Seriously, Swift's demoting of the concept of "nothing" to just another enum is alone worth the price of admission. This has cascading consequences that lead to safer code. You also get simpler code as you strive to resolve the uncertainty of a missing value as early as possible in your code. Swift is the most impressive language I've seen in a while, but I think its origins at Apple have overshadowed its incredible technical value.


I think using an Option / Maybe over null has the benefit of the caller knowing that the method might not return something.

e.g.

    public Foo GetFooWithId(int id){...}
When I call that, I might get a Foo but what if I don't... Will it be null, is Foo a struct or a class? I have to know what Foo is and check if I need to check the result for null.

As opposed to

    public Maybe<Foo> GetFooWithId(int id) {...}
I know just from reading that I might not get a Foo back so I had better match over the result.


Have you used any language that has a type system powerful enough to support arbitrary sum types (and also has null as a separate type)? If not then I can recommend playing around some with Crystal.

IMO optionals are just a crutch to make up for a too-weak type system.


Given a sufficiently powerful type system, you should still err towards an Option type, because you don't want to have to make two types for everything you want to represent - the real type, and the real type but also null. Having an Option type lets you compose that behaviour, even if, yes, it's as simple to define as `type Maybe x = Just x | Nothing`.


No, optional really isn't needed in crystal. You can just add `| Nil` to any type wherever you use it. We even have a shortcut in the type syntax of the language, `?`, to add null to a type union. So in practice at the type layer all you're doing is replacing Maybe Type with Type?. But now you have a strictly more powerful construct which behaves like options in some ways (you can call try on any union with nil because all types including nil implement the try method) but supports flow typing for null checking, removing all of the extra dereferencing syntax for option types.


This is also how Swift implements its support for optionals. So far I don't see a difference in expressive power between the two languages. A user never has to actually type out Optional, when adding ? is enough. Swift also has optional chaining and nil coalescing operators as syntactic sugar. Under the hood there is still an Optional type for the type checker to work with.

I think this is becoming a trend in modern language design. However as this comments section demonstrates it’s hard to understand its benefits or why it’s an important improvement, if your only experience is from C/C++/C#/Java etc


Does swift have flow typing? The expressive power of sum types with nil is only exposed with flow typing. Also I don't think swift supports arbitrary sum types; it just has a "fake" sum type syntax that only works with null. In crystal you can have an `Int32 | String` just the same as you can have an Int32?


Swift doesn't have flow typing, unfortunately. It has a few constructs, like `guard let` and `if let`, which let you unwrap optionals in a somewhat nicer way than other languages and are sort of similar to one aspect of flow typing.


I've seen this come up a couple of times in this discussion, so ELI5: What is "type flowing"?


Flow typing is where the type of a variable changes based on control flow.

Consider:

    x : Int32 | String
    if x.is_a? Int32
      # typeof(x) == Int32
    else
      # typeof(x) == String
    end


Thanks!


You mean "union type" where you say "sum type". AFAICT Swift does support sum types.


yeah in crystal we call them union types. We don't have sum types so I'm not really certain on the difference.

I should have stuck to terminology I know.


Swift has sum types, not union types.


This was my initial thought reading this post, that logic has developed quite a bit the past few thousand years, with different logics able to evaluate different types of values (e.g. Kleene's logic). But I feel the author's better point is that different languages evaluate unknown/null/nil/none values differently, or in an illogical way. Like, ideally if a language were to use a null value, I feel that such a value shouldn't ever evaluate to, say, 0 or an empty set, because it's making an unwarranted assumption.


I remember this insight from a paper on SQL's 3VL issues (true/false/null), and I think it somewhat applies to null issues in languages; it's a 3VL logic system that doesn't go all the way. Sometimes it gets forced into 2VL semantics (conditionals), and the conversion is where all the problems lie.

If it stayed true to its word, and fully embraced 3VL, there wouldn't be an issue.


There's one key difference between null as it applies to languages like C and Java versus Option<T>-like renditions of null: null doesn't infect the type-system in the latter.

Type systems can be viewed as a lattice, usually as a bounded lattice. If you look at Java's reference type system (ignore primitive types), there is a top type, aka Object. There is also a bottom type. It doesn't have a name you can spell, but it does have a single value--null. This means that null can satisfy any type, even if the type is impossible to satisfy--and as a result, there is an unsoundness in Java's type system.


Do you still feel the same way if you consider the option type a container type?


Maybe/Option aren't Null. They don't crash when you perform operations the type checker accepts, unless your language allows you to write partial functions. And you aren't required to wrap a type in Maybe/Option the way that languages require you to admit Null into every (non-primitive) type.

Obviously "nothingness" exists and needs representation. The problem with Null is that it's not part of the type system but is part of the program, so it undermines the value of having a type system.


I've seen programs that crash given a none. Often explicitly called out with a message saying that a value must be supplied. So, a good thing in those cases.

I'm not optimistic enough to think it would always be a good thing, though. I've actually seen some where the lack of a value was not noticed because people just mapped over the optional and did not code for the missing case. Effectively coercing the value to whatever the zero was.

Now, I can't claim empirically that these would outnumber null pointers. I just also can't claim they don't exist.


Yeah, I have this intuition that options create their own horde of problems because of the way that map/<$> propagate None, so that once you get an unexpected None, it becomes harder to trace the source of the problem. A null pointer exception or segfault, on the other hand, will give you a stack trace or similar that will help you pinpoint the source of the problem.


> Finally, the examples of how Null works in various languages are really poor. "Why doesn't Ruby coerce nil into zero?" Because it doesn't coerce any types implicitly. Why should nil be the exception? How is that expectation "logical"?

That was exactly what I would've guessed with those examples, even without knowing Ruby. Likewise, C# being the one where things "get strange", especially after Javascript? From the example, the C# version acts really close to NaN - a "this is unknown" value that contaminates whatever it touches. I'd call C# and Ruby equally logical, with different intentions, given the examples.


Unrelated to the discussion of null: Ruby the language might not coerce types, but the standard library will happily do things like multiply a string by a number, or parse a malformed date resulting in garbage.


>it's not true to say they don't exist in other languages

Sounds good, waiting for an example to support this...

>It's also wrong to suggest that Null has no place in "logic". Boolean logic is one type of logic, but it's not the only type.

What other types did you have in mind? Given that we work as programmers in a world defined by true/false 1/0 logic, I think you might want to reconsider this blanket dismissal.


You seem intelligent, so I'm going to drop you into the deep end. In category theory, a topos [0][1] (plural topoi) is a structure-place where logic can be done. Boolean logic corresponds to a particular topos. Actually, there are at least two topoi which do Boolean logic; one of them has the law of excluded middle, and another has the axiom of choice. [2]

And there's an infinite other number of topoi to choose from! Topoi can be custom-made to categorical specifications. We can insist that there are three truth values, and then we can use a topos construction to determine what the resulting logical connectives look like. [3]

Finally, there are logical systems which are too weak to have topoi. These are the fragments, things like regular logic [4] or Presburger arithmetic.

To address your second argument, why do we work in a world with Boolean logic? Well, classical computers are Boolean. Why? Because we invented classical computing in a time where Boolean logic was the dominant logic, and it fits together well with information and signal theory, and most importantly because we discovered a not-quite-magical method for cooking rocks in a specific way which creates a highly-compact semiconductor-powered transistor-laden computer.

Computers could be non-Boolean. If you think that the brain is a creative computer, then the brain's model of computation is undeniably physical and non-classical. It's possible, just different.

Oh, and even if Boolean logic is the way of the world, does that really mean that all propositions are true or false? Gödel, Turing, Quine, etc. would have a word with you!

[0] https://en.wikipedia.org/wiki/Topos#Elementary_topoi_(topoi_...

[1] https://ncatlab.org/nlab/show/topos

[2] https://ncatlab.org/nlab/show/two-valued+logic

[3] https://toposblog.org/2018/01/15/some-simple-examples/

[4] https://arxiv.org/abs/1706.00526


There are a lot of three-valued logics that lots of programming languages implicitly follow by accident sometimes. For example:

  true || 3 == 3 || panic()
will, through short-circuiting, evaluate to “true” even though the final element doesn’t evaluate to either true or false.

(As you may observe, they also follow a weird logic where AND and OR are not commutative...)


> No one seems to know why C# behaves the way it does.

Operators are lifted[0] over nullability in C#. Every value type T can be converted implicitly to Nullable<T>. When `10 * null` is typechecked, both sides of * are typed as Nullable<T>. The * operator then acts like the pseudo-Haskell

  (*) <$> 10 <*> null
or, I guess

  liftA2 (*) (Just 10) Nothing
The semantics of Nullable<T> are similar to those of an optional type, with some implicit mapping and lifting. In that context, null acts less like null. (Thanks to convenient conversions.) Note that the following doesn't throw:

  int? value = null;
  value.HasValue // == false
[0]: https://blogs.msdn.microsoft.com/ericlippert/2007/06/27/what...


I came here to write this. What C# does in this case makes sense if you take the time to pick apart what it's actually doing.


If anyone wants to see for themselves what life is like without null, TypeScript has an option to type-check for possible null/undefined values with strictNullChecks[0]. It's a game changer. The con: you have to check every variable that could be null/undefined before using it. The pro: you have to check every variable that could be null/undefined before using it :)

[0] https://basarat.gitbooks.io/typescript/docs/options/strictNu...
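
A small sketch of what that looks like in practice (the function is made up):

    // With strictNullChecks enabled, using a possibly-null value directly
    // is rejected at compile time until you check it.
    function shout(msg: string | null): string {
      // return msg.toUpperCase();              // compile error: msg might be null
      return msg === null ? "" : msg.toUpperCase(); // the check narrows the type
    }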


Our stack is TypeScript and Kotlin, and I find handling of null simple and safe; Kotlin in particular has decent supporting functionality: https://kotlinlang.org/docs/reference/null-safety.html

With Java we used @NotNull annotations and IDE support to give warnings. Luckily we are converting our legacy code base to Kotlin.

With conditional types, TypeScript will have better support than before:

    type NonNullable<T> = Diff<T, null | undefined>;  // Remove null and undefined from T
Source and examples: https://github.com/Microsoft/TypeScript/pull/21316
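
As a quick usage example, once that type is available, stripping null out of a union looks like this:

    type MaybeName = string | null | undefined;
    type Name = NonNullable<MaybeName>; // string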


Just a follow-up point: for new TypeScript projects, strongly consider using the "strict" compiler flag. This flag enables strict null checks along with a number of other useful checks and restrictions, such as disallowing variables without known types from being implicitly treated as "any". These checks will catch bugs and improve your code, but trying to enable them after the fact can be painful.


> These laws apply to logical expressions about things that exist. In other words: you can’t apply these laws to the unknown, which also includes the future. This is where we arrive at the edge of the rabbit hole: null represents nothingness/unknownness, so what the hell is it doing in a deterministic system of 1s and 0s?

> Computer systems are purely logical, but the applications that we write, if they embrace Nulls, are apparently not. Null is neither true nor false, though it can be coerced through a truthy operation, so it violates Identity and Contradiction. It also violates Excluded Middle for the same reason. So why is it even there?

And suddenly a bunch of type theorists just winced. It is possible to have trivalent logic that's coherent. SQL does this -- and it makes sense to do it in that application.


I really don't like throwing the word "pseudo-intellectual" around, because it's mostly used to mean "I think I'm smarter than you".

However, this person begins their article poisoning the well, saying,

> I find that the people I talk to about this instantly flip on the condescension switch and try to mansplain this shit to me as if.

They then proceed to speak in that very same condescending tone, presenting a smarter-than-thou trundle down from the mountain, all the while including some nonsense about logic.

This person clearly has a chip on their shoulder: tacking on things about community-regulation in an article about Null, like all technical public engagement authors do. Let's signal "what good behaviour we expect" in the midst of a book pitch and a confused discussion of a technical matter.

Any one of these goals could be attempted and missed in good faith. But when you smush all of this together, it seems an exercise in performative intellectualism, or more accurately, pseudo-intellectualism.


You can easily have n-valued logics.

Saying they don't work because Aristotle didn't use them is like throwing out Newton's laws because Aristotle didn't use them.

There have been whole schools of mathematics that reject the law of the excluded middle: https://en.wikipedia.org/wiki/Constructivism_(mathematics)

Newer logics also don't use that law but instead deal with undecidability, which is less than a century old but is the most important result in logic since its invention.


I think this opinion makes some sense from a pure-functional standpoint, but not from an imperative one. In imperative programming, where you're managing lots of state, it's important to be able to represent the concept of a slot with nothing in it.

That said, I think a large portion of the problems caused by null could have been avoided by making one small change to its behavior: accessing a member (or element) of a null should evaluate to null instead of throwing. This is deeply intuitive, and member access on null is the biggest source of null pointer exceptions. If you try to access a.b, and a is null, it makes perfect sense that b is also null. Many languages have recently started adding an "optional chaining" operator that works this way, but a lot of pain could've been avoided if things were this way from the start.
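
For example, with an optional-chaining operator (sketched here in TypeScript-style syntax, with an illustrative shape), the deep checks collapse into one expression:

    interface A { b?: { c?: number } } // illustrative shape

    function getC(a: A | undefined): number | undefined {
      // Each missing link short-circuits to undefined instead of throwing.
      return a?.b?.c;
    }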


This has nothing to do with functional vs imperative programming, of course functional programs also want to encode the absence of things.

The billion dollar mistake is about not being able to say that a type (specifically a reference/pointer) does not include NULL/nil/whatever as an element.


I don't agree that this is good behavior. I rarely need that behavior. An NPE means that I forgot to handle null, and handling null usually means something other than using null as the result. An NPE is hard to miss, and the stack trace points exactly to the code that should be fixed; propagating null will hide the code that should be fixed. I think the NPE behavior is the best solution short of changing the type system (and a null-aware type system is simply the best solution).


That's just your personal experience. Here's one from me: I use it all the time. I wish this were how JS worked, or that the chaining operator would become standard soon. The reason is that there are times where you need some deeply nested variable in state, but it may not be initialized yet, so you need to check it every time you access it. That gets tiring really fast.


I would extend this idea: whenever an exception would be thrown, we could return null instead. We would end up with error-free programs instantly. Good luck looking for the root cause of strange results.


Note that I didn't suggest this for method calls, only for property access. Again, it's extremely intuitive that a nothing's property is nothing. Making that logical extension is not sweeping errors under the rug, it's dovetailing them into a format where they can be implicitly handled by logic that already exists. I.e., if(a != null && a.b != null && a.b.c != null) becomes if(a.b.c != null). It doesn't make the problem go away, but it certainly mitigates it.


Fair point. This might be a trade-off for instant error reporting / ease of exception handling. I would strongly forbid any coercion of null to other values if this idea were to be implemented.


You'd still need to throw when `b` is a method, though.

The solution to the problem is already there: Kotlin, TypeScript, Rust, Crystal, Swift, Haskell, just to name a few.

We just need to push the industry toward better and safer languages. We keep building abstractions on top of unsafe languages and then we are surprised that they suck.


>> Logically-speaking, there is no such thing as Null (...)

Oh but there is.

The author is probably considering only propositional calculi with two truth values, such as Boolean algebra and, er, well, they're probably only considering Boolean algebra because that's what we use in computers, because it maps nicely to 0s and 1s.

However, two-valued logics are by no means the only possible logics, neither are they the only ones that have actually been described in the framework of mathematical logic. Probably the most well-known many-valued logics are Łukasiewicz's and Kleene's that have three truth values (i.e. values assigned to literals): true, false ...and unknown.

... which is to say, "null".

And just to blow your mind, there are also infinite-valued logics, like fuzzy logic and, I'd argue, the good old probability calculus of the reverend Bayes, which is nothing if not a many-valued logic.

The mistake is, I think, that the author is taking "logic" to mean Aristotelian logic; however, that is not at all the logic we use in computers. Like I say above, computers use Boolean algebra, which is an entirely different formal system with its own axioms, separate from grandpa Aristotle's own. For instance, Aristotle never said anything about functions, mappings from the set of literals to {0,1}, nor did he formally define the algebra of the Boolean operators AND, OR and NOT (a.k.a. conjunction, disjunction and negation). Although you can project Aristotelian logic onto Boolean logic, they are far from the same, and I would really struggle to see how one would implement a programmable computer using Aristotelian logic.

P.S. Am I mansplaining now? Wouldn't that be a little ...weird?


>> Kleene's that have three truth values (i.e. values assigned to literals): true, false ...and unknown. ... which is to say, "null".

Regardless of what logics exist, the statement you make - that "unknown" is "null" - is actually wrong, and it's the heart of the problem. "Unknown" is only one semantic interpretation of null; there are many others, and problems arise in software development because of those (sometimes slight) differences in the interpretation of what "null" should reflect in the real world.

Some sources suggest as many as 129 possible semantic meanings for "null", which is why Date rails against "null" values in SQL. With Option/Maybe we're constrained to the universe of values of that type plus exactly one more value (None); with null who knows how many of the 129 possible meanings of the null value we are dealing with in addition to the base type? Null is computationally and mentally expensive to deal with.


The same is true for true and false. What meaning gets attached to Boolean values is immaterial to how they act.

That null is implemented oddly doesn't negate the fact it's a meaningful and well defined mathematical construct.

I don't see people saying true doesn't exist because Forth implements it as -1.


This.

Edit: removed unnecessary blablah.


But how many of those 129 possible meanings for null are possible meanings for None in an Option type? And if you're going to say "only one", I'd suggest that you go through the list of 129 possible meanings for null very carefully to be sure...


The functional paradigm answer to this is to have an Option<T> type, or Result<TSuccess, TFailure>, which allow you to flexibly account for the “null case” while still having total functions (https://en.m.wikipedia.org/wiki/Total_functional_programming).

The only catch is that now you have to unwrap your function results somehow (functional languages provide ways to do that, like the scary monad).

I like this website, which demonstrates the concept: https://fsharpforfunandprofit.com/rop/
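
A rough sketch of the Result idea in TypeScript (not the F# code from the linked article; the names are made up):

    type Result<S, F> =
      | { ok: true; value: S }
      | { ok: false; error: F };

    function parsePort(raw: string): Result<number, string> {
      const n = Number(raw);
      return Number.isInteger(n) && n > 0 && n < 65536
        ? { ok: true, value: n }
        : { ok: false, error: `invalid port: ${raw}` };
    }

    // The "unwrapping" mentioned above: the caller has to handle both branches.
    const r = parsePort("8080");
    if (r.ok) {
      console.log(r.value);
    } else {
      console.error(r.error);
    }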


Option monads have the same semantics as the C# null described in the article; they don't apply the function if the value isn't present, and you get another non-present value back out.

Result<,> has the same semantics as checked exceptions, BTW. The conversion between the two representations of code is mechanical.

Checked exceptions are a poor idea the more dynamically bound your language is, and are generally anti-abstraction in any case (failure modes are implementation specific, i.e. non-conformant with an information hiding interface). An error result is only useful if you can make a decision based on the specific value; that's not the case for almost all sources of error in most user (i.e. non-system) programs, where complete coverage of error cases with specific handlers is outside their design parameters, and termination (of program or request or whatever) is preferable.


It’s about the ability to reason about the code. Option types force matching expressions whenever you encounter the wrapped value, forcing the developer to handle the failure case.

Equivalent C# code is to add checks for nulls all over the place, aka "defensive coding". C# sort of has an Option type with nullable types, but they are for value types only, and developers can still reach in and just grab the value, eliminating the safety that option types provide.
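
TypeScript has a similar escape hatch, for what it's worth: the postfix non-null assertion `!` lets you "reach in" much like `.Value` does, defeating the check the compiler would otherwise enforce:

    function firstChar(s: string | null): string {
      // Under strictNullChecks, s.charAt(0) is rejected here;
      // s!.charAt(0) silences the compiler, and still blows up at
      // runtime if s really is null.
      return s!.charAt(0);
    }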


I don't mean to defend Null.

> Null is neither true nor false, though it can be coerced through a truthy operation, so it violates Identity and Contradiction. It also violates Excluded Middle for the same reason.

Intuitionistic logic doesn't have the law of excluded middle and is the form of logic underlying the simply-typed lambda calculus.

There is a more nuanced relationship between logic and programming languages than is being discussed here.


> Aristotle was the only logician and if you point me towards Buddhist or Daoist philosophy, you're mansplaining!

OP should check out trinary logic and first-order predicate calculus. Just because OP doesn't appreciate "null" doesn't mean that it should be removed from programming languages or from existing programs.... false means not true, zero means no things, null means absence.


OP here - OP has checked out (and lived with) trinary logic. Just because you appreciate null doesn't mean it should be kept in programming languages and existing programs.

That there is a logical truism, isn't this fun?

Also: false means not true, zero is a number and null doesn't exist, by definition. We can model true/false/zero easily as they exist. Null is made up, so every language gets to think about what it means in an abstract made up way. Thus the pain, thus the post.


The last 100 years have been spent dealing with the fact that true and false are not enough to describe the world of computation. You also need undecidable, or null, for statements that can be proved to never terminate.

It is very much still an open question if a statement is absolutely undecidable and if we need to add something like null in all logic [0].

As for your arguments on why we don't need null, they sound exactly like the arguments against zero from the middle ages [1].

>Just as the rag doll wanted to be an eagle, the donkey a lion and the monkey a queen, the zero put on airs and pretended to be a digit.

[0] http://logic.harvard.edu/koellner/QAU_reprint.pdf

[1] Menninger, Karl (1969), Number Words and Number Symbols. Cambridge, Mass.: The M.I.T. Press.


Maybe instead you should check out the languages that make life easier without the null abomination?


As a programmer with assembly language in my blood, all I can think is "Why should I care that this silly person thinks 'zero' doesn't exist?"


It's not that 'zero' doesn't exist.

It's that in assembly Null's just a zero.


I like the fact that the author says: "don't be complacent and mansplain your feelings about null", but that is exactly what he does...

People claiming the industry is wrong and that we should use some logically pure language are usually impractical and deny the compromise that must be made for systems to be usable by the masses.


A language does not need to be logically pure to avoid null. Having null in a modern language is simply a boneheaded mistake. There are no compromises that need to be made to get rid of it. Optional types are superior in every way.


How are they superior?


C++ would be a lot better off if references were used more heavily. (And C ought to have references by now.) One still sees too many asterisks in C++ code. References are not supposed to be null. Sometimes they are, though, because there's no checking when a pointer is converted to a reference.

"this" should have been a reference, not a pointer. Strostrup admits that was a mistake.


Technically, references can't ever be null. To create a null reference you would have to dereference a null pointer, and then you are into nasal-demons territory.

/pedantic


Pointers are easier to read than non-const references though (since they're more explicit).


How is it possibly mansplaining to respond to this with an opinion or alternate explanation? Also... isn't the other's gender fairly important to the mansplaining comment since the term references the phenomenon of women being explained things they already know?


When a man is condescended to by other men we call that "communication"; when a woman is, it's called "mansplaining".


C# has null because all new memory is initialized as all zeros. The constraint is that any object or struct needs to be valid if its in-memory content is all zeros. Thus, null must be supported for object references, because the pointer has to support zero.

Value types can work without null because 0 is valid.

Could the language designers have implemented object types without null? It would be very difficult for fields, especially in structs, because it's nearly impossible to force memory initialization. You couldn't do "default(StructType)" if it had a non-null object reference as a field.


It would be nice if C# simply had a nullable type for reference types. So like int? you could also have string? or SomeClass?. Structs with non-nullable reference types would have to be initialized at creation time. This would solve a lot of errors. But you still need to have the option to have nullable references.


Clearly there's a place for null: most languages have it, most developers have no problem with it, and it does represent something (lack of information). It is logical and mathematical--I suggest you read GEB for tons more info on that.

That said, people are _also_ free to experiment with null-free languages, or to avoid null. If it works for their use case, fantastic!

Just the attitude of the post is terrible. For example, the author creates a sock puppet and posts a trolling/edgy question to Stack Overflow... just to see if it's toxic.

Sorry, these posts are toxic, in my opinion.


Yeah, it was super weird how the subject migrated from programming theory to a rant about stack overflow.

On the actual topic, I think a good compromise is what TypeScript and (I think) Rust do, which is that a value is only nullable if you explicitly declare it as such. It forces you to be judicious, while still allowing you the option when you need it.


I've been trying that, but so far I don't find a big advantage, but it _is_ a bit annoying to have to annotate all types that could be null.

This is to say: it could be good, or it could be another "exceptions in Java" moment


It's about leaky abstractions.

If programming is about managing complexity, and status is ideally specified explicitly, then null is simply the option in the mathematical set defining that status which corresponds to the option commonly seen on surveys: Other (please specify): ...

While it can be handy to have this 'extra value' in a set, and it is most commonly used to denote special meanings within a carefully controlled context (eg. SQL column in a result set is empty, a variable is of no type - ie. not defined at all, etc.), issues arise when people accidentally carry context or presumptions about the meaning of null across contexts, creating a leaky abstraction.

Most of the author's article appears to deal with differences in these context-specific assumptions.

Perhaps the Java approach: throw an exception.

Unix approach? Nonzero return values and arbitrary stream or file data to clarify. In edge-case leakiness, very similar to the Java approach.

The functional and dedicated non-OO procedural programming approach: define exit parameters to your function, specifying complete precision and ending any ambiguity.

Since a type is a formal context (a set), using a typed language is another solution; although that adds overhead, it brings benefits in rigor.

Sometimes, the elegant implementation is just a function. Not a method. Not a class. Not a framework. Just a function. - John Carmack


Is "null" really the problem, or is not knowing the state of something the problem? I agree that it would be better for the world if we always knew what a "thing" was or referenced, but more often than not, we don't. Having a construct that expresses this uncertainty nicely models our own uncertainty.


I always found this[0] analogy quite accurate whenever I discussed this issue with others.

Setting aside how different languages (mis)treat null as a concept, I think the whole discussion is about conflating boolean logic with memory addresses, and some languages do a better/worse job than others at being 'intuitive'.

I had great success with this[1] whenever I felt things were going in the wrong direction or when the language constructs of dealing with nulls were in the way of the domain design.

[0] https://www.b4x.com/android/forum/attachments/unbenannt-jpg....

[1] https://en.wikipedia.org/wiki/Null_object_pattern


C# has (a subset of) SQL null semantics because of LINQ. That's the answer to his specific question.

Null values propagating through expressions SQL-style is an approach to handling unknown values. Whether it's desirable is somewhat beside the point; databases have null values, and LINQ enables capturing expression trees that get converted into SQL, so keeping similar semantics makes a kind of logical sense.

Nulls are undesirable in a language until you want to initialise large structures in a simple language with a simple compiler. Staying away from the temptation then takes ingenuity and discipline.


Was the behavior of null with integers really changed when LINQ was added? What would his examples have done in C# 1.0?


In an integer column in a relational database, what is the alternative to having null?

0 is not an option, as that will return incorrect mean/median calculations. So what would be there? Or am I misunderstanding?


If missing data can occur within a column, then pragmatically you split that column off into its own table with the associated key from the original and store only those keys that have valid values.

If you do anything else, such as using nulls, then you are just creating a problem that WILL come back and bite you in the nether regions of your psyche.

Of course, there will be other opinions about how to handle this. As far as I am concerned, Nulls are a curse foisted on us by those vendors and standards bodies who took the easy way out.


> you split that column off into its own table with the associated key from the original and store only those keys that have valid values.

That will create real problems and bite you immediately.


Demonstrate that it will bite you immediately. This technique has ensured that only valid data has been kept and that reporting works.

When nulls are stored there are no guarantees that anything you ask of the database will ever turn out right. Especially when there are millions of records stored. Two queries that should give you the same answer give different results when nulls exist. Seen it too often.


Table separation leads to record fragmentation and a big IO cost on record fetching. Your ideal academic world will crash under production reality. "Millions of records". Bwa-ha-ha.


If your DBMS is so poorly written that record fragmentation is an issue, then you need to change the DBMS. Since most (>99%) of the database design and implementation work that I have been involved in since the mid-'80s was business-related, across a variety of different industries, I didn't find table separation to be a problem. The appropriate designs led to faster applications.

I have also worked for companies that didn't use relational database theory for their products and they had far more issues. In a couple, I was able to hive off the database designs from the main systems and got the applications to actually work and work properly.


> 0 is not an option, as that will return incorrect mean/median calculations. So what would be there? Or am I misunderstanding?

Ideally, your database system would support tagged unions, and have a strong type checker. Then, you may have a column of type 'Maybe Int'. If you want to take the median, the median function would be of type 'Agg Int Int' (or something like that), so you wouldn't be able to use this column directly as an input. Instead, you'd have to only supply rows whose values have been checked to be 'Just'.

So

    CREATE TABLE MyTable ( group TEXT, myColumn MAYBE INT )
    SELECT group, MEDIAN(myColumnValue) FROM MyTable WHERE Just MyColumnValue = myColumn GROUP BY group


Use an "Optional<integer>" column instead. Doesn't you RDBMS have the notion of "Optional"? There's the disaster!


Truly relational systems support relation-valued attributes and those are the perfect means to support Optional<anything> in a database.


What does a "relation-valued attribute" have in e.g. the "birthdate" column when we don't know the customer's birthdate? Is it... null?


IIUC, the parent is suggesting that you have a user table with a birthdate column which is itself a table with either no rows or one row.


I suspect you've done a bit of steelmanning here. Thanks though, as parent had been unable to come up with this short explanation despite writing a dozen vague-but-vitriolic posts. Perhaps he'll take note for next time he crawls out from under his rock.

I wonder, if one were to query the DB, what would show up in the "birthdate table" column? I'm kind of hoping it would be zero, which would in turn be interpreted as the epoch, but sadly it would probably be null...


> I suspect you've done a bit of steelmanning here.

Not deliberately, perhaps reflexively. It just seemed the most straightforward (and interesting) interpretation of the comment in question. I hadn't seen the other comments.

> I wonder, if one were to query the DB, what would show up in the "birthdate table" column?

As I understand it, what would show up in the "birthdate table" column is a table. You would have to do a select on that table to get actual data out, and in that select you'd get either zero or one row returned.


Haskell has the Maybe monad, which is defined as Just a | Nothing. One may argue, then, that languages with nullable integers implicitly define this type as Maybe Int.


That's the Maybe type. The Maybe monad is this:

  instance Monad Maybe where
    return = Just
    Nothing >>= _ = Nothing
    Just x >>= f = f x
(And is not really related to the discussion)


Right! That definition was the type, thanks for the correction :) Still, the Maybe monad is used for computations that may fail, which is often the purpose of the null pointer.


Well, that just pushes the goalposts of the complaint: now Haskell has Int, and those other languages do not.


I meant that in some way one could think of, for instance, C's int as a sum type of just integer and null :) In practice the obvious difference is that a language like Haskell enforces considering both options with its type checker.


No, you can't think of C's int that way - it doesn't have a null.


You are completely right, not sure where my head was :P


> This leads to an inconsistency: if to_i will turn nil into a 0, why won’t that coercion happen when trying to multiply?

Because Ruby doesn't automatically coerce arbitrary types to numbers when you try to perform arithmetic on them. This is a Good Thing.

    2 * "4"
> TypeError

    2 * "4".to_i
> 8

Interestingly, it is possible to multiply a string by a number, but no coercion takes place.

    "2" * 4
> "2222"


OP here - Yes, that's the operation. The question wasn't supposed to be a literal one; it's a consistency issue, which illustrates the larger point that different languages deal with null differently because it's not logical, and therefore confusing and a pain in the ass :)


Languages are definitely not consistent in what they do if you try to coerce null to various types, nor are they consistent (often even internally) about whether they perform coercions implicitly. The Ruby example seemed to deal mostly with the latter.

I think implicit coercion is bad, but that's a separate issue from null.


I have no clue why coding languages need null. I mean, they're deterministic, right? If something is missing, undefined or whatever, that's an error. What does the null concept gain you?

In data, on the other hand, null is a very useful concept. It explicitly means that there is no value in a field. And that prevents such mischief as interpreting missing data as zeros.


In a language like C, NULL solves problems that are otherwise awkward to solve.

1. What should `malloc()` return on failure?

2. How should I indicate that I don't care about certain out parameters, e.g. the parameter to `time()`?

3. How should I initialize variables that are going to be used as out parameters, e.g. in `strtol()`?

Other languages address these at the cost of significantly complicating their type system and ABI: generics, multiple return values, etc. In a language intended to be small like C, I'm not sure how you can do better.


This point of failure is exactly why exceptions were invented in C++, with low overhead for hot paths.

Initialization for out parameters does not matter. In C++ you'd return a tuple or structure or class instead of having multiple return values.

It does complicate ABI some.


Thanks. But ...

1. Why not just return meaningful values? Wouldn't that help in debugging?

2) ???

3) That makes sense to me.


C functions can only return one value. Since the stdlib generally adheres to the idea of minimizing overhead, returning a separate status would mean malloc would have to accept a pointer to a pointer, so it could assign the pointer it returns through an out parameter. Since null pointers are unusable, using that impossible value as your error flag is a pretty clever optimization.


Thanks, I see that.

But isn't ambiguity the tradeoff?


But code is data?


Yes, it's data. But when it's used, it processes data.


Null/nil tends to represent the absence of a thing. That's perfectly logical. Things can be absent.


Kotlin solves this problem nicely, assuming your codebase is all new or has been annotated with @Nullable/@NotNull

https://kotlinlang.org/docs/reference/null-safety.html


It doesn't seem that nice to me compared to making it impossible to assign nulls anywhere, without the need to annotate anything.


Surprised there's no mention of Kotlin here. It handles this really nicely, forcing you to declare whether a value can be null; if it can be, it requires a null check before the value can be used (with some syntactic sugar to make that cleaner).


Even if you eradicate Null, programmers will create a singleton object to represent it.


Yeah, just like they invented Unit/Void singleton in functional languages.


Unknowns are not even remotely problematic for logic. Not only SQL, but illustrious languages such as VB6, include built-in trivalent logic.


Yeah and the results are so massively intuitive.


I suppose this is sarcasm but the results are quite intuitive to me. From my perspective, it's trivial to construct the truth tables for trivalent logic based on a sense of what "should" be in them. Maybe give it a try.


Well guess what. Your "sense" of what "should" be in them has no place in computer SCIENCE.

Computer SCIENCE is that ugly thing that tells us that three-valued logic gives rise to 19683 distinct binary logical operators, while two-valued has 16.

Computer SCIENCE is that ugly thing that tells us that if you want a computer language over a three-valued logic to be expressively complete, then you need to implement all of those 19683 logical binary operators one way or another. In the worst case, that's 19683 operator names for the programmer to remember. And you come here claiming that it's "trivial" because you have a "sense" of what the results ought to be ? That proves just one thing but site policy probably won't allow me to spell that out.

(In case you were wondering what the 16 names are in two-valued logic : they aren't needed because the system being two-valued gives rise to certain symmetries that gracefully allow us to reduce the set we need to remember to just {AND OR NOT} (or some such) which beautifully parallels the way we communicate in everyday life.)
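
For anyone checking the arithmetic: an n-valued logic has n^(n^2) distinct binary operators, so (a trivial sketch in TypeScript):

    // Number of distinct binary operators over an n-valued logic: n^(n*n)
    const binaryOps = (n: number): number => n ** (n * n);

    console.log(binaryOps(2)); // 16
    console.log(binaryOps(3)); // 19683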


Ah, you seem to be a bit nutters. So I'm not going to engage further but I do want to give you a serious response to the argument you seem to be making.

First, the same argument above is also an argument that, say, integers, are not "Computer SCIENCE".

More to the point, you might enjoy reading the work of Charles Peirce and other logicians of that era who began to explore many variations on formal logic. Note that just as many operations arise from trinary relations in bivalent logic. Are binary relations "Computer SCIENCE", but not trinary or higher relations? Before you answer, you might want to look into whether all possible relations can be expressed using only binary relations (hint: nope).

Look deeper into the concept of functional completeness (with respect to a subset of operators), which you reference above without naming. You might be able to understand how many of those many trivalent operators are actually necessary to reason with (hint: not very many, hardly more than for bivalent logic, where, as you note, we only tend to use a few, and need not worry about it).

Consider also the relationship between the operators folks have identified as useful in bivalent vs trivalent logic (hint: they're not picked at random).

Could it be that, just as with the 16 binary operators, many of which have relations to one another (e.g. inverses and complements, among others), the trinary operators fall into similar groups, making the 3^9 number you mentioned seem a whole lot less complex? Could that be why it's neither necessary nor customary to work with all the operators in either sort of logic?

Once you've caught up to state of the art in formal logic as of the 1930's you might have a new perspective -- perhaps you might even begin to let us know when "Computer SCIENCE" will catch up!


If you wanted that to be a real argument then you should have done the maths yourself. Without those maths you are doing nothing but gratuitous handwaving. I've done them for you and they prove me right (and you wrong, at least where you say "hardly more than for bivalent logic").

Oh, and if you want to know why people don't want to find more "useful" operators than what they're used to from good old two-valued logic then I have a hint for you too : it's because they all immediately sense that their brains are not up to it as soon as they actually try (and my actually doing the maths has very clearly shown me why - so as you suggested to me "perhaps give it a try").


Clearly you are a theorist and not a practitioner. Dealing with the real world is far more complex than a simple YES or NO. Take a look at something called 9-valued logic - it will blow your mind if you are unable to handle 3-valued logic.

Don’t get me wrong - theory is important. The models that you develop though are only an approximation of the real world.


Well FWIW the real theorists would be utterly offended if they knew I was being put in the same league as they.

The way I see it there are practitioners who care about theory (and take the bother to try and understand some of it, and even more so the consequences on their practical tasks) and there are those who don't.

(Aside: the world of fact-oriented modeling, and especially FCO-IM - that's communication-oriented modeling - has this stance that our databases are actually not modeled according to how the world is; they are modeled after how our communications about the world are. An interesting distinction you might want to ponder a bit more deeply.)


  > Logically speaking, there is no such thing as Null
Yes, there is:

  "Did you pass your test?"
  "I haven't taken it yet."
  "Okay, will you pass it?"
  "Null."
Suppose there was a database table of students, with a column called "passed," which is boolean. If it's in the middle of the semester, that column must be null. True means they passed. False is put there when they fail.

---

Another example, in a table of help tickets, suppose there is a timestamp field called "closed," for when the ticket was closed. If the ticket is open, then that column must be null.

---

  "Which show is currently playing on channel 3?"
  "It's just static. It's midnight, and channel 3 has stopped broadcasting."
(This of course must have been back in the '80s.)

If the TV guide were a database table, with time slots for each channel, then the columns for which show is scheduled for midnight on channel 3 must be null.

In fact, that's how I picture null: ever-shifting static inside the little cell in my database table. That's why I'm fine with how if you ask if two null values equal, the answer is null (at least in Postgres).

Someone may say that there is a show currently on channel 3, and the name of the show is "Static." But that's not quite true. If channel 3 recorded static and broadcast it, then yes. But channel 3 has literally turned off its tower. It is broadcasting nothing. Your TV is showing static because you tuned to a station that is not there.

---

The writer complains about inconsistent implementations of null, like in JavaScript. I agree that there are misimplementations. I would not have had both null and undefined.

In fact, that's another way of understanding what null is. Suppose you say my earlier examples with SQL are flawed, because SQL is flawed, because it insists on every row having columns it doesn't need, that the field for "passed" should even exist in that row until the class is over. Say instead you were using objects. Within the semester it may look like this:

  {
     name: "Edgar Smith"
  }
Then at the end you add:

  {
     name: "Edgar Smith",
     passed: true
  }
Well, what if you asked for the value of student.passed in the middle of the semester? You might say it is a mistake, but it is a question that can be grammatically formed: "What is the value of the 'passed' key in the object?" The only answer is null. To me null often means "not applicable."
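
In TypeScript terms, purely as an illustration, that mid-semester read comes back as undefined (JavaScript's flavor of the same idea) rather than blowing up:

    interface Student {
      name: string;
      passed?: boolean; // absent until the semester is over
    }

    const edgar: Student = { name: "Edgar Smith" };
    console.log(edgar.passed); // undefined, i.e. "not applicable (yet)"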

---

Again, another example. In school we played a game of guess-the-object, where you could ask only yes-or-no questions. There were three possible answers: yes, no, or "does not compute." Clearly the answer of "does not compute" was needed for when the answer was neither yes nor no. "Does not compute" was a fancy synonym for "null."

---

Null is the answer to the infamous loaded question, "When did you stop beating your wife?"

---

Null is quantum foam. It is what is happening at the base of the universe, if you look close enough. Is the particle here or there? Null.

You may say that doesn't make sense, that it's just that we don't have the right instruments. But that is Quantum Mechanics. If you wish to disprove Quantum Mechanics, Einstein and I would both be interested, as he was uneasy with it too. But going after quantum mechanics would be a better use of your time than going after null. Right now, null and the scientific consensus about physical reality agree.


I've been coding since around 1981. What changed my coding style most is Option, since I started Scala 10 years ago.


Scala is a very confused language, especially about the use of Options vs null. I find that Rust and Swift enforce Options a lot better.


This was a decent post until he went on the stack overflow rant.


Someone needs to write an essay on the productivity disaster of "considered harmful" essays.

We traded goto for callback hell and ten-layer inheritance hierarchies. A null-purge would probably end similarly.


What I'd like to tell the author:

Don't like null? Don't use it. And if you find null propagation ruining your abstraction, then treat it as an error and fix your program.


It actually turns out that there are better ways of solving the problems with null that don't involve just telling programmers to go fix their programs. There are alternatives, like encoding whether a value can be NULL (or some semantic equivalent) into the type system. Rust and Haskell are examples of languages where this is the only way to do things, and C# and TypeScript are examples of languages where you can selectively (and under certain conditions) make distinctions between values which can be null and those which cannot.


> Rust and Haskell are examples of languages where this is the only way to do things,

You know Rust has null too, right? https://doc.rust-lang.org/std/ptr/fn.null.html

Rust has both references (which can't be null†) and pointers (which can be null, but can only be dereferenced within "unsafe" blocks or functions).

† Actually, they sort of can: an Option<&T> takes the same space as a &T, with the None variant of the Option being represented as a null behind the covers.


Read the statement as "the only typical way to do things" or "the only safe way to do things" or "the only way to do things without using a completely different type which supports NULL". The fact that Rust has unsafe non-GC'd pointers which can be NULL or otherwise invalid puts it in the same camp as Haskell, Go, C#, etc. but in Rust you are more likely to use references which can't be NULL.


> Actually, they sort of can: an Option<&T> takes the same space as a &T, with the None variant of the Option being represented as a null behind the covers.

There is no “sort of can” here. It just so happens that None uses the same representation as null, but by that logic you’d say that u64 can be null because 0 has the same representation as null.


Perhaps possible in isolation or on a small team. I don't think this is practically achievable on teams that have grown past a certain size, though.

To paraphrase Carmack, "Any syntactically valid code, that the compiler will accept, will eventually make it into your code base." [1]

[1] https://www.youtube.com/watch?v=Uooh0Y9fC_M


Not if you review your code reasonably well and use static analysis plus some testing, making it a process to not get bad code into your codebase.


At some point one will include external libraries, written with different styles and standards. And then will modify the libraries, making them effectively part of the core code base.


OP here - as a matter of fact I try to do just this, starting with the database. I'm mostly a data person so I try to think through, as deeply as I can, what I should expect in every table - there has to be a sensible default and if I can't find one then I rethink my design. You'll probably disagree with me and grunt out another single sentence missive, which is fine, but I think it's worth taking some extra time and using Null as a bit of a warning. It's a crutch! A way to stop thinking and say "whatever I don't know what this value is supposed to be so... it's null. Let's go shopping!"


Frankly, the idea that there must be sensible default worries me.

Take a database of people. There is literally no sensible default for name, age, gender, height, weight, social security...

If you’re amazon, your products have no sensible default for manufacturer, shipping weight/size, delivery address...

In fact, for just about any real-world data, there simply is no sensible default for anything at all. Most “sensible” defaults will eventually bite you in the arse. The only sane way to keep nulls from your DB is to refuse inserting incomplete data in the first place, and propagate the error to the user. Heavens save your team if you’re dealing with batch data and insist on not allowing nulls in the DB, though.

You can sweep this mess under a rug and pretend you have no nulls by turning things into relations that are allowed to be empty — “there are no delivery_address rows for this user” — but that’s a null in sheep’s clothing. Either your application knows how to deal with the query coming up empty, or it doesn’t.


What do you use in a database when you have a field where you literally do not know what the value should be?


If you have a PEOPLE table and some birthdates are unknown, then remove the "birthdate" column and make another table called PEOPLE_BIRTHDATES with a "birthdate" column and a foreign key pointing to PEOPLE. Now your queries can have lots of left joins. The results will still have nulls, however.
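
A rough sketch of that shape in TypeScript (hypothetical record types standing in for the two tables), which also shows why the "left join" result is still nullable:

    interface Person { id: number; name: string }
    interface PersonBirthdate { personId: number; birthdate: string }

    const people: Person[] = [{ id: 1, name: "Ada" }, { id: 2, name: "Bob" }];
    const birthdates: PersonBirthdate[] = [{ personId: 1, birthdate: "1815-12-10" }];

    // The equivalent of the left join: the missing row resurfaces as undefined.
    const joined = people.map(p => ({
      ...p,
      birthdate: birthdates.find(b => b.personId === p.id)?.birthdate,
    }));
    // joined[1].birthdate === undefined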


Which is the reason why you shouldn't write outer joins.


So if we don't know the customer's birthdate we can't serve her? I can imagine a problem with that...


Sigh.

Where have I said any such thing ?


If there's no row for the customer in the joined table, the customer won't show up in an inner join.


Great. Now if you can explain to me where you got the idea that a join (inner or otherwise) is the only possible way to query two tables then we might get somewhere. Because you can also just do two queries. And no, that does not necessarily mean "two roundtrips to the DBMS" (which I know perfectly well is undesirable). There are techniques for avoiding that. Perhaps not in SQL, but that's a reason you should be pressing the vendors to improve SQL. Not for you to agree to the status quo of sticking with the vendors' old bypasses-and-hacks cheating bag.


Haha OK then use a UNION... oh wait we're gonna have NULLs with that too. One suspects you'll also have some vague objection to this point, but if the only way to address that is to wait on somebody to invent an "improved" SQL, one won't worry about it too much.


That "improved SQL" was already defined in the previous century, and has been implemented as well. Your ignorance drips off of every word you write.


E.F. Codd literally designed null into the relational model. I don't think calling someone ignorant is very helpful.


But the demeaning ridicule that gets thrown at me is ?

(BTW I doubt very much that "Codd designed null into the RM". Even his 12 rules mention only "a systemic way to deal with missing information", not "null".)


> What do you use in a database when you have a field where you literally do not know what the value should be?

You don't.

If a value may not be present for an entity, it's not an attribute of the entity in question; it's an attribute of another entity that has a (0..1):1 relationship to the entity in question.

Normalization eliminates NULL.


That's great. Now I do a query. Maybe I use a join. If a row has the "0" case of that (0..1):1 relationship, what do I get?

Or maybe I don't do a join. Maybe I do a separate query. If the query comes back with zero rows, then I... what?


What do I get ? You get what you ask for.

Then I ... what ? Then you do what needs to be done as specified by the business in the case the queried piece of information is unknown.


> What do I get ? You get what you ask for.

In the join case, don't I get a NULL in the row that comes back if there isn't an entry in the other table? Or do I just not get a row?

> Then you do what needs to be done as specified by the business in the case the queried piece of information is unknown.

Sure, but how do I represent that condition in my software? With a different class/structure? With a flag that indicates that the other field isn't valid? Or with a null?

From where I sit, normalization doesn't make the problem go away at all.


So, don't use any OS, library, or code base built on top of C, C++, Java, .NET, JavaScript, SQL...???



